Remember Cassettes?
Giving long-term memory to your agents and grounding them in your data doesn’t need to be complicated.
After building numerous knowledge bases, I'm convinced that both the data and its downstream representations should be stored in a single place.
They should be tightly coupled.
That's why we'll use SQL to build a multimodal RAG system, keeping vectors in the place your data most likely already lives.
Our focus will be on four common use cases:
- Semantic text search
- Semantic image search
- Cross-modal search
- Hybrid text search
For the database we'll use Postgres on Supabase with Drizzle ORM, but you can just as easily use any other relational DB that has a vector extension.
Axioms
Before we get into the weeds, here are two strong opinions held lightly:
- You don't need a dedicated vector database
- Long context windows will not fully eliminate the need for RAG (or whatever form information retrieval takes in the future)
A Primer
When your app scales to millions of documents or images serving as rich context for agents, you'll need to store their retrievable representations, i.e. vectors, somewhere.
At the risk of being reductive, here are the 3 main pieces in a typical retrieval system:
- Chunking: Breaking down text into semantically coherent units (e.g. sentences, paragraphs, etc.)
- Embedding: Representing each chunk as an array of numbers. These can be sparse (TF-IDF, BM25) or dense (deep neural net embeddings)
- Retrieval: Finding the most relevant chunks for a given query
Beyond this you can add rerankers, metadata filters, late-interaction models, and more, but we won't cover those here just yet.
In the case of text, we'll assume that you've already obtained chunks from your data. For images, ideally store them in an S3 or R2 bucket so we retain access to the original files.
The idea is that we want to yield self-contained, atomic units of information - whether that's a paragraph of text or a single image.
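To make the chunking step concrete, here is a minimal sketch that splits text on blank lines and greedily merges paragraphs up to a size budget. The `maxChars` budget and the merging strategy are illustrative assumptions, not a library API; in practice you'd likely use a token-aware splitter.

```typescript
// a minimal sketch of paragraph-based chunking: split on blank lines,
// then greedily merge adjacent paragraphs until a character budget is hit
function chunkText(text: string, maxChars = 800): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = '';
  for (const p of paragraphs) {
    // start a new chunk when adding this paragraph would exceed the budget
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? `${current}\n\n${p}` : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk stays a self-contained unit of one or more whole paragraphs, which is exactly the atomicity we want before embedding.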
Any good vector store needs to satisfy 3 fundamental axioms from day one:
- Multi-embedding: Ability to store embeddings of different dimensions (e.g. 1024 for text, 4096 for images)
- Multi-modality: Ability to handle multiple modalities of data (text, images, audio, video, etc.)
- Multi-tenancy: Clear separation and enforcement of data ownership
The Schema
First, enable the required PostgreSQL extensions:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Then define the schema using Drizzle ORM like so:
import { pgTable, uuid, text, timestamp, jsonb, vector, pgEnum, check, index, pgPolicy } from 'drizzle-orm/pg-core';
import { sql } from 'drizzle-orm';
import { authenticatedRole } from 'drizzle-orm/supabase';
// we define row level security (RLS) policies scoped to the owning user.
// ownerColumn is the column that stores the owner's id:
// 'id' on the users table itself, 'user_id' on tables that reference it
export const ownedByUserPolicies = (prefix: string, ownerColumn: 'id' | 'user_id') => [
  pgPolicy(`${prefix}_select_own`, {
    for: 'select',
    to: authenticatedRole,
    using: sql`auth.uid() = ${sql.raw(ownerColumn)}`,
  }),
  pgPolicy(`${prefix}_insert_own`, {
    for: 'insert',
    to: authenticatedRole,
    withCheck: sql`auth.uid() = ${sql.raw(ownerColumn)}`,
  }),
  pgPolicy(`${prefix}_update_own`, {
    for: 'update',
    to: authenticatedRole,
    using: sql`auth.uid() = ${sql.raw(ownerColumn)}`,
    withCheck: sql`auth.uid() = ${sql.raw(ownerColumn)}`,
  }),
];
// sample users table
export const users = pgTable('users', {
  id: uuid('id').defaultRandom().primaryKey(),
  email: text('email').notNull().unique(),
  name: text('name').notNull(),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow(),
}, (table) => [
  index('idx_users_email').on(table.email),
  ...ownedByUserPolicies('users', 'id'),
]).enableRLS();
// we define an organizations table for multi-tenancy
export const organizations = pgTable('organizations', {
  id: uuid('id').defaultRandom().primaryKey(),
  userId: uuid('user_id')
    .notNull()
    .references(() => users.id, { onDelete: 'cascade' }),
  name: text('name').notNull(),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow(),
}, (table) => [
  index('idx_organizations_user_id').on(table.userId),
  ...ownedByUserPolicies('organizations', 'user_id'),
]).enableRLS();
export const modalityEnum = pgEnum('modality', ['text', 'image']);
// and here is our core vectors table, which references both users and organizations
export const vectors = pgTable('vectors', {
  id: uuid('id').defaultRandom().primaryKey(),
  organizationId: uuid('organization_id')
    .notNull()
    .references(() => organizations.id, { onDelete: 'cascade' }),
  userId: uuid('user_id')
    .notNull()
    .references(() => users.id, { onDelete: 'cascade' }),
  content: text('content'),
  chunk: text('chunk'),
  embedding1024: vector('embedding_1024', { dimensions: 1024 }),
  imageUrl: text('image_url'),
  embedding4096: vector('embedding_4096', { dimensions: 4096 }),
  modality: modalityEnum('modality').notNull().default('text'),
  metadata: jsonb('metadata').notNull().default({}),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
}, (table) => [
  index('idx_vectors_org_user').on(table.organizationId, table.userId),
  index('idx_vectors_modality').on(table.modality),
  check('check_text_embedding', sql`
    (modality = 'text' AND embedding_1024 IS NOT NULL AND embedding_4096 IS NULL) OR
    (modality = 'image' AND embedding_4096 IS NOT NULL AND embedding_1024 IS NULL)
  `),
  // we use a GIN index for keyword search
  index('vectors_keyword_idx').using(
    'gin',
    sql`to_tsvector('english', ${table.chunk})`
  ),
  // we use an HNSW index for vector similarity search on text embeddings.
  // note: pgvector can only index vectors up to 2,000 dimensions (4,000 with
  // halfvec), so the 4096-dim image column is scanned exactly; reduce its
  // dimensionality if you need an ANN index there too
  index('vectors_embedding_1024_idx').using(
    'hnsw',
    table.embedding1024.op('vector_cosine_ops')
  ),
  ...ownedByUserPolicies('vectors', 'user_id'),
]).enableRLS();
Migrations
First, install the runtime dependencies with pnpm i drizzle-orm postgres dotenv and the dev dependency with pnpm i -D drizzle-kit, then configure Drizzle in drizzle.config.ts:
import 'dotenv/config'; // loads DATABASE_URL from .env
import { defineConfig } from 'drizzle-kit';
export default defineConfig({
schema: './db/schema.ts',
out: './migrations',
dialect: 'postgresql',
schemaFilter: ['public'],
verbose: true,
strict: true,
dbCredentials: {
url: process.env.DATABASE_URL!,
},
});
Run these commands to generate the migration and apply it to the database:
npx drizzle-kit generate
npx drizzle-kit migrate
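The usage snippets that follow import a `db` client from `./db`; a minimal sketch of that file, assuming the postgres-js driver and a `DATABASE_URL` environment variable, might look like:

```typescript
// db.ts - a minimal client, assuming DATABASE_URL points at your
// Supabase (or other Postgres) instance
import 'dotenv/config';
import postgres from 'postgres';
import { drizzle } from 'drizzle-orm/postgres-js';

// prepare: false is needed when connecting through Supabase's
// transaction-mode connection pooler, which doesn't support
// prepared statements
const client = postgres(process.env.DATABASE_URL!, { prepare: false });
export const db = drizzle(client);
```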
Usage
Inserting Vectors
import { db } from './db';
import { vectors, users, organizations } from './schema';
import { eq, and, sql } from 'drizzle-orm';
// text document
await db.insert(vectors).values({
organizationId: 'org-uuid',
userId: 'user-uuid',
content: 'Machine learning is a subset of AI...',
chunk: 'Machine learning is a subset of AI...',
embedding1024: Array(1024).fill(0), // replace with actual embedding
modality: 'text',
metadata: { source: 'ml_textbook.pdf', page: 15 },
});
// image
await db.insert(vectors).values({
organizationId: 'org-uuid',
userId: 'user-uuid',
imageUrl: 'https://s3.amazonaws.com/bucket/image.png',
embedding4096: Array(4096).fill(0), // replace with actual embedding
modality: 'image',
metadata: { alt_text: 'neural network diagram' },
});
Querying Vectors by Similarity
Text Semantic Search
const queryEmbedding = Array(1024).fill(0); // replace with actual query embedding
const results = await db
.select()
.from(vectors)
.where(
and(
eq(vectors.organizationId, 'org-uuid'),
eq(vectors.userId, 'user-uuid'),
eq(vectors.modality, 'text')
)
)
// pgvector expects a vector literal, so serialize the array and cast
.orderBy(sql`${vectors.embedding1024} <=> ${JSON.stringify(queryEmbedding)}::vector`)
.limit(10);
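For intuition, the `<=>` operator in the orderBy is pgvector's cosine distance, i.e. one minus cosine similarity. Here is a toy TypeScript version of what the database computes per row:

```typescript
// cosine distance, as computed by pgvector's <=> operator:
// 1 - (a . b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions give a distance of 0 and orthogonal ones give 1, which is why we order ascending: the closest chunks come first.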
Image Semantic Search
const imageQueryEmbedding = Array(4096).fill(0); // replace with actual query embedding
const imageResults = await db
.select()
.from(vectors)
.where(
and(
eq(vectors.organizationId, 'org-uuid'),
eq(vectors.userId, 'user-uuid'),
eq(vectors.modality, 'image')
)
)
// serialize the array into a vector literal for pgvector
.orderBy(sql`${vectors.embedding4096} <=> ${JSON.stringify(imageQueryEmbedding)}::vector`)
.limit(10);
Cross-Modal Search (Find Images Related to a Text Query)
// use a multimodal embedding model (e.g. jina-clip-v2, multimodalembedding@001 or nova-2-multimodal-embeddings-v1)
// that maps both text and images into the same embedding space
// NOTE: the dimensionality may differ by model, so adjust the schema's columns and check constraint accordingly
const crossModalQueryEmbedding = Array(4096).fill(0); // replace with actual query embedding
const relatedImages = await db
.select({
id: vectors.id,
imageUrl: vectors.imageUrl,
metadata: vectors.metadata,
})
.from(vectors)
.where(
and(
eq(vectors.organizationId, 'org-uuid'),
eq(vectors.modality, 'image')
)
)
.orderBy(sql`${vectors.embedding4096} <=> ${JSON.stringify(crossModalQueryEmbedding)}::vector`)
.limit(5);
Hybrid Text Search Using Reciprocal Rank Fusion (RRF)
Create a SQL function on Supabase that uses RRF to combine keyword and semantic search:
create or replace function hybrid_text_search(
query_text text,
query_embedding vector(1024),
org_id uuid,
uid uuid,
match_count int,
full_text_weight float = 1,
semantic_weight float = 1,
rrf_k int = 50
)
returns table (
id uuid,
chunk text,
metadata jsonb,
semantic_score float
)
language sql
as $$
with full_text as (
select
vectors.id,
row_number() over(
order by ts_rank_cd(to_tsvector('english', chunk), websearch_to_tsquery('english', query_text)) desc
) as rank_ix
from vectors
where organization_id = org_id
and user_id = uid
and modality = 'text'
and to_tsvector('english', chunk) @@ websearch_to_tsquery('english', query_text)
order by rank_ix
limit least(match_count, 30) * 2
),
semantic as (
select
vectors.id,
1 - (embedding_1024 <=> query_embedding) as cosine_similarity,
row_number() over (order by embedding_1024 <=> query_embedding) as rank_ix
from vectors
where organization_id = org_id
and user_id = uid
and modality = 'text'
order by rank_ix
limit least(match_count, 30) * 2
)
select
vectors.id,
vectors.chunk,
vectors.metadata,
semantic.cosine_similarity as semantic_score
from full_text
full outer join semantic on full_text.id = semantic.id
join vectors on coalesce(full_text.id, semantic.id) = vectors.id
order by
coalesce(1.0 / (rrf_k + full_text.rank_ix), 0.0) * full_text_weight +
coalesce(1.0 / (rrf_k + semantic.rank_ix), 0.0) * semantic_weight desc
limit least(match_count, 30)
$$;
Call the function from your application:
const results = await db.execute(sql`
select * from hybrid_text_search(
${query},
${JSON.stringify(queryEmbedding)}::vector,
${organizationId}::uuid,
${userId}::uuid,
10
)
`);
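For intuition, the `1.0 / (rrf_k + rank)` scoring inside the function can be sketched in plain TypeScript. The default `k` and equal weights mirror the SQL above; the ids are made up for illustration:

```typescript
// reciprocal rank fusion: each ranked list contributes 1 / (k + rank)
// per document, and documents appearing in both lists accumulate
// both contributions, pushing them toward the top
function rrfFuse(
  keywordRanked: string[], // ids ordered by keyword relevance
  semanticRanked: string[], // ids ordered by embedding distance
  k = 50,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const accumulate = (ids: string[], weight: number) => {
    ids.forEach((id, i) => {
      const rank = i + 1; // row_number() in the SQL is 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  };
  accumulate(keywordRanked, 1); // full_text_weight
  accumulate(semanticRanked, 1); // semantic_weight
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because only ranks matter, RRF sidesteps the problem of normalizing ts_rank_cd scores against cosine similarities before combining them.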
How to Pick a Model
- Check the MTEB leaderboard to shortlist the best-performing models
- For open-source models, weigh your compute budget against your tolerance for latency
- Run custom evals on specific domains and use-cases
A Retrospective
It's been fascinating to see the unbundling and then subsequent bundling of vectors into traditional SQL databases.
Vector databases like Pinecone and Weaviate were the first wave, built on the assumption that embeddings would require their own layer, separate from the core application DB.
While not entirely without merit (especially at the scale of billions of vectors), I think that wave wildly overestimated the need for dedicated embedding infra and tooling.
Most projects are better off with a battle-tested SQL database.