Vector Databases · Production RAG Retrieval, Honest Defaults

Vector database development — production RAG retrieval, honest defaults

Vector databases are the retrieval layer of modern AI applications, and the honest 2026 answer to which one to use is “it depends on scale, your stack, and whether you need hybrid search.” Our default for most projects is pgvector inside Postgres or Supabase: production-grade in 2026, matching dedicated vector DBs at the 1–10M vector scale where most apps live, and keeping your data in one place you own. We migrate to Pinecone, Qdrant, Weaviate, or Milvus when there’s a measured reason. For RAG that needs hybrid keyword + vector search, Elasticsearch usually wins. We architect the whole retrieval layer (embeddings, chunking, reranking, evaluation) rather than just picking a database.

pgvector DEFAULT · PINECONE · QDRANT · WEAVIATE · MILVUS
embedding pipelines · hybrid retrieval · reranking · production observability
~50M vectors¹
pgvector’s production ceiling on a well-provisioned single Postgres node — most apps fit
1M+ tokens²
Modern LLM context windows — vector DBs are now smart retrieval, not essential storage
5 production-ready³
pgvector, Pinecone, Qdrant, Weaviate, Milvus — we deploy all five honestly

Vector database development for AI applications

In 2026 the vector database isn’t the essential storage layer for your AI app; it’s the smart retrieval layer that controls costs and improves quality. The honest answer to “which one?” is much shorter than the FAQs suggest.

Vector databases are the backbone of modern AI applications, powering everything from semantic search to retrieval-augmented generation (RAG), AI agents, recommendation systems, and intelligent document retrieval. At NerdHeadz we design and build vector search infrastructure that scales from prototype to production — using pgvector, Pinecone, Qdrant, Weaviate, and Milvus.

Our 2026 default: for most projects we start with pgvector inside Supabase or Postgres. The “Postgres is slow for vectors” narrative is dead — Supabase’s own benchmarks show pgvector HNSW matching or beating Qdrant on equivalent compute at 99% accuracy, and Supabase, Neon, and Instacart run it in production at significant scale. For projects in the 1–10M vector range where most apps live, pgvector keeps your data in one place you own — the selfware-thesis answer to vector retrieval. We migrate to dedicated vector DBs when there’s a measured reason — not when a vendor pitch suggests it.

For production RAG that needs hybrid retrieval — keyword precision via BM25 combined with vector semantic recall — pure vector DBs alone often underperform. Elasticsearch (or OpenSearch) with native hybrid search is usually the right answer for that shape; see our Elasticsearch page for that side of the architecture. Most production RAG stacks are hybrid.

We work across the full retrieval stack — not just the database. Embedding pipeline design, hybrid retrieval architecture, cross-encoder reranking, evaluation pipelines that measure retrieval quality honestly, production observability for the failure modes that actually bite. The vector DB is the smallest engineering problem in production RAG; we treat the whole layer with the same care.

What we actually build

  • Vector DB selection & setup

    pgvector + Supabase as our 2026 default for most projects; Pinecone Serverless when zero-ops managed matters; Qdrant or Weaviate self-hosted when scale, performance, or open-source preference shape the call; Milvus for billion-vector workloads. Honest decision per project, configured for your real query patterns.

  • Embedding pipeline development

    Robust pipelines transforming text, images, and structured data into vector embeddings. Model selection (OpenAI text-embedding-3-large, Cohere Embed v3, open-source), chunking strategy (the highest-impact decision in RAG), metadata schema design, batch processing, deduplication, version management.

  • RAG integration

    Connecting vector retrieval to LLMs (Anthropic, OpenAI, Gemini) for grounded responses. Prompt engineering, context-window management, citation generation, agentic patterns with MCP tool use.

  • Hybrid search & reranking

    The production retrieval pattern that beats pure vector alone: BM25 keyword + dense vector + (optional) ELSER sparse, fused with reciprocal rank fusion, then cross-encoder reranking for the top-k. For projects that need this depth, we layer Elasticsearch/OpenSearch alongside the vector DB.

  • Performance & cost optimization

    HNSW parameter tuning, quantization (BBQ, scalar, binary), index strategy per query pattern, caching layers, batch sizing. The difference between “works at 1M vectors” and “works at 100M vectors with sub-50ms latency” is hours of careful tuning — we do it properly.

  • Migration & scaling

    pgvector → Qdrant when single-node Postgres limits start to bind (~50M vectors). Pinecone → self-hosted alternatives when costs exceed $600/mo and DevOps capacity exists. Chroma → a production-grade option after prototyping. Real migrations with parity validation, not “lift and shift” promises.
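To make the chunking decision from the embedding pipeline item concrete, here is a minimal sketch of fixed-size chunking with overlap. It splits on whitespace rather than model tokens, and the function name and the default sizes are illustrative assumptions, not a recommended configuration.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of chunk_size that share `overlap` words.

    Hypothetical helper: real pipelines usually count model tokens and
    respect semantic boundaries (headings, paragraphs) instead of raw words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence cut at a window boundary still appears whole in one of the two neighbouring chunks, at the cost of storing some text twice.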

The 2026 strategic frame — what vector DBs are actually for now

An important reframe most buyers haven’t fully absorbed: the role of vector databases in 2026 is meaningfully different from 2023.

2023’s frame: “Your LLM can’t fit your data. You need a vector database to store and retrieve it.” Vector DBs were positioned as essential storage — the only way to make your data available to an LLM with a small context window.

2026’s reality: Claude Opus, GPT, and Gemini all support 1M+ token context windows. An LLM can now fit a small book in a single prompt. The reason to use a vector database has shifted from “you have to” to “it controls costs and improves quality”:

  • Cost control: stuffing 500K tokens into every query costs ~$2.50 per call with Claude Opus. Retrieving the right 5K tokens and sending those costs ~$0.03. Vector retrieval is now a roughly 100× cost-optimization layer, not an enabler.
  • Quality: LLM accuracy degrades with longer contexts — the “lost in the middle” phenomenon is well-documented. Smart retrieval of the most relevant 5–20K tokens consistently outperforms naive long-context stuffing.
  • Latency: long-context queries are slow. Retrieval + a small-context call is consistently faster than huge-context queries.
  • Auditability: when you retrieve specific chunks, you can cite them. When you stuff a huge context, the LLM blends sources opaquely.

This reframe matters for buyer decisions. If your project has a small corpus that fits in 200K–500K tokens, you may not need a vector DB at all — just cache the corpus in the LLM prompt. If your corpus is larger, queries are frequent enough that cost-per-call matters, or you need citations and auditability, vector retrieval earns its place. We help you decide which case you’re actually in — and architect accordingly.
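The cost comparison in the list above is simple arithmetic, sketched here. The price of $5 per million input tokens is an assumption inferred from the ~$2.50 figure for 500K tokens, not a quoted vendor price; substitute the current rate for your model.

```python
# Assumed input price in USD per million tokens (inferred, not a vendor quote).
PRICE_PER_MILLION_INPUT_TOKENS = 5.00


def input_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS) -> float:
    """Input-token cost of one call, ignoring output tokens and caching discounts."""
    return tokens / 1_000_000 * price_per_million


full_context = input_cost(500_000)  # stuff the whole corpus into the prompt
retrieved = input_cost(5_000)       # send only the retrieved chunks
ratio = full_context / retrieved    # how much cheaper retrieval is per call

print(f"full context: ${full_context:.2f}, retrieved: ${retrieved:.3f}, ratio: {ratio:.0f}x")
```

Multiplying the per-call difference by expected query volume is usually enough to tell whether retrieval infrastructure pays for itself.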

The pgvector + Supabase default — the selfware-thesis answer

For most projects we start with pgvector inside Postgres or Supabase. It’s the 2026 selfware-thesis default: vector retrieval inside the database you already own, no separate infrastructure, no platform lock-in. Three reasons.

  • It’s production-grade in 2026

    The “Postgres is slow for vectors” narrative is dead. Since pgvector 0.5.0 brought HNSW indexing, performance matches or beats dedicated vector DBs at 1–10M scale: Supabase’s own benchmarks show pgvector HNSW matching or beating Qdrant on equivalent compute at 99% accuracy. Companies including Supabase, Neon, and Instacart run pgvector in production at significant scale. Parallel HNSW index builds arrived in 0.6.0, and the 0.7+ release series (current in 2026) adds half-precision and sparse vector types on top of continued HNSW and memory-management improvements.

  • Your data stays in one place you own

    Vector embeddings live in the same Postgres database as the rest of your application data — same auth, same connection pool, same backup strategy, same query language (SQL with vector operations). No syncing between a primary DB and a separate vector store. No additional infrastructure to monitor. No vendor lock-in. This is the architectural simplicity the broader selfware thesis is built around.

  • Real filtering and joins, not just similarity

    Because pgvector is Postgres, you get SQL filtering, joins, transactions, and relational integrity alongside vector search. SELECT … WHERE category = $1 AND tenant_id = $2 ORDER BY embedding <-> $3 LIMIT 10 — metadata filtering and semantic search in one query, with the full Postgres optimizer behind it. Dedicated vector DBs all support filtering but most don’t do it as cleanly.
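As a rough illustration of the filtered query described above, the sketch below builds the SQL and parameters for a driver that uses `%s` placeholders, such as psycopg. The `documents` table and its `id`, `content`, `category`, `tenant_id`, and `embedding` columns are hypothetical names chosen for the example.

```python
from typing import Sequence


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.1,2.0,-3.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: documents(id, content, category, tenant_id, embedding vector).
# `<->` is pgvector's L2 distance operator; `<=>` is cosine distance.
FILTERED_SEARCH_SQL = """
SELECT id, content
FROM documents
WHERE category = %s AND tenant_id = %s
ORDER BY embedding <-> %s::vector
LIMIT %s
"""


def build_filtered_search(category: str, tenant_id: str,
                          query_embedding: Sequence[float], limit: int = 10):
    """Return (sql, params) ready to pass to cursor.execute(sql, params)."""
    params = (category, tenant_id, to_vector_literal(query_embedding), limit)
    return FILTERED_SEARCH_SQL, params
```

The operator in `ORDER BY` has to match the operator class the HNSW index was built with (for example `vector_l2_ops` for `<->`), otherwise Postgres cannot use the index for the ordering.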

When pgvector starts to bind

The honest production ceiling is single-node Postgres limits: roughly 50M vectors on a well-provisioned instance (depending on dimension size and query patterns). Beyond that, dedicated vector DBs that scale horizontally (Qdrant, Milvus, Pinecone) become the right call. The next block covers the honest decision of when that migration is worth doing, and when it isn’t.

The 5 production vector DBs — honestly compared

Five real options in 2026. Each wins different slices. Here’s the honest map we use to pick, ending with our per-case recommendation. pgvector is marked as our default: it earns the lead, but it doesn’t win every row.

  • pgvector (default)

    What it is
    Postgres extension
    Scale ceiling
    ~50M (single Postgres node)
    Hybrid search
    Native via Postgres FTS + pgvector
    Filtering
    Full SQL — joins, indices
    Hosting
    Wherever Postgres runs
    License
    PostgreSQL (open)
    Starting cost
    ~$25/mo Supabase Pro
    Cost at scale
    Linear with Postgres compute
    Our pick when
    🟢 Default — already on Postgres/Supabase, 1–10M scale, want one DB to manage
  • Pinecone

    What it is
    Managed-only cloud
    Scale ceiling
    Effectively unlimited (managed)
    Hybrid search
    Limited (recent sparse-dense support)
    Filtering
    Metadata filtering
    Hosting
    Managed-only
    License
    Proprietary
    Starting cost
    $25/mo Starter
    Cost at scale
    Climbs with usage
    Our pick when
    🟢 Zero-ops managed required, RAG-first team, willing to pay for simplicity
  • Qdrant

    What it is
    Rust-based, open-source
    Scale ceiling
    Excellent single-node; multi-node available
    Hybrid search
    Recent BM25 + dense fusion
    Filtering
    Best-in-class filter performance
    Hosting
    Self-host or Cloud
    License
    Apache 2.0
    Starting cost
    $30/mo self-host VPS
    Cost at scale
    Linear self-host; managed climbs
    Our pick when
    🟢 Open-source preferred, self-hosted, best filtering, scaling beyond the pgvector ceiling
  • Weaviate

    What it is
    Open-source, schema-rich
    Scale ceiling
    Strong distributed mode
    Hybrid search
    Best-in-class native hybrid
    Filtering
    Schema-driven
    Hosting
    Self-host or Cloud
    License
    BSD-3
    Starting cost
    Free self-host or $25+/mo Cloud
    Cost at scale
    Linear self-host
    Our pick when
    🟢 Hybrid search is structural, schema-rich needs, multi-tenancy critical
  • Milvus

    What it is
    Open-source, distributed
    Scale ceiling
    Billion-vector+
    Hybrid search
    Distributed hybrid
    Filtering
    Partition-based
    Hosting
    Self-host or Zilliz Cloud
    License
    Apache 2.0
    Starting cost
    Free self-host or Zilliz Cloud
    Cost at scale
    Linear self-host
    Our pick when
    🟢 Billion-vector workloads, mature distributed sharding needed

Most production stacks are hybrid — pgvector for the in-stack default + Elasticsearch for hybrid keyword+vector + occasionally a dedicated vector DB for scale. We pick by use case, not by single-tool dogma. See our Elasticsearch page for the hybrid-retrieval side of this picture.

Hybrid retrieval — when vector + Elasticsearch beats pure vector alone

Most production RAG benefits from hybrid retrieval — combining keyword precision and vector semantic recall. Pure vector has real weaknesses; pure keyword has real weaknesses. The honest 2026 answer is often to use both, fused intelligently.

What pure vector retrieval misses

  • Exact-match queries — “SKU ABC-123” or “the user named John Smith” should match the literal string, not semantic neighbours. Vector retrieval can semantically over-broaden.
  • Out-of-distribution terms — proper nouns, brand names, acronyms, code identifiers often have no meaningful semantic neighbours and are best retrieved by keyword.
  • Recency and freshness signals — keyword indexes naturally support boost-by-date; vector retrieval treats all neighbours equally.

What pure keyword retrieval misses

  • Semantic paraphrases — “doctor” and “physician” mean the same thing; BM25 doesn’t know that.
  • Conceptual relationships — “how do I reduce my taxes?” matches documents about “deductions” and “credits” via semantic similarity, not literal keywords.
  • Cross-lingual matching — multilingual embeddings naturally bridge languages; keyword indexes don’t.

The production answer is hybrid: BM25 keyword + dense vector + (optional) ELSER sparse, fused with reciprocal rank fusion. Elasticsearch (or OpenSearch) ships this natively in a single _search call; pgvector + Postgres FTS does it via composed queries; dedicated vector DBs (Weaviate, recent Qdrant, recent Pinecone) increasingly support hybrid natively at varying maturity.
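Reciprocal rank fusion itself is small enough to sketch. The version below fuses any number of ranked ID lists (for example one from BM25 and one from dense vector search). The constant `k = 60` is the commonly used default, and the function name and the alphabetical tie-break are assumptions for the example.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]   # keyword ranking
dense_hits = ["b", "d", "a"]  # vector ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF only looks at ranks, it needs no score normalisation between BM25 and cosine similarity, which is a large part of why it is a common fusion default.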

For projects that need genuine hybrid retrieval at scale, we usually layer Elasticsearch alongside the application — see the Elasticsearch page for that side of the architecture. For projects where pure vector retrieval is what’s needed, the vector DB landscape on this page is the right place. The two pages are complementary, not competing.

The whole retrieval layer — what actually matters in production RAG

Most “vector DB comparison” guides obsess over indexing speed and latency benchmarks. The honest production reality: the vector DB is the smallest engineering problem in your RAG system. Four things matter more.

  • Highest impact

    1. Embedding strategy

    Which model (OpenAI text-embedding-3-large, Cohere Embed v3, open-source) — and the chunking strategy that feeds it. Chunk size, overlap, semantic boundaries, document hierarchy, metadata schema. This is the highest-impact decision in your entire RAG pipeline — get it wrong and no vector DB choice will save you.

  • Candidate quality

    2. Retrieval architecture

    Single-stage vs multi-stage retrieval. Pre-filtering vs post-filtering. Metadata schema for hybrid filter+similarity. Query expansion. Routing across multiple indexes (per-tenant, per-language, per-document-type). This shapes whether you’re retrieving the right candidates before ranking even starts.

  • The missing piece

    3. Reranking

    The vector DB returns top-100 candidates; a cross-encoder reranker (Cohere Rerank, BGE, custom) reorders them by deeper relevance, taking the top 5–10 to send to the LLM. This step is where retrieval quality goes from “okay” to “production-grade” — and it’s frequently the missing piece in struggling RAG systems.

  • Know it works

    4. Evaluation & observability

    Retrieval-quality metrics (hit rate, MRR, NDCG) measured against a real evaluation set, not vibes. Production observability for the failure modes that bite (empty retrievals, semantic drift, embedding-model regressions, slow queries). Without this, you can’t tell whether your RAG is working — only whether it doesn’t crash.
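The reranking stage in item 3 has a simple shape, sketched below: score every candidate against the query, then keep the best few. The `token_overlap_score` function is a toy lexical stand-in so the example runs without a model. In a real system the `score` callable would wrap a cross-encoder such as the rerankers named above.

```python
from typing import Callable, Sequence


def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_n: int = 5) -> list[str]:
    """Second-stage rerank: reorder first-stage candidates by score(query, passage)."""
    scored = [(score(query, passage), index, passage)
              for index, passage in enumerate(candidates)]
    # Highest score first; the original position breaks ties, keeping first-stage order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passage for _, _, passage in scored[:top_n]]


def token_overlap_score(query: str, passage: str) -> float:
    """Toy scorer (NOT a cross-encoder): fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


candidates = [
    "pgvector supports hnsw indexes",
    "bananas are yellow",
    "hnsw index tuning for pgvector",
]
top = rerank("pgvector hnsw tuning", candidates, token_overlap_score, top_n=2)
```

Keeping the scorer pluggable lets the same pipeline code run in tests with a cheap function and in production with a model-backed one.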

We architect all four — not just pick the database. For deeper context on retrieval architecture, see our RAG service page.
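The retrieval-quality metrics named in item 4 are straightforward to compute once you have an evaluation set of queries with known relevant documents. This is a minimal sketch using binary relevance. The function names and the tiny `eval_set` are made up for illustration.

```python
import math
from typing import Sequence, Set


def hit_rate_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary relevance: DCG of this ranking over DCG of an ideal one."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# Illustrative evaluation set: (retrieved IDs in rank order, set of relevant IDs).
eval_set = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d9", "d8"], {"d4"}),
]
mrr = mean([reciprocal_rank(retrieved, relevant) for retrieved, relevant in eval_set])
```

Tracking these numbers per release of the chunking, embedding, or reranking configuration is what turns "it feels better" into a measurable regression check.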

Production sizing & pricing — the honest math

Two honest pictures: what production vector infrastructure actually requires (RAM scaling reality), and how the five options compare on cost as projects grow.

Visual 1 · RAM per vector count

RAM required for an in-memory HNSW index — 1536-dim vectors

Rule of thumb: ~6–8 GB RAM per 1M vectors of 1536 dimensions for an HNSW index in memory. Most apps live in the 1–10M range where pgvector + Supabase or a single-node Qdrant fits comfortably. Beyond ~50M you’re in distributed-system territory. Quantization (BBQ, scalar, binary) compresses these numbers significantly — see our Elasticsearch page for the BBQ deep dive. ¹
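The lower end of that rule of thumb can be checked from first principles, as sketched below: a float32 vector costs 4 bytes per dimension. The byte sizes for int8 scalar and 1-bit binary quantization are the standard ones. The sketch counts raw vector storage only, so treating the remainder of the 6–8 GB range as HNSW graph links and overhead is an assumption, not a measurement.

```python
# Bytes per dimension for each encoding (binary quantization stores 1 bit per dimension).
BYTES_PER_DIMENSION = {"float32": 4.0, "int8": 1.0, "binary": 1.0 / 8.0}


def raw_vector_gb(num_vectors: int, dimensions: int, encoding: str = "float32") -> float:
    """Raw vector storage in decimal GB, excluding HNSW graph links and other overhead."""
    return num_vectors * dimensions * BYTES_PER_DIMENSION[encoding] / 1e9


per_million_float32 = raw_vector_gb(1_000_000, 1536)           # raw float32 storage
per_million_int8 = raw_vector_gb(1_000_000, 1536, "int8")      # scalar-quantized
per_million_binary = raw_vector_gb(1_000_000, 1536, "binary")  # binary-quantized
at_ceiling = raw_vector_gb(50_000_000, 1536)                   # the ~50M single-node mark
```

Under these assumptions 1M float32 vectors need about 6.1 GB before any index overhead, and 50M need about 307 GB, which is why the ~50M mark is roughly where a single node stops being comfortable without quantization.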

Visual 2 · monthly cost at 50M vectors

Cost at production-large scale — illustrative monthly at 50M vectors

The full picture across three scales: at prototype (100K vectors) all five run ~$0–30/mo and are roughly comparable. At small production (5M) pgvector + Supabase ($25–100) usually wins on simplicity — one database, one bill — vs $80–200 for Pinecone. At large production (50M+, shown above) Pinecone managed climbs fastest, while self-hosted Qdrant / Weaviate / Milvus give better cost control if DevOps capacity exists. Vector DB is typically a small fraction of total AI cost — LLM tokens dominate. We model your real query volume before recommending. ²

When vector databases aren’t the answer — and we’ll say so

If your corpus is small enough to fit in a modern LLM’s context window (Claude Opus, GPT, and Gemini all support 1M+ tokens now), you may not need a vector database at all; caching the corpus in the LLM prompt may be the simpler answer. We’ll do the math on cost-per-query vs context-stuffing honestly before recommending vector infrastructure.

If your retrieval is fundamentally about exact-match queries — finding documents by ID, SKU, customer name, code identifier — keyword search (Postgres FTS, Elasticsearch, or Meilisearch) often outperforms vector retrieval. Semantic similarity is the wrong tool for “find this specific thing.”

If your retrieval is fundamentally hybrid — needing both keyword precision and semantic recall — Elasticsearch/OpenSearch with native hybrid search usually beats stacking pgvector with a separate keyword layer.

And if your application doesn’t actually need semantic search — if the user query maps cleanly to filter parameters and structured queries — a real database with proper indexes will outperform any vector retrieval. Vector DBs are the right tool for semantic similarity over unstructured or semi-structured content. Outside that window, simpler tools usually win, and we’ll say so.

Proof · Clients

Teams who picked NerdHeadz to build production RAG and vector retrieval.

From embedding pipelines and pgvector deployments to hybrid-retrieval RAG with reranking and evaluation — what a buyer evaluating a real retrieval engagement actually cares about.


This system has been a dream of mine for almost a year. I have tried to build it myself and finally came to the conclusion I needed help. The NerdHeadz team has built me exactly what I was dreaming about and more! Working with them has been an absolute pleasure. I can't thank them enough.

Amy Olson
Founder & Airbnb Listing Strategist, Smart Hosting Hub
3+
Years of industry leadership
30+
Experts ready to build
60+
Projects delivered on time
90%
Client retention

Why teams pick NerdHeadz for vector database work

  • pgvector + Supabase as our default.

    The selfware-thesis answer to vector retrieval — production-grade, one DB to manage, no platform lock-in. We default here for the 1–10M scale where most apps live.

  • All five production options, deployed honestly.

    pgvector, Pinecone, Qdrant, Weaviate, Milvus — we know each deeply and pick honestly per project. No vendor preference, no single-tool dogma. The right database for your actual scale and stack.

  • The whole retrieval layer, not just the DB.

    Embedding pipelines, chunking strategy, hybrid retrieval, cross-encoder reranking, evaluation, production observability. The vector DB is the smallest engineering problem; we treat the rest with the same care.

  • Hybrid retrieval when it earns its place.

    For production RAG needing both keyword precision and semantic recall, we layer Elasticsearch alongside the vector DB. Complementary, not competing — and we architect honestly.

Vector database development — FAQ

A vector database stores data as high-dimensional vectors (numerical representations) and enables similarity search — finding items that are semantically similar rather than just matching keywords. You need one if you’re building AI features like semantic search, recommendations, RAG (retrieval-augmented generation), image similarity, AI agents with memory, or any application that needs to understand meaning rather than just match text. 2026 update: with LLM context windows now reaching 1M+ tokens, vector DBs are increasingly a cost-and-quality optimization rather than essential storage — see the strategic-frame block above.

Vector retrieval & semantic search work we’ve shipped

Production vector retrieval and semantic search across insurance NLP, AI content tools, and faceted marketplace search — three genuinely retrieval-relevant builds.


Sources & citations

  1. Groovy Web, Pinecone vs pgvector vs Chroma vs Weaviate 2026: Best Vector DB by Use Case — pgvector production-grade, single-node Postgres ceiling (~50M), RAM rule of thumb.
  2. Get AI Perks, Best Vector Databases 2026: Pinecone vs Weaviate vs Qdrant vs Chroma — the LLM context-window reframe (vector DBs as smart retrieval, not essential storage), pricing at scale.
  3. Tensoria, Pinecone vs Qdrant vs Weaviate vs pgvector — 100M Vector Benchmark 2026 — production landscape consolidated to 5 options, RAM scaling, ingest throughput.
  4. Firecrawl, Best Vector Databases in 2026: A Complete Comparison Guide — VectorDBBench numbers, managed vs self-hosted trade-offs.
  5. Deepak Gupta, Top 5 Vector Databases 2026 — Pinecone Serverless analysis, production landscape.
  6. MyEngineeringPath, Pinecone vs Qdrant vs Weaviate — Which Vector DB for Your RAG? 2026 — phase-3 maturity, multi-tenancy improvements.
  7. CallSphere, Vector Database Benchmarks 2026: pgvector 0.9, Qdrant, Weaviate, Milvus, LanceDB — April 2026 benchmark suite.
  8. Vecstore, Vector Database Performance Compared — Supabase pgvector HNSW benchmarks vs Qdrant.
  9. pgvector official documentation — HNSW indexing, 0.7+ release series, parallel index builds.
  10. Supabase and Neon production deployment case studies.
  11. NerdHeadz vector database engagement experience.

The vector database landscape evolved significantly through 2024–2026: Pinecone Serverless GA, pgvector HNSW maturity, OpenSearch v3, Elasticsearch BBQ quantization. Vendor versions, pricing, and feature parity change quickly, so verify them before relying on the figures here, which were checked as of 2026-Q2.

Let’s scope your retrieval layer

Building production RAG or AI retrieval? Let’s talk.

30-minute scoping call. Whether you’re starting a RAG project from scratch, evaluating which vector DB fits your stack, hitting pgvector’s ceiling and considering migration, or have a struggling RAG system that needs honest diagnosis — we’ll architect the right retrieval layer (database, embeddings, reranking, evaluation) and send a fixed-price quote.