
Vector Databases: Pinecone, Weaviate, Qdrant, pgvector

Empirium Team · 11 min read

Every RAG system needs a vector database. The choice of which one determines your query latency, operational complexity, cost trajectory, and how painful your life will be six months from now.

The market has consolidated around six serious options: Pinecone, Weaviate, Qdrant, Chroma, Milvus, and pgvector. Each occupies a different point on the managed-vs-self-hosted and simplicity-vs-power spectrums. Here is the comparison we wish existed when we started building RAG systems at Empirium.

What Vector Databases Do

Text is not searchable by meaning. The sentence "Our office closes at 6 PM" and the query "What time do you shut down?" share no keywords but are semantically identical. Vector databases solve this.

The pipeline:

  1. Embedding: Convert text to a high-dimensional vector (1536 dimensions for OpenAI's text-embedding-3-small, 1024 for Cohere's embed-v4).
  2. Indexing: Store vectors with efficient index structures (HNSW, IVF, or flat) that enable fast approximate nearest-neighbor search.
  3. Querying: Convert the search query to a vector, find the K most similar stored vectors, return the associated text chunks.
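The three steps can be sketched with a toy in-memory index: brute-force cosine similarity over random "embeddings". This is our illustration, not any database's API; a real system calls an embedding model and an approximate index instead of NumPy.

```python
import numpy as np

def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k stored vectors most similar to the query."""
    # Normalize so a dot product equals cosine similarity
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    similarities = index_norm @ query_norm
    return np.argsort(-similarities)[:k]

# Toy "embeddings": 1,000 stored vectors at 1536 dims (text-embedding-3-small size)
rng = np.random.default_rng(0)
stored = rng.standard_normal((1000, 1536))
query = stored[42] + 0.01 * rng.standard_normal(1536)  # near-duplicate of item 42

top = cosine_top_k(query, stored, k=10)
print(top[0])  # item 42 ranks first
```

Brute force like this is exact but scans every vector; HNSW and IVF indexes trade a small amount of recall for sub-linear search time.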

The quality of your RAG system depends more on your embedding model and chunking strategy than on your vector database choice. But the database choice determines cost, latency, and operational burden.

The Major Players Compared

Feature Matrix

| Feature | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector |
|---|---|---|---|---|---|---|
| Hosting | Managed only | Managed + self-hosted | Managed + self-hosted | Self-hosted (cloud beta) | Managed (Zilliz) + self-hosted | Self-hosted (Postgres extension) |
| Max vectors | Billions | Billions | Billions | Millions | Billions | Millions |
| Metadata filtering | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (SQL WHERE) |
| Hybrid search (vector + keyword) | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ (with tsvector) |
| Multi-tenancy | Namespaces | Classes | Collections | Collections | Partitions | Schemas/tables |
| Quantization | ❌ | ✅ | ✅ (scalar, product, binary) | ❌ | ✅ | ✅ (halfvec) |
| On-disk index | N/A (managed) | ✅ | ✅ | ❌ (in-memory) | ✅ | ✅ |
| ACID transactions | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |

Performance Benchmarks

Tested on 1M vectors, 1536 dimensions, top-10 retrieval, single-node setup:

| Database | p50 Latency | p99 Latency | Recall@10 | Memory Usage |
|---|---|---|---|---|
| Pinecone (s1) | 8 ms | 25 ms | 0.95 | Managed |
| Weaviate (HNSW) | 5 ms | 18 ms | 0.97 | 4.2 GB |
| Qdrant (HNSW) | 4 ms | 15 ms | 0.97 | 3.8 GB |
| Milvus (IVF_FLAT) | 6 ms | 22 ms | 0.96 | 3.5 GB |
| pgvector (HNSW) | 12 ms | 45 ms | 0.95 | 5.1 GB |
| Chroma (HNSW) | 7 ms | 30 ms | 0.96 | 4.0 GB |

At 1M vectors, the latency differences are negligible for most applications. The differences become meaningful at 10M+ vectors or under high concurrency.
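Recall@10 in the table above is measured by comparing the approximate index's top-10 against an exact brute-force top-10. The metric itself is trivial to compute (the function name here is ours):

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int = 10) -> float:
    """Fraction of the exact top-k that the approximate index also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Example: the ANN index missed one of the ten true nearest neighbors
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 99]
print(recall_at_k(approx, exact))  # 0.9
```

Running this against a held-out query set is the standard way to tune HNSW parameters like `ef_search`: raise them until recall plateaus, then stop paying the latency cost.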

Pricing Comparison (1M vectors, 1536 dims)

| Database | Managed Monthly Cost | Self-Hosted Monthly Cost |
|---|---|---|
| Pinecone (s1 pod) | $70 | N/A |
| Pinecone (serverless) | $30-$100 (usage-based) | N/A |
| Weaviate Cloud | $75 | $25-$50 (VPS) |
| Qdrant Cloud | $65 | $20-$40 (VPS) |
| Zilliz (Milvus) | $65 | $25-$50 (VPS) |
| pgvector | N/A (use existing Postgres) | $0 (if you already have Postgres) |
| Chroma | N/A | $15-$30 (lightweight) |

Self-Hosted vs Managed

When Managed Wins

  • Team under 5 engineers: No one to manage infrastructure
  • Rapid prototyping: Need vector search in production within a week
  • Unpredictable scale: Serverless pricing handles traffic spikes without capacity planning
  • Compliance requirements: Some managed providers offer SOC2, HIPAA-eligible deployments

When Self-Hosted Wins

  • Cost at scale: At 10M+ vectors, self-hosted costs 3-5x less than managed
  • Data sovereignty: Data never leaves your infrastructure
  • Latency requirements: Co-locate the database with your application server for sub-5ms queries
  • Custom configuration: Tune index parameters, memory allocation, and caching for your specific workload

The pgvector Special Case

If you already run PostgreSQL — and most applications do — pgvector is the zero-overhead option. No new database to deploy, monitor, or pay for. Your vectors live alongside your relational data in the same transactions.

pgvector's performance is adequate for up to 2-5M vectors. Beyond that, purpose-built vector databases pull ahead significantly. But for most business applications, 2M vectors covers the entire knowledge base with room to spare.

```sql
-- pgvector is just Postgres
CREATE EXTENSION vector;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  metadata JSONB,
  embedding vector(1536)
);

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Vector search combined with plain SQL filtering
-- (:query_embedding is a parameter supplied by the application)
SELECT content, metadata,
       1 - (embedding <=> :query_embedding) AS similarity
FROM documents
WHERE metadata->>'category' = 'pricing'
  AND metadata->>'updated_at' > '2026-01-01'
ORDER BY embedding <=> :query_embedding
LIMIT 5;
```

The ability to combine vector search with SQL WHERE clauses, JOINs, and transactions is pgvector's unique advantage. No other vector database offers this without a separate metadata store.

Choosing Based on Your Use Case

Small Knowledge Base (< 100K documents)

Recommendation: pgvector or Chroma

At this scale, every option works well. pgvector avoids adding a new database to your stack. Chroma is the simplest standalone option for prototyping.

Medium Knowledge Base (100K–5M documents)

Recommendation: Qdrant (self-hosted) or Pinecone (managed)

Qdrant offers the best performance-to-cost ratio for self-hosted deployments. Pinecone is the most polished managed option with the lowest operational overhead.

Large Knowledge Base (5M+ documents)

Recommendation: Qdrant or Milvus (self-hosted with dedicated infrastructure)

At this scale, managed pricing becomes expensive and self-hosting pays off. Both Qdrant and Milvus handle billion-scale deployments with proper hardware.

High Update Frequency

Recommendation: Weaviate or Qdrant

If your documents change frequently (product catalogs, news feeds, real-time data), you need a database that handles concurrent reads and writes efficiently. Weaviate and Qdrant handle real-time updates without query degradation.

Existing Postgres Infrastructure

Recommendation: pgvector

Unless your vector count exceeds 5M or you need sub-5ms p99 latency, pgvector in your existing Postgres saves you from running a separate database. The operational simplicity is worth the performance tradeoff.

Migration Considerations

Vector databases have no standardized format. Moving from one to another means:

  1. Re-exporting your original text chunks (not the vectors — embedding models may differ)
  2. Re-generating embeddings with your current embedding model
  3. Re-indexing in the new database with appropriate settings
  4. Updating your application code for the new query API

Budget 2-4 weeks for a migration, including testing. The data itself moves quickly; testing and validation are what take time.
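The loop behind steps 1-3 is conceptually simple. In this sketch, `fetch_chunks`, `embed`, and `target.upsert` are hypothetical placeholders for your source export, your embedding API, and your destination database's client; none of them is a real library call.

```python
def migrate(fetch_chunks, embed, target, batch_size: int = 100) -> int:
    """Re-embed source chunks and upsert them into the target database.

    fetch_chunks: yields dicts with "id", "content", "metadata" (placeholder)
    embed: maps a list of texts to a list of vectors (placeholder)
    target: destination client exposing upsert(ids, vectors, metadata) (placeholder)
    """
    def flush(batch):
        # Step 2: re-embed the original text, never the old vectors
        vectors = embed([c["content"] for c in batch])
        # Step 3: re-index in the new database
        target.upsert(ids=[c["id"] for c in batch],
                      vectors=vectors,
                      metadata=[c["metadata"] for c in batch])
        return len(batch)

    migrated, batch = 0, []
    for chunk in fetch_chunks():  # Step 1: export original text chunks
        batch.append(chunk)
        if len(batch) == batch_size:
            migrated += flush(batch)
            batch = []
    if batch:
        migrated += flush(batch)
    return migrated
```

Batching matters twice here: embedding APIs charge and rate-limit per request, and bulk upserts are far faster than row-at-a-time inserts in every vector database.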

FAQ

Which embedding model should I use? For most applications: OpenAI text-embedding-3-small (1536 dims, $0.02/1M tokens). It offers the best cost-to-quality ratio. For maximum quality: Cohere embed-v4 or OpenAI text-embedding-3-large. For self-hosted: all-MiniLM-L6-v2 is free and surprisingly good for English text.
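At $0.02 per million tokens, embedding cost is easy to estimate up front (the function and the corpus numbers below are our illustration; real token counts come from a tokenizer):

```python
def embedding_cost_usd(total_tokens: int, price_per_million: float = 0.02) -> float:
    """Estimated embedding cost at a given price per million tokens."""
    return total_tokens / 1_000_000 * price_per_million

# e.g. 10,000 documents averaging 500 tokens each = 5M tokens
print(embedding_cost_usd(10_000 * 500))  # 0.1 -> about ten cents
```

The takeaway: initial embedding is almost never the cost driver. Re-embedding on every migration or model upgrade is cheap too; storage and query infrastructure dominate the bill.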

Should I use hybrid search (vector + keyword)? Yes, for any production RAG system. Hybrid search combines semantic understanding (vector) with exact matching (keyword). A query for "error code E-4021" benefits from keyword matching that vector search alone might miss. Weaviate, Qdrant, and pgvector all support hybrid search natively.
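A common way to merge the keyword and vector result lists is Reciprocal Rank Fusion (RRF), which several of these databases use internally. This standalone sketch is ours, not any database's built-in; the constant 60 is the conventional default from the RRF literature.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank); documents found by
            # both searches accumulate score from both lists
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]    # semantic matches
keyword_hits = ["doc_d", "doc_a", "doc_e"]   # exact match on "E-4021"
print(rrf_fuse([vector_hits, keyword_hits])[0])  # doc_a: found by both searches
```

RRF needs only ranks, not scores, so it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.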

How do I handle multi-tenant data? Use separate collections (Qdrant, Chroma), namespaces (Pinecone), or schemas (pgvector) per tenant. Never rely on metadata filtering alone for tenant isolation — a bug in your filter logic could leak data between tenants. Physical separation is safer.

When should I scale horizontally? When single-node performance degrades under your query load — typically above 5-10M vectors or 500+ queries per second. Qdrant and Milvus support distributed deployments. pgvector relies on Postgres replication patterns. Pinecone handles this automatically.

Choosing the right vector database is a foundational decision for your AI infrastructure. If you need guidance on vector database selection or RAG architecture, let us help.
