Pinecone, the leading closed-source vector database provider, is known for being fast, scalable, and easy to use. Its blazing-fast vector search makes it a popular choice for large-scale RAG applications. Our initial infrastructure for Confident AI, the world’s first open-source evaluation infrastructure for LLMs, used Pinecone to cluster LLM observability log data in production. However, after weeks of experimentation, we decided to replace it entirely with pgvector. Pinecone’s simple design is deceptive: it hides several complexities, particularly when integrating with existing data storage solutions. For example, it forces a complicated architecture, and its restrictive metadata storage capacity makes it troublesome for managing data-intensive workloads.
In this article, I will explain why vector databases like Pinecone might not be the best choice for LLM applications and when you should avoid them.
On Pinecone’s website, they highlight their key features as:
“Unlock powerful vector search with Pinecone — intuitive to use, designed for speed, and effortlessly scalable.”
But do you really need a dedicated vector database to “unlock powerful vector search”? Because we needed to cluster and search log data stored in a PostgreSQL database, we initially considered Pinecone for its promise of fast semantic search and scalability. However, it became evident that the bottleneck in using a closed-source search solution like Pinecone is primarily the latency from network requests, not the search operation itself. Moreover, while Pinecone does provide scalability by adjusting resources such as vCPU, RAM, and disk (packaged as “pods”), the requirement to deploy another database solely for semantic search unduly complicates a standard data storage architecture. Lastly, because of its strict metadata limitations, retrieval becomes a two-step process: an initial vector search in Pinecone, followed by a query to the main database to fetch the data associated with the returned vector embeddings.
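For contrast with the two-step process, here is a minimal sketch of what a single-query lookup could look like when the embeddings live next to the log rows in PostgreSQL with pgvector. The `logs` table, its columns, and the helper names are hypothetical, chosen only for illustration. The `<=>` operator is pgvector’s cosine distance operator, and the sketch assumes a DB-API driver such as psycopg that uses `%s` placeholders.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of numbers as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema: logs(id, body, created_at, embedding vector(n)).
# The row data and the similarity ranking come back in one round trip,
# so no second lookup by ID is needed.
NEAREST_LOGS_SQL = """
SELECT id, body, created_at
FROM logs
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def nearest_logs_params(query_embedding: Sequence[float], k: int) -> Tuple[str, int]:
    """Build the parameter tuple that accompanies NEAREST_LOGS_SQL."""
    return (to_vector_literal(query_embedding), k)
```

With an open cursor, this would be run as `cursor.execute(NEAREST_LOGS_SQL, nearest_logs_params(query_embedding, 5))`, where `query_embedding` is the embedding of the search text produced by whatever model populated the `embedding` column.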
For those who may not be aware, vector databases were originally developed for large enterprises to store substantial amounts of vector embeddings for training ML models. However, it now appears that every vector database company is advocating for the need for a dedicated vector database provider in your LLM application tech stack.
While Pinecone’s s1, p1, and p2 pods can scale either horizontally to increase QPS (queries per second) or vertically (x1, x2, x4, x8) to fit more vectors on a single pod, it still lacks the integrations to address significant challenges faced by large-scale workloads:
These deficiencies turn Pinecone into a scalability hell due to its architectural limitations in data handling. For these reasons, you should consider adopting a vectorized option of your current data storage solution where possible, instead of using a standalone vector database.
Previously, pgvector only supported IVFFlat indexing, which was known for merely average performance. Since HNSW was introduced, however, pgvector outperforms all three pod types under the ANN benchmarking methodology, a standard for benchmarking vector databases. Interestingly, there are also benchmarks showing that pgvector’s IVFFlat index outperforms s1 pods on the same compute and manages 143% more QPS.
Data taken from Supabase. It shows that for all three pod types (scaled vertically to match storage capacity), pgvector outperforms Pinecone in both accuracy and QPS on the same compute.
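As a rough illustration of the two pgvector index types mentioned above, the sketch below builds the corresponding `CREATE INDEX` statements. The table and column names are hypothetical, and the parameter values are starting points rather than tuned recommendations (`m = 16` and `ef_construction = 64` are pgvector’s defaults for HNSW).

```python
# Hypothetical table and column names; only the index syntax comes from pgvector.
TABLE = "logs"
COLUMN = "embedding"


def ivfflat_index_sql(lists: int = 100) -> str:
    """IVFFlat: partitions vectors into `lists` clusters and scans a subset of them per query."""
    return (
        f"CREATE INDEX ON {TABLE} USING ivfflat ({COLUMN} vector_cosine_ops) "
        f"WITH (lists = {lists})"
    )


def hnsw_index_sql(m: int = 16, ef_construction: int = 64) -> str:
    """HNSW: builds a layered proximity graph; larger values cost build time and memory."""
    return (
        f"CREATE INDEX ON {TABLE} USING hnsw ({COLUMN} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )


# Query-time settings: higher values generally improve recall at the cost of QPS.
HNSW_SEARCH_SQL = "SET hnsw.ef_search = 100"
IVFFLAT_SEARCH_SQL = "SET ivfflat.probes = 10"
```

Because both index types live inside the same PostgreSQL instance, switching from IVFFlat to HNSW is a matter of dropping one index and creating the other, with no change to where the data is stored.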
Pinecone has excellent features for POC projects but requires substantial effort to maintain as a scalable and performant search infrastructure. It allows you to perform vector search across multiple data sources, but that was unnecessary for our use case at Confident. If you’re looking to perform vector search on existing, single-sourced data, consider adopting a data storage solution with a built-in vectorized option instead of using a standalone vector database.
Find us on GitHub ⭐ to follow our journey in building the world’s first open-source evaluation infrastructure for LLMs, and thank you for reading.