Vector Databases in Practice: Moving Past the Tutorial
You've embedded some text and done a similarity search against it. Congratulations — you've completed every vector database tutorial on the internet. Now let me tell you about the fifteen things that will break when you try to do this for real.
I've been building RAG (Retrieval-Augmented Generation) systems for the last year, primarily for MSP tooling where we need to search across vendor catalogs, product documentation, and customer histories. Here's what I've actually learned.
Chunking strategy is everything
The way you split your documents into chunks determines about 80% of your retrieval quality. I'm not exaggerating. You can have the best embedding model in the world, and if your chunks are wrong, your results will be garbage.
What doesn't work: splitting on fixed character counts. What works better: splitting on semantic boundaries — paragraphs, sections, logical units of information. What works best: overlapping chunks with metadata that preserves the parent document context.
For product catalogs, I split on individual product entries and include the category hierarchy as metadata. For documentation, I split on sections and include the document title and section path. The metadata is as important as the content.
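Here's a minimal sketch of that approach in Python, assuming plain paragraph-delimited text. The `Chunk` class, the character budget, and the one-paragraph overlap are illustrative choices, not a particular framework's API.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_document(text: str, doc_title: str, section_path: str,
                   max_chars: int = 1200, overlap_paragraphs: int = 1) -> list[Chunk]:
    """Split on paragraph boundaries, carrying a small overlap between chunks."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    current: list[str] = []

    def flush() -> None:
        # Emit whatever has accumulated, tagging it with the parent-document context.
        if current:
            chunks.append(Chunk(
                text="\n\n".join(current),
                metadata={"doc_title": doc_title, "section_path": section_path},
            ))

    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            flush()
            # Seed the next chunk with the tail of the previous one (the overlap).
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
        current.append(para)
    flush()
    return chunks
```

The same shape works for product catalogs: swap the paragraph split for a split on product entries and put the category hierarchy into the metadata dict.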
Hybrid search wins
Pure vector similarity search is not enough. In practice, I use hybrid search — combining vector similarity with keyword matching (BM25) — for almost everything. Why? Because sometimes the user is searching for an exact product SKU or a specific term, and semantic similarity alone won't find it.
Most vector databases now support hybrid search natively. Use it. Weight it toward semantic for broad queries and toward keyword for specific lookups. Better yet, let the query determine the weighting dynamically.
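If your database doesn't fuse the two result lists for you, here's a hedged sketch of one common way to do it yourself, using reciprocal rank fusion. The weights, the RRF constant, and the SKU heuristic are assumptions you'd tune; `vector_hits` and `keyword_hits` stand in for whatever ID lists your vector index and BM25 index return.

```python
# Fuse two ranked lists of document IDs (best first) into one hybrid ranking.
def hybrid_fuse(vector_hits: list[str], keyword_hits: list[str],
                vector_weight: float = 0.7, keyword_weight: float = 0.3,
                k: int = 60, top_n: int = 10) -> list[str]:
    scores: dict[str, float] = {}
    for rank, doc_id in enumerate(vector_hits):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (k + rank + 1)
    for rank, doc_id in enumerate(keyword_hits):
        scores[doc_id] = scores.get(doc_id, 0.0) + keyword_weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# A crude dynamic-weighting heuristic: lean toward keyword matching when the
# query looks like an exact identifier (e.g. a SKU), toward semantic otherwise.
def choose_weights(query: str) -> tuple[float, float]:
    looks_like_sku = any(ch.isdigit() for ch in query) and len(query.split()) <= 2
    return (0.3, 0.7) if looks_like_sku else (0.7, 0.3)
```

The specific heuristic matters less than the principle: the semantic/keyword balance is a tunable knob, not a constant.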
Embedding model selection matters less than you think
People spend weeks debating which embedding model to use. In my experience, the difference between the top five embedding models is marginal compared to the difference between good and bad chunking strategies.
Pick a model, ship it, and iterate. If your retrieval quality is bad, fix your chunking before you swap your embedding model.
The re-ranking step
Raw vector search returns results ordered by cosine similarity. This is a decent first pass, but for production quality you want a re-ranking step that considers additional signals: recency, document authority, user context, and the specific phrasing of the query.
I use a lightweight cross-encoder re-ranker after the initial vector search. It adds latency (50-100ms typically) but significantly improves the relevance of the top results. For user-facing applications, this is worth it every time.
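A minimal sketch of that re-ranking pass, assuming the sentence-transformers library and a publicly available cross-encoder checkpoint; the `(doc_id, text)` candidate format is an assumption about what your first-stage vector search returns.

```python
from sentence_transformers import CrossEncoder

# A small, commonly used public re-ranking checkpoint; swap in whatever fits your latency budget.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[tuple[str, str]], top_n: int = 5):
    # Score each (query, passage) pair with the cross-encoder, then sort by relevance.
    pairs = [(query, text) for _, text in candidates]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [(doc_id, float(score)) for (doc_id, _), score in ranked[:top_n]]
```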
Operational concerns nobody mentions
- Index size and cost: Embeddings aren't free. A million 1536-dimensional vectors take real storage, roughly 6 GB of raw float32 data before index overhead (see the quick estimate after this list). Plan for this.
- Update patterns: When source documents change, you need to re-embed and update. Build this pipeline on day one.
- Monitoring: Track retrieval quality metrics. Log what was retrieved vs. what was actually useful. This is your feedback loop.
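On the index-size point, a back-of-the-envelope estimate; this counts only the raw float32 vectors, and index structures, metadata, and replicas come on top.

```python
# Raw storage for the example above: one million 1536-dimensional float32 vectors.
num_vectors = 1_000_000
dimensions = 1536
bytes_per_float = 4  # float32

raw_bytes = num_vectors * dimensions * bytes_per_float
print(f"{raw_bytes / 1e9:.1f} GB of raw vector data")  # ~6.1 GB
```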