Vector Databases for Semantic Search

Keyword search matches exact words, so it fails when users phrase things differently. Semantic search matches meaning by representing text, images or audio as embeddings: high-dimensional vectors in which similar items sit close together. A vector database is built to store these vectors and to find the ones nearest to a query quickly and at scale, which makes it the retrieval backbone of retrieval-augmented generation (RAG) and recommendation systems.

Working principle

An embedding model maps each item to a vector, often with hundreds to thousands of dimensions. Similarity is measured with a similarity or distance function, typically cosine similarity or Euclidean distance. Comparing a query against millions of vectors exhaustively is too slow for interactive use, so vector databases use Approximate Nearest Neighbour (ANN) indexes that trade a little accuracy for a large speed-up and return the top-k most similar items in milliseconds. The most common is HNSW (Hierarchical Navigable Small World), a layered proximity graph that is searched greedily from its sparse upper layers down to its dense bottom layer.
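To make the exhaustive baseline concrete, here is a minimal sketch of exact top-k search with cosine similarity, using only the Python standard library. The names `cosine_similarity`, `exact_top_k` and `ITEMS` are illustrative, and the 3-dimensional vectors are hand-written stand-ins for real embeddings, not the output of any model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, items, k):
    """Score every stored vector against the query: O(N * d) work per query."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]

# Hypothetical toy "embeddings"; real models emit hundreds of dimensions or more.
ITEMS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

if __name__ == "__main__":
    print(exact_top_k([0.85, 0.15, 0.05], ITEMS, k=2))  # prints ['cat', 'kitten']
```

Every query touches every stored vector, so the cost grows linearly with the collection size. That linear scan is what an ANN index is meant to avoid.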

Figure 1. Items and queries share an embedding space; an approximate-nearest-neighbour index retrieves the most semantically similar results quickly.

Table 1. Keyword vs. vector search
Property	Keyword (lexical)	Vector (semantic)
Matches	Exact terms	Meaning / similarity
Synonyms	Misses	Handles naturally
Index	Inverted index	ANN (HNSW, IVF)
Best with	Precise terms	Natural language, multimodal
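Table 1 lists IVF (inverted file) as one ANN index type. The sketch below shows the idea in plain Python under simplifying assumptions: centroids are a random sample of the data (production indexes usually train them with k-means), the data is random, and all function names and the `nprobe` parameter name are illustrative rather than any particular library's API.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (preserves nearest-neighbour ordering)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_top_k(query, vectors, k):
    """Brute force: rank every vector index by distance to the query."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]

def build_ivf(vectors, n_lists, rng):
    """Assign each vector to the list of its nearest centroid."""
    centroids = rng.sample(vectors, n_lists)  # simplification: no k-means training
    lists = [[] for _ in range(n_lists)]
    for i, vec in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(i)
    return centroids, lists

def ivf_top_k(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k], len(candidates)  # results and number of vectors scanned

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.random() for _ in range(8)] for _ in range(2000)]
    query = [rng.random() for _ in range(8)]
    centroids, lists = build_ivf(data, n_lists=20, rng=rng)
    truth = exact_top_k(query, data, k=10)
    for nprobe in (1, 5, 20):
        ids, scanned = ivf_top_k(query, data, centroids, lists, k=10, nprobe=nprobe)
        print(f"nprobe={nprobe} scanned={scanned} recall@10={recall_at_k(ids, truth):.2f}")
```

Raising `nprobe` scans more vectors and can only keep or improve recall. When `nprobe` equals the number of lists, every vector is scanned and the result matches the exact search.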

Key trade-off. ANN indexes expose a recall vs. latency knob: searching more of the index raises recall but costs more compute per query. Hybrid search blends lexical and vector results to get the best of both.
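One common way to blend lexical and vector results is reciprocal rank fusion (RRF), which merges ranked lists using only rank positions, so the two scoring scales never need to be normalised against each other. The text does not prescribe a fusion method, so treat this as one possible sketch. The document ids are hypothetical, and the constant `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an item's score."""
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by id so output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))

# Hypothetical result lists for a single query.
lexical_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from an inverted index
vector_hits = ["doc_c", "doc_a", "doc_d"]    # e.g. from an ANN index

if __name__ == "__main__":
    print(reciprocal_rank_fusion([lexical_hits, vector_hits]))
    # prints ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever, which is the behaviour hybrid search is after.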

Applications

Retrieval layer for RAG and LLM memory
Semantic and multimodal (image/audio) search
Recommendation, deduplication and anomaly detection

References & further reading

Malkov & Yashunin, “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs,” IEEE TPAMI, 2018.
Johnson et al., “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, 2019 (the FAISS paper).
Mikolov et al., “Efficient Estimation of Word Representations in Vector Space,” arXiv:1301.3781, 2013.