Keyword search matches exact words; it fails when users phrase things differently. Semantic search matches meaning by representing text, images or audio as embeddings — high-dimensional vectors where similar items sit close together. A vector database is built to store these vectors and find the nearest ones to a query, fast, at scale — the backbone of RAG and recommendation.
Working principle
An embedding model maps each item to a vector (often hundreds to thousands of dimensions). Similarity is measured with a metric such as cosine similarity or Euclidean distance. Comparing a query against millions of vectors exhaustively is too slow for interactive use, so vector databases use Approximate Nearest Neighbour (ANN) indexes, most commonly HNSW (Hierarchical Navigable Small World), a layered proximity graph. These indexes trade a little accuracy for a large speed-up and typically return the top-k most similar items in milliseconds.
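To make the baseline concrete, here is a minimal sketch of the exhaustive search that ANN indexes are designed to avoid: score every stored vector against the query with cosine similarity and keep the k best. The three-dimensional vectors and the names `cosine_similarity` and `exact_top_k` are illustrative assumptions, not output from any real embedding model or the API of any vector database.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    This is the exhaustive scan whose cost grows with the number of
    stored vectors times their dimensionality.
    """
    scored = ((cosine_similarity(query, vec), item_id)
              for item_id, vec in vectors.items())
    return heapq.nlargest(k, scored)


# Hypothetical toy "embeddings"; real models emit far more dimensions.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}
query = [1.0, 0.1, 0.0]

top = exact_top_k(query, vectors, k=2)
for score, item_id in top:
    print(f"{item_id}: {score:.3f}")
```

With these toy values the two animal vectors rank above the two vehicle vectors. An ANN index aims to return the same items most of the time while comparing the query against only a small fraction of the stored vectors.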
| Property | Keyword (lexical) | Vector (semantic) |
|---|---|---|
| Matches | Exact terms | Meaning / similarity |
| Synonyms | Misses | Handles naturally |
| Index | Inverted index | ANN (HNSW, IVF) |
| Best with | Precise terms | Natural language, multimodal |
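
The "Synonyms" row can be illustrated with a toy inverted index. The sketch below assumes whitespace tokenisation and boolean AND matching, which is much simpler than a production lexical engine, and the documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical documents; 1 and 2 describe the same problem in different words.
docs = {
    1: "how to fix a flat tyre",
    2: "repairing a punctured tire",
    3: "best pizza recipes",
}
index = build_inverted_index(docs)

exact = keyword_search(index, "flat tyre")            # wording matches document 1
paraphrase = keyword_search(index, "punctured tyre")  # no single document has both terms
print(exact, paraphrase)
```

The second query comes back empty even though document 2 is clearly relevant, because "tyre" and "tire" are different terms to an inverted index. A vector index would compare embeddings instead, so a good embedding model is expected to place such paraphrases close together.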
Key trade-off
The core trade-off is ANN recall versus latency: higher recall means examining more candidates per query, which costs more compute and time. Hybrid search blends lexical and vector results to get the best of both.
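Both ideas can be sketched in a few lines. `recall_at_k` measures how many of the true nearest neighbours an approximate index returned, which is the quantity being traded against latency. `reciprocal_rank_fusion` is one common way to blend a lexical ranking with a vector ranking; the text does not say which fusion method it has in mind, so treat this as an assumption. The document ids and the constant `c=60` (a conventional default for this method) are illustrative.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbours that the ANN index returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)


def reciprocal_rank_fusion(rankings, c=60):
    """Merge ranked id lists: each list adds 1 / (c + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query.
exact_neighbours = ["d1", "d5", "d3"]   # ground truth from an exhaustive scan
ann_neighbours = ["d1", "d5", "d9"]     # what an approximate index returned
recall = recall_at_k(ann_neighbours, exact_neighbours)

lexical_ranking = ["d3", "d1", "d7"]    # from a keyword engine
semantic_ranking = ["d1", "d5", "d3"]   # from a vector index
fused = reciprocal_rank_fusion([lexical_ranking, semantic_ranking])
print(recall, fused)
```

Rank-based fusion avoids having to put lexical scores and vector similarities on a common scale, which is one reason it is a popular starting point for hybrid search.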
Applications
- Retrieval layer for RAG and LLM memory
- Semantic and multimodal (image/audio) search
- Recommendation, deduplication and anomaly detection
References & further reading
- Malkov & Yashunin, “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs,” IEEE TPAMI, 2018.
- Johnson et al., “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, 2019. (The paper behind FAISS.)
- Mikolov et al., “Efficient Estimation of Word Representations in Vector Space,” 2013.