Build a Retrieval-Augmented Generation pipeline with hybrid search (vector + keyword) and a reranking step for higher precision answers.
## Task

RAG pipeline with hybrid search and reranking for high-precision Q&A.

## Requirements

- Vector DB: pgvector, Pinecone, or Qdrant
- Embeddings: OpenAI `text-embedding-3-small` or Cohere `embed-v3`
- Reranker: Cohere Rerank or a cross-encoder model
- Language: Python or TypeScript

## Pipeline

```
Query → [Hybrid Search] → [Rerank] → [LLM Generate]
```

1. Hybrid Search (run both in parallel):
   - Vector search: embed the query → top 20 by cosine similarity
   - Keyword search: BM25/FTS on the same corpus → top 20
   - Merge the two result lists with Reciprocal Rank Fusion (RRF)
2. Rerank:
   - Take the merged top 30 results
   - Score (query, document) pairs with a cross-encoder
   - Keep the top 5
3. Generate:
   - Inject the top 5 chunks as context
   - System prompt: "Answer based only on provided context"
   - Include source citations

## Implementation Notes

1. Chunk documents at 512 tokens with a 50-token overlap
2. Store metadata per chunk: source URL, title, chunk index
3. Cache embeddings; don't re-embed the corpus on every query
4. Return "I don't have enough information" when the context is insufficient
5. Return a confidence score derived from the reranker scores
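The chunking rule in note 1 (512-token windows, 50-token overlap) can be sketched as a sliding window. This is a minimal illustration that operates on a pre-tokenized list; a real implementation would tokenize with the embedding model's own tokenizer (e.g. `tiktoken` for OpenAI models), and `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into fixed-size windows that overlap by `overlap` tokens."""
    step = size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
        start += step
    return chunks
```

Each chunk's index in the returned list is the `chunk index` metadata from note 2, so a citation can point back to the exact window in the source document.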
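The RRF merge in step 1c can be written without any external dependency: each document scores `1 / (k + rank)` in every list it appears in, and the sums are sorted. `k = 60` is the constant from the original RRF paper; `rrf_merge` is an illustrative name, not a library function.

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked ID lists (e.g. vector hits, keyword hits).
    Returns doc IDs sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately well in both lists (like `d2` below) beats one that is top-ranked in only a single list, which is exactly why RRF is a good fit for fusing vector and BM25 results.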
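For note 5, one simple way to turn raw cross-encoder scores into a 0–1 confidence is a softmax over the kept chunks: if the top chunk's score dominates, confidence is high; if all chunks score similarly, it is low. This is one possible heuristic, not the only one, and the function name is hypothetical.

```python
import math

def confidence_from_rerank(scores):
    """Map cross-encoder logits for the kept chunks to a 0-1 confidence.

    Softmax the scores and return the top chunk's probability mass.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return max(exps) / sum(exps)
```

A threshold on this value can also drive the "I don't have enough information" response in note 4, e.g. by refusing to answer below 0.3 (the cutoff would need tuning on real queries).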
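The generation step (inject top-5 chunks, restrict the model to the context, cite sources) amounts to prompt assembly. A minimal sketch, assuming each chunk arrives as a `(text, source_url)` pair built from the metadata in note 2; `build_prompt` and the bracket-number citation format are illustrative choices:

```python
def build_prompt(question, chunks):
    """Assemble (system, user) messages from retrieved chunks.

    `chunks` is a list of (text, source_url) pairs, best-ranked first.
    Each chunk is numbered so the model can cite it as [n].
    """
    context = "\n\n".join(
        f"[{i}] {text}\n(Source: {url})"
        for i, (text, url) in enumerate(chunks, start=1)
    )
    system = (
        "Answer based only on the provided context. Cite sources as [n]. "
        "If the context is insufficient, reply: I don't have enough information."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return system, user
```

The two returned strings map directly onto the system/user messages of a chat-completion API call; answers can then be post-processed to replace `[n]` markers with the stored source URLs.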