Advanced Hybrid Search and Reranking for Agentic RAG

Master advanced retrieval strategies by combining dense embeddings, sparse BM25 vectors, dynamic metadata filtering, and Cross-Encoder reranking.

Jun 19, 2026 · 6 min read

Topics You Will Master

Understanding the difference between dense semantic search and sparse exact keyword indexing
Creating structured query metadata extractors using Gemini 2.5 structured output schema
Constructing dynamic Qdrant filter conditions to prevent multi-tenant data leakage
Reranking similarity search outputs using HuggingFace Cross-Encoder models

Standard vector similarity search often falls short when querying financial and technical documents, because exact numbers, fiscal quarters, and company names can be blurred in dense embedding vectors. A hybrid search strategy that combines dense embeddings, sparse term-frequency vectors (BM25), metadata filters, and Cross-Encoder reranking lets developers build retrieval engines that handle both semantic and exact-match queries.

This guide details configuring and executing advanced retrieval pipelines using Qdrant, LangChain, and deep learning rerankers.

Dense, Sparse, and Hybrid Retrieval

A robust RAG system uses multiple representations of data:

  1. Dense Retrieval: Encodes text chunks into dense multidimensional vector arrays (e.g., 3072 dimensions). This is ideal for searching by meaning and matching synonym concepts (e.g., matching "cash on hand" to "liquidity").
  2. Sparse Retrieval: Maps text using term frequencies (like BM25). This is ideal for matching specific terminology, exact model names, numbers, or unique identifiers (e.g., matching "Apple Q1 2024").
  3. Hybrid Retrieval: Combines sparse and dense lists using search-fusion techniques (like Reciprocal Rank Fusion) to yield a balanced result set.
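
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is an illustration only, not the code Qdrant runs: the function name, the sample document IDs, and the constant `k=60` (a commonly used default) are assumptions chosen for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a dense and a sparse retriever
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear near the top of both lists (`doc_a`, `doc_c`) outrank documents that only one retriever found, which is the balancing effect hybrid retrieval relies on.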

Let's initialize the hybrid vector index connection:

PYTHON
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore, RetrievalMode, FastEmbedSparse

load_dotenv()

COLLECTION_NAME = "financial_docs"

# Configure dense and sparse embedding models
dense_embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Open connection to the existing Qdrant collection
vector_store = QdrantVectorStore.from_existing_collection(
    embedding=dense_embeddings,
    sparse_embedding=sparse_embeddings,
    collection_name=COLLECTION_NAME,
    url="http://localhost:6333",
    retrieval_mode=RetrievalMode.HYBRID
)

Extracting Query Metadata Filters with LLMs

To query a specific document without retrieving data from unrelated years or companies, extract metadata filters directly from the user's natural language input.

Define the schema in scripts/schema.py:

PYTHON
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class DocType(str, Enum):
    TEN_K = "10-k"
    TEN_Q = "10-q"
    EIGHT_K = "8-k"

class FiscalQuarter(str, Enum):
    Q1 = "q1"
    Q2 = "q2"
    Q3 = "q3"
    Q4 = "q4"

class ChunkMetadata(BaseModel):
    company_name: Optional[str] = Field(default=None, description="Company name (lowercase, e.g. 'amazon', 'apple')")
    doc_type: Optional[DocType] = Field(default=None, description="Document type (10-k, 10-q, 8-k)")
    fiscal_year: Optional[str] = Field(default=None, description="Fiscal year (e.g. '2024')")
    fiscal_quarter: Optional[FiscalQuarter] = Field(default=None, description="Fiscal quarter (q1-q4)")

    model_config = {"use_enum_values": True}

Construct the metadata filter parser:

PYTHON
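# ChatGoogleGenerativeAI was imported in the first code block.
# Adjust this import path to match your project layout.
from scripts.schema import ChunkMetadata

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")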

def extract_filters(user_query: str) -> dict:
    prompt = f"""
    Extract metadata filters from the query. Return None for fields not mentioned.

    <USER_QUERY>
    {user_query}
    </USER_QUERY>

    #### MAPPINGS
    COMPANY MAPPINGS:
    - Amazon/AMZN -> amazon
    - Google/Alphabet/GOOGL/GOOG -> google
    - Apple/AAPL -> apple

    DOC TYPE:
    - Annual report -> 10-k
    - Quarterly report -> 10-q

    EXAMPLES:
    "Amazon Q3 2024 revenue" -> {{"company_name": "amazon", "doc_type": "10-q", "fiscal_year": "2024", "fiscal_quarter": "q3"}}
    "Apple 2023 annual report" -> {{"company_name": "apple", "doc_type": "10-k", "fiscal_year": "2023"}}

    Extract metadata based on the user query only:
    """
    structured_llm = llm.with_structured_output(ChunkMetadata)
    metadata = structured_llm.invoke(prompt)
    
    if metadata:
        return metadata.model_dump(exclude_none=True)
    return {}

# Verify filter parsing logic
print(extract_filters("What was Amazon's profit in Q1 2023?"))
OUTPUT
{'company_name': 'amazon', 'doc_type': '10-q', 'fiscal_year': '2023', 'fiscal_quarter': 'q1'}
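
Because an exact-match filter fails silently when casing or whitespace differs from the stored metadata, it can help to normalize the extracted dictionary before using it. The sketch below is one possible approach, assuming the indexed metadata is lowercase as the schema descriptions suggest; `normalize_filters` and `ALLOWED_KEYS` are hypothetical names, not part of the original pipeline.

```python
# Hypothetical allow-list mirroring the ChunkMetadata fields
ALLOWED_KEYS = {"company_name", "doc_type", "fiscal_year", "fiscal_quarter"}

def normalize_filters(raw: dict) -> dict:
    """Keep only known keys and return stripped, lowercase string values."""
    cleaned = {}
    for key, value in raw.items():
        # Drop unknown keys and unset fields
        if key not in ALLOWED_KEYS or value is None:
            continue
        text = str(value).strip().lower()
        if text:
            cleaned[key] = text
    return cleaned

print(normalize_filters({"company_name": " Amazon ", "fiscal_year": 2023, "ticker": "AMZN"}))
# {'company_name': 'amazon', 'fiscal_year': '2023'}
```

The allow-list also prevents an unexpected key in the model output from turning into a filter condition on a field that does not exist in the collection.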

Dynamic Qdrant Metadata Filtering

Use the extracted metadata dictionary to build typed Qdrant filter objects. Qdrant applies the filter as part of the search itself, so chunks from other companies, years, or quarters never enter the candidate set.

PYTHON
from qdrant_client.models import Filter, FieldCondition, MatchValue

def hybrid_search(query: str, k: int = 5):
    filters = extract_filters(query)
    qdrant_filter = None

    if filters:
        conditions = [
            FieldCondition(key=f"metadata.{key}", match=MatchValue(value=value))
            for key, value in filters.items()
        ]
        qdrant_filter = Filter(must=conditions)

    # Run the hybrid search, restricted by the filter when one was extracted
    results = vector_store.similarity_search(query=query, k=k, filter=qdrant_filter)
    return results

# Test execution with target metadata constraints
results = hybrid_search("What is Amazon's cash flow in Q1 2024?", k=3)
for idx, doc in enumerate(results):
    print(f"[{idx}] Source: {doc.metadata['source_file']} (Page {doc.metadata['page']})")
OUTPUT
[0] Source: amazon 10-q q1 2024.md (Page 28)
[1] Source: amazon 10-q q1 2024.md (Page 26)
[2] Source: amazon 10-q q1 2024.md (Page 12)
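
The `must` list in the filter above behaves like a logical AND: a chunk is eligible only if every condition matches. The following plain-Python sketch mimics that behavior on in-memory dictionaries to show which chunks survive. It is an illustration of the semantics, not Qdrant code, and `matches_all` and the sample chunks are hypothetical.

```python
def matches_all(chunk_metadata: dict, filters: dict) -> bool:
    """Return True only if every filter key is present with an equal value."""
    return all(chunk_metadata.get(key) == value for key, value in filters.items())

# Hypothetical metadata for three indexed chunks
chunks = [
    {"company_name": "amazon", "fiscal_year": "2024", "fiscal_quarter": "q1"},
    {"company_name": "amazon", "fiscal_year": "2023", "fiscal_quarter": "q1"},
    {"company_name": "apple", "fiscal_year": "2024", "fiscal_quarter": "q1"},
]

filters = {"company_name": "amazon", "fiscal_year": "2024"}
kept = [chunk for chunk in chunks if matches_all(chunk, filters)]
print(kept)
# [{'company_name': 'amazon', 'fiscal_year': '2024', 'fiscal_quarter': 'q1'}]
```

Note that an empty filter dictionary matches every chunk, which mirrors `hybrid_search` passing `filter=None` when no metadata was extracted. If unfiltered results are a leakage risk in your application, that case is worth handling explicitly.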

Cross-Encoder Reranking

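Embedding models are good at finding candidate text chunks but less precise at ordering them. A Bi-Encoder embeds the query and each document independently and compares the resulting vectors, whereas a Cross-Encoder reads the query and a document together in a single forward pass and outputs a relevance score for that pair, which lets it capture finer interactions between the two texts.

Because a Cross-Encoder must run once per query-document pair, it is too slow for searching a whole collection. Applied as a post-retrieval step over a small candidate set, it typically improves the ordering of the top results at the cost of some extra latency.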

PYTHON
import torch
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

RERANKER_MODEL = "BAAI/bge-reranker-base"

# Use the GPU when one is available, otherwise fall back to the CPU
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the cross-encoder once instead of on every call
reranker = HuggingFaceCrossEncoder(model_name=RERANKER_MODEL, model_kwargs={"device": DEVICE})

def rerank_results(query: str, documents: list, top_k: int = 5):
    if not documents:
        return []

    # Pair the query with each document text
    query_doc_pairs = [(query, doc.page_content) for doc in documents]
    scores = reranker.score(query_doc_pairs)

    # Sort documents by relevance score, highest first
    reranked = sorted(zip(scores, documents), key=lambda x: x[0], reverse=True)

    # Return the top K sorted documents
    return [doc for score, doc in reranked[:top_k]]
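
The sorting logic of the reranking step can be exercised without downloading a model by passing the scorer in as a function. The sketch below assumes a hypothetical helper `rerank_with` and a toy word-overlap scorer that stands in for the cross-encoder; the toy scorer is only a placeholder and is far weaker than a real reranker.

```python
def rerank_with(score_fn, query, texts, top_k=3):
    """Score (query, text) pairs with score_fn and return the top_k (text, score) tuples."""
    scores = score_fn([(query, text) for text in texts])
    ranked = sorted(zip(scores, texts), key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in ranked[:top_k]]

def toy_overlap_score(pairs):
    """Stand-in scorer: number of lowercase words shared by query and text."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]

# Hypothetical candidate chunks returned by the first retrieval stage
candidates = [
    "operating income grew in 2024",
    "apple net sales revenue 2024",
    "amazon cash flow",
]

top = rerank_with(toy_overlap_score, "apple revenue 2024", candidates, top_k=2)
print(top)
# [('apple net sales revenue 2024', 3.0), ('operating income grew in 2024', 1.0)]
```

Returning the scores alongside the texts also makes it possible to drop candidates below a relevance threshold instead of always keeping a fixed number.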

Run the complete pipeline to retrieve and rerank documents:

PYTHON
query = "what is the revenue of apple in 2024?"
retrieved_docs = hybrid_search(query, k=10)
reranked_docs = rerank_results(query, retrieved_docs, top_k=3)

for idx, doc in enumerate(reranked_docs):
    print(f"Rank {idx+1}: {doc.metadata['source_file']} (Page {doc.metadata['page']})")
    print(doc.page_content[:200].strip())
    print("-" * 50)
OUTPUT
Rank 1: apple 10-k 2024.md (Page 26)
## Products and Services Performance
The following table shows net sales by category for 2024, 2023 and 2022 (dollars in millions):
--------------------------------------------------
Rank 2: apple 10-k 2024.md (Page 28)
Operating income for 2024 grew by 8% compared to the prior year period, driven by expansion of our Services segment sales and sustained margins.
--------------------------------------------------
Rank 3: apple 10-k 2024.md (Page 12)
Selected Financial Data: Net Sales was $ 391,035 million for the fiscal year ended September 28, 2024.
--------------------------------------------------
