Standard vector similarity search often falls short when querying financial and technical documents, because exact numbers, fiscal quarters, and company names can be blurred in dense embedding vectors. A hybrid search strategy that combines dense embeddings, sparse term-weighted vectors (BM25), metadata filters, and Cross-Encoder reranking lets developers build retrieval engines that handle both semantic and exact-match queries.
This guide shows how to configure and run such a retrieval pipeline using Qdrant, LangChain, and a Cross-Encoder reranker.
Dense, Sparse, and Hybrid Retrieval
A robust RAG system uses multiple representations of data:
- Dense Retrieval: Encodes text chunks into dense vectors (e.g., 3072 dimensions). This is ideal for searching by meaning and matching synonymous concepts (e.g., matching "cash on hand" to "liquidity").
- Sparse Retrieval: Represents text by weighted term frequencies (like BM25). This is ideal for matching specific terminology, exact model names, numbers, or unique identifiers (e.g., matching "Apple Q1 2024").
- Hybrid Retrieval: Merges the sparse and dense result lists using rank-fusion techniques (like Reciprocal Rank Fusion) to yield a balanced result set.
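To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is an illustration of the general technique, not Qdrant's internal implementation: the function name, the chunk identifiers, and the smoothing constant `k=60` (a commonly used default) are all assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents ranked well in several lists rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a dense and a sparse retriever
dense_ranking = ["chunk_a", "chunk_b", "chunk_c"]
sparse_ranking = ["chunk_c", "chunk_a", "chunk_d"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale, which is the main reason rank fusion is a convenient default for hybrid search.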
Let's initialize the hybrid vector index connection:
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore, RetrievalMode, FastEmbedSparse
load_dotenv()
COLLECTION_NAME = "financial_docs"
# Configure dense and sparse embedding models
dense_embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
# Open connection to the existing Qdrant collection
vector_store = QdrantVectorStore.from_existing_collection(
    embedding=dense_embeddings,
    sparse_embedding=sparse_embeddings,
    collection_name=COLLECTION_NAME,
    url="http://localhost:6333",
    retrieval_mode=RetrievalMode.HYBRID,
)
Extracting Query Metadata Filters with LLMs
To query a specific document without retrieving data from unrelated years or companies, extract metadata filters directly from the user's natural language input.
Define the metadata schema in scripts/schema.py:
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class DocType(str, Enum):
    TEN_K = "10-k"
    TEN_Q = "10-q"
    EIGHT_K = "8-k"

class FiscalQuarter(str, Enum):
    Q1 = "q1"
    Q2 = "q2"
    Q3 = "q3"
    Q4 = "q4"

class ChunkMetadata(BaseModel):
    company_name: Optional[str] = Field(default=None, description="Company name (lowercase, e.g. 'amazon', 'apple')")
    doc_type: Optional[DocType] = Field(default=None, description="Document type (10-k, 10-q, 8-k)")
    fiscal_year: Optional[str] = Field(default=None, description="Fiscal year (e.g. '2024')")
    fiscal_quarter: Optional[FiscalQuarter] = Field(default=None, description="Fiscal quarter (q1-q4)")
    # Store enum fields as their plain string values (e.g. "10-k")
    model_config = {"use_enum_values": True}
Construct the metadata filter parser:
# Assumes scripts/ is importable from the working directory
from scripts.schema import ChunkMetadata

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

def extract_filters(user_query: str) -> dict:
    prompt = f"""
    Extract metadata filters from the query. Return None for fields not mentioned.
    <USER_QUERY>
    {user_query}
    </USER_QUERY>
    #### MAPPINGS
    COMPANY MAPPINGS:
    - Amazon/AMZN -> amazon
    - Google/Alphabet/GOOGL/GOOG -> google
    - Apple/AAPL -> apple
    DOC TYPE:
    - Annual report -> 10-k
    - Quarterly report -> 10-q
    #### EXAMPLES
    "Amazon Q3 2024 revenue" -> {{"company_name": "amazon", "doc_type": "10-q", "fiscal_year": "2024", "fiscal_quarter": "q3"}}
    "Apple 2023 annual report" -> {{"company_name": "apple", "doc_type": "10-k", "fiscal_year": "2023"}}
    Extract metadata based on the user query only:
    """
    # Constrain the LLM response to the ChunkMetadata schema
    structured_llm = llm.with_structured_output(ChunkMetadata)
    metadata = structured_llm.invoke(prompt)
    if metadata:
        # Drop unset fields so only mentioned filters are applied
        return metadata.model_dump(exclude_none=True)
    return {}
# Verify filter parsing logic
print(extract_filters("What was Amazon's profit in Q1 2023?"))
Example output (LLM responses may vary between runs):
{'company_name': 'amazon', 'doc_type': '10-q', 'fiscal_year': '2023', 'fiscal_quarter': 'q1'}
Dynamic Qdrant Metadata Filtering
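Use the extracted metadata dictionary to build typed Qdrant filter objects. Each key maps to a payload field under `metadata.`, and Qdrant applies the filter as part of the search itself, so chunks from other companies, years, or document types are never returned as candidates.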
from qdrant_client.models import Filter, FieldCondition, MatchValue

def hybrid_search(query: str, k: int = 5):
    filters = extract_filters(query)
    qdrant_filter = None
    if filters:
        # One exact-match condition per extracted field; all must hold
        conditions = [
            FieldCondition(key=f"metadata.{key}", match=MatchValue(value=value))
            for key, value in filters.items()
        ]
        qdrant_filter = Filter(must=conditions)
    # Execute dynamic filtered vector search
    results = vector_store.similarity_search(query=query, k=k, filter=qdrant_filter)
    return results
# Test execution with target metadata constraints
results = hybrid_search("What is Amazon's cash flow in Q1 2024?", k=3)
for idx, doc in enumerate(results):
    print(f"[{idx}] Source: {doc.metadata['source_file']} (Page {doc.metadata['page']})")
Output:
[0] Source: amazon 10-q q1 2024.md (Page 28)
[1] Source: amazon 10-q q1 2024.md (Page 26)
[2] Source: amazon 10-q q1 2024.md (Page 12)
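The effect of a `must` filter can be pictured with a small plain-Python sketch. It only mimics the semantics (every condition has to match exactly) and is not how Qdrant evaluates filters internally. The helper names and the sample payloads are made up for illustration. The third payload stores the year as an integer to show why the schema keeps `fiscal_year` as a string: an exact match on `"2024"` is not expected to match a payload value of `2024`, so the type used at ingestion time and at query time should agree.

```python
from typing import Any


def get_nested(payload: dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'metadata.fiscal_year' through nested dicts."""
    current: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches_all(payload: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True only if every filter value equals the stored payload value."""
    return all(
        get_nested(payload, f"metadata.{key}") == value
        for key, value in filters.items()
    )


# Hypothetical payloads, shaped like the metadata used in this guide
points = [
    {"metadata": {"company_name": "amazon", "fiscal_year": "2024", "source_file": "amazon 10-q q1 2024.md"}},
    {"metadata": {"company_name": "apple", "fiscal_year": "2024", "source_file": "apple 10-k 2024.md"}},
    # Year stored as an integer: does not equal the string "2024"
    {"metadata": {"company_name": "amazon", "fiscal_year": 2024, "source_file": "amazon 10-k 2024.md"}},
]

filters = {"company_name": "amazon", "fiscal_year": "2024"}
candidates = [p["metadata"]["source_file"] for p in points if matches_all(p, filters)]
print(candidates)
```

Only the first payload survives the filter. An empty `filters` dictionary matches everything, which mirrors `hybrid_search` passing `filter=None` when no metadata was extracted.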
Cross-Encoder Reranking
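Embedding models are good at finding candidate text chunks but less precise at ordering them. A Bi-Encoder embeds the query and each document independently and compares the resulting vectors, whereas a Cross-Encoder takes the query and a document together as a single input and scores the pair jointly, so it can model interactions between query terms and document terms. This is more accurate but too slow to run over an entire collection, which is why it is applied only to the retrieved candidates.
Using a Cross-Encoder as a post-retrieval step typically improves the ordering of the final results, at the cost of extra latency per query.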
import torch
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

RERANKER_MODEL = "BAAI/bge-reranker-base"
# Use CUDA if available, otherwise fall back to CPU
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# Load the cross-encoder once instead of on every call
reranker = HuggingFaceCrossEncoder(model_name=RERANKER_MODEL, model_kwargs={"device": DEVICE})

def rerank_results(query: str, documents: list, top_k: int = 5):
    if not documents:
        return []
    # Pair the query with each document text
    query_doc_pairs = [(query, doc.page_content) for doc in documents]
    scores = reranker.score(query_doc_pairs)
    # Sort documents by relevance score, highest first
    reranked = sorted(zip(scores, documents), key=lambda x: x[0], reverse=True)
    # Return the top K sorted documents
    return [doc for _, doc in reranked[:top_k]]
Run the complete pipeline to retrieve and rerank documents:
query = "what is the revenue of apple in 2024?"
retrieved_docs = hybrid_search(query, k=10)
reranked_docs = rerank_results(query, retrieved_docs, top_k=3)
for idx, doc in enumerate(reranked_docs):
    print(f"Rank {idx+1}: {doc.metadata['source_file']} (Page {doc.metadata['page']})")
    print(doc.page_content[:200].strip())
    print("-" * 50)
Output:
Rank 1: apple 10-k 2024.md (Page 26)
## Products and Services Performance
The following table shows net sales by category for 2024, 2023 and 2022 (dollars in millions):
--------------------------------------------------
Rank 2: apple 10-k 2024.md (Page 28)
Operating income for 2024 grew by 8% compared to the prior year period, driven by expansion of our Services segment sales and sustained margins.
--------------------------------------------------
Rank 3: apple 10-k 2024.md (Page 12)
Selected Financial Data: Net Sales was $ 391,035 million for the fiscal year ended September 28, 2024.
--------------------------------------------------