Vector Stores and Retrievals with FAISS

Build a FAISS vector store from PDF documents using Ollama embeddings — chunk, embed, index, and retrieve semantically relevant content for RAG applications.

Jun 4, 2026 at 10:30 AM · 8 min read

Topics You Will Master

  • Loading and chunking PDF documents with PyMuPDFLoader and RecursiveCharacterTextSplitter
  • Measuring token counts with tiktoken before and after chunking
  • Generating dense vector embeddings with OllamaEmbeddings and nomic-embed-text
  • Building a FAISS index from scratch with IndexFlatL2 and InMemoryDocstore
  • Adding document chunks to the vector store and verifying the index count
  • Performing similarity search directly on the vector store with vector_store.search()
  • Using as_retriever() with three search strategies: similarity, similarity_score_threshold, and mmr
  • Saving the populated FAISS vector store to disk with save_local()

Best For

Python developers building their first RAG pipeline who need to understand how documents are transformed into searchable vector indexes.

Expected Outcome

A persisted FAISS vector store containing 311 embedded chunks from a health supplements research dataset, retrievable with multiple search strategies and ready to plug into a RAG chain.

A vector store is the core of any RAG system. It converts document chunks into dense numerical vectors (embeddings), stores them in an index optimized for fast nearest-neighbour search, and retrieves the most semantically relevant chunks for any query. This lesson builds that store step by step — from raw PDFs to a persisted FAISS index — using Ollama-hosted embeddings and LangChain's vector store abstractions.

Prerequisites: langchain-community, langchain-ollama, langchain-text-splitters, faiss-cpu, pymupdf, tiktoken, and python-dotenv installed. Ollama running with both qwen3 and nomic-embed-text models pulled. A rag-dataset/ folder containing PDFs (clone from https://github.com/laxmimerit/rag-dataset).

Note

Install FAISS with pip install faiss-cpu (CPU-only). For GPU acceleration use pip install faiss-gpu. On Windows you may also need to set KMP_DUPLICATE_LIB_OK=True to avoid an OpenMP conflict if multiple math libraries are loaded.

Setup

PYTHON
import os
import warnings
from dotenv import load_dotenv

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
warnings.filterwarnings("ignore")

load_dotenv()
OUTPUT
True

KMP_DUPLICATE_LIB_OK prevents a runtime crash on Windows when both PyTorch and FAISS link their own OpenMP runtime.


Document Loader

Walk the rag-dataset/ directory and load every PDF file page-by-page using PyMuPDFLoader. Each page becomes one Document object:

PYTHON
from langchain_community.document_loaders import PyMuPDFLoader
import os

pdfs = []
for root, dirs, files in os.walk("rag-dataset"):
    for file in files:
        if file.endswith(".pdf"):
            pdfs.append(os.path.join(root, file))

docs = []
for pdf in pdfs:
    loader = PyMuPDFLoader(pdf)
    temp = loader.load()
    docs.extend(temp)

len(docs)
OUTPUT
64

64 pages across all PDFs in the dataset.
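
The directory walk above matches only file names ending in lowercase .pdf. As a minimal sketch, assuming some files in a dataset might carry an upper-case extension such as .PDF, the same collection step could be written with pathlib. The helper name find_pdfs and the demo file names are hypothetical; the demo builds a throwaway directory so it runs without the real dataset.

```python
import tempfile
from pathlib import Path


def find_pdfs(root):
    """Return sorted paths of all PDF files under root (extension matched case-insensitively)."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Demo on a temporary directory (hypothetical file names, not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "a.pdf").write_bytes(b"")
    (base / "sub" / "b.PDF").write_bytes(b"")
    (base / "notes.txt").write_bytes(b"")
    found = [Path(p).name for p in find_pdfs(base)]

print(found)
```

The resulting list of paths can be passed to PyMuPDFLoader one file at a time, exactly as in the loop above.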


Document Chunking

Raw pages from PDFs are often long — too long to embed as a single unit or to fit usefully inside a retrieval prompt. RecursiveCharacterTextSplitter splits each page into smaller overlapping chunks:

PYTHON
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

  • chunk_size=1000: maximum characters per chunk
  • chunk_overlap=100: characters shared between adjacent chunks to preserve context at split boundaries
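
To make the two parameters concrete, here is a simplified sketch of overlapping chunking using plain fixed-size character windows. It is an illustration only, not how RecursiveCharacterTextSplitter works internally: the real splitter prefers to break at paragraph, line, and word boundaries, so its chunks are usually shorter than chunk_size and the overlap can be smaller than chunk_overlap. The function name sliding_window_chunks is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2500 characters of dummy text
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])
print(pieces[0][-100:] == pieces[1][:100])  # the shared 100-character overlap
```

Each window starts 900 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.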

Verifying Chunk Sizes with tiktoken

PYTHON
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o-mini")
len(encoding.encode(chunks[0].page_content)), len(encoding.encode(chunks[1].page_content)), len(encoding.encode(docs[1].page_content))
OUTPUT
(294, 219, 922)

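Before chunking, the page at index 1 (docs[1]) was 922 tokens. After chunking, the first two chunks are 294 and 219 tokens respectively. The counts come from the gpt-4o-mini tokenizer rather than nomic-embed-text's own tokenizer, so treat them as estimates; even so, each chunk is a fraction of a page and comfortably small for embedding.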

Tip

The chunk_size parameter is in characters, not tokens. Roughly 1 token ≈ 4 characters for English text, so a chunk_size=1000 yields chunks of approximately 200–300 tokens — a good balance between precision (small chunks surface precise answers) and context (large chunks preserve paragraph-level meaning).
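
The arithmetic behind that estimate can be sketched in a few lines. This is a rule-of-thumb calculation, not a tokenizer; the helper name estimate_tokens and the 3.5 and 5 characters-per-token bounds are assumptions chosen only to show how the range moves.

```python
def estimate_tokens(num_chars, chars_per_token=4.0):
    """Rough token estimate from a character count (rule of thumb, not a tokenizer)."""
    return round(num_chars / chars_per_token)


print(estimate_tokens(1000))        # 4 characters per token
print(estimate_tokens(1000, 3.5))   # denser tokenization
print(estimate_tokens(1000, 5.0))   # sparser tokenization
```

For an exact count on real chunks, measure with tiktoken as shown earlier.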


Document Vector Embedding

Imports

PYTHON
from langchain_ollama import OllamaEmbeddings

import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore

Embedding Model

nomic-embed-text is a high-quality, open-source embedding model hosted locally through Ollama. It produces 768-dimensional vectors:

PYTHON
embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url='http://localhost:11434')

Verify the embedding dimension by embedding a test string:

PYTHON
vector = embeddings.embed_query("Hello World")
len(vector)

This returns 768 — the number of dimensions in each nomic-embed-text embedding vector.

Building the FAISS Index

IndexFlatL2 is a flat (brute-force) index that computes exact L2 (Euclidean) distance between all stored vectors and the query vector. It is exact but scales linearly with the number of vectors. For small to medium datasets (up to several hundred thousand chunks), it is the right default choice:

PYTHON
index = faiss.IndexFlatL2(len(vector))
index.ntotal, index.d
OUTPUT
(0, 768)

ntotal is 0 (empty index) and d is 768 (the vector dimension).
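
Conceptually, a flat index does nothing more than compare the query against every stored vector and keep the closest ones. The pure-Python sketch below illustrates that idea; it is not FAISS code and makes no attempt at FAISS's speed. The function names are hypothetical. FAISS documents IndexFlatL2 as reporting squared L2 distances, which leaves the ranking unchanged, so the sketch does the same.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(stored, query, k):
    """Exact k-nearest-neighbour search: score every stored vector, keep the k smallest."""
    scored = ((squared_l2(vec, query), pos) for pos, vec in enumerate(stored))
    return heapq.nsmallest(k, scored)  # list of (distance, position), closest first


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
results = flat_l2_search(stored, query=[0.9, 0.1], k=2)
print([pos for _, pos in results])
```

Because every stored vector is visited for each query, the cost grows linearly with the number of vectors, which is the trade-off described above.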

Creating the LangChain FAISS Vector Store

Wrap the FAISS index in a LangChain FAISS vector store. This adds document storage (InMemoryDocstore) and a mapping from FAISS index positions to document IDs:

PYTHON
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

vector_store.index.ntotal, vector_store.index.d
OUTPUT
(0, 768)

The store is initialized but still empty.
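
The three pieces passed to the constructor (index, docstore, and index_to_docstore_id) have to stay in sync: a vector search returns index positions, and the mapping turns those positions into document IDs that the docstore can resolve. The toy class below sketches that bookkeeping with plain Python containers. It is a stand-in for illustration, not LangChain's implementation, and TinyVectorStore and its methods are hypothetical names.

```python
import uuid


class TinyVectorStore:
    """Toy stand-in showing the three pieces the wrapper keeps in sync."""

    def __init__(self):
        self.vectors = []        # plays the role of the FAISS index
        self.index_to_id = {}    # index position -> document ID
        self.docstore = {}       # document ID -> document text

    def add(self, vector, text):
        """Store a vector and its text under a fresh UUID; return the ID."""
        doc_id = str(uuid.uuid4())
        position = len(self.vectors)
        self.vectors.append(vector)
        self.index_to_id[position] = doc_id
        self.docstore[doc_id] = text
        return doc_id

    def lookup(self, position):
        """Resolve an index position (what a vector search returns) to the stored text."""
        return self.docstore[self.index_to_id[position]]


store = TinyVectorStore()
first_id = store.add([0.1, 0.2], "creatine and strength")
store.add([0.9, 0.8], "vitamin intake by age")
print(len(store.vectors), store.lookup(1))
```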

Adding Chunks to the Vector Store

PYTHON
ids = vector_store.add_documents(documents=chunks)

add_documents embeds every chunk using OllamaEmbeddings, adds the vectors to the FAISS index, and stores the corresponding Document objects in the docstore. Each chunk gets a unique UUID:

PYTHON
len(ids), vector_store.index.ntotal
OUTPUT
(311, 311)

311 chunks were embedded and indexed. The 64 original pages became 311 overlapping chunks after splitting.


Retrieval

Direct Search on the Vector Store

Use vector_store.search() for a quick similarity lookup:

PYTHON
question = "how to gain muscle mass?"
docs = vector_store.search(query=question, k=5, search_type="similarity")

This returns the 5 chunks most semantically similar to the query. Each result is a Document with page_content and metadata (source file, page number, format, dates). Note that the assignment reuses the name docs, which previously held the loaded pages:

PYTHON
docs
OUTPUT
[Document(id='99f5925c-...', metadata={'source': 'rag-dataset\\gym supplements\\2. High Prevalence of Supplement Intake.pdf', 'page': 8, 'total_pages': 11, ...},
  page_content='and strength gain among men. We detected more prevalent protein and creatine supplementation
among younger compared to older fitness center users, whereas the opposite was found for vitamin
supplementation...'),
 Document(id='fb6f7c4b-...', metadata={'source': 'rag-dataset\\gym supplements\\2. High Prevalence of Supplement Intake.pdf', 'page': 5, ...},
  page_content='for two training goals. Improving health was named by 59%, 60%, 75%, and 89% as a training goal
among the four age groups...'),
 Document(id='fd2726cd-...', metadata={'source': 'rag-dataset\\gym supplements\\1. Analysis of Actual Fitness Supplement.pdf', 'page': 0, ...},
  page_content='acids than traditional protein sources. Its numerous benefits have made it a popular choice
for snacks and drinks among consumers. Another widely embraced supplement is caffeine...'),
 ...]

The search surfaces passages from the gym supplement research papers in response to a question about muscle gain.


Retriever Strategies

as_retriever() wraps the vector store as a LangChain Retriever — the standard interface used by LCEL chains. Three search strategies are available:

1. Similarity (Default)

Returns the k nearest vectors regardless of their absolute distance scores:

PYTHON
retriever = vector_store.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 3}
)

retriever.invoke(question)

Returns the 3 closest chunks for "how to gain muscle mass?" — the same top results as vector_store.search().

2. Similarity Score Threshold

Only returns results whose similarity score exceeds a minimum threshold. Useful for preventing the retriever from returning irrelevant chunks when no good match exists:

PYTHON
retriever = vector_store.as_retriever(
    search_type='similarity_score_threshold',
    search_kwargs={'k': 3, 'score_threshold': 0.1}
)

question = "how to lose weight?"
retriever.invoke(question)

Returns chunks specifically about weight loss supplements from the health supplements research PDF (page 12 — "Dietary Supplements and Weight Loss" section), filtered to only those exceeding the 0.1 score threshold.

Note

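In FAISS with IndexFlatL2, LangChain derives the similarity score from the L2 distance and reports it as a relevance score that nominally runs from 0 (dissimilar) to 1 (most similar). A lower threshold (e.g., 0.1) allows more results through. A higher threshold makes the filter stricter. The conversion assumes normalized embeddings, so with other vectors some scores can fall outside that range. Tune score_threshold based on your dataset to avoid empty results or irrelevant matches.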
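
The filtering step can be sketched as a distance-to-score conversion followed by a cut-off. The formula below (1 minus the distance divided by the square root of 2) is, to my understanding, the shape of LangChain's default Euclidean conversion, but treat it as an assumption and check the relevance score function in your installed version. The function names, sample texts, and distances are hypothetical.

```python
import math


def euclidean_relevance(distance):
    """Map an L2 distance to a relevance score; assumes unit-length embeddings."""
    return 1.0 - distance / math.sqrt(2)


def filter_by_threshold(hits, score_threshold):
    """Keep (text, distance) hits whose relevance score is at least score_threshold."""
    kept = []
    for text, distance in hits:
        score = euclidean_relevance(distance)
        if score >= score_threshold:
            kept.append((text, score))
    return kept


# Hypothetical search hits as (text, distance) pairs.
hits = [("close match", 0.2), ("borderline", 1.2), ("far away", 1.6)]
loose = [text for text, _ in filter_by_threshold(hits, score_threshold=0.1)]
strict = [text for text, _ in filter_by_threshold(hits, score_threshold=0.5)]
print(loose)
print(strict)
```

Raising the threshold from 0.1 to 0.5 drops the borderline hit, and a distance large enough to produce a negative score is rejected at either setting.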

3. MMR — Maximal Marginal Relevance

MMR balances relevance (similarity to the query) and diversity (dissimilarity between returned chunks). It prevents retrieving multiple near-duplicate chunks from the same page:

PYTHON
retriever = vector_store.as_retriever(
    search_type='mmr',
    search_kwargs={'k': 3, 'fetch_k': 20, 'lambda_mult': 1}
)

docs = retriever.invoke(question)
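docs

  • fetch_k=20: initially fetches 20 candidates by similarity
  • lambda_mult=1: controls the relevance vs. diversity trade-off (1 = full relevance, 0 = maximum diversity)

The 3 returned chunks are selected from those 20 candidates. With lambda_mult=1 the diversity penalty is switched off, so the result matches a plain similarity ranking; lower the value (LangChain's default is 0.5) to trade some relevance for more varied chunks.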
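
The effect of lambda_mult is easier to see in a small greedy selection loop. The sketch below illustrates the general MMR idea with cosine similarity on toy 2-D vectors; it is not LangChain's code, and mmr_select, the sample vectors, and the value 0.3 are hypothetical choices for the demo.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query, candidates, k, lambda_mult):
    """Greedy MMR: pick k candidate positions, trading query relevance against redundancy."""
    relevance = [cosine(query, c) for c in candidates]
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best, best_score = None, -math.inf
        for pos in remaining:
            # Redundancy: similarity to the closest chunk already chosen (0 if none yet).
            redundancy = max(
                (cosine(candidates[pos], candidates[s]) for s in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[pos] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = pos, score
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.05], [1.0, 0.06], [0.7, 0.7]]  # the first two are near-duplicates
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))
```

With lambda_mult=1.0 the two near-duplicates are both returned because only relevance counts; with 0.3 the second pick switches to the less similar but more distinct third vector.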

Tip

Use mmr when your dataset has redundant content (e.g., many similar paragraphs across multiple papers). similarity is fine for small or well-curated datasets. similarity_score_threshold is ideal when you need to handle the "no good answer exists" case gracefully.


Saving the Vector Store to Disk

Persist the populated vector store so it can be loaded in later notebooks without re-embedding:

PYTHON
db_name = "health_supplements"
vector_store.save_local(db_name)

This creates a health_supplements/ directory containing:

  • index.faiss — the serialized FAISS index with all 311 vectors
  • index.pkl — the docstore and index_to_docstore_id mapping (pickled)

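The next lesson (RAG) loads this saved store directly with FAISS.load_local(), using the same embedding model. Because index.pkl is a pickle file, recent langchain-community releases require passing allow_dangerous_deserialization=True to load_local(); only load stores you created or trust.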


Quick Reference

Full Build Pipeline

PYTHON
# 1. Load PDFs
from langchain_community.document_loaders import PyMuPDFLoader
import os

pdfs = [os.path.join(r, f) for r, _, fs in os.walk("rag-dataset") for f in fs if f.endswith(".pdf")]
docs = []
for pdf in pdfs:
    docs.extend(PyMuPDFLoader(pdf).load())

# 2. Chunk
from langchain_text_splitters import RecursiveCharacterTextSplitter
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 3. Embed + Index
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
import faiss

embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url='http://localhost:11434')
vector = embeddings.embed_query("test")
index = faiss.IndexFlatL2(len(vector))

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
vector_store.add_documents(documents=chunks)

# 4. Save
vector_store.save_local("health_supplements")

Retriever Strategies at a Glance

PYTHON
# Similarity (top-k)
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 3})

# With minimum score filter
retriever = vector_store.as_retriever(
    search_type='similarity_score_threshold',
    search_kwargs={'k': 3, 'score_threshold': 0.1}
)

# Maximal Marginal Relevance (diverse results)
retriever = vector_store.as_retriever(
    search_type='mmr',
    search_kwargs={'k': 3, 'fetch_k': 20, 'lambda_mult': 1}
)

What You Built

In this lesson you built the complete document ingestion and indexing layer of a RAG system:

  • Loaded 64 pages from a research PDF dataset with PyMuPDFLoader
  • Chunked them into 311 overlapping chunks of up to 1000 characters with RecursiveCharacterTextSplitter
  • Embedded every chunk into 768-dimensional vectors using nomic-embed-text via Ollama
  • Indexed all vectors in a FAISS IndexFlatL2 for exact nearest-neighbour search
  • Retrieved semantically relevant chunks using three strategies — similarity, similarity_score_threshold, and mmr
  • Persisted the complete index to disk for reuse across sessions

The saved health_supplements/ vector store is the starting point for the next lesson, where a full RAG chain will load it and answer questions grounded entirely in these documents.
