Conversational RAG Chatbot with Chainlit

Build an end-to-end conversational RAG chatbot in a single file with RAGWire, a LangChain agent, and the Chainlit UI.

Jun 18, 2026 · 7 min read

Topics You Will Master

Building a complete RAG chatbot in a single Python file
Using Chainlit's on_chat_start and on_message handlers for chat lifecycle
Implementing drag-and-drop document upload with ingest_directory
Adding conversational memory with InMemorySaver per session

Chainlit is an open-source Python framework for building conversational AI interfaces. Combined with RAGWire and a LangChain agent, it produces a fully functional RAG chatbot in a single file — drag-and-drop document upload, conversational memory, and tool-calling retrieval with no frontend code required.

Complete the RAGWire Architecture and Setup and RAGWire Providers and Components articles first.

Architecture

This chatbot runs as a single process with three layers:

  • Chainlit — Chat UI with file upload, message handling, and streaming
  • LangChain Agent — Tool-calling agent with get_filter_context and search_documents
  • RAGWire — Document ingestion and retrieval pipeline backed by Qdrant Cloud

Configuration

Use the same Gemini + Qdrant Cloud config from previous articles:

YAML
# config_gemini_qdrant.yaml
embeddings:
  provider: "google"
  model: "models/gemini-embedding-001"
  api_key: "${GOOGLE_API_KEY}"

llm:
  provider: "google"
  model: "gemini-2.5-flash"
  api_key: "${GOOGLE_API_KEY}"

vectorstore:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection_name: "finance-rag-google-qdrant"
  use_sparse: true
  force_recreate: false

retriever:
  search_type: "hybrid"
  top_k: 5
  auto_filter: false

metadata:
  config_file: "finance_metadata.yaml"

logging:
  level: "INFO"
  console_output: true
  colored: false
  log_file: "./.log/ragwire.log"
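
The ${GOOGLE_API_KEY}-style placeholders in this config are filled from environment variables. How RAGWire resolves them internally is not shown in this article, so the following is only a minimal sketch of the general idea. The helper expand_placeholders is hypothetical and not part of RAGWire.

```python
import re

# Matches ${NAME} placeholders such as ${GOOGLE_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: dict) -> str:
    """Replace each ${NAME} in text with env[NAME].

    Hypothetical helper for illustration; RAGWire's own loader may differ.
    Raises KeyError if a referenced variable is missing.
    """
    def _lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, text)


if __name__ == "__main__":
    sample = 'api_key: "${GOOGLE_API_KEY}"'
    print(expand_placeholders(sample, {"GOOGLE_API_KEY": "demo-key"}))
```

Failing loudly on a missing variable, rather than leaving the placeholder in place, makes a misconfigured key show up at startup instead of as an authentication error later.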

Environment Variables

Create a .env file with the required API keys:

BASH
GOOGLE_API_KEY=your_google_api_key
QDRANT_URL=https://your-cluster.cloud.qdrant.io:6333
QDRANT_API_KEY=your_qdrant_api_key
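
Before launching the app, it can help to confirm that all three variables are actually set. The sketch below uses only the standard library. The function name missing_vars is an assumption made for this example, and the variable names come from the .env file above.

```python
import os

# Names taken from the .env file shown in this article.
REQUIRED_VARS = ("GOOGLE_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


def missing_vars(env=None) -> list:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

One option is to call this right after load_dotenv() in app.py and stop early if the list is non-empty.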

Dependencies

BASH
pip install ragwire chainlit langchain langchain-google-genai langgraph python-dotenv

The Complete Chatbot

Create app.py — the entire chatbot is a single file:

PYTHON
from dotenv import load_dotenv
load_dotenv()

from ragwire import RAGWire
from langchain.agents import create_agent
from langchain.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver

import chainlit as cl
from typing import Optional
import tempfile, os

rag = RAGWire("config_gemini_qdrant.yaml")

RAG Tools

These are the same two tools used in the notebook pipeline, now registered with the Chainlit agent:

PYTHON
@tool
def get_filter_context(query: str) -> str:
    """Get available metadata fields, stored values, and filter suggestions for a query.

    Call this before search_documents when the query involves a specific company,
    year, or document type. Skip for purely semantic queries.
    """
    return rag.get_filter_context(query)

@tool
def search_documents(query: str, filters: Optional[dict] = None):
    """Search the document knowledge base for relevant information.

    Args:
        query: The search query
        filters: Optional metadata filters from get_filter_context.
    """
    results = rag.retrieve(query=query, filters=filters)
    # Return an explicit message so the agent can report that nothing matched.
    if not results:
        return "No relevant information is found!"
    return results
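
Because the system prompt asks the agent to name its sources, one option is to render the retrieved results as text with the source attached before handing them back. This article does not show what rag.retrieve returns, so the sketch assumes each result has a page_content string and a metadata dict with a source key, as LangChain Document objects typically do. Chunk is a stand-in type and format_results is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a retrieved result (assumed shape, not RAGWire's type)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_results(results: list) -> str:
    """Render retrieved chunks as numbered text blocks with their source."""
    if not results:
        return "No relevant information is found!"
    blocks = []
    for i, chunk in enumerate(results, start=1):
        # Fall back to a placeholder when no source is recorded.
        source = chunk.metadata.get("source", "unknown source")
        blocks.append(f"[{i}] Source: {source}\n{chunk.page_content}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = [Chunk("Revenue grew 12% year over year.", {"source": "report_2024.pdf"})]
    print(format_results(demo))
```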

LLM and System Prompt

PYTHON
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
memory = InMemorySaver()

SYSTEM_PROMPT = """
    You are a helpful document assistant.
    For complex questions, break them down into simpler sub-questions and answer each one before forming a final answer.
    Always call search_documents to find information before answering.
    If the query mentions a company, year, or document type, call get_filter_context first.
    If no documents are found, say so honestly — never make up an answer.
    Always mention the source document in your answer."""

Chat Start Handler

on_chat_start runs once when a user opens the chat. It creates an agent for the session and stores the session ID as the thread_id, so the shared InMemorySaver keeps each session's history separate:

PYTHON
@cl.on_chat_start
async def on_chat_start():
    agent = create_agent(
        model=model,
        tools=[get_filter_context, search_documents],
        system_prompt=SYSTEM_PROMPT,
        checkpointer=memory
    )

    cl.user_session.set('agent', agent)
    cl.user_session.set('thread_id', cl.context.session.id)

    await cl.Message(content="Hello! Upload documents (drag & drop) or ask me a question.").send()
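
A single InMemorySaver instance serves every session, and the thread_id is what keeps their histories apart. The toy class below models that idea with a plain dictionary. It is not LangGraph's implementation, and ThreadMemory is a hypothetical name used only for illustration.

```python
from collections import defaultdict


class ThreadMemory:
    """Toy model of a shared checkpointer: one store, histories keyed by thread_id."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        """Record a message under the given thread."""
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> list:
        """Return a copy of the messages stored for one thread."""
        return list(self._threads.get(thread_id, []))


if __name__ == "__main__":
    memory = ThreadMemory()  # shared across sessions, like the module-level saver
    memory.append("session-a", "What was revenue in 2023?")
    memory.append("session-b", "Summarize the risk factors.")
    memory.append("session-a", "And in 2024?")
    print(memory.history("session-a"))
    print(memory.history("session-b"))
```

Each session sees only the messages recorded under its own ID, which is why a follow-up such as "And in 2024?" can be resolved against the earlier question in the same session.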

Message Handler

on_message handles every incoming message.

It supports two modes:

  1. File upload — If the message contains attached files, copy them to a temporary directory and ingest with RAGWire
  2. Chat query — Otherwise, invoke the agent with the user's question

PYTHON
@cl.on_message
async def on_message(message: cl.Message):
    agent = cl.user_session.get('agent')
    thread_id = cl.user_session.get('thread_id')

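    # Upload mode: stage attached files in a temp directory and ingest them.
    if message.elements:
        with tempfile.TemporaryDirectory() as tmpdir:
            for elem in message.elements:
                # Use only the base name so the copy stays inside tmpdir.
                dest = os.path.join(tmpdir, os.path.basename(elem.name))
                with open(elem.path, 'rb') as src, open(dest, 'wb') as dst:
                    dst.write(src.read())

            msg = cl.Message(content="Ingesting documents...")
            await msg.send()

            # Ingest before the with block exits and deletes tmpdir.
            stats = rag.ingest_directory(tmpdir)
            msg.content = f"Files have been ingested. Stats: {stats}"
            await msg.update()

            return

    # Chat mode: the thread_id selects this session's history in the checkpointer.
    config = {'configurable': {'thread_id': thread_id}}

    # Show a placeholder message, then replace it with the agent's answer.
    response_msg = cl.Message(content='Thinking...')
    await response_msg.send()

    result = await agent.ainvoke(
        {'messages': [HumanMessage(message.content)]},
        config=config
    )

    # The last message in the returned state is the agent's final reply.
    response_msg.content = result['messages'][-1].text
    await response_msg.update()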
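
In the handler above, rag.ingest_directory is called synchronously, so a long ingestion would hold up the event loop while it runs. A common remedy is to move the blocking call to a worker thread. The sketch below shows the pattern with the standard library's asyncio.to_thread and a stand-in function, slow_ingest, which is hypothetical and only sleeps. Chainlit also provides cl.make_async for wrapping synchronous functions. Whether ingest_directory is safe to run in a thread is not covered here, so treat this as a pattern to test rather than a drop-in change.

```python
import asyncio
import time


def slow_ingest(directory: str) -> dict:
    """Stand-in for a blocking ingestion call; sleeps instead of doing real work."""
    time.sleep(0.05)
    return {"directory": directory, "processed": 1}


async def heartbeat(ticks: list) -> None:
    """Append a few ticks to show the event loop stays responsive."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def handle_upload(directory: str) -> tuple:
    """Run the blocking call in a worker thread while other tasks keep running."""
    ticks = []
    stats, _ = await asyncio.gather(
        asyncio.to_thread(slow_ingest, directory),
        heartbeat(ticks),
    )
    return stats, ticks


if __name__ == "__main__":
    print(asyncio.run(handle_upload("uploads")))
```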

Key Implementation Details

  • cl.user_session stores the agent and thread ID per user session; the agents share one InMemorySaver, which keeps each session's history separate by thread ID
  • cl.context.session.id provides a unique session identifier used as the thread_id for the InMemorySaver checkpointer
  • agent.ainvoke is the async version of invoke, required inside Chainlit's async handlers
  • tempfile.TemporaryDirectory creates a temporary directory for uploaded files, which is automatically cleaned up after ingestion
  • message.elements contains the list of files attached to a message via drag-and-drop or the upload button
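
The temporary-directory behaviour described above can be tried in isolation. This sketch copies a file into a TemporaryDirectory and confirms the directory is gone once the with block exits. The helper stage_files is hypothetical. It uses shutil.copy and os.path.basename rather than the manual read/write loop in app.py, which is a substitution made for brevity.

```python
import os
import shutil
import tempfile


def stage_files(uploads: list, dest_dir: str) -> list:
    """Copy (source_path, display_name) pairs into dest_dir and return the new paths."""
    staged = []
    for source_path, display_name in uploads:
        # basename() drops directory components from the uploaded name.
        dest = os.path.join(dest_dir, os.path.basename(display_name))
        shutil.copy(source_path, dest)
        staged.append(dest)
    return staged


def demo() -> tuple:
    """Stage one file, then report whether the temp directory survived the with block."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as src:
        src.write("hello")
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            staged = stage_files([(src.name, "notes.txt")], tmpdir)
            existed_inside = os.path.exists(staged[0])
        # tmpdir has been removed by the time the with block exits.
        return existed_inside, os.path.exists(tmpdir)
    finally:
        os.remove(src.name)


if __name__ == "__main__":
    print(demo())
```

Anything that needs the staged files, such as ingestion, has to run inside the with block, as it does in the handler.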

Welcome Page

Create chainlit.md in the same directory as app.py to customise the welcome screen:

MARKDOWN
# Welcome to the RAGWire Document Assistant

Upload documents (drag & drop) or ask questions about your ingested documents.

## Supported File Types
- PDF, DOCX, XLSX, PPTX, TXT, MD

Tip

If you do not want a welcome screen, leave chainlit.md empty.

Running the Chatbot

BASH
chainlit run app.py

The command is identical on Linux and macOS. By default, Chainlit serves the app at http://localhost:8000 and opens it in a browser window.

The chatbot is now live. Drag and drop PDF files into the chat to ingest them, then ask questions. The agent uses get_filter_context to discover available metadata and search_documents to retrieve relevant chunks with optional filters. Conversational memory persists across messages within the same session.

Important

This simple chatbot uses InMemorySaver — memory is lost when the process restarts. For persistent chat history across sessions, see the Chainlit Chat Frontend article, which adds SQLite-backed history, authentication, and PDF export.
