#langchain#streamlit#chatbot#streaming#runnablewithmessagehistory#sqlchatmessagehistory#messagesplaceholder#python#ollama

Build Your Own Chatbot with LangChain

Build a streaming, multi-session chatbot web app with LangChain, Ollama, and Streamlit — using persistent SQL memory and token-by-token streaming output.

Jun 4, 2026 at 10:30 AM6 min readFollowFollow (Hindi)

Topics You Will Master

Scaffolding a Streamlit chat application with st.chat_message and st.chat_input
Streaming LLM output token-by-token with .stream() and st.write_stream()
Managing per-user conversation history with SQLChatMessageHistory and a session_id text input
Wiring RunnableWithMessageHistory into a Streamlit UI for persistent multi-turn conversations
Resetting a session's history with a "Start New Conversation" button
Replaying the in-memory display history on Streamlit reruns
Best For

Python developers who completed the Chat Message Memory lesson and want to wrap it in a working browser-based chatbot UI.

Expected Outcome

A fully functional streaming chatbot web app running at http://localhost:8501 — users can switch between named sessions, have multi-turn conversations with persistent SQL memory, and start fresh with one click.

This lesson builds a complete chatbot UI on top of the RunnableWithMessageHistory pattern from the Chat Message Memory guide.

The app adds:

  • A Streamlit web interface with a real chat layout (st.chat_message, st.chat_input)
  • Token-by-token streaming so users see the response as it is generated
  • A user ID input to support multiple independent sessions
  • A "Start New Conversation" button to wipe history and begin fresh

Prerequisites: All previous lessons completed. Install Streamlit: pip install streamlit. Ollama running with qwen3.

LangChain & Ollama — Local AI Development

Build production-ready LLM apps entirely on your own hardware. No API keys, no cloud costs.

Enroll on Udemy →

Full Application Code

Save this as chat_stream.py and run with streamlit run chat_stream.py.

PYTHON
# chat_stream.py

import streamlit as st

from dotenv import load_dotenv
from langchain_ollama import ChatOllama

from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
    MessagesPlaceholder
)

from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import SQLChatMessageHistory

from langchain_core.output_parsers import StrOutputParser

load_dotenv('./../.env')

st.title("Make Your Own Chatbot")
st.write("Chat with me! Catch me at https://youtube.com/kgptalkie")

base_url = "http://localhost:11434"
model = 'qwen3'

user_id = st.text_input("Enter your user id", "default_user")

def get_session_history(session_id):
    return SQLChatMessageHistory(session_id, connection="sqlite:///chat_history.db")

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

if st.button("Start New Conversation"):
    st.session_state.chat_history = []
    history = get_session_history(user_id)
    history.clear()

for message in st.session_state.chat_history:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

### LLM Setup
llm = ChatOllama(base_url=base_url, model=model)

system = SystemMessagePromptTemplate.from_template("You are helpful assistant.")
human = HumanMessagePromptTemplate.from_template("{input}")

messages = [system, MessagesPlaceholder(variable_name='history'), human]

prompt = ChatPromptTemplate(messages=messages)

chain = prompt | llm | StrOutputParser()

runnable_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key='input',
    history_messages_key='history'
)

def chat_with_llm(session_id, input):
    for output in runnable_with_history.stream(
        {'input': input},
        config={'configurable': {'session_id': session_id}}
    ):
        yield output

prompt = st.chat_input("What is up?")

if prompt:
    st.session_state.chat_history.append({'role': 'user', 'content': prompt})

    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        response = st.write_stream(chat_with_llm(user_id, prompt))

    st.session_state.chat_history.append({'role': 'assistant', 'content': response})

Code Walkthrough

1. Load Environment Variables

PYTHON
load_dotenv('./../.env')

Loads LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, or any other keys from .env. On Windows, adjust the path to '.env' if your .env file is in the same directory as the script.


2. Page Header and User ID Input

PYTHON
st.title("Make Your Own Chatbot")
st.write("Chat with me! Catch me at https://youtube.com/kgptalkie")

user_id = st.text_input("Enter your user id", "default_user")

user_id is the session_id passed to SQLChatMessageHistory. Different users get completely isolated conversation histories stored in separate rows in chat_history.db.


3. History Factory

PYTHON
def get_session_history(session_id):
    return SQLChatMessageHistory(session_id, connection="sqlite:///chat_history.db")

Creates (or opens) a SQLite database named chat_history.db in the script's directory. Returns a history object scoped to the given session_id.


4. Streamlit Session State for Display History

PYTHON
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

Streamlit reruns the entire script on every user action. st.session_state persists values across reruns. chat_history is a Python list of {'role': ..., 'content': ...} dicts used only to redraw the chat bubbles — it is separate from the SQL history.


5. Start New Conversation Button

PYTHON
if st.button("Start New Conversation"):
    st.session_state.chat_history = []
    history = get_session_history(user_id)
    history.clear()

Two things happen on click:

  1. Display history (st.session_state.chat_history) is cleared — the chat bubbles disappear
  2. SQL history (history.clear()) is wiped — the model loses all context for this session_id

Note

If the user changes the user_id text input after a conversation, a new empty session starts automatically — no button click needed. The old session's SQL history remains intact in the database.


6. Redrawing Existing Chat Bubbles

PYTHON
for message in st.session_state.chat_history:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

On every rerun, this loop renders all prior messages from session_state into the chat layout. Without this, the conversation would disappear on every new message.


7. LLM and Chain Setup

PYTHON
llm = ChatOllama(base_url=base_url, model=model)

system = SystemMessagePromptTemplate.from_template("You are helpful assistant.")
human = HumanMessagePromptTemplate.from_template("{input}")

messages = [system, MessagesPlaceholder(variable_name='history'), human]

prompt = ChatPromptTemplate(messages=messages)

chain = prompt | llm | StrOutputParser()
  • MessagesPlaceholder(variable_name='history') reserves the slot where conversation history is injected between the system message and the current user input
  • The chain is prompt | llm | StrOutputParser() — same as all previous LCEL examples

8. Wrapping with Memory

PYTHON
runnable_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key='input',
    history_messages_key='history'
)
  • input_messages_key='input' — the {'input': ...} dict key that holds the user's current message
  • history_messages_key='history' — must match MessagesPlaceholder's variable_name

9. Streaming Generator

PYTHON
def chat_with_llm(session_id, input):
    for output in runnable_with_history.stream(
        {'input': input},
        config={'configurable': {'session_id': session_id}}
    ):
        yield output

.stream() (not .invoke()) returns an iterator that yields string chunks as tokens arrive from the LLM. The function is a generator (yield) so it can be consumed lazily by st.write_stream().

Tip

This is the key difference from the notebook version: .stream() instead of .invoke(). The SQL history is still written after the full response is assembled — streaming only affects what the user sees in real time.


10. Chat Input and Response

PYTHON
prompt = st.chat_input("What is up?")

if prompt:
    st.session_state.chat_history.append({'role': 'user', 'content': prompt})

    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        response = st.write_stream(chat_with_llm(user_id, prompt))

    st.session_state.chat_history.append({'role': 'assistant', 'content': response})

Step by step:

  1. st.chat_input renders the text box at the bottom of the page and returns the submitted text (or None if nothing submitted)
  2. The user message is appended to session_state.chat_history and displayed immediately
  3. st.write_stream() consumes the generator and renders tokens one-by-one into the assistant chat bubble as they arrive
  4. response is the fully assembled string returned by st.write_stream() after streaming completes — it is then saved to session_state for redrawing on the next rerun

Running the App

BASH
# Windows
streamlit run chat_stream.py

# Linux / macOS
streamlit run chat_stream.py

Open http://localhost:8501 in your browser. The app starts with an empty chat. Type a message, press Enter, and watch the assistant stream its reply token by token.

Important

The user_id text input is evaluated before any chat is rendered. If you change user_id mid-conversation, the chat_history in session_state still shows the old messages visually — but the model will use the new session's SQL history. Click "Start New Conversation" after changing user_id to sync them.


Architecture Overview

PYTHON
Browser (Streamlit UI)
    │
    ├── st.text_input(user_id)          ← selects the session
    ├── st.button("Start New")          ← clears SQL + display history
    ├── st.chat_message (loop)          ← redraws prior messages
    ├── st.chat_input                   ← captures new user message
    │
    ▼
chat_with_llm(user_id, prompt)          ← generator using .stream()
    │
    ▼
RunnableWithMessageHistory
    ├── get_session_history(user_id)    ← loads from SQLite
    │       └── SQLChatMessageHistory  ← chat_history.db
    │
    ├── ChatPromptTemplate
    │       ├── SystemMessage
    │       ├── MessagesPlaceholder    ← history injected here
    │       └── HumanMessage {input}
    │
    ├── ChatOllama (qwen3)             ← streams tokens
    └── StrOutputParser                ← yields string chunks
    │
    ▼
st.write_stream()                       ← renders chunks in real-time

What You Built

You now have a complete, production-style chatbot application:

Feature Implementation
Streaming output runnable_with_history.stream() + st.write_stream()
Persistent memory SQLChatMessageHistorychat_history.db
Multi-user sessions session_id from st.text_input
Conversation reset history.clear() + session_state.chat_history = []
History display on rerun Loop over st.session_state.chat_history
Prompt template with history MessagesPlaceholder in ChatPromptTemplate

The same pattern — RunnableWithMessageHistory + SQLChatMessageHistory + a Streamlit streaming UI — scales directly to production chatbots backed by PostgreSQL, Redis, or any other BaseChatMessageHistory implementation LangChain supports.

Find this tutorial useful?

Subscribe to our YouTube channels for more practical production walk-throughs.

Discussion & Comments