In the previous lesson, we built a highly reliable two-stage LLM pipeline that extracts structured JSON data from PDF resumes. However, running a Jupyter Notebook is not a viable production deployment.
In this lesson, we wrap that exact pipeline in a Streamlit web application. Because we separated our LLM logic into scripts/llm.py, deploying the app requires writing fewer than 40 lines of UI code.
Prerequisites: Ensure you have completed the Resume Parsing lesson. You will need streamlit, pymupdf, and the scripts/llm.py module from the previous section.
pip install streamlit pymupdf
The Application Code (app.py)
Create an app.py file in the same directory as your scripts/ folder.
1. Imports and Basic Setup
import streamlit as st
import pymupdf
from scripts.llm import ask_llm, validate_json
st.title("Resume Parsing")
st.write("Upload a resume in PDF format to extract information")
We import streamlit for the UI, pymupdf to handle the PDF text extraction, and our two custom LangChain functions (ask_llm and validate_json) from the scripts.llm module.
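If you want to work on the UI without a running model, a stand-in module with the same call shapes can help. The sketch below is hypothetical: it assumes ask_llm takes context and question and returns a string, and that validate_json takes that string and returns a dictionary, which matches how app.py calls them. The canned data is invented for illustration, and the real functions in scripts/llm.py do considerably more.

```python
import json


def ask_llm(context: str, question: str) -> str:
    """Stand-in for the first pass: return a JSON string, as an LLM might."""
    # Canned data for illustration only; no model is called here.
    fake = {"name": "Jane Doe", "characters_received": len(context)}
    return json.dumps(fake)


def validate_json(response: str) -> dict:
    """Stand-in for the second pass: parse the string into a dictionary."""
    return json.loads(response)
```

Because the names and arguments match the import line above, swapping the real module back in should not require changes to the UI code.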
2. Handling File Uploads in Memory
In a notebook, we loaded PDFs from a hardcoded file path on disk. In a web app, users upload files directly from their browser. We process the uploaded file in memory as a byte stream rather than saving it to the server's hard drive, which avoids extra disk I/O and leaves no uploaded resumes behind on the server.
uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    # Read the uploaded file into memory as bytes
    # (named pdf_bytes to avoid shadowing the built-in bytearray type)
    pdf_bytes = uploaded_file.read()
    # Open the byte stream with PyMuPDF
    pdf = pymupdf.open(stream=pdf_bytes, filetype="pdf")
    context = ""
    # Extract text from every page
    for page in pdf:
        context = context + "\n\n" + page.get_text()
    pdf.close()
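Because the uploader accepts any file, it can be worth checking the bytes before handing them to PyMuPDF. The following is a minimal sketch, not part of the original app: check_upload, PDF_MAGIC, and the 10 MB ceiling are assumptions chosen for illustration. It relies on the fact that PDF files normally begin with the marker %PDF- and tolerates a little leading junk by searching the first 1024 bytes.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"
MAX_BYTES = 10 * 1024 * 1024  # assumed ceiling of 10 MB; adjust as needed


def check_upload(data: bytes, max_bytes: int = MAX_BYTES) -> Optional[str]:
    """Return an error message, or None when the bytes look like an acceptable PDF."""
    if not data:
        return "The uploaded file is empty."
    if len(data) > max_bytes:
        return f"The file exceeds the {max_bytes}-byte limit."
    # PDFs normally start with %PDF-; allow for a few stray leading bytes.
    if PDF_MAGIC not in data[:1024]:
        return "The file does not look like a PDF."
    return None
```

In app.py, this check could run right after uploaded_file.read(), passing any returned message to st.error() and halting the script with st.stop().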
3. Executing the Pipeline with UX Feedback
LLM calls take time, especially local ones running on Ollama. If the app appears frozen while processing, users will assume it broke and refresh the page.
We wrap our pipeline calls in st.spinner() blocks to provide visual feedback while the user waits.
    # This code continues inside the `if uploaded_file is not None:` block,
    # so `context` is always defined before the button can be clicked.
    question = (
        "You are tasked with parsing a job resume. Your goal is to extract "
        "relevant information in a valid structured 'JSON' format.\n"
        "Do not write preambles or explanations."
    )
    if st.button("Parse Resume"):
        # Run the first LLM pass (Semantic Extraction)
        with st.spinner("Parsing Resume..."):
            response = ask_llm(context=context, question=question)
        # Run the second LLM pass (JSON Validation)
        with st.spinner("Validating JSON..."):
            response = validate_json(response)
        # Display the final output
        st.write("**Extracted Information**")
        st.write(response)
        st.write("You can copy the JSON output and use it in your application.")
        # Show a celebration animation on success!
        st.balloons()
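Since the page invites users to copy the JSON output, it may help to turn the result into a consistently indented string first. The helper below is a small sketch under assumptions: to_pretty_json is a hypothetical name, and it accepts either the dictionary returned by the validation pass or a raw JSON string.

```python
import json
from typing import Any


def to_pretty_json(result: Any) -> str:
    """Return an indented JSON string for a dict/list or a JSON-formatted string."""
    if isinstance(result, str):
        # Raises json.JSONDecodeError (a ValueError) if the string is not valid JSON.
        result = json.loads(result)
    # ensure_ascii=False keeps accented names readable instead of escaping them.
    return json.dumps(result, indent=2, ensure_ascii=False)
```

In the app, the returned string could be shown with st.code(pretty, language="json") so the formatting is preserved exactly as generated.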
Running the Application
To start the server, run the following command in your terminal from the directory containing app.py:
streamlit run app.py
Streamlit will launch a local web server (typically at http://localhost:8501) and automatically open it in your default browser.
The User Flow
- The user clicks "Browse files" and selects a PDF resume.
- The user clicks the "Parse Resume" button.
- The UI shows a spinning "Parsing Resume..." indicator while the StrOutputParser chain extracts the text.
- The UI changes to "Validating JSON..." while the JsonOutputParser chain strictly formats the output.
- Balloons animate on the screen, and the structured JSON dictionary is rendered cleanly on the page.
Deployment Considerations
While this app runs perfectly on your local machine using Ollama, deploying it to a public cloud (like AWS, Render, or Streamlit Community Cloud) requires a few adjustments:
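- Local vs. Cloud LLMs: Ollama runs locally on your machine. If you deploy this Streamlit app to the cloud, you must either deploy Ollama to a cloud server (which typically requires expensive GPU instances) or change the model in scripts/llm.py to a managed API like OpenAI (ChatOpenAI), Anthropic (ChatAnthropic), or AWS Bedrock.
- State Management: Streamlit runs the script separately for each browser session, so one user's upload is not visible to another. However, the script reruns from top to bottom on every interaction, so the parsed result disappears on the next click unless you store it in st.session_state. You can use @st.cache_data to avoid repeating an identical LLM call, keeping in mind that cached values are shared across sessions.
- File Size Limits: Streamlit caps uploads at 200 MB by default (the server.maxUploadSize setting), but a reverse proxy in front of it (like Nginx) may enforce a much smaller limit. Configure it to accept files large enough for standard PDF resumes (e.g., 5-10 MB).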
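To make the State Management point concrete, one option is to key each parsed result by a hash of the uploaded bytes, so that a rerun with the same file reuses the stored result instead of calling the LLM again. This is a sketch under assumptions: parse_once, the "resume:" key prefix, and the parse callable are hypothetical names, and store can be any dict-like object.

```python
import hashlib
from typing import Callable, MutableMapping


def parse_once(
    pdf_bytes: bytes,
    parse: Callable[[bytes], dict],
    store: MutableMapping[str, dict],
) -> dict:
    """Run parse(pdf_bytes) only if this exact file has not been parsed before."""
    # Identical files produce the same SHA-256 digest, so they share one entry.
    key = "resume:" + hashlib.sha256(pdf_bytes).hexdigest()
    if key not in store:
        store[key] = parse(pdf_bytes)
    return store[key]
```

In a Streamlit app, st.session_state supports the same `in` and item-assignment operations, so it could be passed as store to keep results for the lifetime of one user's session.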
What You Built
In this final lesson, you completed the full journey from raw data to a working web application:
- You designed a modular LLM architecture, separating backend LangChain logic (llm.py) from frontend UI logic (app.py).
- You processed in-memory file uploads using PyMuPDF instead of reading from disk.
- You implemented visual progress feedback using st.spinner() to keep users engaged during long LLM inferences.
- You launched a working Streamlit app that parses unstructured resumes into structured, machine-readable JSON.