Deploy Resume Parsing with Streamlit

Wrap the two-stage LLM resume parser in a Streamlit web app — upload PDFs and view extracted JSON data in real time.

Jun 4, 2026 · Updated Jun 14, 2026 · 6 min read

Topics You Will Master

Understanding the difference between script-based PDF processing and handling uploaded files in memory
Reading byte arrays from a Streamlit file_uploader directly into PyMuPDF without saving to disk
Integrating the modular two-stage LLM pipeline (ask_llm and validate_json) into a web app
Enhancing UX with Streamlit's st.spinner() to provide feedback during long LLM calls

In the previous lesson, we built a highly reliable two-stage LLM pipeline that extracts structured JSON data from PDF resumes. However, running a Jupyter Notebook is not a viable production deployment.

In this lesson, we wrap that exact pipeline in a Streamlit web application. Because we separated our LLM logic into scripts/llm.py, deploying the app requires writing fewer than 40 lines of UI code.

Flow diagram of uploaded PDF bytes passing directly from uploader to memory, PyMuPDF, and LLM pipeline

Prerequisites: Ensure you have completed the Resume Parsing lesson. You will need streamlit, pymupdf, and the scripts/llm.py module from the previous section.

BASH
pip install streamlit pymupdf

The Application Code (app.py)

Create an app.py file in the same directory as your scripts/ folder.

Architecture diagram showing separation of frontend UI in app.py from backend LLM logic in scripts/llm.py

1. Imports and Basic Setup

PYTHON
import streamlit as st
import pymupdf
from scripts.llm import ask_llm, validate_json

st.title("Resume Parsing")
st.write("Upload a resume in PDF format to extract information")

We import streamlit for the UI, pymupdf to handle the PDF text extraction, and our two custom LangChain functions (ask_llm and validate_json) from the scripts.llm module.

2. Handling File Uploads in Memory

In a notebook, we loaded PDFs from a hardcoded file path on disk. In a web app, users upload files directly from their browser. We process the uploaded file in memory as a byte stream rather than saving it to the server's hard drive, which avoids an extra disk write and leaves no resume files behind on the server.

Side-by-side comparison of loading files from disk versus processing file uploads in memory via byte streams

PYTHON
# Restrict the uploader to PDF files
uploaded_file = st.file_uploader("Choose a PDF file", type="pdf")

if uploaded_file is not None:
    # Read the uploaded file into memory as bytes.
    # getvalue() returns the full contents regardless of the stream position.
    pdf_bytes = uploaded_file.getvalue()

    # Open the byte stream with PyMuPDF
    pdf = pymupdf.open(stream=pdf_bytes, filetype="pdf")

    context = ""
    # Extract text from every page
    for page in pdf:
        context = context + "\n\n" + page.get_text()

    pdf.close()
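
The in-memory steps above can also be wrapped in a small helper so the UI code stays short. The following is a minimal sketch, not the lesson's original code: the names looks_like_pdf and extract_text_from_bytes are hypothetical, and the header check assumes that a well-formed PDF starts with the bytes %PDF-.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: well-formed PDF files begin with the marker %PDF-."""
    return data.startswith(b"%PDF-")


def extract_text_from_bytes(pdf_bytes: bytes) -> str:
    """Return the text of every page, separated by blank lines."""
    # Imported inside the function so looks_like_pdf() works without PyMuPDF.
    import pymupdf

    if not looks_like_pdf(pdf_bytes):
        raise ValueError("The uploaded file does not look like a PDF.")

    # Document objects are context managers, so the PDF is closed automatically.
    with pymupdf.open(stream=pdf_bytes, filetype="pdf") as pdf:
        return "\n\n".join(page.get_text() for page in pdf)
```

In app.py, the helper would be called with the uploaded bytes inside the `if uploaded_file is not None:` branch, and a ValueError could be reported to the user with st.error().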

3. Executing the Pipeline with UX Feedback

LLM calls take time — especially local ones running on Ollama. If the app freezes while processing, users will assume it broke and refresh the page.

We wrap our pipeline calls in st.spinner() blocks to provide visual feedback while the user waits.

Flow diagram of UI feedback and status spinners during the execution of long LLM extraction calls

PYTHON
question = """You are tasked with parsing a job resume. Your goal is to extract relevant information in a valid structured 'JSON' format. 
                Do not write preambles or explanations."""

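if st.button("Parse Resume"):
    # `context` only exists after a PDF has been uploaded
    if uploaded_file is None:
        st.warning("Please upload a PDF resume first.")
        st.stop()

    # Run the first LLM pass (Semantic Extraction)
    with st.spinner("Parsing Resume..."):
        response = ask_llm(context=context, question=question)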

    # Run the second LLM pass (JSON Validation)
    with st.spinner("Validating JSON..."):
        response = validate_json(response)
    
    # Display the final output
    st.write("**Extracted Information**")
    st.write(response)

    st.write("You can copy the JSON output and use it in your application.")

    # Show a celebration animation on success!
    st.balloons()
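
If either LLM call raises an exception (for example, because Ollama is not running), Streamlit displays the raw exception to the user. One way to handle this is sketched below. It is an illustration under stated assumptions: run_pipeline is a hypothetical helper, and the two stages are passed in as parameters so that ask_llm and validate_json from scripts/llm.py can be plugged in unchanged.

```python
from typing import Any, Callable, Optional, Tuple


def run_pipeline(
    context: str,
    question: str,
    extract: Callable[..., Any],
    validate: Callable[[Any], Any],
) -> Tuple[Optional[Any], Optional[str]]:
    """Run both stages and return (result, error_message).

    error_message is None when both stages succeed.
    """
    if not context.strip():
        return None, "No text could be extracted from the PDF."
    try:
        # Stage 1: semantic extraction
        raw = extract(context=context, question=question)
        # Stage 2: JSON validation
        return validate(raw), None
    except Exception as exc:  # broad on purpose: report any failure in the UI
        return None, f"Parsing failed: {exc}"
```

Inside the button handler, this could be used as `result, error = run_pipeline(context, question, ask_llm, validate_json)`, followed by st.error(error) when error is set and st.write(result) otherwise. The empty-text check also catches scanned PDFs that contain images but no extractable text.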

Running the Application

To start the server, run the following command in your terminal from the directory containing app.py:

BASH
streamlit run app.py

Streamlit will launch a local web server (typically at http://localhost:8501) and automatically open it in your default browser.

The User Flow

  1. The user clicks "Browse files" and selects a PDF resume.
  2. The user clicks the "Parse Resume" button.
  3. The UI shows a spinning "Parsing Resume..." indicator while the StrOutputParser chain extracts the text.
  4. The UI changes to "Validating JSON..." while the JsonOutputParser chain strictly formats the output.
  5. Balloons animate on the screen, and the structured JSON dictionary is rendered cleanly on the page.

Deployment Considerations

While this app runs perfectly on your local machine using Ollama, deploying it to a public cloud (like AWS, Render, or Streamlit Community Cloud) requires a few adjustments:

  1. Local vs. Cloud LLMs: Ollama runs locally on your machine. If you deploy this Streamlit app to the cloud, you must either deploy Ollama to a cloud server (which requires expensive GPU instances) or change the model in scripts/llm.py to a managed API like OpenAI (ChatOpenAI), Anthropic (ChatAnthropic), or AWS Bedrock.
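  2. State Management: Streamlit reruns app.py from top to bottom on every interaction, and each browser session gets its own st.session_state, so data does not leak between users by default. Store the parsed result in st.session_state if it should survive reruns. Use @st.cache_data only for results that are safe to share, because its cache is shared across all sessions.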
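  3. File Size Limits: Streamlit caps uploads at 200 MB by default (the server.maxUploadSize setting), which is more than enough for resumes. If the app sits behind a reverse proxy such as Nginx, make sure the proxy also accepts request bodies large enough for standard PDF resumes (e.g., 5-10 MB); Nginx's default limit is 1 MB.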
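
Points 2 and 3 can be illustrated with two small helpers. This is a sketch under assumptions, not part of the lesson's code: within_size_limit and get_or_parse are hypothetical names, the 10 MB default is an arbitrary example, and state stands for any dict-like object such as st.session_state.

```python
import hashlib
from typing import Any, Callable, MutableMapping


def within_size_limit(data: bytes, max_mb: float = 10) -> bool:
    """Return True if the upload is no larger than max_mb megabytes."""
    return len(data) <= max_mb * 1024 * 1024


def get_or_parse(
    state: MutableMapping[str, Any],
    pdf_bytes: bytes,
    parse: Callable[[bytes], Any],
) -> Any:
    """Parse each distinct PDF once and reuse the stored result afterwards."""
    # Key the result by a hash of the file contents, not by the file name.
    key = "parsed_" + hashlib.sha256(pdf_bytes).hexdigest()
    if key not in state:
        state[key] = parse(pdf_bytes)
    return state[key]
```

Passing st.session_state as state keeps the stored result private to one browser session, so a rerun triggered by another widget does not call the LLM again for the same file.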

What You Built

In this final lesson, you completed the full journey from raw data to a production-ready application:

  • You designed a modular LLM architecture, separating backend LangChain logic (llm.py) from frontend UI logic (app.py).
  • You processed in-memory file uploads using PyMuPDF instead of reading from disk.
  • You added visual progress feedback with st.spinner() to keep users informed during long LLM calls.
  • You deployed a working Streamlit app that parses unstructured resumes into structured, machine-readable JSON in real time.
