LangChain Output Parsing

Parse LLM responses into structured Python objects — Pydantic models, JSON dicts, and CSV lists — using LangChain's output parsers and with_structured_output().

Jun 4, 2026 at 10:30 AM · 9 min read

Topics You Will Master

What output parsers are and the two methods every parser implements
Parsing raw LLM text into a validated Pydantic model with PydanticOutputParser
Using parser.get_format_instructions() to inject schema instructions into any prompt
Appending a parser to a chain: prompt | llm | parser
Using .with_structured_output() as a simpler alternative to PydanticOutputParser
Parsing LLM responses as raw Python dicts with JsonOutputParser
Extracting comma-separated lists with CommaSeparatedListOutputParser

Best For

Python developers building LangChain applications who need structured, machine-readable output (not just raw text) from a local LLM.

Expected Outcome

Three working parser pipelines — Pydantic, JSON, and CSV — that coerce any LLM string response into typed Python objects you can use directly in your code.

Output parsers bridge the gap between an LLM's free-form text response and the structured Python objects your application actually needs. Instead of writing regex or json.loads() manually, you define a schema once and let the parser handle formatting instructions, parsing, and validation.
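
For contrast, here is a rough sketch of the manual approach mentioned above, using only the standard library. The helper name `parse_joke_manually` and the sample `raw` string are hypothetical, made up purely for illustration; they are not part of LangChain.

```python
import json
import re

REQUIRED_KEYS = {"setup", "punchline"}

def parse_joke_manually(raw: str) -> dict:
    """Hand-rolled parsing: locate the JSON object, decode it, check required keys."""
    # Models often wrap JSON in extra prose, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return data

# Hypothetical model reply with some chatter before the JSON.
raw = 'Sure! Here is a joke:\n{"setup": "Why did the cat nap?", "punchline": "It was paw-sed."}'
joke = parse_joke_manually(raw)
print(joke["punchline"])
```

Every new schema would need its own version of this boilerplate, which is the repetition an output parser is meant to remove.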

LangChain output parsers are built around two core methods:

  • get_format_instructions() — returns a string of instructions telling the model exactly how to format its response
  • parse() — takes the model's raw string response and returns a structured Python object

Prerequisites: LangChain installed with langchain-ollama and python-dotenv; Ollama running with the qwen3 model pulled; and familiarity with LangChain chains (see the Chains guide).

Available Output Parsers

| Parser | Output Type | Notes |
| --- | --- | --- |
| StrOutputParser | str | Extracts .content from AIMessage as plain text |
| PydanticOutputParser | Pydantic BaseModel instance | Validates against a schema; raises on invalid JSON |
| JsonOutputParser | dict | Returns a raw Python dict; less strict than Pydantic |
| CommaSeparatedListOutputParser | list[str] | Splits comma-separated model output into a Python list |
| DatetimeOutputParser | (n/a) | Not available in LangChain v1 |

Setup

PYTHON
from dotenv import load_dotenv

load_dotenv('.env')
OUTPUT
True

On Linux/macOS: use load_dotenv('./../.env') if your .env is in a parent directory.

PYTHON
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
    PromptTemplate
)

base_url = "http://localhost:11434"
model = 'qwen3'

llm = ChatOllama(base_url=base_url, model=model)

Pydantic Output Parser

PydanticOutputParser validates the model's JSON response against a Pydantic schema. If the model returns malformed JSON or omits a required field, the parser raises an OutputParserException (wrapping the underlying Pydantic validation error when the schema check fails).

Defining the Schema

PYTHON
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class Joke(BaseModel):
    """Joke to tell user"""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10", default=None)

Creating the Parser

PYTHON
parser = PydanticOutputParser(pydantic_object=Joke)

Inspecting Format Instructions

get_format_instructions() generates the JSON schema description that you inject into the prompt so the model knows what format to produce:

PYTHON
instruction = parser.get_format_instructions()
print(instruction)
OUTPUT
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}

Building the Prompt with Injected Instructions

Use partial_variables to bake the format instructions into the prompt template so they are always included automatically:

PYTHON
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm

Checking Raw LLM Output (Without Parser)

Run the chain without the parser first to see the raw JSON string the model produces:

PYTHON
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output.content)
JSON
{
  "setup": "Why did the cat bring a ladder to the party?",
  "punchline": "They wanted to climb the social ladder!",
  "rating": 7
}

Adding the Parser to the Chain

Append the parser to convert the raw JSON string into a validated Joke instance:

PYTHON
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about the dogs'})
print(output)
OUTPUT
setup='Why did the dog bring a ladder to the park?' punchline='He heard the treats were on a high shelf!' rating=8

output is now a Joke Pydantic object. Access fields directly:

PYTHON
output.setup      # "Why did the dog bring a ladder to the park?"
output.punchline  # "He heard the treats were on a high shelf!"
output.rating     # 8

Structured Output with .with_structured_output()

.with_structured_output() is a simpler alternative that skips the manual prompt injection step. You attach the schema directly to the LLM, and the integration passes it to the model for you (typically through the provider's tool-calling or JSON-schema support) and parses the reply.

Baseline (Unstructured)

PYTHON
output = llm.invoke('Tell me a joke about the cat')
print(output.content)
OUTPUT
Why did the cat join a band?
Because it wanted to be in the **meow-sic** industry! 🎵🐾

*(Bonus: The band's name was "Whisker & The Purr-fectionists.")* 😸

With Structured Output

PYTHON
structured_llm = llm.with_structured_output(Joke)
PYTHON
output = structured_llm.invoke('Tell me a joke about the cat')
print(output)
OUTPUT
setup='Why did the cat sit on the computer?' punchline='To keep the mouse away!' rating=None

Note

rating is None here because the field is Optional and the model didn't include it. PydanticOutputParser with explicit format instructions tends to produce more complete output including optional fields. .with_structured_output() is faster to set up but gives the model less explicit guidance.
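
If your application always needs a rating, one possible adjustment is to make the field required in the schema itself. The `RatedJoke` class and the `is_rejected` helper below are hypothetical variants written for this sketch; whether a given model honours the range constraint is not guaranteed, but Pydantic will at least reject replies that violate it.

```python
from pydantic import BaseModel, Field, ValidationError

class RatedJoke(BaseModel):
    """Joke to tell user, with a mandatory rating"""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    # No default, so the field is required; ge/le bound it to 1..10.
    rating: int = Field(description="The rating of the joke, from 1 to 10", ge=1, le=10)

def is_rejected(**fields) -> bool:
    """Return True if Pydantic refuses to build a RatedJoke from the fields."""
    try:
        RatedJoke(**fields)
    except ValidationError:
        return True
    return False

print(is_rejected(setup="s", punchline="p"))             # rating missing
print(is_rejected(setup="s", punchline="p", rating=11))  # rating out of range
print(is_rejected(setup="s", punchline="p", rating=7))   # valid
```

Such a class could be passed to `llm.with_structured_output(RatedJoke)` in place of `Joke`.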

Tip

.with_structured_output() also accepts a TypedDict class or a JSON Schema dict as its argument — not just Pydantic models.
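
As a sketch of those two alternatives, the schemas below mirror the Joke model. `JokeDict`, `joke_schema`, and `build_structured_llms` are names invented for this example. With either form the structured model is expected to return a plain dict rather than a Pydantic instance.

```python
from typing import Annotated, Optional

from typing_extensions import TypedDict

class JokeDict(TypedDict):
    """Joke to tell user"""

    # Annotated[type, default, description]; `...` marks a required field.
    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "The rating of the joke is from 1 to 10"]

joke_schema = {
    "title": "Joke",
    "description": "Joke to tell user",
    "type": "object",
    "properties": {
        "setup": {"type": "string", "description": "The setup of the joke"},
        "punchline": {"type": "string", "description": "The punchline of the joke"},
        "rating": {"type": "integer", "description": "The rating of the joke is from 1 to 10"},
    },
    "required": ["setup", "punchline"],
}

def build_structured_llms(llm):
    """Bind each schema to a chat model such as the ChatOllama `llm` from Setup."""
    return llm.with_structured_output(JokeDict), llm.with_structured_output(joke_schema)
```

Calling `build_structured_llms(llm)` and invoking either result needs a running model, so that step is left out here.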


JSON Output Parser

JsonOutputParser returns a raw Python dict without Pydantic validation. Use it when you want structured output but don't need field-level type enforcement.

PYTHON
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser(pydantic_object=Joke)
print(parser.get_format_instructions())
OUTPUT
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
{"description": "Joke to tell user", "properties": {"setup": {...}, "punchline": {...}, "rating": {...}}, "required": ["setup", "punchline"]}

Building and Running the JSON Chain

PYTHON
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output.content)
JSON
{
  "setup": "Why did the cat sit on the computer?",
  "punchline": "Because it heard there was a mouse in the keyboard!",
  "rating": 7
}

Append the JsonOutputParser to get a Python dict directly:

PYTHON
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about the cat'})
print(output)
JSON
{
  "setup": "Why did the cat sit on the keyboard?",
  "punchline": "Because it wanted to keep the mouse in the house.",
  "rating": 8
}

Access values like any Python dict:

PYTHON
output['setup']     # 'Why did the cat sit on the keyboard?'
output['rating']    # 8

Note

JsonOutputParser also supports streaming — it incrementally yields partial dicts as tokens arrive, unlike PydanticOutputParser which must wait for the complete response to validate.


CSV Output Parser

CommaSeparatedListOutputParser is the simplest parser for list-type outputs. The model is instructed to return a comma-separated string which the parser splits into a Python list[str].

PYTHON
from langchain_core.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()

print(parser.get_format_instructions())
OUTPUT
Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`

Building the CSV Chain

PYTHON
format_instruction = parser.get_format_instructions()

prompt = PromptTemplate(
    template='''
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)
PYTHON
chain = prompt | llm | parser

output = chain.invoke({'query': 'generate my website seo keywords. I have content about the NLP and LLM.'})
print(output)

The output is a Python list of strings, for example:

JSON
[
  "NLP",
  "LLM",
  "natural language processing",
  "large language models",
  "AI",
  "machine learning",
  "text processing",
  "deep learning",
  "transformers",
  "chatbots"
]

On Linux/macOS: all code above runs identically — no OS-specific differences.


Quick Reference

Parser Comparison

| Parser | Import | Output type | Best for |
| --- | --- | --- | --- |
| StrOutputParser | langchain_core.output_parsers | str | Plain text extraction |
| PydanticOutputParser | langchain_core.output_parsers | Pydantic model | Validated typed structs |
| JsonOutputParser | langchain_core.output_parsers | dict | Structured output with streaming |
| CommaSeparatedListOutputParser | langchain_core.output_parsers | list[str] | Simple enumeration |

Two Ways to Get Structured Output

PYTHON
# Method 1 — PydanticOutputParser (explicit, more control)
parser = PydanticOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="...\n{format_instruction}\n\nQuery: {query}\nAnswer:",
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)
chain = prompt | llm | parser
output = chain.invoke({'query': 'Tell me a joke about cats'})
# output is a Joke instance

# Method 2 — with_structured_output (simpler, less control)
structured_llm = llm.with_structured_output(Joke)
output = structured_llm.invoke('Tell me a joke about cats')
# output is a Joke instance

Key Imports

PYTHON
from langchain_core.output_parsers import (
    StrOutputParser,
    PydanticOutputParser,
    JsonOutputParser,
    CommaSeparatedListOutputParser,
)
from pydantic import BaseModel, Field
from typing import Optional

What You Built

In this lesson you went from receiving a raw AIMessage to getting fully typed, validated Python objects directly out of an LCEL chain.

Here is what each parser gives you:

  • PydanticOutputParser — the most explicit approach: injects a JSON schema into the prompt via get_format_instructions(), then validates and deserializes the response into a Pydantic model. Best when field-level validation matters.
  • .with_structured_output() — the fastest path: bind the schema to the LLM directly and skip manual prompt injection. Best for quick prototyping or when the model reliably follows implicit instructions.
  • JsonOutputParser — returns a raw Python dict without Pydantic validation. Supports streaming, making it the right choice when you want partial JSON results as tokens arrive.
  • CommaSeparatedListOutputParser — the simplest option: instructs the model to return a comma-separated string and splits it into a list[str]. Ideal for keyword generation, tag lists, or any enumeration.

The partial_variables pattern — baking get_format_instructions() into the PromptTemplate once — means your chain always sends the correct schema to the model without you needing to pass it manually on each .invoke() call.
