Text to SQL Agent with LangChain

Text-to-SQL lets users query databases in plain English. Instead of writing SELECT AVG(salary) FROM employees JOIN dept_emp ON ..., a user types "What is the average salary in the Sales department?" and the agent figures out the rest. In this lesson, we will build that agent with a safety-first approach: every query is validated before execution, destructive operations are blocked, and failed queries are corrected and retried automatically.

The dataset is the employees database, a SQLite port of the classic MySQL employees test database with 300,024 employee records across 6 tables.

Prerequisites: langchain, langchain-ollama, langchain-community, langchain-core, python-dotenv installed. Ollama running with qwen3. The employees_db-full-1.0.6.db SQLite file in a db/ subdirectory.

Step 1: Database Setup

PYTHON

import re
from langchain_ollama import ChatOllama
from langchain_community.utilities import SQLDatabase
from langchain_core.tools import tool
from langchain_core.messages import SystemMessage
from langchain.agents import create_agent
from dotenv import load_dotenv

load_dotenv()

OUTPUT

True

First, we connect to the SQLite employees database:

PYTHON

print("Setting up employees database...")

db = SQLDatabase.from_uri("sqlite:///db/employees_db-full-1.0.6.db")

try:
    tables = db.get_usable_table_names()
    print(f"✔ Database connected successfully")
    print(f"✔ Found {len(tables)} tables: {', '.join(tables)}")
except Exception as e:
    print(f"✘ Database connection failed: {e}")

SCHEMA = db.get_table_info()
print("✔ Connected to employees database")

OUTPUT

Setting up employees database...
✔ Database connected successfully
✔ Found 6 tables: departments, dept_emp, dept_manager, employees, salaries, titles
✔ Connected to employees database

tables

['departments', 'dept_emp', 'dept_manager', 'employees', 'salaries', 'titles']

Let's verify with raw SQL:

PYTHON

db.run("SELECT name FROM sqlite_master WHERE type='table';")

OUTPUT

"[('departments',), ('dept_emp',), ('dept_manager',), ('employees',), ('salaries',), ('titles',)]"

PYTHON

db.run("select count(*) from employees")

OUTPUT

'[(300024,)]'

Here, we can see 300,024 employee records. SCHEMA (from db.get_table_info()) contains the CREATE TABLE statements with sample rows. We pass it to the LLM as context when generating queries.

Step 2: Models Setup

PYTHON

llm = ChatOllama(
    model="qwen3",
    base_url="http://localhost:11434",
    temperature=0
)

response = llm.invoke("Hello, how are you?")
response.pretty_print()
print("✔ Initialized Ollama chat model")

OUTPUT

================================== Ai Message ==================================

Hello! I'm just a chatbot, so I don't have feelings, but I'm here and ready to help! 😊 How can I assist you today?
✔ Initialized Ollama chat model

temperature=0 is essential for SQL generation. The same question must produce the same query every time.

Step 3: SQL Tools

Tool 1: `get_database_schema`

This is the first tool the agent calls for any new question. It fetches the schema for one table, or for all tables at once:

PYTHON

@tool
def get_database_schema(table_name: str = None) -> str:
    """Get database schema information for SQL query generation.
    Use this first to understand table structure before creating queries."""
    print(f"🔍 Getting schema for: {table_name if table_name else 'all tables'}")

    if table_name:
        try:
            tables = db.get_usable_table_names()
            if table_name.lower() in [t.lower() for t in tables]:
                result = db.get_table_info([table_name])
                print(f"✔ Retrieved schema for table: {table_name}")
                return result
            else:
                return f"Error: Table '{table_name}' not found. Available tables: {', '.join(tables)}"
        except Exception as e:
            return f"Error getting table info: {e}"
    else:
        print("✔ Retrieved full database schema")
        return SCHEMA

Let's test it:

PYTHON

result = get_database_schema.invoke({'table_name': 'departments'})
print(result)

PYTHON

🔍 Getting schema for: departments
✔ Retrieved schema for table: departments

CREATE TABLE departments (
	dept_no CHAR(4) NOT NULL,
	dept_name VARCHAR(40) NOT NULL,
	PRIMARY KEY (dept_no),
	UNIQUE (dept_name)
)

/*
3 rows from departments table:
dept_no	dept_name
d009	Customer Service
d005	Development
d002	Finance
*/

Tool 2: `generate_sql_query`

Diagram showing how passing the database schema as context lets the LLM generate accurate SQL without hallucinating table or column names

Schema context lets the LLM generate accurate SQL without hallucinating table or column names.

This tool generates a SELECT query from a plain English question, using the database schema as context:

PYTHON

@tool
def generate_sql_query(question: str, schema_info: str = None) -> str:
    """Generate a SQL SELECT query from a natural language question using database schema.
    Always use this after getting schema information."""
    print(f"🔺 Generating SQL for: {question[:100]}...")

    schema_to_use = schema_info if schema_info else SCHEMA

    prompt = f"""
                Based on this database schema:
                {schema_to_use}

                Generate a SQL query to answer this question: {question}

                Rules:
                - Use only SELECT statements
                - Include only existing columns and tables
                - Add appropriate WHERE, GROUP BY, ORDER BY clauses as needed
                - Limit results to 10 rows unless specified otherwise
                - Use proper SQL syntax for SQLite

                Return only the SQL query, nothing else.
                """

    try:
        response = llm.invoke(prompt)
        query = response.content.strip()
        print(f"✔ Generated SQL query")
        return query
    except Exception as e:
        return f"Error generating query: {e}"

Let's test it:

PYTHON

result = generate_sql_query.invoke({"question": "what is maximum salary in employees"})
print(result)

OUTPUT

🔺 Generating SQL for: what is maximum salary in employees...
✔ Generated SQL query
SELECT MAX(salary) FROM salaries;

Let's verify directly:

PYTHON

db.run("SELECT MAX(salary) FROM salaries;")

OUTPUT

'[(158220,)]'

Here, we can see the maximum salary in the database is 158,220.

Tool 3: `validate_sql_query`

This tool checks the generated SQL for safety before execution. It blocks all destructive operations:

Diagram showing the validation layer blocking destructive SQL commands so only SELECT queries reach the database

Destructive SQL commands are blocked; only SELECT queries are allowed to reach the database.

PYTHON

@tool
def validate_sql_query(query: str) -> str:
    """Validate SQL query for safety and syntax before execution.
    Returns 'Valid: <query>' if safe or 'Error: <message>' if unsafe."""
    print(f"🔍 Validating SQL: {query[:100]}...")

    clean_query = query.strip()

    # Remove markdown SQL code block markers (```sql ... ```)
    clean_query = re.sub(r'```sql\s*', '', clean_query, flags=re.IGNORECASE)
    clean_query = re.sub(r'```\s*', '', clean_query)
    clean_query = clean_query.strip().rstrip(";")

    # Check 1: Must be a SELECT statement
    if not clean_query.lower().startswith("select"):
        return "Error: Only SELECT statements are allowed"

    # Check 2: Block dangerous SQL keywords
    dangerous_keywords = ['INSERT', 'UPDATE', 'DELETE', 'ALTER', 'DROP', 'CREATE', 'TRUNCATE']
    query_upper = clean_query.upper()

    for keyword in dangerous_keywords:
        if keyword in query_upper:
            return f"Error: {keyword} operations are not allowed"

    print("✔ Query validation passed")
    return f"Valid: {clean_query}"

Let's test both a safe and an unsafe query:

PYTHON

result = validate_sql_query.invoke({"query": "SELECT MAX(salary) FROM salaries;"})
print(result)

result = validate_sql_query.invoke({"query": "Update MAX(salary) FROM salaries;"})
print(result)

OUTPUT

🔍 Validating SQL: SELECT MAX(salary) FROM salaries;...
✔ Query validation passed
Valid: SELECT MAX(salary) FROM salaries

🔍 Validating SQL: Update MAX(salary) FROM salaries;...
Error: Only SELECT statements are allowed

Tool 4: `execute_sql_query`

This tool runs a validated query against the database. It strips the "Valid: " prefix from the validator output and handles errors:

PYTHON

@tool
def execute_sql_query(query: str) -> str:
    """Execute a validated SQL query and return results.
    Only use this after validating the query for safety."""
    print(f"🚀 Executing SQL: {query[:100]}...")

    try:
        clean_query = query.strip()
        if clean_query.startswith("Valid: "):
            clean_query = clean_query[7:]  # Remove "Valid: " prefix

        clean_query = re.sub(r'```sql\s*', '', clean_query, flags=re.IGNORECASE)
        clean_query = re.sub(r'```\s*', '', clean_query)
        clean_query = clean_query.strip().rstrip(";")

        result = db.run(clean_query)
        print("✔ Query executed successfully")

        if result:
            return f"Query Results:\n{result}"
        else:
            return "Query executed successfully but returned no results."

    except Exception as e:
        error_msg = f"Execution Error: {str(e)}"
        print(f"✘ {error_msg}")
        return error_msg

Let's test it:

PYTHON

execute_sql_query.invoke({"query": "SELECT MAX(salary) FROM salaries;"})

OUTPUT

🚀 Executing SQL: SELECT MAX(salary) FROM salaries;...
✔ Query executed successfully
'Query Results:\n[(158220,)]'

Tool 5: `fix_sql_error`

When a query fails validation or execution, the agent calls this tool to generate a corrected version:

PYTHON

@tool
def fix_sql_error(original_query: str, error_message: str, question: str) -> str:
    """Fix a failed SQL query by analyzing the error and generating a corrected version.
    Use this when validation or execution fails."""
    print(f"🔺 Fixing SQL error: {error_message[:100]}...")

    fix_prompt = f"""
                    The following SQL query failed:
                    Query: {original_query}
                    Error: {error_message}
                    Original Question: {question}

                    Database Schema:
                    {SCHEMA}

                    Analyze the error and provide a corrected SQL query that:
                    1. Fixes the specific error mentioned
                    2. Still answers the original question
                    3. Uses only valid table and column names from the schema
                    4. Follows SQLite syntax rules

                    Return only the corrected SQL query, nothing else.
                    """

    try:
        response = llm.invoke(fix_prompt)
        fixed_query = response.content.strip()
        print("✔ Generated fixed SQL query")
        return fixed_query
    except Exception as e:
        return f"Error generating fix: {e}"

Let's test it on a deliberately wrong query:

PYTHON

fix_sql_error.invoke({
    "original_query": "Update MAX(salary) FROM salaries;",
    "error_message": "Error: Only SELECT statements are allowed",
    "question": "what is maximum salary in employees"
})

OUTPUT

🔺 Fixing SQL error: Error: Only SELECT statements are allowed...
✔ Generated fixed SQL query
'SELECT MAX(salary) FROM salaries;'

Here, we can see the fix tool turned the destructive UPDATE into a proper SELECT.

Step 4: System Prompt

The system prompt spells out the agent's workflow. Each of the five tools maps to a numbered step:

PYTHON

SQL_SYSTEM_PROMPT = f"""You are an expert SQL analyst working with an employees database.

Database Schema:
{SCHEMA}

Your workflow for answering questions:
1. Use `get_database_schema` first to understand available tables and columns (if needed)
2. Use `generate_sql_query` to create SQL based on the question
3. Use `validate_sql_query` to check the query for safety and syntax
4. Use `execute_sql_query` to run the validated query
5. If there's an error, use `fix_sql_error` to correct it and try again (up to 3 times)
6. Provide a clear answer based on the query results

Rules:
- Always follow the workflow step by step
- If a query fails, use the fix tool and try again
- Provide clear, informative answers
- Be precise with table and column names
- Handle errors gracefully and try to fix them
- If you fail after 3 attempts, explain what went wrong

Available tools for each step:
- get_database_schema: Get table structure info
- generate_sql_query: Create SQL from question
- validate_sql_query: Check query safety/syntax
- execute_sql_query: Run the query
- fix_sql_error: Fix failed queries

Remember: Always validate queries before executing them for safety.
"""

Step 5: Create the SQL Agent

PYTHON

tools = [
    get_database_schema,
    generate_sql_query,
    validate_sql_query,
    execute_sql_query,
    fix_sql_error
]

sql_agent = create_agent(
    llm,
    tools,
    system_prompt=SQL_SYSTEM_PROMPT
)

print("✔ Created SQL agent with create_agent")

OUTPUT

✔ Created SQL agent with create_agent

PYTHON

sql_agent

OUTPUT

<langgraph.graph.state.CompiledStateGraph object at 0x000001D28BDDBE90>

Step 6: Query Functions

Streaming `ask_sql()` Helper

PYTHON

def ask_sql(question: str):
    """Ask the SQL agent a question using the full workflow."""
    print(f"\n{'='*60}")
    print(f"SQL AGENT - Question: {question}")
    print('='*60)

    for event in sql_agent.stream({"messages": question}, stream_mode="values"):
        msg = event["messages"][-1]

        if hasattr(msg, 'tool_calls') and msg.tool_calls:
            for tc in msg.tool_calls:
                print(f"\n🔺 Using Tool: {tc['name']}")
                print(f"Args: {str(tc['args'])[:200]}")

        elif hasattr(msg, 'content') and msg.content:
            print(f"\n💼 Answer:\n{msg.content}")

Can the Agent Fix Its Own Mistakes?

Diagram showing failed queries feeding back into fix_sql_error, where corrected SQL is generated and retried automatically

Failed queries feed back into fix_sql_error; the corrected SQL is retried automatically.

Let's run a complex multi-table join query:

PYTHON

ask_sql("What is the average salary of employees in the Sales department?")

PYTHON

============================================================
SQL AGENT - Question: What is the average salary of employees in the Sales department?
============================================================

💼 Answer:
What is the average salary of employees in the Sales department?

🔺 Using Tool: get_database_schema
Args: {'table_name': 'departments'}
🔍 Getting schema for: departments
✔ Retrieved schema for table: departments

💼 Answer:
CREATE TABLE departments (
	dept_no CHAR(4) NOT NULL,
	dept_name VARCHAR(40) NOT NULL,
	...

🔺 Using Tool: generate_sql_query
Args: {'question': 'What is the average salary of employees in the Sales department?', 'schema_info': '...'}
🔺 Generating SQL for: What is the average salary of employees in the Sales department?...
✔ Generated SQL query

💼 Answer:
SELECT AVG(salary) AS average_salary
FROM employees
JOIN departments ON employees.dept_no = departments.dept_no
WHERE departments.dept_name = 'Sales';

🔺 Using Tool: validate_sql_query
Args: {'query': 'SELECT AVG(salary) AS average_salary\nFROM employees\nJOIN departments ON...'}
🔍 Validating SQL: SELECT AVG(salary) AS average_salary...
✔ Query validation passed

💼 Answer:
Valid: SELECT AVG(salary) AS average_salary
FROM employees
JOIN departments ON employees.dept_no = departments.dept_no
WHERE departments.dept_name = 'Sales'

🔺 Using Tool: execute_sql_query
Args: {'query': 'SELECT AVG(salary) AS average_salary\nFROM employees\nJOIN dept_emp...'}
🚀 Executing SQL: SELECT AVG(salary) AS average_salary
FROM employees
JOIN dept_emp ON employees.emp_no = dept_emp.emp_no...
✘ Execution Error: (sqlite3.OperationalError) near "JOIN": syntax error
[SQL: SELECT AVG(salary)...JOIN salaries ON employees.emp_no = salaries.emp_no]

💼 Answer:
Execution Error: near "JOIN": syntax error...

🔺 Using Tool: fix_sql_error
Args: {'error_message': 'near "JOIN": syntax error', 'original_query': '...'}

Note

This trace shows the self-healing loop in action. The first generated query had a JOIN ordering error. The agent detected the execution failure, called fix_sql_error, generated a corrected query with proper JOIN ordering, and retried, all without any manual intervention.

How Does the Whole Workflow Fit Together?

Diagram of the five-tool SQL agent pipeline: inspect schema, generate SQL, validate, execute, and answer

Five tools form a pipeline: inspect the schema, generate SQL, validate it, execute it, and answer.

PYTHON

User Question: "What is the average salary in Sales?"
     │
     ▼
get_database_schema("departments")    ← Understand table structure
     │
     ▼
generate_sql_query(question, schema)  ← LLM writes the SELECT query
     │
     ▼
validate_sql_query(query)             ← Safety check: must be SELECT only
     │         │
     │  [FAIL] └──── Error: dangerous keyword → STOP
     │
     ▼
execute_sql_query(query)              ← Run against SQLite
     │         │
     │  [FAIL] └──── Execution error → fix_sql_error() → retry (max 3x)
     │
     ▼
LLM synthesizes final answer from query results

What You Built

In this lesson, we built a production-safe Text-to-SQL agent. Let me tabulate the five tools for your better understanding.

Tool	Role	Safety
`get_database_schema`	Inspect table structure before generating queries	Read-only
`generate_sql_query`	LLM generates SELECT query from natural language	SELECT only (instructed)
`validate_sql_query`	Regex validation, blocks 7 destructive keywords	Enforced by code, not LLM
`execute_sql_query`	Runs validated query against SQLite	Only receives pre-validated queries
`fix_sql_error`	LLM rewrites failed queries with error context	SELECT only (instructed)

Key design decisions:

Safety is enforced in code: validate_sql_query uses regex, not LLM judgment
Self-healing: the fix_sql_error + retry loop handles LLM SQL generation mistakes automatically
Schema in system prompt: the agent always has full schema context, reducing hallucinated column names

This is how a Text-to-SQL agent works. The user speaks English, the tools speak SQL, and the validation layer stands between the model and the database.

Text to SQL Agent with LangChain

LangChain & Ollama - Local AI Development

Step 1: Database Setup

Step 2: Models Setup

Step 3: SQL Tools

Tool 1: `get_database_schema`

Tool 2: `generate_sql_query`

Tool 3: `validate_sql_query`

Tool 4: `execute_sql_query`

Tool 5: `fix_sql_error`

Step 4: System Prompt

Step 5: Create the SQL Agent

Step 6: Query Functions

Streaming `ask_sql()` Helper

Can the Agent Fix Its Own Mistakes?

How Does the Whole Workflow Fit Together?

What You Built

Found this useful? Keep building with me.

Latest recommendations you might like

Agentic RAG with LangChain, FAISS, and Ollama

LangChain Agents with create_agent

LangChain Expression Language & Chains

LangChain Chat Message Memory

Find this tutorial useful?

Discussion & Comments

LangChain & Ollama - Local AI Development

Step 1: Database Setup

Step 2: Models Setup

Step 3: SQL Tools

Tool 1: get_database_schema

Tool 2: generate_sql_query

Tool 3: validate_sql_query

Tool 4: execute_sql_query

Tool 5: fix_sql_error

Step 4: System Prompt

Step 5: Create the SQL Agent

Step 6: Query Functions

Streaming ask_sql() Helper

Can the Agent Fix Its Own Mistakes?

How Does the Whole Workflow Fit Together?

What You Built

Found this useful? Keep building with me.

Latest recommendations you might like

Agentic RAG with LangChain, FAISS, and Ollama

LangChain Agents with create_agent

LangChain Expression Language & Chains

LangChain Chat Message Memory

Find this tutorial useful?

Discussion & Comments

Tool 1: `get_database_schema`

Tool 2: `generate_sql_query`

Tool 3: `validate_sql_query`

Tool 4: `execute_sql_query`

Tool 5: `fix_sql_error`

Streaming `ask_sql()` Helper