What Makes This Different
Most Text-to-SQL tutorials show you how to generate SQL from natural language. This one goes further:- Knowledge-Based Query Generation: The agent searches a knowledge base before writing SQL, ensuring consistent patterns
- Data Quality Handling: Instead of cleaning messy data, the agent learns to handle inconsistencies (mixed types, date formats, naming conventions)
- Self-Learning Loop: Users can save validated queries, which the agent retrieves for similar future questions
What You’ll Learn
| Concept | Description |
|---|---|
| Semantic Model | Define table metadata to guide query generation |
| Knowledge Base | Store and retrieve query patterns and data quality notes |
| Data Quality Handling | Handle type mismatches and inconsistencies without ETL |
| Self-Learning | Save validated queries to improve future responses |
| Agentic Memory | Remember user preferences across sessions |
Prerequisites
- Python 3.12+
- Docker (for PostgreSQL with pgvector)
- OpenAI API key
Setup
1
Clone the repository
2
Create and activate virtual environment
3
Install dependencies
4
Set environment variables
5
Start PostgreSQL with pgvector
6
Check setup
7
Load F1 data and knowledge base
Run the Agent
Basic Queries
Run simple aggregation and filtering queries:- Simple aggregation queries (counts, sums)
- Filtering by year or driver
- Using the semantic model to find relevant tables
Self-Learning Loop
See how the agent saves validated queries to improve:- Query execution and validation
- Saving queries to the knowledge base
- How saved queries improve future responses
Edge Cases
Test complex queries and error handling:Evaluate Accuracy
Run automated accuracy testing:Agent Configuration
| Parameter | Purpose |
|---|---|
model | GPT-5.2 for query generation and reasoning |
db | PostgreSQL connection for query execution |
knowledge | Vector database storing query patterns and table metadata |
SQLTools | Execute SQL queries against the database |
ReasoningTools | Think step-by-step before query construction |
save_validated_query | Custom tool to persist successful queries |
enable_agentic_memory | Remember user preferences across sessions |
search_knowledge | Search knowledge base before writing SQL |
add_history_to_context | Include recent conversation for context |
How It Works
Query Workflow
Knowledge Base
The knowledge base stores three types of information:| Type | Purpose |
|---|---|
| Table metadata | Column names, types, and descriptions |
| Query patterns | Reusable SQL patterns for common operations |
| Validated queries | User-approved queries saved for retrieval |
Semantic Model
The semantic model provides high-level context about available tables:Self-Learning Loop
- Agent executes a query and validates results
- Agent asks: “Would you like to save this query?”
- If confirmed, stores the question, SQL, and explanation
- Future similar questions retrieve this pattern automatically
Troubleshooting
Database connection refused
Database connection refused
Ensure PostgreSQL is running:If not running:
Query returns wrong results for position
Query returns wrong results for position
The
position column in drivers_championship is TEXT, not INT. Use string comparison:Knowledge base not found
Knowledge base not found
Ensure you’ve loaded the knowledge base: