Core Concepts
AI Agents
How GoPie's intelligent agents understand and execute your data queries
GoPie uses advanced AI agents built with LangGraph to transform natural language queries into precise SQL statements and meaningful visualizations. This document explains the agent architecture, decision-making process, and how they work together to provide intelligent data analysis.
Agent Architecture
LangGraph Workflow
Core Components
1. Query Router
The entry point that analyzes user intent:
- Natural language understanding
- Query classification (simple/complex/multi-dataset)
- Context extraction
- Tool selection
2. Schema Agent
Understands your data structure:
- Searches relevant tables and columns
- Uses vector similarity for semantic matching
- Provides context to SQL generation
- Handles ambiguous references
3. SQL Agent
Generates optimized SQL queries:
- Context-aware SQL generation
- Dialect-specific optimization
- Security constraint enforcement
- Performance consideration
4. Validation Agent
Ensures query correctness:
- Syntax validation
- Permission checking
- Resource limit verification
- Safety analysis
5. Execution Agent
Manages query execution:
- Engine selection (PostgreSQL/DuckDB)
- Resource allocation
- Progress tracking
- Result streaming
6. Visualization Agent
Creates meaningful charts:
- Automatic chart type selection
- Data-driven visualization choices
- Interactive feature generation
- Export format handling
Natural Language Processing
Understanding User Intent
Query Parsing
User: "Show me sales by region for last quarter"
Extracted:
- Action: aggregate query
- Measure: sales
- Dimension: region
- Time: last quarter
- Visualization: implicit bar/column chart
Context Awareness
The agents maintain conversation context:
- Previous queries in session
- Referenced datasets
- User preferences
- Domain knowledge
Semantic Understanding
Entity Recognition
- Tables: "customers", "users", "accounts" → user_accounts table
- Columns: "revenue", "sales", "income" → revenue_amount column
- Time: "last month", "Q3", "yesterday" → appropriate date filters
- Aggregations: "total", "average", "by month" → SUM(), AVG(), GROUP BY
Ambiguity Resolution
When multiple interpretations exist:
- Use conversation context
- Apply domain heuristics
- Consider data types
- Ask clarifying questions
SQL Generation
Template-Based Generation
Query Templates
-- Aggregation Template
SELECT
{dimensions},
{aggregate_function}({measure}) as {alias}
FROM {table}
WHERE {filters}
GROUP BY {dimensions}
ORDER BY {order_column} {direction}
-- Time Series Template
SELECT
DATE_TRUNC('{interval}', {date_column}) as period,
{aggregations}
FROM {table}
WHERE {date_column} >= {start_date}
GROUP BY period
ORDER BY period
Optimization Strategies
Query Optimization
- Index usage hints
- Join order optimization
- Subquery vs CTE decisions
- Materialized view utilization
Performance Considerations
# Query complexity estimation
if estimated_rows > 1_000_000:
use_sampling = True
add_limit_clause = True
prefer_duckdb = True
Tool Selection
Available Tools
1. Schema Search Tool
- Vector similarity search
- Keyword matching
- Metadata retrieval
- Relationship discovery
2. SQL Execution Tool
- Query execution
- Result formatting
- Error handling
- Performance monitoring
3. Python Code Tool
- Complex calculations
- Data transformations
- Statistical analysis
- Custom visualizations
4. Visualization Tool
- Chart generation
- Style customization
- Interactive features
- Export capabilities
Tool Chain Example
Query: "What's the correlation between price and sales?"
Chain:
1. Schema Search → find price and sales columns
2. SQL Execution → retrieve data
3. Python Code → calculate correlation
4. Visualization → create scatter plot
Memory and Context
Conversation Memory
Short-term Memory
- Current session queries
- Active dataset context
- Recent results
- User corrections
Long-term Memory
- User preferences
- Common query patterns
- Dataset relationships
- Historical insights
Context Propagation
Initial Query: "Show me top customers"
Context Stored:
- Table: customers
- Metric: revenue
- Order: descending
- Limit: 10
Follow-up: "What about last year?"
Context Applied:
- Previous table and metric
- Add time filter for last year
- Maintain ordering and limit
Error Recovery
Intelligent Error Handling
SQL Errors
- Syntax Errors: Auto-correct common mistakes
- Missing Columns: Suggest alternatives
- Type Mismatches: Apply appropriate casts
- Permission Errors: Explain and suggest alternatives
Fallback Strategies
Primary: Execute exact SQL
Fallback 1: Simplify query
Fallback 2: Use alternative approach
Fallback 3: Explain limitation
Learning from Errors
- Pattern recognition
- Common mistake database
- Proactive suggestions
- Improved future queries
Advanced Capabilities
Multi-Step Reasoning
Complex Analysis
For queries requiring multiple steps:
- Break down into sub-queries
- Execute sequentially
- Combine results
- Apply final transformations
Example:
"Compare this year's growth rate to industry average"
Steps:
1. Calculate this year's metrics
2. Calculate growth rate
3. Fetch industry benchmarks
4. Compute comparison
5. Visualize differences
Adaptive Behavior
User Profiling
- Technical expertise level
- Preferred visualization types
- Common query patterns
- Domain expertise
Response Adaptation
if user.expertise == "beginner":
include_explanations = True
use_simple_charts = True
avoid_technical_terms = True
else:
provide_sql_query = True
include_advanced_options = True
Integration Points
LLM Integration
- OpenAI GPT-4 for understanding
- Local models for sensitive data
- Fallback to simpler models
- Cost optimization strategies
Vector Database
- Qdrant for semantic search
- Embedding models
- Similarity algorithms
- Cache strategies
Workflow Orchestration
- LangGraph state management
- Parallel execution
- Checkpoint recovery
- Performance monitoring
Best Practices
For Optimal Results
- Be Specific: Include table/column names when known
- Provide Context: Mention the analysis goal
- Use Examples: "Like the report from last week"
- Iterate: Refine queries based on results
Query Writing Tips
- Start simple, then add complexity
- Use business terms (agents will translate)
- Specify time ranges clearly
- Indicate preferred visualizations
Future Enhancements
Planned Capabilities
- Predictive analytics
- Anomaly detection
- Natural language explanations
- Automated insight generation
Research Areas
- Few-shot learning for new domains
- Multi-modal inputs (voice, images)
- Collaborative filtering
- Causal inference
Next Steps
- Explore Vector Search for semantic capabilities
- Learn about Database Architecture
- Understand Multi-tenancy for team usage