Core Concepts

AI Agents

How GoPie's intelligent agents understand and execute your data queries

GoPie uses advanced AI agents built with LangGraph to transform natural language queries into precise SQL statements and meaningful visualizations. This document explains the agent architecture, decision-making process, and how they work together to provide intelligent data analysis.

Agent Architecture

LangGraph Workflow

Core Components

1. Query Router

The entry point that analyzes user intent:

  • Natural language understanding
  • Query classification (simple/complex/multi-dataset)
  • Context extraction
  • Tool selection

2. Schema Agent

Understands your data structure:

  • Searches relevant tables and columns
  • Uses vector similarity for semantic matching
  • Provides context to SQL generation
  • Handles ambiguous references

3. SQL Agent

Generates optimized SQL queries:

  • Context-aware SQL generation
  • Dialect-specific optimization
  • Security constraint enforcement
  • Performance consideration

4. Validation Agent

Ensures query correctness:

  • Syntax validation
  • Permission checking
  • Resource limit verification
  • Safety analysis

5. Execution Agent

Manages query execution:

  • Engine selection (PostgreSQL/DuckDB)
  • Resource allocation
  • Progress tracking
  • Result streaming

6. Visualization Agent

Creates meaningful charts:

  • Automatic chart type selection
  • Data-driven visualization choices
  • Interactive feature generation
  • Export format handling

Natural Language Processing

Understanding User Intent

Query Parsing

User: "Show me sales by region for last quarter"

Extracted:
- Action: aggregate query
- Measure: sales
- Dimension: region
- Time: last quarter
- Visualization: implicit bar/column chart

Context Awareness

The agents maintain conversation context:

  • Previous queries in session
  • Referenced datasets
  • User preferences
  • Domain knowledge

Semantic Understanding

Entity Recognition

  • Tables: "customers", "users", "accounts" → user_accounts table
  • Columns: "revenue", "sales", "income" → revenue_amount column
  • Time: "last month", "Q3", "yesterday" → appropriate date filters
  • Aggregations: "total", "average", "by month" → SUM(), AVG(), GROUP BY

Ambiguity Resolution

When multiple interpretations exist:

  1. Use conversation context
  2. Apply domain heuristics
  3. Consider data types
  4. Ask clarifying questions

SQL Generation

Template-Based Generation

Query Templates

-- Aggregation Template
SELECT 
    {dimensions},
    {aggregate_function}({measure}) as {alias}
FROM {table}
WHERE {filters}
GROUP BY {dimensions}
ORDER BY {order_column} {direction}

-- Time Series Template
SELECT 
    DATE_TRUNC('{interval}', {date_column}) as period,
    {aggregations}
FROM {table}
WHERE {date_column} >= {start_date}
GROUP BY period
ORDER BY period

Optimization Strategies

Query Optimization

  • Index usage hints
  • Join order optimization
  • Subquery vs CTE decisions
  • Materialized view utilization

Performance Considerations

# Query complexity estimation
if estimated_rows > 1_000_000:
    use_sampling = True
    add_limit_clause = True
    prefer_duckdb = True

Tool Selection

Available Tools

1. Schema Search Tool

  • Vector similarity search
  • Keyword matching
  • Metadata retrieval
  • Relationship discovery

2. SQL Execution Tool

  • Query execution
  • Result formatting
  • Error handling
  • Performance monitoring

3. Python Code Tool

  • Complex calculations
  • Data transformations
  • Statistical analysis
  • Custom visualizations

4. Visualization Tool

  • Chart generation
  • Style customization
  • Interactive features
  • Export capabilities

Tool Chain Example

Query: "What's the correlation between price and sales?"

Chain:
1. Schema Search → find price and sales columns
2. SQL Execution → retrieve data
3. Python Code → calculate correlation
4. Visualization → create scatter plot

Memory and Context

Conversation Memory

Short-term Memory

  • Current session queries
  • Active dataset context
  • Recent results
  • User corrections

Long-term Memory

  • User preferences
  • Common query patterns
  • Dataset relationships
  • Historical insights

Context Propagation

Initial Query: "Show me top customers"
Context Stored:
  - Table: customers
  - Metric: revenue
  - Order: descending
  - Limit: 10

Follow-up: "What about last year?"
Context Applied:
  - Previous table and metric
  - Add time filter for last year
  - Maintain ordering and limit

Error Recovery

Intelligent Error Handling

SQL Errors

  1. Syntax Errors: Auto-correct common mistakes
  2. Missing Columns: Suggest alternatives
  3. Type Mismatches: Apply appropriate casts
  4. Permission Errors: Explain and suggest alternatives

Fallback Strategies

Primary: Execute exact SQL
Fallback 1: Simplify query
Fallback 2: Use alternative approach
Fallback 3: Explain limitation

Learning from Errors

  • Pattern recognition
  • Common mistake database
  • Proactive suggestions
  • Improved future queries

Advanced Capabilities

Multi-Step Reasoning

Complex Analysis

For queries requiring multiple steps:

  1. Break down into sub-queries
  2. Execute sequentially
  3. Combine results
  4. Apply final transformations

Example:

"Compare this year's growth rate to industry average"
Steps:
1. Calculate this year's metrics
2. Calculate growth rate
3. Fetch industry benchmarks
4. Compute comparison
5. Visualize differences

Adaptive Behavior

User Profiling

  • Technical expertise level
  • Preferred visualization types
  • Common query patterns
  • Domain expertise

Response Adaptation

if user.expertise == "beginner":
    include_explanations = True
    use_simple_charts = True
    avoid_technical_terms = True
else:
    provide_sql_query = True
    include_advanced_options = True

Integration Points

LLM Integration

  • OpenAI GPT-4 for understanding
  • Local models for sensitive data
  • Fallback to simpler models
  • Cost optimization strategies

Vector Database

  • Qdrant for semantic search
  • Embedding models
  • Similarity algorithms
  • Cache strategies

Workflow Orchestration

  • LangGraph state management
  • Parallel execution
  • Checkpoint recovery
  • Performance monitoring

Best Practices

For Optimal Results

  1. Be Specific: Include table/column names when known
  2. Provide Context: Mention the analysis goal
  3. Use Examples: "Like the report from last week"
  4. Iterate: Refine queries based on results

Query Writing Tips

  • Start simple, then add complexity
  • Use business terms (agents will translate)
  • Specify time ranges clearly
  • Indicate preferred visualizations

Future Enhancements

Planned Capabilities

  • Predictive analytics
  • Anomaly detection
  • Natural language explanations
  • Automated insight generation

Research Areas

  • Few-shot learning for new domains
  • Multi-modal inputs (voice, images)
  • Collaborative filtering
  • Causal inference

Next Steps