Your First Dataset

Step-by-step guide to uploading, exploring, and querying your first dataset in GoPie

This comprehensive tutorial walks you through uploading your first dataset to GoPie, exploring its features, and mastering both natural language and SQL queries. By the end, you'll understand how to leverage GoPie's full potential for data analysis.

Time Required: 15-20 minutes | Difficulty: Beginner-friendly

What You'll Learn

  • Upload and configure datasets
  • Explore data with natural language
  • Create visualizations instantly
  • Access auto-generated APIs
  • Advanced querying techniques
  • Best practices and tips

Prerequisites

Before starting, ensure you have:

  1. GoPie running locally (Quick Start Guide)
  2. A dataset file (CSV, Excel, or Parquet)
  3. Web browser opened to http://localhost:3000

Need Sample Data? Download our example datasets:

Step 1: Upload Your Dataset

  1. Open GoPie at http://localhost:3000
  2. Click "Datasets" in the navigation menu
  3. Click the "New Dataset" button

Choose Upload Method

Drag & Drop or Browse

  1. Drag your file into the upload area
  2. Or click "Browse" to select a file
  3. Supported formats: CSV, Excel, Parquet, JSON

File requirements:

  • Maximum size: 500MB (configurable)
  • First row should contain column headers
  • UTF-8 encoding recommended

Google Drive / OneDrive

  1. Click "Import from Cloud"
  2. Authenticate with your account
  3. Navigate to your file
  4. Select and import

Features:

  • Direct integration
  • Large file support
  • Automatic sync options

Database Connection

  1. Select "Database" as source
  2. Choose database type (PostgreSQL, MySQL)
  3. Enter connection details:
    Host: your-database.com
    Port: 5432
    Database: production
    Username: readonly_user
    Password: ********
  4. Select tables to import

Direct URL Import

  1. Select "Import from URL"
  2. Enter the file URL:
    https://example.com/data/sales-2024.csv
  3. Click "Import"

Supports:

  • Public URLs
  • Basic authentication
  • Automatic format detection

Configure Dataset

After upload, configure your dataset:

  1. Dataset Name: Give it a meaningful name

    Sales Data 2024
  2. Description (optional): Add context

    E-commerce sales transactions for fiscal year 2024
  3. Tags (optional): Add searchable tags

    sales, revenue, e-commerce, 2024
  4. Click "Create Dataset"

Step 2: Data Processing & Validation

Once uploaded, GoPie automatically:

Processing Steps

  1. File Parsing: Reads and validates file format
  2. Type Detection: Infers data types for each column
  3. Data Cleaning: Handles missing values and formatting
  4. Schema Generation: Creates optimized database schema
  5. Indexing: Builds indexes for fast querying
  6. Vector Embedding: Creates semantic search indexes

Processing Results

You'll see a summary screen showing:

File uploaded successfully
10,000 rows imported
15 columns detected
Schema created
API endpoints generated
Ready for queries!

Processing time: 12.3 seconds

Auto-Generated API: Your dataset is now available at:

GET http://localhost:8000/api/v1/datasets/{dataset-id}/data

Step 3: Explore Your Data

Schema Overview

GoPie displays your dataset schema:

Column NameData TypeSample ValuesNull Count
order_idINTEGER1001, 1002, 10030
customer_nameVARCHAR"John Doe", "Jane Smith"0
order_dateDATE2024-01-15, 2024-01-160
productVARCHAR"Laptop", "Phone"0
quantityINTEGER1, 2, 50
priceDECIMAL999.99, 1299.000
totalDECIMAL999.99, 2598.000

Column Descriptions

GoPie automatically generates descriptions:

  • order_id: Unique identifier for each order
  • customer_name: Name of the customer who placed the order
  • order_date: Date when the order was placed
  • product: Product name or SKU
  • quantity: Number of items ordered
  • price: Unit price of the product
  • total: Total order value (quantity × price)

Step 4: Natural Language Queries

Your First Query

In the chat interface, try these queries:

Simple Data Exploration

Show me the first 10 rows
How many orders are in the dataset?
What products do we sell?
List all columns in the table

Business Intelligence

What's the total revenue by month?
Show me the top 10 customers by total purchases
Which products generate the most revenue?
What's the average order value by product category?

Complex Analysis

Calculate month-over-month growth rate
Find customers who haven't ordered in the last 90 days
What's the correlation between order size and customer lifetime value?
Show me seasonal trends in product sales

Understanding the Response

For each query, GoPie provides:

  1. Natural Language Interpretation: What GoPie understood
  2. Generated SQL: The actual query executed
  3. Results Table: Data in tabular format
  4. Visualization: Automatic chart if applicable
  5. Export Options: Download as CSV, JSON, or Excel

Example response:

Query: "What's the total revenue by month?"

Generated SQL:
SELECT 
    DATE_TRUNC('month', order_date) as month,
    SUM(total) as revenue
FROM sales_data_2024
GROUP BY month
ORDER BY month;

Results: 
| month      | revenue    |
|------------|------------|
| 2024-01-01 | 125,432.50 |
| 2024-02-01 | 143,221.75 |
| 2024-03-01 | 156,789.25 |

Step 5: SQL Playground

Direct SQL Queries

Click on "SQL Editor" to write custom queries:

-- Customer segmentation by purchase frequency
WITH customer_stats AS (
    SELECT 
        customer_name,
        COUNT(DISTINCT order_id) as order_count,
        SUM(total) as total_spent,
        AVG(total) as avg_order_value,
        MAX(order_date) as last_order_date
    FROM sales_data_2024
    GROUP BY customer_name
)
SELECT 
    customer_name,
    order_count,
    total_spent,
    CASE 
        WHEN order_count >= 10 THEN 'VIP'
        WHEN order_count >= 5 THEN 'Regular'
        ELSE 'New'
    END as customer_segment
FROM customer_stats
ORDER BY total_spent DESC;

SQL Features

  • Syntax Highlighting: Color-coded SQL syntax
  • Auto-completion: Table and column name suggestions
  • Query History: Access previous queries
  • Execution Plan: View query performance
  • Results Export: Download query results

Step 6: Visualizations

Automatic Visualizations

GoPie automatically creates appropriate charts:

Query TypeVisualizationExample
Time SeriesLine ChartSales over time
CategoriesBar ChartRevenue by product
DistributionHistogramOrder value distribution
CorrelationScatter PlotPrice vs. quantity
CompositionPie ChartMarket share by category

Customizing Visualizations

Click "Customize Chart" to:

  1. Change chart type
  2. Adjust colors and styling
  3. Add titles and labels
  4. Configure axes
  5. Set data filters

Example Vega-Lite specification:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": "bar",
  "encoding": {
    "x": {"field": "product", "type": "nominal"},
    "y": {"field": "revenue", "type": "quantitative", "aggregate": "sum"},
    "color": {"field": "category", "type": "nominal"}
  }
}

Step 7: Using the Auto-Generated API

API Endpoints

Your dataset automatically gets REST API endpoints:

# Get all data (paginated)
GET /api/v1/datasets/{dataset-id}/data

# With filters
GET /api/v1/datasets/{dataset-id}/data?product=Laptop&limit=100

# Aggregations
GET /api/v1/datasets/{dataset-id}/data/aggregate?group_by=product&aggregate=sum&column=total

# Search
GET /api/v1/datasets/{dataset-id}/data/search?q=iPhone

API Usage Examples

# Fetch data with filters
curl -X GET "http://localhost:8000/api/v1/datasets/abc123/data?limit=10&offset=0" \
  -H "Accept: application/json"

# Response
{
  "data": [
    {
      "order_id": 1001,
      "customer_name": "John Doe",
      "product": "Laptop",
      "total": 999.99
    }
  ],
  "pagination": {
    "total": 10000,
    "limit": 10,
    "offset": 0
  }
}
// Fetch data using fetch API
async function getSalesData() {
  const response = await fetch(
    'http://localhost:8000/api/v1/datasets/abc123/data?limit=10'
  );
  const data = await response.json();
  console.log(data);
}

// Using with React
const { data, error, isLoading } = useSWR(
  '/api/v1/datasets/abc123/data',
  fetcher
);
import requests
import pandas as pd

# Fetch data
response = requests.get(
    'http://localhost:8000/api/v1/datasets/abc123/data',
    params={'limit': 1000}
)

# Convert to DataFrame
df = pd.DataFrame(response.json()['data'])
print(df.head())

# Analyze
print(f"Total revenue: ${df['total'].sum():,.2f}")
  1. Import the OpenAPI spec from:

    http://localhost:8000/api/v1/openapi.json
  2. Use the generated collection

  3. Set base URL:

    {{baseUrl}} = http://localhost:8000
  4. Available endpoints will auto-populate

API Features

  • Pagination: Automatic pagination for large datasets
  • Filtering: Column-based filtering
  • Sorting: Sort by any column
  • Search: Full-text search across data
  • Aggregations: SUM, AVG, COUNT, etc.
  • Export Formats: JSON, CSV, Excel

Step 8: Advanced Features

1. Cross-Dataset Queries

If you have multiple related datasets:

Show me customers who bought products from both electronics and clothing categories

GoPie automatically detects relationships and performs JOINs.

2. Saved Queries

Save frequently used queries:

  1. After running a query, click "Save Query"
  2. Name it: "Monthly Revenue Report"
  3. Access from "Saved Queries" menu
  4. Schedule for regular execution

3. Data Refresh

For database-connected datasets:

  1. Go to Dataset Settings
  2. Enable "Auto Refresh"
  3. Set schedule (hourly, daily, weekly)
  4. Configure incremental updates

4. Sharing and Collaboration

Share your insights:

  • Public Link: Generate shareable URLs
  • Embed Code: Embed charts in websites
  • Export Reports: PDF or PowerPoint
  • API Access: Share API credentials

Best Practices

1. Data Preparation

Clean Data = Better Results

  • Ensure consistent date formats
  • Remove duplicate headers
  • Handle missing values appropriately
  • Use meaningful column names

2. Query Optimization

  • Start with simple queries and build complexity
  • Use filters to reduce data volume
  • Leverage indexes for large datasets
  • Cache frequently accessed results

3. Natural Language Tips

  • Be specific: "revenue by month" vs "monthly sales"
  • Include time ranges: "last 30 days", "Q1 2024"
  • Use business terms GoPie understands
  • Iterate and refine queries

4. Security Considerations

  • Use read-only database credentials
  • Implement row-level security if needed
  • Regularly audit API access
  • Encrypt sensitive data

Troubleshooting Common Issues

Upload fails with "Invalid format" error

Check:

  • File has headers in first row
  • No merged cells in Excel files
  • CSV uses comma delimiter (configurable)
  • File encoding is UTF-8
Natural language query returns unexpected results

Try:

  • Rephrasing the question
  • Using exact column names
  • Adding more context
  • Checking the generated SQL
API returns 404 error

Verify:

  • Dataset ID is correct
  • Dataset processing completed
  • API service is running
  • Authentication token (if enabled)

What's Next?

Congratulations! You've successfully:

  • Uploaded your first dataset
  • Explored data with natural language
  • Created visualizations
  • Accessed the auto-generated API

Continue Learning

  1. User Guide - Deep dive into all features
  2. Natural Language Tips - Master query writing
  3. API Development - Build apps with GoPie
  4. Data Sources - Connect to databases

Try These Challenges

  1. Upload a second dataset and perform a cross-dataset analysis
  2. Create a dashboard with multiple visualizations
  3. Build a simple web app using the API
  4. Schedule a daily report

You're Now a GoPie User!

Join our community Discord to share your experience, get help, and see what others are building with GoPie.