Your First Dataset
Step-by-step guide to uploading, exploring, and querying your first dataset in GoPie
This comprehensive tutorial walks you through uploading your first dataset to GoPie, exploring its features, and mastering both natural language and SQL queries. By the end, you'll understand how to leverage GoPie's full potential for data analysis.
Time Required: 15-20 minutes | Difficulty: Beginner-friendly
What You'll Learn
- Upload and configure datasets
- Explore data with natural language
- Create visualizations instantly
- Access auto-generated APIs
- Advanced querying techniques
- Best practices and tips
Prerequisites
Before starting, ensure you have:
- GoPie running locally (Quick Start Guide)
- A dataset file (CSV, Excel, or Parquet)
- Web browser opened to http://localhost:3000
Need Sample Data? Download our example datasets:
- Sales Data (CSV) - E-commerce transactions
- Customer Data (Excel) - Customer demographics
- Financial Data (Parquet) - Company financials
Step 1: Upload Your Dataset
Navigate to Datasets
- Open GoPie at http://localhost:3000
- Click "Datasets" in the navigation menu
- Click the "New Dataset" button
Choose Upload Method
Drag & Drop or Browse
- Drag your file into the upload area
- Or click "Browse" to select a file
- Supported formats: CSV, Excel, Parquet, JSON
File requirements:
- Maximum size: 500MB (configurable)
- First row should contain column headers
- UTF-8 encoding recommended
Google Drive / OneDrive
- Click "Import from Cloud"
- Authenticate with your account
- Navigate to your file
- Select and import
Features:
- Direct integration
- Large file support
- Automatic sync options
Database Connection
- Select "Database" as source
- Choose database type (PostgreSQL, MySQL)
- Enter connection details:
Host: your-database.com Port: 5432 Database: production Username: readonly_user Password: ********
- Select tables to import
Direct URL Import
- Select "Import from URL"
- Enter the file URL:
https://example.com/data/sales-2024.csv
- Click "Import"
Supports:
- Public URLs
- Basic authentication
- Automatic format detection
Configure Dataset
After upload, configure your dataset:
-
Dataset Name: Give it a meaningful name
Sales Data 2024
-
Description (optional): Add context
E-commerce sales transactions for fiscal year 2024
-
Tags (optional): Add searchable tags
sales, revenue, e-commerce, 2024
-
Click "Create Dataset"
Step 2: Data Processing & Validation
Once uploaded, GoPie automatically:
Processing Steps
- File Parsing: Reads and validates file format
- Type Detection: Infers data types for each column
- Data Cleaning: Handles missing values and formatting
- Schema Generation: Creates optimized database schema
- Indexing: Builds indexes for fast querying
- Vector Embedding: Creates semantic search indexes
Processing Results
You'll see a summary screen showing:
File uploaded successfully
10,000 rows imported
15 columns detected
Schema created
API endpoints generated
Ready for queries!
Processing time: 12.3 seconds
Auto-Generated API: Your dataset is now available at:
GET http://localhost:8000/api/v1/datasets/{dataset-id}/data
Step 3: Explore Your Data
Schema Overview
GoPie displays your dataset schema:
Column Name | Data Type | Sample Values | Null Count |
---|---|---|---|
order_id | INTEGER | 1001, 1002, 1003 | 0 |
customer_name | VARCHAR | "John Doe", "Jane Smith" | 0 |
order_date | DATE | 2024-01-15, 2024-01-16 | 0 |
product | VARCHAR | "Laptop", "Phone" | 0 |
quantity | INTEGER | 1, 2, 5 | 0 |
price | DECIMAL | 999.99, 1299.00 | 0 |
total | DECIMAL | 999.99, 2598.00 | 0 |
Column Descriptions
GoPie automatically generates descriptions:
- order_id: Unique identifier for each order
- customer_name: Name of the customer who placed the order
- order_date: Date when the order was placed
- product: Product name or SKU
- quantity: Number of items ordered
- price: Unit price of the product
- total: Total order value (quantity × price)
Step 4: Natural Language Queries
Your First Query
In the chat interface, try these queries:
Simple Data Exploration
Show me the first 10 rows
How many orders are in the dataset?
What products do we sell?
List all columns in the table
Business Intelligence
What's the total revenue by month?
Show me the top 10 customers by total purchases
Which products generate the most revenue?
What's the average order value by product category?
Complex Analysis
Calculate month-over-month growth rate
Find customers who haven't ordered in the last 90 days
What's the correlation between order size and customer lifetime value?
Show me seasonal trends in product sales
Understanding the Response
For each query, GoPie provides:
- Natural Language Interpretation: What GoPie understood
- Generated SQL: The actual query executed
- Results Table: Data in tabular format
- Visualization: Automatic chart if applicable
- Export Options: Download as CSV, JSON, or Excel
Example response:
Query: "What's the total revenue by month?"
Generated SQL:
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(total) as revenue
FROM sales_data_2024
GROUP BY month
ORDER BY month;
Results:
| month | revenue |
|------------|------------|
| 2024-01-01 | 125,432.50 |
| 2024-02-01 | 143,221.75 |
| 2024-03-01 | 156,789.25 |
Step 5: SQL Playground
Direct SQL Queries
Click on "SQL Editor" to write custom queries:
-- Customer segmentation by purchase frequency
WITH customer_stats AS (
SELECT
customer_name,
COUNT(DISTINCT order_id) as order_count,
SUM(total) as total_spent,
AVG(total) as avg_order_value,
MAX(order_date) as last_order_date
FROM sales_data_2024
GROUP BY customer_name
)
SELECT
customer_name,
order_count,
total_spent,
CASE
WHEN order_count >= 10 THEN 'VIP'
WHEN order_count >= 5 THEN 'Regular'
ELSE 'New'
END as customer_segment
FROM customer_stats
ORDER BY total_spent DESC;
SQL Features
- Syntax Highlighting: Color-coded SQL syntax
- Auto-completion: Table and column name suggestions
- Query History: Access previous queries
- Execution Plan: View query performance
- Results Export: Download query results
Step 6: Visualizations
Automatic Visualizations
GoPie automatically creates appropriate charts:
Query Type | Visualization | Example |
---|---|---|
Time Series | Line Chart | Sales over time |
Categories | Bar Chart | Revenue by product |
Distribution | Histogram | Order value distribution |
Correlation | Scatter Plot | Price vs. quantity |
Composition | Pie Chart | Market share by category |
Customizing Visualizations
Click "Customize Chart" to:
- Change chart type
- Adjust colors and styling
- Add titles and labels
- Configure axes
- Set data filters
Example Vega-Lite specification:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"mark": "bar",
"encoding": {
"x": {"field": "product", "type": "nominal"},
"y": {"field": "revenue", "type": "quantitative", "aggregate": "sum"},
"color": {"field": "category", "type": "nominal"}
}
}
Step 7: Using the Auto-Generated API
API Endpoints
Your dataset automatically gets REST API endpoints:
# Get all data (paginated)
GET /api/v1/datasets/{dataset-id}/data
# With filters
GET /api/v1/datasets/{dataset-id}/data?product=Laptop&limit=100
# Aggregations
GET /api/v1/datasets/{dataset-id}/data/aggregate?group_by=product&aggregate=sum&column=total
# Search
GET /api/v1/datasets/{dataset-id}/data/search?q=iPhone
API Usage Examples
# Fetch data with filters
curl -X GET "http://localhost:8000/api/v1/datasets/abc123/data?limit=10&offset=0" \
-H "Accept: application/json"
# Response
{
"data": [
{
"order_id": 1001,
"customer_name": "John Doe",
"product": "Laptop",
"total": 999.99
}
],
"pagination": {
"total": 10000,
"limit": 10,
"offset": 0
}
}
// Fetch data using fetch API
async function getSalesData() {
const response = await fetch(
'http://localhost:8000/api/v1/datasets/abc123/data?limit=10'
);
const data = await response.json();
console.log(data);
}
// Using with React
const { data, error, isLoading } = useSWR(
'/api/v1/datasets/abc123/data',
fetcher
);
import requests
import pandas as pd
# Fetch data
response = requests.get(
'http://localhost:8000/api/v1/datasets/abc123/data',
params={'limit': 1000}
)
# Convert to DataFrame
df = pd.DataFrame(response.json()['data'])
print(df.head())
# Analyze
print(f"Total revenue: ${df['total'].sum():,.2f}")
-
Import the OpenAPI spec from:
http://localhost:8000/api/v1/openapi.json
-
Use the generated collection
-
Set base URL:
{{baseUrl}} = http://localhost:8000
-
Available endpoints will auto-populate
API Features
- Pagination: Automatic pagination for large datasets
- Filtering: Column-based filtering
- Sorting: Sort by any column
- Search: Full-text search across data
- Aggregations: SUM, AVG, COUNT, etc.
- Export Formats: JSON, CSV, Excel
Step 8: Advanced Features
1. Cross-Dataset Queries
If you have multiple related datasets:
Show me customers who bought products from both electronics and clothing categories
GoPie automatically detects relationships and performs JOINs.
2. Saved Queries
Save frequently used queries:
- After running a query, click "Save Query"
- Name it: "Monthly Revenue Report"
- Access from "Saved Queries" menu
- Schedule for regular execution
3. Data Refresh
For database-connected datasets:
- Go to Dataset Settings
- Enable "Auto Refresh"
- Set schedule (hourly, daily, weekly)
- Configure incremental updates
4. Sharing and Collaboration
Share your insights:
- Public Link: Generate shareable URLs
- Embed Code: Embed charts in websites
- Export Reports: PDF or PowerPoint
- API Access: Share API credentials
Best Practices
1. Data Preparation
Clean Data = Better Results
- Ensure consistent date formats
- Remove duplicate headers
- Handle missing values appropriately
- Use meaningful column names
2. Query Optimization
- Start with simple queries and build complexity
- Use filters to reduce data volume
- Leverage indexes for large datasets
- Cache frequently accessed results
3. Natural Language Tips
- Be specific: "revenue by month" vs "monthly sales"
- Include time ranges: "last 30 days", "Q1 2024"
- Use business terms GoPie understands
- Iterate and refine queries
4. Security Considerations
- Use read-only database credentials
- Implement row-level security if needed
- Regularly audit API access
- Encrypt sensitive data
Troubleshooting Common Issues
Upload fails with "Invalid format" error
Check:
- File has headers in first row
- No merged cells in Excel files
- CSV uses comma delimiter (configurable)
- File encoding is UTF-8
Natural language query returns unexpected results
Try:
- Rephrasing the question
- Using exact column names
- Adding more context
- Checking the generated SQL
API returns 404 error
Verify:
- Dataset ID is correct
- Dataset processing completed
- API service is running
- Authentication token (if enabled)
What's Next?
Congratulations! You've successfully:
- Uploaded your first dataset
- Explored data with natural language
- Created visualizations
- Accessed the auto-generated API
Continue Learning
- User Guide - Deep dive into all features
- Natural Language Tips - Master query writing
- API Development - Build apps with GoPie
- Data Sources - Connect to databases
Try These Challenges
- Upload a second dataset and perform a cross-dataset analysis
- Create a dashboard with multiple visualizations
- Build a simple web app using the API
- Schedule a daily report
You're Now a GoPie User!
Join our community Discord to share your experience, get help, and see what others are building with GoPie.