Dataset Configuration
Customize schemas, manage metadata, and optimize your datasets
After uploading or connecting data, GoPie provides powerful tools to configure and optimize your datasets. This includes schema editing, metadata management, relationships, and performance tuning.
Schema Management
Viewing Schema Information
Access dataset schema from the dataset page:
-- Schema includes:
- Column names and data types
- Nullable constraints
- Primary/foreign keys
- Indexes
- Row count and size
GoPie automatically infers schema during import, but you can customize it to better match your needs.
Editing Column Properties
Data Type Changes
Modify column data types when needed:
Using the Schema Editor
- Navigate to your dataset
- Click "Schema" tab
- Click the edit icon next to a column
- Select new data type
- Preview impact and confirm
Available type conversions:
- String → Number (if valid)
- Number → String (always works)
- String → Date (with format)
- Date → String (with format)
- Boolean → Number (0/1)
Using SQL Commands
-- Change column type
ALTER TABLE sales_data
ALTER COLUMN price TYPE DECIMAL(10,2);
-- Cast with conversion
ALTER TABLE users
ALTER COLUMN age TYPE INTEGER
USING age::integer;
-- Change with default for nulls
ALTER TABLE orders
ALTER COLUMN status TYPE VARCHAR(50)
USING COALESCE(status, 'pending');
Type changes may fail if data cannot be converted. Always preview changes first.
Column Descriptions
Add descriptions to help users understand your data:
// Via API
await gopie.datasets.updateColumn({
dataset: 'sales_data',
column: 'arr',
description: 'Annual Recurring Revenue in USD',
display_name: 'Annual Revenue',
format: 'currency'
});
Column description best practices:
- Include units of measurement
- Explain abbreviations
- Note data sources
- Document calculations
- Specify time zones for dates
Dataset Aliases
Create user-friendly names:
# Original table name
database_table: "ord_txn_2024_v2_final"
# User-friendly alias
alias: "Orders 2024"
# Column aliases
column_aliases:
cust_id: "Customer ID"
txn_amt: "Transaction Amount"
ord_dt: "Order Date"
Column Metadata
Semantic Types
Assign semantic meaning to columns:
Semantic Type | Description | Benefits |
---|---|---|
Email addresses | Validation, privacy masking | |
Phone | Phone numbers | Formatting, validation |
Currency | Monetary values | Formatting, conversion |
Percentage | Percentage values | 0-100 validation, formatting |
URL | Web addresses | Validation, link rendering |
Latitude/Longitude | Geographic coordinates | Map visualization |
Format Settings
Configure display formatting:
{
"columns": {
"price": {
"format": "currency",
"currency": "USD",
"decimal_places": 2
},
"date": {
"format": "date",
"date_format": "YYYY-MM-DD"
},
"percentage": {
"format": "percentage",
"decimal_places": 1,
"multiply_by_100": false
}
}
}
Validation Rules
Set up data quality rules:
validations:
email:
- type: "format"
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
age:
- type: "range"
min: 0
max: 150
status:
- type: "enum"
values: ["active", "pending", "cancelled"]
amount:
- type: "not_null"
- type: "positive"
Relationships
Defining Foreign Keys
Link tables together for better querying:
Identify Relationship
Find columns that reference other tables
-- orders.customer_id → customers.id
-- products.category_id → categories.id
Create Foreign Key
Define the relationship in GoPie
ALTER TABLE orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id)
REFERENCES customers(id);
Enable Smart Joins
GoPie now suggests joins automatically
Relationship Types
Configure different relationship patterns:
// One-to-Many
{
"from": "customers.id",
"to": "orders.customer_id",
"type": "one_to_many",
"name": "customer_orders"
}
// Many-to-Many
{
"from": "products.id",
"through": "product_categories",
"to": "categories.id",
"type": "many_to_many",
"name": "product_categorization"
}
// One-to-One
{
"from": "users.id",
"to": "profiles.user_id",
"type": "one_to_one",
"name": "user_profile"
}
Relationship Benefits
- Auto-complete in natural language queries
- Join suggestions in SQL editor
- Referential integrity validation
- Cascade options for updates/deletes
- Visual relationship diagrams
Performance Optimization
Creating Indexes
Speed up queries with strategic indexes:
-- Single column index
CREATE INDEX idx_orders_date ON orders(order_date);
-- Composite index
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date);
-- Partial index
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
-- Expression index
CREATE INDEX idx_orders_month
ON orders(DATE_TRUNC('month', order_date));
Index Recommendations
GoPie analyzes query patterns and suggests indexes:
Check the "Performance" tab for index recommendations based on your query history.
Partitioning
For large datasets, implement partitioning:
-- Range partitioning by date
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- List partitioning by region
CREATE TABLE orders_us PARTITION OF orders
FOR VALUES IN ('US', 'CA', 'MX');
-- Hash partitioning for distribution
CREATE TABLE orders_p0 PARTITION OF orders
FOR VALUES WITH (modulus 4, remainder 0);
Advanced Configuration
Computed Columns
Create calculated fields:
-- Add computed column
ALTER TABLE orders ADD COLUMN
total_with_tax DECIMAL(10,2)
GENERATED ALWAYS AS (subtotal * 1.08) STORED;
-- Virtual column (calculated on read)
CREATE VIEW orders_enriched AS
SELECT *,
EXTRACT(YEAR FROM order_date) as order_year,
CASE
WHEN total > 1000 THEN 'high'
WHEN total > 100 THEN 'medium'
ELSE 'low'
END as order_tier
FROM orders;
Default Values
Set intelligent defaults:
-- Static defaults
ALTER TABLE users
ALTER COLUMN created_at
SET DEFAULT CURRENT_TIMESTAMP;
-- Dynamic defaults
ALTER TABLE orders
ALTER COLUMN order_number
SET DEFAULT 'ORD-' || LPAD(NEXTVAL('order_seq')::text, 6, '0');
-- Conditional defaults
ALTER TABLE products
ALTER COLUMN status
SET DEFAULT CASE
WHEN discontinued = true THEN 'inactive'
ELSE 'active'
END;
Constraints
Ensure data quality with constraints:
-- Unique constraints
ALTER TABLE users
ADD CONSTRAINT unique_email UNIQUE (email);
-- Check constraints
ALTER TABLE products
ADD CONSTRAINT positive_price
CHECK (price > 0);
-- Complex constraints
ALTER TABLE orders
ADD CONSTRAINT valid_dates
CHECK (ship_date >= order_date);
Dataset Settings
Access Control
Configure who can access your dataset:
access:
visibility: "organization" # or "public", "private"
permissions:
viewers:
- "team:analytics"
- "user:[email protected]"
editors:
- "team:data-engineering"
restricted_columns:
- column: "ssn"
access: ["role:hr"]
- column: "salary"
access: ["role:management"]
Refresh Settings
For connected data sources:
{
"refresh": {
"schedule": "0 2 * * *", // 2 AM daily
"timezone": "America/New_York",
"incremental": {
"enabled": true,
"cursor_column": "updated_at",
"lookback_window": "1 hour"
},
"notifications": {
"on_success": ["[email protected]"],
"on_failure": ["[email protected]"]
}
}
}
Retention Policies
Manage data lifecycle:
retention:
# Keep detailed data for 90 days
detailed_retention: "90 days"
# Aggregate older data
aggregation:
after: "90 days"
group_by: ["date", "category"]
metrics: ["sum(amount)", "count(*)"]
# Archive after 1 year
archive:
after: "365 days"
destination: "s3://backups/gopie/archives/"
# Delete after 2 years
deletion:
after: "730 days"
require_confirmation: true
Schema Evolution
Handling Schema Changes
When source schemas change:
Detection
GoPie detects schema differences during sync
Review Changes
- New columns to add
- Removed columns to handle
- Type changes to resolve
Apply Updates
Choose how to handle each change:
- Auto-add new columns
- Keep removed columns with nulls
- Map changed types
Version Control
Track schema versions:
{
"schema_version": "2.1.0",
"changes": [
{
"version": "2.1.0",
"date": "2024-01-15",
"changes": [
"Added 'category' column",
"Renamed 'prod_id' to 'product_id'",
"Changed 'price' type to DECIMAL(10,2)"
]
}
],
"compatibility": "backward"
}
Best Practices
Naming Conventions
-
Use descriptive names
t1
,col1
,data
customers
,order_date
,total_revenue
-
Be consistent
- Choose
customer_id
orcustomerId
(not both) - Use same date format throughout
- Choose
-
Avoid reserved words
- Don't use SQL keywords as column names
- Add prefixes if needed:
user_order
notorder
Documentation
Document everything for team success:
- Dataset description - What it contains and why
- Column descriptions - What each field means
- Business rules - How data is calculated
- Update frequency - When data refreshes
- Contact info - Who maintains the dataset
Performance Tips
- Index frequently filtered columns
- Partition large tables by date
- Denormalize for read performance
- Archive old data regularly
- Monitor query patterns
What's Next?
- Data Cleaning - Ensure data quality
- Multi-Dataset Queries - Join configured datasets
- API Access - Access configured data via API