External Data Sources
Connect GoPie to databases, cloud storage, and external APIs
While uploading files works well for static data, connecting to external data sources enables real-time analysis and automatic updates. GoPie can connect to databases, cloud storage, and APIs.
Supported Data Sources
Cloud Storage
Google Drive Integration
Connect to spreadsheets and CSV files in Google Drive:
Authorize Access
Click "Connect Google Drive" and authorize GoPie to access your files.
Select Files
Browse your Drive and select files to sync. You can choose:
- Individual files
- Entire folders
- Shared drives
Configure Sync
Set up automatic synchronization:
- Manual: Update on demand
- Hourly: Check for changes every hour
- Daily: Sync once per day
- Real-time: Update when file changes (Pro plan)
Google Sheets are converted to CSV format during import. Formulas are evaluated and only values are imported.
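The sync schedules above boil down to a next-run computation. A minimal sketch, assuming mode names matching the list; the `next_sync_time` helper is illustrative, not part of GoPie's API:

```python
from datetime import datetime, timedelta
from typing import Optional

def next_sync_time(mode: str, last_sync: datetime) -> Optional[datetime]:
    """Return when the next sync should run, or None for on-demand modes."""
    if mode == "manual":
        return None  # triggered on demand
    if mode == "hourly":
        return last_sync + timedelta(hours=1)
    if mode == "daily":
        return last_sync + timedelta(days=1)
    if mode == "real-time":
        return last_sync  # re-sync as soon as the file changes
    raise ValueError(f"unknown sync mode: {mode}")
```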
Microsoft OneDrive Integration
Access Excel and CSV files from OneDrive:
Connect Account
Authenticate with your Microsoft account (personal or business).
Browse Files
Navigate through your OneDrive folders and SharePoint sites.
Select Data
Choose files or folders to monitor for changes.
Features:
- Support for OneDrive Personal and Business
- SharePoint document libraries
- Automatic Excel to table conversion
- Preserves data types from Excel
S3 and Compatible Object Storage
Connect to AWS S3, MinIO, or S3-compatible storage:
# Connection parameters
Endpoint: https://s3.amazonaws.com
Access Key: YOUR_ACCESS_KEY
Secret Key: YOUR_SECRET_KEY
Bucket: my-data-bucket
Path: /data/analytics/
Supported formats in S3:
- CSV, TSV files
- Parquet files
- JSON/JSONL files
- Compressed files (.gz, .zip)
Use IAM roles for secure access without embedding credentials. GoPie supports assuming roles with appropriate permissions.
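A sync job over the bucket path typically filters object listings down to the supported formats above. A minimal sketch; the helper name, default prefix, and exact suffix set are illustrative:

```python
# Extensions ingestible from S3, per the supported-formats list above.
SUPPORTED_SUFFIXES = (
    ".csv", ".tsv",      # delimited text
    ".parquet",          # columnar
    ".json", ".jsonl",   # JSON documents / line-delimited JSON
    ".gz", ".zip",       # compressed variants
)

def is_supported_key(key: str, prefix: str = "data/analytics/") -> bool:
    """True if an object key sits under the configured path and has a known format."""
    return key.startswith(prefix) and key.lower().endswith(SUPPORTED_SUFFIXES)
```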
Databases
PostgreSQL Connection
-- Connection string format
postgresql://username:password@host:port/database
-- With SSL
postgresql://username:password@host:port/database?sslmode=require
Features:
- Full schema introspection
- Support for views and materialized views
- JSON/JSONB column handling
- Array type support
- Custom type mapping
Add Connection
Navigate to Data Sources → Add Database → PostgreSQL
Enter Credentials
Provide connection details:
- Host and port
- Database name
- Username and password
- SSL settings
Select Tables
Choose which tables/views to import or sync
Configure Sync
Set up refresh schedule and incremental options
MySQL/MariaDB Connection
-- Connection string format
mysql://username:password@host:port/database
-- With parameters
mysql://username:password@host:port/database?charset=utf8mb4
Supported versions:
- MySQL 5.7+
- MariaDB 10.2+
- Amazon Aurora MySQL
- PlanetScale
For cloud databases, ensure your IP is whitelisted or use GoPie's static IPs for secure connections.
Microsoft SQL Server
-- Connection string format
sqlserver://username:password@host:port?database=dbname
-- With Windows Authentication
sqlserver://host?database=dbname&trustedConnection=true
Features:
- Support for schemas
- Temporal tables
- Computed columns
- Azure SQL Database compatible
Google BigQuery
Connect to BigQuery datasets:
Service Account
Create a service account with BigQuery Data Viewer permissions
Upload Credentials
Upload the JSON key file or paste credentials
Select Dataset
Choose project and dataset to connect
{
"type": "service_account",
"project_id": "your-project",
"private_key_id": "key-id",
"private_key": "-----BEGIN PRIVATE KEY-----\n...",
"client_email": "[email protected]"
}
APIs and Web Data
REST API Integration
Connect to any REST API that returns JSON data:
# API Configuration
Base URL: https://api.example.com/v1
Authentication: Bearer token
Headers:
- Authorization: Bearer YOUR_TOKEN
- Accept: application/json
Endpoints:
- /users
- /orders
- /products
GoPie automatically paginates through API results and handles rate limiting.
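Pagination and rate-limit handling of this kind are commonly implemented as page-by-page fetching with exponential backoff on HTTP 429. A sketch; the `fetch_page` callback and response shape are assumptions for illustration, not GoPie's actual client:

```python
import time
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], dict], max_retries: int = 3,
             sleep=time.sleep) -> Iterator[dict]:
    """Yield records page by page; back off and retry when rate-limited (HTTP 429)."""
    page = 1
    while True:
        for attempt in range(max_retries + 1):
            resp = fetch_page(page)
            if resp.get("status") != 429:   # not rate-limited: use this response
                break
            sleep(2 ** attempt)             # exponential backoff: 1s, 2s, 4s, ...
        records = resp.get("results", [])
        if not records:
            return                          # an empty page ends pagination
        yield from records
        page += 1
```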
URL-Based Import
Import data directly from public URLs:
- CSV/TSV files
- JSON APIs
- RSS/Atom feeds
- Public Google Sheets
- GitHub raw files
# Example URLs
https://example.com/data/sales.csv
https://api.github.com/repos/owner/repo/issues
https://sheets.googleapis.com/v4/spreadsheets/SHEET_ID/values/Sheet1
Connection Security
Encryption
All connections use encryption:
- In Transit: TLS 1.2+ for all connections
- At Rest: AES-256 encryption for stored credentials
- Key Management: Rotating encryption keys
Access Control
- IP Whitelisting: Restrict access to known IPs
- VPN/Tunnel: Connect through secure tunnels
- Read-Only Access: Recommended for data sources
- Audit Logging: Track all access and queries
Best Practices
1. Use Read-Only Credentials

-- Create a read-only user in PostgreSQL
CREATE USER gopie_reader WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE mydb TO gopie_reader;
GRANT USAGE ON SCHEMA public TO gopie_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO gopie_reader;

2. Limit Access Scope
- Grant access only to needed tables
- Use views to restrict row-level access
- Apply column-level permissions where possible

3. Rotate Credentials
- Change passwords regularly
- Use expiring tokens when available
- Monitor access logs
Sync Configuration
Sync Modes
| Mode | Description | Use Case |
|---|---|---|
| Full Refresh | Replace entire dataset | Small datasets, complete accuracy needed |
| Incremental | Add new/modified records only | Large datasets with timestamps |
| Append Only | Add new records, never update | Event logs, audit trails |
| CDC (Enterprise) | Real-time change capture | Mission-critical data |
Incremental Sync Setup
For incremental syncs, specify:
- Cursor Column: Timestamp or auto-incrementing ID
- Update Strategy: How to handle modified records
- Delete Detection: Track removed records (optional)
-- Example: Configure incremental sync
Cursor Column: updated_at
Initial Value: 2024-01-01
Update Strategy: Merge by primary key (id)
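The configuration above corresponds to a cursor-tracking merge: pull only rows past the cursor, upsert them by primary key, and advance the cursor. A sketch, assuming ISO-format string timestamps and an `id` primary key:

```python
def incremental_sync(existing: dict, source_rows: list, cursor: str) -> tuple:
    """Merge rows with updated_at > cursor into `existing`, keyed by primary key `id`.

    Returns the merged dataset and the advanced cursor value.
    """
    new_cursor = cursor
    for row in source_rows:
        if row["updated_at"] > cursor:      # only rows past the cursor column
            existing[row["id"]] = row       # merge by primary key: insert or overwrite
            new_cursor = max(new_cursor, row["updated_at"])
    return existing, new_cursor
```

ISO 8601 timestamps compare correctly as strings, which is why a plain `>` works here; an auto-incrementing ID cursor would work the same way with integers.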
Scheduling Options
- Manual: Trigger sync on demand
- Scheduled: Fixed intervals (5 min to monthly)
- Event-Based: Webhook triggers (Enterprise)
- Real-Time: Continuous sync (selected sources)
Managing Connected Sources
Monitoring
Track your data source health:
- Last Sync: When data was last updated
- Records Synced: Number of rows processed
- Sync Duration: Time taken to complete
- Error Count: Failed sync attempts
- Next Sync: Scheduled update time
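These metrics lend themselves to a simple health classification for dashboards or alerts. A sketch with illustrative thresholds, not GoPie's actual rules:

```python
from datetime import datetime, timedelta

def source_health(last_sync: datetime, error_count: int, now: datetime,
                  max_age: timedelta = timedelta(days=1),
                  max_errors: int = 3) -> str:
    """Classify a connected source from its sync metrics (thresholds are assumptions)."""
    if error_count >= max_errors:
        return "failing"   # repeated failed sync attempts
    if now - last_sync > max_age:
        return "stale"     # data hasn't been refreshed recently
    return "healthy"
```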
Troubleshooting
Common issues and solutions:
Connection Timeout
# Increase timeout settings
Connection Timeout: 30s → 60s
Query Timeout: 120s → 300s
Authentication Failed
- Verify credentials are correct
- Check IP whitelist settings
- Ensure user has required permissions
- Test connection from GoPie's IPs
Schema Changes
When source schema changes, you may need to:
- Refresh schema metadata
- Update column mappings
- Adjust sync configuration
Performance Issues
- Add indexes on cursor columns
- Limit sync to specific tables/columns
- Use incremental sync for large tables
- Schedule syncs during off-peak hours
Advanced Features
Data Transformations
Apply transformations during sync:
- Column Mapping: Rename or reorder columns
- Type Casting: Convert data types
- Filtering: Exclude rows based on conditions
- Aggregations: Pre-aggregate data
-- Example transformation
SELECT
id,
UPPER(name) as name,
CAST(price AS DECIMAL(10,2)) as price,
created_at::date as order_date
FROM orders
WHERE status = 'completed'
Multiple Environments
Connect to different environments:
- Development: Test data and schemas
- Staging: Pre-production validation
- Production: Live data access
Webhooks and Triggers
Configure webhooks for:
- Sync completion notifications
- Error alerts
- Data quality checks
- Downstream processing
{
"event": "sync.completed",
"dataset": "sales_data",
"records": 15234,
"duration": "2m 34s",
"status": "success"
}
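A receiving service can route on the `event` field of payloads like the one above. A minimal sketch; the routing logic and return strings are illustrative:

```python
import json

def handle_webhook(payload: str) -> str:
    """Route a webhook payload by event type and sync status (behavior is illustrative)."""
    event = json.loads(payload)
    if event["event"] == "sync.completed" and event["status"] == "success":
        return f"notify: {event['dataset']} synced {event['records']} records"
    if event["event"] == "sync.completed":
        return f"alert: sync of {event['dataset']} finished with status {event['status']}"
    return f"ignored: {event['event']}"
```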
Best Practices
1. Start Small
- Test with a subset of data first
- Verify sync behavior before full import
- Monitor initial syncs closely

2. Document Connections
- Note source system details
- Document any transformations
- Track sync schedules and dependencies

3. Plan for Growth
- Design for increasing data volumes
- Consider sync frequency vs. freshness needs
- Monitor resource usage

4. Maintain Quality
- Set up data validation rules
- Monitor for schema drift
- Alert on anomalies
What's Next?
- Dataset Configuration - Customize imported schemas
- Multi-Dataset Queries - Join data from multiple sources
- API Authentication - Secure your data access