External Data Sources

Connect GoPie to databases, cloud storage, and external APIs

While uploading files is great for static data, connecting to external data sources enables real-time analysis and automatic updates. GoPie supports various external data sources including databases, cloud storage, and APIs.

Supported Data Sources

Cloud Storage

Google Drive Integration

Connect to spreadsheets and CSV files in Google Drive:

Authorize Access

Click "Connect Google Drive" and authorize GoPie to access your files.

Select Files

Browse your Drive and select files to sync. You can choose:

  • Individual files
  • Entire folders
  • Shared drives

Configure Sync

Set up automatic synchronization:

  • Manual: Update on demand
  • Hourly: Check for changes every hour
  • Daily: Sync once per day
  • Real-time: Update when file changes (Pro plan)

Google Sheets are converted to CSV format during import. Formulas are evaluated and only values are imported.

Microsoft OneDrive Integration

Access Excel and CSV files from OneDrive:

Connect Account

Authenticate with your Microsoft account (personal or business).

Browse Files

Navigate through your OneDrive folders and SharePoint sites.

Select Data

Choose files or folders to monitor for changes.

Features:

  • Support for OneDrive Personal and Business
  • SharePoint document libraries
  • Automatic Excel to table conversion
  • Preserves data types from Excel

S3 and Compatible Object Storage

Connect to AWS S3, MinIO, or S3-compatible storage:

# Connection parameters
Endpoint: https://s3.amazonaws.com
Access Key: YOUR_ACCESS_KEY
Secret Key: YOUR_SECRET_KEY
Bucket: my-data-bucket
Path: /data/analytics/

Supported formats in S3:

  • CSV, TSV files
  • Parquet files
  • JSON/JSONL files
  • Compressed files (.gz, .zip)

Use IAM roles for secure access without embedding long-lived credentials. GoPie can assume a role that has the appropriate permissions.
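
For example, a minimal read-only IAM policy for the bucket above might look like the following (the bucket name is illustrative):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-data-bucket",
        "arn:aws:s3:::my-data-bucket/*"
      ]
    }
  ]
}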

Databases

PostgreSQL Connection

-- Connection string format
postgresql://username:password@host:port/database

-- With SSL
postgresql://username:password@host:port/database?sslmode=require

Features:

  • Full schema introspection
  • Support for views and materialized views
  • JSON/JSONB column handling
  • Array type support
  • Custom type mapping

Add Connection

Navigate to Data Sources → Add Database → PostgreSQL

Enter Credentials

Provide connection details:

  • Host and port
  • Database name
  • Username and password
  • SSL settings

Select Tables

Choose which tables/views to import or sync

Configure Sync

Set up refresh schedule and incremental options

MySQL/MariaDB Connection

-- Connection string format
mysql://username:password@host:port/database

-- With parameters
mysql://username:password@host:port/database?charset=utf8mb4

Supported versions:

  • MySQL 5.7+
  • MariaDB 10.2+
  • Amazon Aurora MySQL
  • PlanetScale

For cloud databases, make sure the database accepts connections from GoPie: allowlist GoPie's static IP addresses in your database firewall.

Microsoft SQL Server

-- Connection string format
sqlserver://username:password@host:port?database=dbname

-- With Windows Authentication
sqlserver://host?database=dbname&trustedConnection=true

Features:

  • Support for schemas
  • Temporal tables
  • Computed columns
  • Azure SQL Database compatible

Google BigQuery

Connect to BigQuery datasets:

Service Account

Create a service account with BigQuery Data Viewer permissions
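
If you use the gcloud CLI, the account, role binding, and key can be created in three commands. The project and account names below are illustrative:

# Create the service account
gcloud iam service-accounts create gopie-reader --display-name="GoPie reader"

# Grant read access to BigQuery data
gcloud projects add-iam-policy-binding your-project \
  --member="serviceAccount:gopie-reader@your-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Generate a JSON key to upload in the next step
gcloud iam service-accounts keys create gopie-key.json \
  --iam-account="gopie-reader@your-project.iam.gserviceaccount.com"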

Upload Credentials

Upload the JSON key file or paste credentials

Select Dataset

Choose project and dataset to connect

{
  "type": "service_account",
  "project_id": "your-project",
  "private_key_id": "key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...",
  "client_email": "[email protected]"
}

APIs and Web Data

REST API Integration

Connect to any REST API that returns JSON data:

# API Configuration
Base URL: https://api.example.com/v1
Authentication: Bearer token
Headers:
  - Authorization: Bearer YOUR_TOKEN
  - Accept: application/json

Endpoints:
  - /users
  - /orders
  - /products

GoPie automatically paginates through API results and handles rate limiting.
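
The exact behavior depends on the API, but a pagination setup typically captures the paging scheme, page size, and backoff policy. The field names below are illustrative, not GoPie's actual configuration keys:

# Pagination settings (illustrative — field names are hypothetical)
Pagination Style: page   # page, offset, or cursor
Page Parameter: page
Page Size: 100
Rate Limit Handling: retry with backoff on HTTP 429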

URL-Based Import

Import data directly from public URLs:

  • CSV/TSV files
  • JSON APIs
  • RSS/Atom feeds
  • Public Google Sheets
  • GitHub raw files

# Example URLs
https://example.com/data/sales.csv
https://api.github.com/repos/owner/repo/issues
https://sheets.googleapis.com/v4/spreadsheets/SHEET_ID/values/Sheet1

Connection Security

Encryption

All connections use encryption:

  • In Transit: TLS 1.2+ for all connections
  • At Rest: AES-256 encryption for stored credentials
  • Key Management: Rotating encryption keys

Access Control

  • IP Whitelisting: Restrict access to known IPs
  • VPN/Tunnel: Connect through secure tunnels
  • Read-Only Access: Recommended for data sources
  • Audit Logging: Track all access and queries

Security Best Practices

  1. Use Read-Only Credentials

    -- Create a read-only user in PostgreSQL
    CREATE USER gopie_reader WITH PASSWORD 'secure_password';
    GRANT CONNECT ON DATABASE mydb TO gopie_reader;
    GRANT USAGE ON SCHEMA public TO gopie_reader;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO gopie_reader;
    -- Include tables created after this grant
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO gopie_reader;
  2. Limit Access Scope

    • Grant access only to needed tables
    • Use views to restrict row-level access
    • Apply column-level permissions where possible
  3. Rotate Credentials

    • Change passwords regularly
    • Use expiring tokens when available
    • Monitor access logs
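
For example, in PostgreSQL a password can be rotated and given an expiry in a single statement (the date is illustrative):

-- Rotate the password and have it expire automatically
ALTER USER gopie_reader WITH PASSWORD 'new_secure_password' VALID UNTIL '2025-07-01';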

Sync Configuration

Sync Modes

Mode               Description                      Use Case
Full Refresh       Replace entire dataset           Small datasets, complete accuracy needed
Incremental        Add new/modified records only    Large datasets with timestamps
Append Only        Add new records, never update    Event logs, audit trails
CDC (Enterprise)   Real-time change capture         Mission-critical data

Incremental Sync Setup

For incremental syncs, specify:

  1. Cursor Column: Timestamp or auto-incrementing ID
  2. Update Strategy: How to handle modified records
  3. Delete Detection: Track removed records (optional)

# Example: Configure incremental sync
Cursor Column: updated_at
Initial Value: 2024-01-01
Update Strategy: Merge by primary key (id)
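
Conceptually, each incremental run issues a query like the one below against the source, then merges the results by primary key. The orders table is illustrative:

-- Fetch only rows changed since the last successful sync
SELECT *
FROM orders
WHERE updated_at > '2024-01-01 00:00:00'  -- last cursor value
ORDER BY updated_at;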

Scheduling Options

  • Manual: Trigger sync on demand
  • Scheduled: Fixed intervals (5 min to monthly)
  • Event-Based: Webhook triggers (Enterprise)
  • Real-Time: Continuous sync (selected sources)

Managing Connected Sources

Monitoring

Track your data source health:

  • Last Sync: When data was last updated
  • Records Synced: Number of rows processed
  • Sync Duration: Time taken to complete
  • Error Count: Failed sync attempts
  • Next Sync: Scheduled update time

Troubleshooting

Common issues and solutions:

Connection Timeout

# Increase timeout settings
Connection Timeout: 30s → 60s
Query Timeout: 120s → 300s

Authentication Failed

  • Verify credentials are correct
  • Check IP whitelist settings
  • Ensure user has required permissions
  • Test connection from GoPie's IPs
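
If permissions are the likely culprit, you can often verify them directly on the source. In PostgreSQL, for example (user and table names are illustrative):

-- Returns true if the sync user can read the table
SELECT has_table_privilege('gopie_reader', 'public.orders', 'SELECT');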

Schema Changes

When source schema changes, you may need to:

  1. Refresh schema metadata
  2. Update column mappings
  3. Adjust sync configuration

Performance Issues

  • Add indexes on cursor columns (see the example after this list)
  • Limit sync to specific tables/columns
  • Use incremental sync for large tables
  • Schedule syncs during off-peak hours
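
An index on the cursor column keeps incremental syncs fast, since every run filters on it. A minimal PostgreSQL example, reusing the illustrative orders table from above:

-- Index the cursor column used by incremental syncs
CREATE INDEX idx_orders_updated_at ON orders (updated_at);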

Advanced Features

Data Transformations

Apply transformations during sync:

  • Column Mapping: Rename or reorder columns
  • Type Casting: Convert data types
  • Filtering: Exclude rows based on conditions
  • Aggregations: Pre-aggregate data

-- Example transformation
SELECT 
  id,
  UPPER(name) as name,
  CAST(price AS DECIMAL(10,2)) as price,
  created_at::date as order_date
FROM orders
WHERE status = 'completed'

Multiple Environments

Connect to different environments:

  • Development: Test data and schemas
  • Staging: Pre-production validation
  • Production: Live data access

Webhooks and Triggers

Configure webhooks for:

  • Sync completion notifications
  • Error alerts
  • Data quality checks
  • Downstream processing

{
  "event": "sync.completed",
  "dataset": "sales_data",
  "records": 15234,
  "duration": "2m 34s",
  "status": "success"
}

Best Practices

  1. Start Small

    • Test with a subset of data first
    • Verify sync behavior before full import
    • Monitor initial syncs closely
  2. Document Connections

    • Note source system details
    • Document any transformations
    • Track sync schedules and dependencies
  3. Plan for Growth

    • Design for increasing data volumes
    • Consider sync frequency vs. freshness needs
    • Monitor resource usage
  4. Maintain Quality

    • Set up data validation rules
    • Monitor for schema drift
    • Alert on anomalies

What's Next?