Uploading Data

Learn how to upload and import data files into GoPie

GoPie lets you upload data files and start analyzing them right away. This guide covers supported formats, upload methods, size limits, validation, advanced options, and troubleshooting.

Supported File Formats

GoPie supports the most common data formats:

CSV (Comma-Separated Values)

  • Most common format for tabular data
  • Supports custom delimiters (comma, tab, pipe, etc.)
  • Automatic encoding detection (UTF-8, Latin-1, etc.)
  • Header row detection

Example:

name,age,department,salary
John Doe,32,Engineering,95000
Jane Smith,28,Marketing,75000
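
To see what delimiter and header detection might report for a file before you upload it, you can run a similar heuristic locally. The sketch below uses Python's standard `csv.Sniffer`; it is only an approximation under that assumption and does not reproduce GoPie's own detection logic. The function name `inspect_csv` is hypothetical.

```python
import csv

def inspect_csv(sample: str):
    """Guess the delimiter and whether the first row looks like a header.

    Both results are heuristics from Python's csv.Sniffer, not GoPie's
    detection logic.
    """
    sniffer = csv.Sniffer()
    # Restrict the guess to the delimiters mentioned in this guide, plus ";".
    dialect = sniffer.sniff(sample, delimiters=",\t|;")
    return dialect.delimiter, sniffer.has_header(sample)

sample = (
    "name,age,department,salary\n"
    "John Doe,32,Engineering,95000\n"
    "Jane Smith,28,Marketing,75000\n"
)
delimiter, has_header = inspect_csv(sample)
print(repr(delimiter), has_header)
```

If the guessed delimiter is not the one you expect, set it explicitly in the CSV parsing options instead of relying on detection.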

Excel Files (.xlsx, .xls)

  • Supports multiple sheets
  • Preserves data types and formatting
  • Handles merged cells and formulas
  • Date/time format recognition

When uploading Excel files with multiple sheets, each sheet becomes a separate table in your dataset.

Parquet Files

  • Columnar storage format
  • Excellent compression
  • Preserves complex data types
  • Ideal for large datasets

Best for:

  • Large analytical datasets
  • Data with many columns
  • Performance-critical applications

JSON Files

  • Supports nested structures
  • Array of objects format
  • Automatic flattening options
  • Schema inference

Example:

[
  {"id": 1, "name": "Product A", "price": 29.99},
  {"id": 2, "name": "Product B", "price": 39.99}
]
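
Flattening turns nested objects into plain columns. The sketch below shows one common convention, joining nested keys with a dot. This guide does not specify how GoPie names flattened columns, so the dot separator, the `flatten` helper, and the nested `price` object are assumptions made for illustration.

```python
import json

def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is.
            flat[full_key] = value
    return flat

# Illustrative input: an array of objects with one nested field.
raw = '[{"id": 1, "name": "Product A", "price": {"amount": 29.99, "currency": "USD"}}]'
rows = [flatten(r) for r in json.loads(raw)]
print(rows)
```

Flattening a file yourself before upload gives you full control over the resulting column names.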

Upload Methods

Drag and Drop Upload

The easiest way to upload files:

  1. Open Datasets - Click "Datasets" in the main navigation or within your project.
  2. Drag Your File - Drag your file directly onto the upload area. A visual indicator appears while you hover over it.
  3. Confirm Upload - Review the file details and click "Upload" to proceed.

Click to Browse

Alternatively, click the upload area to open your file browser:

  1. Click "Upload Dataset" or the upload area
  2. Select one or more files from your computer
  3. Review and confirm the upload

Batch Upload

To upload multiple files at once, select them together in your file browser:

  • Windows/Linux: Ctrl+Click
  • Mac: Cmd+Click
  • All platforms: Shift+Click for ranges

When uploading multiple files:

  • Each file becomes a separate table in your dataset
  • Files are processed in parallel for faster uploads
  • Progress is shown for each file individually

File Size Limits and Recommendations

Size Limits

  • Free Plan: Up to 100MB per file
  • Pro Plan: Up to 1GB per file
  • Enterprise Plan: Up to 10GB per file (contact support for larger files)
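
A quick local check can tell you whether a file fits your plan before you start an upload. The sketch below encodes the limits listed above. It assumes binary units (1MB = 1024 * 1024 bytes), which this guide does not confirm, and the names `PLAN_LIMITS_BYTES` and `fits_plan` are hypothetical.

```python
import os

# Per-file limits from the list above. Binary units are an assumption.
PLAN_LIMITS_BYTES = {
    "free": 100 * 1024**2,
    "pro": 1 * 1024**3,
    "enterprise": 10 * 1024**3,
}

def fits_plan(size_bytes: int, plan: str) -> bool:
    """Return True if a file of `size_bytes` is within the plan's per-file limit."""
    return size_bytes <= PLAN_LIMITS_BYTES[plan]

def file_fits_plan(path: str, plan: str) -> bool:
    """Same check, reading the size from a file on disk."""
    return fits_plan(os.path.getsize(path), plan)
```

For example, `fits_plan(200 * 1024**2, "free")` returns `False`, while the same size passes on the Pro plan.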

Performance Recommendations

File Size       Format Recommendation   Upload Time (approx.)
< 10MB          Any format              < 5 seconds
10MB - 100MB    CSV or Parquet          5-30 seconds
100MB - 1GB     Parquet recommended     30 seconds - 2 minutes
> 1GB           Parquet required        2-10 minutes

Large File Tips

For files over 100MB:

  1. Use Parquet Format - Provides 5-10x compression
  2. Split Large Files - Break into multiple smaller files
  3. Remove Unnecessary Columns - Upload only needed data
  4. Use Data Sources - Consider connecting directly to your database
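
Splitting a large CSV is easy to get wrong if the header row is not repeated in every part. The following sketch, using only Python's standard library, splits CSV input into chunks that each start with the header. The helper names `split_rows` and `_render` and the chunk size are illustrative choices, not GoPie requirements.

```python
import csv
import io
from typing import Iterable, Iterator, List

def _render(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a header plus rows back into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def split_rows(lines: Iterable[str], rows_per_chunk: int) -> Iterator[str]:
    """Yield CSV text chunks of at most `rows_per_chunk` data rows,
    each beginning with the original header row."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    batch: List[List[str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_chunk:
            yield _render(header, batch)
            batch = []
    if batch:
        yield _render(header, batch)

# In-memory demonstration; for a real file, pass an open file object
# (opened with newline="") instead of a list of lines.
chunks = list(split_rows(["a,b\n", "1,2\n", "3,4\n", "5,6\n"], 2))
print(len(chunks))
```

Each chunk can then be written to its own file and uploaded as part of a batch, where each file becomes a separate table.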

Data Validation

GoPie automatically validates your data during upload:

Automatic Checks

  • File Integrity - Ensures file is not corrupted
  • Format Validation - Confirms file matches expected format
  • Encoding Detection - Handles various text encodings
  • Schema Inference - Detects column types automatically

Common Validation Issues

Mixed Data Types

If a column contains mixed types (e.g., numbers and text), GoPie will:

  1. Attempt to find the most appropriate type
  2. Convert values where possible
  3. Mark unconvertible values as null

Example: A column with ["100", "200", "N/A"] becomes [100, 200, null]
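
The coercion described above can be sketched in a few lines of Python. This is an illustration of the general technique, not GoPie's implementation; the function name `coerce_numeric` is hypothetical.

```python
from typing import List, Optional, Union

Number = Union[int, float]

def coerce_numeric(values: List[str]) -> List[Optional[Number]]:
    """Convert each string to int or float where possible, else None."""
    result: List[Optional[Number]] = []
    for value in values:
        try:
            result.append(int(value))
        except ValueError:
            try:
                result.append(float(value))
            except ValueError:
                # Unconvertible values become null (None in Python).
                result.append(None)
    return result

converted = coerce_numeric(["100", "200", "N/A"])
print(converted)
```

Running a check like this before upload shows how many values in a column would be lost to null, which helps you decide whether to clean the column first.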

Handling Validation Errors

When validation fails, you'll see:

  1. Error Description - What went wrong
  2. Affected Rows - Which rows have issues
  3. Suggested Fixes - How to resolve the problem

Common fixes:

  • Remove or fix corrupted rows
  • Ensure consistent date formats
  • Check for proper CSV delimiters
  • Verify file encoding (save as UTF-8)
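
If a file is rejected or displays garbled characters because of its encoding, re-saving it as UTF-8 is often the simplest fix. The sketch below tries a short list of candidate encodings in order. The list and the function name `to_utf8` are assumptions: Latin-1 can decode any byte sequence, so it acts as a last-resort fallback and may produce wrong characters if the file actually uses a different encoding.

```python
from typing import Sequence

def to_utf8(raw: bytes, candidates: Sequence[str] = ("utf-8-sig", "latin-1")) -> bytes:
    """Decode `raw` with the first candidate encoding that succeeds,
    then re-encode it as UTF-8.

    "utf-8-sig" reads UTF-8 with or without a byte-order mark and
    drops the mark if present.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")

latin1_bytes = "café,1\n".encode("latin-1")
utf8_bytes = to_utf8(latin1_bytes)
print(utf8_bytes.decode("utf-8"))
```

To convert a file on disk, read it in binary mode, pass the bytes through `to_utf8`, and write the result to a new file.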

Upload Progress and Status

During Upload

You'll see real-time progress including:

  • Upload percentage
  • Transfer speed
  • Estimated time remaining
  • Current processing stage

Processing Stages

  1. Upload - File transfer to GoPie servers
  2. Validation - Format checking and data validation
  3. Processing - Schema inference and optimization
  4. Indexing - Creating search indexes for fast queries
  5. Ready - Dataset available for querying

Advanced Upload Options

Custom Parsing Options

For CSV files, you can customize:

  • Delimiter - Comma, tab, pipe, or custom
  • Quote Character - Single, double, or none
  • Escape Character - Backslash or custom
  • Header Row - First row or specify row number
  • Skip Rows - Ignore initial rows
  • Encoding - UTF-8, Latin-1, etc.
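
To work out which settings a file needs, it can help to preview it locally first. The sketch below maps the options above onto Python's standard `csv` module; the function `preview_csv` and its parameter names are hypothetical and are not GoPie's API.

```python
import csv
import io
from itertools import islice

def preview_csv(text, delimiter=",", quotechar='"', escapechar=None,
                skip_rows=0, limit=5):
    """Parse CSV text and return (header, first `limit` data rows).

    `skip_rows` records are ignored before the header row is read.
    """
    # A quote character of None means "no quoting", which the csv
    # module expresses with QUOTE_NONE.
    quoting = csv.QUOTE_NONE if quotechar is None else csv.QUOTE_MINIMAL
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=quotechar,
        escapechar=escapechar,
        quoting=quoting,
    )
    rows = list(islice(reader, skip_rows, skip_rows + 1 + limit))
    if not rows:
        return [], []
    return rows[0], rows[1:]

# A pipe-delimited sample with one title line before the header and
# single-quoted values.
text = "report export\nid|name\n1|'Smith|Jones'\n2|Lee\n"
header, data = preview_csv(text, delimiter="|", quotechar="'", skip_rows=1)
print(header, data)
```

Once the preview looks right, apply the same delimiter, quote character, and skip-row values in the upload dialog.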

Data Type Overrides

Override automatic type detection:

-- After upload, you can modify column types
ALTER TABLE my_dataset 
ALTER COLUMN price TYPE DECIMAL(10,2);

Compression Support

GoPie automatically handles compressed files:

  • .gz - Gzip compression
  • .zip - ZIP archives (single file)
  • .bz2 - Bzip2 compression
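
Compressing a file before upload reduces transfer time on slow connections. A minimal sketch using Python's standard `gzip` module follows; the helper name `gzip_file` is illustrative, and the output keeps the original file name with `.gz` appended.

```python
import gzip
import shutil

def gzip_file(src: str, dst: str = "") -> str:
    """Compress `src` with gzip and return the path of the compressed file."""
    dst = dst or str(src) + ".gz"
    # Stream the data so large files are not loaded into memory at once.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `gzip_file("sales.csv")` writes `sales.csv.gz` next to the original file, which stays unchanged.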

Post-Upload Actions

After successful upload:

  1. Review Schema - Check detected column types
  2. Add Descriptions - Document your columns
  3. Set Aliases - Create user-friendly names
  4. Configure Relationships - Link to other datasets
  5. Test Queries - Run sample queries to verify

Troubleshooting

Upload Fails Immediately

  • Check file size limits
  • Ensure file extension matches content
  • Verify you have upload permissions
  • Try a different browser

Upload Stalls

  • Check internet connection
  • Try a smaller file, or split the large file into parts
  • Use Parquet format for better compression
  • Contact support for large uploads

Data Looks Wrong

  • Verify CSV delimiter settings
  • Check date/time formats
  • Review encoding (especially for international characters)
  • Ensure numeric formats use proper decimal separators
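
Numbers exported with a decimal comma (for example "1.234,56") are a frequent cause of numeric columns being read as text. The sketch below normalizes such values before upload. It assumes you know which separators the source file uses; the function name `parse_decimal` is hypothetical.

```python
def parse_decimal(text: str, decimal_sep: str = ",", thousands_sep: str = ".") -> float:
    """Parse a number written with the given decimal and thousands separators."""
    # Drop thousands separators first, then switch to a decimal point.
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)

value = parse_decimal("1.234,56")
print(value)
```

Apply the conversion to every value in the affected column and write the result back with a decimal point before uploading.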

Best Practices

  1. Prepare Your Data

    • Remove unnecessary columns before upload
    • Ensure consistent formatting
    • Use meaningful column names

  2. Choose the Right Format

    • CSV for simplicity and compatibility
    • Parquet for large files and performance
    • Excel when preserving formatting matters

  3. Document Your Data

    • Add dataset descriptions immediately after upload
    • Document any data transformations
    • Note data sources and update frequency

  4. Optimize for Analysis

    • Include proper date columns for time-series analysis
    • Use consistent units (e.g., all prices in USD)
    • Maintain referential integrity for joins
