Multi-Tenancy

How GoPie isolates data and resources between organizations

GoPie's multi-tenancy architecture isolates each organization's data, compute, and security context while keeping the shared platform performant and scalable.

Architecture Overview

Isolation Levels

Tenant Hierarchy

1. Organization Level

  • Top-level isolation boundary
  • Completely separate data spaces
  • Independent billing and usage
  • Isolated security context

2. Project Level

  • Logical grouping within organization
  • Shared resources with access control
  • Project-specific settings
  • Team collaboration boundary

3. User Level

  • Individual access within organization
  • Role-based permissions
  • Personal workspace
  • Audit trail tracking
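
A minimal sketch of how these three levels can be modelled; the class and field names are illustrative assumptions, not GoPie's actual schema.

# Illustrative only: names and fields are assumptions, not GoPie's real models.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Organization:
    org_id: str                      # top-level isolation boundary
    billing_plan: str = "starter"    # independent billing and usage per org


@dataclass
class Project:
    project_id: str
    org_id: str                      # every project belongs to exactly one org
    settings: dict = field(default_factory=dict)  # project-specific settings


@dataclass
class Member:
    user_id: str
    org_id: str                      # individual access within one organization
    roles: List[str] = field(default_factory=list)  # role-based permissions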

Data Isolation

Database Strategy

Schema Separation

-- Each organization has isolated schema
CREATE SCHEMA org_12345;
CREATE SCHEMA org_67890;

-- Tables within organization schema
CREATE TABLE org_12345.datasets (...);
CREATE TABLE org_12345.queries (...);
CREATE TABLE org_12345.users (...);

Row-Level Security

-- PostgreSQL RLS policies
ALTER TABLE datasets ENABLE ROW LEVEL SECURITY;  -- policies have no effect until RLS is enabled

CREATE POLICY org_isolation ON datasets
    FOR ALL
    USING (organization_id = current_setting('app.current_org')::uuid);

-- Automatic filtering
SELECT * FROM datasets;  -- Only shows current org's data
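
The policy only filters correctly if app.current_org is set on the database session handling the request. A minimal sketch, assuming a psycopg2 connection:

# Sketch: bind the request's organization to the DB session so the RLS
# policy above can filter rows. Assumes a psycopg2 connection object.
def set_org_context(conn, org_id: str) -> None:
    with conn.cursor() as cur:
        # is_local=true scopes the setting to the current transaction
        cur.execute("SELECT set_config('app.current_org', %s, true)", (org_id,))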

Storage Isolation

Object Storage Structure

/storage/
  /org-12345/
    /datasets/
      /dataset-abc/
        /data.parquet
        /metadata.json
    /exports/
    /temp/
  /org-67890/
    /datasets/
    /exports/
    /temp/

Access Control

  • S3 bucket policies per organization (see the sketch after this list)
  • IAM roles for cross-account access
  • Encryption keys per tenant
  • Audit logging per namespace
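
As an illustration of the first point, a bucket policy can scope a tenant role to its organization's prefix. The bucket name, role ARN, and prefix below are assumptions, not GoPie's actual values.

# Hypothetical bucket policy restricting one org's role to its own prefix.
# Bucket name, account ID, role name, and prefix are illustrative assumptions.
import json
import boto3

ORG_PREFIX = "org-12345/"

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/gopie-org-12345"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:::gopie-tenant-data/{ORG_PREFIX}*",
    }],
}

boto3.client("s3").put_bucket_policy(
    Bucket="gopie-tenant-data", Policy=json.dumps(policy)
)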

Security Model

Authentication & Authorization

Organization Context

class OrganizationContext:
    """Per-request security context scoped to a single organization."""

    def __init__(self, user_id: str, org_id: str):
        self.user_id = user_id
        self.org_id = org_id
        self.permissions = self.load_permissions()

    def can_access(self, resource: str) -> bool:
        # Every permission check is evaluated against the caller's organization
        return self.permissions.check(resource, self.org_id)

JWT Token Structure

{
  "sub": "user_123",
  "org": "org_12345",
  "roles": ["analyst", "developer"],
  "permissions": ["read:datasets", "write:queries"],
  "exp": 1634567890
}

API Isolation

Request Routing

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def org_context_middleware(request: Request, call_next):
    # Extract organization from token
    org_id = extract_org_from_token(request.headers)
    
    # Set context for request
    request.state.org_id = org_id
    
    # Add to database session
    set_db_context(org_id)
    
    return await call_next(request)
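
The extract_org_from_token helper is not shown above; a minimal sketch using PyJWT, assuming an HS256-signed token carrying the "org" claim from the structure shown earlier (the secret key is a placeholder, not GoPie's configuration):

# Sketch of the helper used in the middleware above; assumes PyJWT and an
# HS256 shared secret. In practice the key and algorithm come from your
# auth provider's configuration.
import jwt  # PyJWT

SECRET_KEY = "replace-me"  # assumption: symmetric signing key

def extract_org_from_token(headers) -> str:
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise ValueError("missing bearer token")
    claims = jwt.decode(auth[len("Bearer "):], SECRET_KEY, algorithms=["HS256"])
    return claims["org"]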

Endpoint Structure

/api/v1/organizations/{org_id}/datasets
/api/v1/organizations/{org_id}/queries
/api/v1/organizations/{org_id}/users
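
Because the organization also appears in the path, each handler should confirm that the {org_id} in the URL matches the organization bound to the request by the middleware. A sketch using a FastAPI dependency; the names are assumptions:

# Sketch: reject requests whose URL org does not match the token's org.
from fastapi import Depends, FastAPI, HTTPException, Request

app = FastAPI()  # in practice, the same instance as in the middleware example

def require_org(org_id: str, request: Request) -> str:
    if org_id != getattr(request.state, "org_id", None):
        # 404 rather than 403 avoids confirming that the resource exists
        raise HTTPException(status_code=404, detail="Not found")
    return org_id

@app.get("/api/v1/organizations/{org_id}/datasets")
async def list_datasets(org_id: str = Depends(require_org)):
    return {"org": org_id, "datasets": []}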

Resource Management

Compute Resources

Query Execution Limits

organization_tiers:
  starter:
    max_concurrent_queries: 5
    query_timeout_seconds: 300
    max_result_size_mb: 100
    cpu_cores: 2
    memory_gb: 8
    
  professional:
    max_concurrent_queries: 20
    query_timeout_seconds: 900
    max_result_size_mb: 1000
    cpu_cores: 8
    memory_gb: 32
    
  enterprise:
    max_concurrent_queries: unlimited
    query_timeout_seconds: 3600
    max_result_size_mb: 10000
    cpu_cores: dedicated
    memory_gb: dedicated

Resource Allocation

class ResourceManager:
    def allocate_query_resources(self, org_id: str, query: Query):
        tier = self.get_org_tier(org_id)
        
        return {
            "cpu_limit": tier.cpu_cores,
            "memory_limit": tier.memory_gb,
            "timeout": tier.query_timeout_seconds,
            "priority": tier.priority_level
        }
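
The tier table above also caps max_concurrent_queries. One minimal way to enforce that is a semaphore per organization; a sketch, assuming the limit is read from the tier config:

# Sketch: enforce max_concurrent_queries with one asyncio.Semaphore per org.
# The limit is assumed to come from the tier config shown above.
import asyncio

_semaphores: dict[str, asyncio.Semaphore] = {}

def _org_semaphore(org_id: str, max_concurrent: int) -> asyncio.Semaphore:
    if org_id not in _semaphores:
        _semaphores[org_id] = asyncio.Semaphore(max_concurrent)
    return _semaphores[org_id]

async def run_query_with_limit(org_id: str, max_concurrent: int, run_query):
    # Queries beyond the tier's concurrency limit wait here instead of running
    async with _org_semaphore(org_id, max_concurrent):
        return await run_query()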

Storage Quotas

Quota Management

class StorageQuota:
    def check_quota(self, org_id: str, size_bytes: int) -> bool:
        current_usage = self.get_usage(org_id)
        quota_limit = self.get_quota(org_id)
        
        if current_usage + size_bytes > quota_limit:
            raise QuotaExceededException(
                f"Storage quota exceeded: {current_usage}/{quota_limit}"
            )
        
        return True

Usage Tracking

  • Real-time usage monitoring
  • Historical usage trends
  • Predictive quota alerts (see the sketch after this list)
  • Automated cleanup policies
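
One way predictive quota alerts can work is to project current growth against the remaining quota; a rough sketch with assumed thresholds and helper names:

# Rough sketch of a predictive quota alert; the growth figures and alerting
# hook are assumptions, not GoPie's implementation.
def projected_days_until_full(current_bytes: int, quota_bytes: int,
                              daily_growth_bytes: int) -> float:
    if daily_growth_bytes <= 0:
        return float("inf")
    return (quota_bytes - current_bytes) / daily_growth_bytes

def maybe_alert(org_id: str, current: int, quota: int, daily_growth: int,
                threshold_days: float = 14.0) -> None:
    days_left = projected_days_until_full(current, quota, daily_growth)
    if days_left < threshold_days:
        print(f"[quota-alert] {org_id}: ~{days_left:.1f} days of headroom left")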

Performance Optimization

Query Isolation

Connection Pooling

# Per-organization connection pools
connection_pools = {}

def get_connection(org_id: str):
    if org_id not in connection_pools:
        connection_pools[org_id] = create_pool(
            min_size=2,
            max_size=10,
            database=f"gopie_org_{org_id}"
        )
    
    return connection_pools[org_id].acquire()

Cache Isolation

# Redis keyspace per organization
def cache_key(org_id: str, key: str) -> str:
    return f"org:{org_id}:{key}"

# Separate cache instances for large orgs
def get_cache_client(org_id: str):
    if is_enterprise_org(org_id):
        return dedicated_cache_clients[org_id]
    return shared_cache_client

Scaling Strategies

Horizontal Partitioning

# Shard organizations across clusters
sharding_strategy:
  method: consistent_hash
  shards:
    - shard_1: [org_1 - org_1000]
    - shard_2: [org_1001 - org_2000]
    - shard_3: [org_2001 - org_3000]
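
A minimal sketch of the consistent_hash lookup named above; the shard names mirror the config, and the ring implementation is illustrative rather than GoPie's actual router.

# Illustrative consistent-hash ring for routing organizations to shards.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards: list[str], replicas: int = 100):
        # Each shard gets `replicas` points on the ring for smoother balance
        points = []
        for shard in shards:
            for i in range(replicas):
                points.append((self._hash(f"{shard}:{i}"), shard))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._shards = [s for _, s in points]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def shard_for(self, org_id: str) -> str:
        # First ring point at or after the org's hash, wrapping around the ring
        idx = bisect.bisect(self._hashes, self._hash(org_id)) % len(self._hashes)
        return self._shards[idx]

ring = ConsistentHashRing(["shard_1", "shard_2", "shard_3"])
print(ring.shard_for("org_12345"))  # e.g. "shard_2"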

Dedicated Resources

For enterprise customers:

  • Dedicated compute clusters
  • Isolated database instances
  • Private network segments
  • Custom SLAs

Cross-Tenant Features

Shared Resources

Public Datasets

class PublicDataset:
    def __init__(self):
        self.visibility = "public"
        self.owner_org = None
        self.access_list = ["*"]
    
    def can_access(self, org_id: str) -> bool:
        return True  # Public access

Marketplace

  • Shared data products
  • Cross-org licensing
  • Revenue sharing
  • Usage tracking

Data Sharing

Secure Sharing

from datetime import datetime, timedelta

class DataShare:
    def create_share(self, dataset_id: str, target_org: str):
        # Generate secure access token
        token = generate_share_token()
        
        # Set expiration and permissions
        share = {
            "dataset_id": dataset_id,
            "source_org": self.current_org,
            "target_org": target_org,
            "token": token,
            "expires_at": datetime.now() + timedelta(days=30),
            "permissions": ["read"]
        }
        
        return self.save_share(share)
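
The receiving side then checks the target organization, permissions, and expiry before granting access; a minimal sketch using the same assumed field names as the share record above:

# Sketch of the consuming side of a share; `share` is the record created above.
from datetime import datetime

def can_use_share(share: dict, requesting_org: str, action: str = "read") -> bool:
    return (
        share["target_org"] == requesting_org
        and action in share["permissions"]
        and datetime.now() < share["expires_at"]
    )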

Monitoring and Compliance

Audit Logging

Comprehensive Tracking

{
  "timestamp": "2024-01-15T10:30:00Z",
  "org_id": "org_12345",
  "user_id": "user_789",
  "action": "dataset.create",
  "resource": "dataset_abc",
  "ip_address": "192.168.1.1",
  "user_agent": "Mozilla/5.0...",
  "result": "success",
  "metadata": {
    "dataset_size": 1048576,
    "format": "csv"
  }
}

Compliance Reports

  • Access history
  • Data lineage
  • Permission changes
  • Security events

Health Monitoring

Per-Tenant Metrics

# Prometheus metrics per organization
from prometheus_client import Histogram, Gauge

query_latency = Histogram(
    'query_latency_seconds',
    'Query execution time',
    ['org_id', 'query_type']
)

storage_usage = Gauge(
    'storage_usage_bytes',
    'Storage usage per organization',
    ['org_id']
)
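
Recording a sample then tags each observation with the tenant; the values below are placeholders:

# Record one query's latency and an org's current storage usage.
query_latency.labels(org_id="org_12345", query_type="sql").observe(1.42)
storage_usage.labels(org_id="org_12345").set(52_428_800)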

Best Practices

For Platform Administrators

  1. Regular Audits: Review organization access patterns
  2. Resource Monitoring: Track usage against quotas
  3. Security Updates: Apply patches across all tenants
  4. Backup Strategy: Tenant-aware backup policies

For Organization Admins

  1. Access Reviews: Regularly review user permissions
  2. Data Classification: Tag sensitive datasets
  3. Usage Optimization: Monitor resource consumption
  4. Integration Security: Rotate API keys regularly

For Developers

  1. Tenant Context: Always validate organization context
  2. Resource Limits: Respect tenant quotas in code
  3. Error Handling: Don't leak cross-tenant information
  4. Testing: Test with multiple tenant scenarios

Migration and Onboarding

New Organization Setup

Automated Provisioning

async def provision_organization(org_details: dict):
    # Create database schema
    await create_org_schema(org_details['id'])
    
    # Set up storage buckets
    await create_storage_namespace(org_details['id'])
    
    # Initialize vector collections
    await create_vector_collections(org_details['id'])
    
    # Configure default roles
    await setup_default_roles(org_details['id'])
    
    # Send welcome email
    await send_onboarding_email(org_details)
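
As one example of the helpers above, create_org_schema might run the per-organization DDL from the Data Isolation section. A sketch assuming asyncpg, a placeholder DSN, and an org_id that has already been validated as a safe SQL identifier:

# Sketch of one provisioning helper; assumes asyncpg and that org_id has been
# validated/normalized into a safe identifier beforehand. The DSN is a placeholder.
import asyncpg

async def create_org_schema(org_id: str) -> None:
    conn = await asyncpg.connect(dsn="postgresql://gopie@localhost/gopie")
    try:
        schema = f"org_{org_id}"  # must be a validated identifier, never raw input
        await conn.execute(f'CREATE SCHEMA IF NOT EXISTS "{schema}"')
        await conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{schema}".datasets '
            "(id uuid PRIMARY KEY, name text NOT NULL)"
        )
    finally:
        await conn.close()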

Data Migration

  • Import from existing systems
  • Maintain data integrity
  • Preserve permissions
  • Validate post-migration

Future Enhancements

Planned Features

  • Cross-region replication
  • Tenant-specific encryption keys
  • Advanced resource scheduling
  • Multi-cloud deployment

Research Areas

  • Zero-trust architecture
  • Homomorphic encryption
  • Federated learning
  • Edge computing support

Next Steps