Multi-Tenancy
How GoPie isolates data and resources between organizations
GoPie implements a multi-tenancy architecture that enforces data isolation, per-tenant resource management, and security boundaries between organizations while maintaining performance and scalability.
Architecture Overview
Isolation Levels
Tenant Hierarchy
1. Organization Level
- Top-level isolation boundary
- Completely separate data spaces
- Independent billing and usage
- Isolated security context
2. Project Level
- Logical grouping within organization
- Shared resources with access control
- Project-specific settings
- Team collaboration boundary
3. User Level
- Individual access within organization
- Role-based permissions
- Personal workspace
- Audit trail tracking
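The three levels above can be sketched as a minimal data model; the class and field names are illustrative assumptions, not GoPie's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str
    roles: list[str] = field(default_factory=list)

@dataclass
class Project:
    project_id: str
    members: list[User] = field(default_factory=list)

@dataclass
class Organization:
    org_id: str
    projects: list[Project] = field(default_factory=list)

    def find_user(self, user_id: str) -> "User | None":
        # Lookups are scoped to this organization: users in other
        # organizations are simply unreachable from here.
        for project in self.projects:
            for user in project.members:
                if user.user_id == user_id:
                    return user
        return None
```

The key property is that every lookup starts from an `Organization`, so the top-level isolation boundary is structural, not just a query filter.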
Data Isolation
Database Strategy
Schema Separation
```sql
-- Each organization has an isolated schema
CREATE SCHEMA org_12345;
CREATE SCHEMA org_67890;

-- Tables within the organization schema
CREATE TABLE org_12345.datasets (...);
CREATE TABLE org_12345.queries (...);
CREATE TABLE org_12345.users (...);
```
Row-Level Security
```sql
-- PostgreSQL RLS policies (RLS must be enabled on the table first)
ALTER TABLE datasets ENABLE ROW LEVEL SECURITY;

CREATE POLICY org_isolation ON datasets
    FOR ALL
    USING (organization_id = current_setting('app.current_org')::uuid);

-- Automatic filtering
SELECT * FROM datasets; -- Only shows the current org's data
```
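For the policy to filter anything, the application has to set `app.current_org` on the session before running tenant queries. A hedged sketch of that wiring, assuming a DB-API style connection (e.g. psycopg) with `%s` placeholders:

```python
from contextlib import contextmanager

@contextmanager
def org_context(conn, org_id: str):
    """Run queries with app.current_org set so RLS policies can filter rows."""
    cur = conn.cursor()
    # set_config(name, value, is_local=true) scopes the setting to the
    # current transaction, so it cannot leak across pooled connections.
    cur.execute("SELECT set_config('app.current_org', %s, true)", (org_id,))
    try:
        yield cur
    finally:
        cur.close()
```

Transaction-local scoping (`is_local=true`) is the safer default when connections are pooled and reused across tenants.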
Storage Isolation
Object Storage Structure
```
/storage/
  /org-12345/
    /datasets/
      /dataset-abc/
        /data.parquet
        /metadata.json
    /exports/
    /temp/
  /org-67890/
    /datasets/
    /exports/
    /temp/
```
Access Control
- S3 bucket policies per organization
- IAM roles for cross-account access
- Encryption keys per tenant
- Audit logging per namespace
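Prefix-based namespacing like the layout above can be enforced in code as well as in bucket policies. A minimal sketch, with illustrative helper names:

```python
def object_key(org_id: str, *parts: str) -> str:
    """Build an object key under the organization's namespace."""
    return "/".join([f"org-{org_id}", *parts])

def key_belongs_to(org_id: str, key: str) -> bool:
    """Reject any key outside the organization's prefix,
    including path-traversal attempts."""
    prefix = f"org-{org_id}/"
    return key.startswith(prefix) and ".." not in key
```

Checking every key against the caller's organization prefix gives defense in depth: even if an IAM policy is misconfigured, the application layer refuses cross-tenant keys.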
Security Model
Authentication & Authorization
Organization Context
```python
class OrganizationContext:
    def __init__(self, user_id: str, org_id: str):
        self.user_id = user_id
        self.org_id = org_id
        self.permissions = self.load_permissions()

    def can_access(self, resource: str) -> bool:
        return self.permissions.check(resource, self.org_id)
```
JWT Token Structure
```json
{
  "sub": "user_123",
  "org": "org_12345",
  "roles": ["analyst", "developer"],
  "permissions": ["read:datasets", "write:queries"],
  "exp": 1634567890
}
```
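After signature verification (which a JWT library such as PyJWT handles, omitted here), a service can authorize a request from these claims. A hedged sketch:

```python
import time

def authorize(claims: dict, org_id: str, permission: str) -> bool:
    """Check an already-verified claim set against a tenant and permission."""
    if claims.get("exp", 0) <= time.time():
        return False  # token expired
    if claims.get("org") != org_id:
        return False  # wrong tenant: never cross the org boundary
    return permission in claims.get("permissions", [])
```

The tenant check comes before the permission check, so a valid token for one organization can never be replayed against another.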
API Isolation
Request Routing
```python
@app.middleware("http")
async def org_context_middleware(request: Request, call_next):
    # Extract organization from token
    org_id = extract_org_from_token(request.headers)

    # Set context for request
    request.state.org_id = org_id

    # Add to database session
    set_db_context(org_id)

    return await call_next(request)
```
Endpoint Structure
```
/api/v1/organizations/{org_id}/datasets
/api/v1/organizations/{org_id}/queries
/api/v1/organizations/{org_id}/users
```
Resource Management
Compute Resources
Query Execution Limits
```yaml
organization_tiers:
  starter:
    max_concurrent_queries: 5
    query_timeout_seconds: 300
    max_result_size_mb: 100
    cpu_cores: 2
    memory_gb: 8

  professional:
    max_concurrent_queries: 20
    query_timeout_seconds: 900
    max_result_size_mb: 1000
    cpu_cores: 8
    memory_gb: 32

  enterprise:
    max_concurrent_queries: unlimited
    query_timeout_seconds: 3600
    max_result_size_mb: 10000
    cpu_cores: dedicated
    memory_gb: dedicated
```
Resource Allocation
```python
class ResourceManager:
    def allocate_query_resources(self, org_id: str, query: Query):
        tier = self.get_org_tier(org_id)
        return {
            "cpu_limit": tier.cpu_cores,
            "memory_limit": tier.memory_gb,
            "timeout": tier.query_timeout_seconds,
            "priority": tier.priority_level
        }
```
Storage Quotas
Quota Management
```python
class StorageQuota:
    def check_quota(self, org_id: str, size_bytes: int) -> bool:
        current_usage = self.get_usage(org_id)
        quota_limit = self.get_quota(org_id)

        if current_usage + size_bytes > quota_limit:
            raise QuotaExceededException(
                f"Storage quota exceeded: {current_usage}/{quota_limit}"
            )
        return True
```
Usage Tracking
- Real-time usage monitoring
- Historical usage trends
- Predictive quota alerts
- Automated cleanup policies
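The "predictive quota alerts" bullet can be illustrated with a simple linear projection; the growth model and 14-day horizon are assumptions for the sketch:

```python
def days_until_quota(current_bytes: int, quota_bytes: int,
                     daily_growth_bytes: int) -> float:
    """Estimate days until quota exhaustion at the current growth rate."""
    if daily_growth_bytes <= 0:
        return float("inf")
    return (quota_bytes - current_bytes) / daily_growth_bytes

def should_alert(current_bytes: int, quota_bytes: int,
                 daily_growth_bytes: int, horizon_days: float = 14) -> bool:
    """Alert when the projected exhaustion date falls inside the horizon."""
    return days_until_quota(current_bytes, quota_bytes, daily_growth_bytes) <= horizon_days
```

Alerting on projected exhaustion rather than a fixed percentage gives fast-growing tenants earlier warning than slow-growing ones at the same usage level.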
Performance Optimization
Query Isolation
Connection Pooling
```python
# Per-organization connection pools
connection_pools = {}

def get_connection(org_id: str):
    if org_id not in connection_pools:
        connection_pools[org_id] = create_pool(
            min_size=2,
            max_size=10,
            database=f"gopie_org_{org_id}"
        )
    return connection_pools[org_id].acquire()
```
Cache Isolation
```python
# Redis keyspace per organization
def cache_key(org_id: str, key: str) -> str:
    return f"org:{org_id}:{key}"

# Separate cache instances for large orgs
def get_cache_client(org_id: str):
    if is_enterprise_org(org_id):
        return dedicated_cache_clients[org_id]
    return shared_cache_client
```
Scaling Strategies
Horizontal Partitioning
```yaml
# Shard organizations across clusters
sharding_strategy:
  method: consistent_hash
  shards:
    - shard_1: [org_1 - org_1000]
    - shard_2: [org_1001 - org_2000]
    - shard_3: [org_2001 - org_3000]
```
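A simplified sketch of hash-based shard placement like the strategy above; a production consistent-hash ring would add virtual nodes so that adding a shard only remaps a fraction of organizations:

```python
import hashlib

SHARDS = ["shard_1", "shard_2", "shard_3"]  # illustrative names

def shard_for(org_id: str, shards: "list[str]" = SHARDS) -> str:
    """Deterministically map an organization to a shard."""
    # SHA-256 gives a stable, well-distributed placement independent
    # of Python's per-process hash randomization.
    digest = hashlib.sha256(org_id.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

Determinism matters here: every service that routes a request for `org_12345` must compute the same shard without coordination.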
Dedicated Resources
For enterprise customers:
- Dedicated compute clusters
- Isolated database instances
- Private network segments
- Custom SLAs
Cross-Tenant Features
Shared Resources
Public Datasets
```python
class PublicDataset:
    def __init__(self):
        self.visibility = "public"
        self.owner_org = None
        self.access_list = ["*"]

    def can_access(self, org_id: str) -> bool:
        return True  # Public access
```
Marketplace
- Shared data products
- Cross-org licensing
- Revenue sharing
- Usage tracking
Data Sharing
Secure Sharing
```python
from datetime import datetime, timedelta

class DataShare:
    def create_share(self, dataset_id: str, target_org: str):
        # Generate secure access token
        token = generate_share_token()

        # Set expiration and permissions
        share = {
            "dataset_id": dataset_id,
            "source_org": self.current_org,
            "target_org": target_org,
            "token": token,
            "expires_at": datetime.now() + timedelta(days=30),
            "permissions": ["read"]
        }
        return self.save_share(share)
```
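The counterpart to `create_share` is validation at access time. A hedged sketch using the same share fields; the `revoked` flag is an assumption:

```python
from datetime import datetime

def can_use_share(share: dict, org_id: str, action: str,
                  now: "datetime | None" = None) -> bool:
    """Check a share record before granting cross-org access."""
    now = now or datetime.now()
    if share.get("revoked"):
        return False  # explicitly withdrawn by the source org
    if now >= share["expires_at"]:
        return False  # past the 30-day window set at creation
    if share["target_org"] != org_id:
        return False  # share is bound to exactly one recipient org
    return action in share["permissions"]
```

Binding the share to a single `target_org` keeps the cross-tenant surface narrow: a leaked token is useless to any other organization.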
Monitoring and Compliance
Audit Logging
Comprehensive Tracking
```json
{
  "timestamp": "2024-01-15T10:30:00Z",
  "org_id": "org_12345",
  "user_id": "user_789",
  "action": "dataset.create",
  "resource": "dataset_abc",
  "ip_address": "192.168.1.1",
  "user_agent": "Mozilla/5.0...",
  "result": "success",
  "metadata": {
    "dataset_size": 1048576,
    "format": "csv"
  }
}
```
Compliance Reports
- Access history
- Data lineage
- Permission changes
- Security events
Health Monitoring
Per-Tenant Metrics
```python
from prometheus_client import Histogram, Gauge

# Prometheus metrics per organization
query_latency = Histogram(
    'query_latency_seconds',
    'Query execution time',
    ['org_id', 'query_type']
)

storage_usage = Gauge(
    'storage_usage_bytes',
    'Storage usage per organization',
    ['org_id']
)
```
Best Practices
For Platform Administrators
- Regular Audits: Review organization access patterns
- Resource Monitoring: Track usage against quotas
- Security Updates: Apply patches across all tenants
- Backup Strategy: Tenant-aware backup policies
For Organization Admins
- Access Reviews: Regularly review user permissions
- Data Classification: Tag sensitive datasets
- Usage Optimization: Monitor resource consumption
- Integration Security: Rotate API keys regularly
For Developers
- Tenant Context: Always validate organization context
- Resource Limits: Respect tenant quotas in code
- Error Handling: Don't leak cross-tenant information
- Testing: Test with multiple tenant scenarios
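The "don't leak cross-tenant information" rule has a concrete shape: a resource that exists but belongs to another organization must produce the same error as one that does not exist at all, so callers cannot probe other tenants' IDs. A minimal sketch with an in-memory store:

```python
class NotFoundError(Exception):
    pass

def get_dataset(datasets: dict, org_id: str, dataset_id: str) -> dict:
    record = datasets.get(dataset_id)
    if record is None or record["org_id"] != org_id:
        # Identical error in both cases: existence in another org
        # is indistinguishable from non-existence.
        raise NotFoundError(f"dataset {dataset_id} not found")
    return record
```

Returning 403 for cross-tenant hits and 404 for misses would confirm which IDs exist; collapsing both to "not found" closes that side channel.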
Migration and Onboarding
New Organization Setup
Automated Provisioning
```python
async def provision_organization(org_details: dict):
    # Create database schema
    await create_org_schema(org_details['id'])

    # Set up storage buckets
    await create_storage_namespace(org_details['id'])

    # Initialize vector collections
    await create_vector_collections(org_details['id'])

    # Configure default roles
    await setup_default_roles(org_details['id'])

    # Send welcome email
    await send_onboarding_email(org_details)
```
Data Migration
- Import from existing systems
- Maintain data integrity
- Preserve permissions
- Validate post-migration
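The "validate post-migration" step can be sketched as a row-count plus order-independent checksum comparison between source and target; the helper names are illustrative:

```python
import hashlib

def table_fingerprint(rows: "list[tuple]") -> "tuple[int, str]":
    """Row count plus an order-independent digest of the rows."""
    digest = 0
    for row in rows:
        # XOR makes the digest order-independent. Caveat: pairs of
        # identical rows cancel out, so real checks would also compare
        # counts per distinct row.
        digest ^= int(hashlib.sha256(repr(row).encode()).hexdigest(), 16)
    return len(rows), f"{digest:064x}"

def migration_ok(source_rows: "list[tuple]", target_rows: "list[tuple]") -> bool:
    """Compare source and target without requiring matching row order."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

Order independence matters because a parallel or sharded import rarely preserves the source's physical row order.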
Future Enhancements
Planned Features
- Cross-region replication
- Tenant-specific encryption keys
- Advanced resource scheduling
- Multi-cloud deployment
Research Areas
- Zero-trust architecture
- Homomorphic encryption
- Federated learning
- Edge computing support
Next Steps
- Explore Database Architecture for storage details
- Learn about MCP Servers for AI integration
- Review Data Pipeline for processing isolation