medcards-ai/SCALABILITY_ARCHITECTURE.md
Build for 10k users, architect for 1M users.
This document outlines how MEDCARDS.AI scales from MVP (1k users) to platform (1M+ users) without major rewrites.
Monthly Active Users: 0-10,000
Daily Interactions: 0-100k
Infrastructure Cost: $500-1,000/month
Stack:
Frontend: Vercel Edge Network
Backend: Next.js Server Actions (Vercel Serverless)
Database: Supabase Free/Pro (PostgreSQL)
AI: Anthropic Claude API (pay-per-use)
Cache: None (database only)
CDN: Vercel automatic
Why it works:
- Serverless scales with traffic automatically and costs almost nothing at idle
- Fully managed services (Vercel, Supabase, Anthropic) mean near-zero ops overhead
Bottlenecks:
- No cache layer, so every repeated question pays for a fresh Claude API call
- Supabase connection limits under bursts of concurrent serverless invocations
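At this stage the entire write path can be a single Server Action talking straight to Supabase. A minimal sketch — `createServerClient` and the exact `interactions` columns are assumptions based on the schema used later in this document:
// app/actions/record-interaction.ts
'use server';
// Hypothetical Supabase helper; adjust to your client setup
import { createServerClient } from '@/lib/supabase/server';
export async function recordInteraction(input: {
  userId: string;
  caseId: string;
  selectedAnswerId: string;
  isCorrect: boolean;
}) {
  const db = createServerClient();
  // One insert, no cache layer — at MVP scale the database is enough
  const { error } = await db.from('interactions').insert({
    user_id: input.userId,
    case_id: input.caseId,
    selected_answer_id: input.selectedAnswerId,
    is_correct: input.isCorrect,
  });
  if (error) throw new Error(`Failed to record interaction: ${error.message}`);
}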
Monthly Active Users: 10,000-100,000
Daily Interactions: 100k-1M
Infrastructure Cost: $2,000-5,000/month
Stack Upgrades:
Frontend: Vercel Edge Network (same)
Backend: Next.js Server Actions (same)
Database: Supabase Pro → Team plan
- Connection pooling (pgBouncer)
- Read replicas for analytics
- 100GB storage
AI: Anthropic Claude API + Response caching
Cache: Upstash Redis (Vercel KV)
- Cache AI responses (24h TTL)
- Cache user sessions
- Rate limiting
CDN: CloudFlare in front of Vercel (optional)
Monitoring: Vercel Analytics + Sentry
Architecture Pattern:
// lib/cache/redis.ts
import { Redis } from '@upstash/redis';
const redis = Redis.fromEnv();
// Note: Upstash automatically JSON-serializes values on write and
// deserializes them on read, so objects pass through directly —
// no manual JSON.stringify/JSON.parse needed.
export async function getCachedAIResponse<T>(cacheKey: string): Promise<T | null> {
  return await redis.get<T>(cacheKey);
}
export async function setCachedAIResponse(
  cacheKey: string,
  response: unknown,
  ttlSeconds: number = 86400 // 24 hours
) {
  await redis.setex(cacheKey, ttlSeconds, response);
}
// Usage in AI feedback generation
export async function generateFeedback(context: FeedbackContext): Promise<AIFeedback> {
  const cacheKey = `feedback:${context.case.id}:${context.student_answer.selected_answer_id}`;
  // Try cache first — the same case/answer pair is shared by many students
  const cached = await getCachedAIResponse<AIFeedback>(cacheKey);
  if (cached) {
    console.log('Cache hit for feedback');
    return cached;
  }
  // Generate new feedback
  const feedback = await callClaudeAPI(context);
  // Cache for future students
  await setCachedAIResponse(cacheKey, feedback);
  return feedback;
}
Database Optimizations:
-- Partition interactions table by month (reduces query time)
-- The parent table must be declared partitioned first:
CREATE TABLE interactions (
    -- ... columns ...
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);
CREATE TABLE interactions_2025_01 PARTITION OF interactions
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE interactions_2025_02 PARTITION OF interactions
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Index for hot queries (cascades to each partition).
-- Note: a partial predicate like `WHERE created_at > NOW() - INTERVAL '30 days'`
-- is invalid — index predicates must be immutable — and CONCURRENTLY is not
-- supported on partitioned parents; partition pruning keeps scans small instead.
CREATE INDEX idx_interactions_user_recent
ON interactions(user_id, created_at DESC);
-- Materialized view for dashboard stats (refresh every hour)
CREATE MATERIALIZED VIEW user_stats_cache AS
SELECT
user_id,
COUNT(*) as total_cases,
AVG(CASE WHEN is_correct THEN 1.0 ELSE 0.0 END) as success_rate,
MAX(created_at) as last_activity
FROM interactions
GROUP BY user_id;
CREATE UNIQUE INDEX ON user_stats_cache(user_id);
-- Auto-refresh via pg_cron
SELECT cron.schedule('refresh-user-stats', '0 * * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats_cache');
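Dashboard reads then come from the hourly cache instead of aggregating `interactions` on every page load; a sketch in the same Supabase client style used elsewhere in this document:
// Dashboard read path: one indexed lookup against the materialized view
const { data: stats } = await db
  .from('user_stats_cache')
  .select('total_cases, success_rate, last_activity')
  .eq('user_id', userId)
  .single();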
Expected Performance:
Monthly Active Users: 100,000-1,000,000
Daily Interactions: 1M-10M
Infrastructure Cost: $10,000-30,000/month
Major Architecture Changes:
Shard by User ID (most queries are user-scoped):
// Shard 0: users with ID hash % 4 = 0
// Shard 1: users with ID hash % 4 = 1
// Shard 2: users with ID hash % 4 = 2
// Shard 3: users with ID hash % 4 = 3
// Routing logic in application
import { createHash } from 'crypto';
// Stable hash of the user ID — any deterministic hash works,
// as long as every service uses the same one
function hashUserId(userId: string): number {
  return createHash('md5').update(userId).digest().readUInt32BE(0);
}
function getShardForUser(userId: string): number {
  return hashUserId(userId) % 4;
}
// Connection pool per shard
const shardConnections: Record<number, ReturnType<typeof createSupabaseClient>> = {
  0: createSupabaseClient(SHARD_0_URL),
  1: createSupabaseClient(SHARD_1_URL),
  2: createSupabaseClient(SHARD_2_URL),
  3: createSupabaseClient(SHARD_3_URL),
};
export function getDbForUser(userId: string) {
  const shard = getShardForUser(userId);
  return shardConnections[shard];
}
Cross-shard queries (leaderboards, analytics) go to read replicas or data warehouse.
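For the occasional global read, a fan-out over all shards plus an application-side merge is usually enough. A hypothetical sketch reusing `shardConnections` and the `user_stats_cache` view from above — point these clients at replicas, never at the primaries:
// Hypothetical cross-shard fan-out for a global leaderboard
export async function getGlobalLeaderboard(limit = 100) {
  // Query every shard in parallel
  const perShard = await Promise.all(
    Object.values(shardConnections).map((db) =>
      db
        .from('user_stats_cache')
        .select('user_id, total_cases, success_rate')
        .order('success_rate', { ascending: false })
        .limit(limit)
    )
  );
  // Merge and re-rank in the application
  return perShard
    .flatMap((result) => result.data ?? [])
    .sort((a, b) => b.success_rate - a.success_rate)
    .slice(0, limit);
}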
Problem: Claude API costs scale linearly (1M+ users means $50k+/month in AI costs)
Solution: Multi-tier AI strategy
// Tier 1: Pre-computed responses (instant, free)
// For common case + answer combinations (80% of traffic)
const { data: precomputedFeedback } = await db
  .from('precomputed_feedback')
  .select('*')
  .eq('case_id', caseId)
  .eq('selected_answer', answerId)
  .maybeSingle();
if (precomputedFeedback) return precomputedFeedback;
// Tier 2: Cached responses (fast, cheap)
// For less common combinations (15% of traffic)
// (Upstash deserializes JSON values automatically)
const cached = await redis.get<AIFeedback>(`feedback:${caseId}:${answerId}`);
if (cached) return cached;
// Tier 3: Real-time AI generation (slow, expensive)
// For rare combinations or premium users (5% of traffic)
const feedback = await generateWithClaude(context);
await redis.setex(`feedback:${caseId}:${answerId}`, 86400, feedback);
return feedback;
Cost Impact:
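A back-of-envelope illustration — the per-call price below is an assumed placeholder, not a measured rate:
// Assumed average cost per uncached Claude call (illustrative only)
const COST_PER_CALL = 0.01;
const DAILY_CALLS = 1_000_000;
const naiveCost = DAILY_CALLS * COST_PER_CALL;         // $10,000/day if every call hits the API
const tieredCost = DAILY_CALLS * 0.05 * COST_PER_CALL; // $500/day — only Tier 3 (5%) pays full price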
Move heavy operations off request path:
// lib/jobs/queue.ts
import { Inngest } from 'inngest';
// Recent Inngest SDKs identify the client and functions by `id`
const inngest = new Inngest({ id: 'medcards' });
// Heavy operations run async
export const calculateUserMetrics = inngest.createFunction(
  { id: 'calculate-user-metrics' },
  { event: 'user/interaction.created' },
  async ({ event }) => {
    const userId = event.data.userId;
    // Recalculate all user stats
    const stats = await computeComprehensiveStats(userId);
    // Update database
    await db.from('users').update({ progress: stats }).eq('id', userId);
    // Check for badge unlocks
    await checkBadgeUnlocks(userId, stats);
    // Update leaderboards
    await updateLeaderboards(userId, stats);
  }
);
// Badge unlock notifications
export const notifyBadgeUnlock = inngest.createFunction(
  { id: 'notify-badge-unlock' },
  { event: 'badge/unlocked' },
  async ({ event }) => {
    // Send email
    // Push notification
    // Update UI via WebSocket
  }
);
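The request path then only has to emit the event; everything above runs out of band:
// In the answer-submission action: emit the event and return immediately
await inngest.send({
  name: 'user/interaction.created',
  data: { userId: user.id },
});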
Benefits:
- The request path returns as soon as the interaction is written
- Failed jobs retry automatically instead of failing the user's request
- Traffic spikes queue up instead of overloading the database
// lib/db/routing.ts
// Write operations ā Primary database
export async function writeInteraction(data: InteractionData) {
return await primaryDb.from('interactions').insert(data);
}
// Read operations ā Read replicas (distribute load)
const readReplicas = [replicaDb1, replicaDb2, replicaDb3];
let currentReplica = 0;
export async function getUserInteractions(userId: string) {
const db = readReplicas[currentReplica % readReplicas.length];
currentReplica++;
return await db
.from('interactions')
.select('*')
.eq('user_id', userId)
.order('created_at', { ascending: false })
.limit(20);
}
// next.config.ts
// Note: newer Next.js versions replace `domains` with `remotePatterns`
// and the built-in named loaders with a custom `loaderFile`
export default {
  images: {
    loader: 'cloudinary', // Or imgix, cloudflare
    domains: ['res.cloudinary.com'],
  },
  // Serve heavy assets from CDN
  assetPrefix: process.env.CDN_URL,
};
Asset Strategy:
Monthly Active Users: 1M+
Daily Interactions: 10M+
Infrastructure Cost: $50,000-100,000/month
Full Microservices Architecture:
┌───────────────────────────────────────────────────────────┐
│                       CloudFlare CDN                      │
└─────────────────────────────┬─────────────────────────────┘
                              │
                ┌─────────────┴─────────────┐
                │       Load Balancer       │
                └─────────────┬─────────────┘
                              │
              ┌───────────────┴────────────────┐
              │                                │
      ┌───────┴────────┐              ┌────────┴────────┐
      │  Web Frontend  │              │   Mobile API    │
      │ (Vercel Edge)  │              │   (Dedicated)   │
      └───────┬────────┘              └────────┬────────┘
              │                                │
              └───────────────┬────────────────┘
                              │
              ┌───────────────┴─────────────────┐
              │           API Gateway           │
              │      (Rate limiting, Auth)      │
              └───────────────┬─────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
┌────────┴─────────┐   ┌──────┴──────┐   ┌─────────┴──────────┐
│   User Service   │   │    Cache    │   │    Case Service    │
│    (Supabase)    │   │   (Redis)   │   │   (Dedicated DB)   │
└────────┬─────────┘   └─────────────┘   └─────────┬──────────┘
         │             ┌──────────────┐            │
         └────────────►│  AI Service  │◄───────────┘
                       │  (Claude +   │
                       │  Fine-tune)  │
                       └──────┬───────┘
                              │
                   ┌──────────┴──────────┐
                   │  Analytics Service  │
                   │    (ClickHouse)     │
                   └─────────────────────┘
Service Breakdown:
| Service | Tech | Purpose |
|---|---|---|
| User Service | Supabase | User profiles, auth, progress |
| Case Service | Dedicated PostgreSQL | Clinical cases, interactions |
| AI Service | Claude API + Custom models | Feedback, coaching, adaptive |
| Analytics | ClickHouse | Real-time analytics, dashboards |
| Search | Elasticsearch | Case search, user search |
| Notifications | Pusher / Socket.io | Real-time updates |
| Jobs | Temporal | Background processing |
| Cache | Redis Cluster | Multi-layer caching |
At 10,000 users:
Vercel Pro: $20/month
Supabase Pro: $25/month
Anthropic API: $300/month (100k AI calls)
Domain + SSL: $15/month
Monitoring: $50/month
──────────────────────────────────
TOTAL: $410/month
Cost per user: $0.041/month
At 100,000 users:
Vercel Enterprise: $500/month
Supabase Team: $599/month
Anthropic API: $1,500/month (500k AI calls, 70% cached)
Upstash Redis: $200/month
CloudFlare Pro: $20/month
Sentry: $100/month
──────────────────────────────────
TOTAL: $2,919/month
Cost per user: $0.029/month
At 1,000,000 users:
Vercel Enterprise: $2,000/month
Supabase (4 shards): $2,400/month ($600 each)
Anthropic API: $4,500/month (cached 95%)
Redis Cluster: $1,000/month
CloudFlare: $200/month
Sentry: $500/month
Inngest (jobs): $300/month
Datadog: $800/month
──────────────────────────────────
TOTAL: $11,700/month
Cost per user: $0.012/month
Key Insight: Cost per user DECREASES as you scale (economies of scale).
Monthly Uptime Target: 99.9%
Error Budget: 0.1% ≈ 43 minutes downtime/month
Week 1: 5 minutes → 38 minutes left
Week 2: 10 minutes → 28 minutes left
Week 3: 30 minutes → -2 minutes (EXCEEDED!)
→ Freeze feature releases
→ Focus on stability
→ Root cause analysis
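The budget itself is plain arithmetic; a one-liner worth keeping next to the dashboard:
// 99.9% monthly uptime target leaves a 0.1% error budget
const MINUTES_PER_MONTH = 30 * 24 * 60;               // 43,200 minutes
const errorBudgetMinutes = MINUTES_PER_MONTH * 0.001; // ≈ 43 minutes/month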
// lib/monitoring/metrics.ts
import * as Sentry from '@sentry/nextjs';
import { track } from '@vercel/analytics/server'; // server-side variant
// Track all API calls
export async function monitoredAPICall<T>(
operation: string,
fn: () => Promise<T>
): Promise<T> {
const startTime = Date.now();
try {
const result = await fn();
const duration = Date.now() - startTime;
// Success metrics
track('api_call_success', {
operation,
duration,
});
return result;
} catch (error) {
// Error tracking
Sentry.captureException(error, {
tags: { operation },
extra: { duration: Date.now() - startTime },
});
// Error metrics (narrow the `unknown` error before reading .message)
track('api_call_error', {
operation,
error: error instanceof Error ? error.message : String(error),
});
throw error;
}
}
// Usage
export async function submitAnswer(data: AnswerData) {
return monitoredAPICall('submit_answer', async () => {
// ... actual implementation
});
}
alerts:
- name: High Error Rate
condition: error_rate > 5%
window: 5 minutes
severity: critical
notify: pagerduty
- name: Slow API Responses
condition: p95_latency > 2 seconds
window: 10 minutes
severity: warning
notify: slack
- name: Database Connection Pool Exhaustion
condition: available_connections < 10
severity: critical
notify: pagerduty
- name: AI API Rate Limit Approaching
condition: anthropic_remaining_requests < 100
severity: warning
notify: slack
- name: Daily Active Users Drop
condition: dau_vs_yesterday_decrease > 20%
severity: warning
notify: slack
Month 1: 100 users
Month 3: 1,000 users (10x growth)
Month 6: 10,000 users (10x growth)
Month 12: 50,000 users (5x growth)
Month 18: 150,000 users (3x growth)
Month 24: 500,000 users (3.3x growth)
| Metric | Trigger | Action |
|---|---|---|
| Database CPU | >70% for 1h | Add read replica |
| Database Storage | >80% used | Upgrade plan OR archive old data |
| API Error Rate | >5% for 5min | Scale up serverless OR rollback |
| Redis Memory | >80% used | Upgrade OR implement LRU eviction |
| AI API Costs | >$10k/month | Implement aggressive caching |
At 10k users:
At 50k users:
At 100k users:
# Blue-Green Deployment on Vercel
1. Deploy new version to staging
2. Run smoke tests
3. Deploy to production (Vercel handles canary rollout)
4. Monitor error rates for 15 minutes
5. If errors spike: automatic rollback
6. If stable: full rollout
// migrations/0015_add_community_cases.ts
export async function up() {
// Safe migration: additive only
await db.schema
.createTable('community_cases')
.addColumn('id', 'uuid', (col) => col.primaryKey())
.addColumn('created_at', 'timestamp')
// ... other columns
.execute();
}
export async function down() {
// Rollback (but never run in production!)
await db.schema.dropTable('community_cases').execute();
}
Migration Rules:
- Additive changes only: new tables and nullable columns
- Never drop or rename a column the running version still reads
- Deploy code that handles both schemas before migrating
- Down migrations exist for local dev, never for production
// middleware.ts
import { NextResponse } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const ratelimit = new Ratelimit({
redis: Redis.fromEnv(),
limiter: Ratelimit.slidingWindow(100, '1 m'), // 100 requests per minute
});
export async function middleware(request: Request) {
const ip = request.headers.get('x-forwarded-for') ?? 'unknown';
const { success, limit, remaining } = await ratelimit.limit(ip);
if (!success) {
return new Response('Rate limit exceeded', { status: 429 });
}
return NextResponse.next();
}
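To keep the limiter off static assets, the middleware can be scoped with a matcher (paths here are illustrative — match them to the app's actual routes):
// Run the rate limiter only on API routes, not on static assets
export const config = {
  matcher: ['/api/:path*'],
};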
CloudFlare WAF → Vercel → Application
- CloudFlare: Block malicious IPs, rate limit per IP
- Vercel: Edge protection, DDoS mitigation
- Application: User-level rate limits
- At Rest: Supabase encrypts all data (AES-256)
- In Transit: TLS 1.3 everywhere
- Backups: Encrypted, geographically distributed
- Secrets: Managed via Vercel environment variables
-- ClickHouse table for real-time analytics (better than PostgreSQL for OLAP)
CREATE TABLE analytics.interactions (
user_id UUID,
case_id UUID,
is_correct Boolean,
time_to_answer Int32,
created_at DateTime,
specialty String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (created_at, user_id);
-- Fast aggregations
SELECT
specialty,
COUNT(*) as total,
AVG(is_correct) as success_rate
FROM analytics.interactions
WHERE created_at > now() - INTERVAL 7 DAY
GROUP BY specialty;
-- Executes in <50ms on 100M rows
Operational DB (PostgreSQL) → CDC → Data Warehouse (ClickHouse)
                                              ↓
                              Analytics Dashboard (Metabase/Looker)
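On the PostgreSQL side, this pipeline typically rides logical replication: create a publication and let the CDC connector (Debezium or similar — the tool choice is an assumption here) stream it into ClickHouse:
-- Requires wal_level = 'logical' on the operational database
CREATE PUBLICATION analytics_cdc FOR TABLE interactions;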
MVP (0-10k): Simple stack, manual processes, good enough
Growth (10-100k): Add caching, optimize database, automate
Scale (100k-1M): Sharding, microservices, background jobs
Platform (1M+): Full distribution, dedicated services, ML ops
Philosophy: Scale progressively, not prematurely.
Build what you need TODAY, architect for TOMORROW.
Next Steps: Implement MVP stack, monitor metrics, scale when triggers hit.