Building a Singapore Food Classifier — Part 2: Deploying an ML App on Free Tiers
Part 2 of building Hawker AI — this time covering the engineering: how the app works, how it’s deployed, and how it runs for free.
If you missed Part 1, it covers the data collection, training, and accuracy analysis.
The goal
Ship a working ML app that:
- Lets anyone upload a photo and get a prediction
- Runs inference on a real PyTorch model (not a toy example)
- Runs entirely on free tiers for portfolio-level traffic
- Is secure enough that someone can’t rack up a bill by hitting the API
Here’s how.
Architecture overview
Browser → Vercel (Next.js) → Modal (FastAPI + PyTorch) ← Azure Blob Storage
Three services, three free tiers, zero servers to manage.
Vercel hosts the frontend and a server-side API route. The user never talks to Modal directly — Vercel acts as a proxy, adding the API key before forwarding the request.
Modal runs the actual inference. It spins up a container with PyTorch and the model weights, processes the image, and shuts down after 5 minutes of inactivity. Pay-per-call pricing means idle time costs nothing.
Azure Blob Storage holds the model weights (78MB) and training images. Modal downloads the weights once when building its container image, then caches them. Azure isn’t involved in serving requests.
Why these tools?
I evaluated a few options for each layer:
Serving the model
| Option | Pros | Cons |
|---|---|---|
| Modal | Pay-per-call, auto-scaling, bake weights into image | Cold starts (~3s) |
| AWS Lambda | Mature, well-documented | 250MB unzipped limit for zip packages (PyTorch is 800MB+) |
| Railway/Fly.io | Always-on, no cold starts | Costs money even when idle |
| Self-hosted | Full control | Have to manage a server |
Modal won because PyTorch doesn’t fit in a Lambda, and I didn’t want to pay for an always-on server for a portfolio project. The cold start is real (~3 seconds on first request), but acceptable.
Frontend hosting
Vercel was the obvious choice for Next.js. Free tier, edge CDN, serverless functions. The critical feature: server-side API routes. This lets me keep the Modal endpoint URL and API key on the server — they never touch the browser.
Model storage
I was already using Azure for other things, so Azure Blob Storage made sense. The model is only 78MB — well within free tier limits. The key insight: Modal can download the weights during its image build step and cache them, so Azure is only contacted during deploys, not during requests.
The inference pipeline
Modal setup
The Modal app does three things at build time:
- Installs Python dependencies (PyTorch, FastAPI, etc.)
- Downloads the model weights from Azure Blob Storage
- Copies the dish metadata files (dishes.json, nutrition.json, class_index.json)
import os

import modal
from azure.storage.blob import BlobServiceClient


def download_weights():
    """Runs during Modal image build, not on every request."""
    sas_url = os.environ["AZURE_BLOB_SAS_URL"]
    blob_service = BlobServiceClient(account_url=sas_url)
    container = blob_service.get_container_client("models")
    os.makedirs("/app/weights", exist_ok=True)
    # Stream the blob straight into the file that gets baked into the image
    with open("/app/weights/hawker_efficientnet.pt", "wb") as f:
        container.download_blob("latest/hawker_efficientnet.pt").readinto(f)


image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "torchvision", "fastapi", "azure-storage-blob", ...)
    .run_function(download_weights, secrets=[...])
    .add_local_dir("data", remote_path="/app/data")
)
The weights are baked into the container image. Modal caches this image, so subsequent deploys only rebuild if code or dependencies change. This keeps deploy times fast and means Azure Blob Storage isn’t a runtime dependency.
Container lifecycle
When a request comes in:
- Cold start (~3s): Modal spins up a container, loads the model into memory
- Inference (~300ms): Image is processed, top-3 predictions returned
- Warm period (5 min): Container stays alive for subsequent requests — these skip the cold start
- Scale down: After 5 minutes of no requests, the container shuts down (cost drops to $0)
For a portfolio project, most requests will hit a cold start. For production traffic, the warm period handles bursts well.
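The cold/warm difference comes down to loading the model once per container and reusing it afterwards. Here is a minimal sketch of that pattern in plain Python. The names `get_model`, `handle_request`, and `LOAD_CALLS` are hypothetical, and the "model" is a stand-in dictionary rather than a real PyTorch load, so treat it as an illustration of the idea and not the app's actual code.

```python
from functools import lru_cache

LOAD_CALLS = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def get_model() -> dict:
    """Hypothetical stand-in for loading the weights into memory."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"weights": "/app/weights/hawker_efficientnet.pt"}


def handle_request(image_bytes: bytes) -> dict:
    model = get_model()  # slow on the first call, cached afterwards
    return {"model": model["weights"], "bytes_received": len(image_bytes)}


first = handle_request(b"fake-image-1")   # "cold": triggers the load
second = handle_request(b"fake-image-2")  # "warm": reuses the cached model
print(f"model loads: {LOAD_CALLS}")
```

Only the first request pays for the load. Every later request in the same container reuses the cached object until the container scales down.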
Cost math
Each inference call uses 2 CPU-seconds. At Modal’s pricing ($0.0002/CPU-second), that’s roughly $0.0004 per prediction. The free tier gives $30/month — enough for about 75,000 predictions. My portfolio gets maybe 50 visits a month.
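The arithmetic, spelled out as a small Python sketch. The figures are the ones quoted above (2 CPU-seconds per call, $0.0002 per CPU-second, $30 of credit, about 50 visits a month); the variable names are just for illustration.

```python
CPU_SECONDS_PER_CALL = 2
PRICE_PER_CPU_SECOND = 0.0002  # USD, the figure quoted above
FREE_CREDIT = 30.0             # USD per month
VISITS_PER_MONTH = 50

cost_per_call = CPU_SECONDS_PER_CALL * PRICE_PER_CPU_SECOND
free_calls = round(FREE_CREDIT / cost_per_call)
monthly_cost = VISITS_PER_MONTH * cost_per_call

print(f"cost per prediction: ${cost_per_call:.4f}")
print(f"predictions covered by the credit: {free_calls:,}")
print(f"cost at {VISITS_PER_MONTH} predictions/month: ${monthly_cost:.2f}")
```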
Security: keeping the API key out of the browser
This is the part I see most tutorials skip. If you put your API endpoint URL in a NEXT_PUBLIC_ env var, anyone can open DevTools, grab the URL, and hit it directly — bypassing your frontend and potentially running up your bill.
The proxy pattern
Instead of calling Modal from the browser:
Browser → Modal (API key in JavaScript — BAD)
I route through a Next.js API route:
Browser → /api/predict (Vercel) → Modal (API key on server — GOOD)
The browser sends the image to /api/predict on the same domain. The API route (running server-side on Vercel) reads the MODAL_ENDPOINT_URL and HAWKER_AI_API_KEY from environment variables — which are never included in the JavaScript bundle — adds the X-API-Key header, and forwards the request to Modal.
// src/app/api/predict/route.ts (runs on the server, not in the browser)
const res = await fetch(`${process.env.MODAL_ENDPOINT_URL}/predict`, {
  method: "POST",
  headers: { "X-API-Key": process.env.HAWKER_AI_API_KEY },
  body: upstream,
});
The user never sees the Modal URL or the API key. DevTools shows a request to /api/predict on your own domain — nothing to steal.
Rate limiting
The API route also enforces rate limiting — 10 requests per minute per IP address:
const RATE_LIMIT = 10;
const WINDOW_MS = 60_000;
const hits = new Map<string, { count: number; reset: number }>();

function isRateLimited(ip: string): boolean {
  const now = Date.now();
  const entry = hits.get(ip);
  if (!entry || now > entry.reset) {
    hits.set(ip, { count: 1, reset: now + WINDOW_MS });
    return false;
  }
  entry.count++;
  return entry.count > RATE_LIMIT;
}
This is an in-memory rate limiter — it resets when the serverless function cold-starts, so it’s not bulletproof. But it stops casual abuse, which is all a portfolio project needs. For production, you’d use Redis or Vercel’s KV-backed rate limiting.
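To see how the fixed window behaves, here is the same logic ported to Python with the clock passed in as an argument, so it can be exercised without waiting a minute. This is an illustrative port, not the code running on Vercel; `is_rate_limited` and `now_ms` are names chosen for the sketch, and the IP is a documentation-range placeholder.

```python
RATE_LIMIT = 10
WINDOW_MS = 60_000
hits: dict[str, dict[str, int]] = {}


def is_rate_limited(ip: str, now_ms: int) -> bool:
    entry = hits.get(ip)
    # No entry yet, or the previous window has expired: start a fresh window.
    if entry is None or now_ms > entry["reset"]:
        hits[ip] = {"count": 1, "reset": now_ms + WINDOW_MS}
        return False
    entry["count"] += 1
    return entry["count"] > RATE_LIMIT


# Eleven requests from one IP inside the same window...
results = [is_rate_limited("203.0.113.7", now_ms=1_000) for _ in range(11)]
# ...then one more request just after the window has expired.
after_window = is_rate_limited("203.0.113.7", now_ms=1_000 + WINDOW_MS + 1)
```

The first ten calls are allowed, the eleventh is blocked, and the request after the window expires is allowed again because the counter resets.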
Modal-side protection
On the Modal side, additional limits prevent runaway costs:
- max_containers=2: even under heavy load, Modal won’t spin up more than 2 containers
- scaledown_window=300: containers shut down after 5 minutes idle
- API key validation: requests without a valid X-API-Key header get a 401
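The API key check can be sketched as a small pure function. This is an assumption about how such a check might look, not the app's actual handler: `check_api_key` is a hypothetical name, and it returns a status code instead of raising a framework-specific exception so that it runs with the standard library alone.

```python
import hmac


def check_api_key(headers: dict[str, str], expected_key: str) -> int:
    """Return the HTTP status an auth check would produce: 200 or 401."""
    # Fail closed if the server-side key was never configured.
    if not expected_key:
        return 401
    # HTTP header names are case-insensitive, so normalise before lookup.
    supplied = {k.lower(): v for k, v in headers.items()}.get("x-api-key", "")
    # compare_digest takes the same time regardless of where the strings differ.
    ok = hmac.compare_digest(supplied.encode(), expected_key.encode())
    return 200 if ok else 401


print(check_api_key({"X-API-Key": "s3cret"}, "s3cret"))
print(check_api_key({}, "s3cret"))
```

Using `hmac.compare_digest` instead of `==` is a small hardening step: it avoids leaking, through response timing, how much of a guessed key was correct.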
The full security stack
| Layer | What it stops |
|---|---|
| Vercel edge | DDoS at the network level |
| API route rate limit | One user spamming the endpoint |
| API route file size check | Oversized uploads (anything over 10MB) |
| Server-side API key | Direct calls to Modal without the key |
| Modal max containers | Cost cap even if rate limiting fails |
| Modal scale-down | No idle cost accumulation |
CI/CD: GitHub Actions with manual triggers
I chose manual deploys (workflow_dispatch) over auto-deploy-on-push for a specific reason: the backend deploy downloads model weights from Azure and builds a Docker image on Modal. If that breaks, I want to be the one who triggered it — not a random push to fix a typo in the README.
Backend workflow
name: Deploy Backend to Modal
on:
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install modal
      - run: modal deploy backend/modal_app.py
        env:
          MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }}
          MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }}
Modal’s CLI handles the heavy lifting — it builds the image (running download_weights() which pulls from Azure), pushes it to Modal’s registry, and deploys the FastAPI app.
Frontend workflow
name: Deploy Frontend to Vercel
on:
  workflow_dispatch:
# Project identifiers the Vercel CLI reads from the environment
env:
  VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
  VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm i -g vercel@latest
      - run: vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}
      - run: vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}
      - run: vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}
The vercel pull step fetches the environment variables (including the sensitive MODAL_ENDPOINT_URL and HAWKER_AI_API_KEY) from the Vercel dashboard. These get injected into the serverless function at runtime.
Secrets management
Secrets live in three places, each scoped to its platform:
| Where | Secrets | Used by |
|---|---|---|
| GitHub Actions | MODAL_TOKEN_ID, MODAL_TOKEN_SECRET, VERCEL_TOKEN, VERCEL_ORG_ID, VERCEL_PROJECT_ID | CI/CD workflows |
| Modal dashboard | AZURE_BLOB_SAS_URL (for weight download), HAWKER_AI_API_KEY (for auth) | Modal app |
| Vercel dashboard | MODAL_ENDPOINT_URL, HAWKER_AI_API_KEY | Next.js API route |
Nothing is in the codebase.
The frontend
The frontend is straightforward — Next.js 14 with Material UI (MUI) components and a yellow colour theme. I initially built it with Tailwind, but it looked too plain, so I switched to MUI for the card layouts, progress bars, and chips.
The upload flow
- User drags or clicks to upload an image (react-dropzone handles this)
- Image preview is shown immediately (via URL.createObjectURL)
- File is sent as multipart/form-data to /api/predict
- Loading spinner while waiting for Modal inference
- Results displayed: dish name, confidence score, nutrition data, top-3 alternatives
The training page
I added a /training page that shows the full accuracy breakdown, known issues, and confusion pairs. It’s essentially the evaluation results rendered as an interactive page — more useful than a static table in the README.
This was a deliberate choice: showing not just the result but the analysis behind it. “Here’s what works, here’s what doesn’t, here’s why” is more useful than a cherry-picked demo.
Lessons learned
1. The proxy pattern is non-negotiable
Any time you’re calling a paid API from a frontend, route through a server-side proxy. It takes 20 lines of code and prevents your API key (and your bill) from being exposed to anyone with a browser.
2. Modal is perfect for portfolio ML projects
Pay nothing when idle, pay fractions of a cent when used. The cold start trade-off is real but acceptable. For production traffic, you’d want an always-on option — but for portfolios, side projects, and demos, it’s ideal.
3. Bake weights into the image, don’t download at runtime
Downloading 78MB from Azure on every cold start would add 5-10 seconds of latency. By downloading during the image build and caching, the weights are already on disk when the container starts. Cold start drops from roughly 8-13s to ~3s.
4. Manual deploys are fine
Auto-deploy-on-push is great for simple web apps. For ML apps where the deploy involves downloading model weights and building container images, manual triggers give you more control and fewer surprise failures.
5. Edge runtime can’t read sensitive env vars on Vercel
I initially used export const runtime = "edge" for the API route. It worked locally but failed in production: in my deployment, the edge function couldn’t read the encrypted environment variables. Switching to export const runtime = "nodejs" fixed it. The trade-off (slightly higher cold start) is invisible when the downstream call to Modal takes 300ms-3s anyway.
Cost breakdown
| Service | Free tier | My usage | Monthly cost |
|---|---|---|---|
| Vercel (Hobby) | 100GB bandwidth, serverless functions | ~50 page views | $0 |
| Modal | $30 free credit | ~50 predictions | ~$0.02 of usage, covered by the credit |
| Azure Blob Storage | 5GB, 20K transactions | 78MB stored, ~2 deploys | $0 |
| RunPod (training) | None | One-time: 45 min A100 | $3 (one-time) |
| Total monthly | | | $0 |
The entire app — frontend, backend, model serving, CI/CD — runs on free tiers. The only cost was $3 for training, which is a one-time expense.
Wrapping up
The full stack:
- 78MB model trained on 4,000 images for $3
- 30 hawker dishes classified at 79.3% accuracy
- ~300ms inference on CPU (no GPU needed for serving)
- Free-tier running cost
- Secure — API key never leaves the server, rate limited, containerised
Is it production-ready? No. Economy rice sits at 35% accuracy, the rate limiter resets on cold start, and the cold start itself is 3 seconds. But as a portfolio project that demonstrates data collection, model training, evaluation analysis, full-stack deployment, and security, it does the job.
Try it: hawker-ai.vercel.app