Jun 21, 2026

Building a Singapore Food Classifier — Part 2: Deploying an ML App on Free Tiers

Part 2 of building Hawker AI — this time covering the engineering: how the app works, how it’s deployed, and how it runs for free.

If you missed Part 1, it covers the data collection, training, and accuracy analysis.

The goal

Ship a working ML app that:

Lets anyone upload a photo and get a prediction
Runs inference on a real PyTorch model (not a toy example)
Runs entirely on free tiers for portfolio-level traffic
Is secure enough that someone can’t rack up a bill by hitting the API

Here’s how.

Architecture overview

Browser → Vercel (Next.js) → Modal (FastAPI + PyTorch) ← Azure Blob Storage

Three services, three free tiers, zero servers to manage.

Vercel hosts the frontend and a server-side API route. The user never talks to Modal directly — Vercel acts as a proxy, adding the API key before forwarding the request.

Modal runs the actual inference. It spins up a container with PyTorch and the model weights, processes the image, and shuts down after 5 minutes of inactivity. Pay-per-call pricing means idle time costs nothing.

Azure Blob Storage holds the model weights (78MB) and training images. Modal downloads the weights once when building its container image, then caches them. Azure isn’t involved in serving requests.

Why these tools?

I evaluated a few options for each layer:

Serving the model

Option	Pros	Cons
Modal	Pay-per-call, auto-scaling, bake weights into image	Cold starts (~3s)
AWS Lambda	Mature, well-documented	250MB limit on zip packages (PyTorch is 800MB+)
Railway/Fly.io	Always-on, no cold starts	Costs money even when idle
Self-hosted	Full control	Have to manage a server

Modal won because PyTorch doesn’t fit in a Lambda zip package, and I didn’t want to pay for an always-on server for a portfolio project. The cold start is real (~3 seconds on first request), but acceptable.

Frontend hosting

Vercel was the obvious choice for Next.js. Free tier, edge CDN, serverless functions. The critical feature: server-side API routes. This lets me keep the Modal endpoint URL and API key on the server — they never touch the browser.

Model storage

I was already using Azure for other things, so Azure Blob Storage made sense. The model is only 78MB — well within free tier limits. The key insight: Modal can download the weights during its image build step and cache them, so Azure is only contacted during deploys, not during requests.

The inference pipeline

The Modal app does three things at build time:

Installs Python dependencies (PyTorch, FastAPI, etc.)
Downloads the model weights from Azure Blob Storage
Copies the dish metadata files (dishes.json, nutrition.json, class_index.json)

import os

import modal


def download_weights():
    """Runs during Modal image build, not on every request."""
    # Imported here because azure-storage-blob is only installed inside the image.
    from azure.storage.blob import BlobServiceClient

    sas_url = os.environ["AZURE_BLOB_SAS_URL"]
    blob_service = BlobServiceClient(account_url=sas_url)
    container = blob_service.get_container_client("models")

    # Stream the blob straight to disk so the weights ship inside the image.
    os.makedirs("/app/weights", exist_ok=True)
    with open("/app/weights/hawker_efficientnet.pt", "wb") as f:
        container.download_blob("latest/hawker_efficientnet.pt").readinto(f)


image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "torchvision", "fastapi", "azure-storage-blob", ...)
    .run_function(download_weights, secrets=[...])
    .add_local_dir("data", remote_path="/app/data")
)

The weights are baked into the container image. Modal caches this image, so subsequent deploys only rebuild if code or dependencies change. This keeps deploy times fast and means Azure Blob Storage isn’t a runtime dependency.

Container lifecycle

When a request comes in:

Cold start (~3s): Modal spins up a container, loads the model into memory
Inference (~300ms): Image is processed, top-3 predictions returned
Warm period (5 min): Container stays alive for subsequent requests — these skip the cold start
Scale down: After 5 minutes of no requests, the container shuts down (cost drops to $0)

For a portfolio project, most requests will hit a cold start. For production traffic, the warm period handles bursts well.
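To make the lifecycle concrete, here is a small Python simulation. It is a sketch, not Modal's actual scheduler: it assumes a single container and the 5-minute scale-down window described above, ignores how long the cold start and inference themselves take, and uses hypothetical names (`classify_requests`, `SCALEDOWN_WINDOW_S`).

```python
SCALEDOWN_WINDOW_S = 300  # 5-minute idle window described above


def classify_requests(timestamps_s, window_s=SCALEDOWN_WINDOW_S):
    """Label each request 'cold' or 'warm' for a single-container model.

    A request is warm if the previous request arrived at most window_s
    seconds earlier; otherwise the container has scaled down and it is cold.
    """
    labels = []
    last_seen = None
    for t in sorted(timestamps_s):
        if last_seen is None or t - last_seen > window_s:
            labels.append("cold")
        else:
            labels.append("warm")
        last_seen = t
    return labels


# A burst of three requests, then one more roughly an hour later.
print(classify_requests([0, 10, 200, 3800]))
```

With sparse portfolio traffic, nearly every timestamp is more than 300 seconds from the previous one, which is why most visitors see the cold path.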

Cost math

Each inference call uses ~2 CPU-seconds. At Modal’s pricing (~$0.0002/CPU-second), that’s roughly $0.0004 per prediction. The free tier gives $30/month, enough for about 75,000 predictions. My portfolio gets maybe 50 visits a month.
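The arithmetic is easy to re-run if the pricing or the per-call CPU time changes. A minimal sketch using the approximate figures above; the constant and function names are mine, and the rates are the rough estimates quoted in this post, not official pricing.

```python
CPU_SECONDS_PER_CALL = 2.0      # approximate, from the estimate above
PRICE_PER_CPU_SECOND = 0.0002   # approximate USD rate quoted above
MONTHLY_FREE_CREDIT = 30.0      # USD


def cost_per_prediction(cpu_seconds=CPU_SECONDS_PER_CALL, price=PRICE_PER_CPU_SECOND):
    """USD cost of one inference call."""
    return cpu_seconds * price


def predictions_covered(credit=MONTHLY_FREE_CREDIT):
    """Predictions the monthly credit pays for, rounded to the nearest call."""
    return round(credit / cost_per_prediction())


def monthly_cost(predictions):
    """USD cost of a month's predictions before any credit is applied."""
    return predictions * cost_per_prediction()


print(f"per prediction: ${cost_per_prediction():.4f}")
print(f"covered by credit: {predictions_covered():,}")
print(f"50 predictions: ${monthly_cost(50):.2f}")
```

At about 50 predictions a month, usage sits three orders of magnitude below what the credit covers.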

Security: keeping the API key out of the browser

This is the part I see most tutorials skip. If you put your API endpoint URL in a NEXT_PUBLIC_ env var, anyone can open DevTools, grab the URL, and hit it directly — bypassing your frontend and potentially running up your bill.

The proxy pattern

Instead of calling Modal from the browser:

Browser → Modal (API key in JavaScript — BAD)

I route through a Next.js API route:

Browser → /api/predict (Vercel) → Modal (API key on server — GOOD)

The browser sends the image to /api/predict on the same domain. The API route (running server-side on Vercel) reads the MODAL_ENDPOINT_URL and HAWKER_AI_API_KEY from environment variables — which are never included in the JavaScript bundle — adds the X-API-Key header, and forwards the request to Modal.

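// src/app/api/predict/route.ts (runs on the server, not in the browser)
// `upstream` is the request body being forwarded, built earlier in the route.
const res = await fetch(`${process.env.MODAL_ENDPOINT_URL}/predict`, {
    method: "POST",
    // Fall back to an empty string so the header value is always a string.
    headers: { "X-API-Key": process.env.HAWKER_AI_API_KEY ?? "" },
    body: upstream,
});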

The user never sees the Modal URL or the API key. DevTools shows a request to /api/predict on your own domain — nothing to steal.

Rate limiting

The API route also enforces rate limiting — 10 requests per minute per IP address:

const RATE_LIMIT = 10;
const WINDOW_MS = 60_000;
const hits = new Map<string, { count: number; reset: number }>();

function isRateLimited(ip: string): boolean {
    const now = Date.now();
    const entry = hits.get(ip);
    if (!entry || now > entry.reset) {
        hits.set(ip, { count: 1, reset: now + WINDOW_MS });
        return false;
    }
    entry.count++;
    return entry.count > RATE_LIMIT;
}

This is an in-memory rate limiter: the counts live in a single function instance, so they reset on every cold start and aren’t shared between concurrent instances. It’s not bulletproof, but it stops casual abuse, which is all a portfolio project needs. For production, you’d use Redis or Vercel’s KV-backed rate limiting.
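The fixed-window logic is easier to trace with explicit timestamps. Below is an illustrative Python port with an injectable clock. It is not the code the app runs, and `FixedWindowLimiter` and its parameters are hypothetical names.

```python
class FixedWindowLimiter:
    """Fixed-window counter per IP, mirroring the TypeScript logic above."""

    def __init__(self, limit=10, window_ms=60_000):
        self.limit = limit
        self.window_ms = window_ms
        self.hits = {}  # ip -> [count, window_reset_time_ms]

    def is_rate_limited(self, ip, now_ms):
        entry = self.hits.get(ip)
        if entry is None or now_ms > entry[1]:
            # First request from this IP, or the old window has expired.
            self.hits[ip] = [1, now_ms + self.window_ms]
            return False
        entry[0] += 1
        return entry[0] > self.limit


limiter = FixedWindowLimiter()
# Eleven requests from one (documentation-range) IP inside the same window.
results = [limiter.is_rate_limited("203.0.113.7", now_ms=1_000) for _ in range(11)]
print(results.count(False), results[-1])
# After the window expires, the same IP is allowed again.
print(limiter.is_rate_limited("203.0.113.7", now_ms=62_000))
```

Creating a fresh `FixedWindowLimiter()` wipes every count, which is the same thing that happens to the in-memory map when the serverless function cold-starts.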

On the Modal side, additional limits prevent runaway costs:

max_containers=2 — even under heavy load, Modal won’t spin up more than 2 containers
scaledown_window=300 — containers shut down after 5 minutes idle
API key validation — requests without a valid X-API-Key header get a 401
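The API key check on the Modal side can be sketched in a few lines of plain Python. This is an assumption about how such a check might look, not the backend's actual code: `check_api_key` is a hypothetical helper, and a real FastAPI app would wire the equivalent logic into a dependency or middleware.

```python
import hmac


def check_api_key(headers, expected_key):
    """Return 200 if the X-API-Key header matches expected_key, else 401."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied = normalised.get("x-api-key", "")
    # compare_digest avoids leaking how much of the key matched via timing.
    # An empty expected_key always fails, so a missing secret never means "open".
    if expected_key and hmac.compare_digest(supplied.encode(), expected_key.encode()):
        return 200
    return 401


print(check_api_key({"X-API-Key": "demo-key"}, "demo-key"))
print(check_api_key({}, "demo-key"))
```

Rejecting requests when the expected key is empty is a deliberate choice here: a misconfigured secret should close the endpoint, not open it.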

The full security stack

Layer	What it stops
Vercel edge	DDoS at the network level
API route rate limit	One user spamming the endpoint
API route file size check	10MB upload limit
Server-side API key	No one can call Modal directly
Modal max containers	Cost cap even if rate limiting fails
Modal scale-down	No idle cost accumulation
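The file size check in the table is the simplest of these layers. A minimal sketch, assuming "10MB" means 10 MiB and that oversized uploads get a 413 response; neither detail is stated above, and `check_upload_size` is a hypothetical name.

```python
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_upload_size(size_bytes, max_bytes=MAX_UPLOAD_BYTES):
    """Return an HTTP status code for an upload of the given size."""
    if size_bytes <= 0:
        return 400  # empty or malformed upload
    if size_bytes > max_bytes:
        return 413  # too large: reject before forwarding to Modal
    return 200


print(check_upload_size(2 * 1024 * 1024))   # a typical phone photo
print(check_upload_size(25 * 1024 * 1024))  # over the limit
```

Running this check in the proxy means an oversized file is rejected before it ever costs CPU time on the inference side.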

CI/CD: GitHub Actions with manual triggers

I chose manual deploys (workflow_dispatch) over auto-deploy-on-push for a specific reason: the backend deploy downloads model weights from Azure and builds a Docker image on Modal. If that breaks, I want to be the one who triggered it — not a random push to fix a typo in the README.

Backend workflow

name: Deploy Backend to Modal
on:
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install modal
      - run: modal deploy backend/modal_app.py
        env:
          MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }}
          MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }}

Modal’s CLI handles the heavy lifting — it builds the image (running download_weights() which pulls from Azure), pushes it to Modal’s registry, and deploys the FastAPI app.

Frontend workflow

name: Deploy Frontend to Vercel
on:
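  workflow_dispatch:

env:
  VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
  VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}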

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm i -g vercel@latest
      - run: vercel pull --yes --environment=production --token=${{ secrets.VERCEL_TOKEN }}
      - run: vercel build --prod --token=${{ secrets.VERCEL_TOKEN }}
      - run: vercel deploy --prebuilt --prod --token=${{ secrets.VERCEL_TOKEN }}

The vercel pull step fetches the environment variables (including the sensitive MODAL_ENDPOINT_URL and HAWKER_AI_API_KEY) from the Vercel dashboard. These get injected into the serverless function at runtime.

Secrets management

Secrets live in three places, each scoped to its platform:

Where	Secrets	Used by
GitHub Actions	`MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`, `VERCEL_TOKEN`, `VERCEL_ORG_ID`, `VERCEL_PROJECT_ID`	CI/CD workflows
Modal dashboard	`AZURE_BLOB_SAS_URL` (for weight download), `HAWKER_AI_API_KEY` (for auth)	Modal app
Vercel dashboard	`MODAL_ENDPOINT_URL`, `HAWKER_AI_API_KEY`	Next.js API route

Nothing is in the codebase.
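Since every secret arrives as an environment variable, it helps to fail fast when one is missing, instead of discovering it on the first request. A small sketch of that idea for the Python side; `require_env` is a hypothetical helper, not something from the project.

```python
import os


def require_env(names, environ=None):
    """Return the named variables as a dict, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with a stand-in mapping instead of the real environment.
fake_env = {"AZURE_BLOB_SAS_URL": "https://example.invalid/?sig=placeholder"}
try:
    require_env(["AZURE_BLOB_SAS_URL", "HAWKER_AI_API_KEY"], fake_env)
except RuntimeError as err:
    print(err)
```

The error message lists variable names only, never values, so it is safe to surface in build logs.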

The frontend

The frontend is straightforward — Next.js 14 with Material UI (MUI) components and a yellow colour theme. I initially built it with Tailwind, but it looked too plain, so I switched to MUI for the card layouts, progress bars, and chips.

The upload flow

User drags or clicks to upload an image (react-dropzone handles this)
Image preview is shown immediately (via URL.createObjectURL)
File is sent as multipart/form-data to /api/predict
Loading spinner while waiting for Modal inference
Results displayed: dish name, confidence score, nutrition data, top-3 alternatives
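The top-3 list in the last step comes from turning the model's raw scores into probabilities and keeping the three largest. A pure-Python sketch of that step follows; the real backend presumably does this with PyTorch, and the function name, field names, and example labels are illustrative assumptions.

```python
import math


def top_k_predictions(logits, class_names, k=3):
    """Softmax the logits and return the k most likely classes with confidences."""
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return [{"dish": name, "confidence": round(p, 4)} for name, p in ranked[:k]]


# Made-up logits for four illustrative labels.
names = ["chicken_rice", "laksa", "satay", "economy_rice"]
for prediction in top_k_predictions([2.0, 0.5, 1.0, -1.0], names):
    print(prediction)
```

The first entry is the headline prediction and the remaining two are the alternatives shown beneath it.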

The training page

I added a /training page that shows the full accuracy breakdown, known issues, and confusion pairs. It’s essentially the evaluation results rendered as an interactive page — more useful than a static table in the README.

This was a deliberate choice: showing not just the result but the analysis behind it. “Here’s what works, here’s what doesn’t, here’s why” is more useful than a cherry-picked demo.

Lessons learned

1. The proxy pattern is non-negotiable

Any time you’re calling a paid API from a frontend, route through a server-side proxy. It takes 20 lines of code and prevents your API key (and your bill) from being exposed to anyone with a browser.

2. Pay-per-use serving suits low-traffic apps

Pay nothing when idle, pay fractions of a cent when used. The cold start trade-off is real but acceptable. For production traffic, you’d want an always-on option, but for portfolios, side projects, and demos, it’s ideal.

3. Bake weights into the image, don’t download at runtime

Downloading 78MB from Azure on every cold start would add 5-10 seconds of latency. By downloading during the image build and caching, the weights are already on disk when the container starts. Cold start drops from as much as ~13s to ~3s.

4. Manual deploys are fine

Auto-deploy-on-push is great for simple web apps. For ML apps where the deploy involves downloading model weights and building container images, manual triggers give you more control and fewer surprise failures.

5. Edge runtime can’t read sensitive env vars on Vercel

I initially used export const runtime = "edge" for the API route. It worked locally but failed in production — edge runtime on Vercel can’t access encrypted environment variables. Switching to export const runtime = "nodejs" fixed it. The trade-off (slightly higher cold start) is invisible when the downstream call to Modal takes 300ms-3s anyway.

Cost breakdown

Service	Free tier	My usage	Monthly cost
Vercel (Hobby)	100GB bandwidth, serverless functions	~50 page views	$0
Modal	$30 free credit	~50 predictions	$0.02 (covered by free credit)
Azure Blob Storage	5GB, 20K transactions	78MB stored, ~2 deploys	$0
RunPod (training)	None	One-time: 45 min A100	$3 (one-time)
Total monthly			$0

The entire app — frontend, backend, model serving, CI/CD — runs on free tiers. The only cost was $3 for training, which is a one-time expense.

Wrapping up

The full stack:

78MB model trained on 4,000 images for $3
30 hawker dishes classified at 79.3% accuracy
~300ms inference on CPU (no GPU needed for serving)
Free-tier running cost
Secure — API key never leaves the server, rate limited, containerised

Is it production-ready? No. Economy rice is at 35%, the rate limiter resets on cold start, and the cold start itself is 3 seconds. But as a portfolio project that demonstrates data collection, model training, evaluation analysis, full-stack deployment, and security — it does the job.

Try it: hawker-ai.vercel.app
