Used Car Price Prediction API

Motivation

Why this project exists

Machine learning models for used car pricing can help dealerships, online marketplaces, and lenders set competitive listing prices quickly, flag underpriced inventory for acquisition, detect overpriced listings, and underwrite auto loans. A model with a median absolute error of about $1,300, as CatBoost achieves here, is accurate enough to automate first-pass pricing and reduce the need for manual appraisals. To capture that value, the model needs to live behind an API that is reliable, observable, and deployable.

This project demonstrates the full ML engineering lifecycle, from model selection in Part 1 to production deployment in Part 2.

Architecture

Request → Prediction pipeline

Every prediction passes through validation, fuzzy correction, imputation, feature engineering, and model inference. Feature engineering reuses the pipeline used during training, which eliminates training-serving skew.

1. Validate: Pydantic v2 checks 18 vehicle attributes
2. Fuzzy Match: "Toyata" corrected to "toyota" + warning
3. Impute: Optional fields filled with training medians
4. Engineer: Shared pipeline transforms features
5. Predict: CatBoost inference, log1p → expm1
6. Respond: JSON with price, warnings, input echo
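The "log1p → expm1" note in the Predict step suggests the model is fit on log1p(price) and its output is mapped back to dollars with expm1. A minimal sketch of that round trip follows. The function names are hypothetical, and the real code may use NumPy rather than the math module.

```python
import math


def to_training_target(price_usd: float) -> float:
    """Compress the price scale before fitting: log(1 + price)."""
    return math.log1p(price_usd)


def to_price(model_output: float) -> float:
    """Invert the transform at serving time: exp(output) - 1."""
    return math.expm1(model_output)


# Round trip on the example price shown on this page.
log_price = to_training_target(27474.0)
recovered = to_price(log_price)
```

Training on the log scale keeps expensive vehicles from dominating the loss, and the inverse transform guarantees the served price is never below zero for non-negative model outputs.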

Engineering

Key design decisions

⟳ Shared Feature Pipeline

The same pipeline.py transforms training data and API inputs. This eliminates training-serving skew, a common silent failure mode in production ML.
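The idea can be sketched as one feature function that both the training job and the API call. Everything below is illustrative: the feature names, REFERENCE_YEAR, and the helper functions are assumptions, not the contents of the project's pipeline.py.

```python
REFERENCE_YEAR = 2024  # hypothetical constant, fixed so training and serving agree


def engineer_features(record: dict) -> dict:
    """Single source of truth for feature logic (hypothetical features)."""
    features = dict(record)
    features["manufacturer"] = record["manufacturer"].strip().lower()
    features["vehicle_age"] = REFERENCE_YEAR - record["year"]
    # Guard against division by zero for current-year vehicles.
    features["miles_per_year"] = record["mileage"] / max(features["vehicle_age"], 1)
    return features


def build_training_rows(raw_rows: list[dict]) -> list[dict]:
    """Training path: applies the shared function to every historical row."""
    return [engineer_features(row) for row in raw_rows]


def build_serving_row(request_payload: dict) -> dict:
    """Serving path: applies the same function to a single API request."""
    return engineer_features(request_payload)
```

Because neither path reimplements the transforms, a change to a feature definition reaches training and serving at the same time.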

≈ Fuzzy Input Matching

Manufacturer, model, drivetrain, and fuel type are matched against the training vocabulary using SequenceMatcher. Typos get corrected with warnings, not rejections.
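A minimal sketch of this kind of matching with difflib.SequenceMatcher is shown below. The vocabulary, the 0.8 threshold, the warning text, and the behavior for values that match nothing are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical slice of the training vocabulary.
KNOWN_MANUFACTURERS = ["toyota", "honda", "ford", "chevrolet"]


def fuzzy_correct(value, vocabulary, threshold=0.8):
    """Return (corrected_value, warning_or_None) for one categorical input."""
    cleaned = value.strip().lower()
    if cleaned in vocabulary:
        return cleaned, None

    # Pick the vocabulary entry with the highest similarity ratio.
    best = max(vocabulary, key=lambda c: SequenceMatcher(None, cleaned, c).ratio())
    score = SequenceMatcher(None, cleaned, best).ratio()

    if score >= threshold:
        return best, f"'{value}' was corrected to '{best}'"
    # Assumption: unmatched values pass through with a warning instead of failing.
    return cleaned, f"'{value}' did not match any known value"
```

For example, fuzzy_correct("Toyata", KNOWN_MANUFACTURERS) returns "toyota" together with a correction warning, while an exact match returns no warning.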

◐ Median Imputation

Five optional listing fields are filled with training-set medians when omitted. Users can submit only the 13 required fields and still get a reasonable prediction.
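The imputation step might look like the sketch below. The page does not name the optional fields, so seller_rating and price_drop, along with their training values, are placeholders.

```python
from statistics import median

# Hypothetical optional fields and training values, for illustration only.
TRAINING_VALUES = {
    "seller_rating": [4.1, 4.5, 4.7, 4.8, 3.9],
    "price_drop": [0.0, 250.0, 500.0, 1000.0, 300.0],
}

# Medians are computed once from training data and reused for every request.
TRAINING_MEDIANS = {name: median(vals) for name, vals in TRAINING_VALUES.items()}


def impute_optional(payload):
    """Fill missing optional fields; return the filled payload and the imputed names."""
    filled = dict(payload)
    imputed = []
    for name, fallback in TRAINING_MEDIANS.items():
        if filled.get(name) is None:
            filled[name] = fallback
            imputed.append(name)
    return filled, imputed
```

Returning the list of imputed names lets the caller decide whether to surface them to the user, for instance alongside the fuzzy-match warnings.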

◉ Prometheus Metrics

Prediction count, latency percentiles, and error rates are exposed at /metrics. Middleware-based instrumentation keeps business logic untouched.
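The separation between instrumentation and business logic can be illustrated with a plain Python wrapper. This is a stdlib stand-in, not the project's middleware: the Counter and list below take the place of real Prometheus metric objects, and predict is a placeholder handler.

```python
import time
from collections import Counter
from functools import wraps

REQUEST_COUNTS = Counter()  # stand-in for a Prometheus counter
LATENCIES_SECONDS = []      # stand-in for a Prometheus histogram


def instrumented(handler):
    """Record outcome and latency around a handler without changing its body."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = handler(*args, **kwargs)
        except Exception:
            REQUEST_COUNTS["error"] += 1
            raise
        else:
            REQUEST_COUNTS["success"] += 1
            return result
        finally:
            # Runs on both paths, so every request contributes a latency sample.
            LATENCIES_SECONDS.append(time.perf_counter() - start)
    return wrapper


@instrumented
def predict(payload):
    """Placeholder business logic; it knows nothing about metrics."""
    if "mileage" not in payload:
        raise ValueError("mileage is required")
    return {"predicted_price": 27474.0}
```

The handler can be tested or replaced on its own, and the metrics layer can be swapped for a real client library without touching it.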

API

Example request & response

Submit vehicle attributes via POST and receive a predicted listing price in USD. The API handles typo correction, color normalization, engine parsing, and optional field imputation automatically.

POST /api/v1/predict

{
  "manufacturer": "toyota",
  "model": "camry le",
  "year": 2020,
  "mileage": 35000,
  "engine": "2.5l i4 dohc 16v",
  "transmission": "8 speed automatic",
  "drivetrain": "fwd",
  "fuel_type": "gasoline",
  "exterior_color": "silver metallic",
  "interior_color": "black leather",
  "accidents_or_damage": 0,
  "one_owner": 1,
  "personal_use_only": 1
}

Response · 200 OK

{
  "predicted_price": 27474.00,
  "currency": "USD",
  "model_used": "CatBoost",
  "warnings": [],
  "input_echo": { ... }
}
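A small client for the endpoint above could be written with the standard library alone. The base URL is an assumption for a locally running instance, and the function names are hypothetical.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: API served locally on port 8000


def build_predict_request(payload, base_url=BASE_URL):
    """Build the POST request for /api/v1/predict without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=f"{base_url}/api/v1/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(payload, base_url=BASE_URL, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    req = build_predict_request(payload, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling predict_price with the request body shown above should return a dictionary with the predicted_price, currency, model_used, warnings, and input_echo keys, provided the service is reachable at BASE_URL.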

Production Readiness

What makes this production-grade

✓ Request validation: Pydantic v2 schemas with domain constraints reject malformed inputs with clear error messages
✓ Health probes: Separate /health (liveness) and /ready (readiness) endpoints for Kubernetes probe integration
✓ Autoscaling: HPA scales from 2 to 6 pods based on CPU utilization with scale-down stabilization
✓ Security hardening: Non-root container, dropped Linux capabilities, and read-only access patterns
✓ Zero-downtime deploys: Rolling update strategy with maxUnavailable: 0 ensures no dropped requests
✓ Observability: Prometheus metrics with prediction latency histograms, success/error counters, and K8s scraping annotations
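The difference between the two health probes can be sketched as two plain functions. This assumes readiness depends on the model having been loaded, which the page does not state, and ModelHolder is a hypothetical helper rather than part of the project.

```python
class ModelHolder:
    """Tracks whether the model has finished loading (hypothetical helper)."""

    def __init__(self):
        self.model = None

    def load(self, model):
        self.model = model


def health(holder):
    """Liveness: the process is up, regardless of model state."""
    return 200, {"status": "alive"}


def ready(holder):
    """Readiness: only accept traffic once the model is loaded."""
    if holder.model is None:
        return 503, {"status": "loading"}
    return 200, {"status": "ready"}
```

With this split, Kubernetes restarts a pod only when the liveness check fails, while a pod that is still loading its model is simply kept out of the service's endpoints until the readiness check passes.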

Observability

Grafana monitoring dashboard

The API exposes Prometheus-compatible metrics at /metrics, which Prometheus scrapes automatically via Kubernetes annotations. A custom Grafana dashboard visualizes prediction throughput, latency, and errors in real time, using the Prometheus and Grafana stack that is common in production ML systems.

prediction_requests_total (Counter): Tracks every prediction, labeled by success or error status for SLA monitoring.

prediction_latency_seconds (Histogram): Captures latency distribution across configurable buckets for percentile analysis (p50, p95, p99).

prediction_errors_total (Counter): Categorizes failures by type: validation errors, server errors, and unexpected exceptions.
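To show how bucketed latency data yields percentile estimates, here is a stdlib sketch that builds cumulative bucket counts and interpolates linearly inside the target bucket, similar in spirit to PromQL's histogram_quantile. The bucket bounds and sample latencies are made up for illustration.

```python
import math

# Hypothetical bucket upper bounds in seconds; the last bucket catches everything.
BUCKET_BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, math.inf]


def cumulative_counts(latencies, bounds=BUCKET_BOUNDS):
    """counts[i] is the number of observations less than or equal to bounds[i]."""
    return [sum(1 for x in latencies if x <= b) for b in bounds]


def estimate_quantile(q, counts, bounds=BUCKET_BOUNDS):
    """Estimate the q-quantile by linear interpolation within a bucket."""
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    for i, count in enumerate(counts):
        if count >= rank:
            lower_bound = bounds[i - 1] if i > 0 else 0.0
            lower_count = counts[i - 1] if i > 0 else 0
            if math.isinf(bounds[i]):
                # No finite upper edge: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (bounds[i] - lower_bound) * fraction
    return math.nan


samples = [0.004, 0.008, 0.009, 0.012, 0.02, 0.03, 0.04, 0.06, 0.08, 0.2]
counts = cumulative_counts(samples)
p50 = estimate_quantile(0.5, counts)
p95 = estimate_quantile(0.95, counts)
```

The estimate is only as fine as the bucket layout, which is why bucket bounds are worth choosing around the latencies the service is expected to produce.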

Figure: Grafana dashboard for the Used Car Price API, showing average latency (9.99 ms), total predictions (12 errors, 70 successes), request throughput over time, p95 prediction latency, and error rate breakdown. Live dashboard running on minikube: 5 panels tracking latency, throughput, error rates, and prediction counts across 2 pods.

CI/CD

Automated pipeline

A GitHub Actions pipeline runs a two-tier test strategy. Pull requests get fast unit test feedback, and merges to main additionally run integration tests against the real CatBoost model.

🧪 Unit Tests: Mocked model fixtures
🐳 Docker Build: Real model via LFS
☸️ K8s Validation: kubeconform lint
🔬 Integration: Real model predictions
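The unit-test tier can run without the real model by substituting a mock. The sketch below uses unittest.mock from the standard library; predict_price is a hypothetical helper and the feature list is arbitrary.

```python
import math
from unittest.mock import Mock


def predict_price(model, features):
    """Run inference and invert the log1p target transform (hypothetical helper)."""
    log_price = model.predict([features])[0]
    return round(math.expm1(log_price), 2)


def test_predict_price_with_mocked_model():
    # The mock stands in for the trained model and returns a fixed log-price.
    fake_model = Mock()
    fake_model.predict.return_value = [math.log1p(20000.0)]

    price = predict_price(fake_model, [2020, 35000])

    assert price == 20000.0
    fake_model.predict.assert_called_once_with([[2020, 35000]])
```

Tests like this stay fast because no model file is loaded, which leaves the integration tier to confirm that the real model produces sensible predictions.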

Stack

Technologies used

Python, FastAPI, CatBoost, Pydantic v2, Docker, Kubernetes, Prometheus, Grafana, GitHub Actions, pytest, pandas, NumPy, Uvicorn

Explore

Project links

Part 2: API Repository (FastAPI, Docker, Kubernetes, CI/CD)

Part 1: Model Development (EDA, feature engineering, model selection)