← Back to projects

Overview

The Purchase & Sales Analytics API ingests bulk purchase and sales data from Excel and CSV uploads, then exposes analytics endpoints for profit reporting, top/bottom product rankings, and parameterized product search. The hard part is not the analytics — it is the ingestion. A malformed upload must never reach the live dataset, and the pipeline has to stay fast even at scale.

Challenge

  • Hard constraint on data integrity: a malformed Excel or CSV file could corrupt the live production dataset, so validation has to happen before a single row is written.
  • Performance under load: the target was to ingest files with millions of rows without timing out or exhausting memory.
  • Three bottlenecks, one pipeline: parsing, batching, and bulk write each contributed their own bottleneck, and no single optimization was enough.

Solution

  • Pre-write schema validation: every uploaded file is validated against the expected schema before any row touches the database, protecting the live dataset under tight deadlines.
  • Streaming parsing with CsvHelper to avoid loading the entire file into memory.
  • 20K-row batching to balance round-trip overhead against transaction size.
  • SqlBulkCopy via EFCore.BulkExtensions for the final write, pushing past EF Core’s per-row insertion cost.
  • Analytics endpoints for profit reporting, top/bottom product rankings, and parameterized product search, designed for real business queries rather than toy examples.

Lessons Learned

The lesson here is that performance is a pipeline problem, not a single-knob problem. Streaming parsing fixed memory. Batching fixed round-trip overhead. Bulk insert fixed EF Core’s per-row cost. None of the three on its own would have hit 1M+ rows in 0.35s, and the engineering work was identifying which layer was the current bottleneck at each iteration.