nonobie — Real dev skills, explained like a friend would

What you'll learn

By the end of this chapter you will:

Design the bulk import flow: S3 → POST /imports → job row → queue → worker → error report
Stream-parse CSV and XLSX files without loading the whole file into memory
Validate every row and collect all errors into a downloadable error report CSV
Insert in batches of ~500 inside transactions and avoid DB lock contention
Make bulk imports idempotent using file SHA256 and row-level unique constraints
Defend against formula injection (DDE) in XLSX exports

Rows	Pattern	UX
1–500	Synchronous (request blocks until done)	Spinner, then result
More than 500	Asynchronous (job + worker)	Upload → "we'll email you" / progress bar
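The table boils down to a single threshold check. A minimal sketch, assuming the row count is already known (for example from a cheap newline count) and treating exactly 500 rows as still synchronous; `SYNC_ROW_LIMIT` and `chooseImportMode` are hypothetical names:

```typescript
// Hypothetical constant: the cut-over point from the table above.
const SYNC_ROW_LIMIT = 500;

type ImportMode = 'sync' | 'async';

// Small files are handled inside the request; anything larger becomes a background job.
function chooseImportMode(rowCount: number): ImportMode {
  return rowCount <= SYNC_ROW_LIMIT ? 'sync' : 'async';
}
```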

1. Frontend uploads CSV → S3 (presigned URL, Chapter 20)
2. Frontend POSTs /imports { s3_key, type: "transactions" }
3. Backend creates a row in `import_jobs`:
     id, owner_id, s3_key, type, status='queued', progress=0,
     total_rows=null, error_count=null, error_report_s3_key=null
4. Backend enqueues a job onto the worker queue (Chapter 15)
5. Worker:
     - Downloads file from S3 (streamed)
     - Parses & validates row by row
     - Writes batches of 500 rows in transactions
     - Updates progress on the job record every batch
     - Collects errors → writes an error_report.csv to S3
     - Marks status='completed' (or 'failed')
6. Frontend polls GET /imports/:id (or receives a notification)
7. If errors exist, user downloads the error_report.csv to fix and re-upload
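The job record and its status transitions from the steps above could be modelled roughly as follows. This is a sketch under stated assumptions, not the chapter's implementation: an in-memory `Map` stands in for the `import_jobs` table, the `'processing'` status is an extra state not listed in the flow, `progress` is counted in rows, and the `rows` array stands in for the streamed file.

```typescript
type ImportStatus = 'queued' | 'processing' | 'completed' | 'failed';

interface ImportJob {
  id: string;
  ownerId: string;
  s3Key: string;
  type: string;
  status: ImportStatus;
  progress: number;                 // rows processed so far (assumed unit)
  totalRows: number | null;
  errorCount: number | null;
  errorReportS3Key: string | null;
}

// In-memory stand-in for the import_jobs table.
const jobs = new Map<string, ImportJob>();
let nextJobId = 1;

// Step 3: create the job row in its initial state.
function createJob(ownerId: string, s3Key: string, type: string): ImportJob {
  const job: ImportJob = {
    id: String(nextJobId++), ownerId, s3Key, type,
    status: 'queued', progress: 0,
    totalRows: null, errorCount: null, errorReportS3Key: null,
  };
  jobs.set(job.id, job);
  return job;
}

// Step 5: walk the rows batch by batch, recording progress after each batch.
function runJob(
  job: ImportJob,
  rows: string[][],
  batchSize = 500,
  isRowValid: (row: string[]) => boolean = () => true,
): void {
  job.status = 'processing';
  job.totalRows = rows.length;
  let errorCount = 0;
  try {
    for (let i = 0; i < rows.length; i += batchSize) {
      const batch = rows.slice(i, i + batchSize);
      errorCount += batch.filter((row) => !isRowValid(row)).length;
      job.progress = i + batch.length;   // what GET /imports/:id would report
    }
    job.errorCount = errorCount;
    job.status = 'completed';
  } catch {
    job.status = 'failed';
  }
}
```

Because the frontend only ever reads the job record, the worker can crash or be retried without the client needing to know how the work is executed.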

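import { parse } from 'csv-parse';
import { Readable } from 'stream';

// The original listing is cut off after the signature. The body is a minimal sketch; the parser options are assumptions.
async function processCsv(stream: Readable, onRow: (row: any) => Promise<void>): Promise<void> {
  const parser = stream.pipe(parse({ columns: true, bom: true, skip_empty_lines: true }));
  for await (const row of parser) await onRow(row);   // one row in memory at a time
}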

import { stream as xlsxStream } from 'exceljs';
 
const workbookReader = new xlsxStream.xlsx.WorkbookReader(s3ReadStream, {
  entries: 'emit',
  sharedStrings: 'cache',
});

Tip

Never use XLSX.readFile() (the synchronous xlsx library reader) in a worker — it loads the whole workbook into memory. For large files use exceljs streaming or convert to CSV first.

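const errors: { row: number; column: string; value: any; message: string }[] = [];
let rowNum = 1;     // row 1 = header
 
// The original listing is cut off after `for await (`; the loop below is a sketch.
// `parser` (the streaming CSV parser) and `validateRow` are assumed names.
for await (const row of parser) {
  rowNum++;
  errors.push(...validateRow(row, rowNum));   // collect every error, never stop at the first
}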

row,column,value,message
17,email,not-an-email,Must be a valid email
24,amount,-50,Must be positive
102,partner_ref,abc-123,Already imported (duplicate)
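Writing the report safely means quoting any value that contains a comma, a quote, or a line break, since the `value` column echoes raw user input. A minimal sketch using RFC 4180 style quoting; `RowError`, `csvField`, and `buildErrorReport` are hypothetical names:

```typescript
interface RowError {
  row: number;
  column: string;
  value: unknown;
  message: string;
}

// Quote a field only when it contains a delimiter, a quote, or a line break.
function csvField(value: unknown): string {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build the full report: header line first, then one line per collected error.
function buildErrorReport(errors: RowError[]): string {
  const header = 'row,column,value,message';
  const lines = errors.map((e) =>
    [e.row, e.column, e.value, e.message].map(csvField).join(','),
  );
  return [header, ...lines].join('\n') + '\n';
}
```

For a large import the same `csvField` helper can be applied line by line while streaming the report to S3 instead of building one big string.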

await this.sequelize.transaction(async (tx) => {
  await this.transactionModel.bulkCreate(batch, {
    transaction: tx,
    validate: false,                     // we already validated above
    returning: false,                    // assumed value (the listing is cut off here): skip fetching the inserted rows
  });
});
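To feed a transaction like this with batches of about 500 rows while rows arrive one at a time from a streaming parser, a small accumulator is enough. A sketch, assuming a hypothetical `Batcher` helper; it holds at most one batch in memory:

```typescript
// Collects items and hands back a full batch once `size` items have arrived.
class Batcher<T> {
  private items: T[] = [];
  private readonly size: number;

  constructor(size = 500) {
    this.size = size;
  }

  // Returns a full batch when one is ready, otherwise null.
  add(item: T): T[] | null {
    this.items.push(item);
    return this.items.length >= this.size ? this.drain() : null;
  }

  // Returns whatever is left after the last row (possibly an empty array).
  flush(): T[] {
    return this.drain();
  }

  private drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```

Inside the row loop, call `add(row)` and run one transaction whenever it returns a batch; after the loop, insert the result of `flush()` if it is not empty.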

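import { pipeline } from 'stream/promises';
import { from as copyFrom } from 'pg-copy-streams';
 
const stream = pgClient.query(copyFrom(
  `COPY transactions(partner_ref, amount, currency) FROM STDIN WITH (FORMAT csv)`,
));
// pipeline() surfaces errors from either stream and resolves once COPY has finished
await pipeline(csvStream, stream);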

ALTER TABLE transactions
  ADD CONSTRAINT uniq_partner_ref UNIQUE (partner_id, partner_ref);

await this.transactionModel.bulkCreate(batch, {
  ignoreDuplicates: true,
});

import { createHash } from 'crypto';

const fileSha = createHash('sha256').update(buffer).digest('hex');
const existing = await this.importJobModel.findOne({
  where: { owner_id, file_sha256: fileSha },
});

Tip

Combining both patterns is best: the file-level key catches "same upload twice", and the row-level constraint catches partial overlaps between two different files.
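Hashing a `buffer` means the whole file is already in memory, which sits awkwardly with the streaming advice earlier in the chapter. One way around that, sketched here with Node's built-in `crypto` module, is to update the hash chunk by chunk; `sha256OfStream` is a hypothetical helper name:

```typescript
import { createHash } from 'node:crypto';
import { Readable } from 'node:stream';

// Feed the hash one chunk at a time so the file never has to fit in memory.
async function sha256OfStream(stream: Readable): Promise<string> {
  const hash = createHash('sha256');
  for await (const chunk of stream) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}
```

The result is the same hex digest as hashing the complete buffer, so it can be stored in `file_sha256` and compared in the same way.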

account_number
00123456789           ← in CSV
123456789             ← what Excel does to it on save
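On the import side the usual defence is to keep identifier-like columns as strings and validate their shape, never converting them to numbers. A small sketch; the 6 to 20 digit range is an illustrative assumption, and `parseAccountNumber` is a hypothetical name:

```typescript
// Validate an account number as text so leading zeros survive.
// Returns the cleaned string, or null when the value is not digits only.
function parseAccountNumber(raw: string): string | null {
  const trimmed = raw.trim();
  return /^\d{6,20}$/.test(trimmed) ? trimmed : null;
}
```

If the file has already been through Excel, the zeros are gone before the import sees it, so a length check like this one is also what turns the silent corruption into a row-level error.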

import { parseISO, isValid } from 'date-fns';

const parsed = parseISO(row.date);
if (!isValid(parsed)) errors.push({ row: rowNum, ... });
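The same idea without a date library: accept exactly one format and reject everything else, including calendar dates that do not exist. This is a dependency-free sketch, not the `date-fns` behaviour; `parseIsoDate` is a hypothetical helper that accepts `YYYY-MM-DD` only:

```typescript
// Strictly parse YYYY-MM-DD. Returns a UTC Date, or null when the input is invalid.
function parseIsoDate(raw: string): Date | null {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(raw.trim());
  if (!m) return null;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC silently rolls over impossible dates (Feb 30 becomes Mar 1),
  // so confirm the components survived the round trip.
  const sameDay =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return sameDay ? date : null;
}
```

Refusing ambiguous inputs such as `03/04/2024` outright is deliberate: guessing between day-first and month-first is how wrong dates end up in the database without any error.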

Watch out

A .xlsx file is a zip archive of XML files. It can carry formula injection, macros, and XXE vulnerabilities. Never blindly trust user-uploaded XLSX content.

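function sanitiseCell(v: string): string {
  if (typeof v !== 'string') return v;
  // Prefix cells that start with a formula trigger so spreadsheet apps treat them as text
  if (/^[=+\-@\t\r]/.test(v)) return `'${v}`;
  return v;
}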

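import JSZip from 'jszip';
const zip = await JSZip.loadAsync(buffer, { checkCRC32: true });
let total = 0;
for (const name of Object.keys(zip.files)) {
  // The original listing is cut off here. Presumably each entry's uncompressed
  // size is added to `total` and the file is rejected once it passes a limit.
}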

Max file size:        20 MB (CSV) / 10 MB (XLSX)
Max rows per file:    50,000
Max jobs per user:    1 active at a time
Max imports per day:  10
Worker timeout:       30 minutes per job
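Limits like these are easiest to enforce when they live in one place and are checked before a job is ever created. A sketch under assumptions: 1 MB is taken as 1024 × 1024 bytes, the caller already knows the user's active job count and today's import count, and `IMPORT_LIMITS` and `checkUploadAllowed` are hypothetical names. Row count and worker timeout would be enforced inside the worker.

```typescript
const IMPORT_LIMITS = {
  maxBytes: { csv: 20 * 1024 * 1024, xlsx: 10 * 1024 * 1024 },
  maxRows: 50_000,
  maxActiveJobsPerUser: 1,
  maxImportsPerDay: 10,
  workerTimeoutMs: 30 * 60 * 1000,
} as const;

type FileKind = 'csv' | 'xlsx';

interface UploadRequest {
  kind: FileKind;
  sizeBytes: number;
  activeJobs: number;     // jobs for this user that are still queued or running
  importsToday: number;   // imports this user has already started today
}

// Returns a human-readable rejection reason, or null when the upload may proceed.
function checkUploadAllowed(req: UploadRequest): string | null {
  if (req.sizeBytes > IMPORT_LIMITS.maxBytes[req.kind]) return 'file too large';
  if (req.activeJobs >= IMPORT_LIMITS.maxActiveJobsPerUser) return 'another import is still running';
  if (req.importsToday >= IMPORT_LIMITS.maxImportsPerDay) return 'daily import limit reached';
  return null;
}
```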

One thing to remember

Collect all errors from every row before failing. One upload = one error report. Never stop at the first bad row — users want to fix everything in one pass. And always build for idempotency: the user will upload twice.

Chapter 21 — Bulk Imports (CSV / XLSX)

Why bulk uploads are dangerous

Synchronous vs asynchronous — pick by row count

The async bulk-upload flow (use this)

Streaming parsers — never load the whole file in memory

CSV (using `csv-parse`)

XLSX (using `exceljs` streaming reader)

Validate every row, collect every error

Batch inserts — 500 rows per transaction

Idempotency — the user WILL upload twice

Pattern 1 — natural unique key

Pattern 2 — file-level idempotency key

CSV pitfalls — the small details that break in production

BOM (Byte Order Mark)

Encoding

Numbers as text

Embedded commas and newlines

Dates

XLSX pitfalls — XLSX is more dangerous than CSV

Formulas

Macros

XXE in the embedded XML

ZIP bombs

Limits — set them or regret them

DB lock contention — the silent killer

Anti-patterns

Bulk-import endpoint checklist
