Realistic invoice line items for a wholesale/janitorial distributor — driven by customer segments (A–D), relationship momentum, large-buy spikes, and category-specific markups. Perfect as a sample sales file for dashboards, BI demos, and SQL practice.
Each row is one invoice line (one product on one order). The full schema:
| Column | Type | Description |
|---|---|---|
| order_date | date | Business day the order was placed (weekends excluded). |
| invoice_no | integer | Invoice identifier; multiple lines share one invoice. |
| customer_id / customer | int / text | The buying business. |
| product_id / product | int / text | The SKU ordered. |
| category | text | One of five distributor categories (paper, office, cleaning, breakroom, copy paper). |
| segment | text | The customer's size tier for that category: segA (largest) → segD (smallest). |
| quantity | integer | Units ordered; scales with segment & volume tier. |
| unit_cost / unit_price | number | Your cost per unit and the price charged per unit (unit_cost × markup). |
| revenue / cost / margin | number | Line economics: revenue = quantity × unit_price, cost = quantity × unit_cost, margin = revenue − cost. |
| ship_date | date | Fulfilment date (mostly next business day). |
| anomaly | 0/1 | Present only when anomaly injection is on; flags inflated-price large orders. |
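As a quick sanity check on an exported file, the line economics in the schema can be recomputed from quantity, unit cost, and unit price. The sketch below assumes the export is a CSV whose headers match the column names in the table; the two sample rows are made up for illustration and are not simulator output.

```python
import csv
import io

# Made-up sample rows for illustration only; header names are assumed
# to match the schema table above.
SAMPLE = """order_date,invoice_no,customer_id,product_id,category,segment,quantity,unit_cost,unit_price,revenue,cost,margin
2024-03-04,1001,17,205,cleaning,segA,40,2.50,3.00,120.00,100.00,20.00
2024-03-04,1001,17,311,paper,segB,10,4.00,5.20,52.00,40.00,12.00
"""

def check_line(row, tol=0.01):
    """Return True if revenue, cost, and margin agree with the per-unit fields."""
    qty = int(row["quantity"])
    revenue = qty * float(row["unit_price"])
    cost = qty * float(row["unit_cost"])
    return (
        abs(revenue - float(row["revenue"])) <= tol
        and abs(cost - float(row["cost"])) <= tol
        and abs((revenue - cost) - float(row["margin"])) <= tol
    )

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
results = [check_line(r) for r in rows]
print(results)
```

The tolerance absorbs rounding to cents; rows that fail the check are worth inspecting before they feed a dashboard.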
Generic mock-data tools draw every cell independently, so quantities, prices, and customers have no relationship to one another. This simulator instead models distribution dynamics: each customer is assigned a segment per category, and that segment drives order frequency, order size, and the discount the customer commands. Big accounts (segA) buy more often and in larger quantities, but at thinner markups; convenience-driven small accounts (segD) pay a premium. Relationship momentum means winning one category makes winning the next more likely, mirroring share-of-wallet stickiness. Occasional large buys create realistic volume spikes at compressed margins. The result behaves like a real sales export, with trends, buyer seasonality, and a long tail of small orders, which is exactly what you need to demo a dashboard or set a take-home exercise that rewards real analysis.
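The segment-driven behaviour described above can be sketched in a few lines. This is not the simulator's actual code: the quantity ranges, markups, and the `win_probability` momentum rule are hypothetical values chosen only to show the direction of each effect (bigger accounts order more at thinner markups; each category won raises the chance of winning the next).

```python
import random

# Hypothetical parameters for illustration; the simulator's real values are not documented here.
SEGMENTS = {
    "segA": {"qty_range": (40, 120), "markup": 1.15},  # largest accounts, thinnest markup
    "segB": {"qty_range": (20, 60), "markup": 1.25},
    "segC": {"qty_range": (8, 24), "markup": 1.35},
    "segD": {"qty_range": (1, 8), "markup": 1.50},     # smallest accounts, pay a premium
}

def make_line(segment, unit_cost, rng):
    """Build one invoice line whose size and price depend on the segment."""
    params = SEGMENTS[segment]
    quantity = rng.randint(*params["qty_range"])
    unit_price = round(unit_cost * params["markup"], 2)
    revenue = round(quantity * unit_price, 2)
    cost = round(quantity * unit_cost, 2)
    return {
        "segment": segment,
        "quantity": quantity,
        "unit_cost": unit_cost,
        "unit_price": unit_price,
        "revenue": revenue,
        "cost": cost,
        "margin": round(revenue - cost, 2),
    }

def win_probability(categories_won, base=0.2, boost=0.15):
    """Hypothetical momentum rule: each category already won raises the odds of the next."""
    return min(1.0, base + boost * categories_won)

rng = random.Random(0)
big = make_line("segA", 2.00, rng)
small = make_line("segD", 2.00, rng)
print(big["quantity"], big["unit_price"], small["quantity"], small["unit_price"])
```

Because every field on the line is derived from the segment and the unit cost, quantity, price, and margin stay correlated, which is the property independent per-cell generators lack.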
Roughly one customer per 20 rows and one SKU per 50 rows (five categories), so a 5,000-row file has ~250 customers and ~100 products. Both scale with the row count.
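The sizing rule is simple arithmetic. A minimal sketch, assuming plain integer division (the simulator's exact rounding is not stated, so treat the results as approximate); `estimate_counts` is a hypothetical helper name:

```python
def estimate_counts(rows, rows_per_customer=20, rows_per_sku=50):
    """Approximate customer and product counts for a file with `rows` invoice lines."""
    return {
        "customers": max(1, rows // rows_per_customer),
        "products": max(1, rows // rows_per_sku),
    }

print(estimate_counts(5000))   # {'customers': 250, 'products': 100}
print(estimate_counts(20000))  # {'customers': 1000, 'products': 400}
```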
Yes. With a seed set, the same settings regenerate the exact same dataset, which is great for tutorials and shared examples. Leave the seed blank for fresh random data each time.
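The idea behind seeding can be illustrated with Python's standard `random` module. This is only an analogy: the simulator runs in the browser with its own generator, and `sample_quantities` is a hypothetical helper.

```python
import random

def sample_quantities(seed=None, n=20):
    """Draw n quantities; a fixed seed gives the same draws, None gives fresh ones."""
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(n)]

same_a = sample_quantities(seed=42)
same_b = sample_quantities(seed=42)
other = sample_quantities(seed=7)

print(same_a == same_b)  # True: identical seed, identical data
print(same_a == other)   # False: a different seed gives different data
```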
No. Generation runs entirely in your browser.