Numeric features with continuous targets like price and margin — a realistic dataset for linear regression and gradient-boosted regressors.
This is a free, reproducible B2B distribution dataset that you can generate and download on this page as CSV, Excel, JSON or SQL. It is built for linear and tree-based regression, feature engineering and price modeling. Because the fields are correlated rather than independently random, the numbers stay consistent with one another when you analyze them.
Each customer is assigned a segment per product category that drives how often and how much they buy, with relationship momentum, occasional large-buy spikes, and category-specific markups — so the file behaves like a real distributor sales export, not random noise.
Schema for the B2B distribution export (the anomaly column appears only when labels are switched on):
| Column | Type | Description |
|---|---|---|
| order_date | date | Business day the order was placed. |
| invoice_no | integer | Invoice id; multiple lines share one invoice. |
| customer_id / customer | int / text | The buying business. |
| product_id / product | int / text | The SKU ordered. |
| category | text | One of five distributor categories. |
| segment | text | Customer role for that category (segA largest to segD smallest). |
| quantity | integer | Units ordered; scales with segment and volume. |
| unit_cost / unit_price | number | Your cost and the price charged. |
| revenue / cost / margin | number | Line economics. |
| ship_date | date | Fulfilment date (usually next business day). |
| anomaly | 0/1 | Present with labels on; flags inflated-price large orders. |
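The schema does not spell out how the line-economics columns relate to each other. The sketch below assumes revenue = quantity * unit_price, cost = quantity * unit_cost and margin = revenue - cost. That is an assumption about this export, not something the table states, so treat a failed check as a prompt to inspect the data rather than as proof of an error. The three rows are invented for illustration, and `check_line_economics` is a hypothetical helper name.

```r
# Toy rows in the shape of the schema above. The values are invented for
# illustration; the last margin is deliberately wrong so the check has
# something to catch.
lines <- data.frame(
  invoice_no = c(1001L, 1001L, 1002L),
  quantity   = c(10L, 4L, 25L),
  unit_cost  = c(2.50, 12.00, 0.80),
  unit_price = c(3.25, 15.60, 1.00),
  revenue    = c(32.50, 62.40, 25.00),
  cost       = c(25.00, 48.00, 20.00),
  margin     = c(7.50, 14.40, 6.00)
)

# Hypothetical helper: TRUE for each row whose revenue, cost and margin agree
# with the ASSUMED formulas, within a small tolerance for rounding.
check_line_economics <- function(df, tol = 0.01) {
  abs(df$revenue - df$quantity * df$unit_price) <= tol &
    abs(df$cost - df$quantity * df$unit_cost) <= tol &
    abs(df$margin - (df$revenue - df$cost)) <= tol
}

ok <- check_line_economics(lines)
bad_rows <- which(!ok)  # row indices that break the assumed relationships
bad_rows
```

On the real export, the same helper can be applied to the data frame returned by `read_csv()`. If most rows fail, the assumed formulas are probably not the ones the generator uses.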
```r
library(readr)

# Read the CSV export into a tibble and preview the first rows
df <- read_csv("b2b_invoices.csv")
head(df)
```
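Once the file is loaded, a first regression might look like the sketch below. So that the block runs without the download, it simulates a small stand-in data frame using a few of the schema's column names. The simulated relationship (price as a 30% markup on cost plus noise) is an assumption made for the example, not the generator's actual logic. With the real file, pass `df` from `read_csv()` to `lm()` instead of `demo`.

```r
set.seed(1)  # arbitrary seed so the stand-in data is repeatable
n <- 200

# Stand-in data with schema-like column names; values are simulated.
demo <- data.frame(
  unit_cost = runif(n, min = 1, max = 50),
  quantity  = sample(1:100, n, replace = TRUE),
  segment   = factor(sample(c("segA", "segB", "segC", "segD"), n, replace = TRUE))
)

# ASSUMED relationship, for the stand-in only: 30% markup plus noise.
demo$unit_price <- 1.3 * demo$unit_cost + rnorm(n, sd = 0.5)

# Ordinary least squares with unit_price as the continuous target.
fit <- lm(unit_price ~ unit_cost + quantity + segment, data = demo)

r_squared  <- summary(fit)$r.squared
cost_slope <- coef(fit)[["unit_cost"]]  # estimated markup on unit_cost
```

`segment` is stored as a factor so that `lm()` expands it into indicator variables. On the real data, a held-out test split would give a more honest error estimate than the in-sample R-squared.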
The export has around 10,000 rows by default. To change that, set the row count in the generator above and re-export; anything up to about 200,000 rows works in the browser.
The download is available as CSV, Excel (.xlsx), JSON, or SQL (a CREATE TABLE statement plus INSERT statements). Pick whichever fits your workflow.
The download is reproducible. This page uses the fixed seed regress-demo, so the file is byte-identical on every machine. Clear the seed in the generator for fresh random data.
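One way to confirm that two downloads are byte-identical is to compare file checksums. The sketch below uses `tools::md5sum`, which ships with R. The two temporary files stand in for downloads made on different machines, and their contents are placeholder text rather than real export rows.

```r
# Two temporary files stand in for the same export downloaded twice.
first  <- tempfile(fileext = ".csv")
second <- tempfile(fileext = ".csv")

# Placeholder contents, not real rows from the dataset.
placeholder <- c("order_date,invoice_no,quantity", "2024-01-02,1001,10")
writeLines(placeholder, first)
writeLines(placeholder, second)

# md5sum() returns a character vector named by file path; unname() drops the
# paths so that only the hashes are compared.
same_bytes <- identical(
  unname(tools::md5sum(first)),
  unname(tools::md5sum(second))
)
same_bytes
```

With real downloads, replace `first` and `second` with the paths of the two files being compared.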
The export can be customized. Use Tables → Excel/SQL for a normalized multi-table export, switch on Messy / dirty data in Advanced options to add nulls, typos and inconsistent dates, and choose CSV, Excel, JSON or SQL on any download.
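With messy data switched on, date columns may need normalizing before modeling. The page does not say which date formats the option produces, so the sketch below assumes two: ISO `YYYY-MM-DD` and slash-separated `DD/MM/YYYY`. Both formats, and the day-first reading of the slash form, are assumptions to adjust after inspecting a real export. `parse_mixed_dates` is a hypothetical helper name.

```r
# Hypothetical helper: parse a character vector holding dates in either of two
# ASSUMED formats. Anything else, including NA, becomes NA.
parse_mixed_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))

  # Pick the format per element by its shape, so one odd value cannot force
  # the wrong format onto the whole column.
  iso   <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  slash <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", x)

  out[iso]   <- as.Date(x[iso], format = "%Y-%m-%d")
  out[slash] <- as.Date(x[slash], format = "%d/%m/%Y")  # assumed day-first
  out
}

# Invented sample values covering both formats, a missing value and a typo.
raw <- c("2024-01-02", "02/01/2024", NA, "not a date")
parsed <- parse_mixed_dates(raw)
n_unparsed <- sum(is.na(parsed))
```

Counting the `NA` results after parsing is a quick way to see how many values matched neither assumed format.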
The data is 100% synthetic, generated in your browser, and contains no real people or companies. It is free to use commercially.