Sample data for classification

A labeled dataset with categorical and numeric features and a binary target — ready for logistic regression, trees and gradient boosting.

E-commerce · Seeded - reproducible · CSV / Excel / JSON / SQL · 100% in-browser

Generate & download

About this dataset

This is a free, reproducible E-commerce dataset you can generate and download right here as CSV, Excel, JSON or SQL. It is built for binary classification, model comparison and feature engineering. Because the fields are correlated with one another rather than drawn independently at random, totals, segments and trends stay consistent when you analyze them.

Shoppers carry RFM-style segments that set purchase frequency and behavior, and demand flows through a seasonality curve with weekend lifts and a Q4 holiday peak — so RFM, cohort, attribution, and seasonality analysis all return meaningful results.
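One way to sanity-check the weekend lift described above is to compare the average number of order lines on weekend days with the average on weekdays. The sketch below is illustrative only: `weekend_lift` is a hypothetical helper, the dates are made up, and it assumes `order_date` values have already been converted to `datetime.date` objects.

```python
from collections import Counter
from datetime import date


def weekend_lift(order_dates):
    """Ratio of mean lines per weekend day to mean lines per weekday.

    Days with no rows at all are not counted, so this is a rough check,
    not a full seasonality model.
    """
    per_day = Counter(order_dates)
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    if not weekend or not weekday:
        return None  # not enough data to compare
    return (sum(weekend) / len(weekend)) / (sum(weekday) / len(weekday))


# Made-up dates for illustration: two weekdays with 2 lines each,
# one Saturday with 3 lines.
toy_dates = (
    [date(2024, 3, 4)] * 2    # Monday
    + [date(2024, 3, 5)] * 2  # Tuesday
    + [date(2024, 3, 9)] * 3  # Saturday
)
lift = weekend_lift(toy_dates)
print(lift)
```

A value above 1 means weekend days carry more lines on average. Because several lines can share an order, deduplicate on `order_id` first if you want the lift in orders rather than lines.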

Columns in this dataset

Schema for the E-commerce export (the anomaly column appears only when labels are switched on):

| Column | Type | Description |
|---|---|---|
| order_date | date | Day the order was placed. |
| order_id | integer | Order id; lines in an order share it. |
| customer_id / customer | int / text | The shopper. |
| segment | text | RFM-style: Champion, Loyal, Regular, New, At-Risk. |
| channel | text | Acquisition channel. |
| product / category | text | SKU and department. |
| quantity / unit_price | number | Units and list price. |
| discount_pct | number | Promo or loyalty discount. |
| line_total | number | quantity x price x (1 - discount). |
| returned | 0/1 | Whether the line was returned. |
| anomaly | 0/1 | Present with labels on; flags fraud-like orders. |
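The `line_total` definition in the schema can be turned into a quick consistency check. This is a minimal sketch under stated assumptions: the page does not say whether `discount_pct` is stored as a percentage (0 to 100) or as a fraction (0 to 1), so the hypothetical helper `line_total_mismatches` takes a flag for it, and the rows below are made up for illustration.

```python
import pandas as pd


def line_total_mismatches(df, discount_is_percent=True, tol=0.01):
    """Return rows where line_total differs from quantity * unit_price * (1 - discount)."""
    # Assumption: discount_pct is 0-100 unless the caller says otherwise.
    discount = df["discount_pct"] / 100 if discount_is_percent else df["discount_pct"]
    expected = df["quantity"] * df["unit_price"] * (1 - discount)
    return df[(df["line_total"] - expected).abs() > tol]


# Made-up rows for illustration only; not taken from the real export.
toy = pd.DataFrame({
    "quantity": [2, 1, 3],
    "unit_price": [10.0, 25.0, 4.0],
    "discount_pct": [0, 20, 50],
    "line_total": [20.0, 20.0, 7.0],  # last row is deliberately wrong (should be 6.0)
})
bad = line_total_mismatches(toy)
print(bad)
```

On a clean export the result should be empty. If nearly every row is flagged, the discount scale is probably the other one, so try `discount_is_percent=False`.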

Load it with pandas

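import pandas as pd

# Read the CSV export and parse order_date as a datetime column
df = pd.read_csv("ecommerce_orders.csv", parse_dates=["order_date"])
df.head()  # preview the first five rows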
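Because lines in an order share an `order_id`, a plain row-level train/test split can put lines from the same order on both sides. A possible way around that is to split on order ids instead. The sketch below is an assumption about how you might do it, not part of the generator: `split_order_ids` is a hypothetical helper and the ids are made up.

```python
import random


def split_order_ids(order_ids, test_fraction=0.2, seed=0):
    """Split unique order ids into (train_ids, test_ids) so no order straddles both sets."""
    unique_ids = sorted(set(order_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


# Made-up ids for illustration: several lines share an order id.
order_ids = [101, 101, 102, 103, 103, 103, 104, 105, 105, 106]
train_ids, test_ids = split_order_ids(order_ids, test_fraction=0.33, seed=42)
print(len(train_ids), len(test_ids))
```

With the DataFrame loaded above, `df[df["order_id"].isin(test_ids)]` would then select the test rows, and the complement gives the training rows.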

Good for

Binary classification · Model comparison · Feature engineering · Imbalanced learning

FAQ

How big is this dataset?

Around 12,000 rows by default. Change the row count in the generator above and re-export — anything up to ~200k works in the browser.

What formats can I download?

CSV, Excel (.xlsx), JSON, and SQL (a CREATE TABLE plus INSERT statements). Pick whatever fits your workflow.
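As a rough illustration of using the SQL export, a script made of CREATE TABLE plus INSERT statements can often be loaded into an in-memory SQLite database with Python's standard library. The table name, columns and rows below are hypothetical stand-ins, and whether the real export runs unchanged on SQLite is an assumption you would need to verify.

```python
import sqlite3

# Hypothetical stand-in for the contents of a downloaded .sql file.
sample_sql = """
CREATE TABLE orders (order_id INTEGER, product TEXT, quantity INTEGER, returned INTEGER);
INSERT INTO orders VALUES (1, 'Mug', 2, 0);
INSERT INTO orders VALUES (1, 'Tea', 1, 0);
INSERT INTO orders VALUES (2, 'Lamp', 1, 1);
"""


def load_sql_export(sql_text):
    """Run a CREATE TABLE + INSERT script against a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


conn = load_sql_export(sample_sql)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
returned_count = conn.execute("SELECT SUM(returned) FROM orders").fetchone()[0]
print(row_count, returned_count)
```

For a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `load_sql_export`.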

Does it include labels for modeling?

Yes, when labels are switched on. The anomaly column flags a small fraction of rows (fraud-like / outlier events), so you have ground truth for classification and anomaly detection.
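Since only a small fraction of rows is flagged, the target is imbalanced, and it can help to look at the positive rate and derive class weights before fitting a model. The sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count). The helper name and the 95/5 label split are made up for illustration and say nothing about the actual rate in the export.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels for illustration: 95 normal rows, 5 flagged rows.
labels = [0] * 95 + [1] * 5
positive_rate = sum(labels) / len(labels)
weights = balanced_class_weights(labels)
print(positive_rate, weights)
```

With the real data you could pass `df["anomaly"].tolist()` as `labels`. The rarer class gets the larger weight, which many classifiers accept as a per-class weight setting.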

Will I get the same file every time?

Yes. This page uses the fixed seed classif-demo, so the download is byte-identical on every machine. Clear the seed in the generator for fresh random data.
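If you want to confirm that two downloads really are byte-identical, comparing SHA-256 digests is one simple option. The helper names and the file paths in the comment are hypothetical; only standard-library calls are used.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_download(path_a, path_b):
    """True when both files have identical bytes (same SHA-256 digest)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


# Example call with hypothetical file names:
# same_download("download_machine_a.csv", "download_machine_b.csv")
```

Comparing digests also makes it easy to record the expected hash next to an analysis so others can check they are working from the same file.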

Can I get separate tables, messy data, or other formats?

Yes. Use Tables → Excel/SQL for a normalized multi-table export, switch on Messy / dirty data in Advanced options for nulls, typos and inconsistent dates, and choose CSV, Excel, JSON or SQL on any download.
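When the messy option is on, dates may arrive in more than one format, so a tolerant parser is a reasonable first cleaning step. The sketch below tries a short list of candidate formats and returns `None` for blanks or anything unparseable. The formats listed are assumptions, since the page does not say which variants the generator emits, and `parse_order_date` is a hypothetical helper.

```python
from datetime import date, datetime

# Assumed candidate formats; adjust after inspecting the actual messy export.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_order_date(raw):
    """Return a date for the first matching format, or None for blanks and junk."""
    if raw is None or not str(raw).strip():
        return None
    text = str(raw).strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    return None


# Made-up values for illustration.
raw_values = ["2024-03-05", "03/05/2024", "05.03.2024", "", "not a date"]
parsed = [parse_order_date(v) for v in raw_values]
print(parsed)
```

Slash-separated dates are ambiguous between month-first and day-first. This sketch assumes month-first, so check a few rows against known values before trusting the result.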

Is the data real?

No — it is 100% synthetic, generated in your browser, with no real people or companies. Free to use commercially.