Basket-level point-of-sale transactions with real product affinities: items that genuinely co-occur (chips + salsa + soda), plus store IDs, hour-of-day patterns, and a payment mix. This is the dataset that market-basket and association-rule tutorials actually need.
Each row is one item within a basket. Group by transaction_id to reconstruct each shopper's basket for association-rule mining.
| Column | Type | Description |
|---|---|---|
| transaction_id | integer | The basket; multiple item rows share it. |
| datetime | datetime | Timestamp with realistic hour-of-day weighting. |
| store_id | text | Which store rang the sale (the number of stores scales with row count). |
| product / department | text | The item and its aisle/department. |
| quantity / unit_price | number | Units and shelf price. |
| line_total | number | quantity × unit_price. |
| payment | text | Card / Cash / Mobile. |
| anomaly | 0/1 | Present only with injection on; flags suspicious transactions (e.g. odd-hour high-value bulk). |
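A quick sanity check on a loaded file is to confirm that `line_total` matches `quantity × unit_price` on every row. The sketch below assumes the rows have already been parsed into dictionaries keyed by the column names above; the sample values and the `bad_line_totals` helper are hypothetical, not part of the dataset.

```python
import math

# Hypothetical sample rows; keys follow the column table above.
rows = [
    {"transaction_id": 1, "product": "tortilla chips", "quantity": 2, "unit_price": 3.49, "line_total": 6.98},
    {"transaction_id": 1, "product": "salsa", "quantity": 1, "unit_price": 4.25, "line_total": 4.25},
    {"transaction_id": 2, "product": "diapers", "quantity": 1, "unit_price": 24.99, "line_total": 24.99},
]

def bad_line_totals(rows, tol=0.005):
    """Return rows whose line_total is not quantity * unit_price (within tol)."""
    return [
        r for r in rows
        if not math.isclose(r["quantity"] * r["unit_price"], r["line_total"], abs_tol=tol)
    ]

mismatches = bad_line_totals(rows)
print(f"{len(mismatches)} rows with inconsistent line_total")
```

The tolerance is there to absorb floating-point rounding on prices; half a cent is an assumed threshold and may need adjusting.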
The catalog is organized into affinity groups — sets of items that really go together, like {tortilla chips, salsa, guacamole, soda} or {diapers, wipes, baby food}. Each basket draws one or two of these groups and co-purchases their members at high probability, with the occasional impulse buy mixed in. That means an association-rule miner (Apriori/FP-Growth) will actually surface lift between linked products — the whole point of a market-basket exercise — instead of finding nothing because the items were independent. Layer on weighted shopping hours, weekend/holiday traffic lifts, and multiple stores, and you get transaction data that behaves like a real grocery POS feed.
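To make the idea concrete, here is a minimal sketch of affinity-group basket generation. It is not the generator's actual code: the two groups come from the description above, but the impulse items, the probabilities (`p_member`, `p_impulse`), and the `make_basket` function are assumptions chosen for illustration.

```python
import random

# Two affinity groups named in the text; a real catalog would have more.
AFFINITY_GROUPS = [
    ["tortilla chips", "salsa", "guacamole", "soda"],
    ["diapers", "wipes", "baby food"],
]
IMPULSE_ITEMS = ["gum", "candy bar"]  # hypothetical impulse buys

def make_basket(rng, p_member=0.8, p_impulse=0.1):
    """Draw one or two groups, keep each member with high probability,
    and occasionally add an unrelated impulse item."""
    groups = rng.sample(AFFINITY_GROUPS, rng.choice([1, 2]))
    basket = {item for group in groups for item in group if rng.random() < p_member}
    if rng.random() < p_impulse:
        basket.add(rng.choice(IMPULSE_ITEMS))
    if not basket:
        # Guarantee at least one item so no basket is empty.
        basket.add(rng.choice(groups[0]))
    return sorted(basket)

rng = random.Random(42)  # fixed seed for reproducibility
baskets = [make_basket(rng) for _ in range(1000)]
print(baskets[:3])
```

Because members of the same group are kept together with high probability, pairs inside a group co-occur far more often than independent sampling would produce, which is what gives a miner something to find.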
Group rows by transaction_id to form item lists, then feed them to Apriori or FP-Growth. You should see strong lift within the affinity groups baked into the catalog.
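The grouping step and the lift calculation can be sketched with the standard library alone. This computes pairwise lift only, which is a simplification rather than a full Apriori or FP-Growth implementation, and the `(transaction_id, product)` rows are made-up sample data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (transaction_id, product) rows, one per item line.
rows = [
    (1, "tortilla chips"), (1, "salsa"), (1, "soda"),
    (2, "tortilla chips"), (2, "salsa"),
    (3, "diapers"), (3, "wipes"),
    (4, "diapers"), (4, "wipes"), (4, "soda"),
    (5, "tortilla chips"), (5, "salsa"), (5, "guacamole"),
]

# Group item rows by transaction_id to rebuild each basket.
baskets = defaultdict(set)
for transaction_id, product in rows:
    baskets[transaction_id].add(product)

n = len(baskets)
item_counts = Counter(item for b in baskets.values() for item in b)
pair_counts = Counter(
    pair for b in baskets.values() for pair in combinations(sorted(b), 2)
)

def lift(a, b):
    """lift(a, b) = support(a and b) / (support(a) * support(b))."""
    support_ab = pair_counts[tuple(sorted((a, b)))] / n
    return support_ab / ((item_counts[a] / n) * (item_counts[b] / n))

print(round(lift("tortilla chips", "salsa"), 2))  # same affinity group
print(round(lift("salsa", "soda"), 2))
```

A lift above 1 means the pair appears together more often than independence would predict. On a full file, a library implementation of Apriori or FP-Growth is the more practical choice, since it also handles itemsets larger than pairs and support pruning.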
The number of stores scales with file size, from one store on small files up to eight on large ones, so you can compare store performance.
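A store comparison can be as simple as totalling `line_total` per `store_id` and counting distinct baskets. The sketch below uses hypothetical rows; the `S01`-style store IDs are an assumed format, so check the values in your own file.

```python
from collections import defaultdict

# Hypothetical item rows; store_id values like "S01" are an assumed format.
rows = [
    {"transaction_id": 1, "store_id": "S01", "line_total": 6.98},
    {"transaction_id": 1, "store_id": "S01", "line_total": 4.25},
    {"transaction_id": 2, "store_id": "S02", "line_total": 24.99},
    {"transaction_id": 3, "store_id": "S01", "line_total": 3.51},
]

revenue = defaultdict(float)
basket_ids = defaultdict(set)
for r in rows:
    revenue[r["store_id"]] += r["line_total"]
    basket_ids[r["store_id"]].add(r["transaction_id"])

# Per-store revenue, basket count, and average basket value.
summary = {
    store: {
        "revenue": round(revenue[store], 2),
        "baskets": len(basket_ids[store]),
        "avg_basket": round(revenue[store] / len(basket_ids[store]), 2),
    }
    for store in sorted(revenue)
}

for store, stats in summary.items():
    print(store, stats)
```

Counting distinct `transaction_id` values matters here because each row is one item, not one basket; dividing revenue by row count would give average line value instead.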
Generation runs 100% in your browser.