Market-basket analysis in Python (with a free POS dataset)

Find which products are bought together using Apriori and association rules. The tutorial uses a retail POS dataset built with real basket affinities, so the rules you mine show meaningful lift. Takes about 30 minutes.

Get the dataset

Generic random transactions are useless here: if items are bought independently, every rule has lift close to 1 and nothing stands out. This dataset is built from affinity groups (chips + salsa + soda, diapers + wipes + baby food…), so association-rule mining returns genuine results.

Download market-basket dataset (CSV) → Customize in the generator

Each row is one item in a basket; group by transaction_id to reconstruct baskets.
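
To make the long format concrete, here is a minimal sketch of that grouping in plain Python. The sample rows are hypothetical and invented for illustration; the column names follow the ones used in the steps below, and the real file may contain more columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the long format described above
# (one row per item in a basket). Values are made up.
SAMPLE = """transaction_id,product,quantity
1001,tortilla chips,1
1001,salsa,2
1002,diapers,1
1002,wipes,1
1002,baby food,3
1003,salsa,1
"""

baskets = defaultdict(set)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    # Rows that share a transaction_id belong to the same basket.
    baskets[row["transaction_id"]].add(row["product"])

for tid, items in sorted(baskets.items()):
    print(tid, sorted(items))
```

The pandas reshape in step 2 does the same grouping, but produces a transaction-by-product matrix instead of a dictionary of sets.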

Steps

  1. Install & load

    # In a shell:
    pip install pandas mlxtend

    # Then in Python:
    import pandas as pd
    df = pd.read_csv("retail_pos.csv")
    df.head()
  2. Reshape into a basket matrix

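    Turn the long table into one row per transaction and one column per product, with True/False marking whether the product is in the basket:

    # Sum quantities per (transaction, product), pivot products into columns,
    # and fill combinations that never occur with 0.
    basket = (df.groupby(["transaction_id", "product"])["quantity"]
                .sum().unstack().fillna(0))
    # Any positive quantity counts as present. mlxtend prefers boolean columns
    # and warns about 0/1 integers.
    basket = basket > 0
    basket.shape  # (number of transactions, number of products)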
  3. Run Apriori for frequent itemsets

    from mlxtend.frequent_patterns import apriori, association_rules
    
    # min_support=0.02 keeps itemsets found in at least 2% of transactions.
    itemsets = apriori(basket, min_support=0.02, use_colnames=True)
    itemsets.sort_values("support", ascending=False).head(10)
  4. Generate association rules

    # Keep rules with lift >= 1.0, then rank the strongest associations first.
    rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
    rules = rules.sort_values("lift", ascending=False)
    rules[["antecedents", "consequents", "support", "confidence", "lift"]].head(10)
  5. Interpret the results

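    Support is the fraction of transactions that contain the whole itemset. Confidence is P(consequent | antecedent). Lift is confidence divided by the consequent's overall support: lift > 1 means the items co-occur more often than they would if bought independently, and the further above 1, the stronger the association. You should see the seeded affinity groups rise to the top (e.g. salsa → tortilla chips with high lift). Use these rules for cross-sell, store layout, or recommendation demos.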
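
To see how the three metrics relate, it can help to compute them by hand once. The sketch below uses five hypothetical baskets invented for illustration (not rows from the dataset) and a small `support` helper that is not part of mlxtend.

```python
# Five hypothetical baskets, invented for illustration.
baskets = [
    {"salsa", "tortilla chips", "soda"},
    {"salsa", "tortilla chips"},
    {"tortilla chips", "soda"},
    {"diapers", "wipes"},
    {"milk"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= b for b in baskets) / len(baskets)

antecedent, consequent = {"salsa"}, {"tortilla chips"}

supp = support(antecedent | consequent)   # P(salsa and chips)
conf = supp / support(antecedent)         # P(chips | salsa)
lift = conf / support(consequent)         # confidence relative to P(chips)

print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
# prints: support=0.40 confidence=1.00 lift=1.67
```

Here every salsa basket also contains tortilla chips (confidence 1.0), while chips appear in only 60% of all baskets, so buying salsa makes chips about 1.67 times more likely than the baseline.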

Why it works on this data

Because baskets are assembled from real affinity groups with the occasional impulse buy, the joint distribution has genuine structure. Apriori and FP-Growth surface lift you can defend, unlike random transaction generators, where every rule has lift ≈ 1 and the exercise falls flat.
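
The contrast can be illustrated with a small simulation. This is a simplified sketch, not the generator behind the dataset: the item names, the 0.3 probabilities, and the single chips + salsa group are all assumptions chosen for illustration.

```python
import random

def lift(baskets, a, b):
    """Lift of the pair (a, b): P(a and b) / (P(a) * P(b))."""
    n = len(baskets)
    p_a = sum(a in bk for bk in baskets) / n
    p_b = sum(b in bk for bk in baskets) / n
    p_ab = sum(a in bk and b in bk for bk in baskets) / n
    return p_ab / (p_a * p_b)

rng = random.Random(0)  # fixed seed so runs are repeatable
N = 20_000
ITEMS = ["chips", "salsa", "milk", "bread"]

# Independent baskets: each item is included with probability 0.3,
# regardless of what else is in the basket.
independent = [{i for i in ITEMS if rng.random() < 0.3} for _ in range(N)]

# Affinity baskets: 30% of shoppers pick up chips and salsa together;
# milk and bread are still added independently as impulse buys.
affinity = []
for _ in range(N):
    bk = {i for i in ("milk", "bread") if rng.random() < 0.3}
    if rng.random() < 0.3:
        bk |= {"chips", "salsa"}
    affinity.append(bk)

lift_independent = lift(independent, "chips", "salsa")
lift_affinity = lift(affinity, "chips", "salsa")

print(f"independent baskets: lift = {lift_independent:.2f}")
print(f"affinity baskets:    lift = {lift_affinity:.2f}")
```

With independent items the lift lands near 1, while the affinity version lands near 1 / 0.3 ≈ 3.3, because chips and salsa only ever appear together in this toy setup.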

Keep going