Lab - Choice Modeling

Data - Grocery store data on milk consumption
Variables:
id - unique identifier for each consumer (500 observations)
product - binary variable (1, 0); product == 1 if the consumer bought 2% milk, 0 if the consumer bought fat-milk
full_price - full price before promotion (if any)
full_pri - the price after the discount/promotion
disc_price - total amount of the discount
bundel - whether the consumer bought the products as a bundle (2 per 6, 1 per 3)
time_day - 1 == morning (until noon), otherwise afternoon (noon until close)
repeated? - whether consumer i is a repeat buyer in the store
repeated_bundel? - whether the consumer has bought the product as a bundle before

Pull the data

In []:
%matplotlib inline
import pandas as pd
import statsmodels.api as sm
import numpy as np
In []:
mdata = pd.read_csv('./data/milkdata.csv')
mdata.head(3)

summary statistics (all variables)

In []:
mdata.describe()

plot the distribution (or density) of full_pri (price after promotion)

In []:
mdata['full_pri'].hist()

plot the distribution of disc_price (total discount)

In []:
mdata['disc_price'].hist()

Run a simple logit model where the outcome is Prob(product_i = 1), regressed on all other variables in the data

In []:
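# Columns from position 2 onward are the regressors (this assumes id and product are the first two columns)
mdata.columns[2:]
In []:
train_cols = mdata.columns[2:]
In []:
# Note: sm.Logit does not add an intercept automatically; use sm.add_constant(mdata[train_cols]) if a constant is wanted
logit = sm.Logit(mdata["product"], mdata[train_cols])
In []:
results = logit.fit()
results.summary()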

Questions:

What are we trying to find?

1. What is the expected probability that a consumer will buy 2% milk when all the explanatory variables are set to their sample means?

2. Which variables are significant and which are not (at the 95 percent confidence level)?

3. Which variables are consistent with your prior intuition and which are not?

4. Reading the output from this regression, would you recommend that the marketing team sell milk in bundles? Yes or no? Explain.
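
For question 1, a minimal sketch of the "probability at the means" calculation: take the sample mean of each regressor, form the linear index with the fitted coefficients, and pass it through the logistic function. The helper name `prob_at_means` and the demo numbers are hypothetical and are not estimates from milkdata.csv.

```python
import numpy as np

def prob_at_means(params, X):
    """Logit probability evaluated at the column means of X.

    params : 1-D array of fitted coefficients (one per column of X)
    X      : 2-D array of the regressors used in the fit (same column order)
    """
    params = np.asarray(params, dtype=float)
    x_bar = np.asarray(X, dtype=float).mean(axis=0)  # sample mean of each regressor
    index = x_bar @ params                           # linear index: x_bar' * beta
    return 1.0 / (1.0 + np.exp(-index))              # logistic function

# Made-up numbers for illustration only (not from the milk data).
X_demo = np.array([[1.0, 2.0],
                   [1.0, 4.0]])      # first column plays the role of a constant
beta_demo = np.array([-3.0, 1.0])
p_demo = prob_at_means(beta_demo, X_demo)
print(p_demo)
```

In the notebook this would be called as `prob_at_means(results.params, mdata[train_cols])`. Because the logistic function is nonlinear, this number is generally not the same as the average of the individual predicted probabilities.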

Interpret the estimated coefficients (betas):

Pair Exercise:

1. Run the same model as a linear probability model (LPM)

2. Predict y_hat

3. Plot the distribution of y_hat from the LPM. Is there a problem?

4. Plot the distribution of y_hat from the logit model. Is there a problem?
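
A rough sketch of what the pair exercise is probing, using synthetic data rather than the milk data. The LPM is fit by least squares and the logit by a few hand-written Newton steps, so that the block depends only on NumPy. All variable names and the data-generating numbers are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only: one regressor, binary outcome.
n = 500
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)
X = np.column_stack([np.ones(n), x])          # constant + regressor

# LPM: ordinary least squares of y on X.
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat_lpm = X @ beta_lpm

# Logit: maximum likelihood via Newton-Raphson steps.
beta_logit = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))
    grad = X.T @ (y - p)                       # score vector
    hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
    beta_logit = beta_logit + np.linalg.solve(hess, grad)
y_hat_logit = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))

# Share of LPM fitted values that are not valid probabilities.
outside = np.mean((y_hat_lpm < 0) | (y_hat_lpm > 1))
print(f"LPM   y_hat range: [{y_hat_lpm.min():.3f}, {y_hat_lpm.max():.3f}]")
print(f"logit y_hat range: [{y_hat_logit.min():.3f}, {y_hat_logit.max():.3f}]")
print(f"share of LPM y_hat outside [0, 1]: {outside:.3f}")
```

In the notebook itself, the LPM can be fit with `sm.OLS(mdata["product"], mdata[train_cols]).fit()` and its fitted values read from `.fittedvalues`, while `results.predict()` returns the logit fitted values. The thing to look for in the histograms is whether any y_hat falls below 0 or above 1.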

Homework (or in class, if we have time):

1. Run the following logit model:

y = bundel (whether buyers are buying milk in bundles)

x = product, full_price, full_pri, promo, disc_price, time_day, repeated

2. Are consumers more likely to buy 2% milk vs. fat-milk? Yes or no? Explain.

3. Is the effect of promotion on the outcome negative or positive (ignore significance)? Can promotions drive consumers to buy in bundles?

4. Calculate the odds ratios for this regression

5. Using the results from this regression, can you think of a strategy to convert consumers to buying healthier milk (2%) rather than fat-milk?
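
For the odds-ratio step, a minimal sketch: in a logit model the odds ratio for a regressor is the exponential of its coefficient. The helper `odds_ratios` and the demo coefficients are hypothetical and are not results from the milk data.

```python
import numpy as np

def odds_ratios(params):
    """Exponentiate logit coefficients to obtain odds ratios."""
    return np.exp(np.asarray(params, dtype=float))

# Made-up coefficients for illustration only.
beta_demo = np.array([0.0, np.log(2.0), -np.log(2.0)])
or_demo = odds_ratios(beta_demo)
print(or_demo)
```

With a fitted statsmodels result, the same quantity is `np.exp(results.params)`, and `np.exp(results.conf_int())` gives the matching interval on the odds-ratio scale. An odds ratio above 1 means a one-unit increase in that variable multiplies the odds of the outcome by that factor, holding the other variables fixed; below 1 means the odds shrink.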
In []: