Association Analysis

Q1: Lift Analysis
Please calculate the following lift values for the table correlating burger and chips below:

◦ Lift(Burgers, Chips)
◦ Lift(Burgers, ^Chips)
◦ Lift(^Burgers, Chips)
◦ Lift(^Burgers, ^Chips)

Please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

Lift(Burgers, Chips)
= s(B u C)/(s(B) x s(C))
s(B u C) = 600/1400 = 0.43
s(B) = 1000/1400 = 0.71
s(C) = 800/1400 = 0.57
Lift(B, C) = (600/1400)/((1000/1400) x (800/1400))
= 1.05
= positive correlation (lift is greater than 1)

Lift(Burgers, ^Chips)
= s(B u ^C)/(s(B) x s(^C))
s(B u ^C) = 400/1400 = 0.29
s(B) = 1000/1400 = 0.71
s(^C) = 600/1400 = 0.43
Lift(B, ^C) = (400/1400)/((1000/1400) x (600/1400))
= 0.93
= negative correlation (lift is less than 1)

Lift(^Burgers, Chips)
= s(^B u C)/(s(^B) x s(C))
s(^B u C) = 200/1400 = 0.14
s(^B) = 400/1400 = 0.29
s(C) = 800/1400 = 0.57
Lift(^B, C) = (200/1400)/((400/1400) x (800/1400))
= 0.875
= negative correlation (lift is less than 1)

Lift(^Burgers, ^Chips)
= s(^B u ^C)/(s(^B) x s(^C))
s(^B u ^C) = 200/1400 = 0.14
s(^B) = 400/1400 = 0.29
s(^C) = 600/1400 = 0.43
Lift(^B, ^C) = (200/1400)/((400/1400) x (600/1400))
= 1.17
= positive correlation (lift is greater than 1)
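The four lift calculations above follow the same pattern, so they can be checked with a short script. The following is a minimal Python sketch, not part of the original exercise: the function names `lift` and `interpret` are illustrative, and the counts are the ones quoted in the answers (1400 transactions in total). Because it works from the raw counts rather than from supports rounded to two decimals, its results can differ slightly in the second decimal place from a hand calculation.

```python
def lift(n_ab, n_a, n_b, n_total):
    """Lift(A, B) = s(A u B) / (s(A) x s(B)), computed from raw counts."""
    s_ab = n_ab / n_total
    s_a = n_a / n_total
    s_b = n_b / n_total
    return s_ab / (s_a * s_b)


def interpret(value, tol=1e-9):
    """Map a lift value to the three labels used in the answers."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positive correlation" if value > 1.0 else "negative correlation"


# Counts taken from the worked answers (assumed to match the source table).
N = 1400
burgers, not_burgers = 1000, 400
chips, not_chips = 800, 600

cells = {
    "Lift(B, C)": (600, burgers, chips),
    "Lift(B, ^C)": (400, burgers, not_chips),
    "Lift(^B, C)": (200, not_burgers, chips),
    "Lift(^B, ^C)": (200, not_burgers, not_chips),
}

results = {name: lift(n_ab, n_a, n_b, N) for name, (n_ab, n_a, n_b) in cells.items()}
for name, value in results.items():
    print(f"{name} = {value:.3f} -> {interpret(value)}")
```

A lift above 1 means the two events co-occur more often than independence would predict, and a lift below 1 means they co-occur less often.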

Q2:
Please calculate the following lift values for the table correlating shampoo and ketchup below:

◦ Lift(Ketchup, Shampoo)
◦ Lift(Ketchup, ^Shampoo)
◦ Lift(^Ketchup, Shampoo)
◦ Lift(^Ketchup, ^Shampoo)

Please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

Lift(Ketchup, Shampoo)
= s(K u S)/(s(K) x s(S))
s(K u S) = 100/900 = .11
s(K) = 300/900 = .33
s(S) = 300/900 = .33
Lift(K, S) = (100/900)/((300/900) x (300/900))
= 1
= Independent

Lift(Ketchup, ^Shampoo)
= s(K u ^S)/(s(K) x s(^S))
s(K u ^S) = 200/900 = .22
s(K) = 300/900 = .33
s(^S) = 600/900 = .67
Lift(K, ^S) = (200/900)/((300/900) x (600/900))
= 1
= Independent

Lift(^Ketchup, Shampoo)
= s(^K u S)/(s(^K) x s(S))
s(^K u S) = 200/900 = .22
s(^K) = 600/900 = .67
s(S) = 300/900 = .33
Lift(^K, S) = (200/900)/((600/900) x (300/900))
= 1
= Independent

Lift(^Ketchup, ^Shampoo)
= s(^K u ^S)/(s(^K) x s(^S))
s(^K u ^S) = 400/900 = .44
s(^K) = 600/900 = .67
s(^S) = 600/900 = .67
Lift(^K, ^S) = (400/900)/((600/900) x (600/900))
= 1
= Independent
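With supports rounded to two decimals, a lift that is exactly 1 can come out as 1.01 or 0.99, which makes independence harder to recognise. As a hedged illustration (the helper name `exact_lift` is an assumption, and the counts are those quoted in the answers), the sketch below uses Python's `fractions.Fraction` so that no rounding occurs at all.

```python
from fractions import Fraction


def exact_lift(n_ab, n_a, n_b, n_total):
    """Lift computed with exact rational arithmetic (no rounding)."""
    s_ab = Fraction(n_ab, n_total)
    s_a = Fraction(n_a, n_total)
    s_b = Fraction(n_b, n_total)
    return s_ab / (s_a * s_b)


# Counts taken from the worked answers (900 transactions in total).
N = 900
ketchup, not_ketchup = 300, 600
shampoo, not_shampoo = 300, 600

ks_lifts = {
    "Lift(K, S)": exact_lift(100, ketchup, shampoo, N),
    "Lift(K, ^S)": exact_lift(200, ketchup, not_shampoo, N),
    "Lift(^K, S)": exact_lift(200, not_ketchup, shampoo, N),
    "Lift(^K, ^S)": exact_lift(400, not_ketchup, not_shampoo, N),
}

for name, value in ks_lifts.items():
    print(name, "=", value)
```

All four values come out as exactly 1, which is what independence means: each joint support equals the product of the two individual supports.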

Q3: Chi Squared Analysis
Please calculate the following chi squared values for the table correlating burger and chips below (Expected values in brackets).

◦ Burgers & Chips
◦ Burgers & Not Chips
◦ Chips & Not Burgers
◦ Not Burgers and Not Chips

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

χ² = Sum of (Actual - Expected)² / Expected

χ² Burgers & Chips
χ² = (900-800)²/800 + (100-200)²/200 + (300-400)²/400 + (200-100)²/100
= 12.5 + 50 + 25 + 100 = 187.5
Positive correlation (actual is greater than expected)

χ² Burgers & Not Chips
χ² = (100-200)²/200 + (300-400)²/400 + (200-100)²/100
= 50 + 25 + 100 = 175
= Negative correlation (expected is greater than actual)

χ² Chips & Not Burgers
= (300-400)²/400 + (200-100)²/100
= 25 + 100 = 125
= Negative correlation (expected is greater than actual)

χ² Not Chips & Not Burgers
= (200-100)²/100
= 100
= Positive correlation (actual is greater than expected)
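The chi squared arithmetic can be cross-checked in code. The sketch below is an illustration under stated assumptions, not the method required by the exercise: it takes the observed counts quoted in the answers (900, 100, 300, 200), derives each expected count from the row and column totals in the usual Pearson way, and reports every cell's own contribution along with the direction of the deviation. In the standard Pearson test the statistic is the single sum over all four cells. The function name `chi_square_cells` is hypothetical.

```python
def chi_square_cells(observed):
    """Return (row, col, observed, expected, contribution) for each cell.

    Expected counts come from the marginals:
    expected = row_total * col_total / grand_total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            cells.append((i, j, obs, exp, (obs - exp) ** 2 / exp))
    return cells


# Rows: Burgers, Not Burgers. Columns: Chips, Not Chips.
observed = [[900, 100],
            [300, 200]]
labels = [["Burgers & Chips", "Burgers & Not Chips"],
          ["Chips & Not Burgers", "Not Burgers & Not Chips"]]

cells = chi_square_cells(observed)
for i, j, obs, exp, contrib in cells:
    direction = "positive" if obs > exp else "negative" if obs < exp else "independent"
    print(f"{labels[i][j]}: observed={obs}, expected={exp:.0f}, "
          f"contribution={contrib:.1f}, {direction}")

chi2 = sum(contrib for *_, contrib in cells)
print("chi squared =", chi2)
```

The expected counts computed this way are 800, 200, 400 and 100, matching the bracketed values used in the formulas, and each cell is divided by its own expected count.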

Q4: Chi Squared Analysis
Please calculate the following chi squared values for the table correlating burger and sausages below (Expected values in brackets).

◦ Burgers & Sausages
◦ Burgers & Not Sausages
◦ Sausages & Not Burgers
◦ Not Burgers and Not Sausages

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

χ² Burgers & Sausages
(800-800)²/800 + (200-200)²/200 + (400-400)²/400 + (100-100)²/100
0 + 0 + 0 + 0 = 0
Independent

χ² Burgers & Not Sausages
(200-200)²/200 + (400-400)²/400 + (100-100)²/100
0 + 0 + 0 = 0
Independent

χ² Sausages & Not Burgers
(400-400)²/400 + (100-100)²/100
0 + 0 = 0
Independent

χ² Not Burgers and Not Sausages
(100-100)²/100
= 0
Independent

Q5:

Under what conditions would Lift and Chi Squared analysis prove to be a poor algorithm to evaluate correlation/dependency between two events?
Please suggest another algorithm that could be used to rectify the flaw in Lift and Chi Squared?

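Both are poor measures of correlation or dependency between two events when the data contain a large number of null transactions, that is, transactions that contain neither of the two events. Null transactions change the total transaction count, so Lift and Chi Squared can shift substantially even though the relationship between the two events themselves has not changed.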

Alternatively, one can use a null-invariant measure:
AllConf(A, B)
Jaccard(A, B)
Cosine(A, B)
Kulczynski(A, B)
MaxConf(A, B)
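To see why the number of null transactions matters, the following Python sketch compares Lift with the five measures listed above on the same pair of events, once with no null transactions and once with many. The counts are hypothetical and chosen only for illustration, and the function name `measures` is an assumption. The formulas are the commonly used definitions of these measures, written in terms of counts.

```python
import math


def measures(n_ab, n_a, n_b, n_null):
    """Correlation measures for events A and B.

    n_ab: transactions containing both A and B
    n_a, n_b: transactions containing A, containing B
    n_null: transactions containing neither A nor B
    """
    n_total = n_a + n_b - n_ab + n_null
    lift = (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))
    return {
        "lift": lift,                                   # depends on n_total
        "all_conf": n_ab / max(n_a, n_b),               # min of P(A|B), P(B|A)
        "max_conf": n_ab / min(n_a, n_b),               # max of P(A|B), P(B|A)
        "kulczynski": 0.5 * (n_ab / n_a + n_ab / n_b),  # mean of P(A|B), P(B|A)
        "cosine": n_ab / math.sqrt(n_a * n_b),
        "jaccard": n_ab / (n_a + n_b - n_ab),
    }


# Hypothetical counts: same A/B relationship, different numbers of null transactions.
few_nulls = measures(100, 1000, 1000, n_null=0)
many_nulls = measures(100, 1000, 1000, n_null=100_000)

for name in few_nulls:
    print(f"{name:>10}: {few_nulls[name]:.3f} vs {many_nulls[name]:.3f}")
```

With these counts, Lift moves from below 1 to well above 1 purely because null transactions were added, while the other five measures stay the same because their formulas never use the total transaction count.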
