Research Analysis MICHAEL BERNSTEIN CS 376
Last time
  What is a statistical test?
  Chi-square
  t-test
  Paired t-test
Today
  ANOVA
  Posthoc tests
  Two-way ANOVA
  Repeated measures ANOVA
Recall: hypothesis testing
Anatomy of a statistical test
  If your change had no effect, what would the world look like?
    No difference in means
    No slope in relationship
  This is known as the null hypothesis
Anatomy of a statistical test
  Given the difference you observed, how likely is it to have occurred by chance?
    Probability of seeing a mean difference at least this large, by chance, is 0.012
    Probability of seeing a slope at least this large, by chance, is 0.012
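One way to make "how likely is it to have occurred by chance" concrete is a permutation test: shuffle the condition labels many times and count how often the shuffled mean difference is at least as large as the observed one. This is a swapped-in illustration rather than the test used in the slides, and the function name and the data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often a mean difference at least as large as the observed
    one appears when condition labels are assigned at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels carry no information
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical data: bugs fixed with and without IDE callouts.
with_callouts = [12, 15, 14, 16, 13, 17, 15, 14]
without_callouts = [10, 11, 9, 12, 10, 13, 11, 10]
p = permutation_p_value(with_callouts, without_callouts)
```

A small p here says that random relabeling rarely produces a difference this large, which is the same question the null hypothesis asks.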
Errors
                            Difference exists: Y             Difference exists: N
  Difference detected: Y    True positive                    Type 1 error (publish false findings)
  Difference detected: N    Type 2 error (get more data?)    True negative
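p-value
  The probability of seeing a difference at least as large as the one observed by chance alone, that is, if the null hypothesis were true
  Rejecting the null hypothesis only when p < α keeps P(Type I error) at α
  Typically accepted levels for α: 0.05, 0.01, 0.001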
ANOVA
t-test: compare two means
  Do people fix more bugs with our IDE bug suggestion callouts?
ANOVA: compare N means
  Do people fix more bugs with our IDE bug suggestion callouts, with warnings, or with nothing?
Rough intuition for ANOVA test
  How much of the total variation can be accounted for by looking at the means of each condition?
  [Figure: responses in two conditions, marked with the condition means Ȳ1 and Ȳ2 and the grand mean Ȳ]
    total deviation from grand mean
    = deviation of factor mean from grand mean
    + deviation of response from factor mean
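The same intuition can be written as a sum-of-squares decomposition. The notation is an assumption borrowed from standard textbook usage and does not appear on the slides: Y_{ij} is response j in factor level i, \bar{Y}_{i\cdot} is the mean of level i, \bar{Y}_{\cdot\cdot} is the grand mean, and n_i is the number of responses in level i.

```latex
\underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{\cdot\cdot}\right)^2}_{\text{total}}
=
\underbrace{\sum_{i=1}^{r} n_i \left(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\right)^2}_{\text{factor}}
+
\underbrace{\sum_{i=1}^{r}\sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2}_{\text{error}}
```

The larger the factor term is relative to the error term, the more of the total variation the condition means account for.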
ANalysis Of VAriance (ANOVA)
  Degrees of freedom: how many values can vary? (Using n observations and r factor levels)
    Degrees of freedom in individual data points: n - 1
    Degrees of freedom in factor level averages: r - 1
    What is left for the error term: (n - 1) - (r - 1) = n - r
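Finally: run the test!
  How large is the value we constructed, compared to the F distribution?
  Test if $F > F(1-\alpha;\ r-1,\ n-r)$
    r - 1: factor degrees of freedom
    n - r: error degrees of freedom ("what's left")
  Example: 3 factor levels and 24 observations give F(2, 21)
  F is factor variation (top) over error variation (bottom): hopefully top >> bottom, e.g., p < .001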
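A minimal sketch of how the F statistic and its degrees of freedom could be computed by hand, using only the Python standard library. The function name and the like counts are hypothetical; the data are chosen only so that 3 factor levels and 24 observations reproduce the F(2, 21) shape.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_factor, df_error) for a list of per-condition samples."""
    n = sum(len(g) for g in groups)  # total observations
    r = len(groups)                  # number of factor levels
    grand_mean = mean(x for g in groups for x in g)
    # Variation accounted for by the condition means (factor sum of squares).
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each condition (error sum of squares).
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_factor, df_error = r - 1, n - r
    f_stat = (ss_factor / df_factor) / (ss_error / df_error)
    return f_stat, df_factor, df_error

# Hypothetical like counts: 3 feed sources x 8 participants = 24 observations.
friend = [80, 75, 90, 85, 70, 88, 82, 78]
stranger = [40, 35, 50, 45, 30, 48, 42, 38]
michael = [70, 65, 85, 75, 60, 80, 72, 68]
f_stat, df_factor, df_error = one_way_anova([friend, stranger, michael])
```

Turning F into a p-value needs the F distribution itself; if SciPy is installed, `scipy.stats.f.sf(f_stat, df_factor, df_error)` gives the upper-tail probability.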
Reporting an ANOVA
  A one-way ANOVA revealed a significant effect of news feed source on number of likes (F(2, 21) = 12.1, p < .001).
Posthoc tests
ANOVA! Are we done? No.
  Significant means "at least one of the µ_i is different."
  That's not very helpful: there is some difference between populating the Facebook news feed with friends vs. strangers vs. only Michael's status updates
Estimating pairwise differences
  Which pairs of factor levels are different from each other?
  [Figure: bar chart of mean likes (axis 0 to 90) for friend feed, stranger feed, and Michael feed]
Roughly: we do pairwise t-tests
  For each pair of factor levels, test if $t > t(1-\alpha;\ n-r)$
  [Figure: bar chart of mean likes (axis 0 to 90) for friend feed, stranger feed, and Michael feed, with one t-test per pair]
But familywise error!
  α = .05 implies a .95 probability of being correct
  If we do m tests, the actual probability of being correct is now:
    $(1-\alpha)^m = .95 \cdot .95 \cdot .95 \cdots < .95$
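The arithmetic is easy to check directly. This sketch assumes the m tests are independent, which is the simplification behind multiplying .95 by itself; the function name is hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

for m in (1, 3, 10):
    print(m, round(familywise_error_rate(0.05, m), 3))
# prints:
# 1 0.05
# 3 0.143
# 10 0.401
```

With only three pairwise comparisons, the chance of at least one false positive is already close to three times the nominal .05.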
Bonferroni correction
  Avoid familywise error by adjusting α to be more conservative
  Divide α by the number of comparisons you make
    4 tests at α = .05 implies using α = .0125
  Conservative but accurate method of compensating for multiple tests
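A minimal sketch of applying the correction to a set of comparisons. The function name and the four p-values are hypothetical and only illustrate the 4-test case from the slide.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and which tests fall below it."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

# Hypothetical p-values from four comparisons.
threshold, significant = bonferroni([0.004, 0.030, 0.012, 0.200])
# threshold is 0.0125; significant is [True, False, True, False]
```

Note that p = .030 would have passed an uncorrected .05 threshold but does not pass the corrected one.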
Tukey test
  Less conservative than Bonferroni
  Compares all pairs of factor level means
Reporting
  Posthoc tests using Bonferroni correction revealed that the friend feed and the Michael feed were significantly better than the stranger feed (p < .05), but the two were not significantly different from each other (p = .32).
Two-way ANOVA
Crossed study designs
  Suppose you wanted to measure the impact of two factors on total likes on Facebook:
    Strong ties vs. weak ties in your news feed
    Presence of a reminder of the last time you liked each friend's content (e.g., "You last liked a story from John Hennessy in January")
  This is a 2 x 2 study: two factor levels for each factor {tie strength, reminder}
Interaction effects
  Sometimes the basic model doesn't capture subtle interactions between factors
  Data: People who see strong ties and have a reminder are especially active
  Result: Grand mean 8, strong tie mean 11, reminder mean 7, but mean in this cell is 20
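To see why a cell mean of 20 signals an interaction, compare it with what a main-effects-only model would predict. This sketch assumes the reported strong tie and reminder means are marginal means (averaged over the other factor), which the slide does not state explicitly.

```python
grand_mean = 8
strong_tie_mean = 11   # assumed marginal mean for the strong-tie level
reminder_mean = 7      # assumed marginal mean for the reminder level
observed_cell_mean = 20

# Additive model: grand mean plus each factor's own offset from it.
additive_prediction = (
    grand_mean
    + (strong_tie_mean - grand_mean)
    + (reminder_mean - grand_mean)
)
# Whatever the main effects cannot explain is attributed to the interaction.
interaction = observed_cell_mean - additive_prediction
# additive_prediction is 10; interaction is 10
```

The main effects alone predict 10 for the strong-tie-with-reminder cell, so half of the observed 20 comes from the combination of the two factors.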
Two-factor ANOVA test
  Test for main effects and interaction
  [Table: one row per factor or interaction, with columns SS, MS, F, p]
  Main effects are significant, but interaction effect is also significant
Significant interaction?
  Significant interactions mean that you can't just report the main effects: the story is more complicated
  Inspect to figure it out:
                   Pen     Touch
    Technique A    15.3    21.1
    Technique B    23.9    33.1
    Technique C    32.9    44.9
  The slower techniques (B, C) harm Touch more than Pen
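One simple way to inspect the table is to compute the Touch minus Pen gap for each technique. If there were no interaction, the gap would be roughly constant across techniques. The variable names are hypothetical; the numbers are the ones in the table.

```python
cell_means = {
    "A": {"Pen": 15.3, "Touch": 21.1},
    "B": {"Pen": 23.9, "Touch": 33.1},
    "C": {"Pen": 32.9, "Touch": 44.9},
}

# Gap between input methods within each technique, rounded to one decimal.
touch_minus_pen = {
    technique: round(m["Touch"] - m["Pen"], 1)
    for technique, m in cell_means.items()
}
# touch_minus_pen is {'A': 5.8, 'B': 9.2, 'C': 12.0}
```

The gap grows from 5.8 to 12.0 as the technique gets slower, which is the pattern the slide describes.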
Repeated measures ANOVA
Within-subjects studies
  Control for individual variation using the mean response for each participant
  Before: we found the mean effect of each treatment
  Now: we find the mean effect of each participant
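A rough sketch of what "controlling for individual variation" means in practice: subtract each participant's own mean from their scores, so that what remains reflects the conditions rather than who is generally fast or slow. The data, participant IDs, and function name are hypothetical, and this shows only the centering step, not a full repeated measures ANOVA.

```python
from statistics import mean

# Hypothetical within-subjects data: every participant tried every condition.
scores = {
    "P1": {"callouts": 14, "warnings": 12, "nothing": 10},
    "P2": {"callouts": 9, "warnings": 7, "nothing": 5},
    "P3": {"callouts": 20, "warnings": 17, "nothing": 17},
}

def center_by_participant(scores):
    """Subtract each participant's mean response from their own scores."""
    centered = {}
    for pid, by_condition in scores.items():
        participant_mean = mean(by_condition.values())
        centered[pid] = {
            cond: value - participant_mean
            for cond, value in by_condition.items()
        }
    return centered

centered = center_by_participant(scores)
```

P1 and P2 differ a lot in raw scores, yet after centering both show the same pattern across conditions (+2, 0, -2), so the participant-to-participant variation no longer inflates the error term.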
Repeated measures in R
  [Figure: R output for a repeated measures ANOVA, annotated with:]
    repeated measures error term
    effect of subtracting out the participant means
    remaining main effects
All together now
Always follow every step!
  1. Visualize the data
  2. Compute descriptive statistics (e.g., mean)
  3. Remove outliers >2 standard deviations from the mean
  4. Check for heteroskedasticity and non-normal data
       Try log, square root, or reciprocal transform
       ANOVA is robust against non-normal data, but not against heteroskedasticity
  5. Run statistical test
  6. Run any posthoc tests if necessary
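Steps 3 and 4 can be sketched with the standard library alone. The helper names and data are hypothetical, and the variance ratio is only a crude eyeball check assumed here for illustration; it is not a formal test for heteroskedasticity.

```python
import math
from statistics import mean, stdev, variance

def remove_outliers(values, k=2.0):
    """Step 3: drop values more than k standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]

def variance_ratio(groups):
    """Step 4 (crude check): largest group variance over the smallest.
    A ratio far above 1 suggests unequal spread across conditions."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical task times for two conditions with very different spread.
raw = [[1.0, 2.0, 4.0, 8.0], [10.0, 20.0, 40.0, 80.0]]
logged = [[math.log(v) for v in group] for group in raw]

ratio_raw = variance_ratio(raw)        # spread differs by a large factor
ratio_logged = variance_ratio(logged)  # spread is about equal after the log
```

In this made-up example the log transform brings the two conditions to about the same variance, which is the situation the transform suggestion in step 4 is aiming for.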