PowerPoint Presentation


13. Linear Regression and Correlation

Outline
Data: two continuous measurements on each subject.
Goal: study the relationship between the two variables.
PART I: Correlation analysis
  Study the relationship between two continuous variables.
  Steps: scatter diagram; correlation coefficient (calculation, meaning, hypothesis testing).
PART II: Linear regression
  Construct a linear equation between the variables.
  Model building
  Model estimating: confidence intervals and prediction intervals
  Model fitting: strength of the linear association, coefficient of determination

Recall: In Ch. 11 and 12,
X: nominal variable. Ex. gender (binary), brand (3-level).
Y: response variable (continuous or binary). Ex. score, success-failure, yield.
Q: Are X and Y correlated?
A: If Y is continuous, compare the population means of Y across the groups defined by X. Ex. comparing the blue and green groups. Tests: Z-test, t-test, ANOVA F-test.

Recall: In Ch. 11 and 12,
Q: Are X and Y correlated?
A: If X and Y are both binary, compare the population proportions of Y in the two groups defined by X. Ex. the participation proportion in one group vs. the participation proportion among women. When the sample sizes are large, a Z-test is used.
Q: How do we assess the relationship when X and Y are both continuous?
A: Correlation and regression analysis.

Data: a sample of n observations, with k continuous variables measured on each observation.
Example. A survey of n = 10 students records k = 3 scores for each student.
Question: is there any association between the scores?

What is correlation analysis?
Study the relationship between several continuous variables, and measure the strength of the association between them.
Correlation analysis consists of two steps:
Step 1. Scatter diagram: plot (X1, X2).
Step 2. Coefficient of correlation.

[Figure: EXCEL scatter diagrams for the three pairs of scores, with conclusions 1 to 3]

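Population coefficient of correlation, $\rho$: a measure of the strength of the linear relationship between two variables.
Definition (population correlation coefficient):
$\rho = \dfrac{\sum_x \sum_y (x-\mu_x)(y-\mu_y)\,p(x,y)}{\sigma_x \sigma_y}$
Estimation (sample correlation coefficient):
$r = \dfrac{S_{xy}}{S_x S_y}$
where
$S_{xy} = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1} = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n-1}$,
$S_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$, $S_y = \sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}$.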
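As a rough illustration of the sample formula, the Python sketch below computes r from paired data. The function name `sample_correlation` and the data values are hypothetical, chosen only for illustration; they are not the survey scores.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation r = S_xy / (S_x * S_y), using n - 1 divisors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data (not from the slides): y tends to rise with x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(xs, ys)
```

The n - 1 divisors cancel in the ratio, so r would be the same with divisor n; they are kept here only to mirror the definitions of S_xy, S_x and S_y.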

Properties: $-1 \le r \le 1$
Positive linear association: $r > 0$
Negative linear association: $r < 0$
No linear relation: $r \approx 0$ (other kinds of relation may still exist)
Strongly positive linear association: $r \approx 1$
Strongly negative linear association: $r \approx -1$


Why such a definition?
If there is a strongly positive linear association, then y is large when x is large, and $S_{xy}$ takes a large positive value.
If there is a strongly negative linear association, then y is small when x is large, and $S_{xy}$ takes a large negative value.
If there is no relation, then when x is large some y are large and some are small, so $S_{xy} \approx 0$ and $r \approx 0$.

[Table: EXCEL descriptive statistics for columns 1 to 3 (mean, variance, kurtosis, and related measures)]

EXCEL Conclusion:
1. Correlation coefficient = 0.0330
2. Correlation coefficient = 0.7151
3. Correlation coefficient = 0.3754

Recall, for a single population with one variable:
Population: N subjects; X = measurement; $\mu$ = population mean, which is unknown. H0: $\mu = \mu_0$?
Random sampling
Sample: n subjects; X = observations; calculate $\bar{X}$ = sample mean.
Test statistic: $t \text{ or } z = \dfrac{\bar{X} - \mu_0}{SE(\bar{X})}$, a t-test or Z-test.

Now we have a single population with two variables:
Population: N subjects; measurements (X, Y); $\rho$ = population coefficient of correlation, which is unknown. H0: $\rho = 0$?
Random sampling
Sample: n subjects; observations (X, Y); calculate r = sample coefficient of correlation.
Test statistic: $t = \dfrac{r - 0}{SE(r)}$, a t-test.

Testing the significance of the correlation coefficient
Testing the null hypothesis of no correlation: $\rho = 0$
Step 1. State the hypotheses.
  H0: no correlation vs. H1: correlated
  H0: $\rho = 0$ vs. H1: $\rho \ne 0$
Step 2. Select the significance level $\alpha$.

Step 3. Determine the test statistic.
  $t = \dfrac{r - 0}{SE(r)} = \dfrac{r}{\sqrt{(1-r^2)/(n-2)}} = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
  Note that under the null hypothesis, t follows a t-distribution with d.f. = n - 2.
Step 4. Formulate the decision rule.
  This is a two-sided t-test. With significance level $\alpha$, H0 should be rejected if $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$.
Step 5. Collect data, compute the t-value, and draw a conclusion.

Example. At $\alpha = 0.05$, n = 10, d.f. = 10 - 2 = 8, $t_{0.025,\,8} = 2.306$.
Test 1 (first pair of variables): H0: $\rho_1 = 0$ vs. H1: $\rho_1 \ne 0$.
Since $r_1 = 0.033$ and n = 10,
$t = \dfrac{r_1\sqrt{n-2}}{\sqrt{1-r_1^2}} = \dfrac{0.033\sqrt{10-2}}{\sqrt{1-(0.033)^2}} = 0.093$
Since $-2.306 < t = 0.093 < 2.306$, H0 is not rejected.
Conclusion: there is not sufficient evidence to reject the null hypothesis of no correlation.

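Example. At $\alpha = 0.05$, n = 10, d.f. = 10 - 2 = 8, $t_{0.025,\,8} = 2.306$.
Test 2 (second pair of variables): H0: $\rho_2 = 0$ vs. H1: $\rho_2 \ne 0$.
Since $r_2 = 0.7151$ and n = 10,
$t = \dfrac{r_2\sqrt{n-2}}{\sqrt{1-r_2^2}} = \dfrac{0.7151\sqrt{10-2}}{\sqrt{1-(0.7151)^2}} = 2.89$
Since $t = 2.89 > 2.306$, H0 is rejected.
Conclusion: there is sufficient evidence to reject the null hypothesis of no correlation.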

Example. At $\alpha = 0.05$, n = 10, d.f. = 10 - 2 = 8, $t_{0.025,\,8} = 2.306$.
Test 3 (third pair of variables): H0: $\rho_3 = 0$ vs. H1: $\rho_3 \ne 0$.
Since $r_3 = 0.3755$ and n = 10,
$t = \dfrac{r_3\sqrt{n-2}}{\sqrt{1-r_3^2}} = \dfrac{0.3755\sqrt{10-2}}{\sqrt{1-(0.3755)^2}} = 1.15$
Since $-2.306 < t = 1.15 < 2.306$, H0 is not rejected.
Conclusion: there is not sufficient evidence to reject the null hypothesis of no correlation.
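The three tests can be checked with a short Python sketch. It assumes the test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n - 2 degrees of freedom and the tabulated two-sided 5% critical value 2.306 for 8 d.f. The critical value is hard-coded rather than computed, and the function names are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n <= 2 or not -1.0 < r < 1.0:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Two-sided critical value at the 5% level with 8 d.f., taken from a t-table
# (a hard-coded assumption, not computed here).
T_CRITICAL = 2.306

def reject_no_correlation(r, n, t_critical=T_CRITICAL):
    """Two-sided decision rule: reject H0 (rho = 0) when |t| > t_critical."""
    return abs(correlation_t_statistic(r, n)) > t_critical

# The three sample correlations from the examples, each with n = 10.
results = {
    r: (round(correlation_t_statistic(r, 10), 2), reject_no_correlation(r, 10))
    for r in (0.033, 0.7151, 0.3755)
}
```

Only the second correlation leads to rejection, which matches the three conclusions in the examples.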

A word of caution: when H0 of no correlation has been rejected,
Only a linear relationship between the variables is ascertained. The true relationship could still be quadratic or cubic.
No cause-and-effect relationship is established.
Spurious correlations can occur between variables that have no real connection.

PART II. Linear Regression Analysis
Variables:
X = independent variable(s), explanatory variable, predictor.
Y = dependent variable, response variable; the variable to be predicted or estimated.
Regression analysis: develop an equation or function that allows us to estimate or predict Y based on X.
Example. If X = 60, what value of Y do we predict?

ANOVA and Regression
Recall: in a one-way ANOVA of AGE vs. INCOME,
The whole population is classified into three sub-populations by AGE: a young population, a middle-aged population, and a senior population.
The INCOMEs of all sub-populations are normally distributed with the same variance.
Research question: are the mean INCOMEs, $\mu_{income}$, the same?

ANOVA and Regression
Recall: in a simple linear regression model of X vs. Y,
The whole population is classified into many sub-populations by X: the X = 0 population, the X = 1 population, ..., the X = 100 population.
The Y values of all sub-populations are normally distributed with the same variance.
Research question: are the means of Y, $\mu_Y$, the same? Establish the relationship between $\mu_Y$ and X.

Regression Model (P449):
1. Given each value of X, there is a group of Y values. For example, one group of Y values at X = 60 and another at X = 50.
2. These Y values follow a normal distribution.
   At X = 60, $Y \sim N(\mu_{Y|X=60}, \sigma^2)$
   At X = 50, $Y \sim N(\mu_{Y|X=50}, \sigma^2)$

3. The means of these normal distributions are a linear function of x:
   $\mu_{Y|x} = \alpha + \beta x$
   Example. $\mu_{Y|x} = -30 + 1.5x$

4. The standard deviations of these normal distributions are all the same (they do not depend on x).
   At X = 60, $Y \sim N(\mu_{Y|X=60}, \sigma^2)$
   At X = 50, $Y \sim N(\mu_{Y|X=50}, \sigma^2)$
5. The observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ are statistically independent.

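$Y \sim N(\alpha + \beta x, \sigma^2)$
[Figure: normal curves $N(\alpha + \beta x_1, \sigma^2)$, $N(\alpha + \beta x_2, \sigma^2)$, $N(\alpha + \beta x_3, \sigma^2)$ centered on the line $\mu_{Y|x} = \alpha + \beta x$]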

In practice, only sample data are collected, and $\alpha$, $\beta$, $\sigma$ are unknown.
[Figure: observations scattered around the unknown line $\mu_{Y|x} = \alpha + \beta x$]

How do we estimate the regression equation from sample data?
[Figure: observations plotted against X, with the unknown line $\mu_{Y|x} = \alpha + \beta x$]

Estimation of the model
Observations: $(X_1, Y_1), \ldots, (X_n, Y_n)$
Regression model: $Y \mid X = x \sim N(\mu_{Y|x} = \alpha + \beta x, \sigma^2)$
Regression equation: $\mu_{Y|x} = \alpha + \beta x$
Estimation: $\alpha = ?$, $\beta = ?$, $\sigma = ?$

Let a, b be estimates of $\alpha$, $\beta$.
Predicted equation: $Y' = a + bx$. The value Y' can be read in two ways:
1. A predicted value of Y. Ex. for an individual with X = 60, what is Y?
2. An estimated value of $\mu_{Y|x}$. Ex. at X = 60, what is $\mu_{Y|X=60}$?

Estimation of $\alpha$, $\beta$
[Figure: one possible line through the data, then many possible lines]

[Figure: observed Y and predicted Y' for three candidate lines through three points. Prediction error = Y - Y'. The preferred line is the one with the minimum $\sum_{i=1}^{3} (Y_i - Y_i')^2$.]

In the predicted equation, what is the intercept a? What is the slope b?
Least squares estimates (LSE) a, b:
Principle: find the regression equation that minimizes the sum of squared differences between the actual Y and the predicted Y',
minimize $\sum_{i=1}^{n} (Y_i - Y_i')^2$

Formulae:
Linear regression equation: $Y' = a + bX$
Estimated regression coefficients: $b = S_{xy}/S_x^2$, $a = \bar{Y} - b\bar{X}$
where
$S_{xy} = \sum (x-\bar{x})(y-\bar{y})/(n-1) = \{\sum xy - n\bar{x}\bar{y}\}/(n-1)$,
$S_x^2 = \sum (x-\bar{x})^2/(n-1) = \{\sum x^2 - n\bar{x}^2\}/(n-1)$
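A minimal Python sketch of these least squares formulae, using the shortcut forms of $S_{xy}$ and $S_x^2$. The function name `least_squares` and the data are hypothetical and serve only to illustrate the arithmetic.

```python
def least_squares(xs, ys):
    """Return (a, b) for Y' = a + b*X, with b = S_xy / S_x^2 and a = y_bar - b*x_bar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Shortcut forms: (sum xy - n*x_bar*y_bar)/(n-1) and (sum x^2 - n*x_bar^2)/(n-1).
    s_xy = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (n - 1)
    s_x2 = (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)
    if s_x2 == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = s_xy / s_x2
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = least_squares(xs, ys)
```

For data that lie exactly on a line, such as x = 0, 1, 2 with y = 2, 5, 8, the function recovers that line (a = 2, b = 3).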

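Meaning of the estimated intercept, a
a = Y' at X = 0. It can be read in two ways:
The estimated value of $\mu_{Y|X=0}$, the mean of Y when X = 0. Example: with Y = allowance, the estimated mean allowance at X = 0 is a.
The predicted value of Y when X = 0. Example: for an individual with X = 0, the predicted allowance is a.
If X = 0 lies outside the range of the data, or X cannot be 0, then a has no practical interpretation.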

a is an estimate of the true intercept $\alpha$. One may be interested in testing H0: $\alpha = 0$.
When $\alpha = 0$, the regression equation passes through the origin: $\mu_{Y|x} = \beta x$, so $\mu_{Y|x=0} = 0$.

Meaning of the estimated slope, b
b = the change in $\mu_{Y|x}$ per unit change in x: when x changes by one unit, b is the estimated increment (or decrement) in $\mu_{Y|x}$.
Example. In the previous case, if b = 0.2, then each 1-unit increase in X raises the mean allowance $\mu_{Y|x}$ by 0.2.

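b is an estimate of the true slope $\beta$. One is usually more interested in testing H0: $\beta = 0$.
When $\beta = 0$, the regression equation is a constant and does not depend on the value of X: $\mu_{Y|x} = \alpha$, $Y \sim N(\alpha, \sigma^2)$.
The distribution of Y is then the same at every X, so X and Y are independent.
[Figure: identical normal curves P(y) at x1, x2, x3 along the horizontal line $\mu_{Y|x} = \alpha$]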

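Example.
Summary statistics used: n = 10, $\bar{x} = 73.5$, $\bar{y} = 70.2$, $S_x^2 = 282.944$, $\sum xy = 53976$.
$S_{xy} = (\sum xy - n\bar{x}\bar{y})/(n-1) = \{53976 - 10(73.5)(70.2)\}/(10-1) = 264.333$
$b = S_{xy}/S_x^2 = 264.333/282.944 = 0.9342$
$a = \bar{y} - b\bar{x} = 70.2 - 0.9342 \times 73.5 = 1.5345$ (computed with the unrounded b)
Ans. $Y' = 1.5345 + 0.9342X$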
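The arithmetic of this example can be reproduced from the summary statistics alone. The sketch below takes the example's values to be n = 10, sum of XY = 53976, mean of X = 73.5, mean of Y = 70.2 and sample variance of X = 282.944. These are assumed readings of the slide figures, and the variable names are hypothetical.

```python
# Summary statistics, assumed readings of the example's figures.
n = 10
sum_xy = 53976.0   # sum of X*Y over the sample
x_bar = 73.5       # sample mean of X
y_bar = 70.2       # sample mean of Y
s_x2 = 282.944     # sample variance of X

# Sample covariance via the shortcut formula.
s_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)

# Least squares slope and intercept.
b = s_xy / s_x2
a = y_bar - b * x_bar
```

Keeping b unrounded when computing a avoids the small discrepancy that appears if b is first rounded to four decimals.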

[Figure: EXCEL regression output. The regression-statistics block is used for model fitting, and the coefficients block for model estimating.]

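EXCEL output
Note: the difference from the previous calculation is due to rounding error.
Coefficients: a, b, the estimates of $\alpha$, $\beta$.
Standard errors: SE(a), SE(b).
t statistics: test the null hypotheses $\alpha = 0$ and $\beta = 0$; t-value(a) = a/SE(a), t-value(b) = b/SE(b).
p-values: the p-values corresponding to t-value(a) and t-value(b).
  p-value(a) = 0.9511 > 0.05: do not reject $\alpha = 0$.
  p-value(b) = 0.02 < 0.05: reject $\beta = 0$, so $\beta \ne 0$.
Lower and upper 95% limits: the 95% confidence intervals for $\alpha$, $\beta$.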

After a, b are obtained, $\sigma = ?$
$\mu_{Y|x} = \alpha + \beta x$ is estimated by $a + bx = Y'$.

The standard error of estimate
Variance $\sigma^2$: the dispersion of Y around the regression line, that is, the variation of the random error.
Error $= \varepsilon = Y - \mu_{Y|x}$: unobtainable.
Standard error of estimate: use the residuals to estimate the errors.
Residual $= e = Y - Y'$: observable.
The standard error of estimate is defined by
$s_{y\cdot x} = \sqrt{\dfrac{\sum (Y - Y')^2}{n-2}} = \sqrt{\dfrac{n-1}{n-2}\,(S_y^2 - b^2 S_x^2)}$
where $S_y$ is the sample s.d. of Y and $S_x$ is the sample s.d. of X.

The standard error of estimate: $\sigma$ or $s_{y\cdot x}$
It measures the random variation that is not explained by the regression line.
[Figure: two scatter plots, one with great random variation and one with small random variation]

Example. $Y' = 1.53 + 0.93X$
[Table: X, Y, and the residuals Y - Y']
Note: Residual $= e = Y - Y'$ is observable; Error $= \varepsilon = Y - \mu_{Y|x}$ is unobtainable.

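Example. $Y' = 1.53 + 0.93X$
$s_{y\cdot x} = \sqrt{\dfrac{\sum (Y - Y')^2}{n-2}} = \sqrt{\dfrac{2123.08}{8}} = 16.29$
Equivalently, $s_{y\cdot x} = \sqrt{\dfrac{n-1}{n-2}\,(S_y^2 - b^2 S_x^2)} = \sqrt{\dfrac{9}{8}\,(482.84 - 282.94\,(0.934)^2)} \approx 16.3$
The standard error of estimate is about 16.3.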
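The standard error of estimate can be computed either from the residuals or from the sample variances and the slope, and the two routes give the same value for a least squares fit. The Python sketch below checks this on hypothetical data; the function name and the data are illustrative only.

```python
import math

def standard_error_of_estimate(xs, ys):
    """Return s_yx computed (1) from the residuals and (2) from the shortcut formula."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s_y2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # Least squares fit Y' = a + b*X.
    b = s_xy / s_x2
    a = y_bar - b * x_bar
    # Route 1: sum of squared residuals divided by n - 2.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    from_residuals = math.sqrt(sse / (n - 2))
    # Route 2: (n - 1)/(n - 2) * (S_y^2 - b^2 * S_x^2).
    from_shortcut = math.sqrt((n - 1) / (n - 2) * (s_y2 - b * b * s_x2))
    return from_residuals, from_shortcut

# Hypothetical data (not from the slides).
s1, s2 = standard_error_of_estimate([1.0, 2.0, 3.0, 4.0, 5.0],
                                    [2.0, 4.0, 5.0, 4.0, 5.0])
```

The shortcut is convenient when only summary statistics are available, but it is more sensitive to rounding of b.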

[Figure: EXCEL regression output, showing where the standard error of estimate $s_{y\cdot x}$ is reported]

ESTIMATION & PREDICTION
Confidence intervals and prediction intervals
ESTIMATION. Q: At X = x, what is the mean value of Y, $\mu_{Y|x}$? Point estimation, confidence interval.
PREDICTION. Q: If an individual is drawn from the population with X = x, what is Y? Point prediction, prediction interval.

Confidence interval of $\mu_{Y|x}$ at X = x
Confidence interval: at X = x, the mean value of Y, $\mu_{Y|x}$.
Point estimation: $Y' = a + bx$
$100(1-\alpha)\%$ confidence interval:
$Y' \pm t_{(n-2,\,\alpha/2)}\, s_{y\cdot x} \sqrt{\dfrac{1}{n} + \dfrac{(x-\bar{x})^2}{(n-1)s_x^2}}$

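Example. $Y' = 1.53 + 0.93X$
Q: At X = 60, what is the 95% confidence interval for the mean of Y?
Ans.
1. Point estimation: Y' = 1.53 + 0.93(60) = 57.33
2. 95% confidence interval: with Y' = 57.33, $t_{(8,\,0.025)} = 2.306$, $s_{y\cdot x} = 16.29$, n = 10, $(x-\bar{x})^2 = (60-73.5)^2 = 182.25$, $s_x^2 = 282.94$,
$Y' \pm t_{(n-2,\,\alpha/2)}\, s_{y\cdot x} \sqrt{\dfrac{1}{n} + \dfrac{(x-\bar{x})^2}{(n-1)s_x^2}} = 57.33 \pm 2.306 \times 16.29 \sqrt{\dfrac{1}{10} + \dfrac{182.25}{9(282.94)}} = 57.33 \pm 15.56$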

Prediction interval of Y at X = x
Prediction interval: if an individual is drawn from the population with X = x, what is Y?
Prediction: $Y' = a + bx$
$100(1-\alpha)\%$ prediction interval:
$Y' \pm t_{(n-2,\,\alpha/2)}\, s_{y\cdot x} \sqrt{1 + \dfrac{1}{n} + \dfrac{(x-\bar{x})^2}{(n-1)s_x^2}}$

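Example. $Y' = 1.53 + 0.93X$
Q: For an individual with X = 60, what is the 95% prediction interval for Y?
Ans.
1. Point prediction: Y' = 1.53 + 0.93(60) = 57.33
2. 95% prediction interval: with Y' = 57.33, $t_{(8,\,0.025)} = 2.306$, $s_{y\cdot x} = 16.29$, n = 10, $(x-\bar{x})^2 = (60-73.5)^2 = 182.25$, $s_x^2 = 282.94$,
$Y' \pm t_{(n-2,\,\alpha/2)}\, s_{y\cdot x} \sqrt{1 + \dfrac{1}{n} + \dfrac{(x-\bar{x})^2}{(n-1)s_x^2}} = 57.33 \pm 2.306 \times 16.29 \sqrt{1 + \dfrac{1}{10} + \dfrac{182.25}{9(282.94)}} = 57.33 \pm 40.66$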
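The confidence interval for the mean of Y and the prediction interval for a single Y differ only by the extra 1 under the square root. The Python sketch below computes both half-widths at X = 60, taking the example's values to be n = 10, mean of X = 73.5, sample variance of X = 282.94, standard error of estimate = 16.29 and tabulated t = 2.306 for 8 d.f. These are assumed readings of the slide figures, and the function name is hypothetical.

```python
import math

def interval_half_widths(x0, n, x_bar, s_x2, s_yx, t_crit):
    """Half-widths of the confidence interval (mean of Y) and the
    prediction interval (a single Y) at X = x0."""
    # Shared term: 1/n + (x0 - x_bar)^2 / ((n - 1) * S_x^2).
    core = 1.0 / n + (x0 - x_bar) ** 2 / ((n - 1) * s_x2)
    ci_half = t_crit * s_yx * math.sqrt(core)
    pi_half = t_crit * s_yx * math.sqrt(1.0 + core)
    return ci_half, pi_half

# Values assumed from the example; t_crit is taken from a t-table.
ci_half, pi_half = interval_half_widths(
    x0=60, n=10, x_bar=73.5, s_x2=282.94, s_yx=16.29, t_crit=2.306
)
```

The prediction interval is always wider, because it accounts for the scatter of an individual Y around the mean as well as the uncertainty in the estimated line.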

RECALL: ANOVA table
Recall the ANOVA table in Chapter 12.
$SS_{total} = \sum (Y - \bar{Y})^2$; d.f. = n - 1 for n observations; $MS_{total} = SS_{total}/(n-1)$.
SST = variation due to treatment (between-group variation) $= \sum (\bar{Y}_j - \bar{Y})^2$, where $\bar{Y}_j$ is the estimated mean of Y in treatment group j; d.f. = k - 1 for k treatments; MST = SST/(k-1). Does a treatment effect exist?
SSE = variation due to random error (within-group variation) $= \sum (Y - \bar{Y}_j)^2$; d.f. = n - k; MSE = SSE/(n-k).
$SS_{total} = SST + SSE$

ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square        F
Treatment             SST              k-1                  SST/(k-1) = MST    MST/MSE
Error                 SSE              n-k                  SSE/(n-k) = MSE
Total                 SS_total         n-1
F-test: H0: no treatment effect exists vs. H1: there are treatment effects.

In a regression model,
$SS_{total}$ = total variation of Y:
$SS_{total} = \sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2 = SSR + SSE$
SSR = the variation explained by the regression model
SSE = the unexplained variation

In a regression model,
$SS_{total} = \sum (Y - \bar{Y})^2 = (n-1)S_Y^2$; d.f. = n - 1 for n observations; $MS_{total} = SS_{total}/(n-1)$.
SSR = variation due to the regression model $= \sum (Y' - \bar{Y})^2 = (n-1)b^2 S_X^2$, where Y' is the estimated mean of Y at a given X level; d.f. = 2 - 1 = 1 for 2 regression coefficients; MSR = SSR/1 = SSR. Does the X effect exist? Does the linearity exist?
SSE = variation due to random error $= \sum (Y - Y')^2 = (n-2)s_{y\cdot x}^2$; d.f. = n - 2; $MSE = SSE/(n-2) = s_{y\cdot x}^2$.

ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square        F
Regression            SSR              2-1                  SSR/1 = MSR        MSR/MSE
Error                 SSE              n-2                  SSE/(n-2) = MSE
Total                 SS_total         n-1
F-test: H0: the regression line is horizontal, that is, no global linear effect exists ($\beta = 0$) vs. H1: there is a linear association.

[Table: sum, mean, s.d., and variance of X and Y]

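$SS_{total} = \sum (Y - \bar{Y})^2 = (n-1)S_Y^2 = 9 \times 482.84 = 4345.6$
$SS_{reg} = \sum (Y' - \bar{Y})^2 = (n-1)b^2 S_X^2 = 9 \times 0.9342^2 \times 282.9 \approx 2222.5$
$SS_{error} = SS_{total} - SS_{reg} = 4345.6 - 2222.5 = 2123.1$
Further, since the p-value of the F-test is 0.02 < 0.05, the linearity exists.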
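The sums of squares and the F statistic can be rebuilt from summary statistics. The Python sketch below takes the example's values to be n = 10, sample variance of Y = 482.84, sample variance of X = 282.944 and slope b = 0.9342. These are assumed readings of the slide figures, and the variable names are hypothetical.

```python
# Summary statistics, assumed readings of the example's figures.
n = 10
s_y2 = 482.84    # sample variance of Y
s_x2 = 282.944   # sample variance of X
b = 0.9342       # estimated slope

# Decomposition SS_total = SS_reg + SS_error.
ss_total = (n - 1) * s_y2
ss_reg = (n - 1) * b ** 2 * s_x2
ss_error = ss_total - ss_reg

# Mean squares: 1 d.f. for regression, n - 2 d.f. for error.
msr = ss_reg / 1
mse = ss_error / (n - 2)
f_stat = msr / mse
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so an F near 8.37 is consistent with t = 2.89.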

The Coefficient of Determination
Coefficient of determination: the proportion of the total variation of Y that is explained by the variation of X.

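The Coefficient of Determination
Coefficient of determination:
$r^2 = \dfrac{SSR}{SS_{total}} = \dfrac{\text{explained by model}}{\text{total}} = \dfrac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$
$\phantom{r^2} = \dfrac{\text{total} - \text{unexplained}}{\text{total}} = 1 - \dfrac{SSE}{SS_{total}}$
Coefficient of determination = (correlation coefficient)$^2$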

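$r^2 = \dfrac{SSR}{SS_{total}} = \dfrac{2222.5}{4345.6} = 0.51 \approx (0.7151)^2$, the square of the correlation coefficient found earlier.
$s_{y\cdot x}^2 = MSE = SSE/(n-2) = 265.385$, so $s_{y\cdot x} = \sqrt{265.385} = 16.29$
Note: $\text{adj}(r^2) = 1 - \dfrac{MSE}{MS_{total}} = 1 - \dfrac{265.3853}{4345.6/9} = 0.450371$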
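A short Python sketch of the coefficient of determination and its adjusted version for simple linear regression, taking SSR = 2222.5, SSE = 2123.1 and SS_total = 4345.6 as assumed readings of the example's figures. The function names are hypothetical.

```python
def r_squared(ssr, ss_total):
    """Proportion of the total variation of Y explained by the model."""
    return ssr / ss_total

def adjusted_r_squared(sse, ss_total, n):
    """1 - MSE / MS_total for simple linear regression,
    with MSE = SSE/(n - 2) and MS_total = SS_total/(n - 1)."""
    return 1.0 - (sse / (n - 2)) / (ss_total / (n - 1))

# Values assumed from the example.
r2 = r_squared(2222.5, 4345.6)
adj_r2 = adjusted_r_squared(2123.1, 4345.6, 10)
```

The adjusted value is smaller than r squared because it compares mean squares rather than sums of squares, which penalizes the degrees of freedom used by the model.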

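EXCEL
I. Correlation analysis
  Scatter plot: XY scatter chart.
  Correlation coefficient: for only two variables, use the CORREL function; for more than two variables, use the Data Analysis correlation tool.
II. Linear regression: use the Data Analysis regression tool.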

Exercise.
Correlation analysis: 37, 39, 43
Linear regression analysis: 45, 46, 53, 57
EXCEL: 47, 49

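Bonus: by EXCEL, Exercise 13.47
1. Correlation analysis: draw the scatter plot and compute the correlation matrix; test whether X and Y are correlated ($\alpha = 0.05$).
2. Build the regression model (regression analysis):
   1. Fit the model.
   2. Give the predicted equation, explain the coefficients, and test them ($\alpha = 0.05$).
   3. Give the ANOVA table and state the conclusion.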