Regression and Correlation
 
 
  • notes about Utts

  • bivariate relationships between interval scale variables


  • first step: display the data with a scatterplot


    Simple linear regression
     
     

  • goal: summarize relationship between X and Y -- use X to predict Y


  • regression equation: predicted Y = a + bX (a = intercept, b = slope)
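
As a rough illustration of the regression equation, the sketch below estimates a and b with the usual least-squares formulas. These notes do not spell out the fitting method, so treat that choice as an assumption. The data (hours studied, exam scores) and the name `fit_line` are made up for the example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares of X about their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
predicted = a + b * 6          # use X = 6 hours to predict Y
print(f"Y = {a:.1f} + {b:.1f}X; predicted score at 6 hours: {predicted:.1f}")
```

With these made-up values the fitted line is roughly Y = 47.7 + 4.1X, so each additional hour goes with about 4.1 more points.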


    Issues to consider for regression
     


     
     
     
     
     

    r² as a PRE measure - strength of association
     
     
     

    Proportional Reduction in Error (PRE)
     
     

  • PRE = (E1 - E2) / E1


    r²:
     
     
     

  • E1: error from predicting each value of Y with Ȳ (the mean of Y)

     
  • E2: error from predicting Y with X and the regression line
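
The PRE logic can be traced numerically. This is a minimal sketch that assumes E1 and E2 are measured as sums of squared prediction errors and that the line is fit by least squares. The data and the helper `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]

a, b = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)

# E1: squared error when every Y is predicted by the mean of Y
e1 = sum((y - mean_y) ** 2 for y in ys)
# E2: squared error when Y is predicted from X with the regression line
e2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

pre = (e1 - e2) / e1           # this PRE value is r squared
print(f"E1 = {e1:.1f}, E2 = {e2:.1f}, PRE = {pre:.3f}")
```

Here E1 is 170 and E2 is about 1.9, so r² is about 0.99: using X removes roughly 99% of the squared error left by the mean.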


    Correlation: r
     
     

  • Pearson correlation coefficient or product-moment coefficient


  • indicates how closely the observed values cluster about the regression line


  • r = square root of r² and takes the sign of the slope
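
To see the sign rule at work, the sketch below computes r two ways on hypothetical data with a negative slope: directly from the sums of squares, and as the square root of r² with the slope's sign attached. The variable names are illustrative only.

```python
import math

# Hypothetical data with a downward trend
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx                              # least-squares slope
r_squared = sxy ** 2 / (sxx * syy)         # equals the PRE value for the line
# Square root of r squared, carrying the sign of the slope
r_from_r2 = math.copysign(math.sqrt(r_squared), b)
# Direct product-moment formula for comparison
r_direct = sxy / math.sqrt(sxx * syy)

print(f"slope = {b:.1f}, r squared = {r_squared:.3f}, r = {r_from_r2:.3f}")
```

Both routes give r of about -0.98; the square root alone would lose the negative sign.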


  • ranges between -1 and 1


    strength of r
     
     
     

  • which is stronger:  -.2 or +.1?  -.5 or +.75?


  • absolute value of r indicates strength


  • general guide for interpreting strength of r (absolute value)


    0 - .2 =  weak, slight

    .2 - .4 = mild/modest

    .4 - .6 = moderate

    .6 - .8 = moderately strong

    .8 - 1.0 = strong
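
The guide can be written as a small lookup. The ranges in the guide share their endpoints, so this sketch assumes a boundary value falls in the higher category; that rule and the function name `strength_label` are assumptions, not part of the guide.

```python
def strength_label(r):
    """Map a correlation to the guide's verbal label, using |r|."""
    size = abs(r)              # the sign gives direction, not strength
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size < 0.2:
        return "weak, slight"
    if size < 0.4:
        return "mild/modest"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "moderately strong"
    return "strong"

for r in (-0.2, 0.1, -0.5, 0.75):
    print(f"r = {r:+.2f}: {strength_label(r)}")
```

Applied to the earlier comparison, -.2 is stronger than +.1 and +.75 is stronger than -.5, because only the absolute value matters.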





  • r standardizes the degree of association, regardless of units of measurement
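
A quick check of the units claim, using hypothetical heights and weights: converting inches to centimeters and pounds to kilograms should leave r unchanged. The helper `pearson_r` is written out by hand for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements
inches = [60, 62, 65, 68, 70, 72]
pounds = [115, 120, 140, 155, 160, 180]

# Same people, different units of measurement
centimeters = [2.54 * h for h in inches]
kilograms = [0.45359237 * w for w in pounds]

r_imperial = pearson_r(inches, pounds)
r_metric = pearson_r(centimeters, kilograms)
print(f"r (in, lb) = {r_imperial:.4f}; r (cm, kg) = {r_metric:.4f}")
```

The slope b would change with the units, but r does not.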


  • r is appropriate for describing linear relationships only
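
One way to see the limitation: in the hypothetical data below Y is completely determined by X (Y = X²), yet r comes out as zero because the relationship is not linear. `pearson_r` is a hand-written helper for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfect but curved (U-shaped) relationship
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")   # no linear association despite a perfect curve
```

A scatterplot would reveal the curve immediately, which is why plotting comes first.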


  • restricted range on one or both variables attenuates correlation
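
The attenuation can be shown with constructed data. In this sketch every X from 1 to 10 has two cases, one 2 units above the line Y = X and one 2 units below; restricting X to 4 through 7 keeps the same scatter but shrinks the spread in X. All values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two cases per X value: one above and one below the line Y = X
points = [(x, x + d) for x in range(1, 11) for d in (2, -2)]

# Keep only cases from the middle of the X range
restricted = [(x, y) for x, y in points if 4 <= x <= 7]

r_full = pearson_r([x for x, _ in points], [y for _, y in points])
r_restricted = pearson_r([x for x, _ in restricted], [y for _, y in restricted])
print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```

With these values r falls from about 0.82 to about 0.49 even though the underlying relationship is identical.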


  • outliers influence correlation, too
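
A small made-up example of outlier influence: five cases with a clear positive trend, then the same cases plus one extreme point.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a positive trend
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r_without = pearson_r(xs, ys)

# Add a single extreme case at (10, 0)
r_with = pearson_r(xs + [10], ys + [0])

print(f"without outlier: r = {r_without:.2f}; with outlier: r = {r_with:.2f}")
```

Here one added case moves r from 0.80 to about -0.32, reversing the apparent direction of the relationship.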


    Ecological correlation
     
     

  • correlation between rates or averages


  • units of analysis = some kind of aggregate (e.g., neighborhoods, companies, states, countries)


  • interpret carefully; may inflate degree of association between underlying conceptual variables
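
The inflation can be illustrated with constructed data. The sketch below uses three hypothetical groups of four individuals each and compares r computed on the individuals with r computed on the group averages.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (x, y) values for individuals in three aggregates
groups = {
    "A": [(1, 1), (1, 3), (3, 1), (3, 3)],
    "B": [(3, 2), (3, 4), (5, 2), (5, 4)],
    "C": [(5, 3), (5, 5), (7, 3), (7, 5)],
}

# Individual-level correlation: every person is a unit of analysis
people = [p for members in groups.values() for p in members]
r_individuals = pearson_r([x for x, _ in people], [y for _, y in people])

# Ecological correlation: each group's averages are the unit of analysis
mean_x = [sum(x for x, _ in m) / len(m) for m in groups.values()]
mean_y = [sum(y for _, y in m) / len(m) for m in groups.values()]
r_groups = pearson_r(mean_x, mean_y)

print(f"individuals: r = {r_individuals:.2f}; group averages: r = {r_groups:.2f}")
```

Averaging removes the scatter within each group, so the group-level r is 1.00 while the individual-level r is only about 0.54.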


    Nonlinear relationships and transformations
     
     
     

  • one response: transform the data on one or both variables to make relationship more linear
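
As one possible illustration, the sketch below takes hypothetical data that grow exponentially and applies a log transformation to Y. The choice of a log transform is an assumption for the example; other transformations suit other curve shapes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical exponential growth: Y triples with each unit of X
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * 3 ** x for x in xs]

# Transform Y so the relationship with X becomes a straight line
log_ys = [math.log(y) for y in ys]

r_raw = pearson_r(xs, ys)
r_log = pearson_r(xs, log_ys)
print(f"raw Y: r = {r_raw:.2f}; log Y: r = {r_log:.2f}")
```

On the raw scale r is about 0.83; after the transformation the points fall on a line and r is 1.00.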
