Excel Basics — Finding areas under the normal distribution.

Excel has some very useful functions for finding areas under the normal distribution.

 

NORMSDIST(z)

Z   is the value for which you want the distribution.

Returns the standard normal cumulative distribution function. The distribution has a mean of 0 (zero) and a standard deviation of one. Use this function in place of a table of standard normal curve areas.

  1. a) Pick a cell and enter a z score into it (for example 2), don’t forget to add a label so you’ll know what you put in this cell.
  2. b) In a cell next to it, enter the function NORMSDIST(Z), use the address of the cell where you placed the z score as your z value. What did you get?

    (if you used z=2, you should get an area of 0.97724 or in other words, more than 97% of the population have scores lower then your z. Try other values of z in order to get a better feeling for the use of this function, for example 0,1,5,-1,-3)

    NORMSINV(probability)

    Probability   is a probability corresponding to the normal distribution.

    Returns the inverse of the standard normal cumulative distribution. The distribution has a mean of zero and a standard deviation of one.

    NORMSINV will return a z score that corresponds to an area under the curve. The area should be between 0 and 1.

  3. a) Pick a cell and enter a probability into it (for example 0.975), don’t forget to add a label so you’ll know what you put in this cell.

b) In a cell next to it, enter the function NORMSINV(probability), use the address of the cell where you placed the probability. What did you get?

(if you used p=0.975, you should get a z score of 1.95996. What would you get if you used a p = 0.97724 (you should get a value close to 2, your z from #1) Try other values of p in order to get a better feeling for the use of this function, for example 0.5,0.99.

In real life, we usually deal with normal distributions that are not standardized, so they are not expressed in z scores. Excel has several functions that will let you compute areas under the curve directly from your scores without standardizing them first.

NORMDIST(x,mean,standard_dev,cumulative)

X   is the value for which you want the distribution.

Mean   is the arithmetic mean of the distribution.

Standard_dev   is the standard deviation of the distribution.

Cumulative   is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability mass function. IN THIS EXERCISE USE "TRUE" SINCE YOU WANT THE AREA UNDER THE CURVE.

3) a) Enter your score, mean, S.D to different cell in Excel. Don’t forget to add a label so you’ll know what you put in this cell, for example use, x=102,m=100,sd = 2. You can also enter the word TRUE into a cell so you can use it in the function.

b) In another cell enter the function NORMDIST(x,mean,SD,cumulative), use the address of the cells where you placed the x,mean,sd,TRUE. What did you get?

(if you used the sample values, you should get an area of 0.8413. Try other values of x, m and s.d in order to get a better feeling for the use of this function.

NORMINV(probability,mean,standard_dev)

Probability   is a probability corresponding to the normal distribution.

Mean   is the arithmetic mean of the distribution.

Standard_dev   is the standard deviation of the distribution.

Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation.

4) a) Type into different cells a probability,mean and S.D (for example p = 0.84134, m =100, s.d = 2), don’t forget to add a label so you’ll know what you put in this cell.

b) In a cell next to it, enter the function NORMINV(probability,mean,s.d), use the addresses of the cells where you placed the probability, mean and S.D. What did you get?

(if you used the sample values, you should get a score close to 102 which was the x value from #3. This means that 84.135% of the population have a score below 102 in a population with that is normally distributed with a mean of 100 and S.D of 2. Try other values of p, m and S.D in order to get a better feeling for the use of this function.

Questions about the normal distribution often ask you to calculate the area under the curve between two scores or the probability that a score would turn out to be between two scores. The following exercise shows you how to calculate those values easily.

 

5) Enter into cell A16 "M", B16 "SD", C16 "X1", D16 "X2", E16 "Z1", F16 "Z2",G16 "F(Z1)", H16 "F(Z2)", I16 "p".

Enter the values 100,2,96,104 below m, S.D, x1, x2 respectively. These values are usually given to you in questions. Below Z1 we will calculate the standard score of X1. There are two ways to do this, either using the formula we learned in class (X-m )/σ or using the excel function STANDARDIZE(x, mean, S.D). Choose either one of them.

Calculate Z2 in the same way. If you used the sample values you should get z scores of —2,2.

Given the Z scores in E17 and F17, which Excel function would you use to calculate the area under the curve? (NORMSDIST). Calculate F(Z1) and F(Z2) using NORMSDIST, you should get areas of 0.023,0.977.

In order to calculate the area between these two scores, or the probability that a score would fall between X1 and X2, calculate the difference between F(Z2) and F(Z1) in cell I17. (H17-G17) You should get a value of 0.954 so there is 95.4 chance that a given score would fall between 96 and 104 in our distribution.

This could be very useful when working on the homework problems. For example, see question 4 in chapter 6.

We are given the mean height of a Merkin plant m=65 and the S.D = 3.

  1. What is the probability that the plant will be between 64 and 67 inches tall?

D) What is the probability that the plant will be less than 40 inches tall? I would use NORMDIST to answer this.

E) What is the probability that the plant will be more than 60 inches tall? You can use NORMDIST to calculate the probability that the plant will be less than 60 inches and then get the probability that the plant will be more than 60 inches tall using the fact that the total area under the curve is equal to 1.

Question:

What is the z score that corresponds to Alpha = 0.05, or in other words: if we set up alpha to be 0.05 what is the z score that anything above it would cause us to reject the null hypothesis?

Enter 0.05 into a cell in Excel, label it Desired alpha. In another cell, calculate the bottom portion of the distribution. In other words, if your alpha is set to 0.05, then 0.95 of the population is below is so the bottom portion is equal to 1- desired alpha.

Now use the function NORMSINV(p) to calculate the z score that corresponds to this alpha. Your p is the bottom portion and you should get a z = 1.645, you will see this value many times in the next few weeks.

As you will soon learn in class, hypothesis testing can be either non-directional or directional. If we divide the distribution to a bottom portion and the region above alpha we are using a directional hypothesis and predicting that our effect will be found in the upper portion of the curve. Sometime we don’t know where we will find an effect so we use a non directional test. In that case, an alpha of 0.05 should be divided to 2 so that we place 0.025 on one end of the curve and 0.025 on the other side of the curve.

Enter your desired alpha into a cell in Excel, the upper tail in this case will be alpha/2. Calculate this value in another cell.

As before, the bottom portion is 1-upper_tail, the only difference is that now the upper tail is equal to alpha/2. Calculate the bottom portion in a different cell, you should get 0.975.

You can now calculate the z score that corresponds to the bottom portion using NORMSINV(p). You should get z=1.96. Because of symmetry reasons (the standardized normal distribution is symmetric around 0) the z score that corresponds to the upper portion is equal to —z or —1.96. You can also get this value by using NORMSINV(upper tail) or NORMSINV(0.025). Don’t worry if this is not entirely clear, the class should clear up any confusion.