The manufacturer of m&m's says that 20% of the candies are yellow, 20% are red, 30% are brown, and orange, blue, & green candies take up 10% each.
Mr. Mays bought a regular bag of m&m's and got 29 yellow, 23 red, 12 orange, 14 blue, 8 green, and 20 brown. Did Mr. Mays get an unusual bag of m&m colors or is this pretty typical of most m&m packages?
You can see that yellow was the most common color in Mr. Mays' bag, even though brown m&m's make up the largest percentage at the factory.
This might be different than what you would expect, but is it enough of a difference to say that Mr. Mays has an unusual pack of m&m's?
This chapter will focus on 3 different tests for count data, like the data in the m&m scenario above.
The 3 tests are called:
1. Goodness-of-Fit Test
2. Test for Homogeneity
3. Test for Independence
A Goodness-of-Fit test answers the question, "How well do the observed counts fit the 'null' set of counts we expect for the data?"
Goodness-of-Fit involves testing a hypothesis. Above we have the distribution of colors in one m&m pack, and we want to know whether it fits the distribution the manufacturer claims. There is no single parameter to estimate, so a confidence interval wouldn't make much sense.
If we were only concerned with the percentage of yellow m&m's, we could conduct a one-proportion z-test, but since we want to analyze all of the color categories at once, we need a test that considers all the data.
Assumptions and Conditions for GOF
Counted Data Condition - Check that the data are COUNTS for the categories and not proportions or percentages.
Randomization Condition - The values that have been COUNTED must come from a random sample of the population.
Expected Cell Frequency Condition - The 'expected' value for each cell needs to be at least 5 counts.
The expected value for the last condition above is determined by the context of the scenario. For example, there were 106 m&m's in the package that Mr. Mays bought, so based on the manufacturer's percentages you would 'expect' about 21 yellow, 21 red, 32 brown, 11 orange, 11 blue, and 11 green. The 'Expected Cell Frequency Condition' is satisfied for this problem, because every color has an expected count of at least 5.
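If you'd like to double-check this arithmetic, each expected count is just the bag total times the manufacturer's percentage for that color. Here's a minimal Python sketch of that calculation and the Expected Cell Frequency check, assuming the percentages from the first paragraph; the names `observed`, `model`, and `expected` are only illustrative.

```python
# Observed counts from Mr. Mays' bag
observed = {"yellow": 29, "red": 23, "orange": 12, "blue": 14, "green": 8, "brown": 20}
# Manufacturer's claimed proportions (the null model)
model = {"yellow": 0.20, "red": 0.20, "orange": 0.10, "blue": 0.10, "green": 0.10, "brown": 0.30}

total = sum(observed.values())  # 106 candies in the bag

# Expected count for each color = bag total * claimed proportion
expected = {color: total * p for color, p in model.items()}

# Expected Cell Frequency Condition: every expected count should be at least 5
condition_met = all(count >= 5 for count in expected.values())

for color, count in expected.items():
    print(f"{color:>6}: {count:.1f}")
print("Expected Cell Frequency Condition met:", condition_met)
```

The printed values keep their decimals (21.2 rather than 21), which is what you'll want later when computing the test statistic.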
Chi-Square Test Statistic
Obviously the observed values of colors of m&m's aren't exactly what is expected, but is that because of natural variation or is the difference between the two so large that they indicate something unusual or important?
In order to test whether the differences are large, we need a new test statistic, called CHI-SQUARE. The formula to calculate chi-square is below.
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
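The chi-square statistic adds up (Obs - Exp)^2 / Exp over every category. A short Python sketch of that sum, using the m&m counts from Mr. Mays' bag and expected counts of 106 times each manufacturer percentage; the function name `chi_square_statistic` is a hypothetical choice, not from any library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))

# m&m example, in the order yellow, red, orange, blue, green, brown
observed = [29, 23, 12, 14, 8, 20]
expected = [106 * p for p in [0.20, 0.20, 0.10, 0.10, 0.10, 0.30]]

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 3))  # 9.314
```

Notice that the biggest contributions come from brown (20 observed vs. 31.8 expected) and yellow (29 vs. 21.2), the two colors that looked "off" at the start.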
Example Video
The following video gives an example of a Goodness-of-Fit test using Chi-Square as the test statistic. Note that the example in the video does NOT check the assumptions and conditions. YOU NEED TO CHECK THE ASSUMPTIONS AND CONDITIONS.
Goodness-of-Fit Test
Let's go back and take a look at our m&m's example. In fact, let's work through the hypothesis test. We've already checked the assumptions and conditions above.
The null and alternative hypotheses would be:
H0: The distribution of colors of M&M's is as specified by the company.
Ha: The distribution of colors of M&M's is different than specified by the company.
This chi-square Goodness-of-Fit test will use a significance level of .05 and has 5 degrees of freedom.
(The degrees of freedom equal the number of categories minus 1; in this case 6 - 1 = 5.)
In order to calculate the test statistic, chi-square, I need to know the observed values and the expected values for each color.
Observed: Yellow = 29, Red = 23, Orange = 12, Blue = 14, Green = 8, and Brown = 20
Expected: Yellow = 106(.20) = 21.2, Red = 106(.20) = 21.2, Orange = 106(.10) = 10.6, Blue = 106(.10) = 10.6, Green = 106(.10) = 10.6, and Brown = 106(.30) = 31.8
Note that you should NOT round expected values to whole numbers!
Using those observed and expected values, you should calculate a test statistic for chi-square of 9.314.
A chi-square of 9.314 with 5 degrees of freedom gives a p-value of about .0972.
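Turning 9.314 into a p-value means finding the area to the right of it under a chi-square curve with 5 degrees of freedom; a graphing calculator or a chi-square table normally does this for you. As a sketch, assuming you want to do it in plain Python without a stats library, that right-tail area equals the regularized upper incomplete gamma function Q(df/2, chi-square/2). The helper below evaluates it with the series and continued-fraction approach described in Numerical Recipes; the function names are hypothetical.

```python
import math

def upper_regularized_gamma(a, x, eps=1e-14, max_iter=1000):
    """Q(a, x) = Gamma(a, x) / Gamma(a), the regularized upper incomplete gamma."""
    if x <= 0:
        return 1.0
    # Common prefactor x^a * e^(-x) / Gamma(a), computed in log space
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); then Q = 1 - P
        term = 1.0 / a
        total = term
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h

def chi_square_p_value(chi_sq, df):
    """Right-tail area P(X > chi_sq) for a chi-square distribution with df degrees of freedom."""
    return upper_regularized_gamma(df / 2, chi_sq / 2)

categories = 6       # six m&m colors
df = categories - 1  # 5 degrees of freedom
chi_sq = 9.314       # test statistic from the paragraphs above
alpha = 0.05

p_value = chi_square_p_value(chi_sq, df)
print(f"df = {df}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "retain H0")
```

If SciPy happens to be installed, `scipy.stats.chi2.sf(9.314, 5)` should give the same area.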
Since the p-value is more than alpha = .05, I will retain the null hypothesis.
There is not enough evidence to support a claim that the distribution of colors is anything other than the distribution specified by the company.
We're Not Proving Anything
Remember that a hypothesis test can only reject or retain the null hypothesis. We can never confirm that a theory is in fact true, which is often what people want to do.
This marks the end of part 1 for chapter 26.
In the previous M&M's example we were comparing a sample to a distribution specified by a company. But what if we have 2 samples that we want to compare to each other? If that's the case, then we will conduct a 'Test for Homogeneity'.
Example
A random survey of cars parked in a student lot and the staff lot at a large university classified the brands by country of origin. Are there differences in the national origins of cars driven by students and staff?
The question above is asking whether the distribution of car origins for students is the same as the distribution for staff. Since this is an 'are they the same' question, it's a 'Test for Homogeneity'.
Here's how the car origins broke down:
Students: American = 107, European = 33, and Asian = 55
Staff: American = 105, European = 12, and Asian = 47
It's obvious that the numbers aren't the same, but are they so different that the differences would be considered significant? Let's run a Test for Homogeneity to find out.
The Hypotheses
Ho: The distribution of car origin is the same for students and staff.
Ha: The distribution of car origin is different for students and staff.
Before we get to the rest of the test, there are a few things that we have to consider first.
The null hypothesis will be that the distribution does not change from group to group.
The mechanics of the test are the same as a GOF test, except that the expected values and degrees of freedom are found differently.
The assumptions and conditions are the same as a GOF test: Counted Data, Randomization, & Expected Cell Frequency is at least 5.
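In order to find the expected value for a cell in a test for homogeneity, you should take the row total times the column total and divide by the grand total of the table.
For example, if I wanted to find the expected value for the 'Student Cars & American' cell, I would do the following calculation (212 American cars in all, 195 student cars, and 359 cars in total):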
(212)(195) / (359) = 115.15
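The row-total-times-column-total rule can be applied to every cell at once. A minimal Python sketch, assuming the table is laid out the way the degrees-of-freedom section describes it (3 rows for the countries of origin, 2 columns for students and staff) and that the first set of counts belongs to the students, as the 195 in the calculation above indicates.

```python
# Two-way table from the car survey: rows are countries of origin,
# columns are the two groups (students, staff).
origins = ["American", "European", "Asian"]
observed = [
    [107, 105],  # American: students, staff
    [33, 12],    # European
    [55, 47],    # Asian
]

row_totals = [sum(row) for row in observed]        # 212, 45, 102
col_totals = [sum(col) for col in zip(*observed)]  # 195 students, 164 staff
grand_total = sum(row_totals)                      # 359

# Expected count for each cell = (row total)(column total) / (table total)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for origin, row in zip(origins, expected):
    print(f"{origin:>8}: students {row[0]:.2f}, staff {row[1]:.2f}")
```

Every expected count in this table is above 5, so the Expected Cell Frequency Condition holds here too.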
Degrees of Freedom for Homogeneity
The degrees of freedom for a homogeneity test is found by (R-1)(C-1), where R is the number of rows in your table and C is the number of columns in your table.
For this example, you can see there are 3 rows (American, European, Asian) and 2 columns (students, staff), so the degrees of freedom would be (3 - 1)(2 - 1) = 2.
The chi-square value for this example would be 7.828, with 2 degrees of freedom.
That gives a p-value of approximately .02.
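Putting the pieces together, here is a sketch of the whole homogeneity calculation in Python: expected counts, the chi-square sum, the (R - 1)(C - 1) degrees of freedom, and the p-value. It leans on one shortcut: with exactly 2 degrees of freedom the chi-square right-tail area is e^(-x/2), so the code refuses to run for any other df rather than give a wrong answer. The table layout (rows = origins, columns = students, staff) is the same assumption as before.

```python
import math

# Car survey table: rows are countries of origin, columns are (students, staff)
observed = [
    [107, 105],  # American
    [33, 12],    # European
    [55, 47],    # Asian
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square: add (Obs - Exp)^2 / Exp over every cell of the table
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp_count = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - exp_count) ** 2 / exp_count

# Degrees of freedom = (R - 1)(C - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut that is only valid for 2 degrees of freedom: P(X > x) = e^(-x/2)
if df != 2:
    raise ValueError("the e^(-x/2) shortcut only applies to 2 degrees of freedom")
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Most of the 7.828 comes from the European row (33 student cars vs. about 24.4 expected, 12 staff cars vs. about 20.6 expected). Because a Test for Independence finds expected values and degrees of freedom the same way, the same arithmetic would apply there; only the hypotheses and the wording of the conclusion change.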
This p-value is small enough to reject the null hypothesis at the .05 significance level. Based on this sample, there is enough evidence to support the claim that the distribution of car origins at this university differs between students and staff.
Example
Chi-Square Test for Homogeneity
Return tomorrow for part 3.
There's one more type of Chi-Square test and that's a 'Test for Independence'.
The assumptions and conditions are the same as for the other two tests. The way you find your degrees of freedom and expected values is the same as for a Test for Homogeneity, but the null and alternative hypotheses are different.
Null and Alternative
H0: The two variables are independent.
Ha: The two variables are NOT independent.
Example
The following video gives you an example of working through a Test for Independence.
Chi-Square Test for Independence
A Goodness-of-Fit test compares the observed outcomes for a single categorical variable to the expected outcomes predicted by a probability model.
A Test for Homogeneity compares the distributions of a single categorical variable across several groups.
A Test for Independence examines the counts from a single group for evidence of an association between two categorical variables.
That's it for this chapter. The AP Test is almost upon us; we only have 1 chapter left.