Math Study Guide for the SHSAT

Probability and Statistics

Let’s start with the concepts of probability and compound events, and then move on to statistics, specifically focusing on box plots, visual overlap, and drawing inferences from data.

Probability of Compound Events

Probability is the measure of the likelihood that an event will occur. It is typically expressed as a number between \(0\) and \(1\), where \(0\) indicates impossibility and \(1\) indicates certainty. The probability of an event \(A\) is denoted by \(P(A)\).

A simple event is an event with a single outcome, whereas a compound event consists of two or more simple events. Compound events can be either independent or dependent.

Tools

Probability theory serves as a fundamental framework for understanding uncertainty and predicting outcomes in various real-world scenarios. From determining the likelihood of winning a game to assessing the risk of an investment, probability analysis plays a vital role in decision-making processes across multiple domains.

To effectively analyze probabilities, mathematicians and statisticians have developed a set of powerful tools. In this study guide, we will explore three fundamental tools for probability analysis: organized lists, sample space tables, and tree diagrams.

Organized Lists

Organized lists are systematic arrangements of all possible outcomes for a given scenario. They are particularly useful when dealing with simple or discrete events.

For instance, if you were rolling a fair six-sided die, an organized list of outcomes would be:

\[\{1, \,2, \,3, \,4, \,5, \,6\}\]

Sample Space Tables

Sample space tables present all possible outcomes of an experiment in a structured tabular format. They provide a clear overview of the sample space and make it easier to compute the associated probabilities.

Consider a scenario in which you are both flipping a coin and rolling a six-sided die. The sample space table for these simultaneous events would look like this:

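\[\begin{array}{c|cccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
\text{H} & (\text{H}, 1) & (\text{H}, 2) & (\text{H}, 3) & (\text{H}, 4) & (\text{H}, 5) & (\text{H}, 6) \\
\text{T} & (\text{T}, 1) & (\text{T}, 2) & (\text{T}, 3) & (\text{T}, 4) & (\text{T}, 5) & (\text{T}, 6)
\end{array}\]

Tree Diagrams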

Tree diagrams visually represent all possible outcomes of a series of events. They are especially helpful when analyzing compound events or multiple stages of probability experiments. For instance, if you were both flipping a coin and rolling a six-sided die and wanted to figure out the probability of a certain outcome, you could use the following tree diagram:

[Figure: tree diagram for a coin flip and a six-sided die roll]
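
The same sample space can also be listed by a short program. The following is a minimal sketch, not part of the original guide: the variable names are illustrative, and the event "heads and an even number" is a hypothetical example chosen only to show the counting.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair, in the same order as the rows of a sample space table.
sample_space = list(product(coin, die))

# Hypothetical event for illustration: heads together with an even number.
event = [(c, d) for (c, d) in sample_space if c == "H" and d % 2 == 0]

# With equally likely outcomes, P(event) = favorable outcomes / total outcomes.
probability = Fraction(len(event), len(sample_space))

print(len(sample_space))  # 12
print(probability)        # 1/4
```

Counting the pairs this way mirrors following each branch of a tree diagram to its end: every complete path is one outcome.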

Simulation

Simulation is a technique used in probability analysis to model real-world scenarios and predict outcomes through experiments. Simulations allow us to explore complex systems by generating random samples that mimic the probability distribution. By repeatedly running these simulations, we can estimate probabilities, evaluate strategies, and make informed decisions in uncertain environments.

During the SHSAT, you may need to identify the correct description of a presented simulation.
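
To make the idea concrete, here is a rough sketch of a simulation for the coin-and-die experiment described earlier. It is an illustration under assumptions, not a method the SHSAT requires: the function name `estimate_heads_and_six`, the number of trials, and the fixed seed are all hypothetical choices.

```python
import random


def estimate_heads_and_six(trials, seed=0):
    """Estimate P(heads and a 6) by simulating a coin flip and a die roll."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    hits = 0
    for _ in range(trials):
        coin = rng.choice(["H", "T"])  # fair coin
        die = rng.randint(1, 6)        # fair six-sided die, 1 through 6
        if coin == "H" and die == 6:
            hits += 1
    # Relative frequency of the event across all trials.
    return hits / trials


estimate = estimate_heads_and_six(100_000)
print(estimate)  # should land close to 1/12, roughly 0.083
```

The more trials are run, the closer the estimate tends to be to the theoretical probability.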

Expressing as a Fraction

When dealing with compound events, the probability can be expressed as a fraction. For example, if event A has a probability of \(\frac{1}{4}\) and event B has a probability of \(\frac{1}{3}\), the probability of both events occurring together (assuming they are independent) is:

\[P(A \text{ and } B) = \frac{1}{4} \times \frac{1}{3} = \frac{1}{12}\]
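
The multiplication above can be checked with exact fractions. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative, and the coin-and-die line is an added example that assumes a fair coin and a fair die.

```python
from fractions import Fraction

p_a = Fraction(1, 4)
p_b = Fraction(1, 3)

# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_both = p_a * p_b
print(p_both)  # 1/12

# Same rule for "heads and a 6" with a fair coin and a fair die.
p_heads_and_six = Fraction(1, 2) * Fraction(1, 6)
print(p_heads_and_six)  # 1/12
```

Working with `Fraction` keeps the answer in the reduced fractional form the test expects, rather than a rounded decimal.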

Statistics

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data.

Box Plots

Box plots, also known as box-and-whisker plots, are graphical representations of data that display the distribution of a data set along a single axis. They are particularly useful for summarizing the central tendency and spread of a data set.

A measure of central tendency attempts to capture the most important part of the data by identifying the “central” point in the data. There are three common measures of central tendency: mean, median, and mode.

The mean of a data set is the average. This is obtained by summing up all values in the data set and dividing by the total number of values. The median of a data set is the middle point of the set when it is sorted. If there is an even number of values in the data set, there are two middle points. In this case, the median is obtained by taking the average of these two middle points. Finally, the mode of a data set is the most frequent value in the data set. One downside of using the mode is that it may not be unique: multiple values could appear the same number of times in a data set.

A measure of spread describes how similar or varied the values in a data set are. Two common measures of spread are the range and the interquartile range (IQR). The range is the difference between the largest and smallest numbers in the data set. A small range means the values are close together. A large range means the values are more spread out.

The interquartile range (IQR) is obtained in the following way. First, you need to sort the data set from smallest to largest. Then, find the median of the data set. This will be denoted quartile 2 in the box-and-whisker plot below and splits the data into two parts. Next, find the median of each of these parts. These medians will be denoted quartile 1 (Q1) and quartile 3 (Q3). Finally, the IQR is going to be the difference between Q3 and Q1. It is essentially the range of the middle 50% of your data. As with range, when the IQR is small, this means the middle part of your data is quite similar.

[Figure: box-and-whisker plot]

Consider a data set of exam scores: \(65, \,70,\, 72,\, 75,\, 78,\, 80,\, 82,\, 85,\, 88,\, 90\). The box plot would illustrate the distribution of these scores, including key statistics such as the median, quartiles, and any outliers. Outliers are any values that are less than \(Q1 - (1.5 \times IQR)\) and any values that are more than \(Q3 + (1.5 \times IQR)\).
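
The quartiles and outlier limits for these exam scores can be worked out step by step. The sketch below follows the median-of-halves method described above. The function name `five_number_summary` is hypothetical, and leaving the median out of both halves when the count is odd is an assumption (a common convention that the text does not spell out).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return min(data), median(lower), median(data), median(upper), max(data)


scores = [65, 70, 72, 75, 78, 80, 82, 85, 88, 90]
low, q1, q2, q3, high = five_number_summary(scores)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)           # 72 79.0 85 13
print(lower_fence, upper_fence)  # 52.5 104.5
print(outliers)                  # []
```

Every score lies between 52.5 and 104.5, so this data set has no outliers under the 1.5 × IQR rule.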

Constructing

When constructing a box plot, data points are first sorted in ascending order. The plot consists of a box representing the spread or interquartile range (IQR), with a line inside representing the median. Whiskers extend from the box to the minimum and maximum values within \(1.5\) times the IQR from the lower and upper quartiles, respectively. Here is an example using the data above:

[Figure: sorting box plot data]

[Figure: box plot construction]

Interpreting

Interpreting a box plot involves understanding the central tendency (median), the spread, and the presence of any outliers. The box indicates where the middle \(50\%\) of the data lies, while the whiskers show the range of the data. Outliers, if present, are depicted as individual points beyond the whiskers.

Finding the Interquartile Range

The interquartile range (IQR) is calculated as the difference between the third quartile (\(Q3\)) and the first quartile (\(Q1\)) of the data set. It represents the spread of the middle \(50\%\) of the data. In the box plot shown above, \(Q1 = 8\) and \(Q3 = 15\), so the IQR is \(15 - 8 = 7\).

Determining Outliers

Outliers are data points that fall significantly above or below the rest of the data. In a box plot, outliers are typically identified as points that lie more than \(1.5\) times the IQR beyond the quartiles. For the box plot shown above, any value below \(8 - (1.5 \times 7) = -2.5\) or above \(15 + (1.5 \times 7) = 25.5\) is an outlier.
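
The outlier check can be written as a short helper. This is only a sketch: the names `outlier_fences` and `is_outlier` are hypothetical, and the test values 27 and 20 are made up for illustration rather than taken from the plot.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) limits given by the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True when the value falls outside the limits from outlier_fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


print(outlier_fences(8, 15))  # (-2.5, 25.5)
print(is_outlier(27, 8, 15))  # True, since 27 is above 25.5
print(is_outlier(20, 8, 15))  # False, since 20 is between the limits
```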

On the SHSAT, you will not have to construct box plots with outliers, but you may need to identify the outliers in a given box plot.

Visual Overlap

There is often overlap between two quantitative data distributions, which can be shown graphically, such as in histograms or density plots. These visual representations help with comparing the distributions and understanding their similarities and differences.

Drawing Inferences From Data

Drawing inferences from data involves extracting additional information by making interpretations or predictions or drawing conclusions based on the collected data. Inferences are not information explicitly shown in the data, but are logical assumptions one can make from the data. This often includes making comparative inferences between different data sets.

Measures of Center

Measures of center provide insights into the central tendency of a data set. For the SHSAT, you need to know the following three measures:

mean: The mean, also known as the arithmetic average, is the sum of all values in a data set divided by the total number of values. Consider this example problem.

The test scores of five students in a class are \(85, \,90, \,92, \,88,\) and \(86\). Calculate the mean test score.

To find the mean, you simply add all the test scores and divide by the total number of scores:

\[\frac{85+90+92+88+86}{5} = \frac{441}{5} = 88.2\]

median: The median is the middle value of a data set when arranged in ascending or descending order. If there is an odd number of values in the set, it’s the middle value. If there is an even number of values, the median is the average of the two middle values. The median divides the data set into two equal halves, with half of the values lying below it and half above it. Try this example problem.

Find the median of the following set of numbers: \(15, \,12, \,18, \,27, \,24, \,21, \,30\).

To find the median, arrange the values in ascending order and then find the middle value:

\[12,\,15,\,18,\,21,\,24,\,27,\,30\]

Since there are seven values, the median is the fourth number. So, the median of the set is \(21\).

mode: The mode is the value that appears most frequently in a data set. There can be more than one mode in a set, or no mode if all the values in the set appear the same number of times. Unlike the mean and median, the mode can also be found for non-numerical data, such as the favorite colors in the following example problem.

In a survey of \(10\) people, the following were their favorite colors: red, blue, green, red, yellow, blue, green, red, red, purple. Determine the mode(s) of the favorite colors.

Since the mode is the value (or values) that appears most frequently in a data set, let’s examine the totals for this one. In this case, red appears four times, blue appears two times, green appears two times, purple appears one time, and yellow appears one time. Since red appears most often, it is the mode.
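
The three example problems above can be double-checked with Python's standard `statistics` module. This is a minimal sketch for checking answers, not something needed on the test. The variable names are illustrative, and `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

# Mean example: five test scores.
test_scores = [85, 90, 92, 88, 86]
print(mean(test_scores))  # 88.2

# Median example: median() sorts the values internally.
numbers = [15, 12, 18, 27, 24, 21, 30]
print(median(numbers))  # 21

# Mode example: multimode() returns every value tied for most frequent.
colors = ["red", "blue", "green", "red", "yellow",
          "blue", "green", "red", "red", "purple"]
print(multimode(colors))  # ['red']
```

Because `multimode` returns a list, it also handles data sets with more than one mode.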

Measures of Variability

Measures of variability provide insights into the spread, dispersion, and scatter of data points within a data set. They complement measures of central tendency by offering a deeper understanding of how the data is distributed.

range: The range is the simplest measure of variability and is calculated as the difference between the maximum and minimum values in a data set. It gives a rough idea of how spread out the data points are.

For instance, suppose there is a class of \(10\) students. The youngest is \(18\) years old, and the oldest is \(25\) years old. Therefore, the range of ages in the class is \(25 - 18 = 7\) years.

interquartile range (IQR): The IQR is a more robust measure of variability that focuses on the middle \(50\%\) of the data. It is calculated as the difference between the third quartile (\(Q3\)) and the first quartile (\(Q1\)) of the data set.

For a data set of test scores, if \(Q1 = 70\) and \(Q3 = 85\), then the IQR is \(85 - 70 = 15\). This means that the middle \(50\%\) of the test scores fall within a range of \(15\) points.
