The internet lets us gather seemingly endless information, so we need to know how to process that data. At the most basic level, that’s all data analysis and statistics is: collecting, processing, organizing, and modeling information. These skills are increasingly important not only in the workplace, but also for anyone who wants to be a savvy consumer of advertised products and information. So let’s get started!
If you’re trying to understand how a large number of people, animals, or objects behave as a group, it’s best to use a measure of central tendency to describe the behavior. For all three measures of central tendency below, let’s analyze the set of grades you’ve earned on 10 different math tests:
The mean is just the average. Usually, your math teacher will use this measure of central tendency to calculate your final grade in the class. To find the mean, take the sum of all the values and divide it by the number of values (here, the total number of tests).
So, let’s find the average (or mean) of the test scores.
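As a rough illustration of the mean calculation described above, here is a minimal Python sketch. The ten scores are made-up placeholder values, not the grades from this lesson, and the function name `mean` is only an illustrative choice.

```python
def mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical test scores (assumed for illustration only).
scores = [70, 80, 90, 85, 75, 95, 100, 65, 80, 60]

# The scores add up to 800, and there are 10 of them.
print(mean(scores))  # 80.0
```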
The median is just the middle value of a data set. To find the median, the first step is to order the data from least to greatest:
Now, find the middle number by crossing out the highest and lowest values and working your way in.
Notice, there are two middle numbers:
If this happens (which it will every time there is an even number of values), find the average of the two middle numbers:
NOTE: You don’t have to do this if there is an odd number of values. Let’s look at this set:
Now start crossing out and you’ll arrive at just one middle number.
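The crossing-out procedure can be sketched in Python as follows. This is only one possible way to write it: instead of crossing values out one pair at a time, it jumps straight to the middle position of the ordered list. Both lists of scores are hypothetical examples, not the sets from this lesson.

```python
def median(values):
    """Order the values, then return the middle one.

    With an even number of values there are two middle numbers,
    so their average is returned instead.
    """
    ordered = sorted(values)  # least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two


# Hypothetical scores (assumed for illustration only).
even_scores = [70, 80, 90, 85, 75, 95, 100, 65, 80, 60]  # 10 values
odd_scores = [70, 80, 90, 85, 75]                        # 5 values

print(median(even_scores))  # 80.0 (the two middle numbers are 80 and 80)
print(median(odd_scores))   # 80
```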
The mode is the easiest of the measures of central tendency to find. It is simply the value that occurs most often. In this case, one value occurs twice, so that value is the mode.
If all values occur the same number of times, we say there is no mode.
Occasionally, two or more values might occur more than the others, like in the following set:
In this case, both of those values are modes.
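The three situations described above (one mode, no mode, and several modes) can be sketched in Python like this. The function name `modes` and the sample lists are assumptions made for illustration. Returning an empty list for "no mode" is one possible convention, not the only one.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values.

    Following the rule in the text, an empty list is returned when
    every value occurs the same number of times (no mode).
    """
    counts = Counter(values)  # maps each value to how often it occurs
    if not counts:
        return []
    highest = max(counts.values())
    # More than one distinct value, all equally frequent: no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data sets (assumed for illustration only).
print(modes([70, 80, 80, 90]))      # [80]
print(modes([70, 80, 90]))          # []
print(modes([70, 70, 80, 80, 90]))  # [70, 80]
```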
To make data meaningful, we can express it as a graph. Graphs can show trends, comparisons, or the bigger picture, and they are often easier to use than lists of data.
Circle graphs (or pie charts) are useful for showing percentages. Pie charts help to put things in perspective: bigger pieces of the pie represent higher percentages. In this example, the whole circle represents United States federal spending in the 2017 fiscal year. You can see at a glance that the most money was spent on health care, pensions, and defense. Look a little closer to find the actual percentages.
Retrieved from: https://www.usgovernmentspending.com/year_spending_2017USbf_XXbs2n
When making an accurate circle graph, you’ll first need to calculate percentages. Remember, a percentage is the part divided by the whole, multiplied by 100.
Then, you’ll need to draw a piece of the pie to represent it. For a quick reference, know that takes up of the circle, or a angle.
If you need to find the exact angle, write the percentage as a decimal and multiply it by 360°, the number of degrees in a full circle.
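Here is a minimal Python sketch of both calculations: turning an amount into a percentage of the whole, and turning that percentage into the angle of its pie slice (a full circle is 360°). The budget categories and amounts are invented for illustration and are not taken from the federal spending chart.

```python
def percentage(part, whole):
    """Return what percent of the whole the part represents."""
    return part * 100 / whole


def slice_angle(percent):
    """Return the pie-slice angle in degrees for a given percentage."""
    return percent * 360 / 100


# Hypothetical monthly budget in dollars (assumed for illustration only).
budget = {"rent": 500, "food": 300, "other": 200}
total = sum(budget.values())

for category, amount in budget.items():
    pct = percentage(amount, total)
    print(category, pct, slice_angle(pct))
```

As a sanity check, the slice angles for all categories should add up to 360°.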
Bar graphs or charts can be used almost any time you would use a circle chart. They can be arranged vertically or horizontally. Each bar corresponds to an actual amount instead of a percentage.
Horizontal Bar Chart
Vertical Bar Chart
Occasionally, the actual value is placed at the tip of the bar to make it easier for the reader.
Line graphs are composed of two axes. The bottom axis is usually time (years, months, days, etc.). They are useful for showing trends in data. For example, this graph seems to show that US spending increases as time goes on.
Retrieved from: https://www.usgovernmentspending.com/spending_chart_2003_2023USb_19s1li001mcn_F0t
Line graphs are made by collecting data at regular intervals. In this case, it was every year. A dot is placed at each data point. In 2010, for instance, the government spent 6000 billion dollars. Once all the dots are drawn, connect them from left to right. Sometimes graph makers keep the dots visible; otherwise, they blend them into the line.
How much money was spent in 2016? Move over to 2016 on the bottom axis. The line above it has a height of just under 7000 billion dollars.
A scatterplot is composed of two axes. Each piece of data is represented as a dot on the plane.
Here’s an example comparing heights and weights of various people.
Read the data the same way you’d read points on the x-y plane. For example, you can see that there is a 4 ft person who weighs roughly 125 pounds (see the lower left dot). In total, there are 6 people who submitted their weights and heights.
The basic histogram shown here is essentially a line chart mixed with a bar graph. It again has two axes, usually with time on the bottom. A dot is made for each date, but instead of drawing lines connecting the dots, you draw a bar going up to each dot.
Retrieved from: https://www.usgovernmentspending.com/
Sometimes, you can make more interesting histograms. You can break up the bars similar to the way you’d break up the pieces of a pie chart. In the case below, the author wants to show that the combined total money spent on housing, food, and clothing has been declining over time. In addition, each bar has been split up to show how each particular amount has changed over time. At a glance, it looks like the clothing and food sections have decreased over time while housing has fluctuated.
Retrieved from: http://visualizingeconomics.com/blog/2013/11/18/100-years-of-family-spending-in-the-us
To create a box plot (or box-and-whisker plot), you need to find five things about your data set: the minimum, first quartile, median, third quartile, and maximum.
Start by ordering your data. In this example let’s look at a few test scores in a class:
The minimum is the smallest number:
The maximum is the largest number:
The median is the middle number. In this case, both 82s are the middle, so the average of them is 82.
Now, split the set right in the middle into two sets.
The lower half:
And the upper half:
The first quartile, or Q1, is the median of the lower half (here, the average of its two middle values).
The third quartile, or Q3, is the median of the upper half (here, the average of its two middle values).
Now that you have all five numbers, draw a box enclosing everything between Q1 and Q3. Then extend the whiskers out to the minimum and maximum.
You can calculate the range of data by subtracting the minimum from the maximum.
You can also find the interquartile range or IQR by subtracting Q1 from Q3.
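The box plot steps above can be sketched in Python as follows. The scores are made-up values chosen so that the two middle numbers are both 82; they are not the class scores from this lesson. For an odd number of values, this sketch assumes the median itself is left out of both halves, which is one common convention that the text does not specify.

```python
def median(ordered):
    """Return the median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, and maximum of the values."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]  # lower half of the data
    # Assumption: with an odd count, the middle value is excluded.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }


# Hypothetical test scores (assumed for illustration only).
scores = [70, 75, 80, 82, 82, 88, 90, 96]
summary = five_number_summary(scores)

data_range = summary["max"] - summary["min"]  # maximum minus minimum
iqr = summary["q3"] - summary["q1"]           # Q3 minus Q1

print(summary)     # min 70, q1 77.5, median 82.0, q3 89.0, max 96
print(data_range)  # 26
print(iqr)         # 11.5
```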