Statistics: central tendency and how to calculate it

What is it?

A central tendency, within statistics, is essentially an average. It's a single number, widely used by behavioral scientists, that's most representative of an entire set of "scores." For example, when I say that the average temperature in San Francisco, California in July is 57 degrees Fahrenheit, I'm actually pointing out a central tendency. However, in statistics, there is no single standard way of determining the central tendency. Below are a few examples in which different techniques might be used to determine a central tendency. Remember, the purpose of finding a central tendency is to find one number that best represents your data.




  1. Your scores vary pretty evenly across the scale, but in the center of your scores there's a disproportionate number of 10's. In this case, it's easy to see that 10 is your central tendency simply because of how many times it appears.
  2. The most recurring score, 20, is at the top of your scale, and there are many more scores spread all the way down your scale to 1. In this case, 20 is the most recurring score, but most of the scores are not 20. In other words, each score below 20 might not occur as frequently as 20 individually, but combined they far outnumber the 20's. In this case, selecting 20 as your representative score would not be appropriate.
  3. And finally, here's another situation. Using the same scale, from 1 to 20, we find that there are many scores centering around 5 and many scores centering around 15. We also find that there are no scores anywhere between 8 and 12. In this case, what is the central tendency? Can we use 10, because it's right in the center of both of these clusters?
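To make the third situation concrete, here is a minimal Python sketch. The scores are made up for illustration (they are not from any real study), and it relies on the standard library's statistics module. It shows that the mean can land on a value where no scores exist at all.

```python
from statistics import mean, multimode

# Hypothetical scores on a 1-20 scale: one cluster around 5, one around 15,
# and nothing between 8 and 12.
scores = [3, 4, 5, 5, 6, 7, 13, 14, 15, 15, 16, 17]

average = mean(scores)                            # works out to 10
in_gap = [s for s in scores if 8 <= s <= 12]      # scores between 8 and 12
most_common = multimode(scores)                   # every score tied for most frequent

print("mean:", average)
print("scores between 8 and 12:", in_gap)
print("most frequent scores:", most_common)
```

With these made-up scores the mean is 10 even though nobody scored anywhere near 10, which is why a single "middle" number can be a poor summary of data with two clusters.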

Mean

Mean is sometimes called the arithmetic average and is computed simply by adding up all of your scores and dividing the sum by the number of scores. The goal is to divide the total sum equally among all individuals. For example, if you have 10 scores that are all 10's, you add them all up to get 100, then divide by 10 and get 10. Normally, the mean is represented by the letter M in italics. Also, keep in mind that if you see a mean represented by an M, it's referencing data taken from a sample, not a population. The formula for finding the mean of a sample, where ΣX is the sum of all the scores and n is the number of scores in the sample, is below:

M = ΣX / n

The formula for finding the mean of a population is exactly the same, but instead of an M representing the mean we have the Greek letter μ (mu), and the number of scores is written as N. Here it is below:

μ = ΣX / N
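The calculation described above can be sketched in a few lines of Python. The function name sample_mean is just an illustrative choice, not something from a particular library.

```python
def sample_mean(scores):
    """Return the arithmetic mean: the sum of the scores divided by how many there are."""
    if not scores:
        raise ValueError("need at least one score to compute a mean")
    return sum(scores) / len(scores)


# The example from the text: ten scores that are all 10's.
ten_tens = [10] * 10
print(sample_mean(ten_tens))  # 10.0
```

The same arithmetic applies to a population; only the symbols used to report it change.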

The weighted mean

We use a weighted mean, or an overall mean, when we're combining sets of scores. For example, say you take data from 10 individuals in one group and then take data from 12 individuals in another. It's not OK to simply calculate the mean for each set of data and then average the two means, because that gives the smaller group as much influence as the larger one. Instead, add up all the scores within each data set, then add those two sums together. Next, add the number of individuals in each data set together. You should now have two numbers: the total of the scores and the total number of individuals from both studies. Finally, divide the total of the scores by the total number of individuals. You have now calculated a weighted mean. For those of you who think better in formulas, it's below, where ΣX1 and ΣX2 are the sums of the scores in each group and n1 and n2 are the number of individuals in each group:

overall M = (ΣX1 + ΣX2) / (n1 + n2)
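Here is a small Python sketch of those steps. The two groups of scores are invented for illustration (10 individuals and 12 individuals, matching the sizes mentioned above), and weighted_mean is a hypothetical helper name.

```python
def weighted_mean(*groups):
    """Combine several groups of scores into one overall mean."""
    total_of_scores = sum(sum(group) for group in groups)      # sum of every score
    total_individuals = sum(len(group) for group in groups)    # how many scores in all
    if total_individuals == 0:
        raise ValueError("need at least one score")
    return total_of_scores / total_individuals


# Hypothetical data: 10 individuals in one group, 12 in the other.
group_a = [4, 5, 5, 6, 6, 6, 6, 7, 7, 8]            # sum 60, mean 6
group_b = [6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9]      # sum 96, mean 8

overall = weighted_mean(group_a, group_b)           # 156 / 22
naive = (6 + 8) / 2                                 # averaging the two means

print("weighted mean:", overall)
print("average of the two means:", naive)
```

Because the second group is larger, the weighted mean is pulled slightly toward its mean of 8, while simply averaging the two means gives exactly 7.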

The median

Another way to measure central tendency is by calculating the median. The word sounds a lot like "medium," which is useful for remembering the goal of the median, which is to find the midpoint of the distribution. When I say that the median is the midpoint of the distribution, I don't mean the midpoint between your smallest score and your largest. The midpoint of your distribution refers to the number of scores. In other words, the median divides your distribution into two equal-sized groups. An easy way to find the median is by lining up your scores from smallest to largest and counting until you get to the halfway point in your data. The first number that crosses the 50 percent mark is the median. Here is an example below.

In this example, you have 5 scores: 2, 3, 5, 6, 8 

To find the median here we count the scores until we get to the midway point. In this case, 5 is our midway point and therefore 5 is our median.

The previous example was easy because the number of scores in our distribution was odd. When there is an even number of scores in our distribution, the process is slightly different.

In this second example, you have 6 scores: 2, 3, 5, 6, 8, 9

To find the median here, we find the two midway point numbers, add them up and then divide by two. In this case, our two midway point numbers are 5 and 6. We add up 5 and 6 and get 11, then divide 11 by 2 and get 5.5.

In the previous example, there was no single middle score, and therefore we had to create a midpoint between the two middle numbers.
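Both cases can be handled by one short function. The following is a sketch in Python using the two sets of scores from the examples above; the function name median is an illustrative choice (Python's standard library also offers statistics.median).

```python
def median(scores):
    """Return the midpoint of the scores: half fall at or below it, half at or above."""
    if not scores:
        raise ValueError("need at least one score to compute a median")
    ordered = sorted(scores)          # line the scores up from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the middle one is the median.
        return ordered[middle]
    # Even number of scores: average the two middle ones.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 3, 5, 6, 8]))      # 5
print(median([2, 3, 5, 6, 8, 9]))   # 5.5
```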

The mode

Finally, another measure of central tendency is the mode. The mode is the single score or category that has the highest frequency. To find the mode, you simply look through your data to find which score occurred more frequently than all the others. Here's an example below:

In this example, you have 10 scores: 1, 1, 2, 4, 5, 5, 5, 8, 9, 9

What score appears more frequently in this set of scores than all others? 5. Therefore, in this example, 5 is the mode. 
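Counting by eye works for ten scores, but for longer lists a tally is easier. Below is a minimal Python sketch using collections.Counter from the standard library. The helper name modes is hypothetical, and it returns a list so that ties (more than one mode) stay visible.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs with the highest frequency."""
    if not scores:
        raise ValueError("need at least one score to find a mode")
    counts = Counter(scores)                  # score -> how many times it appears
    highest = max(counts.values())
    return [score for score, count in counts.items() if count == highest]


scores = [1, 1, 2, 4, 5, 5, 5, 8, 9, 9]
print(modes(scores))  # [5]
```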

Which measure of central tendency should you use?

The goal of central tendency is to find one single number that's representative of the entire set of scores. Usually, the mean is used because it's often the best representative of the data. However, there are situations where a mean is impossible to calculate or where the mean isn't the best representative of the data.

When a set of scores has a few extreme scores, they skew the mean and make it a poor representative of the data. For example, if you take all Americans alive right now and the total income they earn, using the mean to represent what the average American earns wouldn't provide an accurate picture. The reason is that a small percentage of Americans earn far more than the much larger remaining percentage, and therefore the mean is greatly skewed. In this case, the median would be a much more suitable measure of central tendency, one that better represents the average American.

Below is a simplified example of why this is so:

In this example, there are ten Americans. Nine of them earn 10 thousand dollars a year and the tenth earns 200 thousand. When we calculate the mean, we add up all the incomes and get a total income of 290 thousand dollars. We divide this number by 10, the total number of Americans, and we get 29 thousand dollars.
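The same ten incomes can be checked with Python's standard statistics module. This is just a quick sketch of the arithmetic above, with the median computed alongside for comparison.

```python
from statistics import mean, median

# Nine people earning 10 thousand dollars and one earning 200 thousand.
incomes = [10_000] * 9 + [200_000]

mean_income = mean(incomes)       # 290 thousand / 10 = 29 thousand
median_income = median(incomes)   # both middle scores are 10 thousand

print("mean income:", mean_income)
print("median income:", median_income)
```

The single large income nearly triples the mean, while the median stays at the 10 thousand dollars that nine of the ten people actually earn.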

In the previous example, the mean doesn't accurately represent the average American, since nine of the ten people earn far less than 29 thousand dollars. The median would be a more representative number in this case (here it is 10 thousand dollars) because the median isn't skewed by extreme scores. Also, in situations where certain individuals weren't able to produce a score, calculating a mean is impossible. You can't add a nonexistent score into a mean, so you can use the median instead.

Another situation in which you can't use the mean as a central tendency is when using open-ended distributions. This scenario is one in which some scores are not exact and hence impossible to pin down as a number to be added into the mean. For example, if one of the categories in your study is something like "less than 1 hamburger eaten" or "greater than 10 hamburgers eaten," there's no exact number for how many hamburgers were eaten in either category. In this case we can also use the median as a measure of central tendency.
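As a rough illustration of why the median still works here, the Python sketch below finds the category that contains the middle score. The categories and counts are invented for the example, and median_category is a hypothetical helper. It assumes the categories are listed in order from smallest to largest.

```python
def median_category(table):
    """Return the label of the category that contains the middle score.

    `table` is a list of (label, count) pairs ordered from lowest to highest.
    With an even total, the two middle scores could fall in different
    categories; this sketch returns the category of the lower one.
    """
    total = sum(count for _, count in table)
    if total == 0:
        raise ValueError("need at least one score")
    middle_position = (total + 1) // 2      # position of the middle score
    running = 0
    for label, count in table:
        running += count                    # cumulative number of scores so far
        if running >= middle_position:
            return label


# Hypothetical survey of hamburgers eaten, with open-ended end categories.
hamburgers = [
    ("less than 1", 2),
    ("1 to 3", 6),
    ("4 to 6", 9),
    ("7 to 10", 5),
    ("greater than 10", 3),
]

print(median_category(hamburgers))  # 4 to 6
```

No exact value is needed for the open-ended categories; only their position in the ordering and their counts matter, which is exactly what the mean cannot work with.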

There are other situations in which calculating the mean is either not possible or not representative of the data. They include, but are not limited to, data sets that use nominal scales, ordinal scales, and discrete variables. To learn more about what measure of central tendency should be used in each of the above examples, click on their links. 

Distribution shapes

Most behavioral scientists will calculate all three measures of central tendency, when possible, and publish them with their research. The relationship between these three measurements depends on the shape of the distribution. For example, in a symmetrical distribution, in which the right side of the graph presenting your data is identical to the left, the mean, mode and median will all be exactly in the center (assuming there's only one mode). The more symmetrical the chart looks, the closer together your mean, mode and median will be.

Positively or negatively skewed distributions will have these three measures in different locations on your graph. For example, in a positively skewed distribution, the highest occurring score will be closer to the smaller extreme, and therefore the mode will be the furthest to the left on your chart. The opposite is true in a negatively skewed distribution. As a simplified rule of thumb, reading from left to right, a positively skewed distribution typically has the mode first, then the median, then the mean, while a negatively skewed distribution typically has the mean first, then the median, then the mode.
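The ordering of the three measures can be seen with a couple of small, made-up data sets. The Python sketch below uses the standard statistics module; the scores are hypothetical and chosen so that each set has a single mode and a long tail on one side.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, with a tail to the right.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 7, 11]

# Hypothetical negatively skewed scores: the mirror image, with a tail to the left.
negative_skew = [12 - score for score in positive_skew]

pos = (mode(positive_skew), median(positive_skew), mean(positive_skew))
neg = (mode(negative_skew), median(negative_skew), mean(negative_skew))

print("positive skew (mode, median, mean):", pos)
print("negative skew (mode, median, mean):", neg)
```

In the first set the mode is smallest and the mean is largest, because the long right tail pulls the mean upward. In the mirrored set the order reverses.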