Measures of Central Tendency
It can be useful to have a single number that describes the typical individual in a given situation. For example, you might hear that “The average person spends 2 hours per day on social media sites”, or “The median amount of time spent on social media sites is 1 hour”. But what do the words “mean” and “median” actually mean? We will discuss the different statistics used to describe the typical individual for a given problem.
We cover how to calculate the mean and other statistics in more detail in our Intro to R for Non-Programmers short course! If you are interested, check it out at the following link: https://vimeo.com/ondemand/rintro
The Mean
The mathematical definition of the population mean can be written as
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where N is the number of individuals in the population, and each individual’s value is represented by $x_i$.
In other words, you take each value in the population, add them all up, and divide by the number of individuals in the population. So for example, if a population had the following values as seen in Table 1:
Table 1 | |
Individual | Value |
1 | 3 |
2 | 5 |
3 | 7 |
4 | 10 |
We would find the mean by doing the following:
$$\frac{3 + 5 + 7 + 10}{4} = \frac{25}{4} = 6.25$$
To do this in R, we would type the following:
3+5+7+10 # adding up all the values
25/4 # using the summed values in the numerator, we divide by the number of observations (in this case, 4)
The output is displayed in Figure 1.
To do this with the mean function, do the following:
vec <- c(3, 5, 7, 10) # we define the vector of all the observations here
mean(vec) # we then use the mean function and apply it to the vector we just defined
The output is displayed in Figure 2.
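As a quick check, the two approaches above should agree: the mean function performs the same "add up and divide" calculation. The following is a minimal sketch using the Table 1 values; the names manual_mean and builtin_mean are chosen here only for illustration.

```r
vec <- c(3, 5, 7, 10)                   # the values from Table 1
manual_mean <- sum(vec) / length(vec)   # add up the values, then divide by how many there are
builtin_mean <- mean(vec)               # the same calculation using the mean function
all.equal(manual_mean, builtin_mean)    # TRUE when the two results agree
```

Using sum() and length() avoids having to type each value and the count by hand, which helps once the vector is longer than a few observations.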
However, sometimes, we do not have all the values for a given population. Sometimes, all we have is a smaller part of the population, or a sample. When all we have is a sample from the population, all we can calculate is the sample mean. We might not be able to get the population mean because we are limited in resources, time, and/or human capital. The mathematical expression for the sample mean can be written as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where n is the number of individuals in the sample, and each individual’s value is represented by $x_i$.
In other words, we take each value in the sample, add them all up, and divide by the number of individuals in the sample. As you can see, it is the same process to find the sample and population mean. However, they are two very different things. Sometimes, you will be able to find sample means that are very close in value to the population mean. Sometimes you are unable to do so. However, the sample mean is generally accepted to be the best estimator of the population mean.
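To illustrate the difference between the two means, the sketch below draws a random sample from a small population and calculates both. The population values, the sample size of 4, and the seed are hypothetical choices made only for this illustration; they do not come from the tables in this article.

```r
# Hypothetical population values, made up for this illustration
population <- c(3, 5, 7, 10, 12, 14, 18, 21)

set.seed(1)                          # fix the random seed so the same sample is drawn each time
samp <- sample(population, size = 4) # draw 4 individuals at random, without replacement

mean(population)                     # the population mean (uses every individual)
mean(samp)                           # the sample mean (uses only the 4 sampled individuals)
```

Changing the seed draws a different sample, and the sample mean will usually move closer to or further from the population mean as a result.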
Table 2 | |
Individual | Value |
1 | 3 |
2 | 5 |
3 | 7 |
4 | 10 |
5 | 12 |
The Median
The median of a population is simply the middle value of the population when the values are arranged in order. If we had a population of 5 individuals with the values provided in Table 2, then the median would be 7.
This is fairly straightforward if the data are presented in order and you have a small number of individuals or observations. However, if the Value column of Table 2 were presented in the following order:
12, 5, 3, 7, 10
then we would need to order them first. We generally do this in ascending order, as it will be more useful for topics discussed later. Here are the values in ascending order:
3, 5, 7, 10, 12
To do this in R, we use the median() function, which automatically sorts the data for us. To run this example, type in the following commands:
vec <- c(12, 5, 3, 7, 10) # we first define a vector of all the observations
median(vec) # we then apply the median function to the vector we just defined
The output is displayed in Figure 3.
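To see what median() is doing for us, the ordering step can also be carried out by hand. This is a minimal sketch using the Table 2 values; the names sorted, n, and middle are chosen here for illustration, and the position formula assumes an odd number of observations.

```r
vec <- c(12, 5, 3, 7, 10)       # the Table 2 values, out of order
sorted <- sort(vec)             # sort() puts the values in ascending order by default
n <- length(sorted)             # the number of observations (5 here)
middle <- sorted[(n + 1) / 2]   # with an odd count, the middle position is (n + 1) / 2
middle                          # 7
median(vec)                     # 7, the same as the manual result
```

With an even number of observations there is no single middle position; in that case median() returns the average of the two middle values.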
As stated previously, we cannot always get the entire population and their values. Therefore, we are often left with a sample. The sample median is found in the same way as the population median. However, the sample median will not always be the same as the population median.
The Mode
The mode is the value that occurs the most often in a population or sample. If we had a sample as seen in Table 3, then the mode would be 13. This is because 5 only occurs once, 10 only occurs once, and 13 occurs three times. Since 13 occurs the most often, it is the mode.
Table 3 | |
Individual | Value |
1 | 5 |
2 | 10 |
3 | 13 |
4 | 13 |
5 | 13 |
To do this in R, we need to use a series of functions. There is no built-in function to do this for us automatically. Use the following code:
vec <- c(5, 10, 13, 13, 13) # we first define our vector with all of our observations (the values from Table 3)
tab <- table(vec) # we then put the vector into a table, which counts how often each value occurs
sort(tab, decreasing=TRUE) # we then take that table and apply the sort function to it
# notice that we also indicated the option "decreasing=TRUE"
# this option puts the counts in decreasing order, so the mode appears first
The output is displayed in Figure 4.
We can observe samples with multiple modes. If 2 or more values are tied for the highest number of occurrences, then all of those values are modes. If all values occur an equal number of times, then all the values are modes.
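Since a sample can have more than one mode, it can help to pull out every value tied for the highest count rather than reading them off a sorted table. The sketch below is one possible way to do this; find_modes is a hypothetical helper name, not a built-in R function, and the second example vector is made up for illustration.

```r
# find_modes is a hypothetical helper, not part of base R
find_modes <- function(x) {
  tab <- table(x)                      # count how often each value occurs
  top <- names(tab)[tab == max(tab)]   # keep every value tied for the highest count
  as.numeric(top)                      # table() stores the values as text labels, so convert back
}

find_modes(c(5, 10, 13, 13, 13))   # one mode: 13 (the Table 3 values)
find_modes(c(1, 1, 2, 2, 3))       # two modes: 1 and 2
```

Note that this sketch assumes numeric data; for text values, the as.numeric() step would need to be dropped.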