Visualizing Data
There are a variety of ways to visualize data that one meets in an introductory statistics course. We will be going over some of those visualization techniques today. While we will not be covering what they mean, we will be covering how to produce these pictures. Additionally, we will be asking questions about visualizing and interpreting pictures of data that we provide.
If you want more examples and tutorials on creating plots in R, we go over some additional topics in our Intro to R course on Vimeo: https://vimeo.com/ondemand/rintro
Boxplots
Below is a video on boxplots. If you prefer to read, then just skip it matey!
Boxplots, or whisker plots, can be easily produced from data using the following function
boxplot()
However, the boxplot command does not provide the exact five number summary. It only provides the visualization of the data. There are three options for this function that we would like to discuss. They involve changing the color of the boxplot, the main title, and the individual boxplot titles. Here is the general format of the boxplot function in its standard form.
boxplot(x, ..., col=NULL, main=NULL, names)
col changes the color, main changes the main title, and names sets the label under each boxplot, since this function can draw multiple boxplots in one image. R comes with many different colors built in.
Let us go over some examples to get a better understanding of how to use this function and its options.
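As a small sketch of the last point, the built-in color names can be listed from the console. colors() is a base R function; the variable name all_colors is only an illustration.

```r
# colors() returns every color name that base R recognizes.
all_colors <- colors()

length(all_colors)                            # how many names are available
head(all_colors)                              # the first few names
head(grep("blue", all_colors, value = TRUE))  # a few names containing "blue"

# Check that the colors used in this lesson are in the list.
c("navy", "orange", "green") %in% all_colors
```

Any of these names can be passed to the col option as a quoted string.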
Example I
We have 100 observations from a normal distribution with mean 13 and standard deviation of 2. The data are in this file (visualizing_data.csv). The first column is from the normal distribution we just described. The second column is from a gamma distribution, which we will discuss in the later examples. Create a boxplot for this data and save the file. Then create another boxplot with the main title of “N(13,2)”, change the color to navy, and save that image as a file. Make sure to comment on each line of your code.
Answer I
The code, output, and final figures are provided below. To save these images, click on the image. Then go to the top left hand corner and click on “File”. Then click “Save As”, and then click on your desired image file. I typically go with PNG.
mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix
boxplot(mydata[,1]) #creating a boxplot of the normal data
boxplot(mydata[,1], main="N(13,2)", col='navy') #applying changes
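If you would rather save the image from code than through the File menu, one possible approach is to open a PNG graphics device, draw the plot, and close the device. The sketch below is not part of the original answer. It uses simulated data in place of the CSV column, and the file name boxplot_normal.png is a made-up example.

```r
# Stand-in data (assumption): 100 simulated N(13, 2) values instead of the CSV column.
set.seed(1)
normal_values <- rnorm(100, mean = 13, sd = 2)

# Hypothetical output path; tempdir() keeps the sketch self-contained.
out_file <- file.path(tempdir(), "boxplot_normal.png")

png(out_file, width = 600, height = 600)   # open a PNG graphics device
boxplot(normal_values, main = "N(13,2)", col = "navy")
dev.off()                                  # close the device so the file is written

file.exists(out_file)                      # check that the image was created
```

Everything drawn between png() and dev.off() goes to the file instead of the screen.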
Example II
We also have 100 observations from a gamma distribution with both parameters equal to 1. Create a figure with 2 boxplots, one for the normal observations and one for the gamma observations. Save that image. Then replace the 1 and 2 on the x axis with the names of the distributions the data came from. Save that image as a different file.
Answer II
The code, output, and final figures are provided below.
mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix
boxplot(mydata[,1], mydata[,2]) #boxplot of both data
boxplot(mydata[,1], mydata[,2], names=c('Normal','Gamma'), main='Boxplots', col='light blue') #adding changes; note how names must have the labels in a vector
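Since the boxplot only gives the picture, the numbers behind it have to be requested separately. A minimal sketch is below, assuming simulated data in place of the CSV file. fivenum() and the plot=FALSE option of boxplot() are base R. The names normal_values, gamma_values, and box_info are only illustrations.

```r
# Stand-in data (assumption): simulated values instead of the two CSV columns.
set.seed(1)
normal_values <- rnorm(100, mean = 13, sd = 2)
gamma_values  <- rgamma(100, shape = 1, rate = 1)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
fivenum(normal_values)
fivenum(gamma_values)

# boxplot() also returns the numbers it draws; plot = FALSE skips the drawing.
box_info <- boxplot(normal_values, gamma_values, plot = FALSE)
box_info$stats   # one column per boxplot: whisker end, hinge, median, hinge, whisker end
box_info$out     # values that would be drawn as individual outlier points
```

The whisker ends in box_info$stats can differ from the minimum and maximum when outliers are present, which is why the two summaries may not match exactly.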
Dotplots
Below is a video on dot plots. If you prefer to read, then just skip it!
Dotplots can be easily produced by the following function
plot()
We can also add titles, change the color of the dots, and adjust other settings. We will be going over how to add a main title, change the y axis title, and change the color of the dots. Those options for the function are, respectively,
plot(data, main='', ylab='', col='black')
Let us see an example.
Example III
Provide a dotplot of the N(13,2) data without changing the standard settings. Save that image. Then provide an image with the main title of “N(13,2)” and a y axis title of “Values”. Change the color of the dots to green.
Answer III
The code, output, and final figures are provided below.
mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix
plot(mydata[,1]) #a dotplot of the normal data
plot(mydata[,1], main="N(13,2)", col='green', ylab='Values') #applying changes
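It may help to see what plot() does when it is given a single column: each value is drawn against its position in the data (1, 2, 3, and so on). The sketch below, which uses simulated data as a stand-in for the CSV column, spells that out by passing the positions explicitly. The pch option, which picks the plotting symbol, is not covered in the lesson and is shown only as an extra.

```r
# Stand-in data (assumption): 100 simulated N(13, 2) values instead of the CSV column.
set.seed(1)
normal_values <- rnorm(100, mean = 13, sd = 2)

# Positions of the observations: 1, 2, ..., 100.
idx <- seq_along(normal_values)

# Drawing values against their positions places the dots where plot(normal_values) does.
plot(idx, normal_values, main = "N(13,2)", xlab = "Index", ylab = "Values",
     col = "green", pch = 19)   # pch = 19 draws filled circles
```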
Histograms
Below is a video on histograms. If you prefer to read, then just skip it!
Histograms can be created by using the following function
hist()
You can also change a variety of settings, but we will be going over how to increase the number of columns the histogram uses, change the main title, change the x axis title, and change the color of the columns and the borders. The options for the function are, respectively,
hist(data, breaks='Sturges', main=paste('Histogram of', xname), xlab=xname, col=NULL, border=NULL)
Let us see an example of where this is used.
Example IV
Create a histogram of the N(13, 2) data. Do not change the settings. Save the image. Then create a histogram with the main title set to “N(13, 2)”, the x axis title set to “Values”, the number of breaks, or columns, set to 20, the column color set to navy, and the border set to light blue.
Answer IV
The code, output, and final figures are provided below.
mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix
hist(mydata[,1]) #histogram of the normal data
hist(mydata[,1], breaks=20, main="N(13,2)", col='navy', border='light blue', xlab='Values') #applying changes
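One detail worth knowing is that when breaks is a single number, R treats it as a suggestion and picks rounded breakpoints near it, so the histogram may not have exactly 20 columns. The sketch below checks this on simulated data (an assumption standing in for the CSV column). It uses plot=FALSE so that hist() returns its numbers without drawing.

```r
# Stand-in data (assumption): 100 simulated N(13, 2) values instead of the CSV column.
set.seed(1)
normal_values <- rnorm(100, mean = 13, sd = 2)

default_hist <- hist(normal_values, plot = FALSE)               # default 'Sturges' rule
finer_hist   <- hist(normal_values, breaks = 20, plot = FALSE)  # ask for about 20 columns

length(default_hist$counts)   # number of columns with the default setting
length(finer_hist$counts)     # number of columns actually used; may differ from 20
finer_hist$breaks             # the breakpoints R chose
```

To force exact column edges, breaks also accepts a vector of breakpoints that covers the whole range of the data.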
Creating Multiple Figures in One Image
It is possible to put multiple figures in one image with the following function
par()
There are two main options that we will discuss. They are the number of figures in the image and the shape property of each plotting region. The property can be either square or maximum. Square keeps each plotting region square, no matter how the image is resized. Maximum lets the plotting region fill the space available, so it morphs as the image size changes. For the layout of the figures on the image, this function sees the image as a grid. You can establish how many rows and columns there will be in the grid. With mfrow, the figures fill the grid row by row until all the spots are filled. The options in the function for establishing the grid and adjusting the property of the image are, respectively,
par(mfrow=c(1,2), pty='s')
The first number for mfrow sets the number of rows while the second number sets the number of columns. “s” stands for square while “m” stands for maximum. After you establish the layout and the image’s property, you can simply start listing the figures that you want in the image. Let us see an example.
Example V
Create a boxplot, histogram, and dotplot for both sets of data. Color code them. Make appropriate titles for each figure. Save the final image.
Answer V
The code, output, and final figures are provided below. The normal data is navy and the gamma data is orange.
mydata<-read.table('visualizing_data.csv', sep=',') #loading in the data
mydata<-as.matrix(mydata) #converting the data into a matrix
par(mfrow=c(3, 2), pty='m') #setting up grid; I made it have 3 rows and 2 columns; I made it m, but s is fine too
h1<-hist(mydata[,1], col='navy', border='light blue', main="N(13,2)", xlab='Values') #hist of normal; color will be navy
h2<-hist(mydata[,2], col='orange', border='red', main="Gamma (1,1)", xlab='Values') #hist of gamma; color will be orange
p1<-plot(mydata[,1], col='navy', ylab='Values') #dotplot of normal
p2<-plot(mydata[,2], col='orange', ylab='Values') #dotplot of gamma
w1<-boxplot(mydata[,1], col='navy') #boxplot of normal
w2<-boxplot(mydata[,2], col='orange') #boxplot of gamma
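The grid set by par() stays in effect for later plots on the same graphics device, so it is often worth putting the old settings back when you are done. One common way to do that is sketched below. It assumes simulated data in place of the CSV file and a made-up file name, and it uses a smaller grid of 1 row and 2 columns to keep the example short.

```r
# Stand-in data (assumption): simulated values instead of the two CSV columns.
set.seed(1)
normal_values <- rnorm(100, mean = 13, sd = 2)
gamma_values  <- rgamma(100, shape = 1, rate = 1)

# Hypothetical output path; tempdir() keeps the sketch self-contained.
out_file <- file.path(tempdir(), "two_histograms.png")
png(out_file, width = 800, height = 400)

# par() returns the previous values of the options it changes.
old_par <- par(mfrow = c(1, 2), pty = "s")
hist(normal_values, col = "navy", border = "light blue", main = "N(13,2)", xlab = "Values")
hist(gamma_values, col = "orange", border = "red", main = "Gamma (1,1)", xlab = "Values")

par(old_par)                      # put the old layout and shape back
restored_layout <- par("mfrow")   # back to a single figure per image
dev.off()                         # close the device so the file is written
```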
Remember that many of these functions have additional properties that we did not cover at this time. To find further documentation and instruction on using these functions, type a question mark followed by the function’s name in the R console. For example, to find out more about the hist() function, type the following into the R console
?hist