Fivethirtyeight Graphic Redesign
We redesigned a graphic from Fivethirtyeight.com, a data-driven news website. Their graphics display 12 months of weather data for each of 10 cities across the United States. We picked one city’s graphic and redesigned it, balancing four goals: enabling accurate comparisons, simplifying the appearance, providing context, and attracting and engaging the analyst. We believe that the redesign is an overall improvement. We include the R code used to produce the graphic at the bottom of this post.
Fivethirtyeight picked 10 cities across the United States and reported weather data on those cities. Here’s the link (by following this link, you can eventually obtain the data they used). They used data from Undergroundweather.com, which has weather data going back to 1962. For each day, the statistics in the project include the record high and low, the average high and low, the actual high and low for the past year, and whether the past year broke or tied a record. Their figures were identical in style and presentation across cities. We decided to redesign Charlotte, North Carolina specifically because it had new records in both high and low temperatures during the year; some cities did not have both. Fivethirtyeight’s graphic is shown in Figure 1. Our redesign is displayed in Figure 2.
Colors
One of the most evident changes we made concerns the graphic’s colors. Figure 1 uses mostly brown tones for the statistics, with blue and red dots marking days that broke or tied the record low and record high, respectively. It has a white background and gray dotted grid lines. Figure 2 uses a wider variety of colors while trying to remain logical for the analyst. For example, it uses red with a white border for the record high temperature. Red is an easy association with the hottest temperature, and the white border makes the red stand out. These color changes enable more accurate comparisons and help to engage the analyst. The gray background makes the graphic less harsh on the eyes than the bright white background in Figure 1. In addition, the solid white grid lines in Figure 2 help to guide the eyes toward more precise temperature values.
Points of Interest
Closely related to the color of the graphic are the points of interest. Figure 1 marks days that broke or tied a record with small red or blue dots. However, these are difficult to find on the large graph. Therefore, in the redesign, we made the significant points much larger. We also gave each type a different shape: an upward-pointing triangle for record-breaking highs and a downward-pointing triangle for record-breaking lows. Each marker is layered, moving from a light outer triangle, to a dark inner triangle, to a light dot at the center, which makes the points stand out even more. We intentionally put the dot at the center of each marker so that the analyst can know the precise location of the observation on the graph. These changes enable accurate comparisons, help to simplify the appearance, support interpretation, and further engage the analyst.
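The layered marker can be sketched in a few lines of ggplot2. This is a minimal illustration rather than the full plot: the `records` data frame and its values are made up, while the shape code (24, an upward triangle that takes a fill) and the sizes follow the code at the bottom of this post.

```r
library(ggplot2)

# Hypothetical toy data: day index and temperature on three record-setting days
records <- data.frame(
  day  = c(40, 120, 250),
  temp = c(98, 75, 88)
)

p <- ggplot(records, aes(x = day, y = temp)) +
  # 1. large light triangle forms the outline
  geom_point(shape = 24, colour = "white", fill = "white", size = 7.5) +
  # 2. smaller dark triangle drawn on top of it
  geom_point(shape = 24, colour = "red", fill = "red", size = 5) +
  # 3. small light dot marks the exact location of the observation
  geom_point(colour = "white", size = 2.5)
```

Calling `print(p)` draws the markers. Layers are drawn in the order they are added, so the largest shape has to come first; swapping shape 24 for 25 gives the downward triangle used for lows.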
Bar on Top
A new feature in Figure 2 is the bar between the plot and the title. On this bar, we put small blue and red marks directly above each important record-breaking observation. The bar also serves as a quick summary of the entire graph. In statistics, we try to find a few summary numbers that describe a scenario, such as the mean and variance of a distribution. This bar is the graphic’s “summary statistic”. It shows where the most important observations are, how close they are to other important observations, and gives the audience a simple figure to take away. The summary bar helps to simplify the appearance by summarizing the entire graphic. It enables accurate comparisons by allowing the analyst to quickly find the important observations. It also engages the audience: members who are intimidated by how much information the graphic contains can start with this quick summary.
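To make the idea concrete, here is a small sketch of how the positions of the marks on the bar can be derived. The ten-day vectors are invented for illustration; only the rule (an observed high that ties or beats the record counts as important) comes from the post.

```r
# Hypothetical toy data for ten days: observed highs and record highs
actual_max <- c(88, 91, 95, 90, 99, 97, 85, 92, 101, 89)
record_max <- c(95, 96, 95, 98, 97, 99, 94, 96, 100, 97)

# A day is important when the observed high ties or beats the record
is_record <- actual_max >= record_max

# Day indices where a mark belongs on the summary bar
record_days <- which(is_record)

# Each mark spans one day on the x axis
marks <- data.frame(xmin = record_days, xmax = record_days + 1)

# Gaps between consecutive marks show how close the records are to each other
gaps <- diff(record_days)

print(marks)
```

The `xmin` and `xmax` columns are the kind of values that can be passed to a rectangle annotation, one small rectangle per record day.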
Legend
The next major change involves the legend. We altered the legend to describe the changes we made in the graph more accurately. We removed the solid vertical lines and made them horizontal where appropriate. We also changed some of the labels to describe the data more accurately. For example, Figure 1 has a label called “Normal Range”, and analysts were often confused about what it meant. Splitting it into two labels, “Average High” and “Average Low”, makes it much clearer what kind of data is displayed. We also capitalized all of the letters so that the labels are easier to read. These changes help to simplify the appearance of the legend. We included a title for the legend so that audience members know that this part of the graphic is reserved for it. We also clearly separated the legend from the data by enclosing the legend in a solid black line. Figure 1 does not have this feature, which can be confusing because even the dotted grid lines continue behind its legend, adding unneeded complexity. Giving the legend a solid gray background in the redesign helps to simplify its appearance. We also added the number of important observations as a new feature in Figure 2’s legend. Figure 1 preserved resolution by keeping the dots for the significant observations small, but we believe the larger markers in Figure 2 are warranted. To retain some of the information that Figure 1’s resolution provided, we included the number of important observations in the legend. This change helps to provide context to support interpretation.
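One way to keep the counts in the legend in step with the data is to build the label text from the data itself. The sketch below assumes two hypothetical logical vectors that flag record days; the label wording matches the legend in Figure 2.

```r
# Hypothetical flags: TRUE on days that broke or tied a record
is_new_high <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
is_new_low  <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)

# Count the flagged days and embed the counts in the legend labels
high_label <- sprintf("NEW HIGH OR EQUAL (%d)", sum(is_new_high))
low_label  <- sprintf("NEW LOW OR EQUAL (%d)", sum(is_new_low))

print(high_label)
print(low_label)
```

Summing a logical vector counts its TRUE values, so the labels update automatically if the data changes.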
Other Changes
We made several other changes to the graphic. We increased the font size so that the analyst can read the graphic easily. We bolded the main title and the x and y axis titles to give them more impact. Because the graphic is complex, we also removed the top and right axis labels to simplify the appearance, save space, and remove unnecessary parts.
Potential Changes
Possible changes for a further redesign include replacing the average high and low with the median high and low. Depending on the shape of the historical data, this change might be justified because the median can give a better measure of center than the mean: the mean is sensitive to skewness and outliers, while the median is more resistant to them. We could also try to simplify the title, as it is rather long and takes up a significant amount of space in the graphic.
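A small made-up example shows why the choice matters. The temperatures below are not from the dataset; they are seven hypothetical highs for one calendar day, one of which is unusually hot.

```r
# Hypothetical highs for the same calendar day across seven years
highs <- c(70, 71, 72, 72, 73, 74, 95)

# The single hot year pulls the mean upward
mean_all   <- mean(highs)
# The median stays at the middle value
median_all <- median(highs)

# Without the hot year, the mean and median agree
mean_trimmed   <- mean(highs[-7])
median_trimmed <- median(highs[-7])

print(c(mean = mean_all, median = median_all))
```

Here the mean moves by more than three degrees when the hot year is included, while the median does not move at all.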
Conclusion
Overall, we believe that Figure 2 is a drastic, but needed, redesign of Figure 1. Figure 1 accomplished many important goals and worked as a graphic. However, we believe that the changes in the redesign improve its overall quality.
Code
library(ggplot2)
library(grid)  # unit() for the plot margins

# Load the data. The objects keep the name Seattle, but KCLT.csv is the Charlotte file.
Seattle <- read.csv('KCLT.csv')

# Number of days and the day index used for the x axis
n <- nrow(Seattle)
ORDER <- seq_len(n)
newSeattle <- cbind(Seattle, ORDER)

# Flag days on which the past year's temperature broke or tied the record
is_new_low  <- newSeattle$actual_min_temp <= newSeattle$record_min_temp
is_new_high <- newSeattle$actual_max_temp >= newSeattle$record_max_temp

# Temperatures on those days (NA elsewhere), used to draw the triangles
newSeattle$tempYL <- ifelse(is_new_low,  newSeattle$actual_min_temp, NA)
newSeattle$tempY  <- ifelse(is_new_high, newSeattle$actual_max_temp, NA)

# Day indices of those observations, used for the bar on top
tempL <- which(is_new_low)
temp  <- which(is_new_high)

# Number of important observations, shown in the legend
one <- sum(is_new_high)  # new or tied highs
two <- sum(is_new_low)   # new or tied lows

# Last day index of each month, July 2014 to June 2015 (0 marks the start)
monthends <- c(0, 31, 62, 92, 123, 153, 184, 215, 243, 274, 304, 335, 365)
# Put each month label at the midpoint of its month
xaxisbreaks <- (head(monthends, -1) + tail(monthends, -1)) / 2
xaxislabels <- c("July\n2014", "Aug\n2014", "Sept\n2014", "Oct\n2014",
                 "Nov\n2014", "Dec\n2014", "Jan\n2015", "Feb\n2015",
                 "Mar\n2015", "Apr\n2015", "May\n2015", "June\n2015")

# Creating my theme
mytheme <- theme(
  plot.title = element_text(lineheight = 1.5, size = 35, face = "bold"),
  axis.text.x = element_text(size = 23),
  axis.text.y = element_text(size = 23),
  axis.title.x = element_text(size = 28, face = 'bold'),
  axis.title.y = element_text(size = 28, face = 'bold'),
  strip.background = element_rect(fill = "gray80"),
  panel.background = element_rect(fill = "gray80"),
  panel.grid.minor = element_blank(),
  panel.grid.major = element_blank(),
  axis.ticks = element_blank(),
  axis.text = element_text(colour = "black"),
  plot.margin = unit(c(27, 13, 27, 13), 'mm')
)

# Line widths below use `size`, as in the original code.
# ggplot2 3.4.0 and later prefer `linewidth` for lines and warn about `size`.

# First part of graphic: grid lines and the record low temperature
S <- ggplot(newSeattle, aes(x = ORDER, y = record_min_temp)) +
  # change these as you can't tell what the months are
  geom_vline(xintercept = monthends[-1], colour = "white", size = .7) +
  geom_hline(yintercept = seq(-10, 110, by = 10), colour = "white", size = .5) +
  geom_line(colour = "white", size = 3) +
  geom_line(colour = "navy", size = 2) +
  xlab("Date") +
  ylab("Temperature (F)") +
  ggtitle("Charlotte, North Carolina, Temperatures from July 2014 to June 2015\nand Historical Temperatures from 1962 to 2014") +
  scale_y_continuous(breaks = seq(-10, 110, by = 10), labels = seq(-10, 110, by = 10), expand = c(0, 0)) +
  scale_x_continuous(breaks = xaxisbreaks, labels = xaxislabels, expand = c(0, 0)) +
  mytheme

# Adding the record high temperatures
S <- S +
  geom_line(aes(y = record_max_temp), colour = "white", size = 3) +
  geom_line(aes(y = record_max_temp), colour = "red", size = 2)

# Adding the average temperatures
SE <- S +
  geom_line(aes(y = average_min_temp), colour = 'mediumorchid1', size = 3) +
  geom_line(aes(y = average_max_temp), colour = 'orange', size = 3)

# Adding the daily range
SEA <- SE +
  geom_linerange(aes(ymin = actual_min_temp, ymax = actual_max_temp), colour = 'darkgreen', size = 1)

# Adding the important observations: light triangle, dark triangle, light center dot
SEAT <- SEA +
  geom_point(aes(y = tempY), shape = 24, colour = "white", fill = 'white', size = 7.5, na.rm = TRUE) +
  geom_point(aes(y = tempY), shape = 24, colour = "red", fill = 'red', size = 5, na.rm = TRUE) +
  geom_point(aes(y = tempY), colour = "white", size = 2.5, na.rm = TRUE) +
  geom_point(aes(y = tempYL), shape = 25, colour = "darkslategray1", fill = 'darkslategray1', size = 9, na.rm = TRUE) +
  geom_point(aes(y = tempYL), shape = 25, colour = "navy", fill = 'navy', size = 6, na.rm = TRUE) +
  geom_point(aes(y = tempYL), colour = "darkslategray1", size = 2.7, na.rm = TRUE)

# Creating the bar on top of the graph and the box that holds the legend
SEATTL <- SEAT +
  annotate("rect", xmin = 0, xmax = 365, ymin = 111, ymax = 120, colour = 'white', fill = 'white') +
  annotate("rect", xmin = temp, xmax = temp + 1, ymin = 112, ymax = 113, colour = "red", fill = 'red') +
  annotate("rect", xmin = tempL, xmax = tempL + 1, ymin = 114, ymax = 115, colour = "navy", fill = 'navy') +
  annotate("rect", xmin = 1, xmax = 81, ymin = -9, ymax = 39, colour = 'black', fill = 'gray80')

# Adding legend, notes, etc.
SEATTLE <- SEATTL +
  annotate("rect", xmin = 28, xmax = 36, ymin = -1, ymax = 1, fill = 'navy', colour = "navy") +
  annotate("rect", xmin = 28, xmax = 36, ymin = 20, ymax = 22, fill = 'red', colour = "red") +
  annotate("rect", xmin = 28, xmax = 36, ymin = 4, ymax = 6, fill = 'mediumorchid1', colour = "mediumorchid1") +
  annotate("rect", xmin = 28, xmax = 36, ymin = 16, ymax = 18, fill = 'orange', colour = "orange") +
  annotate("segment", x = 32, xend = 32, y = 3, yend = 19, colour = "darkgreen", size = 2) +
  # adding labels to the legend
  annotate("text", x = 14, y = 21, label = "RECORD\nHIGH", size = 4, fontface = 'bold', colour = "red") +
  annotate("text", x = 51, y = 17, label = "AVERAGE HIGH", size = 4, fontface = 'bold', colour = "orange") +
  annotate("text", x = 51, y = 5, label = "AVERAGE LOW", size = 4, fontface = 'bold', colour = "mediumorchid1") +
  annotate("text", x = 14, y = 0, label = "RECORD\nLOW", size = 4, fontface = 'bold', colour = "navy") +
  annotate("text", x = 15, y = 11, label = "DAILY RANGE", size = 4, fontface = 'bold', colour = "darkgreen") +
  annotate('point', x = 32, y = 26, colour = "white", fill = 'white', shape = 24, size = 7) +
  annotate('point', x = 32, y = 26, colour = "red", fill = 'red', shape = 24, size = 4) +
  annotate('point', x = 32, y = 26, colour = "white", size = 2.5) +
  annotate('point', x = 32, y = -4, colour = "darkslategray1", fill = 'darkslategray1', shape = 25, size = 7) +
  annotate('point', x = 32, y = -4, colour = "navy", fill = 'navy', shape = 25, size = 5) +
  annotate('point', x = 32, y = -4, colour = "darkslategray1", size = 2.5) +
  # counts come from the data instead of being typed in by hand
  annotate("rect", xmin = 37, xmax = 78, ymin = 23, ymax = 28, colour = 'white', fill = 'red') +
  annotate('text', x = 57, y = 26, label = paste0("NEW HIGH OR EQUAL (", one, ")"), size = 3.33, fontface = 'bold', colour = "white") +
  annotate("rect", xmin = 37, xmax = 77.5, ymin = -6, ymax = -1, colour = 'darkslategray1', fill = 'navy') +
  annotate('text', x = 57, y = -3, label = paste0("NEW LOW OR EQUAL (", two, ")"), size = 3.33, fontface = 'bold', colour = "darkslategray1") +
  annotate('text', x = 37, y = 35, label = "Legend", size = 8, colour = "black", fontface = 'bold')

# Displaying the final graph
SEATTLE