How to read a box plot

I love box plots.

But every time I generate one, I am overwhelmed by doubts and insecurities.

Am I reading it right? What’s the takeaway? Is this line the median? Will the bottom quartile eat quartile one at some point?

I can’t live with this shame anymore, and neither should you. So once and for all, here’s how to read a box plot.

But first, what is it for?

Box plots are useful for summarizing data distributions, comparing groups quickly, and highlighting outliers. They provide a concise visualization of the spread, central tendency, and variability of the data. That is a nice sentence, but let’s take an example:

Before joining REKOLT, Karim exited a company specializing in the installation and maintenance of optical fiber networks in Morocco (Atlas Technofor). The company wants to optimize material usage across its installation jobs to reduce costs and improve efficiency. To do that, we need to analyze and compare material consumption patterns across jobs.

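A box plot is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median (second quartile, Q2), third quartile (Q3), and maximum. In the common variant described here, the whiskers stop at the most extreme points within 1.5 × IQR of the box, so the true minimum or maximum may show up as an outlier point rather than as a whisker end. Here’s how to read a box plot: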

  1. The Box: The central box represents the middle 50% of the data. The bottom of the box indicates the first quartile (Q1), and the top indicates the third quartile (Q3). The distance between them, known as the interquartile range (IQR), shows the spread of the middle half of the data.

  2. The Median: Inside the box, there is typically a line that indicates the median (Q2) of the data set. This is the value that splits the dataset into two equal parts.

  3. Whiskers: Extending from the box, the "whiskers" indicate variability outside the upper and lower quartiles. The end of the lower whisker is the smallest data point that is no lower than Q1 − 1.5 × IQR, and the end of the upper whisker is the largest data point that is no higher than Q3 + 1.5 × IQR.

  4. Outliers: Any data points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are considered outliers and are often plotted as individual points beyond the whiskers.

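  5. Symmetry and Skewness: By observing the box and whiskers, you can get a sense of the symmetry and skewness of the data. If the median is not centered in the box or if the whiskers are not of equal length, the data might be skewed to one side. For example, a longer upper whisker with the median sitting near the bottom of the box suggests a right-skewed distribution, with a long tail of high values.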
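To make the five parts above concrete, here is a minimal Python sketch that computes the numbers a box plot draws. Everything in it is an assumption for illustration: the function name `box_plot_stats`, the cable-length figures (invented, not Atlas Technofor data), and the quartile convention (`method="inclusive"` from the standard library). Plotting tools use slightly different quartile conventions, so their boxes may land a little differently.

```python
from statistics import quantiles

def box_plot_stats(values):
    """Return the numbers a 1.5 x IQR box plot would draw for `values`."""
    data = sorted(values)
    # Quartiles: "inclusive" is one convention among several.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        # Everything beyond the fences is drawn as an individual point.
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical cable lengths (metres) used on nine installation jobs.
cable_m = [42, 45, 47, 50, 52, 55, 58, 61, 120]
stats = box_plot_stats(cable_m)
print(stats)
```

With this made-up data, the box runs from 47 to 58 with the median at 52, the whiskers end at 42 and 61, and the 120-metre job is flagged as an outlier because it lies above Q3 + 1.5 × IQR = 74.5.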

Here's what each part can tell you about the data:

  • Long whiskers indicate more variability outside the middle 50% of the data.

  • Short whiskers suggest less variability outside the middle 50%.

  • A longer box means a larger IQR and thus more data spread in the middle 50%.

  • A shorter box indicates that the middle half of the data is tightly clustered around the median.
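The same reading can be done in numbers when comparing groups. The sketch below is only an illustration: the two job types, their connector counts, and the helper name `spread_summary` are all hypothetical, and the quartiles again use the standard library's "inclusive" convention.

```python
from statistics import quantiles

def spread_summary(values):
    """Median, box length (IQR) and whisker span under the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Keep only the points a whisker can reach.
    inside = [x for x in data if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
    return {
        "median": median,
        "iqr": iqr,
        "whisker_span": inside[-1] - inside[0],
    }

# Hypothetical connectors used per job, for two made-up job types.
jobs = {
    "residential": [10, 11, 12, 12, 13, 13, 14, 15, 16],
    "commercial": [5, 9, 14, 18, 22, 27, 31, 36, 40],
}
summaries = {name: spread_summary(v) for name, v in jobs.items()}
for name, s in summaries.items():
    print(f"{name}: median={s['median']}, IQR={s['iqr']}, "
          f"whisker span={s['whisker_span']}")
```

Here the "commercial" group would show a much longer box (IQR of 17 against 2) and much longer whiskers than the "residential" group, which is the kind of difference in material consumption a side-by-side box plot makes visible at a glance.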
