When the sample size behind each category varies a lot, it helps to show it through the box widths.
First, compute the proportion of each level by dividing the counts returned by the table() function by the total number of rows. With these proportions, a level that holds twice as many observations gets a box twice as wide. Then pass the proportions to the width argument when you call the boxplot() function.
# Dummy data
names <- c(rep("A", 20), rep("B", 8), rep("C", 30), rep("D", 80))
value <- c(sample(2:5, 20, replace = TRUE), sample(4:10, 8, replace = TRUE),
           sample(1:7, 30, replace = TRUE), sample(3:8, 80, replace = TRUE))
data <- data.frame(names, value)
# Calculate proportion of each level
proportion <- table(data$names)/nrow(data)
# Draw the boxplot, with box widths proportional to the number of observations per level
boxplot(value ~ names, data = data, width = proportion, col = c("orange", "seagreen"))
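
The same idea can be pushed a little further. The sketch below is only an illustration: the data frame `df`, its columns `groups` and `values`, and the `labels` vector are made-up names, not part of the example above. It computes the proportions with prop.table(), prints the sample size in each axis label, and then draws a second plot with the built-in varwidth argument of boxplot(), which makes widths proportional to the square root of the group sizes rather than to the sizes themselves.

```r
# Hypothetical dummy data, using the same group sizes as the example above
set.seed(42)
groups <- rep(c("A", "B", "C", "D"), times = c(20, 8, 30, 80))
values <- sample(1:10, length(groups), replace = TRUE)
df <- data.frame(groups, values)

# Count the observations per group, then turn the counts into proportions
counts <- table(df$groups)
props <- prop.table(counts)   # same as counts / sum(counts)

# Axis labels such as "A (n=20)" so the sample size is readable on the plot
labels <- paste0(names(counts), " (n=", as.vector(counts), ")")

# Box widths proportional to the group proportions
boxplot(values ~ groups, data = df, width = as.numeric(props),
        names = labels, col = "orange")

# Alternative: varwidth = TRUE scales widths by the square root of the counts
boxplot(values ~ groups, data = df, varwidth = TRUE,
        names = labels, col = "seagreen")
```

With width, the largest group always gets the widest box and the others are scaled relative to it, so very small groups can end up as thin slivers. varwidth = TRUE compresses those differences, which may read better when the group sizes differ by an order of magnitude.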