# Plot histogram with overlaid normal curve

Given a vector of values, create a ggplot histogram with an overlaid best-fitting normal curve and a prettified caption of summary statistics.
Categories: R, Code snippet

Published: July 29, 2024

Another basic task that I’m tired of looking up how to perform, so I’m posting this for personal reference.

Task: given a vector of values, create a ggplot histogram with an overlaid best-fitting normal curve and an optional caption showing the mean, standard deviation, and sample size, prettily formatted.

```r
library(tidyverse)

format_num <- function(n, digits = 3) {
  # Prettify numeric results -- no scientific notation, use significant digits
  formatC(signif(n, digits = digits), digits = digits, format = "fg", flag = "#")
}

hist_normal <- function(values, binwidth = NA, caption = TRUE, num_sd = NA) {
  # values is a vector of numbers
  df <- data.frame(value = values)
  values_mean <- mean(df$value)
  values_sd   <- sd(df$value)

  # Default bin width: split the range of the data into 30 equal-width bins
  if (is.na(binwidth)) {
    binwidth <- abs((max(df$value) - min(df$value)) / 30)
  }

  # Histogram on the density scale, so it is comparable to the dnorm curve
  g <- df %>%
    ggplot(aes(x = value)) +
    geom_histogram(
      aes(y = after_stat(density)),
      binwidth = binwidth,
      colour = "black", fill = "white"
    ) +
    stat_function(fun = dnorm, args = list(mean = values_mean, sd = values_sd))

  # Optional caption with prettified mean, standard deviation, and sample size
  if (caption) {
    g <- g +
      labs(caption = paste0(
        "mean = ", format_num(values_mean),
        "; sd = ", format_num(values_sd),
        "; n = ", length(values)
      ))
  }

  # Optionally zoom the x-axis to num_sd standard deviations around the mean
  if (!is.na(num_sd)) {
    g <- g + coord_cartesian(xlim = values_mean + values_sd * c(-num_sd, num_sd))
  }

  return(g)
}
```
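To see what the caption formatting does on its own, here is a small sketch that repeats the `format_num` definition from above (so the block runs without the tidyverse) and applies it to a few made-up inputs. The input values are arbitrary and chosen only for illustration.

```r
# Copy of format_num from the post, repeated so this block is self-contained
format_num <- function(n, digits = 3) {
  formatC(signif(n, digits = digits), digits = digits, format = "fg", flag = "#")
}

# Arbitrary illustrative inputs: one large, one very small, one ordinary
inputs <- c(1234.5678, 0.000012345, 3.14159)
pretty <- format_num(inputs)

# Results are character strings in fixed notation, rounded to 3 significant digits
print(pretty)

# The digits argument controls how many significant digits are kept
print(format_num(3.14159, digits = 5))
```

Because the result is a character vector, it can be pasted straight into the caption string, which is how `hist_normal` uses it.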

Simple example. The default of around 30 bins is too many for `n` = 200 points:

```r
hist_normal(rnorm(200))
```
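If around 30 bins is too many for a small sample, one option is to pass a data-driven `binwidth` instead. The sketch below uses the Freedman-Diaconis rule (bin width = 2 × IQR / n^(1/3)), which is a swapped-in technique and not something the original function does. The helper name `fd_binwidth` is hypothetical.

```r
# Hypothetical helper (not part of the original post):
# Freedman-Diaconis bin width, 2 * IQR / n^(1/3)
fd_binwidth <- function(values) {
  2 * IQR(values) / length(values)^(1 / 3)
}

set.seed(1)
x <- rnorm(200)

# Bin width that hist_normal would pick by default (range / 30)
default_bw <- diff(range(x)) / 30

# Data-driven alternative and the number of bins it implies
bw <- fd_binwidth(x)
n_bins <- ceiling(diff(range(x)) / bw)

print(c(default = default_bw, freedman_diaconis = bw))
print(n_bins)

# With hist_normal() from above defined, the result could be used as:
# hist_normal(x, binwidth = bw)
```

For a sample this size the rule gives noticeably wider bins than the range / 30 default, so the histogram should look less ragged.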

Smoother example, centering the plot on the mean and limiting the x-axis to 4 standard deviations on either side of it:

```r
hist_normal(rnorm(5000, mean = 25, sd = 2.5), binwidth = 0.5, num_sd = 4)
```
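The x-axis window set by `num_sd` is plain arithmetic on the mean and standard deviation. A minimal sketch follows, using the population values 25 and 2.5 as stand-ins for the sample estimates that `hist_normal` actually computes, so the real limits will differ slightly from run to run.

```r
# Stand-in values: the true mean and sd passed to rnorm() in the example,
# not the sample estimates computed inside hist_normal()
values_mean <- 25
values_sd   <- 2.5
num_sd      <- 4

# Same expression hist_normal passes to coord_cartesian(xlim = ...)
x_limits <- values_mean + values_sd * c(-num_sd, num_sd)
print(x_limits)  # [1] 15 35
```

Since the limits are applied through `coord_cartesian`, the plot is zoomed without dropping any observations from the histogram calculation.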