
This post provides R code to estimate growth chart LMS parameters1 given a table of measurements and percentile values (or Z-scores). The initial rationale was to explore problems with LMS parameterization in a publication describing growth charts for Noonan syndrome2, with the findings uploaded to MedRxiv3.
Load packages
Show the code
library(tidyverse)
library(knitr)
library(minpack.lm) # nlsLM() for robust nonlinear least squares
library(ggh4x) # facet_grid2(), for multiple panels with free scalesLoad data
The source data was extracted from the supplemental materials from2 using Tabula and collected to a single datagrame.
Show the code
##### Load original published data tables #####
# `cappa_full.csv` contains: chart, age / units, gender, measure / units, M, S, L, 3 / 10 / 25 / 50 / 75 / 90 / 97 percentiles
df_orig <- read.csv("cappa_full.csv") |> as_tibble()
df_orig |> select(measure, gender, age, M, S, L, p_3:p_97) |> head() |> knitr::kable()| measure | gender | age | M | S | L | p_3 | p_10 | p_25 | p_50 | p_75 | p_90 | p_97 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| height_velocity | f | 0.0 | 19.301 | 0.262 | 0.780 | 10.353 | 13.069 | 15.955 | 19.301 | 22.780 | 26.015 | 29.298 |
| height_velocity | f | 0.2 | 18.601 | 0.264 | 0.781 | 9.923 | 12.557 | 15.355 | 18.601 | 21.976 | 25.114 | 28.298 |
| height_velocity | f | 0.4 | 17.903 | 0.266 | 0.782 | 9.497 | 12.048 | 14.759 | 17.903 | 21.172 | 24.212 | 27.297 |
| height_velocity | f | 0.6 | 17.207 | 0.267 | 0.784 | 9.077 | 11.544 | 14.166 | 17.207 | 20.370 | 23.311 | 26.296 |
| height_velocity | f | 0.8 | 16.515 | 0.269 | 0.785 | 8.663 | 11.045 | 13.577 | 16.515 | 19.570 | 22.412 | 25.295 |
| height_velocity | f | 1.0 | 15.827 | 0.271 | 0.786 | 8.254 | 10.551 | 12.994 | 15.828 | 18.775 | 21.516 | 24.297 |
There were problems with the LMS parameters, so the goal was to re-estimate them from the percentile values (3 / 10 / 25 / 50 / 75 / 90 / 97). The LMS parameterization relates the measurement value X with the parameters L, M, S, and Z as follows:
or
Prepare training data
Show the code
df <- df_orig |> # wide to long format with published X at percentiles
pivot_longer(
p_3:p_97,
names_to = "percentile",
names_prefix = "p_",
values_to = "X",
names_transform = as.numeric
) |>
mutate(Z = qnorm(percentile / 100))
df_train <- df |>
select(measure, gender, age, X, Z)
df_train |> head(n = 14) |> knitr::kable()| measure | gender | age | X | Z |
|---|---|---|---|---|
| height_velocity | f | 0.0 | 10.353 | -1.8807936 |
| height_velocity | f | 0.0 | 13.069 | -1.2815516 |
| height_velocity | f | 0.0 | 15.955 | -0.6744898 |
| height_velocity | f | 0.0 | 19.301 | 0.0000000 |
| height_velocity | f | 0.0 | 22.780 | 0.6744898 |
| height_velocity | f | 0.0 | 26.015 | 1.2815516 |
| height_velocity | f | 0.0 | 29.298 | 1.8807936 |
| height_velocity | f | 0.2 | 9.923 | -1.8807936 |
| height_velocity | f | 0.2 | 12.557 | -1.2815516 |
| height_velocity | f | 0.2 | 15.355 | -0.6744898 |
| height_velocity | f | 0.2 | 18.601 | 0.0000000 |
| height_velocity | f | 0.2 | 21.976 | 0.6744898 |
| height_velocity | f | 0.2 | 25.114 | 1.2815516 |
| height_velocity | f | 0.2 | 28.298 | 1.8807936 |
Functions
The meat of the code is the estimate_LMS_parameters() function which takes a dataframe including columns of X and Z, optionally groups (for example, by anthropometric measurement, sex, and age), and performs non-linear least squares regression via minpack.lm::nlsLM() to estimate L, M, and S for each group.
The estimation function could be improved to more explicitly handle the case where L = 0, as the objective function converges to a different form as L approaches 0, but setting starting values for L that include values just above and below 0 seems to work. Because we are trying to estimate 3 parameters, at least 3 points per group are needed. In this dataset, we have 7 X / Z points per group.
##### Estimate_LMS_parameters #####
estimate_LMS_parameters <- function(df, ...) {
# takes a dataframe of columns Z and X and optional (...) grouping variables
# returns each group's estimated LMS parameters, as well as RMSE, R-squared,
# convergence, n, and method comment
#
# Requires: `tidyverse` and `minpack.lm` libraries
library(tidyverse)
library(minpack.lm) # nlsLM() for robust nonlinear least squares
tolerance <- 1e-6 # used for L ≈ 0 and for when assessing `best_obj`
rmse_tolerance <- 0.01 # max RMSE for regression to be considered "converged"
# Vectorized model function for optimization
LMS_model_vec <- function(params, Z) {
L <- params[1]
M <- params[2]
S <- params[3]
if (abs(L) < tolerance) { # When L ≈ 0, use exponential form
return(M * exp(S * Z))
} else { # Normal case
return(M * (1 + L*S*Z)^(1/L))
}
}
# Function to estimate parameters for a single group
estimate_single_group <- function(group_data) {
Z <- group_data$Z
X <- group_data$X
n <- nrow(group_data)
if (n < 3) { # Need at least 3 points to estimate 3 parameters
return(data.frame(
L = NA, M = NA, S = NA, rmse = NA,
r_squared = NA, converged = FALSE,
n_obs = n, method = "insufficient_data"))
}
# Test discontinuity at L = 0 (normal distribution with no skew)
#
# tibble(
# measure = "test", age = 0, gender = "m",
# percentile = c(3, 10, 50, 90, 97)
# ) |>
# mutate(
# Z = qnorm(percentile / 100),
# X = 50 * exp(1 * Z) # formula for L = 0; M = 50, S = 1
# ) |>
# estimate_LMS_parameters(measure, age, gender)
#
# L_starts <- c(-5, 5) # fails to converge
# L_starts <- c(-5, -0.01, 0.01, 5) # converges to L = 1.165e-06
# Sets of starting values
M_starts <- c(mean(X), median(X)) # estimating the median
L_starts <- c(-5, -0.01, 0.01, 5) # handle potential L = 0
S_starts <- c(0.01, 1) # always positive
best_params <- NULL
best_obj <- Inf
best_method <- "failed"
# Try different starting combinations
for (M_start in M_starts) {
for (L_start in L_starts) {
for (S_start in S_starts) {
tryCatch({ # Using minpack.lm::nlsLM() robust nonlinear least squares
# objective function NOT normalized to # of points
# RMSE = sqrt(obj_va / n)
if (is.null(best_params) || best_obj > tolerance) {
fit <- nlsLM(
# X ~ ifelse( # this didn't work
# abs(L) < tolerance,
# M * exp(S * Z),
# M * (1 + L * S * Z)^(1/L)
# ),
X ~ M * (1 + L * S * Z)^(1/L), # will fail if L is exactly 0
data = data.frame(X = X, Z = Z),
start = list(L = L_start, M = M_start, S = S_start),
control = nls.lm.control(maxiter = 1000))
params <- coef(fit)
obj_val <- sum(residuals(fit)^2)
if (obj_val < best_obj) {
best_obj <- obj_val
best_params <- params
best_method <- "nlsLM"
}
}
}, error = function(e) {
# Unsuccessful fitting --> continue
})
}
}
}
# Check if found a solution
if (is.null(best_params)) {
return(data.frame(
L = NA, M = NA, S = NA,
rmse = NA, r_squared = NA, converged = FALSE,
n_obs = n, method = "all_failed"))
}
# Calculate fit statistics
predicted <- LMS_model_vec(best_params, Z)
residuals <- X - predicted
rmse <- sqrt(mean(residuals^2))
# R-squared
tss <- sum((X - mean(X))^2)
rss <- sum(residuals^2)
r_squared <- 1 - rss/tss
# Check convergence quality
converged <- (rmse < rmse_tolerance) && # OK criteria
all(is.finite(best_params)) && # pretty unrestrictive criteria
(r_squared > 0) # pretty unrestrictive criteria
return(data.frame(
L = best_params[1],
M = best_params[2],
S = best_params[3],
rmse = rmse,
r_squared = r_squared,
converged = converged,
n_obs = n,
method = best_method
))
}
# Apply to each group
grouping_syms <- rlang::ensyms(...)
results <- df
if (length(grouping_syms) > 0) {
results <- results |> group_by(!!!grouping_syms)
}
results <- results |>
group_modify(~ estimate_single_group(.x)) %>%
ungroup()
return(results)
}(Also included is a vectorized helper function from the peditools package to calculate X from L, M, S, and Z, which could also probably be improved by not using strict equality of L with 0.)
Show the code
##### z_lms_to_x #####
# From `peditools`
z_lms_to_x <- function (Z, L, M, S) {
expand.arguments <- function(...) {
dotList <- list(...)
max.length <- max(sapply(dotList, length))
lapply(dotList, rep, length = max.length)
}
temp <- expand.arguments(Z, L, M, S)
Z = temp[[1]]
L = temp[[2]]
M = temp[[3]]
S = temp[[4]]
ifelse(L != 0, M * (1 + L * S * Z)^(1/L), M * exp(S * Z))
}Use the training data to estimate LMS parameters
The full dataframe df_train includes 4 anthropometric measures, 2 genders, and 100 ages, so LMS parameters will be estimated from 800 separate groups of seven X / Z pairs (one for each of the 7 percentiles (3 / 10 / 25 / 50 / 75 / 90 / 97).
Show the code
df_est <- estimate_LMS_parameters(df_train, measure, gender, age) # group by measure, gender, and age
df_est |> head() |> knitr::kable()| measure | gender | age | L | M | S | rmse | r_squared | converged | n_obs | method |
|---|---|---|---|---|---|---|---|---|---|---|
| bmi | f | 0.0 | 0.0204592 | 14.14973 | 0.0923936 | 0.0001710 | 1 | TRUE | 7 | nlsLM |
| bmi | f | 0.2 | 0.0124989 | 14.33114 | 0.0928675 | 0.0001556 | 1 | TRUE | 7 | nlsLM |
| bmi | f | 0.4 | 0.0053028 | 14.50763 | 0.0933272 | 0.0002511 | 1 | TRUE | 7 | nlsLM |
| bmi | f | 0.6 | 0.0007589 | 14.67681 | 0.0937863 | 0.0002292 | 1 | TRUE | 7 | nlsLM |
| bmi | f | 0.8 | -0.0076834 | 14.83683 | 0.0942656 | 0.0001313 | 1 | TRUE | 7 | nlsLM |
| bmi | f | 1.0 | -0.0158573 | 14.98664 | 0.0947315 | 0.0002322 | 1 | TRUE | 7 | nlsLM |
The LMS estimation results include some per-group statistics (like RMSE, R-squared, and number of observations). The RMSE values are quite good but not perfect, as they should be if the percentile values were all calculated correctly from source LMS values.This imperfect convergence reflects problems with the source data.
Show the code
summary(df_est |> mutate(measure = as.factor(measure), gender = as.factor(gender), method = as.factor(method))) measure gender age L
bmi :200 f:400 Min. : 0.00 Min. :-3.8136
height :200 m:400 1st Qu.: 4.95 1st Qu.:-1.1473
height_velocity:200 Median : 9.90 Median :-0.1807
weight :200 Mean : 9.90 Mean :-0.3799
3rd Qu.:14.85 3rd Qu.: 0.6690
Max. :19.80 Max. : 2.6379
M S rmse r_squared
Min. : 0.8091 Min. :0.04280 Min. :3.656e-05 Min. :0.9996
1st Qu.: 11.2486 1st Qu.:0.09131 1st Qu.:1.768e-04 1st Qu.:1.0000
Median : 17.4175 Median :0.14562 Median :2.264e-04 Median :1.0000
Mean : 42.7057 Mean :0.17739 Mean :4.009e-03 Mean :1.0000
3rd Qu.: 54.2090 3rd Qu.:0.22781 3rd Qu.:2.879e-04 3rd Qu.:1.0000
Max. :166.2587 Max. :0.48378 Max. :1.325e-01 Max. :1.0000
converged n_obs method
Mode :logical Min. :7 nlsLM:800
FALSE:53 1st Qu.:7
TRUE :747 Median :7
Mean :7
3rd Qu.:7
Max. :7
Compare reported X values with those calculated from re-estimated LMS values
The newly estimated LMS values were used to recalculate the X values. (In the technical brief, the problems exhibited by the published LMS values were also shown.)
Show the code
# Functionally assess estimatedLMS values
df_assess <- df_est |>
transmute(measure, gender, age, estL = L, estM = M, estS = S) |>
left_join(
df_orig |> select(measure, gender, age, L, M, S, everything()),
by = c("measure", "gender", "age")
) |>
mutate(
deltaL = estL - L, deltaM = estM - M, deltaS = estS - S
) |>
pivot_longer(p_3:p_97, names_to = "percentile", names_prefix = "p_", values_to = "X") |>
mutate(
# Calculate X from original and re-estimated LMS values
percentile = as.numeric(percentile),
calcX = z_lms_to_x(qnorm(percentile / 100.0), L, M, S), # using the originally published LMS values
estX = z_lms_to_x(qnorm(percentile / 100.0), estL, estM, estS)
) |>
select(
chart, age, age_units, gender, measure, measure_units, L, M, S, estL, estM, estS, deltaL, deltaM, deltaS, percentile, X, calcX, estX
)Overlaying the reported X values with those calculated from the re-estimated LMS values show near-perfect alignment. (Each panel is reproduced in larger size below, to better visualize the minor imperfections present from the flawed original percentile measurement data.)
Show the code
# Plot X, calcX, and estX
g <- df_assess |>
mutate(percentile = factor(percentile, ordered = TRUE)) |>
ggplot(aes(age, X, group = percentile)) +
geom_line(size = 0.2) +
# geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
geom_point(aes(age, estX), color = "blue", size = 0.3, alpha = 0.2) +
facet_grid2(gender ~ measure, scales = "free_y", independent = "y") +
theme_bw() +
labs(
x = "Age (years)",
y = "X measurement"
)
g
So, that’s how to regenerate LMS parameters from tables of percentile values. Theoretically, if the table of percentiles were correctly generated directly from LMS values, 3 values per group would be sufficient to perfectly recreate the LMS parameters. In that case, it’d probably best to use 10 / 50 / 90 percentile values, to avoid the decreased precision at more extreme Z scores.
Individual plots of reported vs calculated X values from re-estimated LMS
Show the code
for (m in c("bmi", "height", "height_velocity", "weight")) {
{
# for (g in c("f", "m")) {
g <- df_assess |>
# filter(measure == m, gender == g) |>
filter(measure == m) |>
mutate(percentile = factor(percentile, ordered = TRUE)) |>
ggplot(aes(age, X, group = percentile)) +
geom_line() +
# geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
geom_point(aes(age, estX), color = "blue", size = 0.4, alpha = 0.5) +
theme_bw() +
labs(
x = "Age (years)",
y = "X measurement"
) +
labs(
# title = paste0(m, " - ", g)
title = paste0(m)
) +
facet_wrap(~gender)
print(g)
}
}


