Recovering growth chart LMS parameters

This post provides R code to estimate growth chart LMS parameters¹ given a table of measurements and percentile values (or Z-scores). The initial rationale was to explore problems with LMS parameterization in a publication describing growth charts for Noonan syndrome², with the findings uploaded to MedRxiv³.

Load packages

Show the code

library(tidyverse)
library(knitr)
library(minpack.lm)  # nlsLM() for robust nonlinear least squares
library(ggh4x)       # facet_grid2(), for multiple panels with free scales

Load data

The source data was extracted from the supplemental materials from² using Tabula and collected to a single datagrame.

Show the code

##### Load original published data tables #####
# `cappa_full.csv` contains: chart, age / units, gender, measure / units, M, S, L, 3 / 10 / 25 / 50 / 75 / 90 / 97 percentiles

df_orig <- read.csv("cappa_full.csv") |> as_tibble()

df_orig |> select(measure, gender, age, M, S, L, p_3:p_97) |> head() |> knitr::kable()

measure	gender	age	M	S	L	p_3	p_10	p_25	p_50	p_75	p_90	p_97
height_velocity	f	0.0	19.301	0.262	0.780	10.353	13.069	15.955	19.301	22.780	26.015	29.298
height_velocity	f	0.2	18.601	0.264	0.781	9.923	12.557	15.355	18.601	21.976	25.114	28.298
height_velocity	f	0.4	17.903	0.266	0.782	9.497	12.048	14.759	17.903	21.172	24.212	27.297
height_velocity	f	0.6	17.207	0.267	0.784	9.077	11.544	14.166	17.207	20.370	23.311	26.296
height_velocity	f	0.8	16.515	0.269	0.785	8.663	11.045	13.577	16.515	19.570	22.412	25.295
height_velocity	f	1.0	15.827	0.271	0.786	8.254	10.551	12.994	15.828	18.775	21.516	24.297

There were problems with the LMS parameters, so the goal was to re-estimate them from the percentile values (3 / 10 / 25 / 50 / 75 / 90 / 97). The LMS parameterization relates the measurement value X with the parameters L, M, S, and Z as follows:

$X = M (1 + L S Z)^{1 / L}, L \neq 0$

$X = M e^{S Z}, L = 0$

Prepare training data

Show the code

df <- df_orig |> # wide to long format with published X at percentiles
  pivot_longer(
    p_3:p_97,
    names_to = "percentile",
    names_prefix = "p_",
    values_to = "X",
    names_transform = as.numeric
  ) |> 
  mutate(Z = qnorm(percentile / 100))

df_train <- df |>
  select(measure, gender, age, X, Z)

df_train |> head(n = 14) |> knitr::kable()

measure	gender	age	X	Z
height_velocity	f	0.0	10.353	-1.8807936
height_velocity	f	0.0	13.069	-1.2815516
height_velocity	f	0.0	15.955	-0.6744898
height_velocity	f	0.0	19.301	0.0000000
height_velocity	f	0.0	22.780	0.6744898
height_velocity	f	0.0	26.015	1.2815516
height_velocity	f	0.0	29.298	1.8807936
height_velocity	f	0.2	9.923	-1.8807936
height_velocity	f	0.2	12.557	-1.2815516
height_velocity	f	0.2	15.355	-0.6744898
height_velocity	f	0.2	18.601	0.0000000
height_velocity	f	0.2	21.976	0.6744898
height_velocity	f	0.2	25.114	1.2815516
height_velocity	f	0.2	28.298	1.8807936

Functions

The meat of the code is the estimate_LMS_parameters() function which takes a dataframe including columns of X and Z, optionally groups (for example, by anthropometric measurement, sex, and age), and performs non-linear least squares regression via minpack.lm::nlsLM() to estimate L, M, and S for each group.

The estimation function could be improved to more explicitly handle the case where L = 0, as the objective function converges to a different form as L approaches 0, but setting starting values for L that include values just above and below 0 seems to work. Because we are trying to estimate 3 parameters, at least 3 points per group are needed. In this dataset, we have 7 X / Z points per group.

##### Estimate_LMS_parameters #####

estimate_LMS_parameters <- function(df, ...) {
  # takes a dataframe of columns Z and X and optional (...) grouping variables
  # returns each group's estimated LMS parameters, as well as RMSE, R-squared,
  # convergence, n, and method comment
  #
  # Requires: `tidyverse` and `minpack.lm` libraries
  
  library(tidyverse)
  library(minpack.lm)  # nlsLM() for robust nonlinear least squares

  tolerance <- 1e-6 # used for L ≈ 0 and for when assessing `best_obj`
  rmse_tolerance <- 0.01 # max RMSE for regression to be considered "converged"
  
  # Vectorized model function for optimization
  LMS_model_vec <- function(params, Z) {
    L <- params[1]
    M <- params[2] 
    S <- params[3]
    if (abs(L) < tolerance) { # When L ≈ 0, use exponential form
      return(M * exp(S * Z))
    } else { # Normal case
      return(M * (1 + L*S*Z)^(1/L))
    }
  }
  
  # Function to estimate parameters for a single group
  estimate_single_group <- function(group_data) {
    Z <- group_data$Z
    X <- group_data$X
    n <- nrow(group_data)
    
    if (n < 3) { # Need at least 3 points to estimate 3 parameters
      return(data.frame(
        L = NA, M = NA, S = NA, rmse = NA,
        r_squared = NA, converged = FALSE,
        n_obs = n, method = "insufficient_data"))
    }

    # Test discontinuity at L = 0 (normal distribution with no skew)
    #
    # tibble(
    #   measure = "test", age = 0, gender = "m",
    #   percentile = c(3, 10, 50, 90, 97)
    # ) |>
    #   mutate(
    #     Z = qnorm(percentile / 100),
    #     X = 50 * exp(1 * Z) # formula for L = 0; M = 50, S = 1
    #   ) |> 
    #   estimate_LMS_parameters(measure, age, gender)
    #
    # L_starts <- c(-5, 5)              # fails to converge
    # L_starts <- c(-5, -0.01, 0.01, 5) # converges to L = 1.165e-06
    
    # Sets of starting values
    M_starts <- c(mean(X), median(X)) # estimating the median
    L_starts <- c(-5, -0.01, 0.01, 5) # handle potential L = 0
    S_starts <- c(0.01, 1)            # always positive
    
    best_params <- NULL
    best_obj <- Inf
    best_method <- "failed"
    
    # Try different starting combinations
    for (M_start in M_starts) {
      for (L_start in L_starts) {
        for (S_start in S_starts) {
          tryCatch({ # Using minpack.lm::nlsLM() robust nonlinear least squares
            # objective function NOT normalized to # of points
            # RMSE = sqrt(obj_va / n)
            if (is.null(best_params) || best_obj > tolerance) {
              fit <- nlsLM(
                # X ~ ifelse( # this didn't work
                #   abs(L) < tolerance,
                #   M * exp(S * Z),
                #   M * (1 + L * S * Z)^(1/L)
                # ),
                X ~ M * (1 + L * S * Z)^(1/L), # will fail if L is exactly 0
                data = data.frame(X = X, Z = Z),
                start = list(L = L_start, M = M_start, S = S_start),
                control = nls.lm.control(maxiter = 1000))
              params <- coef(fit)
              obj_val <- sum(residuals(fit)^2)
              if (obj_val < best_obj) {
                best_obj <- obj_val
                best_params <- params
                best_method <- "nlsLM"
              }
            }
          }, error = function(e) {
            # Unsuccessful fitting --> continue
          })
        }
      }
    }
    
    # Check if found a solution
    if (is.null(best_params)) {
      return(data.frame(
        L = NA, M = NA, S = NA,
        rmse = NA, r_squared = NA, converged = FALSE,
        n_obs = n, method = "all_failed"))
    }
    
    # Calculate fit statistics
    predicted <- LMS_model_vec(best_params, Z)
    residuals <- X - predicted
    rmse <- sqrt(mean(residuals^2))
    
    # R-squared
    tss <- sum((X - mean(X))^2)
    rss <- sum(residuals^2)
    r_squared <- 1 - rss/tss
    
    # Check convergence quality
    converged <- (rmse < rmse_tolerance) && # OK criteria
      all(is.finite(best_params)) &&        # pretty unrestrictive criteria
      (r_squared > 0)                       # pretty unrestrictive criteria
    
    return(data.frame(
      L = best_params[1],
      M = best_params[2], 
      S = best_params[3],
      rmse = rmse,
      r_squared = r_squared,
      converged = converged,
      n_obs = n,
      method = best_method
    ))
  }
  
  # Apply to each group
  grouping_syms <- rlang::ensyms(...)
  results <- df
  if (length(grouping_syms) > 0) {
    results <- results |> group_by(!!!grouping_syms)
  }
  results <- results |> 
    group_modify(~ estimate_single_group(.x)) %>%
    ungroup()
  return(results)
}

(Also included is a vectorized helper function from the peditools package to calculate X from L, M, S, and Z, which could also probably be improved by not using strict equality of L with 0.)

Show the code

##### z_lms_to_x #####
# From `peditools`
z_lms_to_x <- function (Z, L, M, S) {
  expand.arguments <- function(...) {
    dotList <- list(...)
    max.length <- max(sapply(dotList, length))
    lapply(dotList, rep, length = max.length)
  }
  temp <- expand.arguments(Z, L, M, S)
  Z = temp[[1]]
  L = temp[[2]]
  M = temp[[3]]
  S = temp[[4]]
  ifelse(L != 0, M * (1 + L * S * Z)^(1/L), M * exp(S * Z))
}

Use the training data to estimate LMS parameters

The full dataframe df_train includes 4 anthropometric measures, 2 genders, and 100 ages, so LMS parameters will be estimated from 800 separate groups of seven X / Z pairs (one for each of the 7 percentiles (3 / 10 / 25 / 50 / 75 / 90 / 97).

Show the code

df_est <- estimate_LMS_parameters(df_train, measure, gender, age) # group by measure, gender, and age

df_est |> head() |> knitr::kable()

measure	gender	age	L	M	S	rmse	r_squared	converged	n_obs	method
bmi	f	0.0	0.0204592	14.14973	0.0923936	0.0001710	1	TRUE	7	nlsLM
bmi	f	0.2	0.0124989	14.33114	0.0928675	0.0001556	1	TRUE	7	nlsLM
bmi	f	0.4	0.0053028	14.50763	0.0933272	0.0002511	1	TRUE	7	nlsLM
bmi	f	0.6	0.0007589	14.67681	0.0937863	0.0002292	1	TRUE	7	nlsLM
bmi	f	0.8	-0.0076834	14.83683	0.0942656	0.0001313	1	TRUE	7	nlsLM
bmi	f	1.0	-0.0158573	14.98664	0.0947315	0.0002322	1	TRUE	7	nlsLM

The LMS estimation results include some per-group statistics (like RMSE, R-squared, and number of observations). The RMSE values are quite good but not perfect, as they should be if the percentile values were all calculated correctly from source LMS values.This imperfect convergence reflects problems with the source data.

Show the code

summary(df_est |> mutate(measure = as.factor(measure), gender = as.factor(gender), method = as.factor(method)))

            measure    gender       age              L          
 bmi            :200   f:400   Min.   : 0.00   Min.   :-3.8136  
 height         :200   m:400   1st Qu.: 4.95   1st Qu.:-1.1473  
 height_velocity:200           Median : 9.90   Median :-0.1807  
 weight         :200           Mean   : 9.90   Mean   :-0.3799  
                               3rd Qu.:14.85   3rd Qu.: 0.6690  
                               Max.   :19.80   Max.   : 2.6379  
       M                  S                rmse             r_squared     
 Min.   :  0.8091   Min.   :0.04280   Min.   :3.656e-05   Min.   :0.9996  
 1st Qu.: 11.2486   1st Qu.:0.09131   1st Qu.:1.768e-04   1st Qu.:1.0000  
 Median : 17.4175   Median :0.14562   Median :2.264e-04   Median :1.0000  
 Mean   : 42.7057   Mean   :0.17739   Mean   :4.009e-03   Mean   :1.0000  
 3rd Qu.: 54.2090   3rd Qu.:0.22781   3rd Qu.:2.879e-04   3rd Qu.:1.0000  
 Max.   :166.2587   Max.   :0.48378   Max.   :1.325e-01   Max.   :1.0000  
 converged           n_obs     method   
 Mode :logical   Min.   :7   nlsLM:800  
 FALSE:53        1st Qu.:7              
 TRUE :747       Median :7              
                 Mean   :7              
                 3rd Qu.:7              
                 Max.   :7

Compare reported X values with those calculated from re-estimated LMS values

The newly estimated LMS values were used to recalculate the X values. (In the technical brief, the problems exhibited by the published LMS values were also shown.)

Show the code

# Functionally assess estimatedLMS values
df_assess <- df_est |>
  transmute(measure, gender, age, estL = L, estM = M, estS = S) |>
  left_join(
    df_orig |> select(measure, gender, age, L, M, S, everything()),
    by = c("measure", "gender", "age")
  ) |>
  mutate(
    deltaL = estL - L, deltaM = estM - M, deltaS = estS - S
  ) |> 
  pivot_longer(p_3:p_97, names_to = "percentile", names_prefix = "p_", values_to = "X") |>
  mutate(
    # Calculate X from original and re-estimated LMS values
    percentile = as.numeric(percentile),
    calcX = z_lms_to_x(qnorm(percentile / 100.0), L, M, S), # using the originally published LMS values
    estX  = z_lms_to_x(qnorm(percentile / 100.0), estL, estM, estS)
  ) |> 
  select(
    chart, age, age_units, gender, measure, measure_units, L, M, S, estL, estM, estS, deltaL, deltaM, deltaS, percentile, X, calcX, estX
  )

Overlaying the reported X values with those calculated from the re-estimated LMS values show near-perfect alignment. (Each panel is reproduced in larger size below, to better visualize the minor imperfections present from the flawed original percentile measurement data.)

Show the code

# Plot X, calcX, and estX
g <- df_assess |>
  mutate(percentile = factor(percentile, ordered = TRUE)) |>
  ggplot(aes(age, X, group = percentile)) +
  geom_line(size = 0.2) +
  # geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
  geom_point(aes(age, estX), color = "blue", size = 0.3, alpha = 0.2) +
  facet_grid2(gender ~ measure, scales = "free_y", independent = "y") +
  theme_bw() +
  labs(
    x = "Age (years)",
    y = "X measurement"
  )
g

So, that’s how to regenerate LMS parameters from tables of percentile values. Theoretically, if the table of percentiles were correctly generated directly from LMS values, 3 values per group would be sufficient to perfectly recreate the LMS parameters. In that case, it’d probably best to use 10 / 50 / 90 percentile values, to avoid the decreased precision at more extreme Z scores.

Individual plots of reported vs calculated X values from re-estimated LMS

Show the code

for (m in c("bmi", "height", "height_velocity", "weight")) {
  {
  # for (g in c("f", "m")) {
    g <- df_assess |> 
      # filter(measure == m, gender == g) |> 
      filter(measure == m) |> 
      mutate(percentile = factor(percentile, ordered = TRUE)) |>
      ggplot(aes(age, X, group = percentile)) +
      geom_line() +
      # geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
      geom_point(aes(age, estX), color = "blue", size = 0.4, alpha = 0.5) +
      theme_bw() +
      labs(
        x = "Age (years)",
        y = "X measurement"
      ) +
      labs(
        # title = paste0(m, " - ", g)
        title = paste0(m)
      ) +
      facet_wrap(~gender)
    print(g)
  }
}

References

Cole TJ. The LMS method for constructing normalized growth standards. European Journal of Clinical Nutrition. 1990;44(1):45-60.

Cappa M, d’Aniello F, Digilio MC, et al. Noonan Syndrome Growth Charts and Genotypes: 15-Year Longitudinal Single-Centre Study. Hormone Research in Paediatrics. Published online July 22, 2024:1-13. doi:10.1159/000540092

Chou JH. Correction of LMS Parameter Errors in Published Noonan Syndrome Growth Charts. MedRxiv. Published online July 2024. doi:10.1101/2025.07.29.25332354