Recovering growth chart LMS parameters

R code to use growth chart tables of percentile values to recreate LMS parameterization – with perfect data, only 3 values needed per set of parameters.
Published

July 30, 2025

This post provides R code to estimate growth chart LMS parameters given a table of measurements and percentile values (or Z-scores). The initial rationale was to explore problems with LMS parameterization in a publication describing growth charts for Noonan syndrome, with the findings uploaded to MedRxiv.

Load packages

Show the code
library(tidyverse)
library(knitr)
library(minpack.lm)  # nlsLM() for robust nonlinear least squares
library(ggh4x)       # facet_grid2(), for multiple panels with free scales

Load data

The source data was extracted from the supplemental materials from using Tabula and collected to a single datagrame.

Show the code
##### Load original published data tables #####
# `cappa_full.csv` contains: chart, age / units, gender, measure / units, M, S, L, 3 / 10 / 25 / 50 / 75 / 90 / 97 percentiles

df_orig <- read.csv("cappa_full.csv") |> as_tibble()

df_orig |> select(measure, gender, age, M, S, L, p_3:p_97) |> head() |> knitr::kable()
measure gender age M S L p_3 p_10 p_25 p_50 p_75 p_90 p_97
height_velocity f 0.0 19.301 0.262 0.780 10.353 13.069 15.955 19.301 22.780 26.015 29.298
height_velocity f 0.2 18.601 0.264 0.781 9.923 12.557 15.355 18.601 21.976 25.114 28.298
height_velocity f 0.4 17.903 0.266 0.782 9.497 12.048 14.759 17.903 21.172 24.212 27.297
height_velocity f 0.6 17.207 0.267 0.784 9.077 11.544 14.166 17.207 20.370 23.311 26.296
height_velocity f 0.8 16.515 0.269 0.785 8.663 11.045 13.577 16.515 19.570 22.412 25.295
height_velocity f 1.0 15.827 0.271 0.786 8.254 10.551 12.994 15.828 18.775 21.516 24.297

There were problems with the LMS parameters, so the goal was to re-estimate them from the percentile values (3 / 10 / 25 / 50 / 75 / 90 / 97). The LMS parameterization relates the measurement value X with the parameters L, M, S, and Z as follows:

X=M(1+LSZ)1/L,L0

or

X=MeSZ,L=0

Prepare training data

Show the code
df <- df_orig |> # wide to long format with published X at percentiles
  pivot_longer(
    p_3:p_97,
    names_to = "percentile",
    names_prefix = "p_",
    values_to = "X",
    names_transform = as.numeric
  ) |> 
  mutate(Z = qnorm(percentile / 100))

df_train <- df |>
  select(measure, gender, age, X, Z)

df_train |> head(n = 14) |> knitr::kable()
measure gender age X Z
height_velocity f 0.0 10.353 -1.8807936
height_velocity f 0.0 13.069 -1.2815516
height_velocity f 0.0 15.955 -0.6744898
height_velocity f 0.0 19.301 0.0000000
height_velocity f 0.0 22.780 0.6744898
height_velocity f 0.0 26.015 1.2815516
height_velocity f 0.0 29.298 1.8807936
height_velocity f 0.2 9.923 -1.8807936
height_velocity f 0.2 12.557 -1.2815516
height_velocity f 0.2 15.355 -0.6744898
height_velocity f 0.2 18.601 0.0000000
height_velocity f 0.2 21.976 0.6744898
height_velocity f 0.2 25.114 1.2815516
height_velocity f 0.2 28.298 1.8807936

Functions

The meat of the code is the estimate_LMS_parameters() function which takes a dataframe including columns of X and Z, optionally groups (for example, by anthropometric measurement, sex, and age), and performs non-linear least squares regression via minpack.lm::nlsLM() to estimate L, M, and S for each group.

The estimation function could be improved to more explicitly handle the case where L = 0, as the objective function converges to a different form as L approaches 0, but setting starting values for L that include values just above and below 0 seems to work. Because we are trying to estimate 3 parameters, at least 3 points per group are needed. In this dataset, we have 7 X / Z points per group.

##### Estimate_LMS_parameters #####

estimate_LMS_parameters <- function(df, ...) {
  # takes a dataframe of columns Z and X and optional (...) grouping variables
  # returns each group's estimated LMS parameters, as well as RMSE, R-squared,
  # convergence, n, and method comment
  #
  # Requires: `tidyverse` and `minpack.lm` libraries
  
  library(tidyverse)
  library(minpack.lm)  # nlsLM() for robust nonlinear least squares

  tolerance <- 1e-6 # used for L ≈ 0 and for when assessing `best_obj`
  rmse_tolerance <- 0.01 # max RMSE for regression to be considered "converged"
  
  # Vectorized model function for optimization
  LMS_model_vec <- function(params, Z) {
    L <- params[1]
    M <- params[2] 
    S <- params[3]
    if (abs(L) < tolerance) { # When L ≈ 0, use exponential form
      return(M * exp(S * Z))
    } else { # Normal case
      return(M * (1 + L*S*Z)^(1/L))
    }
  }
  
  # Function to estimate parameters for a single group
  estimate_single_group <- function(group_data) {
    Z <- group_data$Z
    X <- group_data$X
    n <- nrow(group_data)
    
    if (n < 3) { # Need at least 3 points to estimate 3 parameters
      return(data.frame(
        L = NA, M = NA, S = NA, rmse = NA,
        r_squared = NA, converged = FALSE,
        n_obs = n, method = "insufficient_data"))
    }

    # Test discontinuity at L = 0 (normal distribution with no skew)
    #
    # tibble(
    #   measure = "test", age = 0, gender = "m",
    #   percentile = c(3, 10, 50, 90, 97)
    # ) |>
    #   mutate(
    #     Z = qnorm(percentile / 100),
    #     X = 50 * exp(1 * Z) # formula for L = 0; M = 50, S = 1
    #   ) |> 
    #   estimate_LMS_parameters(measure, age, gender)
    #
    # L_starts <- c(-5, 5)              # fails to converge
    # L_starts <- c(-5, -0.01, 0.01, 5) # converges to L = 1.165e-06
    
    # Sets of starting values
    M_starts <- c(mean(X), median(X)) # estimating the median
    L_starts <- c(-5, -0.01, 0.01, 5) # handle potential L = 0
    S_starts <- c(0.01, 1)            # always positive
    
    best_params <- NULL
    best_obj <- Inf
    best_method <- "failed"
    
    # Try different starting combinations
    for (M_start in M_starts) {
      for (L_start in L_starts) {
        for (S_start in S_starts) {
          tryCatch({ # Using minpack.lm::nlsLM() robust nonlinear least squares
            # objective function NOT normalized to # of points
            # RMSE = sqrt(obj_va / n)
            if (is.null(best_params) || best_obj > tolerance) {
              fit <- nlsLM(
                # X ~ ifelse( # this didn't work
                #   abs(L) < tolerance,
                #   M * exp(S * Z),
                #   M * (1 + L * S * Z)^(1/L)
                # ),
                X ~ M * (1 + L * S * Z)^(1/L), # will fail if L is exactly 0
                data = data.frame(X = X, Z = Z),
                start = list(L = L_start, M = M_start, S = S_start),
                control = nls.lm.control(maxiter = 1000))
              params <- coef(fit)
              obj_val <- sum(residuals(fit)^2)
              if (obj_val < best_obj) {
                best_obj <- obj_val
                best_params <- params
                best_method <- "nlsLM"
              }
            }
          }, error = function(e) {
            # Unsuccessful fitting --> continue
          })
        }
      }
    }
    
    # Check if found a solution
    if (is.null(best_params)) {
      return(data.frame(
        L = NA, M = NA, S = NA,
        rmse = NA, r_squared = NA, converged = FALSE,
        n_obs = n, method = "all_failed"))
    }
    
    # Calculate fit statistics
    predicted <- LMS_model_vec(best_params, Z)
    residuals <- X - predicted
    rmse <- sqrt(mean(residuals^2))
    
    # R-squared
    tss <- sum((X - mean(X))^2)
    rss <- sum(residuals^2)
    r_squared <- 1 - rss/tss
    
    # Check convergence quality
    converged <- (rmse < rmse_tolerance) && # OK criteria
      all(is.finite(best_params)) &&        # pretty unrestrictive criteria
      (r_squared > 0)                       # pretty unrestrictive criteria
    
    return(data.frame(
      L = best_params[1],
      M = best_params[2], 
      S = best_params[3],
      rmse = rmse,
      r_squared = r_squared,
      converged = converged,
      n_obs = n,
      method = best_method
    ))
  }
  
  # Apply to each group
  grouping_syms <- rlang::ensyms(...)
  results <- df
  if (length(grouping_syms) > 0) {
    results <- results |> group_by(!!!grouping_syms)
  }
  results <- results |> 
    group_modify(~ estimate_single_group(.x)) %>%
    ungroup()
  return(results)
}

(Also included is a vectorized helper function from the peditools package to calculate X from L, M, S, and Z, which could also probably be improved by not using strict equality of L with 0.)

Show the code
##### z_lms_to_x #####
# From `peditools`
z_lms_to_x <- function (Z, L, M, S) {
  expand.arguments <- function(...) {
    dotList <- list(...)
    max.length <- max(sapply(dotList, length))
    lapply(dotList, rep, length = max.length)
  }
  temp <- expand.arguments(Z, L, M, S)
  Z = temp[[1]]
  L = temp[[2]]
  M = temp[[3]]
  S = temp[[4]]
  ifelse(L != 0, M * (1 + L * S * Z)^(1/L), M * exp(S * Z))
}

Use the training data to estimate LMS parameters

The full dataframe df_train includes 4 anthropometric measures, 2 genders, and 100 ages, so LMS parameters will be estimated from 800 separate groups of seven X / Z pairs (one for each of the 7 percentiles (3 / 10 / 25 / 50 / 75 / 90 / 97).

Show the code
df_est <- estimate_LMS_parameters(df_train, measure, gender, age) # group by measure, gender, and age

df_est |> head() |> knitr::kable()
measure gender age L M S rmse r_squared converged n_obs method
bmi f 0.0 0.0204592 14.14973 0.0923936 0.0001710 1 TRUE 7 nlsLM
bmi f 0.2 0.0124989 14.33114 0.0928675 0.0001556 1 TRUE 7 nlsLM
bmi f 0.4 0.0053028 14.50763 0.0933272 0.0002511 1 TRUE 7 nlsLM
bmi f 0.6 0.0007589 14.67681 0.0937863 0.0002292 1 TRUE 7 nlsLM
bmi f 0.8 -0.0076834 14.83683 0.0942656 0.0001313 1 TRUE 7 nlsLM
bmi f 1.0 -0.0158573 14.98664 0.0947315 0.0002322 1 TRUE 7 nlsLM

The LMS estimation results include some per-group statistics (like RMSE, R-squared, and number of observations). The RMSE values are quite good but not perfect, as they should be if the percentile values were all calculated correctly from source LMS values.This imperfect convergence reflects problems with the source data.

Show the code
summary(df_est |> mutate(measure = as.factor(measure), gender = as.factor(gender), method = as.factor(method)))
            measure    gender       age              L          
 bmi            :200   f:400   Min.   : 0.00   Min.   :-3.8136  
 height         :200   m:400   1st Qu.: 4.95   1st Qu.:-1.1473  
 height_velocity:200           Median : 9.90   Median :-0.1807  
 weight         :200           Mean   : 9.90   Mean   :-0.3799  
                               3rd Qu.:14.85   3rd Qu.: 0.6690  
                               Max.   :19.80   Max.   : 2.6379  
       M                  S                rmse             r_squared     
 Min.   :  0.8091   Min.   :0.04280   Min.   :3.656e-05   Min.   :0.9996  
 1st Qu.: 11.2486   1st Qu.:0.09131   1st Qu.:1.768e-04   1st Qu.:1.0000  
 Median : 17.4175   Median :0.14562   Median :2.264e-04   Median :1.0000  
 Mean   : 42.7057   Mean   :0.17739   Mean   :4.009e-03   Mean   :1.0000  
 3rd Qu.: 54.2090   3rd Qu.:0.22781   3rd Qu.:2.879e-04   3rd Qu.:1.0000  
 Max.   :166.2587   Max.   :0.48378   Max.   :1.325e-01   Max.   :1.0000  
 converged           n_obs     method   
 Mode :logical   Min.   :7   nlsLM:800  
 FALSE:53        1st Qu.:7              
 TRUE :747       Median :7              
                 Mean   :7              
                 3rd Qu.:7              
                 Max.   :7              

Compare reported X values with those calculated from re-estimated LMS values

The newly estimated LMS values were used to recalculate the X values. (In the technical brief, the problems exhibited by the published LMS values were also shown.)

Show the code
# Functionally assess estimatedLMS values
df_assess <- df_est |>
  transmute(measure, gender, age, estL = L, estM = M, estS = S) |>
  left_join(
    df_orig |> select(measure, gender, age, L, M, S, everything()),
    by = c("measure", "gender", "age")
  ) |>
  mutate(
    deltaL = estL - L, deltaM = estM - M, deltaS = estS - S
  ) |> 
  pivot_longer(p_3:p_97, names_to = "percentile", names_prefix = "p_", values_to = "X") |>
  mutate(
    # Calculate X from original and re-estimated LMS values
    percentile = as.numeric(percentile),
    calcX = z_lms_to_x(qnorm(percentile / 100.0), L, M, S), # using the originally published LMS values
    estX  = z_lms_to_x(qnorm(percentile / 100.0), estL, estM, estS)
  ) |> 
  select(
    chart, age, age_units, gender, measure, measure_units, L, M, S, estL, estM, estS, deltaL, deltaM, deltaS, percentile, X, calcX, estX
  )

Overlaying the reported X values with those calculated from the re-estimated LMS values show near-perfect alignment. (Each panel is reproduced in larger size below, to better visualize the minor imperfections present from the flawed original percentile measurement data.)

Show the code
# Plot X, calcX, and estX
g <- df_assess |>
  mutate(percentile = factor(percentile, ordered = TRUE)) |>
  ggplot(aes(age, X, group = percentile)) +
  geom_line(size = 0.2) +
  # geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
  geom_point(aes(age, estX), color = "blue", size = 0.3, alpha = 0.2) +
  facet_grid2(gender ~ measure, scales = "free_y", independent = "y") +
  theme_bw() +
  labs(
    x = "Age (years)",
    y = "X measurement"
  )
g

So, that’s how to regenerate LMS parameters from tables of percentile values. Theoretically, if the table of percentiles were correctly generated directly from LMS values, 3 values per group would be sufficient to perfectly recreate the LMS parameters. In that case, it’d probably best to use 10 / 50 / 90 percentile values, to avoid the decreased precision at more extreme Z scores.

Individual plots of reported vs calculated X values from re-estimated LMS

Show the code
for (m in c("bmi", "height", "height_velocity", "weight")) {
  {
  # for (g in c("f", "m")) {
    g <- df_assess |> 
      # filter(measure == m, gender == g) |> 
      filter(measure == m) |> 
      mutate(percentile = factor(percentile, ordered = TRUE)) |>
      ggplot(aes(age, X, group = percentile)) +
      geom_line() +
      # geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
      geom_point(aes(age, estX), color = "blue", size = 0.4, alpha = 0.5) +
      theme_bw() +
      labs(
        x = "Age (years)",
        y = "X measurement"
      ) +
      labs(
        # title = paste0(m, " - ", g)
        title = paste0(m)
      ) +
      facet_wrap(~gender)
    print(g)
  }
}

References

1.
Cole TJ. The LMS method for constructing normalized growth standards. European Journal of Clinical Nutrition. 1990;44(1):45-60.
2.
Cappa M, d’Aniello F, Digilio MC, et al. Noonan Syndrome Growth Charts and Genotypes: 15-Year Longitudinal Single-Centre Study. Hormone Research in Paediatrics. Published online July 22, 2024:1-13. doi:10.1159/000540092
3.
Chou JH. Correction of LMS Parameter Errors in Published Noonan Syndrome Growth Charts. MedRxiv. Published online July 2024. doi:10.1101/2025.07.29.25332354