WHO growth standards, expanded

WHO Growth Standard expanded chart parameters – mildly automated data scraping and metadata parsing.
R
Clinical
PediTools
Data visualization
Published

December 4, 2024

I was notified recently about the availability of “expanded” tables of parameters for the WHO Growth Standards that are much more fine-grained than the ones that I’ve used in the past. The resolution is at the daily rather than monthly level.

The primary purpose of the expanded tables seems to be less for clinical use and more for generating very high resolution growth chart plots for “national health cards”.

Countries that need to incorporate the WHO charts into existing national health cards have to locally reproduce the charts to fit the format of the health cards. These expanded tables are required to reproduce growth charts with adequate precision. They provide: a) the centiles with 3 decimal precision; b) age in days for age-based indicators; c) additional centiles.

Although I thought it unlikely to be of significance for clinical (or even research) use, once I know there’s a dataset out there, it’s really hard not to play with it.

However, there were a total of 18 additional charts and copying and pasting the chart parameters out of that many source documents sounded really tedious. So, what could have been maybe an hour of copy-pasting became several hours of coding. Such is the life of a coder.

This post describes the partial automation of data-scraping of the expanded WHO Growth Standard parameters.

Source Files

The WHO makes available the parameter tables in Excel format, at the bottom of each growth standard’s web page, as “Expanded tables for constructing national health cards.” I manually downloaded each of the 18 files and put them in a single standards directory.

filenames <- list.files("standards")
filenames
 [1] "acfa-boys-zscore-expanded-tables.xlsx" 
 [2] "acfa-girls-zscore-expanded-tables.xlsx"
 [3] "bfa-boys-zscore-expanded-tables.xlsx"  
 [4] "bfa-girls-zscore-expanded-tables.xlsx" 
 [5] "hcfa-boys-zscore-expanded-tables.xlsx" 
 [6] "hcfa-girls-zscore-expanded-tables.xlsx"
 [7] "lhfa-boys-zscore-expanded-tables.xlsx" 
 [8] "lhfa-girls-zscore-expanded-tables.xlsx"
 [9] "ssfa-boys-zscore-expanded-table.xlsx"  
[10] "ssfa-girls-zscore-expanded-table.xlsx" 
[11] "tsfa-boys-zscore-expanded-tables.xlsx" 
[12] "tsfa-girls-zscore-expanded-tables.xlsx"
[13] "wfa-boys-zscore-expanded-tables.xlsx"  
[14] "wfa-girls-zscore-expanded-tables.xlsx" 
[15] "wfh-boys-zscore-expanded-tables.xlsx"  
[16] "wfh-girls-zscore-expanded-tables.xlsx" 
[17] "wfl-boys-zscore-expanded-table.xlsx"   
[18] "wfl-girls-zscore-expanded-table.xlsx"  

What’s pretty cool is that there is pretty clear naming convention and each Excel file has the same overall structure. In each file:

  • first column represents the continuous variable on the X-axis (age in days, except for the Weight for Length/Height charts, which are in cm)
  • second through fourth columns are the L, M, and S parameters, to allow calculation of Z-scores via the Cole LMS method
  • (the other columns are redundant – measurement values for a few specified Z-scores or percentiles, all of which can be derived by the Cole method)

Converting tables to dataframes with some metadata

The filenames have a consistent structure as well, with the first two parts identifying the specific chart type (“wfa” is “weight for age”) and the second part identifies the sex.

We can split the filenames and use a simple dictionary convert the chart types into the same naming convention I already use in my R peditools package.

Next, iterate over each of the Excel files and convert them to R dataframes with columns compatible with pre-existing peditools columns.

lmsdata_list <- list() # list to store extracted LMS dataframes
for (i in 1:length(filenames)) {
  df <- read_excel(paste0("standards/", filenames[i])) %>% select(1:4) # age, L, M, S
  gender <- ifelse(parts[[i]][2] == "boys", "m", "f") # convert "boys" / "girls" to "m" / "f"
  chart_code <- parts[[i]][1] # e.g., acfa, bfa, hcfa, lhfa, ssfa, tsfa, wfa, wfh, wfl
  temp <- strsplit(parts[[i]][1], "f")
  measure <- measures[[temp[[1]][1]]] # before 'f' in chart_code
  age_type <- measures[[temp[[1]][2]]] # after 'f' in chart_code
  
  # Group charts together. Each group can have multiple measures
  # - HCFA, LHFA, WFA, (and BFA) all have age (x-axis) range from 0 to 1856 days
  # - ACFA, SFA, and TFA all have age (x-axis) range from 91 to 1856 days
  # - WFH (x-axis range 65-120 cm) and WFL (x-axis range 45-110 cm) are the ONLY non-age-based charts; will separate given different ranges
  chart = case_when( # group similar charts, which will have distinct "measures"
        chart_code %in% c("wfa", "hcfa", "lhfa", "bfa") ~ "who_expanded",
        chart_code %in% c("ssfa", "tsfa", "acfa") ~ "who_expanded_arm_skin",
        chart_code %in% c("wfl") ~ "who_wt_for_len",
        chart_code %in% c("wfh") ~ "who_wt_for_ht",
        .default = "ERROR"
  )

  age_units <- tolower(names(df)[1])
  age_units = case_when(
    age_units == "day" ~ "days",
    age_units %in% c("height", "length") ~ "cm"
  )
  names(df)[1] <- "age"
  measure_units = case_when(
        measure %in% c("arm_circ", "head_circ", "length_height") ~ "cm",
        measure %in% c("triceps", "subscapular") ~ "mm",
        measure == "bmi" ~ "kg/m2",
        measure == "weight" ~ "kg",
        .default = "ERROR"
  )

  df <- df %>% 
    transmute(
      chart,
      age,
      age_units,
      gender,
      measure,
      measure_units,
      L, M, S
    )
  lmsdata_list[[i]] <- df
  print(paste0("Processed ", measure, " (", measure_units, ") for ", age_type, " (", age_units, ") - ", gender, " into ", chart))
}
[1] "Processed arm_circ (cm) for age (days) - m into who_expanded_arm_skin"
[1] "Processed arm_circ (cm) for age (days) - f into who_expanded_arm_skin"
[1] "Processed bmi (kg/m2) for age (days) - m into who_expanded"
[1] "Processed bmi (kg/m2) for age (days) - f into who_expanded"
[1] "Processed head_circ (cm) for age (days) - m into who_expanded"
[1] "Processed head_circ (cm) for age (days) - f into who_expanded"
[1] "Processed length_height (cm) for age (days) - m into who_expanded"
[1] "Processed length_height (cm) for age (days) - f into who_expanded"
[1] "Processed subscapular (mm) for age (days) - m into who_expanded_arm_skin"
[1] "Processed subscapular (mm) for age (days) - f into who_expanded_arm_skin"
[1] "Processed triceps (mm) for age (days) - m into who_expanded_arm_skin"
[1] "Processed triceps (mm) for age (days) - f into who_expanded_arm_skin"
[1] "Processed weight (kg) for age (days) - m into who_expanded"
[1] "Processed weight (kg) for age (days) - f into who_expanded"
[1] "Processed weight (kg) for height (cm) - m into who_wt_for_ht"
[1] "Processed weight (kg) for height (cm) - f into who_wt_for_ht"
[1] "Processed weight (kg) for length (cm) - m into who_wt_for_len"
[1] "Processed weight (kg) for length (cm) - f into who_wt_for_len"

Checking the import

Finally, combine them all into a single dataframe and check it.

lmsdata_new <- bind_rows(lmsdata_list) # combine all into a single dataframe
lmsdata_new %>% count(chart, measure, measure_units, age_units) %>% knitr::kable()
chart measure measure_units age_units n
who_expanded bmi kg/m2 days 3714
who_expanded head_circ cm days 3714
who_expanded length_height cm days 3714
who_expanded weight kg days 3714
who_expanded_arm_skin arm_circ cm days 3532
who_expanded_arm_skin subscapular mm days 3532
who_expanded_arm_skin triceps mm days 3532
who_wt_for_ht weight kg cm 1102
who_wt_for_len weight kg cm 1302

Another quick sanity check is to plot all the charts out, one panel per chart group, one color per measure, plotting both genders. This figure is generated with plotly so it’s possible to interact with each panel.

library(plotly)

g <- lmsdata_new %>%
  arrange(age) %>% 
  ggplot(aes(age, M, color = measure, group = gender)) +
  geom_line() +
  facet_wrap(~chart, scales = "free") +
  theme_bw()

ggplotly(g)