I was notified recently about the availability of “expanded” tables of parameters for the WHO Growth Standards that are much more fine-grained than the ones that I’ve used in the past. The resolution is at the daily rather than monthly level.
The primary purpose of the expanded tables seems to be less for clinical use and more for generating very high resolution growth chart plots for “national health cards”.
Countries that need to incorporate the WHO charts into existing national health cards have to locally reproduce the charts to fit the format of the health cards. These expanded tables are required to reproduce growth charts with adequate precision. They provide: a) the centiles with 3 decimal precision; b) age in days for age-based indicators; c) additional centiles.
Although I thought it unlikely to be of significance for clinical (or even research) use, once I know there’s a dataset out there, it’s really hard not to play with it.
However, there were a total of 18 additional charts and copying and pasting the chart parameters out of that many source documents sounded really tedious. So, what could have been maybe an hour of copy-pasting became several hours of coding. Such is the life of a coder.
This post describes the partial automation of data-scraping of the expanded WHO Growth Standard parameters.
Source Files
The WHO makes available the parameter tables in Excel format, at the bottom of each growth standard’s web page, as “Expanded tables for constructing national health cards.” I manually downloaded each of the 18 files and put them in a single standards
directory.
<- list.files("standards")
filenames filenames
[1] "acfa-boys-zscore-expanded-tables.xlsx"
[2] "acfa-girls-zscore-expanded-tables.xlsx"
[3] "bfa-boys-zscore-expanded-tables.xlsx"
[4] "bfa-girls-zscore-expanded-tables.xlsx"
[5] "hcfa-boys-zscore-expanded-tables.xlsx"
[6] "hcfa-girls-zscore-expanded-tables.xlsx"
[7] "lhfa-boys-zscore-expanded-tables.xlsx"
[8] "lhfa-girls-zscore-expanded-tables.xlsx"
[9] "ssfa-boys-zscore-expanded-table.xlsx"
[10] "ssfa-girls-zscore-expanded-table.xlsx"
[11] "tsfa-boys-zscore-expanded-tables.xlsx"
[12] "tsfa-girls-zscore-expanded-tables.xlsx"
[13] "wfa-boys-zscore-expanded-tables.xlsx"
[14] "wfa-girls-zscore-expanded-tables.xlsx"
[15] "wfh-boys-zscore-expanded-tables.xlsx"
[16] "wfh-girls-zscore-expanded-tables.xlsx"
[17] "wfl-boys-zscore-expanded-table.xlsx"
[18] "wfl-girls-zscore-expanded-table.xlsx"
What’s pretty cool is that there is pretty clear naming convention and each Excel file has the same overall structure. In each file:
- first column represents the continuous variable on the X-axis (age in days, except for the Weight for Length/Height charts, which are in cm)
- second through fourth columns are the L, M, and S parameters, to allow calculation of Z-scores via the Cole LMS method
- (the other columns are redundant – measurement values for a few specified Z-scores or percentiles, all of which can be derived by the Cole method)
Converting tables to dataframes with some metadata
The filenames have a consistent structure as well, with the first two parts identifying the specific chart type (“wfa” is “weight for age”) and the second part identifies the sex.
We can split the filenames and use a simple dictionary convert the chart types into the same naming convention I already use in my R peditools
package.
Next, iterate over each of the Excel files and convert them to R dataframes with columns compatible with pre-existing peditools
columns.
<- list() # list to store extracted LMS dataframes
lmsdata_list for (i in 1:length(filenames)) {
<- read_excel(paste0("standards/", filenames[i])) %>% select(1:4) # age, L, M, S
df <- ifelse(parts[[i]][2] == "boys", "m", "f") # convert "boys" / "girls" to "m" / "f"
gender <- parts[[i]][1] # e.g., acfa, bfa, hcfa, lhfa, ssfa, tsfa, wfa, wfh, wfl
chart_code <- strsplit(parts[[i]][1], "f")
temp <- measures[[temp[[1]][1]]] # before 'f' in chart_code
measure <- measures[[temp[[1]][2]]] # after 'f' in chart_code
age_type
# Group charts together. Each group can have multiple measures
# - HCFA, LHFA, WFA, (and BFA) all have age (x-axis) range from 0 to 1856 days
# - ACFA, SFA, and TFA all have age (x-axis) range from 91 to 1856 days
# - WFH (x-axis range 65-120 cm) and WFL (x-axis range 45-110 cm) are the ONLY non-age-based charts; will separate given different ranges
= case_when( # group similar charts, which will have distinct "measures"
chart %in% c("wfa", "hcfa", "lhfa", "bfa") ~ "who_expanded",
chart_code %in% c("ssfa", "tsfa", "acfa") ~ "who_expanded_arm_skin",
chart_code %in% c("wfl") ~ "who_wt_for_len",
chart_code %in% c("wfh") ~ "who_wt_for_ht",
chart_code .default = "ERROR"
)
<- tolower(names(df)[1])
age_units = case_when(
age_units == "day" ~ "days",
age_units %in% c("height", "length") ~ "cm"
age_units
)names(df)[1] <- "age"
= case_when(
measure_units %in% c("arm_circ", "head_circ", "length_height") ~ "cm",
measure %in% c("triceps", "subscapular") ~ "mm",
measure == "bmi" ~ "kg/m2",
measure == "weight" ~ "kg",
measure .default = "ERROR"
)
<- df %>%
df transmute(
chart,
age,
age_units,
gender,
measure,
measure_units,
L, M, S
)<- df
lmsdata_list[[i]] print(paste0("Processed ", measure, " (", measure_units, ") for ", age_type, " (", age_units, ") - ", gender, " into ", chart))
}
[1] "Processed arm_circ (cm) for age (days) - m into who_expanded_arm_skin"
[1] "Processed arm_circ (cm) for age (days) - f into who_expanded_arm_skin"
[1] "Processed bmi (kg/m2) for age (days) - m into who_expanded"
[1] "Processed bmi (kg/m2) for age (days) - f into who_expanded"
[1] "Processed head_circ (cm) for age (days) - m into who_expanded"
[1] "Processed head_circ (cm) for age (days) - f into who_expanded"
[1] "Processed length_height (cm) for age (days) - m into who_expanded"
[1] "Processed length_height (cm) for age (days) - f into who_expanded"
[1] "Processed subscapular (mm) for age (days) - m into who_expanded_arm_skin"
[1] "Processed subscapular (mm) for age (days) - f into who_expanded_arm_skin"
[1] "Processed triceps (mm) for age (days) - m into who_expanded_arm_skin"
[1] "Processed triceps (mm) for age (days) - f into who_expanded_arm_skin"
[1] "Processed weight (kg) for age (days) - m into who_expanded"
[1] "Processed weight (kg) for age (days) - f into who_expanded"
[1] "Processed weight (kg) for height (cm) - m into who_wt_for_ht"
[1] "Processed weight (kg) for height (cm) - f into who_wt_for_ht"
[1] "Processed weight (kg) for length (cm) - m into who_wt_for_len"
[1] "Processed weight (kg) for length (cm) - f into who_wt_for_len"
Checking the import
Finally, combine them all into a single dataframe and check it.
<- bind_rows(lmsdata_list) # combine all into a single dataframe
lmsdata_new %>% count(chart, measure, measure_units, age_units) %>% knitr::kable() lmsdata_new
chart | measure | measure_units | age_units | n |
---|---|---|---|---|
who_expanded | bmi | kg/m2 | days | 3714 |
who_expanded | head_circ | cm | days | 3714 |
who_expanded | length_height | cm | days | 3714 |
who_expanded | weight | kg | days | 3714 |
who_expanded_arm_skin | arm_circ | cm | days | 3532 |
who_expanded_arm_skin | subscapular | mm | days | 3532 |
who_expanded_arm_skin | triceps | mm | days | 3532 |
who_wt_for_ht | weight | kg | cm | 1102 |
who_wt_for_len | weight | kg | cm | 1302 |
Another quick sanity check is to plot all the charts out, one panel per chart group, one color per measure, plotting both genders. This figure is generated with plotly
so it’s possible to interact with each panel.
library(plotly)
<- lmsdata_new %>%
g arrange(age) %>%
ggplot(aes(age, M, color = measure, group = gender)) +
geom_line() +
facet_wrap(~chart, scales = "free") +
theme_bw()
ggplotly(g)