Gestational age parsing

Helper functions for vectorized parsing of gestational age strings to numeric and back
R
Code snippet
Clinical
Published: February 4, 2025

With the goal of transitioning away from PHP-based web tools, I’ve started writing R functions to improve the user interface and experience when working with newborn gestational ages.

One goal in rewriting the original PHP functions is to make them vectorized, so that a whole column of gestational ages can be parsed in one call for bulk analysis.

Although it would have been possible to write it all in base R, regular expressions on strings are just a pain without stringr from the tidyverse.
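
For comparison, here is a rough sketch of what pulling the first run of digits out of each string might look like in base R alone. The function name `extract_first_int_base` is made up for illustration and is not one of the functions in this post; the point is only that non-matches have to be handled by hand.

```r
# Hypothetical base R counterpart to as.integer(stringr::str_extract(x, "\\d+"))
extract_first_int_base <- function(x) {
  m <- regexpr("[0-9]+", x)        # position of the first digit run, -1 if none
  hit <- !is.na(m) & m != -1       # which elements actually matched
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)     # regmatches() returns only the matched elements
  as.integer(out)
}

extract_first_int_base(c("40 1/7", "Unknown", "24w3d"))
```

With stringr, `str_extract()` returns `NA` for non-matches by itself, which removes the bookkeeping in the middle of this function.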

The following functions are fairly self-explanatory:

library(dplyr)
library(stringr)
library(knitr)

extract_weeks <- function(ga_string) {
  # Extract the first consecutive string of digits within a string and return as integer
  as.integer(str_extract(ga_string, "\\d+"))
}

extract_days <- function(ga_string) {
  # Extract the first single digit that is followed by either "/7" or "d"
  # - if not present, return 0 by default
  # - probably not worth the effort of requiring that valid "weeks" occurs prior to this digit
  # - badly malformed gestational ages will be misparsed, e.g., "18/7":
  #   extract_weeks() will return 18 and extract_days() will return 8
  days <- str_extract(ga_string, "\\d(?=(/7|d))") # Extract single digit before "/7" or "d"
  days[is.na(days)] <- "0"
  as.integer(days)
}

ga_to_days <- function(ga_string) {
  # convert string gestational age to integer number of days after LMP
  as.integer(extract_weeks(ga_string)*7 + extract_days(ga_string))
}

days_to_weeks <- function(days) {
  # convert number of days after LMP to gestational age in weeks as string in ## #/7 format
  weeks <- floor(days / 7)
  days <- days %% 7
  # NA_character_ keeps the result a character vector even when every input is NA
  result <- ifelse(is.na(weeks), NA_character_, paste0(weeks, " ", days, "/7"))
  return(result)
}
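
The arithmetic behind the round trip can be sketched without any string handling. The vectors below are example values chosen for illustration; the operators work element by element, which is what makes the functions above vectorized.

```r
# Example values only: three gestational ages split into weeks and days
weeks <- c(39L, 40L, 23L)
days  <- c(0L, 1L, 2L)

total_days <- weeks * 7L + days   # one total per element
total_days %/% 7L                 # completed weeks recovered from the total
total_days %% 7L                  # leftover days recovered from the total
```

Integer division and modulo recover exactly the weeks and days that went in, as long as the days part is between 0 and 6.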

The following tests a range of possible input strings and shows, for each one, the extracted weeks and days, the total number of days, the gestational age in decimal weeks, and the reformatted output string:

input <- c(
  "39",
  "40 1/7",
  "GA 23w2/7",
  "24w3d",
  "Gestation: 35 4/7",
  "Unknown",
  "39/7 fails as 39 9/7"
)

tibble(input) |>
  mutate(
    weeks = extract_weeks(input),
    days = extract_days(input),
    ga_days = ga_to_days(input),
    ga_float = round(ga_days/7, 3),
    output = days_to_weeks(ga_days)
  ) |> 
  knitr::kable()
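| input                | weeks | days | ga_days | ga_float | output |
|:---------------------|------:|-----:|--------:|---------:|:-------|
| 39                   |    39 |    0 |     273 |   39.000 | 39 0/7 |
| 40 1/7               |    40 |    1 |     281 |   40.143 | 40 1/7 |
| GA 23w2/7            |    23 |    2 |     163 |   23.286 | 23 2/7 |
| 24w3d                |    24 |    3 |     171 |   24.429 | 24 3/7 |
| Gestation: 35 4/7    |    35 |    4 |     249 |   35.571 | 35 4/7 |
| Unknown              |    NA |    0 |      NA |       NA | NA     |
| 39/7 fails as 39 9/7 |    39 |    9 |     282 |   40.286 | 40 2/7 |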
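
The last row is parsed without any warning even though 9 is not a valid number of days. One possible way to surface such rows is a small check on the extracted parts. This is only a sketch: `is_valid_ga_parts` is a hypothetical helper that is not part of the functions above, and it assumes the only rule worth enforcing is that days fall between 0 and 6.

```r
# Hypothetical helper: TRUE only when weeks is known and days is in 0..6
is_valid_ga_parts <- function(weeks, days) {
  !is.na(weeks) & !is.na(days) & days >= 0 & days <= 6
}

# Weeks and days copied from four rows of the test table
is_valid_ga_parts(weeks = c(39, 40, NA, 39), days = c(0, 1, 0, 9))
```

Because it is vectorized, it could be added as one more column in the `mutate()` call, for example `valid = is_valid_ga_parts(weeks, days)`.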

As a side note, this is the first page developed with the Positron IDE, and it actually went more smoothly than using RStudio.