Interpolating missing values – Incidental Findings

Linear interpolation of missing dependent values based on an independent variable. Only interpolation is performed: values that would require extrapolation remain NA.
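As a quick illustration of the "interpolation only" behaviour, here is a minimal sketch using base R's approx() with its default settings. The two known points and the query values are made up for illustration.

```r
# Two known points (illustrative values): (0, 0) and (10, 100)
known_x <- c(0, 10)
known_y <- c(0, 100)

# x = 5 lies inside [0, 10], so it is interpolated.
# x = 12 lies outside that range, so approx() returns NA under its default rule.
res <- approx(known_x, known_y, xout = c(5, 12))$y
print(res)
```

The first result is 50 and the second is NA, which is the behaviour this post relies on.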

One use case is data collected from multiple patients (IDs), with a column of ages, and a column of weights where some weights are missing (NA). For each ID, weights will be linearly interpolated based on the ages.

library(dplyr)

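# Linearly interpolate missing values of y from the known (x, y) pairs.
# Values outside the range of the known x values (extrapolation) stay NA.
interpolate_missing <- function(x, y) {
  na_inds <- which(is.na(y))
  known <- !is.na(x) & !is.na(y)
  # approx() needs at least two distinct known x values; otherwise leave y unchanged
  if (length(na_inds) > 0 && length(unique(x[known])) >= 2) {
    # approx() sorts x internally, so rows need not be ordered by x
    y[na_inds] <- approx(x[known], y[known], xout = x[na_inds])$y
  }
  return(y)
}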
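For each missing point, the approx() call in the function above returns the value on the straight line between the two nearest known points. A minimal sketch of that arithmetic follows. The helper lerp() is a hypothetical name introduced only for this check and is not part of the original function.

```r
# Hypothetical helper: straight-line value at x between (x0, y0) and (x1, y1)
lerp <- function(x, x0, y0, x1, y1) {
  y0 + (x - x0) / (x1 - x0) * (y1 - y0)
}

# Known points (1, 10) and (4, 20); estimate y at x = 2
by_hand   <- lerp(2, x0 = 1, y0 = 10, x1 = 4, y1 = 20)
by_approx <- approx(c(1, 4), c(10, 20), xout = 2)$y

# Both give 10 + (1/3) * 10, which rounds to 13.33
print(c(by_hand = by_hand, by_approx = by_approx))
```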

Example

Create a dataframe with 3 columns:

id
independent value x
dependent value y with some values missing

I intentionally put some rows out of order with respect to the independent variable and included one missing dependent value that would require extrapolation.

df <- data.frame( # Example dataframe
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(4, 2, 1, 5, 1, 3, 5, 1, 2, 3),
  y = c(20, NA, 10, NA, 5, NA, 15, 4, 5, 6)
)

df %>% knitr::kable()

id	x	y
1	4	20
1	2	NA
1	1	10
1	5	NA
2	1	5
2	3	NA
2	5	15
3	1	4
3	2	5
3	3	6

df %>% # Group by id and interpolate missing y values based on surrounding x and y values
  group_by(id) %>%
  mutate(y_interpolate = round(interpolate_missing(x, y), 2)) %>% 
  arrange(id, x) %>% 
  knitr::kable()

id	x	y	y_interpolate
1	1	10	10.00
1	2	NA	13.33
1	4	20	20.00
1	5	NA	NA
2	1	5	5.00
2	3	NA	10.00
2	5	15	15.00
3	1	4	4.00
3	2	5	5.00
3	3	6	6.00
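The same pattern carries over to the patient use case described at the top. The sketch below uses made-up patient data, and the column names age, weight and weight_interp are assumptions for illustration. It also calls approx() directly inside mutate() rather than going through interpolate_missing(). That is a more compact variant, and it assumes every patient has at least two known weights.

```r
library(dplyr)

# Hypothetical patient data: weight is missing at age 32 (patient A)
# and at age 53 (patient B, beyond the last known weight)
patients <- data.frame(
  id     = c("A", "A", "A", "B", "B", "B"),
  age    = c(30, 32, 34, 50, 51, 53),
  weight = c(70, NA, 74, 80, 82, NA)
)

patients_filled <- patients %>%
  group_by(id) %>%
  # approx() drops the NA weights and evaluates the line at every age
  mutate(weight_interp = approx(age, weight, xout = age)$y) %>%
  ungroup()

print(patients_filled)
```

Patient A's missing weight at age 32 is filled with 72, halfway between 70 and 74. Patient B's missing weight at age 53 would need extrapolation, so it stays NA.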