Interpolating missing values

Linear interpolation of missing dependent values based on an independent variable, not permitting extrapolation
Published

July 10, 2024

This post shows how to linearly interpolate missing values of a dependent variable from an independent variable. Only interpolation is permitted: values that would require extrapolation remain NA.

One use case is data collected from multiple patients (IDs), with a column of ages, and a column of weights where some weights are missing (NA). For each ID, weights will be linearly interpolated based on the ages.

library(dplyr)

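# Function to linearly interpolate missing values of y from x.
# Missing values outside the range of the observed x stay NA (no extrapolation).
interpolate_missing <- function(x, y) {
  na_inds <- which(is.na(y))
  observed <- !is.na(y)
  # approx() needs at least two observed points; otherwise return y unchanged
  if (length(na_inds) > 0 && sum(observed) >= 2) {
    # approx() defaults to rule = 1, which returns NA outside the observed range
    y[na_inds] <- approx(x[observed], y[observed], xout = x[na_inds])$y
  }
  return(y)
}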
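
The "no extrapolation" behaviour comes from `approx()` itself rather than from any extra check in the function above. As a minimal sketch (the variable names here are only for illustration), the default `rule = 1` returns NA for any `xout` outside the range of the observed x values, while `rule = 2` would instead carry the nearest observed value outward:

```r
# Two observed points: (1, 10) and (4, 20)
x_obs <- c(1, 4)
y_obs <- c(10, 20)

# x = 0 and x = 5 both lie outside the observed range [1, 4]
default_rule <- approx(x_obs, y_obs, xout = c(0, 5))$y            # rule = 1
nearest_rule <- approx(x_obs, y_obs, xout = c(0, 5), rule = 2)$y  # rule = 2

print(default_rule)  # both values are NA
print(nearest_rule)  # 10 and 20, the nearest observed values
```

Keeping the default is what leaves those rows as NA in the results further down.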

Example

Create a dataframe with 3 columns:

  • id
  • independent value x
  • dependent value y with some values missing

I intentionally put some rows of the independent variable out of order and included one missing dependent value that would require extrapolation.

df <- data.frame( # Example dataframe
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(4, 2, 1, 5, 1, 3, 5, 1, 2, 3),
  y = c(20, NA, 10, NA, 5, NA, 15, 4, 5, 6)
)
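
The out-of-order rows should not affect the result, because `approx()` orders its input points by x before interpolating. A quick check, using two illustrative calls that are not part of the original post:

```r
# Same two observed points, supplied in different row orders
sorted_result   <- approx(c(1, 4), c(10, 20), xout = 2)$y
unsorted_result <- approx(c(4, 1), c(20, 10), xout = 2)$y

# Both calls interpolate between (1, 10) and (4, 20)
print(all.equal(sorted_result, unsorted_result))
```

This means the data can be interpolated first and sorted afterwards purely for display.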

df %>% knitr::kable()

| id|  x|  y|
|--:|--:|--:|
|  1|  4| 20|
|  1|  2| NA|
|  1|  1| 10|
|  1|  5| NA|
|  2|  1|  5|
|  2|  3| NA|
|  2|  5| 15|
|  3|  1|  4|
|  3|  2|  5|
|  3|  3|  6|

# Group by id and interpolate missing y values based on surrounding x and y values
df %>%
  group_by(id) %>%
  mutate(y_interpolate = round(interpolate_missing(x, y), 2)) %>%
  arrange(id, x) %>%
  knitr::kable()

| id|  x|  y| y_interpolate|
|--:|--:|--:|-------------:|
|  1|  1| 10|         10.00|
|  1|  2| NA|         13.33|
|  1|  4| 20|         20.00|
|  1|  5| NA|            NA|
|  2|  1|  5|          5.00|
|  2|  3| NA|         10.00|
|  2|  5| 15|         15.00|
|  3|  1|  4|          4.00|
|  3|  2|  5|          5.00|
|  3|  3|  6|          6.00|
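
As a check on the output, the interpolated value for id 1 at x = 2 can be worked out by hand from its two neighbouring observed points, (1, 10) and (4, 20). The symbols below (x_0, y_0, x_1, y_1) are just notation for those neighbours, not names used in the code:

```latex
y = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}
  = 10 + (2 - 1)\,\frac{20 - 10}{4 - 1}
  = 10 + \frac{10}{3}
  \approx 13.33
```

The same formula gives 5 + (3 - 1)(15 - 5)/(5 - 1) = 10 for id 2 at x = 3. For id 1 at x = 5 there is no observed point to the right, so the value stays NA.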