# Interpolating missing values

Linear interpolation of missing dependent values based on an independent variable, not permitting extrapolation
Categories: R, Code snippet

Published: July 10, 2024

This snippet linearly interpolates missing dependent values based on an independent variable. Only interpolation is permitted: values that would require extrapolation remain `NA`.

One use case is data collected from multiple patients (IDs), with a column of ages, and a column of weights where some weights are missing (NA). For each ID, weights will be linearly interpolated based on the ages.

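```r
library(dplyr)

# Linearly interpolate missing values of y at the corresponding x.
# Missing values outside the range of the known x are left as NA.
interpolate_missing <- function(x, y) {
  na_inds <- which(is.na(y))
  known <- !is.na(x) & !is.na(y)
  # approx() needs at least two known points with distinct x values
  if (length(na_inds) > 0 && length(unique(x[known])) >= 2) {
    y[na_inds] <- approx(x[known], y[known], xout = x[na_inds])$y
  }
  return(y)
}
```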
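
Values that would need extrapolation stay `NA` because of the default behaviour of base R's `approx()`: with its default `rule = 1`, any `xout` outside the range of the known `x` values returns `NA`. A minimal sketch of the call the function relies on, using made-up numbers (the variable names `known_x`, `known_y` and `result` are chosen here for illustration only):

```r
# Two known points (made-up numbers for illustration): (1, 10) and (4, 20)
known_x <- c(1, 4)
known_y <- c(10, 20)

# x = 2 lies between the known points; x = 5 lies beyond them
result <- approx(known_x, known_y, xout = c(2, 5))$y

# The first value is interpolated on the line through the known points:
#   10 + (2 - 1) / (4 - 1) * (20 - 10) = 13.33...
# The second value is NA because approx() does not extrapolate by default
print(result)
```

Note that the known points do not have to be sorted by `x`; `approx()` orders them internally, which is why the function works on rows that are out of order.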

## Example

Create a dataframe with 3 columns:

• `id`
• independent value `x`
• dependent value `y` with some values missing

I intentionally put some rows out of order with respect to the independent variable and included one missing dependent value that would require extrapolation.

```r
# Example dataframe
df <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  x  = c(4, 2, 1, 5, 1, 3, 5, 1, 2, 3),
  y  = c(20, NA, 10, NA, 5, NA, 15, 4, 5, 6)
)

df %>% knitr::kable()
```

| id |  x |  y |
|---:|---:|---:|
|  1 |  4 | 20 |
|  1 |  2 | NA |
|  1 |  1 | 10 |
|  1 |  5 | NA |
|  2 |  1 |  5 |
|  2 |  3 | NA |
|  2 |  5 | 15 |
|  3 |  1 |  4 |
|  3 |  2 |  5 |
|  3 |  3 |  6 |

```r
# Group by id and interpolate missing y values based on surrounding x and y values
df %>%
  group_by(id) %>%
  mutate(y_interpolate = round(interpolate_missing(x, y), 2)) %>%
  arrange(id, x) %>%
  knitr::kable()
```

| id |  x |  y | y_interpolate |
|---:|---:|---:|--------------:|
|  1 |  1 | 10 |         10.00 |
|  1 |  2 | NA |         13.33 |
|  1 |  4 | 20 |         20.00 |
|  1 |  5 | NA |            NA |
|  2 |  1 |  5 |          5.00 |
|  2 |  3 | NA |         10.00 |
|  2 |  5 | 15 |         15.00 |
|  3 |  1 |  4 |          4.00 |
|  3 |  2 |  5 |          5.00 |
|  3 |  3 |  6 |          6.00 |
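
For the patient use case described at the top, the same idea can be sketched in base R without dplyr. Everything in this block is an assumption made for illustration: the `patients` data frame, its column names (`patient_id`, `age`, `weight`, `weight_interp`) and all of the values are invented, and the per-patient loop is just one possible way to do the grouping.

```r
# Hypothetical patient data (all values invented for illustration)
patients <- data.frame(
  patient_id = c(1, 1, 1, 2, 2, 2, 2),
  age        = c(10, 12, 11, 8, 9, 10, 11),
  weight     = c(30, 36, NA, 25, NA, 29, NA)
)

# Start from the observed weights and fill in only the missing ones
patients$weight_interp <- patients$weight

# Process each patient's rows separately
for (rows in split(seq_len(nrow(patients)), patients$patient_id)) {
  age     <- patients$age[rows]
  weight  <- patients$weight[rows]
  missing <- is.na(weight)
  # approx() needs at least two known points per patient
  if (any(missing) && sum(!missing) >= 2) {
    patients$weight_interp[rows[missing]] <-
      approx(age[!missing], weight[!missing], xout = age[missing])$y
  }
}

print(patients)
```

In this sketch, patient 1's missing weight at age 11 falls between the measurements at ages 10 and 12 and is filled in, while patient 2's missing weight at age 11 lies beyond the last measurement (age 10) and therefore stays `NA`.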