# Generating a union of overlapping intervals

This function takes a set of overlapping intervals and returns the union of the intervals.

## Use case example

- calculate the total duration a patient has had any line in place
- your dataframe has:
- one row per line
- columns of start and end times for each line

- if two lines are in place at the same time, they should
be double-counted to contributing to duration*not* - so, you need to generate a non-overlapping set of intervals prior to calculating the total duration

```
########## Function to generate non-overlapping intervals ##########
# - dataframe with row of intervals, with column names 'start' and 'stop'
# - consolidates overlapping intervals and returns a dataframe with non-redundant, non-overlapping dataframes
# - logic of the if(elseif) -> if: works but might not be optimized
library(dplyr) # for %>%, arrange, bind_rows
interval_union <- function(input) {
if (nrow(input) == 1) { # if only 1 input interval, return it
return(input)
}
input <- input %>% arrange(start) # sort inputs by start
output = input[1, ] # start off output with just the first (earliest) interval
for (i in 2:nrow(input)) { # loop from 2nd to last input interval
x <- input[i, ] # the next interval to work on
if (output$stop[nrow(output)] < x$start) { # the next interval starts after the end of the last output interval...
output <- bind_rows(output, x) # ... so add it as a NEW output interval; it will not be the last output interval
} else if (output$stop[nrow(output)] == x$start) { # the next interval starts exactly at the end of the last output interval...
output$stop[nrow(output)] <- x$stop # ... so just extend the previous last output interval; it's still the last one
}
if (x$stop > output$stop[nrow(output)]) { # the next interval ENDS after the end of the last output interval...
output$stop[nrow(output)] <- x$stop # ... so just extend the previous last output interval; it's still the last one
}
}
return(output)
}
```

```
d <- data.frame(
start = c('2005-01-01', '2000-01-01', '2001-01-01'),
stop = c('2006-01-02', '2001-01-02', '2004-01-02'),
stringsAsFactors = FALSE
)
d
```

```
## start stop
## 1 2005-01-01 2006-01-02
## 2 2000-01-01 2001-01-02
## 3 2001-01-01 2004-01-02
```

`interval_union(d)`

```
## start stop
## 1 2000-01-01 2004-01-02
## 2 2005-01-01 2006-01-02
```

It would definitely be possible to rewrite this using just base R, but because all my coding projects are in the tidyverse, I prefer to continue using its functions.