Classify Suspected Infection in Patients

Categories: project, code, analysis

Author: Hamza Rahal

Published: June 5, 2024

1. This patient may have sepsis

Sepsis is a deadly syndrome in which a severe infection causes organ failure. The sooner septic patients are treated, the more likely they are to survive, but sepsis can be challenging to recognize. It may be possible to use hospital data to develop machine learning models that flag patients who are likely to be septic. However, before we develop predictive algorithms, we need a reliable method to identify which patients are septic. One component of sepsis is a severe infection.

In this project, we will use two weeks of hospital electronic health record (EHR) data to find out which patients had a severe infection according to four criteria. We will look into the data to see if a doctor ordered a blood test to look for bacteria (a blood culture) and gave the patient a series of intravenous antibiotics.

Let’s get started!

# Load packages
library(data.table)

# Read in the data
antibioticDT <- fread('antibioticDT.csv')

# Look at the first 30 rows
head(antibioticDT, 30)
    patient_id day_given antibiotic_type  route
         <int>     <int>          <char> <char>
 1:          1         2   ciprofloxacin     IV
 2:          1         4   ciprofloxacin     IV
 3:          1         6   ciprofloxacin     IV
 4:          1         7     doxycycline     IV
 5:          1         9     doxycycline     IV
 6:          1        15      penicillin     IV
 7:          1        16     doxycycline     IV
 8:          1        18   ciprofloxacin     IV
 9:          8         1     doxycycline     PO
10:          8         2      penicillin     IV
11:          8         3     doxycycline     IV
12:          8         6     doxycycline     PO
13:          8         8      penicillin     PO
14:          8        12      penicillin     IV
15:          9         8     doxycycline     IV
16:          9        12     doxycycline     PO
17:         12         4     doxycycline     PO
18:         12         9     doxycycline     IV
19:         16         1     doxycycline     IV
20:         16         4     amoxicillin     IV
21:         19         3     doxycycline     PO
22:         19         5     amoxicillin     IV
23:         19         6   ciprofloxacin     IV
24:         19        10     doxycycline     IV
25:         19        12      penicillin     IV
26:         23         1     doxycycline     IV
27:         23         1      penicillin     IV
28:         23         3     amoxicillin     IV
29:         23         3   ciprofloxacin     IV
30:         23         3     doxycycline     IV
    patient_id day_given antibiotic_type  route

2. Which antibiotics are “new”?

These data represent all antibiotics administered in a hospital over two weeks. Each row represents one time a patient was given an antibiotic. The variables include the patient identification number, the day the drug was administered, the name of the antibiotic, and how it was administered. For example, patient “8” received doxycycline by mouth on the first day of their stay.

We will identify patients with a serious infection using the following criteria.

Criteria for Suspected Infection*

  1. The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.

  2. The sequence must start with a new antibiotic, defined as an antibiotic type that was not given in the previous two days.

  3. The sequence must start within two days of a blood culture.

  4. There must be at least one intravenous (I.V.) antibiotic within the +/-2 day window.

Let’s start with the second item by finding which rows represent “new antibiotics”. We will determine if each antibiotic was given to the patient in the prior two days. We’ll visualize this task by looking at the data sorted by id, antibiotic type, and day.

# Sort the data and examine the first 40 rows
setorder(x = antibioticDT, patient_id, antibiotic_type, day_given)
antibioticDT[1:40]
    patient_id day_given antibiotic_type  route
         <int>     <int>          <char> <char>
 1:          1         2   ciprofloxacin     IV
 2:          1         4   ciprofloxacin     IV
 3:          1         6   ciprofloxacin     IV
 4:          1        18   ciprofloxacin     IV
 5:          1         7     doxycycline     IV
 6:          1         9     doxycycline     IV
 7:          1        16     doxycycline     IV
 8:          1        15      penicillin     IV
 9:          8         1     doxycycline     PO
10:          8         3     doxycycline     IV
11:          8         6     doxycycline     PO
12:          8         2      penicillin     IV
13:          8         8      penicillin     PO
14:          8        12      penicillin     IV
15:          9         8     doxycycline     IV
16:          9        12     doxycycline     PO
17:         12         4     doxycycline     PO
18:         12         9     doxycycline     IV
19:         16         4     amoxicillin     IV
20:         16         1     doxycycline     IV
21:         19         5     amoxicillin     IV
22:         19         6   ciprofloxacin     IV
23:         19         3     doxycycline     PO
24:         19        10     doxycycline     IV
25:         19        12      penicillin     IV
26:         23         3     amoxicillin     IV
27:         23         8     amoxicillin     IV
28:         23        10     amoxicillin     PO
29:         23         3   ciprofloxacin     IV
30:         23         5   ciprofloxacin     PO
31:         23        16   ciprofloxacin     IV
32:         23         1     doxycycline     IV
33:         23         3     doxycycline     IV
34:         23         4     doxycycline     IV
35:         23         5     doxycycline     IV
36:         23         6     doxycycline     IV
37:         23         6     doxycycline     PO
38:         23         9     doxycycline     PO
39:         23        10     doxycycline     IV
40:         23        11     doxycycline     PO
    patient_id day_given antibiotic_type  route
# Use shift to calculate the last day a particular drug was administered
antibioticDT[ , last_administration_day := shift(day_given, n = 1, type = "lag"), 
  by = .(patient_id, antibiotic_type)]

# Calculate the number of days since the drug was last administered
antibioticDT[ , days_since_last_admin := day_given - last_administration_day]

# Create antibiotic_new with an initial value of one, then reset it to zero as needed
antibioticDT[, antibiotic_new := 1]
antibioticDT[days_since_last_admin <= 2, antibiotic_new := 0]
antibioticDT[1:40]
    patient_id day_given antibiotic_type  route last_administration_day
         <int>     <int>          <char> <char>                   <int>
 1:          1         2   ciprofloxacin     IV                      NA
 2:          1         4   ciprofloxacin     IV                       2
 3:          1         6   ciprofloxacin     IV                       4
 4:          1        18   ciprofloxacin     IV                       6
 5:          1         7     doxycycline     IV                      NA
 6:          1         9     doxycycline     IV                       7
 7:          1        16     doxycycline     IV                       9
 8:          1        15      penicillin     IV                      NA
 9:          8         1     doxycycline     PO                      NA
10:          8         3     doxycycline     IV                       1
11:          8         6     doxycycline     PO                       3
12:          8         2      penicillin     IV                      NA
13:          8         8      penicillin     PO                       2
14:          8        12      penicillin     IV                       8
15:          9         8     doxycycline     IV                      NA
16:          9        12     doxycycline     PO                       8
17:         12         4     doxycycline     PO                      NA
18:         12         9     doxycycline     IV                       4
19:         16         4     amoxicillin     IV                      NA
20:         16         1     doxycycline     IV                      NA
21:         19         5     amoxicillin     IV                      NA
22:         19         6   ciprofloxacin     IV                      NA
23:         19         3     doxycycline     PO                      NA
24:         19        10     doxycycline     IV                       3
25:         19        12      penicillin     IV                      NA
26:         23         3     amoxicillin     IV                      NA
27:         23         8     amoxicillin     IV                       3
28:         23        10     amoxicillin     PO                       8
29:         23         3   ciprofloxacin     IV                      NA
30:         23         5   ciprofloxacin     PO                       3
31:         23        16   ciprofloxacin     IV                       5
32:         23         1     doxycycline     IV                      NA
33:         23         3     doxycycline     IV                       1
34:         23         4     doxycycline     IV                       3
35:         23         5     doxycycline     IV                       4
36:         23         6     doxycycline     IV                       5
37:         23         6     doxycycline     PO                       6
38:         23         9     doxycycline     PO                       6
39:         23        10     doxycycline     IV                       9
40:         23        11     doxycycline     PO                      10
    patient_id day_given antibiotic_type  route last_administration_day
    days_since_last_admin antibiotic_new
                    <int>          <num>
 1:                    NA              1
 2:                     2              0
 3:                     2              0
 4:                    12              1
 5:                    NA              1
 6:                     2              0
 7:                     7              1
 8:                    NA              1
 9:                    NA              1
10:                     2              0
11:                     3              1
12:                    NA              1
13:                     6              1
14:                     4              1
15:                    NA              1
16:                     4              1
17:                    NA              1
18:                     5              1
19:                    NA              1
20:                    NA              1
21:                    NA              1
22:                    NA              1
23:                    NA              1
24:                     7              1
25:                    NA              1
26:                    NA              1
27:                     5              1
28:                     2              0
29:                    NA              1
30:                     2              0
31:                    11              1
32:                    NA              1
33:                     2              0
34:                     1              0
35:                     1              0
36:                     1              0
37:                     0              0
38:                     3              1
39:                     1              0
40:                     1              0
    days_since_last_admin antibiotic_new
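
To see the "new antibiotic" rule in isolation, here is a minimal sketch on a toy table. The values and the names `toyDT` and `last_day` are made up for illustration and do not come from antibioticDT.csv. It computes the flag in a single expression instead of initializing to one and resetting, which should give the same result under the same rule.

```r
library(data.table)

# Toy data (hypothetical values, not taken from antibioticDT.csv)
toyDT <- data.table(
  patient_id      = c(1, 1, 1, 1),
  day_given       = c(2, 4, 7, 3),
  antibiotic_type = c("ciprofloxacin", "ciprofloxacin", "ciprofloxacin", "doxycycline")
)
setorder(toyDT, patient_id, antibiotic_type, day_given)

# Previous day the same patient received the same drug (NA for the first dose)
toyDT[, last_day := shift(day_given, n = 1, type = "lag"),
      by = .(patient_id, antibiotic_type)]

# A dose is "new" unless the same drug was given within the previous two days
toyDT[, antibiotic_new := as.numeric(is.na(last_day) | day_given - last_day > 2)]
toyDT
```

Ciprofloxacin on day 4 follows a dose on day 2, so it is not new; the dose on day 7 comes three days after the previous one, so it counts as new again.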

3. Looking at the blood culture data

Now let’s look at blood culture data from the same two-week period in this hospital. These data are in blood_cultureDT.csv. Let’s start by reading it into the workspace and having a look at a few rows.

Each row represents one blood culture and gives the patient’s id and the day the blood culture test occurred. For example, patient “8” had a blood culture on the second day of their hospitalization and again on the thirteenth day. Notice that some patients from the antibiotic dataset are not in this dataset and vice versa. Some patients are in neither because they received neither antibiotics nor a blood culture.

# Read in blood_cultureDT.csv
blood_cultureDT <- fread("blood_cultureDT.csv")

# Print the first 30 rows
blood_cultureDT[1:30]
    patient_id blood_culture_day
         <int>             <int>
 1:          1                 3
 2:          1                13
 3:          8                 2
 4:          8                13
 5:         23                 3
 6:         39                10
 7:         45                 4
 8:         45                 9
 9:         45                11
10:         51                 3
11:         51                 6
12:         59                 2
13:         64                 1
14:         66                 9
15:         66                10
16:         69                 2
17:         69                 6
18:         69                 7
19:         69                11
20:         69                16
21:         76                 1
22:         77                 3
23:         79                 5
24:         79                11
25:         79                12
26:         80                 3
27:         80                12
28:         81                 2
29:        112                 6
30:        115                 2
    patient_id blood_culture_day

4. Combine the antibiotic data and the blood culture data

To find which antibiotics were given close to a blood culture test, we need to combine the drug administration data with the blood culture data. We’ll keep only patients who are still candidates for infection, that is, those who appear in both data sets.

A challenge with the data is that some patients had blood cultures on several different days. For each of those days, we will see if there is a sequence of antibiotic days close to them. To accomplish this, in the merge we will match each blood culture to each antibiotic day.

After sorting the data following the merge, you will see that each patient’s antibiotic sequence repeats for each blood culture day. This repetition allows us to look at each blood culture day and check if it is associated with a qualifying sequence of antibiotics.

# Merge antibioticDT with blood_cultureDT
combinedDT <- merge(antibioticDT, blood_cultureDT, by = "patient_id", all = FALSE)

# Sort by patient_id, blood_culture_day, day_given, and antibiotic_type
setorder(combinedDT, patient_id, blood_culture_day, day_given, antibiotic_type)

# Print and examine the first 30 rows
combinedDT[1:30]
Key: <patient_id>
    patient_id day_given antibiotic_type  route last_administration_day
         <int>     <int>          <char> <char>                   <int>
 1:          1         2   ciprofloxacin     IV                      NA
 2:          1         4   ciprofloxacin     IV                       2
 3:          1         6   ciprofloxacin     IV                       4
 4:          1         7     doxycycline     IV                      NA
 5:          1         9     doxycycline     IV                       7
 6:          1        15      penicillin     IV                      NA
 7:          1        16     doxycycline     IV                       9
 8:          1        18   ciprofloxacin     IV                       6
 9:          1         2   ciprofloxacin     IV                      NA
10:          1         4   ciprofloxacin     IV                       2
11:          1         6   ciprofloxacin     IV                       4
12:          1         7     doxycycline     IV                      NA
13:          1         9     doxycycline     IV                       7
14:          1        15      penicillin     IV                      NA
15:          1        16     doxycycline     IV                       9
16:          1        18   ciprofloxacin     IV                       6
17:          8         1     doxycycline     PO                      NA
18:          8         2      penicillin     IV                      NA
19:          8         3     doxycycline     IV                       1
20:          8         6     doxycycline     PO                       3
21:          8         8      penicillin     PO                       2
22:          8        12      penicillin     IV                       8
23:          8         1     doxycycline     PO                      NA
24:          8         2      penicillin     IV                      NA
25:          8         3     doxycycline     IV                       1
26:          8         6     doxycycline     PO                       3
27:          8         8      penicillin     PO                       2
28:          8        12      penicillin     IV                       8
29:         23         1     doxycycline     IV                      NA
30:         23         1      penicillin     IV                      NA
    patient_id day_given antibiotic_type  route last_administration_day
    days_since_last_admin antibiotic_new blood_culture_day
                    <int>          <num>             <int>
 1:                    NA              1                 3
 2:                     2              0                 3
 3:                     2              0                 3
 4:                    NA              1                 3
 5:                     2              0                 3
 6:                    NA              1                 3
 7:                     7              1                 3
 8:                    12              1                 3
 9:                    NA              1                13
10:                     2              0                13
11:                     2              0                13
12:                    NA              1                13
13:                     2              0                13
14:                    NA              1                13
15:                     7              1                13
16:                    12              1                13
17:                    NA              1                 2
18:                    NA              1                 2
19:                     2              0                 2
20:                     3              1                 2
21:                     6              1                 2
22:                     4              1                 2
23:                    NA              1                13
24:                    NA              1                13
25:                     2              0                13
26:                     3              1                13
27:                     6              1                13
28:                     4              1                13
29:                    NA              1                 3
30:                    NA              1                 3
    days_since_last_admin antibiotic_new blood_culture_day
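
The many-to-many pairing is easier to see on a tiny example. The sketch below uses made-up tables (`abxDT`, `bcxDT`) with one patient, three doses, and two blood cultures. Note that `allow.cartesian = TRUE` is an addition for this toy case only: with so few rows, the join result is larger than both inputs combined, and data.table can refuse such a join unless it is told the expansion is intended.

```r
library(data.table)

# Toy tables (hypothetical values): one patient, three doses, two blood cultures
abxDT <- data.table(patient_id = c(1, 1, 1), day_given = c(2, 4, 6))
bcxDT <- data.table(patient_id = c(1, 1), blood_culture_day = c(3, 13))

# Inner join on patient_id: every dose is paired with every blood culture
pairedDT <- merge(abxDT, bcxDT, by = "patient_id", all = FALSE,
                  allow.cartesian = TRUE)
setorder(pairedDT, patient_id, blood_culture_day, day_given)
pairedDT
```

The result has 3 x 2 = 6 rows: the full dose sequence appears once for the culture on day 3 and once for the culture on day 13.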

5. Determine whether each row is in-window

Now that we have the antibiotic and blood culture data combined, we can test each drug administration against each blood culture to see if it’s “in the window.”

# Make a new variable called drug_in_bcx_window
combinedDT[ , 
  drug_in_bcx_window := 
           as.numeric(
               day_given - blood_culture_day <= 2 
               & 
               day_given - blood_culture_day >= -2)]
combinedDT[1:5]
Key: <patient_id>
   patient_id day_given antibiotic_type  route last_administration_day
        <int>     <int>          <char> <char>                   <int>
1:          1         2   ciprofloxacin     IV                      NA
2:          1         4   ciprofloxacin     IV                       2
3:          1         6   ciprofloxacin     IV                       4
4:          1         7     doxycycline     IV                      NA
5:          1         9     doxycycline     IV                       7
   days_since_last_admin antibiotic_new blood_culture_day drug_in_bcx_window
                   <int>          <num>             <int>              <num>
1:                    NA              1                 3                  1
2:                     2              0                 3                  1
3:                     2              0                 3                  0
4:                    NA              1                 3                  0
5:                     2              0                 3                  0
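
As a quick check of the window logic, the sketch below applies the same two-sided comparison to a toy table (made-up days around a blood culture on day 3) and compares it with an equivalent form based on the absolute difference. The column names `in_window` and `in_window_abs` are illustrative only.

```r
library(data.table)

# Toy data (hypothetical): doses on several days, one blood culture on day 3
toyDT <- data.table(day_given = c(1, 2, 4, 5, 6), blood_culture_day = 3)

# Same form as above: two one-sided comparisons
toyDT[, in_window := as.numeric(day_given - blood_culture_day <= 2 &
                                day_given - blood_culture_day >= -2)]

# Equivalent form: absolute distance from the blood culture day
toyDT[, in_window_abs := as.numeric(abs(day_given - blood_culture_day) <= 2)]
toyDT
```

Days 1 through 5 fall inside the window and day 6 does not; both columns agree on every row.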

6. Check the I.V. requirement

Now let’s look at the fourth item in the criteria.

Criteria for Suspected Infection*

  1. The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.

  2. The sequence must start with a new antibiotic, defined as an antibiotic type that was not given in the previous two days.

  3. The sequence must start within two days of a blood culture.

  4. There must be at least one intravenous (I.V.) antibiotic within the +/-2 day window.

# Create a variable indicating if there was at least one I.V. drug given in the window
combinedDT[ , 
  any_iv_in_bcx_window := as.numeric(any(route == 'IV' & drug_in_bcx_window == 1)),
  by = .(patient_id, blood_culture_day)]
# Exclude rows in which the blood_culture_day does not have any I.V. drugs in window 
combinedDT <- combinedDT[any_iv_in_bcx_window == 1]
combinedDT[1:5]
Key: <patient_id>
   patient_id day_given antibiotic_type  route last_administration_day
        <int>     <int>          <char> <char>                   <int>
1:          1         2   ciprofloxacin     IV                      NA
2:          1         4   ciprofloxacin     IV                       2
3:          1         6   ciprofloxacin     IV                       4
4:          1         7     doxycycline     IV                      NA
5:          1         9     doxycycline     IV                       7
   days_since_last_admin antibiotic_new blood_culture_day drug_in_bcx_window
                   <int>          <num>             <int>              <num>
1:                    NA              1                 3                  1
2:                     2              0                 3                  1
3:                     2              0                 3                  0
4:                    NA              1                 3                  0
5:                     2              0                 3                  0
   any_iv_in_bcx_window
                  <num>
1:                    1
2:                    1
3:                    1
4:                    1
5:                    1
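
One detail of this step is that the I.V. route and the in-window flag must hold on the same row. The sketch below illustrates that with a made-up table of two patients; none of the values come from the hospital data.

```r
library(data.table)

# Toy data (hypothetical): patient 2 has an I.V. dose, but it is outside the window
toyDT <- data.table(
  patient_id         = c(1, 1, 2, 2),
  blood_culture_day  = c(3, 3, 5, 5),
  route              = c("IV", "PO", "PO", "IV"),
  drug_in_bcx_window = c(1, 0, 1, 0)
)

# any() is TRUE for a group only if some single row is both I.V. and in-window
toyDT[, any_iv_in_bcx_window := as.numeric(any(route == "IV" & drug_in_bcx_window == 1)),
      by = .(patient_id, blood_culture_day)]
toyDT
```

Patient 1 is flagged because the I.V. dose is in the window. Patient 2 is not, because the only in-window dose was given by mouth.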

7. Find the first day of possible sequences

We’re getting close! Let’s review the criteria again.

Criteria for Suspected Infection*

  1. The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.

  2. The sequence must start with a new antibiotic, defined as an antibiotic type that was not given in the previous two days.

  3. The sequence must start within two days of a blood culture.

  4. There must be at least one intravenous (I.V.) antibiotic within the +/-2 day window.

Let’s work toward the first criterion by finding the first day of each possible four-day qualifying sequence: the first day on which a new antibiotic was given within the blood culture window.

# Create a new variable called day_of_first_new_abx_in_window
combinedDT[,
          day_of_first_new_abx_in_window := 
           day_given[antibiotic_new == 1 & drug_in_bcx_window == 1][1],
           by = .(patient_id, blood_culture_day)
          ]

# Remove rows where the day is before this first qualifying day
combinedDT <- combinedDT[day_given >= day_of_first_new_abx_in_window]
combinedDT[1:5]
Key: <patient_id>
   patient_id day_given antibiotic_type  route last_administration_day
        <int>     <int>          <char> <char>                   <int>
1:          1         2   ciprofloxacin     IV                      NA
2:          1         4   ciprofloxacin     IV                       2
3:          1         6   ciprofloxacin     IV                       4
4:          1         7     doxycycline     IV                      NA
5:          1         9     doxycycline     IV                       7
   days_since_last_admin antibiotic_new blood_culture_day drug_in_bcx_window
                   <int>          <num>             <int>              <num>
1:                    NA              1                 3                  1
2:                     2              0                 3                  1
3:                     2              0                 3                  0
4:                    NA              1                 3                  0
5:                     2              0                 3                  0
   any_iv_in_bcx_window day_of_first_new_abx_in_window
                  <num>                          <int>
1:                    1                              2
2:                    1                              2
3:                    1                              2
4:                    1                              2
5:                    1                              2
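
Two behaviors in this step are easy to miss: `[1]` picks the first qualifying row in the current row order (so the data must already be sorted by day), and it returns NA when no row qualifies, which makes the later `>=` filter drop that whole group. A minimal sketch on made-up data, with the illustrative names `toyDT`, `first_day`, and `keptDT`:

```r
library(data.table)

# Toy data (hypothetical): patient 2 has no new antibiotic in the window
toyDT <- data.table(
  patient_id         = c(1, 1, 1, 2, 2),
  blood_culture_day  = c(3, 3, 3, 5, 5),
  day_given          = c(1, 2, 4, 4, 6),
  antibiotic_new     = c(0, 1, 1, 0, 0),
  drug_in_bcx_window = c(1, 1, 1, 1, 1)
)

# First in-window day with a new antibiotic, per patient and blood culture
toyDT[, first_day := day_given[antibiotic_new == 1 & drug_in_bcx_window == 1][1],
      by = .(patient_id, blood_culture_day)]

# Rows before the first qualifying day, and groups with no such day (NA), drop out
keptDT <- toyDT[day_given >= first_day]
keptDT
```

Only patient 1's doses on days 2 and 4 remain; patient 2 disappears because the comparison with NA is never TRUE.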

8. Simplify the data

The first criterion is: The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.

We’ve pinned down the first day of possible sequences in the previous task. Now we have to check for four-day sequences. We don’t need the drug type (name); we need the days the antibiotics were administered.

# Create a new data.table containing only patient_id, blood_culture_day, and day_given
simplified_data <- combinedDT[, .(patient_id, blood_culture_day, day_given)]

# Remove duplicate rows
simplified_data <- unique(simplified_data)
simplified_data[1:5]
Key: <patient_id>
   patient_id blood_culture_day day_given
        <int>             <int>     <int>
1:          1                 3         2
2:          1                 3         4
3:          1                 3         6
4:          1                 3         7
5:          1                 3         9

9. Extract first four rows for each blood culture

To check for four-day sequences, let’s pull out the first four days (rows) for each patient/blood culture combination. Some patients will have fewer than four antibiotic days. We’ll remove them first.

# Count the antibiotic days within each patient/blood culture day combination
simplified_data[, num_antibiotic_days := .N, by = .(patient_id, blood_culture_day)]

# Remove blood culture days with fewer than four rows
simplified_data <- simplified_data[num_antibiotic_days >= 4]

# Select the first four days for each blood culture
first_four_days <- simplified_data[, .SD[1:4], by = .(patient_id, blood_culture_day)]
first_four_days[1:5]
   patient_id blood_culture_day day_given num_antibiotic_days
        <int>             <int>     <int>               <int>
1:          1                 3         2                   8
2:          1                 3         4                   8
3:          1                 3         6                   8
4:          1                 3         7                   8
5:          8                 2         1                   6
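
The reason for filtering before taking `.SD[1:4]` can be seen on a toy table: indexing past the end of a group returns NA rows instead of stopping early. The sketch below uses made-up data and the illustrative names `padded` and `first_four`.

```r
library(data.table)

# Toy data (hypothetical): patient 1 has five antibiotic days, patient 2 only two
toyDT <- data.table(patient_id = c(1, 1, 1, 1, 1, 2, 2),
                    day_given  = c(2, 4, 6, 7, 9, 1, 3))

# Without filtering, .SD[1:4] pads patient 2 with two NA rows
padded <- toyDT[, .SD[1:4], by = patient_id]

# Counting rows first and keeping groups with at least four avoids the padding
toyDT[, n_days := .N, by = patient_id]
first_four <- toyDT[n_days >= 4, .SD[1:4], by = patient_id]
first_four
```

Here `padded` has eight rows, two of them NA, while `first_four` contains only patient 1's days 2, 4, 6, and 7.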

10. Consecutive sequence

Now we need to check whether each four-day sequence qualifies by having no gaps of more than one day.

# Flag sequences whose largest day-to-day difference is under 3 (at most a one-day gap)
first_four_days[, four_in_seq := as.numeric(max(diff(day_given)) < 3),
                by = .(patient_id, blood_culture_day)]
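
To make the rule concrete, the sketch below applies the same `max(diff(...)) < 3` test to two plain vectors of days. The helper name `qualifies` and the example vectors are made up for illustration.

```r
# Hypothetical helper mirroring the four_in_seq rule on a plain vector of days
qualifies <- function(days) as.numeric(max(diff(days)) < 3)

ok_days  <- c(2, 4, 6, 7)  # differences 2, 2, 1: only one-day gaps
bad_days <- c(1, 2, 3, 6)  # differences 1, 1, 3: a two-day gap before day 6

qualifies(ok_days)
qualifies(bad_days)
```

A difference of 2 between consecutive antibiotic days means exactly one day was skipped, which the criteria allow; a difference of 3 means two days were skipped, so the sequence fails.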

11. Select the patients who meet criteria

A patient meets the criteria if any of their blood cultures was accompanied by a qualifying sequence of antibiotics. Now that we’ve determined which blood cultures qualify, let’s select the patients who meet the criteria.

# Select the rows which have four_in_seq equal to 1
suspected_infection <- first_four_days[four_in_seq == 1]

# Retain only the patient_id column
suspected_infection <- suspected_infection[, .(patient_id)]

# Remove duplicates
suspected_infection <- unique(suspected_infection)

# Make an infection indicator
suspected_infection[, infection := 1]
suspected_infection[1:5]
   patient_id infection
        <int>     <num>
1:          1         1
2:         23         1
3:         64         1
4:         76         1
5:        164         1

12. Find the prevalence of suspected infection

In this project, we used two EHR datasets to flag patients who were suspected of having a serious infection. We also got a data.table workout!

So far, we’ve been looking at records of all antibiotics administered and blood cultures that occurred over two weeks at a particular hospital. However, not all patients who were hospitalized over this period are represented in combinedDT because not all of them took antibiotics or had blood culture tests. We have to read in and merge the rest of the patient information to see what percentage of patients at the hospital might have had a serious infection.

# Read in "all_patients.csv"
all_patientsDT <- fread("all_patients.csv")

# Merge this with the infection flag data
all_patientsDT <- merge(all_patientsDT, suspected_infection, by = "patient_id", all = TRUE)

# Set any missing values of the infection flag to 0
all_patientsDT[is.na(infection), infection := 0]
all_patientsDT[1:10]
Key: <patient_id>
    patient_id infection
         <int>     <num>
 1:          1         1
 2:          5         0
 3:          8         0
 4:          9         0
 5:         12         0
 6:         16         0
 7:         19         0
 8:         23         1
 9:         25         0
10:         39         0
# Calculate the percentage of patients who met the criteria for presumed infection
ans <- 100 * all_patientsDT[, mean(infection)]
ans
[1] 14.94382
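
The last step works because the mean of a 0/1 indicator is the proportion of ones. Here is a minimal sketch of the merge, fill, and average pattern on made-up tables; `patientsDT`, `flaggedDT`, `mergedDT`, and `pct` are illustrative names, and the resulting percentage has no connection to the hospital figure above.

```r
library(data.table)

# Toy tables (hypothetical): four patients in total, two of them flagged
patientsDT <- data.table(patient_id = c(1, 5, 8, 23))
flaggedDT  <- data.table(patient_id = c(1, 23), infection = 1)

# Full outer merge keeps every patient; unflagged patients get NA
mergedDT <- merge(patientsDT, flaggedDT, by = "patient_id", all = TRUE)

# Replace NA with 0 so the indicator is defined for everyone
mergedDT[is.na(infection), infection := 0]

# Mean of a 0/1 indicator is the share of flagged patients
pct <- 100 * mean(mergedDT$infection)
pct
```

Skipping the NA fill would make `mean()` return NA, so the fill has to come before the average.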