Sepsis is a deadly syndrome in which a severe infection causes organ failure. The sooner septic patients are treated, the more likely they are to survive, but sepsis can be challenging to recognize. It may be possible to use hospital data to develop machine learning models that flag patients who are likely to be septic. However, before we develop predictive algorithms, we need a reliable method for determining which patients are septic. One component of sepsis is a severe infection.
In this project, we will use two weeks of hospital electronic health record (EHR) data to find out which patients had a severe infection according to four criteria. We will look into the data to see if a doctor ordered a blood test to look for bacteria (a blood culture) and gave the patient a series of intravenous antibiotics.
Let’s get started!
Code
# Load packages
library(data.table)
# Read in the data
antibioticDT <- fread('antibioticDT.csv')
# Look at the first 30 rows
head(antibioticDT, 30)
patient_id day_given antibiotic_type route
<int> <int> <char> <char>
1: 1 2 ciprofloxacin IV
2: 1 4 ciprofloxacin IV
3: 1 6 ciprofloxacin IV
4: 1 7 doxycycline IV
5: 1 9 doxycycline IV
6: 1 15 penicillin IV
7: 1 16 doxycycline IV
8: 1 18 ciprofloxacin IV
9: 8 1 doxycycline PO
10: 8 2 penicillin IV
11: 8 3 doxycycline IV
12: 8 6 doxycycline PO
13: 8 8 penicillin PO
14: 8 12 penicillin IV
15: 9 8 doxycycline IV
16: 9 12 doxycycline PO
17: 12 4 doxycycline PO
18: 12 9 doxycycline IV
19: 16 1 doxycycline IV
20: 16 4 amoxicillin IV
21: 19 3 doxycycline PO
22: 19 5 amoxicillin IV
23: 19 6 ciprofloxacin IV
24: 19 10 doxycycline IV
25: 19 12 penicillin IV
26: 23 1 doxycycline IV
27: 23 1 penicillin IV
28: 23 3 amoxicillin IV
29: 23 3 ciprofloxacin IV
30: 23 3 doxycycline IV
patient_id day_given antibiotic_type route
2. Which antibiotics are “new”?
These data represent all drugs administered in a hospital over two weeks. Each row represents one time a patient was given an antibiotic. The variables include the patient identification number, the day the drug was administered, the name of the antibiotic, and how it was administered. For example, patient “8” received doxycycline by mouth on the first day of their stay.
We will identify patients with a serious infection using the following criteria.
The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.
The sequence must start with a new antibiotic, defined as an antibiotic type that was not given in the previous two days.
The sequence must start within two days of a blood culture.
There must be at least one intravenous (I.V.) antibiotic within the +/-2 day window.
Let’s start with the second item by finding which rows represent “new antibiotics”. We will determine if each antibiotic was given to the patient in the prior two days. We’ll visualize this task by looking at the data sorted by id, antibiotic type, and day.
Code
# Sort the data and examine the first 40 rows
setorder(x = antibioticDT, patient_id, antibiotic_type, day_given)
antibioticDT[1:40]
patient_id day_given antibiotic_type route
<int> <int> <char> <char>
1: 1 2 ciprofloxacin IV
2: 1 4 ciprofloxacin IV
3: 1 6 ciprofloxacin IV
4: 1 18 ciprofloxacin IV
5: 1 7 doxycycline IV
6: 1 9 doxycycline IV
7: 1 16 doxycycline IV
8: 1 15 penicillin IV
9: 8 1 doxycycline PO
10: 8 3 doxycycline IV
11: 8 6 doxycycline PO
12: 8 2 penicillin IV
13: 8 8 penicillin PO
14: 8 12 penicillin IV
15: 9 8 doxycycline IV
16: 9 12 doxycycline PO
17: 12 4 doxycycline PO
18: 12 9 doxycycline IV
19: 16 4 amoxicillin IV
20: 16 1 doxycycline IV
21: 19 5 amoxicillin IV
22: 19 6 ciprofloxacin IV
23: 19 3 doxycycline PO
24: 19 10 doxycycline IV
25: 19 12 penicillin IV
26: 23 3 amoxicillin IV
27: 23 8 amoxicillin IV
28: 23 10 amoxicillin PO
29: 23 3 ciprofloxacin IV
30: 23 5 ciprofloxacin PO
31: 23 16 ciprofloxacin IV
32: 23 1 doxycycline IV
33: 23 3 doxycycline IV
34: 23 4 doxycycline IV
35: 23 5 doxycycline IV
36: 23 6 doxycycline IV
37: 23 6 doxycycline PO
38: 23 9 doxycycline PO
39: 23 10 doxycycline IV
40: 23 11 doxycycline PO
patient_id day_given antibiotic_type route
Code
# Use shift to calculate the last day a particular drug was administered
antibioticDT[, last_administration_day := shift(day_given, n = 1, type = "lag"),
             by = .(patient_id, antibiotic_type)]
# Calculate the number of days since the drug was last administered
antibioticDT[, days_since_last_admin := day_given - last_administration_day]
# Create antibiotic_new with an initial value of one, then reset it to zero as needed
antibioticDT[, antibiotic_new := 1]
antibioticDT[days_since_last_admin <= 2, antibiotic_new := 0]
antibioticDT[1:40]
patient_id day_given antibiotic_type route last_administration_day
<int> <int> <char> <char> <int>
1: 1 2 ciprofloxacin IV NA
2: 1 4 ciprofloxacin IV 2
3: 1 6 ciprofloxacin IV 4
4: 1 18 ciprofloxacin IV 6
5: 1 7 doxycycline IV NA
6: 1 9 doxycycline IV 7
7: 1 16 doxycycline IV 9
8: 1 15 penicillin IV NA
9: 8 1 doxycycline PO NA
10: 8 3 doxycycline IV 1
11: 8 6 doxycycline PO 3
12: 8 2 penicillin IV NA
13: 8 8 penicillin PO 2
14: 8 12 penicillin IV 8
15: 9 8 doxycycline IV NA
16: 9 12 doxycycline PO 8
17: 12 4 doxycycline PO NA
18: 12 9 doxycycline IV 4
19: 16 4 amoxicillin IV NA
20: 16 1 doxycycline IV NA
21: 19 5 amoxicillin IV NA
22: 19 6 ciprofloxacin IV NA
23: 19 3 doxycycline PO NA
24: 19 10 doxycycline IV 3
25: 19 12 penicillin IV NA
26: 23 3 amoxicillin IV NA
27: 23 8 amoxicillin IV 3
28: 23 10 amoxicillin PO 8
29: 23 3 ciprofloxacin IV NA
30: 23 5 ciprofloxacin PO 3
31: 23 16 ciprofloxacin IV 5
32: 23 1 doxycycline IV NA
33: 23 3 doxycycline IV 1
34: 23 4 doxycycline IV 3
35: 23 5 doxycycline IV 4
36: 23 6 doxycycline IV 5
37: 23 6 doxycycline PO 6
38: 23 9 doxycycline PO 6
39: 23 10 doxycycline IV 9
40: 23 11 doxycycline PO 10
patient_id day_given antibiotic_type route last_administration_day
days_since_last_admin antibiotic_new
<int> <num>
1: NA 1
2: 2 0
3: 2 0
4: 12 1
5: NA 1
6: 2 0
7: 7 1
8: NA 1
9: NA 1
10: 2 0
11: 3 1
12: NA 1
13: 6 1
14: 4 1
15: NA 1
16: 4 1
17: NA 1
18: 5 1
19: NA 1
20: NA 1
21: NA 1
22: NA 1
23: NA 1
24: 7 1
25: NA 1
26: NA 1
27: 5 1
28: 2 0
29: NA 1
30: 2 0
31: 11 1
32: NA 1
33: 2 0
34: 1 0
35: 1 0
36: 1 0
37: 0 0
38: 3 1
39: 1 0
40: 1 0
days_since_last_admin antibiotic_new
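To see how the grouped lag behaves, here is a minimal sketch on a made-up table. The name `toy_abx`, the drug names, and all values are invented for illustration and do not come from antibioticDT.csv. It assumes the rows are already sorted by day within each patient and drug, as they are after the `setorder()` call.

```r
library(data.table)

# Hypothetical toy data: one patient, two drugs, already sorted by drug and day
toy_abx <- data.table(
  patient_id      = c(1L, 1L, 1L, 1L, 1L),
  antibiotic_type = c("drug_a", "drug_a", "drug_a", "drug_b", "drug_b"),
  day_given       = c(2L, 4L, 9L, 3L, 4L)
)

# Previous administration day of the same drug for the same patient;
# the first row of each group has no predecessor, so shift() returns NA
toy_abx[, last_day := shift(day_given, n = 1, type = "lag"),
        by = .(patient_id, antibiotic_type)]
toy_abx[, days_since := day_given - last_day]

# NA <= 2 evaluates to NA, and data.table treats NA in `i` as FALSE,
# so first administrations keep the initial value of 1
toy_abx[, antibiotic_new := 1]
toy_abx[days_since <= 2, antibiotic_new := 0]

print(toy_abx)
```

Note that a first-ever administration counts as "new" only because the `NA` comparison leaves the initial value untouched, which is why the flag is initialized to one before the reset.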
3. Looking at the blood culture data
Now let’s look at blood culture data from the same two-week period in this hospital. These data are in blood_cultureDT.csv. Let’s start by reading it into the workspace and having a look at a few rows.
Each row represents one blood culture and gives the patient’s id and the day the blood culture test occurred. For example, patient “8” had a blood culture on the second day of their hospitalization and again on the thirteenth day. Notice that some patients from the antibiotic dataset are not in this dataset and vice versa. Some patients are in neither because they received neither antibiotics nor a blood culture.
Code
# Read in blood_cultureDT.csv
blood_cultureDT <- fread("blood_cultureDT.csv")
# Print the first 30 rows
blood_cultureDT[1:30]
4. Combine the antibiotic data and the blood culture data
To find which antibiotics were given close to a blood culture test, we need to combine the drug administration data with the blood culture data. We’ll keep only patients that are still candidates for infection—only those in both data sets.
A challenge with the data is that some patients had blood cultures on several different days. For each of those days, we will see if there is a sequence of antibiotic days close to them. To accomplish this, in the merge we will match each blood culture to each antibiotic day.
After sorting the data following the merge, you will see that each patient’s antibiotic sequence repeats for each blood culture day. This repetition allows us to look at each blood culture day and check if it is associated with a qualifying sequence of antibiotics.
Code
# Merge antibioticDT with blood_cultureDT
combinedDT <- merge(antibioticDT, blood_cultureDT, by = "patient_id", all = FALSE)
# Sort by patient_id, blood_culture_day, day_given, and antibiotic_type
setorder(combinedDT, patient_id, blood_culture_day, day_given, antibiotic_type)
# Print and examine the first 30 rows
combinedDT[1:30]
Key: <patient_id>
patient_id day_given antibiotic_type route last_administration_day
<int> <int> <char> <char> <int>
1: 1 2 ciprofloxacin IV NA
2: 1 4 ciprofloxacin IV 2
3: 1 6 ciprofloxacin IV 4
4: 1 7 doxycycline IV NA
5: 1 9 doxycycline IV 7
6: 1 15 penicillin IV NA
7: 1 16 doxycycline IV 9
8: 1 18 ciprofloxacin IV 6
9: 1 2 ciprofloxacin IV NA
10: 1 4 ciprofloxacin IV 2
11: 1 6 ciprofloxacin IV 4
12: 1 7 doxycycline IV NA
13: 1 9 doxycycline IV 7
14: 1 15 penicillin IV NA
15: 1 16 doxycycline IV 9
16: 1 18 ciprofloxacin IV 6
17: 8 1 doxycycline PO NA
18: 8 2 penicillin IV NA
19: 8 3 doxycycline IV 1
20: 8 6 doxycycline PO 3
21: 8 8 penicillin PO 2
22: 8 12 penicillin IV 8
23: 8 1 doxycycline PO NA
24: 8 2 penicillin IV NA
25: 8 3 doxycycline IV 1
26: 8 6 doxycycline PO 3
27: 8 8 penicillin PO 2
28: 8 12 penicillin IV 8
29: 23 1 doxycycline IV NA
30: 23 1 penicillin IV NA
patient_id day_given antibiotic_type route last_administration_day
days_since_last_admin antibiotic_new blood_culture_day
<int> <num> <int>
1: NA 1 3
2: 2 0 3
3: 2 0 3
4: NA 1 3
5: 2 0 3
6: NA 1 3
7: 7 1 3
8: 12 1 3
9: NA 1 13
10: 2 0 13
11: 2 0 13
12: NA 1 13
13: 2 0 13
14: NA 1 13
15: 7 1 13
16: 12 1 13
17: NA 1 2
18: NA 1 2
19: 2 0 2
20: 3 1 2
21: 6 1 2
22: 4 1 2
23: NA 1 13
24: NA 1 13
25: 2 0 13
26: 3 1 13
27: 6 1 13
28: 4 1 13
29: NA 1 3
30: NA 1 3
days_since_last_admin antibiotic_new blood_culture_day
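The repetition in the merged output comes from matching every antibiotic row to every blood culture row for the same patient. The sketch below reproduces that on two tiny made-up tables (`toy_abx`, `toy_bcx`, and their values are hypothetical). It passes `allow.cartesian = TRUE` to signal that the many-to-many match is intended, since data.table may otherwise stop a join whose result is larger than the two inputs combined.

```r
library(data.table)

# Hypothetical toy tables (values invented for illustration)
toy_abx <- data.table(patient_id = c(1L, 1L, 1L, 2L),
                      day_given  = c(2L, 4L, 6L, 1L))
toy_bcx <- data.table(patient_id        = c(1L, 1L, 3L),
                      blood_culture_day = c(3L, 13L, 5L))

# Inner join (all = FALSE): patient 2 has no culture and patient 3 has no
# antibiotics, so both drop out. Patient 1's three antibiotic days are paired
# with each of the two culture days, giving 3 x 2 = 6 rows.
toy_combined <- merge(toy_abx, toy_bcx, by = "patient_id",
                      all = FALSE, allow.cartesian = TRUE)
setorder(toy_combined, patient_id, blood_culture_day, day_given)

print(toy_combined)
```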
5. Determine whether each row is in-window
Now that we have the antibiotic and blood culture data combined, we can test each drug administration against each blood culture to see if it’s “in the window.”
Code
# Make a new variable called drug_in_bcx_window
combinedDT[, drug_in_bcx_window := as.numeric(
  day_given - blood_culture_day <= 2 &
  day_given - blood_culture_day >= -2)]
combinedDT[1:5]
Key: <patient_id>
patient_id day_given antibiotic_type route last_administration_day
<int> <int> <char> <char> <int>
1: 1 2 ciprofloxacin IV NA
2: 1 4 ciprofloxacin IV 2
3: 1 6 ciprofloxacin IV 4
4: 1 7 doxycycline IV NA
5: 1 9 doxycycline IV 7
days_since_last_admin antibiotic_new blood_culture_day drug_in_bcx_window
<int> <num> <int> <num>
1: NA 1 3 1
2: 2 0 3 1
3: 2 0 3 0
4: NA 1 3 0
5: 2 0 3 0
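As a quick check of the window logic, the sketch below recomputes the flag for the five rows printed above (patient 1's antibiotic days against the day-3 blood culture). The table name `toy_window` and the second column are only for illustration; the `abs()` form is an equivalent way to write the two-sided comparison, not the form used in this project.

```r
library(data.table)

# Patient 1's first five antibiotic days and the day-3 blood culture
toy_window <- data.table(day_given = c(2L, 4L, 6L, 7L, 9L),
                         blood_culture_day = 3L)

# Two-sided comparison, as in the project code
toy_window[, in_window_two_sided := as.numeric(
  day_given - blood_culture_day <= 2 &
  day_given - blood_culture_day >= -2)]

# Equivalent one-liner: the absolute distance is at most two days
toy_window[, in_window_abs := as.numeric(abs(day_given - blood_culture_day) <= 2)]

print(toy_window)
```

Days 2 and 4 are within two days of day 3, so they are flagged; days 6, 7, and 9 are not.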
6. Check the I.V. requirement
Now let’s look at the fourth item in the criteria.
The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.
The sequence must start with a new antibiotic, defined as an antibiotic type that was not given in the previous two days.
The sequence must start within two days of a blood culture.
There must be at least one intravenous (I.V.) antibiotic within the +/-2 day window.
Code
# Create a variable indicating if there was at least one I.V. drug given in the window
combinedDT[, any_iv_in_bcx_window :=
             as.numeric(any(route == 'IV' & drug_in_bcx_window == 1)),
           by = .(patient_id, blood_culture_day)]
# Exclude rows in which the blood_culture_day does not have any I.V. drugs in window
combinedDT <- combinedDT[any_iv_in_bcx_window == 1]
combinedDT[1:5]
Key: <patient_id>
patient_id day_given antibiotic_type route last_administration_day
<int> <int> <char> <char> <int>
1: 1 2 ciprofloxacin IV NA
2: 1 4 ciprofloxacin IV 2
3: 1 6 ciprofloxacin IV 4
4: 1 7 doxycycline IV NA
5: 1 9 doxycycline IV 7
days_since_last_admin antibiotic_new blood_culture_day drug_in_bcx_window
<int> <num> <int> <num>
1: NA 1 3 1
2: 2 0 3 1
3: 2 0 3 0
4: NA 1 3 0
5: 2 0 3 0
any_iv_in_bcx_window
<num>
1: 1
2: 1
3: 1
4: 1
5: 1
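The I.V. flag is computed once per patient and blood culture day, so it is worth seeing what it does to whole groups. In this minimal sketch (the table `toy_iv` and its values are invented), the first culture has an I.V. drug inside the window, while the second has an I.V. drug only outside the window and an oral drug inside it.

```r
library(data.table)

# Hypothetical toy data: one patient, two blood culture days
toy_iv <- data.table(
  patient_id         = c(1L, 1L, 1L, 1L),
  blood_culture_day  = c(3L, 3L, 13L, 13L),
  route              = c("IV", "PO", "PO", "IV"),
  drug_in_bcx_window = c(1, 0, 1, 0)
)

# Both conditions must hold on the same row for the group to qualify
toy_iv[, any_iv_in_bcx_window :=
         as.numeric(any(route == "IV" & drug_in_bcx_window == 1)),
       by = .(patient_id, blood_culture_day)]

# Keep every row of a qualifying culture, including its non-I.V. rows
toy_kept <- toy_iv[any_iv_in_bcx_window == 1]

print(toy_iv)
print(toy_kept)
```

The day-13 culture is dropped entirely because its only I.V. row falls outside the window.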
7. Find the first day of possible sequences
We’re getting close! Let’s review the criteria again.
The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.
The sequence must start with a new antibiotic, defined as an antibiotic type that was not given in the previous two days.
The sequence must start within two days of a blood culture.
There must be at least one intravenous (I.V.) antibiotic within the +/-2 day window.
Let’s assess the first criterion by finding the first day of possible 4-day qualifying sequences.
Code
# Create a new variable called day_of_first_new_abx_in_window
combinedDT[, day_of_first_new_abx_in_window :=
             day_given[antibiotic_new == 1 & drug_in_bcx_window == 1][1],
           by = .(patient_id, blood_culture_day)]
# Remove rows where the day is before this first qualifying day
combinedDT <- combinedDT[day_given >= day_of_first_new_abx_in_window]
combinedDT[1:5]
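The `[...][1]` idiom picks the first matching day in each group and yields `NA` when nothing matches. The sketch below illustrates both cases on made-up data (`toy_first` and its values are hypothetical), assuming rows are sorted by day within each group as they are after the earlier `setorder()` call.

```r
library(data.table)

# Hypothetical toy data: patient 1 has a new in-window drug on day 4,
# patient 2 has no new drug in the window at all
toy_first <- data.table(
  patient_id         = c(1L, 1L, 1L, 2L, 2L),
  blood_culture_day  = c(5L, 5L, 5L, 9L, 9L),
  day_given          = c(2L, 4L, 6L, 8L, 9L),
  antibiotic_new     = c(1, 1, 0, 0, 0),
  drug_in_bcx_window = c(0, 1, 1, 1, 1)
)

# First day that is both new and in-window; NA if the group has none
toy_first[, day_of_first_new_abx_in_window :=
            day_given[antibiotic_new == 1 & drug_in_bcx_window == 1][1],
          by = .(patient_id, blood_culture_day)]
print(toy_first)

# Comparisons with NA are NA, which data.table treats as FALSE in `i`,
# so groups without a qualifying start day are removed entirely
toy_first <- toy_first[day_given >= day_of_first_new_abx_in_window]
print(toy_first)
```

Patient 1's day-2 row is new but outside the window, so the sequence starts on day 4; patient 2 disappears because no start day exists.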
The first criterion is: The patient receives antibiotics for a sequence of four days, with gaps of one day allowed.
We’ve pinned down the first day of possible sequences in the previous task. Now we have to check for four-day sequences. We don’t need the drug type (name); we need the days the antibiotics were administered.
Code
# Create a new data.table containing only patient_id, blood_culture_day, and day_given
simplified_data <- combinedDT[, .(patient_id, blood_culture_day, day_given)]
# Remove duplicate rows
simplified_data <- unique(simplified_data)
simplified_data[1:5]
To check for four-day sequences, let’s pull out the first four days (rows) for each patient/blood culture combination. Some patients will have fewer than four antibiotic days. We’ll remove them first.
Code
# Count the antibiotic days within each patient/blood culture day combination
simplified_data[, num_antibiotic_days := .N, by = .(patient_id, blood_culture_day)]
# Remove blood culture days with fewer than four rows
simplified_data <- simplified_data[num_antibiotic_days >= 4]
# Select the first four days for each blood culture
first_four_days <- simplified_data[, .SD[1:4], by = .(patient_id, blood_culture_day)]
first_four_days[1:5]
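One reason to filter before taking the first four rows: indexing `.SD[1:4]` on a group with fewer than four rows pads the result with `NA` rows. The following sketch shows this on an invented table (`toy_days`, `culture_id`, and the values are hypothetical stand-ins for the patient/blood culture grouping).

```r
library(data.table)

# Hypothetical toy data: group "a" has five antibiotic days, group "b" has two
toy_days <- data.table(
  culture_id = c(rep("a", 5), rep("b", 2)),
  day_given  = c(1L, 2L, 3L, 5L, 6L, 4L, 5L)
)

# Without filtering, group "b" is padded to four rows with NA days
padded <- toy_days[, .SD[1:4], by = culture_id]
n_padded_na <- padded[, sum(is.na(day_given))]

# Count rows per group, drop short groups, then take the first four rows
toy_days[, num_days := .N, by = culture_id]
kept <- toy_days[num_days >= 4][, .SD[1:4], by = culture_id]

print(padded)
print(kept)
```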
Now we need to check whether each four-day sequence qualifies by having no gaps of more than one day.
Code
# Make the indicator for consecutive sequence
first_four_days[, four_in_seq := as.numeric(max(diff(day_given)) < 3),
                by = .(patient_id, blood_culture_day)]
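To make the gap rule concrete: `diff()` returns the differences between consecutive days, so a difference of 2 means exactly one skipped day, and a difference of 3 or more means the gap is too long. Here is a small worked example with invented day sequences; the helper `qualifies()` is hypothetical and simply wraps the same expression used above.

```r
# Hypothetical day sequences (invented for illustration)
days_ok  <- c(3, 4, 6, 7)   # one skipped day between 4 and 6
days_bad <- c(3, 4, 7, 8)   # two skipped days between 4 and 7

diff(days_ok)   # 1 2 1
diff(days_bad)  # 1 3 1

# Same test as four_in_seq: the largest step must be smaller than 3
qualifies <- function(days) as.numeric(max(diff(days)) < 3)

qualifies(days_ok)   # 1
qualifies(days_bad)  # 0
```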
11. Select the patients who meet criteria
A patient meets the criteria if any of their blood cultures was accompanied by a qualifying sequence of antibiotics. Now that we’ve determined whether each blood culture qualifies, let’s select the patients who meet the criteria.
Code
# Select the rows which have four_in_seq equal to 1
suspected_infection <- first_four_days[four_in_seq == 1]
# Retain only the patient_id column
suspected_infection <- suspected_infection[, .(patient_id)]
# Remove duplicates
suspected_infection <- unique(suspected_infection)
# Make an infection indicator
suspected_infection[, infection := 1]
suspected_infection[1:5]
In this project, we used two EHR datasets to flag patients who were suspected of having a serious infection. We also got a data.table workout!
So far, we’ve been looking at records of all antibiotics administered and blood cultures that occurred over two weeks at a particular hospital. However, not all patients who were hospitalized over this period are represented in combinedDT because not all of them took antibiotics or had blood culture tests. We have to read in and merge the rest of the patient information to see what percentage of patients at the hospital might have had a serious infection.
Code
# Read in "all_patients.csv"
all_patientsDT <- fread("all_patients.csv")
# Merge this with the infection flag data
all_patientsDT <- merge(all_patientsDT, suspected_infection, all = TRUE)
# Set any missing values of the infection flag to 0 (updates by reference)
all_patientsDT[is.na(infection), infection := 0]
all_patientsDT[1:10]
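With a 0/1 flag for every patient, the percentage of suspected infections is just the mean of the flag times 100. The sketch below shows the full pattern on made-up tables (`toy_patients`, `toy_flagged`, and their values are hypothetical, and the merge key `patient_id` is stated explicitly here as an assumption about the real files).

```r
library(data.table)

# Hypothetical toy data: five patients, two of them flagged
toy_patients <- data.table(patient_id = 1:5)
toy_flagged  <- data.table(patient_id = c(1L, 4L), infection = 1)

# Full outer merge keeps unflagged patients, whose flag comes back as NA
toy_all <- merge(toy_patients, toy_flagged, by = "patient_id", all = TRUE)
toy_all[is.na(infection), infection := 0]

# Percentage of patients with a suspected infection
pct_infected <- toy_all[, 100 * mean(infection)]
print(pct_infected)
```

Applied to the real data, the same expression would be `all_patientsDT[, 100 * mean(infection)]`.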