Survival Analysis using R

project
code
analysis
Author

Hamza Rahal

Published

September 5, 2024

Survival Analysis with R

Background

In the class on essential statistics we covered basic categorical data analysis – comparing proportions (risks, rates, etc) between different groups using a chi-square or fisher exact test, or logistic regression. For example, we looked at how the diabetes rate differed between males and females. In this kind of analysis you implicitly assume that the rates are constant over the period of the study, or as defined by the different groups you defined.

But, in longitudinal studies where you track samples or subjects from one time point (e.g., entry into a study, diagnosis, start of a treatment) until you observe some outcome event (e.g., death, onset of disease, relapse), it doesn’t make sense to assume the rates are constant. For example: the risk of death after heart surgery is highest immediately post-op, decreases as the patient recovers, then rises slowly again as the patient ages. Or, recurrence rate of different cancers varies highly over time, and depends on tumor genetics, treatment, and other environmental factors.

Definitions

Survival analysis lets you analyze the rates of occurrence of events over time, without assuming the rates are constant. Generally, survival analysis lets you model the time until an event occurs,1 or compare the time-to-event between different groups, or how time-to-event correlates with quantitative variables.

The hazard is the instantaneous event (death) rate at a particular time point t. Survival analysis doesn’t assume the hazard is constant over time. The cumulative hazard is the total hazard experienced up to time t.

The survival function, is the probability an individual survives (or, the probability that the event of interest does not occur) up to and including time t. It’s the probability that the event (e.g., death) hasn’t occured yet. It looks like this, where TT is the time of death, and Pr(T>t)Pr(T>t) is the probability that the time of death is greater than some time tt. SS is a probability, so 0≤S(t)≤10≤S(t)≤1, since survival times are always positive (T≥0T≥0).

S(t)=Pr(T>t)S(t)=Pr(T>t)

The Kaplan-Meier curve illustrates the survival function. It’s a step function illustrating the cumulative survival probability over time. The curve is horizontal over periods where no event occurs, then drops vertically corresponding to a change in the survival function at each time an event occurs.

Censoring is a type of missing data problem unique to survival analysis. This happens when you track the sample/subject through the end of the study and the event never occurs. This could also happen due to the sample/subject dropping out of the study for reasons other than death, or some other loss to followup. The sample is censored in that you only know that the individual survived up to the loss to followup, but you don’t know anything about survival after that.

Proportional hazards assumption: The main goal of survival analysis is to compare the survival functions in different groups, e.g., leukemia patients as compared to cancer-free controls. If you followed both groups until everyone died, both survival curves would end at 0%, but one group might have survived on average a lot longer than the other group. Survival analysis does this by comparing the hazard at different times over the observation period. Survival analysis doesn’t assume that the hazard is constant, but does assume that the ratio of hazards between groups is constant over time. This class does not cover methods to deal with non-proportional hazards, or interactions of covariates with the time to event.

Proportional hazards regression a.k.a. Cox regression is the most common approach to assess the effect of different variables on survival.

Cox PH Model

Kaplan-Meier curves are good for visualizing differences in survival between two categorical groups, but they don’t work well for assessing the effect of quantitative variables like age, gene expression, leukocyte count, etc. Cox PH regression can assess the effect of both categorical and continuous variables, and can model the effect of multiple variables at once.

Cox PH regression models the natural log of the hazard at time t, denoted h(t)h(t), as a function of the baseline hazard (h0(t)h0(t)) (the hazard for an individual where all exposure variables are 0) and multiple exposure variables x1x1, x1x1, ……, xpxp. The form of the Cox PH model is:

log(h(t))=log(h0(t))+β1x1+β2x2+…+βpxplog(h(t))=log(h0(t))+β1x1+β2x2+…+βpxp

If you exponentiate both sides of the equation, and limit the right hand side to just a single categorical exposure variable (x1x1) with two groups (x1=1x1=1 for exposed and x1=0x1=0 for unexposed), the equation becomes:

h1(t)=h0(t)×eβ1x1h1(t)=h0(t)×eβ1x1

Rearranging that equation lets you estimate the hazard ratio, comparing the exposed to the unexposed individuals at time t:

HR(t)=h1(t)h0(t)=eβ1HR(t)=h1(t)h0(t)=eβ1

This model shows that the hazard ratio is eβ1eβ1, and remains constant over time t (hence the name proportional hazards regression). The ββ values are the regression coefficients that are estimated from the model, and represent the log(HazardRatio)log(HazardRatio) for each unit increase in the corresponding predictor variable. The interpretation of the hazards ratio depends on the measurement scale of the predictor variable, but in simple terms, a positive coefficient indicates worse survival and a negative coefficient indicates better survival for the variable in question.

Survival analysis in R

The core survival analysis functions are in the survival package. The survival package is one of the few “core” packages that comes bundled with your basic R installation, so you probably didn’t need to install.packages() it. But, you’ll need to load it like any other library when you want to use it. We’ll also be using the dplyr package, so let’s load that too. Finally, we’ll also want to load the survminer package, which provides much nicer Kaplan-Meier plots out-of-the-box than what you get out of base graphics.

Code
library(dplyr)

Attachement du package : 'dplyr'
Les objets suivants sont masqués depuis 'package:stats':

    filter, lag
Les objets suivants sont masqués depuis 'package:base':

    intersect, setdiff, setequal, union
Code
library(survival)
library(survminer)
Le chargement a nécessité le package : ggplot2
Le chargement a nécessité le package : ggpubr

Attachement du package : 'survminer'
L'objet suivant est masqué depuis 'package:survival':

    myeloma

The core functions we’ll use out of the survival package include:

  • Surv(): Creates a survival object.

  • survfit(): Fits a survival curve using either a formula, of from a previously fitted Cox model.

  • coxph(): Fits a Cox proportional hazards regression model.

Other optional functions you might use include:

  • cox.zph(): Tests the proportional hazards assumption of a Cox regression model.

  • survdiff(): Tests for differences in survival between two groups using a log-rank / Mantel-Haenszel test.

Surv() creates the response variable, and typical usage takes the time to event, and whether or not the event occured (i.e., death vs censored). survfit() creates a survival curve that you could then display or plot. coxph() implements the regression analysis, and models specified the same way as in regular linear models, but using the coxph() function.

Getting started

We’re going to be using the built-in lung cancer dataset that ships with the survival package. You can get some more information about the dataset by running ?lung. The help tells us there are 10 variables in this data:

Code
library(survival)
?lung
démarrage du serveur d'aide httpd ... fini
  1. inst: Institution code

  2. time: Survival time in days

  3. status: censoring status 1=censored, 2=dead

  4. age: Age in years

  5. sex: Male=1 Female=2

  6. ph.ecog: ECOG performance score (0=good 5=dead)

  7. ph.karno: Karnofsky performance score as rated by physician

  8. pat.karno: Karnofsky performance score as rated by patient

  9. meal.cal: Calories consumed at meals

  10. wt.loss: Weight loss in last six months

You can access the data just by running lung, as if you had read in a dataset and called it lung. You can operate on it just like any other data frame.

Code
head(lung)
  inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1    3  306      2  74   1       1       90       100     1175      NA
2    3  455      2  68   1       0       90        90     1225      15
3    3 1010      1  56   1       0       90        90       NA      15
4    5  210      2  57   1       1       90        60     1150      11
5    1  883      2  60   1       0      100        90       NA       0
6   12 1022      1  74   1       1       50        80      513       0
Code
class(lung)
[1] "data.frame"
Code
dim(lung)
[1] 228  10
Code
View(lung)

Notice that lung is a plain data.frame object. You could see what it looks like as a tibble (prints nicely, tells you the type of variable each column is). You could then reassign lung to the as_tibble()-ified version.

Code
as_tibble(lung)
# A tibble: 228 × 10
    inst  time status   age   sex ph.ecog ph.karno pat.karno meal.cal wt.loss
   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>    <dbl>     <dbl>    <dbl>   <dbl>
 1     3   306      2    74     1       1       90       100     1175      NA
 2     3   455      2    68     1       0       90        90     1225      15
 3     3  1010      1    56     1       0       90        90       NA      15
 4     5   210      2    57     1       1       90        60     1150      11
 5     1   883      2    60     1       0      100        90       NA       0
 6    12  1022      1    74     1       1       50        80      513       0
 7     7   310      2    68     2       2       70        60      384      10
 8    11   361      2    71     2       2       60        80      538       1
 9     1   218      2    53     1       1       70        80      825      16
10     7   166      2    61     1       2       70        70      271      34
# ℹ 218 more rows
Code
lung <- as_tibble(lung)
lung
# A tibble: 228 × 10
    inst  time status   age   sex ph.ecog ph.karno pat.karno meal.cal wt.loss
   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>    <dbl>     <dbl>    <dbl>   <dbl>
 1     3   306      2    74     1       1       90       100     1175      NA
 2     3   455      2    68     1       0       90        90     1225      15
 3     3  1010      1    56     1       0       90        90       NA      15
 4     5   210      2    57     1       1       90        60     1150      11
 5     1   883      2    60     1       0      100        90       NA       0
 6    12  1022      1    74     1       1       50        80      513       0
 7     7   310      2    68     2       2       70        60      384      10
 8    11   361      2    71     2       2       60        80      538       1
 9     1   218      2    53     1       1       70        80      825      16
10     7   166      2    61     1       2       70        70      271      34
# ℹ 218 more rows

Survival Curves

Check out the help for ?Surv. This is the main function we’ll use to create the survival object. You can play fast and loose with how you specify the arguments to Surv. The help tells you that when there are two unnamed arguments, they will match time and event in that order. This is the common shorthand you’ll often see for right-censored data. The alternative lets you specify interval data, where you give it the start and end times (time and time2). If you keep reading you’ll see how Surv tries to guess how you’re coding the status variable. It will try to guess whether you’re using 0/1 or 1/2 to represent censored vs “dead”, respectively.

Try creating a survival object called s, then display it. If you go back and head(lung) the data, you can see how these are related. It’s a special type of vector that tells you both how long the subject was tracked for, and whether or not the event occured or the sample was censored (shown by the +).

Code
s <- Surv(lung$time, lung$status)
class(s)
[1] "Surv"
Code
s
  [1]  306   455  1010+  210   883  1022+  310   361   218   166   170   654 
 [13]  728    71   567   144   613   707    61    88   301    81   624   371 
 [25]  394   520   574   118   390    12   473    26   533   107    53   122 
 [37]  814   965+   93   731   460   153   433   145   583    95   303   519 
 [49]  643   765   735   189    53   246   689    65     5   132   687   345 
 [61]  444   223   175    60   163    65   208   821+  428   230   840+  305 
 [73]   11   132   226   426   705   363    11   176   791    95   196+  167 
 [85]  806+  284   641   147   740+  163   655   239    88   245   588+   30 
 [97]  179   310   477   166   559+  450   364   107   177   156   529+   11 
[109]  429   351    15   181   283   201   524    13   212   524   288   363 
[121]  442   199   550    54   558   207    92    60   551+  543+  293   202 
[133]  353   511+  267   511+  371   387   457   337   201   404+  222    62 
[145]  458+  356+  353   163    31   340   229   444+  315+  182   156   329 
[157]  364+  291   179   376+  384+  268   292+  142   413+  266+  194   320 
[169]  181   285   301+  348   197   382+  303+  296+  180   186   145   269+
[181]  300+  284+  350   272+  292+  332+  285   259+  110   286   270    81 
[193]  131   225+  269   225+  243+  279+  276+  135    79    59   240+  202+
[205]  235+  105   224+  239   237+  173+  252+  221+  185+   92+   13   222+
[217]  192+  183   211+  175+  197+  203+  116   188+  191+  105+  174+  177+
Code
head(lung)
# A tibble: 6 × 10
   inst  time status   age   sex ph.ecog ph.karno pat.karno meal.cal wt.loss
  <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>    <dbl>     <dbl>    <dbl>   <dbl>
1     3   306      2    74     1       1       90       100     1175      NA
2     3   455      2    68     1       0       90        90     1225      15
3     3  1010      1    56     1       0       90        90       NA      15
4     5   210      2    57     1       1       90        60     1150      11
5     1   883      2    60     1       0      100        90       NA       0
6    12  1022      1    74     1       1       50        80      513       0

Now, let’s fit a survival curve with the survfit() function. See the help for ?survfit. Here we’ll create a simple survival curve that doesn’t consider any different groupings, so we’ll specify just an intercept (e.g., ~1) in the formula that survfit expects. We can do what we just did by “modeling” the survival object s we just created against an intercept only, but from here out, we’ll just do this in one step by nesting the Surv() call within the survfit() call, and similar to how we specify data for linear models with lm(), we’ll use the data= argument to specify which data we’re using. Similarly, we can assign that to another object called sfit (or whatever we wanted to call it).

Code
survfit(s~1)
Call: survfit(formula = s ~ 1)

       n events median 0.95LCL 0.95UCL
[1,] 228    165    310     285     363
Code
survfit(Surv(time, status)~1, data=lung)
Call: survfit(formula = Surv(time, status) ~ 1, data = lung)

       n events median 0.95LCL 0.95UCL
[1,] 228    165    310     285     363
Code
sfit <- survfit(Surv(time, status)~1, data=lung)
sfit
Call: survfit(formula = Surv(time, status) ~ 1, data = lung)

       n events median 0.95LCL 0.95UCL
[1,] 228    165    310     285     363

Now, that object itself isn’t very interesting. It’s more interesting to run summary on what it creates. This will show a life table.

Code
summary(sfit)
Call: survfit(formula = Surv(time, status) ~ 1, data = lung)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    5    228       1   0.9956 0.00438       0.9871        1.000
   11    227       3   0.9825 0.00869       0.9656        1.000
   12    224       1   0.9781 0.00970       0.9592        0.997
   13    223       2   0.9693 0.01142       0.9472        0.992
   15    221       1   0.9649 0.01219       0.9413        0.989
   26    220       1   0.9605 0.01290       0.9356        0.986
   30    219       1   0.9561 0.01356       0.9299        0.983
   31    218       1   0.9518 0.01419       0.9243        0.980
   53    217       2   0.9430 0.01536       0.9134        0.974
   54    215       1   0.9386 0.01590       0.9079        0.970
   59    214       1   0.9342 0.01642       0.9026        0.967
   60    213       2   0.9254 0.01740       0.8920        0.960
   61    211       1   0.9211 0.01786       0.8867        0.957
   62    210       1   0.9167 0.01830       0.8815        0.953
   65    209       2   0.9079 0.01915       0.8711        0.946
   71    207       1   0.9035 0.01955       0.8660        0.943
   79    206       1   0.8991 0.01995       0.8609        0.939
   81    205       2   0.8904 0.02069       0.8507        0.932
   88    203       2   0.8816 0.02140       0.8406        0.925
   92    201       1   0.8772 0.02174       0.8356        0.921
   93    199       1   0.8728 0.02207       0.8306        0.917
   95    198       2   0.8640 0.02271       0.8206        0.910
  105    196       1   0.8596 0.02302       0.8156        0.906
  107    194       2   0.8507 0.02362       0.8056        0.898
  110    192       1   0.8463 0.02391       0.8007        0.894
  116    191       1   0.8418 0.02419       0.7957        0.891
  118    190       1   0.8374 0.02446       0.7908        0.887
  122    189       1   0.8330 0.02473       0.7859        0.883
  131    188       1   0.8285 0.02500       0.7810        0.879
  132    187       2   0.8197 0.02550       0.7712        0.871
  135    185       1   0.8153 0.02575       0.7663        0.867
  142    184       1   0.8108 0.02598       0.7615        0.863
  144    183       1   0.8064 0.02622       0.7566        0.859
  145    182       2   0.7975 0.02667       0.7469        0.852
  147    180       1   0.7931 0.02688       0.7421        0.848
  153    179       1   0.7887 0.02710       0.7373        0.844
  156    178       2   0.7798 0.02751       0.7277        0.836
  163    176       3   0.7665 0.02809       0.7134        0.824
  166    173       2   0.7577 0.02845       0.7039        0.816
  167    171       1   0.7532 0.02863       0.6991        0.811
  170    170       1   0.7488 0.02880       0.6944        0.807
  175    167       1   0.7443 0.02898       0.6896        0.803
  176    165       1   0.7398 0.02915       0.6848        0.799
  177    164       1   0.7353 0.02932       0.6800        0.795
  179    162       2   0.7262 0.02965       0.6704        0.787
  180    160       1   0.7217 0.02981       0.6655        0.783
  181    159       2   0.7126 0.03012       0.6559        0.774
  182    157       1   0.7081 0.03027       0.6511        0.770
  183    156       1   0.7035 0.03041       0.6464        0.766
  186    154       1   0.6989 0.03056       0.6416        0.761
  189    152       1   0.6943 0.03070       0.6367        0.757
  194    149       1   0.6897 0.03085       0.6318        0.753
  197    147       1   0.6850 0.03099       0.6269        0.749
  199    145       1   0.6803 0.03113       0.6219        0.744
  201    144       2   0.6708 0.03141       0.6120        0.735
  202    142       1   0.6661 0.03154       0.6071        0.731
  207    139       1   0.6613 0.03168       0.6020        0.726
  208    138       1   0.6565 0.03181       0.5970        0.722
  210    137       1   0.6517 0.03194       0.5920        0.717
  212    135       1   0.6469 0.03206       0.5870        0.713
  218    134       1   0.6421 0.03218       0.5820        0.708
  222    132       1   0.6372 0.03231       0.5769        0.704
  223    130       1   0.6323 0.03243       0.5718        0.699
  226    126       1   0.6273 0.03256       0.5666        0.694
  229    125       1   0.6223 0.03268       0.5614        0.690
  230    124       1   0.6172 0.03280       0.5562        0.685
  239    121       2   0.6070 0.03304       0.5456        0.675
  245    117       1   0.6019 0.03316       0.5402        0.670
  246    116       1   0.5967 0.03328       0.5349        0.666
  267    112       1   0.5913 0.03341       0.5294        0.661
  268    111       1   0.5860 0.03353       0.5239        0.656
  269    110       1   0.5807 0.03364       0.5184        0.651
  270    108       1   0.5753 0.03376       0.5128        0.645
  283    104       1   0.5698 0.03388       0.5071        0.640
  284    103       1   0.5642 0.03400       0.5014        0.635
  285    101       2   0.5531 0.03424       0.4899        0.624
  286     99       1   0.5475 0.03434       0.4841        0.619
  288     98       1   0.5419 0.03444       0.4784        0.614
  291     97       1   0.5363 0.03454       0.4727        0.608
  293     94       1   0.5306 0.03464       0.4669        0.603
  301     91       1   0.5248 0.03475       0.4609        0.597
  303     89       1   0.5189 0.03485       0.4549        0.592
  305     87       1   0.5129 0.03496       0.4488        0.586
  306     86       1   0.5070 0.03506       0.4427        0.581
  310     85       2   0.4950 0.03523       0.4306        0.569
  320     82       1   0.4890 0.03532       0.4244        0.563
  329     81       1   0.4830 0.03539       0.4183        0.558
  337     79       1   0.4768 0.03547       0.4121        0.552
  340     78       1   0.4707 0.03554       0.4060        0.546
  345     77       1   0.4646 0.03560       0.3998        0.540
  348     76       1   0.4585 0.03565       0.3937        0.534
  350     75       1   0.4524 0.03569       0.3876        0.528
  351     74       1   0.4463 0.03573       0.3815        0.522
  353     73       2   0.4340 0.03578       0.3693        0.510
  361     70       1   0.4278 0.03581       0.3631        0.504
  363     69       2   0.4154 0.03583       0.3508        0.492
  364     67       1   0.4092 0.03582       0.3447        0.486
  371     65       2   0.3966 0.03581       0.3323        0.473
  387     60       1   0.3900 0.03582       0.3258        0.467
  390     59       1   0.3834 0.03582       0.3193        0.460
  394     58       1   0.3768 0.03580       0.3128        0.454
  426     55       1   0.3700 0.03580       0.3060        0.447
  428     54       1   0.3631 0.03579       0.2993        0.440
  429     53       1   0.3563 0.03576       0.2926        0.434
  433     52       1   0.3494 0.03573       0.2860        0.427
  442     51       1   0.3426 0.03568       0.2793        0.420
  444     50       1   0.3357 0.03561       0.2727        0.413
  450     48       1   0.3287 0.03555       0.2659        0.406
  455     47       1   0.3217 0.03548       0.2592        0.399
  457     46       1   0.3147 0.03539       0.2525        0.392
  460     44       1   0.3076 0.03530       0.2456        0.385
  473     43       1   0.3004 0.03520       0.2388        0.378
  477     42       1   0.2933 0.03508       0.2320        0.371
  519     39       1   0.2857 0.03498       0.2248        0.363
  520     38       1   0.2782 0.03485       0.2177        0.356
  524     37       2   0.2632 0.03455       0.2035        0.340
  533     34       1   0.2554 0.03439       0.1962        0.333
  550     32       1   0.2475 0.03423       0.1887        0.325
  558     30       1   0.2392 0.03407       0.1810        0.316
  567     28       1   0.2307 0.03391       0.1729        0.308
  574     27       1   0.2221 0.03371       0.1650        0.299
  583     26       1   0.2136 0.03348       0.1571        0.290
  613     24       1   0.2047 0.03325       0.1489        0.281
  624     23       1   0.1958 0.03297       0.1407        0.272
  641     22       1   0.1869 0.03265       0.1327        0.263
  643     21       1   0.1780 0.03229       0.1247        0.254
  654     20       1   0.1691 0.03188       0.1169        0.245
  655     19       1   0.1602 0.03142       0.1091        0.235
  687     18       1   0.1513 0.03090       0.1014        0.226
  689     17       1   0.1424 0.03034       0.0938        0.216
  705     16       1   0.1335 0.02972       0.0863        0.207
  707     15       1   0.1246 0.02904       0.0789        0.197
  728     14       1   0.1157 0.02830       0.0716        0.187
  731     13       1   0.1068 0.02749       0.0645        0.177
  735     12       1   0.0979 0.02660       0.0575        0.167
  765     10       1   0.0881 0.02568       0.0498        0.156
  791      9       1   0.0783 0.02462       0.0423        0.145
  814      7       1   0.0671 0.02351       0.0338        0.133
  883      4       1   0.0503 0.02285       0.0207        0.123

These tables show a row for each time point where either the event occured or a sample was censored. It shows the number at risk (number still remaining), and the cumulative survival at that instant.

What’s more interesting though is if we model something besides just an intercept. Let’s fit survival curves separately by sex.

Code
sfit <- survfit(Surv(time, status)~sex, data=lung)
sfit
Call: survfit(formula = Surv(time, status) ~ sex, data = lung)

        n events median 0.95LCL 0.95UCL
sex=1 138    112    270     212     310
sex=2  90     53    426     348     550
Code
summary(sfit)
Call: survfit(formula = Surv(time, status) ~ sex, data = lung)

                sex=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   11    138       3   0.9783  0.0124       0.9542        1.000
   12    135       1   0.9710  0.0143       0.9434        0.999
   13    134       2   0.9565  0.0174       0.9231        0.991
   15    132       1   0.9493  0.0187       0.9134        0.987
   26    131       1   0.9420  0.0199       0.9038        0.982
   30    130       1   0.9348  0.0210       0.8945        0.977
   31    129       1   0.9275  0.0221       0.8853        0.972
   53    128       2   0.9130  0.0240       0.8672        0.961
   54    126       1   0.9058  0.0249       0.8583        0.956
   59    125       1   0.8986  0.0257       0.8496        0.950
   60    124       1   0.8913  0.0265       0.8409        0.945
   65    123       2   0.8768  0.0280       0.8237        0.933
   71    121       1   0.8696  0.0287       0.8152        0.928
   81    120       1   0.8623  0.0293       0.8067        0.922
   88    119       2   0.8478  0.0306       0.7900        0.910
   92    117       1   0.8406  0.0312       0.7817        0.904
   93    116       1   0.8333  0.0317       0.7734        0.898
   95    115       1   0.8261  0.0323       0.7652        0.892
  105    114       1   0.8188  0.0328       0.7570        0.886
  107    113       1   0.8116  0.0333       0.7489        0.880
  110    112       1   0.8043  0.0338       0.7408        0.873
  116    111       1   0.7971  0.0342       0.7328        0.867
  118    110       1   0.7899  0.0347       0.7247        0.861
  131    109       1   0.7826  0.0351       0.7167        0.855
  132    108       2   0.7681  0.0359       0.7008        0.842
  135    106       1   0.7609  0.0363       0.6929        0.835
  142    105       1   0.7536  0.0367       0.6851        0.829
  144    104       1   0.7464  0.0370       0.6772        0.823
  147    103       1   0.7391  0.0374       0.6694        0.816
  156    102       2   0.7246  0.0380       0.6538        0.803
  163    100       3   0.7029  0.0389       0.6306        0.783
  166     97       1   0.6957  0.0392       0.6230        0.777
  170     96       1   0.6884  0.0394       0.6153        0.770
  175     94       1   0.6811  0.0397       0.6076        0.763
  176     93       1   0.6738  0.0399       0.5999        0.757
  177     92       1   0.6664  0.0402       0.5922        0.750
  179     91       2   0.6518  0.0406       0.5769        0.736
  180     89       1   0.6445  0.0408       0.5693        0.730
  181     88       2   0.6298  0.0412       0.5541        0.716
  183     86       1   0.6225  0.0413       0.5466        0.709
  189     83       1   0.6150  0.0415       0.5388        0.702
  197     80       1   0.6073  0.0417       0.5309        0.695
  202     78       1   0.5995  0.0419       0.5228        0.687
  207     77       1   0.5917  0.0420       0.5148        0.680
  210     76       1   0.5839  0.0422       0.5068        0.673
  212     75       1   0.5762  0.0424       0.4988        0.665
  218     74       1   0.5684  0.0425       0.4909        0.658
  222     72       1   0.5605  0.0426       0.4829        0.651
  223     70       1   0.5525  0.0428       0.4747        0.643
  229     67       1   0.5442  0.0429       0.4663        0.635
  230     66       1   0.5360  0.0431       0.4579        0.627
  239     64       1   0.5276  0.0432       0.4494        0.619
  246     63       1   0.5192  0.0433       0.4409        0.611
  267     61       1   0.5107  0.0434       0.4323        0.603
  269     60       1   0.5022  0.0435       0.4238        0.595
  270     59       1   0.4937  0.0436       0.4152        0.587
  283     57       1   0.4850  0.0437       0.4065        0.579
  284     56       1   0.4764  0.0438       0.3979        0.570
  285     54       1   0.4676  0.0438       0.3891        0.562
  286     53       1   0.4587  0.0439       0.3803        0.553
  288     52       1   0.4499  0.0439       0.3716        0.545
  291     51       1   0.4411  0.0439       0.3629        0.536
  301     48       1   0.4319  0.0440       0.3538        0.527
  303     46       1   0.4225  0.0440       0.3445        0.518
  306     44       1   0.4129  0.0440       0.3350        0.509
  310     43       1   0.4033  0.0441       0.3256        0.500
  320     42       1   0.3937  0.0440       0.3162        0.490
  329     41       1   0.3841  0.0440       0.3069        0.481
  337     40       1   0.3745  0.0439       0.2976        0.471
  353     39       2   0.3553  0.0437       0.2791        0.452
  363     37       1   0.3457  0.0436       0.2700        0.443
  364     36       1   0.3361  0.0434       0.2609        0.433
  371     35       1   0.3265  0.0432       0.2519        0.423
  387     34       1   0.3169  0.0430       0.2429        0.413
  390     33       1   0.3073  0.0428       0.2339        0.404
  394     32       1   0.2977  0.0425       0.2250        0.394
  428     29       1   0.2874  0.0423       0.2155        0.383
  429     28       1   0.2771  0.0420       0.2060        0.373
  442     27       1   0.2669  0.0417       0.1965        0.362
  455     25       1   0.2562  0.0413       0.1868        0.351
  457     24       1   0.2455  0.0410       0.1770        0.341
  460     22       1   0.2344  0.0406       0.1669        0.329
  477     21       1   0.2232  0.0402       0.1569        0.318
  519     20       1   0.2121  0.0397       0.1469        0.306
  524     19       1   0.2009  0.0391       0.1371        0.294
  533     18       1   0.1897  0.0385       0.1275        0.282
  558     17       1   0.1786  0.0378       0.1179        0.270
  567     16       1   0.1674  0.0371       0.1085        0.258
  574     15       1   0.1562  0.0362       0.0992        0.246
  583     14       1   0.1451  0.0353       0.0900        0.234
  613     13       1   0.1339  0.0343       0.0810        0.221
  624     12       1   0.1228  0.0332       0.0722        0.209
  643     11       1   0.1116  0.0320       0.0636        0.196
  655     10       1   0.1004  0.0307       0.0552        0.183
  689      9       1   0.0893  0.0293       0.0470        0.170
  707      8       1   0.0781  0.0276       0.0390        0.156
  791      7       1   0.0670  0.0259       0.0314        0.143
  814      5       1   0.0536  0.0239       0.0223        0.128
  883      3       1   0.0357  0.0216       0.0109        0.117

                sex=2 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    5     90       1   0.9889  0.0110       0.9675        1.000
   60     89       1   0.9778  0.0155       0.9478        1.000
   61     88       1   0.9667  0.0189       0.9303        1.000
   62     87       1   0.9556  0.0217       0.9139        0.999
   79     86       1   0.9444  0.0241       0.8983        0.993
   81     85       1   0.9333  0.0263       0.8832        0.986
   95     83       1   0.9221  0.0283       0.8683        0.979
  107     81       1   0.9107  0.0301       0.8535        0.972
  122     80       1   0.8993  0.0318       0.8390        0.964
  145     79       2   0.8766  0.0349       0.8108        0.948
  153     77       1   0.8652  0.0362       0.7970        0.939
  166     76       1   0.8538  0.0375       0.7834        0.931
  167     75       1   0.8424  0.0387       0.7699        0.922
  182     71       1   0.8305  0.0399       0.7559        0.913
  186     70       1   0.8187  0.0411       0.7420        0.903
  194     68       1   0.8066  0.0422       0.7280        0.894
  199     67       1   0.7946  0.0432       0.7142        0.884
  201     66       2   0.7705  0.0452       0.6869        0.864
  208     62       1   0.7581  0.0461       0.6729        0.854
  226     59       1   0.7452  0.0471       0.6584        0.843
  239     57       1   0.7322  0.0480       0.6438        0.833
  245     54       1   0.7186  0.0490       0.6287        0.821
  268     51       1   0.7045  0.0501       0.6129        0.810
  285     47       1   0.6895  0.0512       0.5962        0.798
  293     45       1   0.6742  0.0523       0.5791        0.785
  305     43       1   0.6585  0.0534       0.5618        0.772
  310     42       1   0.6428  0.0544       0.5447        0.759
  340     39       1   0.6264  0.0554       0.5267        0.745
  345     38       1   0.6099  0.0563       0.5089        0.731
  348     37       1   0.5934  0.0572       0.4913        0.717
  350     36       1   0.5769  0.0579       0.4739        0.702
  351     35       1   0.5604  0.0586       0.4566        0.688
  361     33       1   0.5434  0.0592       0.4390        0.673
  363     32       1   0.5265  0.0597       0.4215        0.658
  371     30       1   0.5089  0.0603       0.4035        0.642
  426     26       1   0.4893  0.0610       0.3832        0.625
  433     25       1   0.4698  0.0617       0.3632        0.608
  444     24       1   0.4502  0.0621       0.3435        0.590
  450     23       1   0.4306  0.0624       0.3241        0.572
  473     22       1   0.4110  0.0626       0.3050        0.554
  520     19       1   0.3894  0.0629       0.2837        0.534
  524     18       1   0.3678  0.0630       0.2628        0.515
  550     15       1   0.3433  0.0634       0.2390        0.493
  641     11       1   0.3121  0.0649       0.2076        0.469
  654     10       1   0.2808  0.0655       0.1778        0.443
  687      9       1   0.2496  0.0652       0.1496        0.417
  705      8       1   0.2184  0.0641       0.1229        0.388
  728      7       1   0.1872  0.0621       0.0978        0.359
  731      6       1   0.1560  0.0590       0.0743        0.328
  735      5       1   0.1248  0.0549       0.0527        0.295
  765      3       1   0.0832  0.0499       0.0257        0.270

Now, check out the help for ?summary.survfit. You can give the summary() function an option for what times you want to show in the results. Look at the range of followup times in the lung dataset with range(). You can create a sequence of numbers going from one number to another number by increments of yet another number with the seq() function.

Code
# ?summary.survfit
range(lung$time)
[1]    5 1022
Code
seq(0, 1100, 100)
 [1]    0  100  200  300  400  500  600  700  800  900 1000 1100

And we can use that sequence vector with a summary call on sfit to get life tables at those intervals separately for both males (1) and females (2). From these tables we can start to see that males tend to have worse survival than females.

Code
summary(sfit, times=seq(0, 1000, 100))
Call: survfit(formula = Surv(time, status) ~ sex, data = lung)

                sex=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0    138       0   1.0000  0.0000       1.0000        1.000
  100    114      24   0.8261  0.0323       0.7652        0.892
  200     78      30   0.6073  0.0417       0.5309        0.695
  300     49      20   0.4411  0.0439       0.3629        0.536
  400     31      15   0.2977  0.0425       0.2250        0.394
  500     20       7   0.2232  0.0402       0.1569        0.318
  600     13       7   0.1451  0.0353       0.0900        0.234
  700      8       5   0.0893  0.0293       0.0470        0.170
  800      6       2   0.0670  0.0259       0.0314        0.143
  900      2       2   0.0357  0.0216       0.0109        0.117
 1000      2       0   0.0357  0.0216       0.0109        0.117

                sex=2 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    0     90       0   1.0000  0.0000       1.0000        1.000
  100     82       7   0.9221  0.0283       0.8683        0.979
  200     66      11   0.7946  0.0432       0.7142        0.884
  300     43       9   0.6742  0.0523       0.5791        0.785
  400     26      10   0.5089  0.0603       0.4035        0.642
  500     21       5   0.4110  0.0626       0.3050        0.554
  600     11       3   0.3433  0.0634       0.2390        0.493
  700      8       3   0.2496  0.0652       0.1496        0.417
  800      2       5   0.0832  0.0499       0.0257        0.270
  900      1       0   0.0832  0.0499       0.0257        0.270

Kaplan-Meier Plots

Now that we’ve fit a survival curve to the data it’s pretty easy to visualize it with a Kaplan-Meier plot. Create the survival object if you don’t have it yet, and instead of using summary(), use plot() instead.

Code
sfit <- survfit(Surv(time, status)~sex, data=lung)
plot(sfit)

There are lots of ways to modify the plot produced by base R’s plot() function. You can see more options with the help for ?plot.survfit. We’re not going to go into any more detail here, because there’s another package called survminer that provides a function called ggsurvplot() that makes it much easier to produce publication-ready survival plots, and if you’re familiar with ggplot2 syntax it’s pretty easy to modify. So, let’s load the package and try it out.

Code
library(survminer)
ggsurvplot(sfit)

This plot is substantially more informative by default, just because it automatically color codes the different groups, adds axis labels, and creates and automatic legend. But there’s a lot more you can do pretty easily here. Let’s add confidence intervals, show the p-value for the log-rank test, show a risk table below the plot, and change the colors and the group labels.

Code
ggsurvplot(sfit, conf.int=TRUE, pval=TRUE, risk.table=TRUE, 
           legend.labs=c("Male", "Female"), legend.title="Sex",  
           palette=c("dodgerblue2", "orchid2"), 
           title="Kaplan-Meier Curve for Lung Cancer Survival", 
           risk.table.height=.15)