Chapter 2 Snippets

Random useful snippets that do not fit anywhere else.

2.1 Useful RStudio functions

2.1.1 Clean restart

Cleans the current environment and restarts the R session (no code can run past this point). Indicate objects to leave alone by supplying a character vector of names to the leave argument.

clean_restart = function(leave = NULL){
  rm(list=setdiff(ls(envir = .GlobalEnv), leave), envir = .GlobalEnv)
  invisible(.rs.restartR())
}
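
A quick usage sketch (the object names here are hypothetical):

# Keep two objects, remove everything else, then restart R
clean_restart(leave = c("big_raw_data", "final_model"))

# Or wipe the whole environment and restart
clean_restart()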

2.1.2 Remove duplicated packages between personal and system libraries

This removes any user-library packages that are also present in the system library. Use this to minimise loading conflicts. Change tdrake in the example to your RStudio username.

library(tidyverse)

tdrake_packages = installed.packages() %>% #change 'tdrake' to your rstudio username
  as_tibble() %>% 
  filter(LibPath %>% str_detect("tdrake")) %>% 
  pull(Package)

system_packages = installed.packages() %>% 
  as_tibble() %>% 
  filter(LibPath %>% str_detect("local")) %>% 
  pull(Package)

to_remove = tdrake_packages[tdrake_packages %in% system_packages]

remove.packages(to_remove)

2.1.3 Remove large file accidentally committed to git

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch filename.rda" HEAD
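
After rewriting history like this, the cleaned branch normally needs to be force-pushed back to the remote, for example (assuming the remote is called origin):

git push origin --force --all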

2.2 Memory and efficient RAM usage

  1. Restart R often
  2. Use rm(...) to get rid of big files after using them
  3. Save large objects that take a while to create but are needed often, compressed with “gz” (write_rds(x, "big_file.rds.gz", compress = "gz"))
  4. Run gc() (garbage collect)
  5. Check memory usage: pryr::mem_used()

Most importantly, if working with big files (even if they aren’t currently loaded in your environment!) do the following (a code sketch follows this list):

  1. save.image("temp.rda")
  2. Restart R
  3. load("temp.rda")
  4. Delete “temp.rda” from Files pane
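
A minimal sketch of the tips above in one place (big_object and the file names are placeholders):

library(readr)

# Save a slow-to-create object compressed, then reload it when needed
write_rds(big_object, "big_file.rds.gz", compress = "gz")
big_object = read_rds("big_file.rds.gz")

# Free memory and check usage
rm(big_object)
gc()
pryr::mem_used()

# The save-restart-load trick
save.image("temp.rda")
# ...restart R (Session > Restart R, or clean_restart())...
load("temp.rda")
file.remove("temp.rda")  # or delete "temp.rda" from the Files pane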

2.3 Exporting tables that work in both PDF and Word

The kable() function from library(knitr) formats tables for all three output formats (PDF, HTML and Word), but it is a bit limited. library(kableExtra) is great for table customisations that work in PDF and HTML. kableExtra functions do not work for Word output, but library(flextable) does work for Word (though not for PDF).

We can define our own table printing function that uses kableExtra or flextable based on output type:

# This makes the table resize or continue over multiple pages in all output types
# PDF and HTML powered by kableExtra, Word by flextable
mytable = function(x, caption = "", longtable = FALSE, ...){
  library(kableExtra) # loading (rather than kableExtra:: alone) sets the format and avoids the warning in 2.3.1
  # If the output is not LaTeX or HTML, assume Word
  if (knitr::is_latex_output() | knitr::is_html_output()){
    knitr::kable(x, row.names = FALSE, align = c("l", "l", "r", "r", "r", "r", "r", "r", "r"), 
          booktabs = TRUE, caption = caption, longtable = longtable,
          linesep = "", ...) %>%
    kableExtra::kable_styling(latex_options = c("scale_down", "hold_position"))
  }else{
    flextable::flextable(x) %>% 
      flextable::autofit() %>% 
      flextable::width(j = 1, width = 1.5) %>% 
      flextable::height(i = 1, height = 0.5, part = "header")
  }
  
}
library(dplyr)
cars %>% 
  head() %>% 
  mytable()
Table 2.1:

 speed   dist
     4      2
     4     10
     7      4
     7     22
     8     16
     9     10

2.3.1 Warning in kableExtra: “Please specify format in kable. kableExtra can customize either HTML or LaTeX outputs.”

This warning goes away when you load library(kableExtra) in your script, instead of only calling functions with kableExtra::function().

Loading library(kableExtra) sets an option that specifies the table format for the R session, so the warning goes away. It does mean that the render() function may then fail, since render() knits in the environment you are in, rather than a fresh one as the Knit button does.
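
If you do need to call render() directly, one workaround that may help here (a sketch; the file name is a placeholder) is to knit in a fresh environment, so that the script’s own library() calls control the format:

rmarkdown::render("my_report.Rmd", envir = new.env(parent = globalenv()))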

2.4 Creating Reproducible R Examples to Share in the Group (binder, holepunch and docker)

When asking for help with R code, having a reproducible example is crucial (some mock data that others can use, along with your code, to reproduce your error). Often this can be done easily by creating a small tibble and posting the code on Slack, but sometimes it requires more complex data, or the error is due to something in the Linux system on which RStudio server is hosted. For example, if the cairo package for Linux isn’t installed then plots don’t work. The holepunch package helps to reproduce examples like these (it is not suitable for projects with confidential data).

2.4.1 Create Basic Reproducible Examples

The three main parts of a reproducible example (reprex) in Surgical Informatics are: 1. packages, 2. a small dataset and 3. code. Other things like the R version and Linux version can be assumed, as we all use one of only a few servers.

If you have a small (and non-confidential) set of data in a tibble or data frame called my_data and want it to be easily copied, run: dput(droplevels(my_data)). This will print code in the console that can be copy-pasted to reproduce the data frame. Alternatively, use the tibble or tribble functions to create it from scratch (this is preferable for simple datasets). Then copy in the packages and finally the code (ideally the least amount needed to generate the error) and share with the group, e.g.:

library(tidyverse)

# Output generated from dput(droplevels(my_data))
data = structure(list(a = c(1, 2, 3), b = c("a", "b", "c"), c = 10:12), .Names = c("a", 
"b", "c"), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"))

data %>% 
  mutate(newvar = a /b)
## Error in `mutate()`:
## ℹ In argument: `newvar = a/b`.
## Caused by error in `a / b`:
## ! non-numeric argument to binary operator
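
For simple data like this, the same tibble can also be written out by hand with tribble(), which is often easier to read (a sketch equivalent to the dput() output above):

data = tribble(
  ~a, ~b,   ~c,
   1, "a", 10L,
   2, "b", 11L,
   3, "c", 12L
)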

2.4.2 holepunch - Complex Reproducible Examples

From a project containing data you are happy to make public, make sure you are backed up to git and GitHub. See the relevant chapter on how to do this. Then run the following:

# Holepunch testing

remotes::install_github("karthik/holepunch")

library(holepunch)
write_compendium_description(package = "Title of my project", 
                             description = "Rough description of project or issue")

write_dockerfile(maintainer = "SurgicalInformatics")

generate_badge()

build_binder()

This will generate some text to copy into the top of your README.md file. It will look like:

<!-- badges: start -->
[![Launch Rstudio Binder](http://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/SurgicalInformatics/<project>/master?urlpath=rstudio)
<!-- badges: end -->

Now, whenever somebody clicks the badge in the README on GitHub, they will be taken to an RStudio server instance with all your files (excluding those listed in .gitignore), the current versions of your packages, the current Linux packages and the current R version. They can then test your code in a near-identical environment to help identify the source of the error. Their session will time out after 10 minutes of inactivity (or 12 hours after starting) and will not save anything, so it should only be used for bug-testing or quick examples.

As this is a free version of RStudio server there is a limit to what is supported and it shouldn’t be used for computationally-intensive processes.

And, as mentioned: No confidential data.

2.5 Working with CHIs

Here are four functions for working with CHIs that could even be put in a small package. The Community Health Index (CHI) is a population register used in Scotland for health care purposes. The CHI number uniquely identifies a person on the index.

2.5.1 chi_dob() - Extract date of birth from CHI

Note the cutoff_2000 argument. As the CHI contains only a two-digit year, we need to decide whether a given year is in the 1900s or the 2000s. I don’t think there is a formal way of determining this.

library(dplyr)
chi = c("1009701234", "1811431232", "1304496368")
# These CHIs are not real. 
# The first is invalid, two and three are valid. 

# Cut-off any thing before that number is considered 2000s
# i.e. at cutoff_2000 = 20, "18" is considered 2018, rather than 1918. 
chi_dob = function(.data, cutoff_2000 = 20){
  .data %>% 
    stringr::str_extract(".{6}") %>% 
    lubridate::parse_date_time2("dmy", cutoff_2000 = cutoff_2000) %>% 
    lubridate::as_date() # Make Date object, rather than POSIXct
}

chi_dob(chi)
## [1] "1970-09-10" "1943-11-18" "1949-04-13"
# From tibble
tibble(chi = chi) %>% 
  mutate(
    dob = chi_dob(chi)
  )
## # A tibble: 3 × 2
##   chi        dob       
##   <chr>      <date>    
## 1 1009701234 1970-09-10
## 2 1811431232 1943-11-18
## 3 1304496368 1949-04-13

2.5.2 chi_gender() - Extract gender from CHI

The ninth digit is odd for men and even for women. A test for even is x %% 2 == 0 (modulus 2).

chi_gender = function(.data){
  .data %>% 
    stringr::str_sub(9, 9) %>% 
    as.numeric() %>% 
    {ifelse(. %% 2 == 0, "Female", "Male")}
}

chi_gender(chi)
## [1] "Male"   "Male"   "Female"
# From tibble
tibble(chi = chi) %>% 
  mutate(
    dob = chi_dob(chi),
    gender = chi_gender(chi)
  )
## # A tibble: 3 × 3
##   chi        dob        gender
##   <chr>      <date>     <chr> 
## 1 1009701234 1970-09-10 Male  
## 2 1811431232 1943-11-18 Male  
## 3 1304496368 1949-04-13 Female

2.5.3 chi_age() - Extract age from CHI

Works for a single date or a vector of dates.

chi_age = function(.data, ref_date, cutoff_2000 = 20){
  dob = chi_dob(.data, cutoff_2000 = cutoff_2000)
  lubridate::interval(dob, ref_date) %>% 
    as.numeric("years") %>% 
    floor()
}

# Today
chi_age(chi, Sys.time())
## [1] 53 80 75
# Single date
library(lubridate)
chi_age(chi, dmy("11/09/2018"))
## [1] 48 74 69
# Vector
dates = dmy("11/09/2018",
            "09/05/2015",
            "10/03/2014")
chi_age(chi, dates)
## [1] 48 71 64
# From tibble
tibble(chi = chi) %>% 
  mutate(
    dob = chi_dob(chi),
    gender = chi_gender(chi),
    age = chi_age(chi, Sys.time())
  )
## # A tibble: 3 × 4
##   chi        dob        gender   age
##   <chr>      <date>     <chr>  <dbl>
## 1 1009701234 1970-09-10 Male      53
## 2 1811431232 1943-11-18 Male      80
## 3 1304496368 1949-04-13 Female    75

2.5.4 chi_valid() - Logical test for valid CHI

The final digit of the CHI can be used to test that the number is correct via the modulus 11 algorithm.

chi_valid = function(.data){
  .data %>% 
    stringr::str_split("", simplify = TRUE) %>% 
    .[, -10] %>%              # Working with matrices hence brackets
    apply(1, as.numeric) %>%  # Convert from string
    {seq(10, 2) %*% .} %>%    # Multiply and sum step
    {. %% 11} %>%             # Modulus 11
    {11 - .} %>%              # Subtract from 11
    dplyr::near(              # Compare result with the 10th digit (the check digit)
      {stringr::str_sub(.data, 10) %>% as.numeric()}
    ) %>% 
    as.vector()
}

chi_valid(chi)
## [1] FALSE  TRUE  TRUE
# From tibble
tibble(chi = chi) %>% 
  mutate(
    dob = chi_dob(chi),
    gender = chi_gender(chi),
    age = chi_age(chi, Sys.time()),
    chi_valid = chi_valid(chi)
  )
## # A tibble: 3 × 5
##   chi        dob        gender   age chi_valid
##   <chr>      <date>     <chr>  <dbl> <lgl>    
## 1 1009701234 1970-09-10 Male      53 FALSE    
## 2 1811431232 1943-11-18 Male      80 TRUE     
## 3 1304496368 1949-04-13 Female    75 TRUE

2.6 Working with dates

2.6.1 Difference between two dates

I always forget how to do this neatly. I often want days as a numeric, not a lubridate type object.

library(lubridate)
date1 = dmy("12/03/2018", "14/05/2017")
date2 = dmy("11/09/2019", "11/04/2019")

interval(date1, date2) %>% 
  as.numeric("days")
## [1] 548 697

2.6.2 Lags

This is useful for calculating, for instance, the period off medications. Lags are much better than long to wide solutions for this.

library(tidyverse)
library(lubridate)
id = c(2, 2, 2, 2, 3, 5) 
medication = c("aspirin", "aspirin", "aspirin", "tylenol", "lipitor", "advil") 
start.date = c("05/01/2017", "05/30/2017", "07/15/2017", "05/01/2017", "05/06/2017", "05/28/2017")
stop.date = c("05/04/2017", "06/10/2017", "07/27/2017", "05/15/2017", "05/12/2017", "06/13/2017")
df = tibble(id, medication, start.date, stop.date)
df
## # A tibble: 6 × 4
##      id medication start.date stop.date 
##   <dbl> <chr>      <chr>      <chr>     
## 1     2 aspirin    05/01/2017 05/04/2017
## 2     2 aspirin    05/30/2017 06/10/2017
## 3     2 aspirin    07/15/2017 07/27/2017
## 4     2 tylenol    05/01/2017 05/15/2017
## 5     3 lipitor    05/06/2017 05/12/2017
## 6     5 advil      05/28/2017 06/13/2017
df %>%
  mutate_at(c("start.date", "stop.date"), lubridate::mdy) %>% # make a date
  arrange(id, medication, start.date) %>% 
  group_by(id, medication) %>% 
  mutate(
    start_date_diff = start.date - lag(start.date),
    medication_period = stop.date-start.date
  )
## # A tibble: 6 × 6
## # Groups:   id, medication [4]
##      id medication start.date stop.date  start_date_diff medication_period
##   <dbl> <chr>      <date>     <date>     <drtn>          <drtn>           
## 1     2 aspirin    2017-05-01 2017-05-04 NA days          3 days          
## 2     2 aspirin    2017-05-30 2017-06-10 29 days         11 days          
## 3     2 aspirin    2017-07-15 2017-07-27 46 days         12 days          
## 4     2 tylenol    2017-05-01 2017-05-15 NA days         14 days          
## 5     3 lipitor    2017-05-06 2017-05-12 NA days          6 days          
## 6     5 advil      2017-05-28 2017-06-13 NA days         16 days

2.6.3 Pulling out “change in status” data

If you have a number of episodes per patient, each with a status and a time, then you need to do this as a starting point for Cox proportional hazards (CPH) analysis. A worked example follows in the subsections below.

2.6.4 Guess the Format of Dates

The lubridate function guess_formats() helps guess the date format of a dataset you have received. Add in new formats (as per lubridate) if they don’t match. Once you have worked out what the format is, parse the dates with mdy() or dmy() etc. as appropriate.

library(tidyverse)
library(lubridate)
lakers = lakers %>%
  mutate(mm_date = paste0(str_sub(date, start = 5, end = 6), "/",
                          str_sub(date, start = 7, end = 8), "/",
                          str_sub(date, start = 1, end = 4)),
         dd_date = paste0(str_sub(date, start = 7, end = 8), "/",
                          str_sub(date, start = 5, end = 6), "/",
                          str_sub(date, start = 1, end = 4)))

guess_formats(lakers$mm_date, c("dmy", "mdy"), print_matches = TRUE)
guess_formats(lakers$dd_date, c("dmy", "mdy"), print_matches = TRUE)

2.6.4.1 Example data

library(dplyr)
library(lubridate)
library(finalfit)
mydata = tibble(
  id = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
  status = c(0,0,0,1,0,0,1,1,0,0,0,0,0,1,1,1,0,0,1,1),
  group = c(rep(0, 8), rep(1, 12)) %>% factor(),
  opdate = rep("2010/01/01", 20) %>% ymd(),
  status_date = c(
    "2010/02/01", "2010/03/01", "2010/04/01", "2010/05/01",
    "2010/02/02", "2010/03/02", "2010/04/02", "2010/05/02",
    "2010/02/03", "2010/03/03", "2010/04/03", "2010/05/03",
    "2010/02/04", "2010/03/04", "2010/04/04", "2010/05/04",
    "2010/02/05", "2010/03/05", "2010/04/05", "2010/05/05"
  ) %>% ymd()
)
mydata
## # A tibble: 20 × 5
##       id status group opdate     status_date
##    <dbl>  <dbl> <fct> <date>     <date>     
##  1     1      0 0     2010-01-01 2010-02-01 
##  2     1      0 0     2010-01-01 2010-03-01 
##  3     1      0 0     2010-01-01 2010-04-01 
##  4     1      1 0     2010-01-01 2010-05-01 
##  5     2      0 0     2010-01-01 2010-02-02 
##  6     2      0 0     2010-01-01 2010-03-02 
##  7     2      1 0     2010-01-01 2010-04-02 
##  8     2      1 0     2010-01-01 2010-05-02 
##  9     3      0 1     2010-01-01 2010-02-03 
## 10     3      0 1     2010-01-01 2010-03-03 
## 11     3      0 1     2010-01-01 2010-04-03 
## 12     3      0 1     2010-01-01 2010-05-03 
## 13     4      0 1     2010-01-01 2010-02-04 
## 14     4      1 1     2010-01-01 2010-03-04 
## 15     4      1 1     2010-01-01 2010-04-04 
## 16     4      1 1     2010-01-01 2010-05-04 
## 17     5      0 1     2010-01-01 2010-02-05 
## 18     5      0 1     2010-01-01 2010-03-05 
## 19     5      1 1     2010-01-01 2010-04-05 
## 20     5      1 1     2010-01-01 2010-05-05

2.6.4.2 Compute time from op date to current review

… if necessary

mydata = mydata %>% 
  arrange(id, status_date) %>% 
  mutate(
    time = interval(opdate, status_date) %>% as.numeric("days")
  )
mydata
## # A tibble: 20 × 6
##       id status group opdate     status_date  time
##    <dbl>  <dbl> <fct> <date>     <date>      <dbl>
##  1     1      0 0     2010-01-01 2010-02-01     31
##  2     1      0 0     2010-01-01 2010-03-01     59
##  3     1      0 0     2010-01-01 2010-04-01     90
##  4     1      1 0     2010-01-01 2010-05-01    120
##  5     2      0 0     2010-01-01 2010-02-02     32
##  6     2      0 0     2010-01-01 2010-03-02     60
##  7     2      1 0     2010-01-01 2010-04-02     91
##  8     2      1 0     2010-01-01 2010-05-02    121
##  9     3      0 1     2010-01-01 2010-02-03     33
## 10     3      0 1     2010-01-01 2010-03-03     61
## 11     3      0 1     2010-01-01 2010-04-03     92
## 12     3      0 1     2010-01-01 2010-05-03    122
## 13     4      0 1     2010-01-01 2010-02-04     34
## 14     4      1 1     2010-01-01 2010-03-04     62
## 15     4      1 1     2010-01-01 2010-04-04     93
## 16     4      1 1     2010-01-01 2010-05-04    123
## 17     5      0 1     2010-01-01 2010-02-05     35
## 18     5      0 1     2010-01-01 2010-03-05     63
## 19     5      1 1     2010-01-01 2010-04-05     94
## 20     5      1 1     2010-01-01 2010-05-05    124

2.6.4.3 Pull out “change of status”

mydata = mydata %>% 
  group_by(id) %>% 
  mutate(
    status_change = status - lag(status) == 1,                          # Mark TRUE if goes from 0 to 1
    status_nochange = sum(status) == 0,                                 # Mark if no change from 0
    status_nochange_keep = !duplicated(status_nochange, fromLast= TRUE) # Mark most recent "no change" episode
  ) %>% 
  filter(status_change | (status_nochange & status_nochange_keep)) %>%  # Filter out other episodes
  select(-c(status_change, status_nochange, status_nochange_keep))      # Remove columns not needed
mydata
## # A tibble: 5 × 6
## # Groups:   id [5]
##      id status group opdate     status_date  time
##   <dbl>  <dbl> <fct> <date>     <date>      <dbl>
## 1     1      1 0     2010-01-01 2010-05-01    120
## 2     2      1 0     2010-01-01 2010-04-02     91
## 3     3      0 1     2010-01-01 2010-05-03    122
## 4     4      1 1     2010-01-01 2010-03-04     62
## 5     5      1 1     2010-01-01 2010-04-05     94

2.6.4.4 Run CPH

mydata %>% 
  finalfit("Surv(time, status)", "group")
##  Dependent: Surv(time, status)        all          HR (univariable)        HR (multivariable)
##                          group 0 2 (40.0)                         -                         -
##                                1 3 (60.0) 0.76 (0.10-5.51, p=0.786) 0.76 (0.10-5.51, p=0.786)

2.6.5 Isoweek and epiweek

Working with weeks of the year is frequently painful: how best to aggregate the data, how to plot it, and how to deal with multiple years.

The easiest solution I have found (EMH) is to work with the first date of each isoweek: make a look-up table and join.

# Generate random set of dates for demonstration purposes
## This is not required when using normally
library(tidyverse)
library(lubridate)
library(finalfit)
colon_s = colon_s %>% 
  mutate(
    date = seq(ymd(20200101), ymd(20220601), by="day") %>% 
      sample(929, replace = TRUE)
  )

# Make isoweek start date look-up table
## Make sure includes entire range of your dates of interest. 
isoweek_lookup = tibble(date = seq(ymd(20200101), ymd(20220601), by = 1)) %>% 
  mutate(isoweek = lubridate::isoweek(date),
         year = lubridate::year(date))

isoweek_lookup = isoweek_lookup %>% 
  left_join(isoweek_lookup %>% 
              group_by(isoweek, year) %>% 
              slice(1) %>% 
              select(isoweek_date = date, isoweek, year)
  )
## Joining with `by = join_by(isoweek, year)`
# Add isoweek_date (first date of a particular isoweek) and isoweek to analysis dataset. 
colon_s %>% 
  left_join(isoweek_lookup, by = c("date" = "date")) %>% 
  select(date, isoweek, isoweek_date) %>% 
  slice(1:6)
##         date isoweek isoweek_date
## 1 2020-06-15      25   2020-06-15
## 2 2020-06-20      25   2020-06-15
## 3 2020-02-23       8   2020-02-17
## 4 2020-11-16      47   2020-11-16
## 5 2020-06-27      26   2020-06-22
## 6 2021-06-12      23   2021-06-07

2.7 Labelling tables and figures in markdown chunks

2.7.1 Plots

I have made a plot and it looks nice (See Figure \ref{fig:a-nice-plot})

```{r fig.cap="A nice plot.\\label{fig:a-nice-plot}"}
ggplot(mydata, aes(x = x)) +
  geom_histogram()
```

2.7.2 Tables

I have made a table and it looks nice (See Table \ref{tab:a-nice-table})

```{r}
library(knitr)
kable(mtcars[1:3,1:4], caption="A nice table.\\label{tab:a-nice-table}")
```

2.8 Join tables and overwrite values (that are often missing)

From here: https://stackoverflow.com/questions/35637135/left-join-two-data-frames-and-overwrite

library(dplyr)
.keys <- c("key1", "key2")

.base_table <- tribble(
    ~key1, ~key2, ~val1, ~val2,
    "A", "a", 0, 0,
    "A", "b", 0, 1,
    "B", "a", 1, 0,
    "B", "b", 1, 1)

.join_table <- tribble(
    ~key1, ~key2, ~val2,
    "A", "b", 100,
    "B", "a", 111)

# This works
df_result <- .base_table %>%
    # Pull off rows from base table that match the join table
    semi_join(.join_table, .keys) %>%
    # Drop cols from base table that are in join table, except for the key columns
    select(-matches(setdiff(names(.join_table), .keys))) %>%
    # Left join on the join table columns
    left_join(.join_table, .keys) %>%
    # Remove the matching rows from the base table, and 
    # bind on the newly joined result from above.
    bind_rows(.base_table %>% anti_join(.join_table, .keys))

df_result %>%
    print()
## # A tibble: 4 × 4
##   key1  key2   val1  val2
##   <chr> <chr> <dbl> <dbl>
## 1 A     b         0   100
## 2 B     a         1   111
## 3 A     a         0     0
## 4 B     b         1     1
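
Recent versions of dplyr also provide rows_update(), which overwrites matching rows directly and may be simpler when the key columns uniquely identify rows (a sketch using the same example tables; non-key columns present in the second table are overwritten, the rest are kept):

.base_table %>%
    rows_update(.join_table, by = .keys)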

2.9 Read in multiple files

As of July 2021, library(readr) is happy to read multiple files into the same tibble without you having to use purrr::map_dfr().

library(tidyverse)

exampledata = read_csv(fs::dir_ls(path = "data_folder", glob = "*.csv"))

It’s also happy to automatically uncompress files ending with .zip, .gz or similar.
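
If you also need to know which file each row came from, read_csv() can record the file path in a column via its id argument (a sketch assuming the same data_folder):

exampledata = read_csv(fs::dir_ls(path = "data_folder", glob = "*.csv"), id = "source_file")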

2.10 Averaging multiple AUCs (say from imputed datasets)

The final method here (taking the mean on the log-odds scale, then back-transforming) probably makes the most sense:

auc = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.99999)
mean(auc)                       # Mean 
## [1] 0.7499983
median(auc)                     # Median
## [1] 0.75
prod(auc)^(1/length(auc))       # Geometric mean
## [1] 0.7298908
exp(mean(log(auc)))             # Geometric mean again
## [1] 0.7298908
log(auc)                        # Logs
## [1] -6.931472e-01 -5.108256e-01 -3.566749e-01 -2.231436e-01 -1.053605e-01
## [6] -1.000005e-05
log(auc / (1-auc) )             # Put values on scale from 0 (0.5) to Inf (1)
## [1]  0.0000000  0.4054651  0.8472979  1.3862944  2.1972246 11.5129155
exp(mean(log(auc/(1-auc)))) /   # Mean on the above scale. 
  (1 + exp(mean(log(auc/(1-auc)))))
## [1] 0.9384781

2.11 Formatting thousands automatically

Works in R Markdown and Quarto.

Before: There are 1.2345^{4} patients in this dataset.

inline_hook = function(x) {
  if (is.numeric(x)) {
    format(x,big.mark = ",")
  } else x
}
knitr::knit_hooks$set(inline = inline_hook)

After: There are 12,345 patients in this dataset.

This only works on numbers printed by inline R code.

2.12 DataTable function

mydt = function(df, pagelength = 20){
  DT::datatable(df,
                rownames = FALSE,
                extensions = c('Buttons'),
                filter = "top",
                options = list(
                  pageLength = pagelength,
                  autoWidth = TRUE,
                  scrollX = TRUE,
                  searchHighlight = TRUE))
}

Usage:

library(dplyr)

palmerpenguins::penguins %>% 
  slice(1:5) %>% 
  mydt()