R Intermediate Course 2025

R Background and History – Recap and Extension

Bioinformatics Core Facility CECAD

2025-03-17

Slides & Code


git repo

R-Basic


Clone repo

git clone https://github.com/CECADBioinformaticsCoreFacility/Intermediate_R_Course_2025.git


Slides Directly

https://cecadbioinformaticscorefacility.github.io/Intermediate_R_Course_2025/

Introduction :: Background and History

The Lifeline of R

  • R is an offspring of the S programming language
  • S was a pioneer in making exploratory data analysis easy
    • make existing algorithms available in user friendly functions
    • provide accessible function documentation
    • provide interactive graphics devices

The Lifeline of R

  • R was born as an implementation of S in 1993, because native S had gone commercial
  • Being free and encouraging contributions by the user community, R can easily "evolve" to adapt to new needs and trends
  • Data-driven science, including the genome projects, was the perfect “niche” which R could successfully claim for itself
  • Indeed, the Bioconductor project was initiated by one of the founders of R (Robert Gentleman)

The Lifeline of R

  • The RStudio (now: Posit) company is gaining increasing influence on the evolution of the language, because
    • its Integrated Development Environment (IDE) is popular
    • it is actively developing and promoting the tidyverse, which is both a special style and a code repository for the analysis of data tables
  • Currently, there is considerable evolutionary pressure on the language to change!

The Lifeline of R

  • "Base R" style:
    • “Base” packages come with R itself
    • Typical “base” functions are multi-tasking workhorses: they can be tuned by parameters for a range of related tasks
    • Main operation = “call a function and assign the result to a variable”
    • Relies on the "everything is a vector" property of R's basic data types (even a single number is a vector of length 1) to build complex data structures
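
As a rough illustration of the base R style described above, the following sketch uses only functions that ship with R and the built-in iris dataset. The variable names (mean_by_species, max_by_species, petal_ratio) are made up for this example and are not part of the course material.

```r
data("iris")

# Workhorse function tuned by a parameter: the same aggregate() call
# computes a different per-species summary depending on FUN.
mean_by_species <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
max_by_species  <- aggregate(Petal.Length ~ Species, data = iris, FUN = max)
print(mean_by_species)

# Vectors as building blocks: a single number is a vector of length 1,
# and a data frame is a list of equal-length column vectors.
length(5.1)      # 1
is.list(iris)    # TRUE

# Arithmetic works element-wise on whole column vectors at once.
petal_ratio <- iris$Petal.Length / iris$Petal.Width
```

Each step follows the same pattern: call a function, then assign the result to a variable with `<-`.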

The Lifeline of R

  • "Tidyverse" style:
    • Tidyverse functions manipulate and visualize 2D data tables ("tibbles")
    • Main operation = direct output-to-input connection of functions through the pipe operator %>% (or the native pipe |>, available since R 4.1)
    • Many specialized functions!
    • Goals (1): well-defined and reproducible ("tidy") workflows
    • Goals (2): pretty and adaptable visualization (ggplot2)
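
The output-to-input chaining of the pipe can be tried without installing anything. The sketch below deliberately uses the native pipe |> (assumes R 4.1 or later) with base functions only, so it shows the piping mechanism rather than actual tidyverse verbs; the names nested and piped are chosen for this example.

```r
data("iris")

# Nested calls: must be read from the inside out.
nested <- round(
  sort(tapply(iris$Petal.Length, iris$Species, mean), decreasing = TRUE),
  2
)

# The same computation as a pipe chain: read from top to bottom.
# Each step receives the previous result as its first argument.
piped <- iris |>
  with(tapply(Petal.Length, Species, mean)) |>
  sort(decreasing = TRUE) |>
  round(2)

print(piped)
```

Both variants compute the mean petal length per species, sorted from largest to smallest. With the tidyverse loaded, specialized functions would take the place of the base functions in the same kind of chain.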

A Biologist’s View on the Evolution of R

Ecologist Timothy Staples collected R function names used in GitHub repositories from 2014 to 2021:

  • Most base R functions show no accelerated increase or decrease of use during the period of observation.
  • However, use of tidyverse functions accelerates over time, much like the spread of an invasive biological species.

The Iris Data of Edgar Anderson and Ronald A. Fisher

– Some Background Information on Our Practice Dataset

We will use the iris dataset of floral traits for practicing throughout the course:

data("iris") ## Load the data
head(iris, n = 3)   ## show the first 3 rows
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa


It contains data on three species with 50 observations each:

table(iris$Species) ## species in dataset

    setosa versicolor  virginica 
        50         50         50 
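
Before working with the dataset, it may help to check its overall shape and column types. This is a small sketch using base functions only; the names n_obs and col_types are chosen for this example.

```r
data("iris")

# Number of observations (rows) and variables (columns)
n_obs <- nrow(iris)
dim(iris)

# Class of each column: four numeric measurements and one factor
col_types <- sapply(iris, class)
print(col_types)

# The factor levels are the three species names
levels(iris$Species)
```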


The Iris Data of Edgar Anderson and Ronald A. Fisher

– Some Background Information on Our Practice Dataset

  • Published by Ronald A. Fisher in 1936
  • Fisher: statistician and co-founder of population genetics
  • Plants collected by Edgar Anderson
    • I. setosa and I. versicolor in 1935, in the same natural habitat
    • I. virginica likely in 1926, at a different place
  • Anderson, like Fisher, was involved in the making of the Modern (Evolutionary) Synthesis