R Beginners Course 2025

Introduction to R and Basic Programming Concepts

Dr. Debasish Mukherjee, Dr. Ulrike Goebel, Dr. Ali Abdallah

Bioinformatics Core Facility CECAD

2025-05-13

Session 1 :: Background & R Basics

The Lifeline of R

  • R is an offspring of the S programming language
  • S was a pioneer in making exploratory data analysis easy
    • make existing algorithms available in user-friendly functions
    • provide accessible function documentation
    • provide interactive graphics devices

The Lifeline of R

  • R was born as an implementation of S in 1993, because native S had gone commercial
  • Being free and encouraging contributions by the user community, R can easily "evolve" to adapt to new needs and trends
  • Data-driven science, including the genome projects, was the perfect “niche” which R could successfully claim for itself
  • Indeed the Bioconductor project was initiated by one of the founders of R

The Lifeline of R

  • The RStudio (now: Posit) company is gaining increasing influence on the evolution of the language
    • its Integrated Development Environment (IDE) is increasingly used by people doing data analysis with R
    • Posit is actively developing and promoting the tidyverse, which is both a particular coding style and a collection of packages for the analysis of data tables
  • Currently, there is considerable evolutionary pressure on the language to change!

The Iris Data of Edgar Anderson and Ronald A. Fisher

– Some Background Information on Our Practice Dataset

We will use the iris dataset of floral traits for practicing throughout the course:

data("iris") ## Load the data
head(iris, n=3)   ## glimpse at the first 3 lines
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa


It contains data on three species
with 50 observations each:  

table(iris$Species) ## species in dataset

    setosa versicolor  virginica 
        50         50         50 
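
A minimal sketch for a first look beyond the counts, using base R only. The variable name species_means is chosen here for illustration; it shows the column types and the average of each floral trait per species:

```r
data("iris")   # load the built-in dataset
str(iris)      # column types: four numeric traits and one factor (Species)

# mean of every trait, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```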


The Iris Data of Edgar Anderson and Ronald A. Fisher

– Some Background Information on Our Practice Dataset

  • Data published by Ronald A. Fisher in 1936
    • Statistician and co-founder of population genetics
  • Plants collected by Edgar Anderson
    • Field botanist with a focus on speciation mechanisms
    • I. setosa and I. versicolor in 1935, in the same natural habitat
    • I. virginica likely in 1926, at a different place


The Iris Data of Edgar Anderson and Ronald A. Fisher

– Some Background Information on Our Practice Dataset

  • Both men were involved in the making of the Modern/Evolutionary Synthesis, with complementary central tenets: 

    • R. A. Fisher: Model evolutionary processes from the known facts of genetics
    • E. Anderson: Observe the real dynamics of (plant) populations, in order to understand the role of genetics in evolution


The Iris Data of Edgar Anderson and Ronald A. Fisher

– Some Background Information on Our Practice Dataset

  • Edgar Anderson suspected that I. versicolor may be an allopolyploid hybrid: 
    I. versicolor =  
    I. setosa (2n) x I. virginica (4n)  
    (confirmed by Lim et al. 2007)

  • Allopolyploidy may have supported the establishment of the species by preventing back-crossing to its parents.

  • Fisher applied his Linear Discriminant Analysis technique to Anderson’s data in order to test the hypothesis of additive gene action
    • if true, I. versicolor should be twice as similar to I. virginica as to I. setosa!
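
A minimal sketch of how such a comparison could look in present-day R, assuming the MASS package (a recommended package distributed with R) and its lda() function. This is not Fisher's original 1936 procedure, and the names fit, scores and centroids are illustrative only. It projects all flowers onto the first discriminant axis and compares how far the versicolor centroid lies from each of the other two species:

```r
library(MASS)   # provides lda(); assumed to be installed alongside R

data("iris")
fit <- lda(Species ~ ., data = iris)               # discriminant analysis on all four traits
scores <- predict(fit)$x[, "LD1"]                  # position of each flower on the first discriminant axis
centroids <- tapply(scores, iris$Species, mean)    # mean position per species

# distance of the versicolor centroid to each putative parent along LD1
d_setosa    <- abs(centroids["versicolor"] - centroids["setosa"])
d_virginica <- abs(centroids["versicolor"] - centroids["virginica"])

# "twice as similar to virginica as to setosa" would correspond to a ratio near 2
unname(d_setosa / d_virginica)
```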


Digression: Software is Usually Built From Bits and Pieces


  • … they come under the names of subroutines, macros, functions
  • … in code, they are used like “commands”:
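
A small illustration with base R functions, picked here only as examples. Each call consists of a function name followed by its arguments in parentheses:

```r
data("iris")

nrow(iris)                          # how many observations are in the table?
mean(iris$Sepal.Length)             # average sepal length over all flowers
round(mean(iris$Sepal.Length), 1)   # calls can be nested: the inner result feeds the outer function
```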


  • Actually the function name invokes a piece of hidden code
  • … hiding complexity
  • … yet allowing easy access to complex algorithms
  • … and making it possible to extend a language locally
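
A minimal sketch of both ideas in R. Typing a function name without parentheses prints the code hidden behind it, and function() defines a new piece of reusable code. The helper coef_var is hypothetical, written here for illustration and not part of R:

```r
sd   # without parentheses, R prints the code behind the name instead of running it

# hypothetical helper: a "local extension" that wraps a small calculation
coef_var <- function(x) {
  sd(x) / mean(x)   # coefficient of variation: spread relative to the mean
}

data("iris")
coef_var(iris$Petal.Length)   # used exactly like a built-in function
```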