R Intermediate Course 2025

Dr. Debasish Mukherjee, Dr. Ulrike Goebel, Dr. Ali Abdallah

Bioinformatics Core Facility CECAD

2025-09-19

Slides & Code

git repo

R-Intermediate


Clone repo

git clone https://github.com/CECADBioinformaticsCoreFacility/Intermediate_R_Course_2025.git


Slides Directly

https://cecadbioinformaticscorefacility.github.io/Intermediate_R_Course_2025/

Session 5 :: Inferential Statistics - I

Why statistical analysis?

  • Draw valid conclusions from sample data

  • Test significance and quantify uncertainty

  • Compare groups and reveal relationships

  • Ensure reproducibility and credible communication

Tip

Biology is ultimately about understanding living systems, not just numbers. Statistics is a tool, not the goal: it helps us make sense of data, test ideas, and avoid misleading conclusions.

Use statistics to strengthen biological insight, but let biological questions drive the analysis.

Types of Data

flowchart LR
    A("Data") --> B("Quantitative")
    A --> C("Qualitative")
    B --> D("Continuous")
    B --> E("Discrete")
    C --> F("Nominal") 
    C --> G("Ordinal")
    C --> H("Binary")
    
    B:::Sky
    D:::Sky
    E:::Sky
    classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#ABABAB, color:#374D7C

Types of Data

flowchart LR
    A("Data") --> B("Quantitative")
    A --> C("Qualitative")
    B --> D("Continuous")
    B --> E("Discrete")
    C --> F("Nominal") 
    C --> G("Ordinal")
    C --> H("Binary")
    
    C:::Sky
    F:::Sky
    G:::Sky
    H:::Sky
    classDef Sky stroke-width:1px, stroke-dasharray:none, stroke:#374D7C, fill:#ABABAB, color:#374D7C

Chi-square test for independence

In a chi-square test, the observed frequencies for two or more groups are compared with the frequencies expected by chance, i.e. under the null hypothesis that the two variables are independent.

\[\chi^2 = \sum \frac {(O - E)^2}{E}\]

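Where \(O\) is the observed frequency and \(E\) is the expected frequency of a cell, and the sum runs over all cells of the table. Under independence, \(E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}\).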

Chi-square test assumptions:

  • The observations are independent
  • 2x2 table: no expected count is below 5
  • Larger tables: all expected counts are above 1, and no more than 20% of them are below 5
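
The expected-count rules above can be checked before running the test. The following is a minimal sketch, assuming the athlete/smoker counts used in this session's chi-square example; the variable names (`tab`, `expected`, `all_at_least_5`, and so on) are illustrative and not part of the course code.

```r
# Counts from the athlete/smoker example (filled column by column)
tab <- matrix(
  c(17, 20, 32, 5),
  nrow = 2,
  dimnames = list(
    c("Athlete", "Non-athlete"),
    c("Non-smoker", "Smoker")
  )
)

# Expected counts under independence:
# row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected

# Rule-of-thumb checks (illustrative names)
all_at_least_5 <- all(expected >= 5)  # 2x2 rule: no expected count below 5
min_expected   <- min(expected)       # larger tables: should be above 1
share_below_5  <- mean(expected < 5)  # larger tables: should be at most 0.20
```

The same expected counts are also returned by `chisq.test(tab)$expected`, so either route can be used for the check.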

Chi-square test for independence in R

dat <- data.frame(
  "smoke_no" = c(17, 20),
  "smoke_yes" = c(32, 5),
  row.names = c("Athlete", "Non-athlete"),
  stringsAsFactors = FALSE
)
colnames(dat) <- c("Non-smoker", "Smoker")

dat
            Non-smoker Smoker
Athlete             17     32
Non-athlete         20      5
# View data
barplot(as.matrix(dat))

# Perform Chi-square test
chisq.test(dat)

    Pearson's Chi-squared test with Yates' continuity correction

data:  dat
X-squared = 11.84, df = 1, p-value = 0.0005797
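  • A p-value < 0.05 indicates a statistically significant association between the variables at the 5% level

  • The chi-squared statistic measures how far the observed counts deviate from the expected counts. It grows with sample size, so on its own it does not indicate the strength of the association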
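
To connect the reported statistic to the formula, here is a sketch that recomputes it by hand for the same table. It assumes the default behaviour of `chisq.test()` for 2x2 tables, which applies Yates' continuity correction (subtracting 0.5 from each absolute deviation). The Cramér's V line is an addition not taken from the slides, shown only as one common way to express effect size.

```r
# Same counts as in the example above
tab <- matrix(
  c(17, 20, 32, 5),
  nrow = 2,
  dimnames = list(
    c("Athlete", "Non-athlete"),
    c("Non-smoker", "Smoker")
  )
)

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Chi-square statistic with Yates' continuity correction (2x2 default)
chi2_yates <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Uncorrected statistic, i.e. the plain sum of (O - E)^2 / E
chi2_plain <- sum((tab - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(chi2_yates, df = df, lower.tail = FALSE)

# Cramér's V (assumption: chosen here as an effect size, not from the slides)
cramers_v <- sqrt(chi2_plain / (sum(tab) * (min(dim(tab)) - 1)))

chi2_yates  # 11.84, matching the chisq.test() output
```

Passing `correct = FALSE` to `chisq.test()` would return the uncorrected statistic instead.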

Fisher’s Exact Test

Fisher’s exact test checks whether there is a statistically significant association between two categorical variables. It computes an exact p-value instead of relying on a large-sample approximation, which makes it well suited to small sample sizes.

dat <- data.frame(
  "smoke_no" = c(7, 0),
  "smoke_yes" = c(2, 5),
  row.names = c("Athlete", "Non-athlete"),
  stringsAsFactors = FALSE
)
colnames(dat) <- c("Non-smoker", "Smoker")

dat
            Non-smoker Smoker
Athlete              7      2
Non-athlete          0      5
# View data
barplot(as.matrix(dat))

# Perform Fisher's Exact Test
fisher.test(dat)

    Fisher's Exact Test for Count Data

data:  dat
p-value = 0.02098
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.449481      Inf
sample estimates:
odds ratio 
       Inf 
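  • A p-value < 0.05 indicates a statistically significant association between the variables at the 5% level

  • The odds ratio indicates the strength of the association. Here it is Inf because one cell of the table is 0, so the lower bound of the confidence interval is the more informative number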
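
The exact p-value can be reproduced from the hypergeometric distribution, which describes the top-left cell when all row and column totals are held fixed. This is a sketch under that assumption, using the counts from the example above; the variable names are illustrative, and the small tolerance factor mirrors how ties are usually handled in floating-point comparisons.

```r
# Same counts as in the Fisher example (filled column by column)
tab <- matrix(
  c(7, 0, 2, 5),
  nrow = 2,
  dimnames = list(
    c("Athlete", "Non-athlete"),
    c("Non-smoker", "Smoker")
  )
)

m <- sum(tab[, 1])   # total non-smokers
n <- sum(tab[, 2])   # total smokers
k <- sum(tab[1, ])   # total athletes
x_obs <- tab[1, 1]   # observed athletes who do not smoke

# All values the top-left cell can take with these fixed margins
support <- max(0, k - n):min(k, m)

# Probability of each possible table under independence
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(x_obs, m, n, k)

# Two-sided p-value: sum over tables that are at most as likely as the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_two_sided  # agrees with fisher.test(tab)$p.value
```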

When to use which test?

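  • Fisher’s exact test is more accurate than the chi-square test on small samples, where the expected-count assumptions of the chi-square test are not met
  • The chi-square test is generally preferable on large samples, where its approximation is good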
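
The choice between the two tests can be sketched as a small helper that applies the expected-count rules from the chi-square assumptions. `choose_test()` is a hypothetical function written for illustration; it is not part of base R or of the course repository, and the thresholds are the rules of thumb stated earlier, not hard limits.

```r
# Hypothetical helper: pick the test from the expected counts
choose_test <- function(tab) {
  tab <- as.matrix(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

  chi2_ok <- if (all(dim(tab) == 2)) {
    # 2x2 table: no expected count below 5
    all(expected >= 5)
  } else {
    # Larger tables: all expected counts above 1, at most 20% below 5
    all(expected > 1) && mean(expected < 5) <= 0.20
  }

  if (chi2_ok) chisq.test(tab) else fisher.test(tab)
}

# The two tables used in this session
large_tab <- matrix(c(17, 20, 32, 5), nrow = 2)
small_tab <- matrix(c(7, 0, 2, 5), nrow = 2)

res_large <- choose_test(large_tab)  # expected counts are 24.5 and 12.5
res_small <- choose_test(small_tab)  # expected counts are 4.5 and 2.5

res_large$method
res_small$method
```

In practice the decision should also take the study design into account, so a helper like this is a starting point for discussion, not a replacement for judgement.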