R Beginners Course 2026

Introduction to R and Basic Programming Concepts

Bioinformatics Core Facility CECAD

2026-05-21

Slides & Code

  • [f] Full screen
  • [o] Slide Overview
  • [c] Notes
  • [h] Help

git repo

R-Basic


Clone repo

git clone https://github.com/CECADBioinformaticsCoreFacility/Beginners_R_Course_2026.git


Slides Directly

https://cecadbioinformaticscorefacility.github.io/Beginners_R_Course_2026/

Session 6 :: Descriptive Statistics

What Is Descriptive Statistics?

Descriptive statistics (in the broad sense of the term) is a branch of statistics that aims to summarize, describe, and present a series of values or a dataset.


Many measures exist to summarize a dataset. They are broadly divided into three types:

  • Central Tendency
    • Mean, median, and mode
  • Variability
    • Range, variance, and standard deviation
  • Distribution
    • Modality, Skewness, Kurtosis

Measures of Central Tendency

mean, median, mode

mean(iris$Sepal.Width)
[1] 3.057333


median(iris$Sepal.Width)
[1] 3


x <- table(iris$Sepal.Width)
sort(x, decreasing = TRUE)

  3 2.8 3.2 3.4 3.1 2.9 2.7 2.5 3.3 3.5 3.8 2.6 2.3 3.6 2.2 2.4 3.7 3.9   2   4 
 26  14  13  12  11  10   9   8   6   6   6   5   4   4   3   3   3   2   1   1 
4.1 4.2 4.4 
  1   1   1 
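
Base R has no built-in function that returns the statistical mode (`mode()` reports an object's storage type instead). A minimal sketch that reads the most frequent value(s) off the frequency table; `stat_mode` is a hypothetical helper name, not part of R:

```r
# stat_mode() is a hypothetical helper, not a base R function.
# It returns every value that shares the highest frequency.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

sepal_width_mode <- stat_mode(iris$Sepal.Width)
sepal_width_mode
# [1] 3
```

Returning all tied values, rather than only the first, keeps the helper honest for bimodal or multimodal data.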

Measures of Variability

Min, max, range

min(iris$Sepal.Length)
[1] 4.3
max(iris$Sepal.Length)
[1] 7.9
range(iris$Sepal.Length)
[1] 4.3 7.9


Variance

# variance
var(iris$Sepal.Length) 
[1] 0.6856935


Standard Deviation

# standard deviation
sd(iris$Sepal.Length)
[1] 0.8280661
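
To see what `var()` and `sd()` compute, the same quantities can be rebuilt step by step. This is a small sketch for illustration; the variable names are made up for this example:

```r
x <- iris$Sepal.Length
n <- length(x)

# width of the range: largest minus smallest value
range_width <- diff(range(x))

# sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(x))  # TRUE
all.equal(manual_sd, sd(x))    # TRUE
range_width                    # 7.9 - 4.3
```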

Measures of Distribution

Modality

The modality of a distribution is determined by the number of peaks it contains: one peak is unimodal, two peaks is bimodal, and more than two is multimodal.
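
One rough way to inspect modality is to count the local maxima of a kernel density estimate. The sketch below is only an illustration, not a formal test: the count depends on the bandwidth that `density()` chooses, and `Petal.Length` is picked here purely as an example column.

```r
# kernel density estimate with the default bandwidth (an assumption;
# a different bandwidth can change the number of peaks)
d <- density(iris$Petal.Length)

# a local maximum is where the slope switches from rising to falling
slope_sign <- sign(diff(d$y))
is_peak <- diff(slope_sign) == -2
peak_positions <- d$x[which(is_peak) + 1]

n_peaks <- length(peak_positions)
n_peaks          # number of peaks found
peak_positions   # approximate locations of the peaks
```

Passing `d` to `plot()` shows the same peaks visually, which is usually the better first check.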


Skewness

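Skewness measures the asymmetry of a distribution: a positive value indicates a longer right tail, a negative value a longer left tail, and a value near 0 indicates approximate symmetry.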

x <- iris$Sepal.Width
# third standardized moment (here with n - 1 in the denominator)
sum((x - mean(x))^3) / ((length(x) - 1) * sd(x)^3)
[1] 0.3147128


Kurtosis

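Kurtosis measures whether your dataset is heavy-tailed or light-tailed compared to a normal distribution. A normal distribution has a kurtosis of 3, so values above 3 indicate heavier tails and values below 3 indicate lighter tails.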

# fourth standardized moment (here with n - 1 in the denominator)
sum((x - mean(x))^4) / ((length(x) - 1) * sd(x)^4)
[1] 3.15977
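
The two formulas above divide by `n - 1` and use the sample standard deviation. Another common convention divides every central moment by `n`. The sketch below computes both versions and shows how they are related; all variable names are illustrative.

```r
x <- iris$Sepal.Width
n <- length(x)
dev <- x - mean(x)

# versions used on the slides (n - 1 and sd())
skew_slides <- sum(dev^3) / ((n - 1) * sd(x)^3)
kurt_slides <- sum(dev^4) / ((n - 1) * sd(x)^4)

# moment-based versions: every central moment is divided by n
m2 <- mean(dev^2)
skew_moment <- mean(dev^3) / m2^(3 / 2)
kurt_moment <- mean(dev^4) / m2^2

# the two conventions differ only by a factor that tends to 1 as n grows
all.equal(skew_slides, skew_moment * sqrt((n - 1) / n))  # TRUE
all.equal(kurt_slides, kurt_moment * (n - 1) / n)        # TRUE
```

For 150 observations the difference is small, but it explains why different tools can report slightly different skewness and kurtosis for the same data.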




# library(moments)  # optional package that provides skewness() and kurtosis()

Descriptive Statistics

quantile(iris$Sepal.Length, 0.25) # first quartile
25% 
5.1 


quantile(iris$Sepal.Length, 0.75) # third quartile
75% 
6.4 


IQR(iris$Sepal.Length) # interquartile range 
[1] 1.3


summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
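
The quartiles and the interquartile range shown above are linked: the IQR is the third quartile minus the first. A short sketch that requests several quantiles in one call and checks that relationship (variable names are illustrative):

```r
x <- iris$Sepal.Length

# several quantiles in one call
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
quartiles
# 25% = 5.1, 50% = 5.8, 75% = 6.4

# the interquartile range is the third quartile minus the first
iqr_manual <- unname(quartiles["75%"] - quartiles["25%"])
all.equal(iqr_manual, IQR(x))  # TRUE
```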
                
                
                

Frequency/Cross/Contingency Tables

Cross-tabulation analysis, also known as contingency table analysis, is most often used to analyze categorical (nominal measurement scale) data.

At their core, cross-tabulations are simply data tables that present the results of the entire group of respondents, as well as results from subgroups of survey respondents. With them, you can examine relationships within the data that might not be readily apparent when only looking at total survey responses.


demo_data <- iris

# flag each flower as "small" or "big" relative to the median sepal length
demo_data$size <- ifelse(
  demo_data$Sepal.Length < median(demo_data$Sepal.Length),
  "small", "big"
)
table(demo_data$Species, demo_data$size)
            
             big small
  setosa       1    49
  versicolor  29    21
  virginica   47     3
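
Raw counts are easier to compare once totals and proportions are added. A minimal sketch using base R's `addmargins()` and `prop.table()` on the same table; `size_table` and `row_props` are names chosen for this example:

```r
demo_data <- iris
demo_data$size <- ifelse(
  demo_data$Sepal.Length < median(demo_data$Sepal.Length),
  "small", "big"
)
size_table <- table(demo_data$Species, demo_data$size)

# add row and column totals to the counts
addmargins(size_table)

# share of big and small flowers within each species (each row sums to 1)
row_props <- prop.table(size_table, margin = 1)
round(row_props, 2)
```

Using `margin = 2` instead would give proportions within each size class.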

Correlation

Correlation measures the strength and direction of the relationship between two variables. It indicates whether the variables tend to move in the same direction, in opposite directions, or show no association.

  • Correlation is usually computed on two quantitative variables, but it can also be computed on two qualitative ordinal variables.

  • Pearson correlation is often used for quantitative continuous variables that have a linear relationship.

  • Spearman correlation (which is computed like Pearson, but on the ranked values of each variable rather than on the raw data) is often used to evaluate relationships involving at least one qualitative ordinal variable, or two quantitative variables whose relationship is monotonic but not strictly linear.


cor(iris$Sepal.Length, iris$Sepal.Width, method = "pearson")
[1] -0.1175698
cor(iris$Sepal.Length, iris$Sepal.Width, method = "spearman")
[1] -0.1667777
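
The statement that Spearman is Pearson applied to ranks can be checked directly. A small sketch, with illustrative variable names:

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# Spearman: replace each value by its rank (ties get the average rank),
# then compute the Pearson correlation of the ranks
spearman_by_hand <- cor(rank(x), rank(y), method = "pearson")
spearman_builtin <- cor(x, y, method = "spearman")

all.equal(spearman_by_hand, spearman_builtin)  # TRUE
```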

Correlation Test

A correlation coefficient different from 0 in the sample does not mean that the correlation is significantly different from 0 in the population. This needs to be checked with a hypothesis test, known as the correlation test.

The null and alternative hypotheses for the correlation test are as follows:

  • H0 : ρ = 0 (meaning that there is no linear relationship between the two variables)
  • H1 : ρ ≠ 0 (meaning that there is a linear relationship between the two variables)


cor.test(iris$Sepal.Length, iris$Sepal.Width)

    Pearson's product-moment correlation

data:  iris$Sepal.Length and iris$Sepal.Width
t = -1.4403, df = 148, p-value = 0.1519
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.27269325  0.04351158
sample estimates:
       cor 
-0.1175698
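
The printed report is backed by a list whose elements can be used in later code. The sketch below extracts them and, as an illustration of where the numbers come from, rebuilds the t statistic and p-value from the sample correlation; `t_manual` and `p_manual` are names made up for this example.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

res <- cor.test(x, y)  # Pearson by default

# individual results can be extracted from the returned object
res$estimate   # sample correlation
res$p.value    # p-value of the two-sided test
res$conf.int   # 95% confidence interval

# the t statistic can be reproduced from r and the sample size
r <- cor(x, y)
n <- length(x)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

all.equal(t_manual, unname(res$statistic))  # TRUE
all.equal(p_manual, res$p.value)            # TRUE
```

With a p-value of about 0.15, the null hypothesis of no linear relationship is not rejected at the 5% level for these two variables.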