R Beginners Course 2026

Introduction to R and Basic Programming Concepts

Dr. Debasish Mukherjee, Dr. Ulrike Goebel, Dr. Ali Abdallah

Bioinformatics Core Facility CECAD

2025-05-13

Session 5 :: Visualization

R Plots (base) – some preparations

First, we pre-compute the mean value of each flower trait in each species for later use.

Step 1: Split the full table into species-specific tables

species_tables <- 
    split(iris[,-5],  ## what to split
          iris[, 5])  ## split by what


class(species_tables)
[1] "list"

 

names(species_tables)
[1] "setosa"     "versicolor" "virginica" 

 

head(species_tables[["setosa"]],n=3)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2

R Plots (base) – some preparations

Step 2: Compute the means and assemble them into a matrix

species_means <- 
    rbind(setosa = 
            colMeans( ## return table column means as a vector
                species_tables[["setosa"]]
                ),
          versicolor = 
            colMeans(species_tables[["versicolor"]]),
          virginica = 
            colMeans(species_tables[["virginica"]])
    )

species_means
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026

 

There are easier ways to run a function over the columns of a table – tomorrow!
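As a small preview, and only as a sketch: the three colMeans() calls can be collapsed into a single sapply() call over the list of species tables. The name species_means_compact is introduced here for illustration only.

```r
## Rebuild the inputs from Step 1 and Step 2
species_tables <- split(iris[, -5], iris[, 5])
species_means <- rbind(
    setosa     = colMeans(species_tables[["setosa"]]),
    versicolor = colMeans(species_tables[["versicolor"]]),
    virginica  = colMeans(species_tables[["virginica"]])
)

## sapply() runs colMeans() on every list element and binds the
## results as COLUMNS (traits x species), so we transpose to get
## the species into the rows, as in species_means:
species_means_compact <- t(sapply(species_tables, colMeans))

## Compare the two matrices:
all.equal(species_means, species_means_compact)
```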

R Plots (base) – some preparations

Finally we define our own coloring scheme:

## The first three colors of R's default palette (R >= 4.0).
## These colors are supposed to be easy to discriminate,
## also for people with color vision deficiencies.
species_colors <- setNames( ## makes a named vector
                    palette()[1:3],  rownames(species_means)
                 )
species_colors
    setosa versicolor  virginica 
   "black"  "#DF536B"  "#61D04F" 

 

## colors reflecting the organ type:
trait_colors <-
    c(Petal.Length = "orange2",Petal.Width = "yellow2",
      Sepal.Length = "blue3",Sepal.Width = "lightblue"
    )
trait_colors 
Petal.Length  Petal.Width Sepal.Length  Sepal.Width 
   "orange2"    "yellow2"      "blue3"  "lightblue" 

R Plots (base)

R Plots (base) – barplot()

Barplots represent sign and absolute value of numbers by the direction and length of bars.  

If called with a matrix as its first argument, the function draws one group of bars for each column:

barplot(## one bar group per column == trait,
        ## one bar == species mean!
        species_means,
        
        ## do not stack the bars
        beside=TRUE,
        
        ## larger group labels
        cex.names=1.5,
        
        col=species_colors
        )

## add a legend (plot "augmentation"!)
legend("topright",
       rownames(species_means), 
       fill=species_colors,
       cex=1.5)

R Plots (base) – barplot()

If we want to plot trait means per species, we must turn the rows of the matrix species_means (= the species) into columns, because barplot() forms one group of bars per matrix column.

This is done by the t() function ("transpose"):

m <-  t(species_means) ## TRANSPOSE

barplot(## one bar group per column == species,
        ## one bar == trait mean!
        m,
        
        ## do not stack the bars
        beside=TRUE,
        
        ## larger group labels
        cex.names=2,
        
        col=trait_colors,
        
        ## increase y limit to fit the legend
        ylim = c(0,10),
        cex = 2
        )

## add a legend (plot "augmentation"!)
legend(x=1,y=10, ##"topright",
       rownames(m), 
       fill=trait_colors)
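The effect of t() can be checked without plotting anything. A minimal sketch, which rebuilds species_means so that it runs on its own:

```r
## Rebuild the matrix of means (species in rows, traits in columns)
species_tables <- split(iris[, -5], iris[, 5])
species_means <- rbind(
    setosa     = colMeans(species_tables[["setosa"]]),
    versicolor = colMeans(species_tables[["versicolor"]]),
    virginica  = colMeans(species_tables[["virginica"]])
)

m <- t(species_means)

dim(species_means)  ## 3 species x 4 traits
dim(m)              ## 4 traits  x 3 species

## barplot() takes its bar groups from the columns:
colnames(species_means)  ## groups in the first barplot (traits)
colnames(m)              ## groups in the second barplot (species)
```

Each value stays the same; only its row and column position is swapped.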

R Plots (base) – pie()

Pie charts are a quick-and-dirty alternative for representing numbers. 

The pie() function can only represent one set of numbers at a time. In addition, comparing angles in a pie chart is visually harder than comparing bar heights.

pie(species_means[,"Petal.Length"], 
    labels = rownames(species_means),
    main = paste("Mean Petal Lengths in",
                 "Fisher's Iris Species"),
    col=species_colors,
    
    cex=2, ## larger text annotation
    cex.main = 2 ## larger title
)
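pie() draws each slice in proportion to its share of the sum of the input values. The following sketch computes these shares explicitly and uses them as labels. The names petal_length_means, shares and pct_labels are hypothetical and not part of the course material.

```r
## Mean petal length per species, as a named vector
petal_length_means <- sapply(split(iris$Petal.Length, iris$Species), mean)

## Share of each species in the sum of the three means
shares <- petal_length_means / sum(petal_length_means)

## Labels of the form "species (xx%)"
pct_labels <- paste0(names(shares), " (", round(100 * shares), "%)")

pie(petal_length_means,
    labels = pct_labels,
    main = "Mean Petal Lengths (share of the sum)")
```

Printing the percentages next to the names makes up for the fact that angles are hard to compare by eye.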

R Plots (base) – pie()

The same call, applied to a single row of species_means, shows the mean trait values of one species:

pie(species_means["setosa",], 
    labels = colnames(species_means),
    main = paste("Mean Flower Organ Dimensions",
                 "in Iris setosa"),
    col=trait_colors,
    
    cex=2, ## larger text annotation
    cex.main = 2 ## larger title
)

R Plots (base) – plot()

The plot() function is an extremely versatile workhorse for x/y plots.

As an “initializing” function, it may be called to just create an empty canvas, to be filled later:

## (see par() for graphical parameters!)
plot(x=NULL,y=NULL,
     
     ## Note that if you start empty, 
     ## you have to set the canvas 
     ## dimensions yourself!
     
     xlim=c(0,3),
     ylim=c(0,9),
     xlab = "x dimension",
     ylab = "y dimension"
     
)
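An empty canvas only becomes useful once augmenting functions draw onto it. A minimal sketch with three made-up points (px, py and the labels are hypothetical example data):

```r
## Empty canvas with explicit dimensions
plot(x = NULL, y = NULL,
     xlim = c(0, 3), ylim = c(0, 9),
     xlab = "x dimension", ylab = "y dimension")

## Hypothetical example data
px <- c(0.5, 1.5, 2.5)
py <- c(2, 5, 8)

points(px, py, pch = 19)                          ## filled circles
lines(px, py, lty = 2)                            ## dashed connecting line
text(px, py, labels = c("A", "B", "C"), pos = 3)  ## labels above the points

## The coordinate range actually set up by plot():
usr <- par("usr")
usr
```

par("usr") returns the canvas extent as c(x1, x2, y1, y2). By default it is slightly larger than the requested xlim and ylim.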

R Plots (base) – plot()

Or it is called with an initial set of data, with the option to extend the plot later:

plot(x=iris$Petal.Width, 
     y=iris$Petal.Length
)

R Plots (base) – plot()

There is more than one way to relate x/y dimensions to data!

## Initialize a new plot, 
## using formula notation to specify x and y:
plot(Petal.Length ~ Petal.Width,
     data = iris)

R Plots (base) – plot()

Add a grid:

grid() 

R Plots (base) – plot()

Overplot some points with color, in order to identify a group in your data:

points(Petal.Length ~ Petal.Width, 
       
       data=subset(iris,
                Species == "versicolor"
            ),
       
       pch=21, # symbol code 21: bullet with
               # separate interior color (bg) 
               # and border color (col)
               ## See points()!

       ## Set color manually:
       col= "red", 
       bg = "red"  
)

R Plots (base) – plot()

Color all points by species, using our named vector species_colors:

plot(Petal.Length ~ Petal.Width, 
     data=iris,
     pch=21,
     
     ## index the "species_colors" vector
     ## by species names (as.character(), because
     ## a factor would index by its integer codes):
     col=species_colors[as.character(Species)], 
     bg =species_colors[as.character(Species)]  
     )

legend("topleft",
       rownames(species_means),
       fill=species_colors)
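A word of caution on indexing a named vector with a factor such as Species: R then uses the integer codes of the factor, not its labels. This gives the right colors only as long as the names of the color vector are in the same order as the factor levels. A minimal sketch with a deliberately reordered color vector (shuffled_colors is a hypothetical name):

```r
## Same colors as species_colors, but NOT in factor-level order
shuffled_colors <- c(virginica  = "#61D04F",
                     setosa     = "black",
                     versicolor = "#DF536B")

sp <- iris$Species  ## a factor; levels: setosa, versicolor, virginica

by_code <- shuffled_colors[sp]                ## uses the codes 1, 2, 3
by_name <- shuffled_colors[as.character(sp)]  ## uses the species names

## The first plant is a setosa:
as.character(sp[1])
names(by_code)[1]  ## color picked by position
names(by_name)[1]  ## color picked by name
```

Converting the factor with as.character() makes the lookup independent of the order of the color vector.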

R Plots (base) – plot()

Points with adjacent positions in the input can be connected by lines, using different line styles. A typical use case is a line graph, with x as a running number or ID.

## See par() for line-related parameters!

## Make a new data.frame, 
## containing only setosa:
df <- subset(iris, Species=="setosa")

plot(
     # x is now the row number in df
     x=1:nrow(df),
     xlab="individual plant",
         
     y=df$Petal.Width,
     ylab="Petal.Width",
         
     ## show both points and 
     ## connecting lines:
     type="b",
     
     ## line width:
     lwd = 2,
     ## line style = dashed:
     lty=2,
         
     main="Iris setosa"
)

R Plots (base) – plot()

It can make sense to connect some points in a general scatterplot by lines. 
The augmenting function lines() can do this. 

 

Here, we want to connect the (x,y) means of our three species:

## Initialize a plain x/y plot:
plot(Petal.Length ~ Petal.Width,
     data=iris)

## Add colored mean points, connected by lines
lines(x=species_means[,"Petal.Width"],
      y=species_means[,"Petal.Length"],
      type="b", # show both points and lines 
      pch=21,
      bg=species_colors,
      cex=2.5
)

R Plots (base) – plot()

Annotate individual points:

text(x=species_means[,"Petal.Width"],   
     y=species_means[,"Petal.Length"],
     
     labels=rownames(species_means),
     
     pos=4, ## put labels to the right of points
            ## (see ?text)
     cex=3  ## expansion factor for the text
)

R Plots (base) – plot()

The abline() function adds straight reference lines to an existing plot.

Regression line:

## Initialize a plain x/y plot:
plot(Petal.Length ~ Petal.Width,
     data=iris,
     ## Put a title:
     main = "Regression Line Example")
 
## Mark the linear regression line:
abline(lm(Petal.Length ~ Petal.Width,
          data=iris
       ),
       lty=2, lwd=2,col="blue"
       )
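When abline() receives a fitted model, it reads the intercept and slope from it. The sketch below extracts both with coef() and draws the same line a second time from the numbers. The names fit, intercept and slope are chosen here for illustration.

```r
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

## Named vector with the two model coefficients
cf <- coef(fit)
intercept <- cf[["(Intercept)"]]
slope     <- cf[["Petal.Width"]]

plot(Petal.Length ~ Petal.Width, data = iris,
     main = "Regression Line Example")

## Line taken directly from the model object ...
abline(fit, lty = 2, lwd = 2, col = "blue")

## ... and the same line from intercept (a) and slope (b):
abline(a = intercept, b = slope, lty = 3, lwd = 2, col = "red")
```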

R Plots (base) – plot()

Lines marking locations or slopes of interest:

## Initialize a plain x/y plot:
plot(Petal.Length ~ Petal.Width,
     data=iris,
     main = "abline() example"
     )

## Horizontal and vertical markers:
abline(h = 2.5, col = "red", 
       lwd=2 ## line width
       )
abline(v = 0.75, col = "yellow", lwd=2)

## An "assumed" regression line for reference
## (a = intercept, b = slope):
abline(a=1,b=2,lwd=2)

R Plots (base) – layout()

Several plots can be combined on the same page in a grid-like layout.

 

The grid is specified by a matrix of possible plot positions, like so:

## prepare the layout matrix
m <- matrix(1:4, 
            nrow=2, 
            ncol=2, 
            byrow=FALSE)
m
     [,1] [,2]
[1,]    1    3
[2,]    2    4

 

The first plot goes to grid position 1, the second to position 2, and so on. With byrow=FALSE, the positions are numbered column by column.

R Plots (base) – layout()

layout(m) ## activate the layout matrix

## index by name (a factor would index by integer code)
use_cols <- species_colors[as.character(iris$Species)]
## 1
plot(Sepal.Length ~ Sepal.Width, data=iris,
     pch=21, col=use_cols, bg=use_cols, 
     cex.lab=2)
## 2
plot(Petal.Length ~ Petal.Width, data=iris, 
     pch=21, col=use_cols, bg=use_cols, 
     cex.lab=2)
## 3
plot(Sepal.Length ~ Petal.Length, data=iris, 
     pch=21, col=use_cols, bg=use_cols, 
     cex.lab=2)
## 4
plot(Sepal.Width ~ Petal.Width, data=iris, 
     pch=21, col=use_cols, bg=use_cols, 
     cex.lab=2)

layout(1) ## back to a single plot per page
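The layout matrix does not have to contain every number just once. If a number is repeated, the corresponding plot spans all of those cells. A minimal sketch (the matrix m2 and the choice of plots are illustrative assumptions):

```r
## Plot 1 spans the whole top row, plots 2 and 3 share the bottom row
m2 <- matrix(c(1, 1,
               2, 3),
             nrow = 2, byrow = TRUE)

## layout() returns the number of plot positions
n_panels <- layout(m2)

## 1: wide scatterplot on top
plot(Petal.Length ~ Petal.Width, data = iris)
## 2: bottom left
hist(iris$Petal.Width, main = "", xlab = "Petal.Width")
## 3: bottom right
hist(iris$Petal.Length, main = "", xlab = "Petal.Length")

layout(1) ## back to a single plot per page
```

Calling layout.show(n_panels) right after layout(m2) would display the outline of the panels before anything is plotted.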

R Plots (base) – hist()

The hist() function is one of those “dual use functions”:

With add=FALSE, it initializes the device and the coordinate system, while with add=TRUE, its output goes directly to an existing plot.

 

setosa <- subset(iris,Species=="setosa")
versicolor <- subset(iris,Species=="versicolor")
virginica <- subset(iris,Species=="virginica")
              
## Plot the histogram of setosa, 
## and initialize the entire plot: 

hist(setosa$Petal.Length,
     col=species_colors["setosa"],
     
     add=FALSE, ## this is the default
     
     ## initialize to full x range !
     xlim=range(iris$Petal.Length),
     
     ## the full y range is usually only
     ## known after some trial and error ..
     ylim=c(0,22),
     
     ## x-axis label
     xlab="Petal Length",
     
     ## larger axis labels:
     cex.lab = 2,
     
     main = "Petal Length Distributions",
     
     ## larger title:
     cex.main = 2
)

R Plots (base) – hist()

With add=TRUE, the versicolor histogram is drawn into the plot initialized above:

 

hist(versicolor$Petal.Length,
     add=TRUE,
     col=species_colors["versicolor"]
)

R Plots (base) – hist()

The virginica histogram and a legend complete the plot:

 

hist(virginica$Petal.Length,
     add=TRUE,
     col=species_colors["virginica"]
)

legend(x=2,y=22, 
       legend=c("setosa",
                "versicolor",
                "virginica"), 
       fill=species_colors,
       cex =1.5 ## larger text in legend 
       )
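Each hist() call chooses its own bin breaks, so overlaid histograms can end up with different bin widths. The sketch below is one possible way around this, not the method used above. It passes the same breaks to every call and computes the counts first with plot=FALSE, so that the y range does not have to be guessed. The bin width of 0.25 and all variable names are assumptions.

```r
petal_lengths <- split(iris$Petal.Length, iris$Species)
cols <- setNames(palette()[1:3], names(petal_lengths))

## Common breaks covering the full data range
common_breaks <- seq(floor(min(iris$Petal.Length)),
                     ceiling(max(iris$Petal.Length)),
                     by = 0.25)

## Compute the histograms without drawing them
h_setosa     <- hist(petal_lengths[["setosa"]],
                     breaks = common_breaks, plot = FALSE)
h_versicolor <- hist(petal_lengths[["versicolor"]],
                     breaks = common_breaks, plot = FALSE)
h_virginica  <- hist(petal_lengths[["virginica"]],
                     breaks = common_breaks, plot = FALSE)

## Highest bar over all three species
y_max <- max(h_setosa$counts, h_versicolor$counts, h_virginica$counts)

## Draw: the first call initializes the plot, the others add to it
plot(h_setosa, col = cols["setosa"],
     xlim = range(common_breaks), ylim = c(0, y_max),
     xlab = "Petal Length", main = "Petal Length Distributions")
plot(h_versicolor, col = cols["versicolor"], add = TRUE)
plot(h_virginica,  col = cols["virginica"],  add = TRUE)

legend("topright", legend = names(cols), fill = cols)
```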

R Plots (base) – boxplot()

The boxplot() function has “dual-use” capabilities, too.

In addition, it accepts a formula with a factor on the right-hand side and then splits the dataset automatically by the factor levels. So we can plot all species at once:

boxplot(Petal.Length ~ Species, 
        data = iris,
        ## colors are not automatically
        ## inferred from the factor levels;
        ## they are used in level order:
        col = species_colors
        )
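The boxes are drawn in the order of the factor levels, and the col vector is applied in that same order. A minimal sketch that inspects the order without plotting (plot=FALSE) and then sorts a color vector accordingly. The reordered vector my_colors is a hypothetical example.

```r
## Compute the boxplot statistics without drawing
b <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)

b$names        ## order of the boxes
b$stats[3, ]   ## row 3 of the statistics holds the medians

## A color vector whose names are NOT in level order
my_colors <- c(virginica  = "#61D04F",
               setosa     = "black",
               versicolor = "#DF536B")

## Sort the colors by factor level before plotting
boxplot(Petal.Length ~ Species,
        data = iris,
        col = my_colors[levels(iris$Species)])
```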

R Plots (base) – boxplot()

Let’s add a boxplot for the global Petal.Length distribution (all species merged):

# Repeat the last boxplot, with an extended x axis: 
boxplot(Petal.Length ~ Species, 
        data = iris, col = species_colors,
        xlim=c(0,6)
        )

boxplot(iris$Petal.Length, # take the entire column!
        add=TRUE, 
        at=5, ## position on x axis
        names="all species",
        show.names=TRUE
        )

R Plots (base) – Saving Plots From RStudio
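Besides the Export menu of the RStudio Plots pane, a plot can also be written to a file from code by opening a file-based graphics device, plotting, and closing the device again. A minimal sketch. The file name and the page size are assumptions.

```r
## Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "iris_petals.pdf")

pdf(out_file, width = 7, height = 5)  ## open the device (size in inches)

plot(Petal.Length ~ Petal.Width, data = iris,
     main = "Iris Petal Dimensions")

dev.off()  ## close the device; only now is the file complete

file.exists(out_file)
```

The same pattern should work with png() or svg() in place of pdf(), where those devices are available.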

 
