Introduction to R and Basic Programming Concepts
Bioinformatics Core Facility CECAD
2025-05-13
Session 5 :: Visualization
First we will pre-compute the mean values of each flower trait in each species for later use.
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa            5.006       3.428        1.462       0.246
versicolor        5.936       2.770        4.260       1.326
virginica         6.588       2.974        5.552       2.026
There are easier ways to run a function over the columns of a table – tomorrow!
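The code that produced this table is not shown. Below is one possible way to compute it with basic functions only, assuming the built-in iris data set. It subsets each species and calls colMeans() on the four numeric columns. This is a reconstruction, not necessarily the course's own code. It should give the values in the table above.

```r
## Assumed reconstruction: per-species means of the four numeric traits.
## rbind() uses the argument names as row names (= species) and the
## names returned by colMeans() as column names (= traits).
species_means <- rbind(
  setosa     = colMeans(iris[iris$Species == "setosa",     1:4]),
  versicolor = colMeans(iris[iris$Species == "versicolor", 1:4]),
  virginica  = colMeans(iris[iris$Species == "virginica",  1:4])
)
species_means
```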
Finally we define our own coloring scheme:
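The color definitions themselves are not shown. The sketch below uses hypothetical color choices and matches how the two names are used later: trait_colors serves as bar fill, and species_colors is indexed both by species name and by the factor iris$Species. Indexing a vector with a factor uses the factor's integer codes, not its labels. For that reason, species_colors is written in the order of levels(iris$Species).

```r
## Hypothetical colors; the course's actual choices are not shown.
## One color per numeric trait (column of iris):
trait_colors <- c(Sepal.Length = "lightblue",
                  Sepal.Width  = "steelblue",
                  Petal.Length = "pink",
                  Petal.Width  = "firebrick")
## One color per species, named AND ordered like levels(iris$Species),
## so both species_colors["setosa"] and species_colors[iris$Species] work:
species_colors <- c(setosa     = "orange",
                    versicolor = "purple",
                    virginica  = "darkgreen")
```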
barplot()
Barplots represent sign and absolute value of numbers by the direction and length of bars.
If called with a matrix as its first argument, the function draws one bar (stacked, by default) or one group of bars (with beside=TRUE) for each column:
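As an illustration, here is a minimal sketch that calls barplot() directly on species_means, whose columns are the traits. The matrix is recomputed so the snippet runs on its own. barplot() invisibly returns the bar midpoints, which shows how the layout differs between the two modes.

```r
species_means <- rbind(
  setosa     = colMeans(iris[iris$Species == "setosa",     1:4]),
  versicolor = colMeans(iris[iris$Species == "versicolor", 1:4]),
  virginica  = colMeans(iris[iris$Species == "virginica",  1:4])
)
## Default (beside = FALSE): one stacked bar per column (= trait)
mids_stacked <- barplot(species_means)
## beside = TRUE: one group of 3 bars (= species) per column
mids_grouped <- barplot(species_means, beside = TRUE)
```

The stacked call returns one midpoint per column. The grouped call returns a matrix with one row per species and one column per trait.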
barplot()
If we want to plot the trait means per species, we must turn the rows of the matrix species_means (= the species) into columns, because barplot() reads a matrix column by column.
This is done by the t() function ("transpose"):
m <- t(species_means) ## TRANSPOSE
barplot(## one group of bars per column == species,
## one bar == trait mean!
m,
## do not stack the bars
beside=TRUE,
## larger group labels
cex.names=2,
col=trait_colors,
## increase y limit to fit the legend
ylim = c(0,10),
cex = 2
)
## add a legend (plot "augmentation"!)
legend(x=1,y=10, ##"topright",
rownames(m),
fill=trait_colors)
pie()
Pie charts are a quick-and-dirty alternative for representing numbers.
The pie() function can only represent one set of numbers at a time. In addition, angles on a pie chart are harder to compare visually than bar heights.
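A minimal sketch of a single pie, using the four trait means of setosa as the one set of numbers. The colors are hypothetical. By default, pie() labels the slices with names(x).

```r
## Hypothetical colors, one per trait
trait_colors <- c("lightblue", "steelblue", "pink", "firebrick")
## One set of numbers: the four trait means of setosa
setosa_means <- colMeans(iris[iris$Species == "setosa", 1:4])
pie(setosa_means, col = trait_colors, main = "Iris setosa: trait means")
```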
plot()
The plot() function is an extremely versatile workhorse for x/y plots.
As an “initializing” function, it can be called just to create an empty canvas, to be filled later:
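One common way to get an empty canvas is type = "n". It sets up the coordinate system and axes for the given data but draws no points. The choice of the Sepal columns here is only an example.

```r
## type = "n": axes and coordinate ranges are set up, nothing is drawn
plot(x = iris$Sepal.Width, y = iris$Sepal.Length, type = "n",
     xlab = "Sepal.Width", ylab = "Sepal.Length")
## The canvas can be filled later, e.g. with points():
points(iris$Sepal.Width, iris$Sepal.Length)
```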
plot()
Or it is called with an initial set of data, with the option to extend the plot later:
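Here is a minimal scatterplot using the formula interface, y ~ x, with the columns taken from data =. The choice of traits and styling is an assumption, not the course's exact call.

```r
## Formula interface: y ~ x, columns looked up in data = iris
plot(Sepal.Length ~ Sepal.Width, data = iris,
     pch = 21,       # circle symbol (fill color set via bg =)
     cex.lab = 1.5)  # larger axis labels
```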
plot()
Overplot some points with color, in order to identify a group in your data:
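A minimal sketch that highlights setosa. The base plot draws all flowers, and points() then redraws the selected group on top in color. The highlight color is hypothetical.

```r
plot(Sepal.Length ~ Sepal.Width, data = iris)
## Select the group to highlight
setosa <- subset(iris, Species == "setosa")
## points() draws on top of the existing plot
points(setosa$Sepal.Width, setosa$Sepal.Length,
       pch = 19, col = "orange")
```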
plot()
Color all points by species, using our named vector species_colors:
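A sketch of the idea, with a hypothetical species_colors. Indexing the vector with the factor iris$Species returns one color per row. Because factor indices use the integer codes, the vector must follow the order of levels(iris$Species).

```r
## Hypothetical colors, ordered like levels(iris$Species)
species_colors <- c(setosa = "orange", versicolor = "purple", virginica = "darkgreen")
## One color per flower (row of iris)
point_cols <- species_colors[iris$Species]
plot(Sepal.Length ~ Sepal.Width, data = iris,
     pch = 19, col = point_cols)
legend("topright", legend = names(species_colors),
       col = species_colors, pch = 19)
```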
plot()
Points with adjacent positions in the input can be connected by lines, using different line styles. A typical use case is a line graph, with x as a running number or ID.
## See par() for line-related parameters!
## Make a new data.frame,
## containing only setosa:
df <- subset(iris, Species=="setosa")
plot(
# x is now the row number in df
x=1:nrow(df),
xlab="individual plant",
y=df$Petal.Width,
ylab="Petal.Width",
## show both points and
## connecting lines:
type="b",
## line width:
lwd = 2,
## line style = dashed:
lty=2,
main="Iris setosa"
)
plot()
It can make sense to connect some points in a general scatterplot by lines.
The augmenting function lines() can do this.
Here, we want to connect the (x,y) means of our three species:
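A minimal sketch using the Petal traits as the assumed x and y. species_means is recomputed so the snippet runs on its own. lines() connects consecutive (x, y) pairs in row order, here setosa -> versicolor -> virginica.

```r
species_means <- rbind(
  setosa     = colMeans(iris[iris$Species == "setosa",     1:4]),
  versicolor = colMeans(iris[iris$Species == "versicolor", 1:4]),
  virginica  = colMeans(iris[iris$Species == "virginica",  1:4])
)
plot(Petal.Length ~ Petal.Width, data = iris, col = "grey")
## Connect the three species means; type = "b" draws points and lines
lines(x = species_means[, "Petal.Width"],
      y = species_means[, "Petal.Length"],
      type = "b", pch = 19, lwd = 2)
```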
plot()
Annotate individual points:
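The augmenting function text() writes labels at given coordinates. In this hypothetical example, we label the flower with the longest petal by its row number. pos = 1 places the label below the point.

```r
plot(Petal.Length ~ Petal.Width, data = iris)
## Hypothetical choice: the flower with the longest petal
i <- which.max(iris$Petal.Length)
text(x = iris$Petal.Width[i], y = iris$Petal.Length[i],
     labels = paste("row", i),
     pos = 1)  # 1 = below, 2 = left, 3 = above, 4 = right
```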
plot()
Function abline() adds indicator lines to a plot, i.e. lines marking locations or slopes of interest:
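The sketch below shows the three common forms of abline(). h = and v = mark locations. a = and b = draw a line with a given intercept and slope; the values here are arbitrary. A fitted lm() object draws its regression line.

```r
plot(Petal.Length ~ Petal.Width, data = iris)
## Locations: horizontal and vertical lines at the overall means
abline(h = mean(iris$Petal.Length), v = mean(iris$Petal.Width),
       lty = 2, col = "grey40")
## A line with intercept a and slope b (arbitrary example values)
abline(a = 0, b = 3, col = "blue")
## The least-squares regression line of the plotted data
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
abline(fit, col = "red", lwd = 2)
```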
layout()
Several plots can be combined on the same page in a grid-like layout.
The grid is specified by a matrix of possible plot positions, like so:
The first plot will go to grid position 1, the second to position 2, and so on.
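The layout matrix itself is not shown. For four plots, a 2 x 2 grid filled row by row is one plausible choice. It is stored in m because the layout() call that follows reads it from that name. layout.show() draws the grid outline with its position numbers.

```r
## A 2 x 2 grid; byrow = TRUE fills positions row by row
m <- matrix(1:4, nrow = 2, byrow = TRUE)
m
## Preview the grid, then return to one plot per page
layout(m)
layout.show(4)
layout(1)
```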
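layout()
layout(m) ## read the layout matrix
## one color per row; indexing with a factor uses its integer codes,
## so species_colors must follow the order of levels(iris$Species)
use_cols <- species_colors[iris$Species]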
## 1
plot(Sepal.Length ~ Sepal.Width, data=iris,
pch=21, col=use_cols, bg=use_cols,
cex.lab=2)
## 2
plot(Petal.Length ~ Petal.Width, data=iris,
pch=21, col=use_cols, bg=use_cols,
cex.lab=2)
## 3
plot(Sepal.Length ~ Petal.Length, data=iris,
pch=21, col=use_cols, bg=use_cols,
cex.lab=2)
## 4
plot(Sepal.Width ~ Petal.Width, data=iris,
pch=21, col=use_cols, bg=use_cols,
cex.lab=2)
## reset the page to a single plot before moving on
layout(1)
hist()
The hist() function is one of those “dual use functions”:
With add=FALSE, it initializes the device and the coordinate system, while
with add=TRUE, its output goes directly to an existing plot.
setosa <- subset(iris,Species=="setosa")
versicolor <- subset(iris,Species=="versicolor")
virginica <- subset(iris,Species=="virginica")
## Plot the histogram of setosa,
## and initialize the entire plot:
hist(setosa$Petal.Length,
col=species_colors["setosa"],
add=FALSE, ## this is the default
## initialize to full x range !
xlim=range(iris$Petal.Length),
## full y range you usually
## only know after some trials ..
ylim=c(0,22),
## x-axis label
xlab="Petal Length",
## larger axis labels:
cex.lab = 2,
main = "Petal Length Distributions",
## larger title:
cex.main = 2
)
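The follow-up calls with add=TRUE are not shown. Below is a minimal sketch of the overlay with a hypothetical species_colors vector and a shortened setosa call, so the snippet runs on its own. Only the first call sets up the coordinate system. The later calls draw into it and must fit the xlim and ylim chosen there.

```r
## Hypothetical colors, ordered like levels(iris$Species)
species_colors <- c(setosa = "orange", versicolor = "purple", virginica = "darkgreen")
setosa     <- subset(iris, Species == "setosa")
versicolor <- subset(iris, Species == "versicolor")
virginica  <- subset(iris, Species == "virginica")
## add = FALSE (default): initialize the plot with the full x range
hist(setosa$Petal.Length, col = species_colors["setosa"],
     xlim = range(iris$Petal.Length), ylim = c(0, 22),
     xlab = "Petal Length", main = "Petal Length Distributions")
## add = TRUE: draw into the existing plot
h_ve <- hist(versicolor$Petal.Length, col = species_colors["versicolor"], add = TRUE)
h_vi <- hist(virginica$Petal.Length,  col = species_colors["virginica"],  add = TRUE)
```

Where the versicolor and virginica ranges overlap, later bars hide earlier ones. Semi-transparent colors, e.g. from adjustcolor(species_colors["virginica"], alpha.f = 0.5), keep both visible.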
boxplot()
The boxplot() function has “dual-use” capabilities, too.
In addition, it accepts a formula with a factor on the right-hand side and splits the dataset automatically according to the factor levels, so we can plot all species at once:
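Here is a minimal sketch with a hypothetical species_colors vector. The factor Species on the right-hand side produces one box per level, and the colors are used in level order. boxplot() invisibly returns its statistics, including the group sizes in $n.

```r
## Hypothetical colors, ordered like levels(iris$Species)
species_colors <- c(setosa = "orange", versicolor = "purple", virginica = "darkgreen")
## One box per level of the factor on the right-hand side
b <- boxplot(Petal.Length ~ Species, data = iris,
             col = species_colors, ylab = "Petal Length")
```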
boxplot()
Let’s add a boxplot for the global Petal.Length distribution (all species merged):
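One way to do this is to widen the x range of the per-species plot so it has room for a fourth box. The merged distribution is then added with add = TRUE at position 4. The grey color and the "all species" label are arbitrary choices. The label goes on with axis(), because boxplot() does not print a name under a single box by default.

```r
species_colors <- c(setosa = "orange", versicolor = "purple", virginica = "darkgreen")
## Per-species boxes at positions 1..3; xlim leaves room for a 4th box
boxplot(Petal.Length ~ Species, data = iris,
        col = species_colors, xlim = c(0.5, 4.5))
## All species merged, drawn into the existing plot at x = 4
b_all <- boxplot(iris$Petal.Length, add = TRUE, at = 4, col = "grey")
axis(1, at = 4, labels = "all species")
```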
Saving Plots From RStudio
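Besides the Export menu in the RStudio Plots pane, a plot can be written to a file from code. You open a file device such as pdf() or png(), draw, and close the device with dev.off(). The file name below is hypothetical and points into a temporary directory.

```r
## Hypothetical output file in a temporary directory
out_file <- file.path(tempdir(), "petal_boxplot.pdf")
pdf(out_file, width = 7, height = 5)  # open a PDF device (size in inches)
boxplot(Petal.Length ~ Species, data = iris)
dev.off()                             # close the device: the file is written
```

For a bitmap, png(filename, width = 800, height = 600) works the same way, with sizes in pixels by default.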