Reference & Exercise
Day 1 Practice
Session 1 & 2
Exercise 1: Simple Calculations
- Use R to calculate the following:
  - multiply 31 by 78
  - divide 697 by 41
- Assign the value 39 to `x`
- Assign the value 22 to `y`
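One possible way to do this in R is sketched below. The names `product` and `quotient` are illustrative additions, not part of the exercise.

```r
# Arithmetic operators: * multiplies, / divides
product  <- 31 * 78    # 2418
quotient <- 697 / 41   # 17
print(product)
print(quotient)

# <- is the conventional assignment operator in R
x <- 39
y <- 22
```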
Exercise 2: Working with Vectors
- Create a vector called `newvec` with 20 elements
- Add names to the elements
- Retrieve the first 5 elements
- Retrieve any 3 elements by name
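A minimal sketch, assuming the vector simply holds the integers 1 to 20 and is named with the letters "a" to "t" (both arbitrary choices):

```r
# An integer vector with 20 elements
newvec <- 1:20
# names() attaches a label to each element; letters[1:20] is "a" through "t"
names(newvec) <- letters[1:20]

# Positional indexing: the first 5 elements
first_five <- newvec[1:5]
# Indexing by name: any 3 elements
three_by_name <- newvec[c("b", "g", "t")]
print(first_five)
print(three_by_name)
```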
Exercise 3: Lists and Data Frames
- Write an R program that creates a list of data frames and accesses each of those data frames from the list.
- Create a data frame from a matrix of your choice. Change the row names so that each row is named id_i (where i is the row number), and change the column names to variable_i (where i is the column number). For example, column 1 is named variable_1, row 2 is named id_2, and so on.
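A sketch covering both tasks. The contents of the data frames and the 4 x 3 matrix are arbitrary choices; `paste0()` with `seq_len()` builds the systematic names.

```r
# A named list of data frames
df_list <- list(
  first  = data.frame(a = 1:3, b = c("x", "y", "z")),
  second = data.frame(a = 4:5, b = c("u", "v"))
)
# Access each data frame with [[ ]] (by name here; by position also works)
for (nm in names(df_list)) {
  print(df_list[[nm]])
}
df_list[[1]]      # by position
df_list$second    # by name with $

# Data frame from a matrix (filled column by column)
m <- matrix(1:12, nrow = 4, ncol = 3)
mdf <- as.data.frame(m)
rownames(mdf) <- paste0("id_", seq_len(nrow(mdf)))
colnames(mdf) <- paste0("variable_", seq_len(ncol(mdf)))
print(mdf)
```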
Exercise 4: Reading & Writing
- Load the `iris` data and save the data frame in .csv, .tsv and .xls format.
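A sketch of the text formats using base R. Base R has no writer for Excel files, so the .xls part needs an add-on package (for example `WriteXLS`, or `writexl`, which writes .xlsx rather than .xls); that part is left as a comment here. Writing to `tempdir()` is an assumption to keep the example from cluttering the working directory.

```r
data(iris)
csv_file <- file.path(tempdir(), "iris.csv")
tsv_file <- file.path(tempdir(), "iris.tsv")

# Comma-separated values
write.csv(iris, csv_file, row.names = FALSE)
# Tab-separated values: a delimited text file with sep = "\t"
write.table(iris, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
# Excel output needs an add-on package, e.g. (not run here):
# writexl::write_xlsx(iris, "iris.xlsx")

# Read the files back to check the round trip
iris_csv <- read.csv(csv_file)
iris_tsv <- read.delim(tsv_file)
```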
Session 3
Exercise 5 [simple]
Draw a scatterplot of Petal.Length versus Sepal.Length, with Petal.Length on the x axis.
Then mark the mean Petal.Length of each species with a dashed vertical line. Give each line its own color.
Hints:
- The means are pre-computed in the “How To Draw a Bar Chart” example.
You can extract the Petal.Length means from the respective column of `df`.
- Use `?par` to see the Graphical Parameters help page. Parameter `lty` sets the line type.
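A minimal sketch using base graphics. Because the bar-chart example's `df` is not available here, it recomputes the per-species means with `tapply()`; the three line colors are arbitrary.

```r
data(iris)
# Mean Petal.Length per species (a named vector: setosa, versicolor, virginica)
petal_means <- tapply(iris$Petal.Length, iris$Species, mean)
line_cols <- c("red", "darkgreen", "blue")

plot(iris$Petal.Length, iris$Sepal.Length,
     xlab = "Petal.Length", ylab = "Sepal.Length")
# v = draws vertical lines; lty = "dashed" sets the line type;
# col is recycled so each line gets its own color
abline(v = petal_means, lty = "dashed", col = line_cols)
```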
Exercise 6 [medium]
Draw a scatterplot of Petal.Length versus Sepal.Length, with Petal.Length on the x axis. Color the points by species.
Then mark the point (x=mean(Petal.Length), y=mean(Sepal.Length)) of
each species with fat points on the plot. Color each of these points
according to its species.
Put the names of the species as text labels next to the points. Add a
title to the plot.
Hints:
- The means are pre-computed in the “How To Draw a Bar Chart” example.
You can extract the Petal.Length means from the respective column of `df`.
- The `col` parameter of `plot()` can take a vector argument, assigning an individual color to each point. To construct this vector from the Species column:
  - first define a mapping from species to colors (a vector where each element is a color and each element name is a species)
  - then index this vector with the Species column, converted with `as.character()` (indexing with a factor uses its integer codes, not its level names)
- Use `?par` to see the Graphical Parameters help page. Parameter `cex` changes the point size. Use `?base::plot` to see the help page of the plot function. Parameter `main` sets the title.
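A sketch of the color-mapping approach from the hints, in base graphics. The means are recomputed with `tapply()` rather than taken from the bar-chart example's `df`; the colors, `pch = 19`, `cex = 3` and the title text are illustrative choices.

```r
data(iris)
# Mapping from species to colors: element names are species, values are colors
species_cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")
# Index by name; as.character() avoids indexing by the factor's integer codes
point_cols <- species_cols[as.character(iris$Species)]

plot(iris$Petal.Length, iris$Sepal.Length, col = point_cols,
     xlab = "Petal.Length", ylab = "Sepal.Length",
     main = "Iris: Sepal.Length vs. Petal.Length")

# Per-species means for the "fat" points
mean_x <- tapply(iris$Petal.Length, iris$Species, mean)
mean_y <- tapply(iris$Sepal.Length, iris$Species, mean)
# pch = 19 draws filled circles; cex = 3 makes them three times the default size
points(mean_x, mean_y, pch = 19, cex = 3, col = species_cols[names(mean_x)])
# pos = 4 places each label to the right of its point
text(mean_x, mean_y, labels = names(mean_x), pos = 4)
```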
Exercise 7 [complex]
Similar to the “Organs Distributions” example, combine three histograms in one plot, one for each of the 3 species in the iris dataset. Color the 3 histograms differently. Add a title and a legend to the plot. Can you position the legend towards the top middle, such that it does not overlap the histograms?
Part of the challenge here is not the plotting itself, but programmatically producing the input to the plot!
Hints:
- I suggest using `tapply()` to slice the iris dataset into a named list of per-species datasets. See `?tapply`. If `x` is an index column in a data frame `df` (like `Species` in our case), then `l <- tapply(df, ~x, data.frame)` stores in `l` a list with as many elements as there are distinct `x` values. Each list element holds the subset of `df` with a fixed `x` value. In the `tapply` call, the third parameter is a function to be applied to the subset defined by the current index value. Here, the `data.frame()` function ensures that the subset is returned as a data frame rather than a list.
Can you transfer this to the present case?
- Once you have the list, plot the histograms as follows:
  - Plot the histogram of `l[[1]]`. This initializes the plot, which means that this is where you set global parameters like the title, the x axis label, and the x range (`xlim`). Keep in mind that the histograms of all species must fit on the canvas!
  - Loop over list elements 2 and 3 and draw their histograms on top of the first one (see the `add` argument in `?hist`).
- For the legend, see the “How To Draw a Histogram” example. However, if you just copy the example, the legend will partly cover the histograms. Use `?legend` and check the meaning of the `x` and `y` arguments. Try to find an optimal position!
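One possible sketch of the scheme above. It assumes Sepal.Length as the plotted variable (the exercise does not name one) and uses `split()`, which also returns a named list of per-species data frames, instead of the `tapply()` formula call from the hints. The shared breaks, semi-transparent colors and 30% y headroom are illustrative choices meant to keep the legend clear of the bars.

```r
data(iris)
# Named list with one data frame per species
l <- split(iris, iris$Species)
hist_cols <- c("#FF000080", "#00800080", "#0000FF80")  # semi-transparent red, green, blue
breaks <- seq(4, 8, by = 0.25)  # shared bins spanning all Sepal.Length values

# Compute the histograms without plotting to find the tallest bar
h <- lapply(l, function(d) hist(d$Sepal.Length, breaks = breaks, plot = FALSE))
ymax <- max(sapply(h, function(x) max(x$counts)))

# The first histogram initializes the canvas: title, axis label, x and y ranges
hist(l[[1]]$Sepal.Length, breaks = breaks, col = hist_cols[1],
     xlim = range(breaks), ylim = c(0, ymax * 1.3),
     main = "Sepal.Length by species", xlab = "Sepal.Length")
# Elements 2 and 3 are drawn on top of the existing plot
for (i in 2:3) {
  hist(l[[i]]$Sepal.Length, breaks = breaks, col = hist_cols[i], add = TRUE)
}
# "top" centers the legend horizontally in the headroom above the bars
legend("top", legend = names(l), fill = hist_cols, horiz = TRUE, bty = "n")
```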
Exercise 8: Simple Scatter Plot
- Objective: Create a scatter plot of `Sepal.Length` vs. `Sepal.Width` from the `iris` dataset.
- Difficulty: Easy
- Estimated Time: 5-10 minutes (for beginners)
```r
# Load ggplot2 and the iris dataset
library(ggplot2)
data(iris)
# Create a scatter plot with ggplot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
```
- Your Task: Color the points by `Species` and add labels to the x and y axes.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
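A possible sketch for this task: mapping `color = Species` inside `aes()` colors the points, and `labs()` sets the axis titles. The label texts are illustrative (iris measurements are in centimeters).

```r
library(ggplot2)
data(iris)
# color = Species gives each species its own color and adds a legend
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(x = "Sepal Length (cm)", y = "Sepal Width (cm)")
print(p)
```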
Exercise 9: Adding Titles and Labels
- Objective: Use the plot from Exercise 8 and add a title, x-axis label, and y-axis label.
- Difficulty: Easy
- Estimated Time: 5 minutes
```r
# Create scatter plot with labels
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  ggtitle("Scatter plot of Sepal Dimensions") +
  xlab("Sepal Length (cm)") +
  ylab("Sepal Width (cm)")
```
- Your Task: Modify the plot to include a subtitle and caption.
- Difficulty: Easy
- Estimated Time: 5 minutes
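A sketch of one way to do this: `labs()` accepts `subtitle` and `caption` alongside `title` and the axis labels. The subtitle and caption texts are illustrative.

```r
library(ggplot2)
data(iris)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Scatter plot of Sepal Dimensions",
       subtitle = "Three iris species, 50 flowers each",
       caption = "Data: iris (R datasets package)",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)")
print(p)
```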
Exercise 10: Bar Plot of Species Count
- Objective: Create a bar plot showing the count of each species in the `iris` dataset.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
- Your Task: Modify the plot to use different colors for each species.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
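A sketch of both steps with ggplot2. `geom_bar()` counts the rows per x value by itself, so no y aesthetic is needed; the manual fill colors in the second plot are arbitrary.

```r
library(ggplot2)
data(iris)

# Basic bar plot: one bar per species, height = number of rows
p_count <- ggplot(iris, aes(x = Species)) +
  geom_bar()
print(p_count)

# Mapping fill to Species gives each bar its own color;
# scale_fill_manual() lets you choose the colors yourself
p_colored <- ggplot(iris, aes(x = Species, fill = Species)) +
  geom_bar() +
  scale_fill_manual(values = c(setosa = "tomato",
                               versicolor = "seagreen",
                               virginica = "steelblue"))
print(p_colored)
```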
Exercise 11: Boxplot of Petal Length by Species
- Objective: Create a boxplot of `Petal.Length` for each species in the `iris` dataset.
- Difficulty: Medium
- Estimated Time: 5-10 minutes
```r
# Create a boxplot of Petal Length by Species
library(ggplot2)
ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()
```
- Your Task: Customize the plot by changing the colors of the boxplots and adding a theme.
- Difficulty: Medium
- Estimated Time: 10-15 minutes
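A possible sketch: mapping `fill = Species` colors each box, a ColorBrewer palette supplies the colors, and `theme_minimal()` is one of ggplot2's built-in themes. The palette and theme are illustrative choices.

```r
library(ggplot2)
data(iris)
p <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()
print(p)
```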
Exercise 12: Histogram of Sepal Length
- Objective: Create a histogram of `Sepal.Length` in the `iris` dataset.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
```r
# Create a histogram of Sepal Length
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "lightblue", color = "black")
```
- Your Task: Change the number of bins and add a title to the histogram.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
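A sketch of one answer: `bins` sets the number of bins directly (use it instead of `binwidth`, not together with it), and `ggtitle()` adds the title. The value 20 is an arbitrary choice.

```r
library(ggplot2)
data(iris)
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20, fill = "lightblue", color = "black") +
  ggtitle("Histogram of Sepal Length")
print(p)
```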
Exercise 13: Faceting by Species
- Objective: Use faceting to create a separate scatter plot for each species in the `iris` dataset.
- Difficulty: Medium
- Estimated Time: 10 minutes
```r
# Facet scatter plot by Species
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)
```
- Your Task: Customize the facet labels and add a theme to the plot.
- Difficulty: Medium
- Estimated Time: 15 minutes
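A sketch of one approach: `as_labeller()` turns a named character vector into a labeller that replaces the facet strip texts, and `theme()` with `strip.text` styles them on top of the built-in `theme_bw()`. The label texts and italic styling are illustrative.

```r
library(ggplot2)
data(iris)
# Names are the Species levels; values are the labels shown in the facet strips
species_labels <- c(setosa = "Iris setosa",
                    versicolor = "Iris versicolor",
                    virginica = "Iris virginica")
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species, labeller = as_labeller(species_labels)) +
  theme_bw() +
  theme(strip.text = element_text(face = "italic"))
print(p)
```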