Reference & Exercise

Day 1 Practice

Session 1 & 2

Exercise 1: Simple Calculations

  • Use R to calculate the following:
    • multiply 31 by 78
    • divide 697 by 41
  • Assign the value of 39 to x
  • Assign the value of 22 to y

Exercise 2: Working with Vectors

  • Create a vector called newvec with 20 elements
  • Add names to the elements
  • Retrieve the first 5 elements
  • Retrieve any 3 elements by name
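
As a minimal sketch of naming and name-based indexing, here is a small hypothetical vector called scores (not the exercise's newvec, and with made-up names):

```r
# Hypothetical 4-element vector, used only for illustration
scores <- c(10, 20, 30, 40)

# Attach one name per element
names(scores) <- c("a", "b", "c", "d")

# First two elements, selected by position
first_two <- scores[1:2]

# Two elements selected by name, in the order requested
by_name <- scores[c("d", "b")]

print(first_two)
print(by_name)
```

The same pattern should carry over to a longer vector; only the length of the names vector has to match.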

Exercise 3: Lists and Data Frames

  • Write an R program to create a list of data frames and access each of those data frames from the list.
  • Create a data frame from a matrix of your choice. Change the row names so that every row is named id_i (where i is the row number) and change the column names to variable_i (where i is the column number). For example, column 1 is named variable_1, row 2 is named id_2, and so on.
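
One possible way to build such names is paste0() combined with seq_len(). The sketch below assumes a hypothetical 3 x 2 matrix and a hypothetical list called dfs; it is one approach among several, not the reference solution.

```r
# Hypothetical 3 x 2 matrix, used only for illustration
m <- matrix(1:6, nrow = 3, ncol = 2)
df <- as.data.frame(m)

# seq_len() generates 1..n; paste0() glues the prefix onto each index
rownames(df) <- paste0("id_", seq_len(nrow(df)))
colnames(df) <- paste0("variable_", seq_len(ncol(df)))

# A named list can hold several data frames
dfs <- list(renamed = df, cars = head(mtcars, 3))

# Access a list element by name or by position
print(dfs[["renamed"]])
print(dfs[[2]])
```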

Exercise 4: Reading & Writing

  • Load the iris data. Save the data frame in .csv, .tsv and .xls formats.
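
A minimal sketch of the text-based formats, assuming a small hypothetical data frame and temporary files (the exercise itself uses iris and file names of your choice):

```r
# Small hypothetical data frame, used only for illustration
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Temporary files, so nothing is left in the working directory
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")

# Comma-separated, without the extra row-name column
write.csv(df, csv_file, row.names = FALSE)

# Tab-separated: write.table() with sep = "\t"
write.table(df, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read both files back to check the round trip
csv_back <- read.csv(csv_file)
tsv_back <- read.delim(tsv_file)
```

Base R has no built-in writer for Excel files, so the .xls part presumably needs an add-on package; which one to use is left to the course material.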

Session 3

Exercise 5 [simple]

Draw a scatterplot of Petal.Length versus Sepal.Length.

Then mark the mean Petal.Length of each species by a dashed vertical line. Give each line its own color.

Hints:

  • The means are pre-computed in the “How To Draw a Bar Chart” example. You can extract the Petal.Length means from the respective column of df.
  • Use ?par to see the Graphical Parameters help page. The parameter lty sets the line type.
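
The idea of marking group means with dashed vertical lines can be sketched with abline(). The data, the group labels and the colors below are hypothetical, chosen only to illustrate the lty and col parameters:

```r
# Hypothetical data, not the iris columns from the exercise
set.seed(1)
x <- c(rnorm(20, mean = 2), rnorm(20, mean = 6))
y <- rnorm(40)
group <- rep(c("g1", "g2"), each = 20)

# Per-group means of x, named by group
group_means <- tapply(x, group, mean)

plot(x, y)

# One dashed (lty = 2) vertical line per group mean, each in its own color
abline(v = group_means, lty = 2, col = c("red", "blue"))
```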

Exercise 6 [medium]

Draw a scatterplot of Petal.Length versus Sepal.Length. Color the points by species.

Then mark the point (x=mean(Petal.Length), y=mean(Sepal.Length)) of each species with a large point on the plot. Color each point according to its species.
Put the names of the species as text labels next to the points. Add a title to the plot.

Hints:

  • The means are pre-computed in the “How To Draw a Bar Chart” example. You can extract the Petal.Length means from the respective column of df.
  • The col parameter of plot() can take a vector argument, assigning an individual color to each point. To construct this vector from the Species column
    • first define a mapping from species to colors (a vector, where each element is a color and each element name is a species)
    • then index this vector with the Species column
  • Use ?par to see the Graphical Parameters help page. Parameter cex changes the point size. Use ?base::plot to see the help page of the plot function. Parameter “main” sets the title.
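
The color-mapping hint can be sketched as follows, with a hypothetical grouping factor and made-up colors. One detail worth knowing: indexing a vector with a factor uses the factor's integer codes rather than its labels, so converting to character first is the safer way to look up colors by name.

```r
# Hypothetical grouping column (a factor, like Species in iris)
group <- factor(c("b", "a", "c", "a", "b"))

# Mapping from group name to color: values are colors, names are groups
palette_map <- c(a = "red", b = "blue", c = "darkgreen")

# Look up by name. as.character() matters here: a factor index would be
# treated as integer codes, which only matches if the orders agree.
point_cols <- palette_map[as.character(group)]

# cex enlarges the points, main sets the title
plot(seq_along(group), seq_along(group), col = point_cols, pch = 19, cex = 2,
     main = "Points colored by group")

# Text labels to the right of each point (pos = 4)
text(seq_along(group), seq_along(group), labels = as.character(group), pos = 4)
```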

Exercise 7 [complex]

Similar to the “Organs Distributions” example, combine three histograms in one plot, one for each of the 3 species in the iris dataset. Color the 3 histograms differently. Add a title and a legend to the plot. Can you position the legend towards the top middle, such that it does not overlap the histograms?

Part of the complexity here lies not in the plotting itself, but in programmatically producing the input to the plot!

Hints:

  • I suggest using tapply() to slice the iris dataset into a named list of per-species datasets. See ?tapply. If x is an index column in a data.frame df (like Species in our case), then l <- tapply(df, ~x, data.frame) returns a list, stored in l, with as many elements as there are distinct x values. Each list element holds the subset of df for one x value. In the tapply call, the third parameter is a function to be applied to the subset defined by the current index value. Here, data.frame() ensures that each subset is returned as a data.frame rather than a list.

Can you transfer this to the present case?
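
To illustrate what such a per-group list looks like, the sketch below uses split() as a stand-in for the tapply() call described above. This is a swapped-in technique, not the approach from the hint: as far as I know, the data-frame-plus-formula form of tapply() needs a fairly recent R (4.3.0 or later), while split() also works in older versions. The data frame is hypothetical.

```r
# Hypothetical data frame with an index column x
df <- data.frame(x = c("a", "b", "a", "c", "b"),
                 value = c(1.2, 3.4, 5.6, 7.8, 9.0))

# split() cuts df into one data frame per distinct value of x
l <- split(df, df$x)

print(names(l))   # the distinct x values
print(l[["a"]])   # rows of df where x == "a"

# Any function can then be applied to each subset
subset_means <- sapply(l, function(d) mean(d$value))
print(subset_means)
```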

  • Once you have the list, the scheme for plotting the histograms is as follows:
    • Plot the histogram of l[[1]]. This initializes the plot, which means that you have to set global parameters like the title, the x axis label, and the x range (xlim). Keep in mind that the histograms of all species must fit on the canvas!
    • Loop over list elements 2 and 3 and plot their histograms.
  • For the legend, see the “How To Draw a Histogram” example. However, if you just copy the example, the legend will partly cover the histograms. Use ?legend and check the meaning of the x and y arguments. Try to find an optimal position!
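
The plotting scheme above can be sketched as follows. The samples, colors and bin width are hypothetical stand-ins for the per-species subsets, so treat this as an illustration of the mechanics (add = TRUE, shared breaks, legend placement) rather than a solution.

```r
# Hypothetical samples standing in for the per-species subsets
set.seed(42)
samples <- list(low = rnorm(100, 2), mid = rnorm(100, 5), high = rnorm(100, 8))

# Semi-transparent fills so overlapping bars stay visible
fills <- c(rgb(1, 0, 0, 0.4), rgb(0, 1, 0, 0.4), rgb(0, 0, 1, 0.4))

# Common breaks covering all samples, so every histogram fits on the canvas
x_range <- range(unlist(samples))
breaks <- seq(floor(x_range[1]), ceiling(x_range[2]), by = 0.5)

# The first histogram initializes the plot: title, axis label, x range
hist(samples[[1]], breaks = breaks, col = fills[1], xlim = range(breaks),
     main = "Three overlaid histograms", xlab = "value")

# The remaining histograms are drawn onto the existing plot
for (i in 2:3) {
  hist(samples[[i]], breaks = breaks, col = fills[i], add = TRUE)
}

# Keyword position; inset nudges the legend away from the plot border
legend("top", legend = names(samples), fill = fills, inset = 0.02)
```

A keyword such as "top" is only one option. Numeric x and y coordinates give finer control if the legend still covers some bars, and a ylim on the first hist() call can leave headroom for it.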

# Load ggplot2 for the remaining exercises
library(ggplot2)

Exercise 8: Simple Scatter Plot

  • Objective: Create a scatter plot of Sepal.Length vs. Sepal.Width from the iris dataset.
  • Difficulty: Easy
  • Estimated Time: 5-10 minutes (for beginners)
# Load the iris dataset
data(iris)

# Create a scatter plot with ggplot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point()

  • Your Task: Color the points by Species and add labels to the x and y axes.
  • Difficulty: Easy
  • Estimated Time: 5-10 minutes

Exercise 9: Adding Titles and Labels

  • Objective: Use the plot from Exercise 8 and add a title, x-axis label, and y-axis label.
  • Difficulty: Easy
  • Estimated Time: 5 minutes
# Create scatter plot with labels
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  ggtitle("Scatter plot of Sepal Dimensions") +
  xlab("Sepal Length (cm)") +
  ylab("Sepal Width (cm)")

  • Your Task: Modify the plot to include a subtitle and caption.
  • Difficulty: Easy
  • Estimated Time: 5 minutes
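
As a hedged illustration of where subtitle and caption go, the sketch below uses labs() on the built-in mtcars dataset rather than iris. The label texts are placeholders.

```r
library(ggplot2)

# mtcars stands in for iris here; all label texts are placeholders
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy versus weight",
       subtitle = "Placeholder subtitle",
       caption = "Placeholder caption")

print(p)
```

labs() can also set the axis labels, so it may replace separate ggtitle(), xlab() and ylab() calls if you prefer a single place for all text.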

Exercise 10: Bar Plot of Species Count

  • Objective: Create a bar plot showing the count of each species in the iris dataset.
  • Difficulty: Easy
  • Estimated Time: 5-10 minutes
# Create a bar plot of species count
ggplot(iris, aes(x = Species)) +
  geom_bar()

  • Your Task: Modify the plot to use different colors for each species.
  • Difficulty: Easy
  • Estimated Time: 5-10 minutes

Exercise 11: Boxplot of Petal Length by Species

  • Objective: Create a boxplot of Petal.Length for each species in the iris dataset.
  • Difficulty: Medium
  • Estimated Time: 5-10 minutes
# Create a boxplot of Petal Length by Species
ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()

  • Your Task: Customize the plot by changing the colors of the boxplots and adding a theme.
  • Difficulty: Medium
  • Estimated Time: 10-15 minutes

Exercise 12: Histogram of Sepal Length

  • Objective: Create a histogram of Sepal.Length in the iris dataset.
  • Difficulty: Easy
  • Estimated Time: 5-10 minutes
# Create a histogram of Sepal Length
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "lightblue", color = "black")

  • Your Task: Change the number of bins and add a title to the histogram.
  • Difficulty: Easy
  • Estimated Time: 5-10 minutes

Exercise 13: Faceting by Species

  • Objective: Use faceting to create multiple scatter plots for each species in the iris dataset.
  • Difficulty: Medium
  • Estimated Time: 10 minutes
# Facet scatter plot by Species
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)

  • Your Task: Customize the facet labels and add a theme to the plot.
  • Difficulty: Medium
  • Estimated Time: 15 minutes
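
One way to customize facet labels is a named lookup vector passed through labeller(). The sketch below assumes the built-in mtcars dataset, with cyl playing the role of the faceting variable; the label texts and the choice of theme_minimal() are illustrative only.

```r
library(ggplot2)

# Lookup table: names are the values of cyl, values are the facet labels
facet_names <- c("4" = "Four cylinders",
                 "6" = "Six cylinders",
                 "8" = "Eight cylinders")

# mtcars stands in for iris; cyl stands in for Species
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, labeller = labeller(cyl = facet_names)) +
  theme_minimal()

print(p)
```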