Reference & Exercise
Day 1 Practice
Session 1 & 2
Exercise 1: Simple Calculations
- Use R to calculate the following:
- multiply 31 and 78
- divide 697 by 41
- Assign the value of 39 to x
- Assign the value of 22 to y
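A possible solution sketch in base R. The variable names product and quotient are chosen here for illustration only.

```r
# Arithmetic with the * and / operators
product <- 31 * 78    # 2418
quotient <- 697 / 41  # 17
print(product)
print(quotient)

# Assignment with the <- operator
x <- 39
y <- 22
```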
Exercise 2: Working with Vectors
- Create a vector called newvec with 20 elements
- Add names to the elements
- Retrieve the first 5 elements
- Retrieve any 3 elements by name
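One way to do this, as a sketch: the contents (the integers 1 to 20) and the names (the letters A to T) are arbitrary choices for illustration.

```r
# A vector with 20 elements
newvec <- 1:20
# Give every element a name: LETTERS[1:20] is "A", "B", ..., "T"
names(newvec) <- LETTERS[1:20]

# First 5 elements, selected by position
first5 <- newvec[1:5]
# Any 3 elements, selected by name
by_name <- newvec[c("B", "J", "T")]
print(first5)
print(by_name)
```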
Exercise 3: Lists and Data Frames
- Write an R program that creates a list of data frames and then accesses each of those data frames from the list.
- Create a data frame from a matrix of your choice. Change the row names so that each row is named id_i (where i is the row number), and change the column names to variable_i (where i is the column number). For example, column 1 is named variable_1, row 2 is named id_2, and so on.
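A sketch covering both tasks. The data frames placed in the list (slices of the built-in mtcars and iris datasets plus a small hand-made one) and the 4 x 3 matrix are illustrative choices, not prescribed by the exercise.

```r
# Part 1: a named list of data frames
df_list <- list(
  cars_head  = head(mtcars, 3),
  iris_head  = head(iris, 3),
  letters_df = data.frame(id = 1:3, letter = c("a", "b", "c"))
)
# Access a data frame by position or by name
print(df_list[[1]])
print(df_list[["iris_head"]])
# Or visit every data frame in turn
for (nm in names(df_list)) {
  cat("Data frame:", nm, "\n")
  print(df_list[[nm]])
}

# Part 2: a data frame built from a matrix, then renamed
m <- matrix(1:12, nrow = 4, ncol = 3)
df <- as.data.frame(m)
rownames(df) <- paste0("id_", seq_len(nrow(df)))        # id_1, id_2, ...
colnames(df) <- paste0("variable_", seq_len(ncol(df)))  # variable_1, ...
print(df)
```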
Exercise 4: Reading & Writing
- Load the iris data and save the data frame in .csv, .tsv and .xls format.
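A sketch using base R's write.csv() and write.table(). Base R has no Excel writer, so the Excel step assumes the optional writexl package is installed; note that writexl::write_xlsx() writes the newer .xlsx format rather than legacy .xls. Writing to a temporary folder is also an assumption; replace out_dir with your own path.

```r
data(iris)
out_dir <- tempdir()  # assumption: change to the folder you want to write to
csv_path <- file.path(out_dir, "iris.csv")
tsv_path <- file.path(out_dir, "iris.tsv")

# Comma-separated values
write.csv(iris, csv_path, row.names = FALSE)
# Tab-separated values
write.table(iris, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
# Excel: only if the (assumed) writexl package is available; writes .xlsx
if (requireNamespace("writexl", quietly = TRUE)) {
  writexl::write_xlsx(iris, file.path(out_dir, "iris.xlsx"))
}

# Read the text files back to check them
iris_csv <- read.csv(csv_path)
iris_tsv <- read.delim(tsv_path)
```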
Session 3
Exercise 5 [simple]
Draw a scatterplot of Petal.Length (x axis) versus Sepal.Length (y axis).
Then mark the mean Petal.Length of each species by a dashed vertical line. Give each line its own color.
Hints:
- The means are pre-computed in the “How To Draw a Bar Chart” example. You can extract the Petal.Length means from the respective column of df.
- Use ?par to see the Graphical Parameters help page. Parameter lty sets the line type.
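A possible base R sketch. Because the df from the “How To Draw a Bar Chart” example is not reproduced here, the per-species means are recomputed with tapply(); the three colors are arbitrary.

```r
data(iris)
# Mean Petal.Length per species (a named 1-d array)
pl_means <- tapply(iris$Petal.Length, iris$Species, mean)
# One color per species, in the order of the species levels
line_cols <- c("red", "darkgreen", "blue")

plot(iris$Petal.Length, iris$Sepal.Length,
     xlab = "Petal.Length", ylab = "Sepal.Length")
# One dashed (lty = 2) vertical line per species mean
for (i in seq_along(pl_means)) {
  abline(v = pl_means[i], lty = 2, col = line_cols[i])
}
legend("bottomright", legend = names(pl_means), col = line_cols, lty = 2)
```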
Exercise 6 [medium]
Draw a scatterplot of Petal.Length (x axis) versus Sepal.Length (y axis). Color the points by species.
Then mark the point (x=mean(Petal.Length), y=mean(Sepal.Length)) of
each species with a large point, colored according to its species.
Label each of these points with the name of its species, and add a
title to the plot.
Hints:
- The means are pre-computed in the “How To Draw a Bar Chart” example. You can extract the Petal.Length means from the respective column of df.
- The col parameter of plot() can take a vector argument, assigning an individual color to each point. To construct this vector from the Species column:
  - first define a mapping from species to colors (a vector where each element is a color and each element name is a species),
  - then index this vector with as.character(Species). Indexing with the factor itself would use its integer codes rather than the species names.
- Use ?par to see the Graphical Parameters help page. Parameter cex changes the point size. Use ?base::plot to see the help page of the plot function. Parameter “main” sets the title.
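A sketch following the hints above. The colors, the point size (cex = 3) and the label offset are illustrative choices; the means are recomputed with tapply() instead of reusing df.

```r
data(iris)
# Mapping from species (element names) to colors (element values)
species_cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")
# Index by species name; as.character() avoids indexing by factor codes
point_cols <- species_cols[as.character(iris$Species)]

plot(iris$Petal.Length, iris$Sepal.Length, col = point_cols,
     xlab = "Petal.Length", ylab = "Sepal.Length",
     main = "Iris: Sepal.Length versus Petal.Length")

# Per-species means
mean_pl <- tapply(iris$Petal.Length, iris$Species, mean)
mean_sl <- tapply(iris$Sepal.Length, iris$Species, mean)
# Large filled points (pch = 19, cex = 3) at the species means
points(mean_pl, mean_sl, pch = 19, cex = 3, col = species_cols[names(mean_pl)])
# Species names to the right of each mean point (pos = 4)
text(mean_pl, mean_sl, labels = names(mean_pl), pos = 4, offset = 1.2)
```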
Exercise 7 [complex]
Similar to the “Organs Distributions” example, combine three histograms in one plot, one for each of the 3 species in the iris dataset. Color the 3 histograms differently. Add a title and a legend to the plot. Can you position the legend towards the top middle, such that it does not overlap the histograms?
Part of the difficulty here is not the plotting itself, but programmatically producing the input to the plot!
Hints:
- I suggest using tapply() to slice the iris dataset into a named list of per-species datasets. See ?tapply. If x is an index column in a data.frame df (like Species in our case), then l <- tapply(df, ~x, data.frame) returns a list, stored in l, with as many elements as there are distinct x values. Each list element holds the subset of df with a fixed x value. In the tapply call, the third parameter is a function to be applied to the subset defined by the current index value. Here, the data.frame() function ensures that the subset is returned as a data.frame rather than a list.
Can you transfer this to the present case?
- Once you have the list, the scheme for plotting the histograms is as follows:
  - Plot the histogram of l[[1]]. This initializes the plot, which means that you have to set global parameters like the title, the x axis label, and the x range (xlim). Keep in mind that the histograms of all species must fit on the canvas!
  - Loop over list elements 2 and 3 and draw their histograms on top of the first one (see the add argument in ?hist).
- For the legend, see the “How To Draw a Histogram” example. However, if you just copy the example, the legend will partly cover the histograms. Use ?legend and check the meaning of the x and y arguments. Try to find an optimal position!
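A sketch of the scheme above, under several assumptions. It swaps in split(), another base R function, to build the per-species list instead of tapply(). The plotted variable (Petal.Length), the bin width, the colors and the 30% headroom are illustrative choices. The headroom in ylim leaves space for a horizontal legend at "top", which makes overlap less likely but does not guarantee it.

```r
data(iris)
# Named list with one data frame per species
l <- split(iris, iris$Species)
plot_var <- "Petal.Length"  # assumption: the variable to plot

# Semi-transparent colors so overlapping bars stay visible
cols <- c(rgb(1, 0, 0, 0.4), rgb(0, 0.6, 0, 0.4), rgb(0, 0, 1, 0.4))
# Shared breaks spanning all species, so every histogram fits on the canvas
breaks <- seq(floor(min(iris[[plot_var]])), ceiling(max(iris[[plot_var]])), by = 0.25)
# Tallest bar over all species, computed without drawing
max_count <- max(sapply(l, function(d)
  max(hist(d[[plot_var]], breaks = breaks, plot = FALSE)$counts)))

# The first histogram initializes the plot and sets the global parameters
hist(l[[1]][[plot_var]], breaks = breaks, col = cols[1],
     xlim = range(breaks), ylim = c(0, 1.3 * max_count),
     main = paste("Distribution of", plot_var, "by species"), xlab = plot_var)
# Remaining species are drawn on top of the existing plot
for (i in 2:3) {
  hist(l[[i]][[plot_var]], breaks = breaks, col = cols[i], add = TRUE)
}
# Horizontal legend in the headroom at the top middle
legend("top", legend = names(l), fill = cols, horiz = TRUE, bty = "n")
```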
Exercise 8: Simple Scatter Plot
- Objective: Create a scatter plot of Sepal.Length vs. Sepal.Width from the iris dataset.
- Difficulty: Easy
- Estimated Time: 5-10 minutes (for beginners)
# Load the ggplot2 package and the iris dataset
library(ggplot2)
data(iris)
# Create a scatter plot with ggplot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
- Your Task: Color the points by Species and add labels to the x and y axes.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
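One possible solution to the task: mapping Species to color inside aes() colors the points by species, and labs() sets the axis labels. The label texts are illustrative.

```r
library(ggplot2)
data(iris)
p8 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(x = "Sepal Length (cm)", y = "Sepal Width (cm)")
print(p8)
```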
Exercise 9: Adding Titles and Labels
- Objective: Use the plot from Exercise 8 and add a title, x-axis label, and y-axis label.
- Difficulty: Easy
- Estimated Time: 5 minutes
# Create scatter plot with labels
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  ggtitle("Scatter plot of Sepal Dimensions") +
  xlab("Sepal Length (cm)") +
  ylab("Sepal Width (cm)")
- Your Task: Modify the plot to include a subtitle and caption.
- Difficulty: Easy
- Estimated Time: 5 minutes
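A sketch of the task: labs() accepts title, subtitle and caption (as well as axis labels) in a single call. The subtitle and caption texts are illustrative.

```r
library(ggplot2)
p9 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  labs(
    title    = "Scatter plot of Sepal Dimensions",
    subtitle = "Points colored by species",
    caption  = "Data: iris (R datasets package)",
    x        = "Sepal Length (cm)",
    y        = "Sepal Width (cm)"
  )
print(p9)
```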
Exercise 10: Bar Plot of Species Count
- Objective: Create a bar plot showing the count of each species in the iris dataset.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
- Your Task: Modify the plot to use different colors for each species.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
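No starter code is given for this exercise, so here is a sketch of both steps. geom_bar() counts the rows for each x value by default; mapping Species to fill then gives each bar its own color.

```r
library(ggplot2)
# Objective: one bar per species, height = number of rows
p10 <- ggplot(iris, aes(x = Species)) +
  geom_bar()
print(p10)

# Your task: a different fill color for each species
p10_col <- ggplot(iris, aes(x = Species, fill = Species)) +
  geom_bar() +
  labs(y = "Count")
print(p10_col)
```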
Exercise 11: Boxplot of Petal Length by Species
- Objective: Create a boxplot of Petal.Length for each species in the iris dataset.
- Difficulty: Medium
- Estimated Time: 5-10 minutes
# Create a boxplot of Petal Length by Species
ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()
- Your Task: Customize the plot by changing the colors of the boxplots and adding a theme.
- Difficulty: Medium
- Estimated Time: 10-15 minutes
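A sketch of one way to do the task: map Species to fill, choose the colors explicitly with scale_fill_manual(), and apply a built-in theme. The hex colors and the choice of theme_minimal() are illustrative.

```r
library(ggplot2)
p11 <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_manual(values = c(setosa = "#E69F00",
                               versicolor = "#56B4E9",
                               virginica = "#009E73")) +
  theme_minimal()
print(p11)
```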
Exercise 12: Histogram of Sepal Length
- Objective: Create a histogram of Sepal.Length in the iris dataset.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
# Create a histogram of Sepal Length
ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.3, fill = "lightblue", color = "black")
- Your Task: Change the number of bins and add a title to the histogram.
- Difficulty: Easy
- Estimated Time: 5-10 minutes
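A sketch of the task: replace binwidth with bins (15 is an arbitrary choice), since binwidth takes precedence when both are set, and add a title with ggtitle().

```r
library(ggplot2)
p12 <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 15, fill = "lightblue", color = "black") +
  ggtitle("Histogram of Sepal Length")
print(p12)
```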
Exercise 13: Faceting by Species
- Objective: Use faceting to create a separate scatter plot for each species in the iris dataset.
- Difficulty: Medium
- Estimated Time: 10 minutes
# Facet scatter plot by Species
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)
- Your Task: Customize the facet labels and add a theme to the plot.
- Difficulty: Medium
- Estimated Time: 15 minutes
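A sketch of the task. as_labeller() turns a named character vector into a lookup from factor level to strip label; the label texts, theme_bw() and the italic strip text are illustrative choices.

```r
library(ggplot2)
# Lookup table: factor level -> text shown in the facet strip
species_labels <- c(setosa = "Iris setosa",
                    versicolor = "Iris versicolor",
                    virginica = "Iris virginica")
p13 <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species, labeller = as_labeller(species_labels)) +
  theme_bw() +
  theme(strip.text = element_text(face = "italic"))
print(p13)
```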