Exercises and solutions for Chapter 2

Computations in R

Sum 2 and 3 using the + operator. [Difficulty: Beginner]

solution:

2+3

Take the square root of 36, use sqrt(). [Difficulty: Beginner]

solution:

sqrt(36)

Take the log10 of 1000, use function log10(). [Difficulty: Beginner]

solution:

log10(1000)

Take the log2 of 32, use function log2(). [Difficulty: Beginner]

solution:

log2(32)

Assign the sum of 2,3 and 4 to variable x. [Difficulty: Beginner]

solution:

x =  2+3+4
x <- 2+3+4

Find the absolute value of the expression 5 - 145 using the abs() function. [Difficulty: Beginner]

solution:

abs(5-145)

Calculate the square root of 625, divide it by 5, and assign it to variable x.Ex: y= log10(1000)/5, the previous statement takes log10 of 1000, divides it by 5, and assigns the value to variable y. [Difficulty: Beginner]

solution:

x = sqrt(625)/5

Multiply the value you get from previous exercise by 10000, assign it to variable x Ex: y=y*5, multiplies y by 5 and assigns the value to y. KEY CONCEPT: results of computations or arbitrary values can be stored in variables we can re-use those variables later on and over-write them with new values. [Difficulty: Beginner]

solution:

x2 = x*10000

Data structures in R

Make a vector of 1,2,3,5 and 10 using c(), and assign it to the vec variable. Ex: vec1=c(1,3,4) makes a vector out of 1,3,4. [Difficulty: Beginner]

solution:

c(1:5,10)
vec1=c(1,2,3,4,5,10)

Check the length of your vector with length(). Ex: length(vec1) should return 3. [Difficulty: Beginner]

solution:

length(vec1)

Make a vector of all numbers between 2 and 15. Ex: vec=1:6 makes a vector of numbers between 1 and 6, and assigns it to the vec variable. [Difficulty: Beginner]

solution:

vec=2:15

Make a vector of 4s repeated 10 times using the rep() function. Ex: rep(x=2,times=5) makes a vector of 2s repeated 5 times. [Difficulty: Beginner]

solution:

rep(x=4,times=10)
rep(4,10)

Make a logical vector with TRUE, FALSE values of length 4, use c(). Ex: c(TRUE,FALSE). [Difficulty: Beginner]

solution:

c(TRUE,FALSE,FALSE,TRUE,FALSE)
c(TRUE,TRUE,FALSE,TRUE,FALSE)

Make a character vector of the gene names PAX6,ZIC2,OCT4 and SOX2. Ex: avec=c("a","b","c") makes a character vector of a,b and c. [Difficulty: Beginner]

solution:

c("PAX6","ZIC2","OCT4","SOX2")

Subset the vector using [] notation, and get the 5th and 6th elements. Ex: vec1[1] gets the first element. vec1[c(1,3)] gets the 1st and 3rd elements. [Difficulty: Beginner]

solution:

vec1[c(5,6)]

You can also subset any vector using a logical vector in []. Run the following:

myvec=1:5
# the length of the logical vector 
# should be equal to length(myvec) 
myvec[c(TRUE,TRUE,FALSE,FALSE,FALSE)] 
myvec[c(TRUE,FALSE,FALSE,FALSE,TRUE)]

[Difficulty: Beginner]

solution:

myvec=1:5
myvec[c(TRUE,TRUE,FALSE,FALSE,FALSE)] # the length of the logical vector should be equal to length(myvec) 
myvec[c(TRUE,FALSE,FALSE,FALSE,TRUE)]

==,>,<, >=, <= operators create logical vectors. See the results of the following operations:

myvec > 3
myvec == 4
myvec <= 2
myvec != 4

[Difficulty: Beginner]

Use the > operator in myvec[ ] to get elements larger than 2 in myvec which is described above. [Difficulty: Beginner]

solution:

myvec[ myvec > 2 ]

Make a 5x3 matrix (5 rows, 3 columns) using matrix(). Ex: matrix(1:6,nrow=3,ncol=2) makes a 3x2 matrix using numbers between 1 and 6. [Difficulty: Beginner]

solution:

mat=matrix(1:15,nrow=5,ncol=3)

What happens when you use byrow = TRUE in your matrix() as an additional argument? Ex: mat=matrix(1:6,nrow=3,ncol=2,byrow = TRUE). [Difficulty: Beginner]

solution:

mat=matrix(1:15,nrow=5,ncol=3,byrow = TRUE)

Extract the first 3 columns and first 3 rows of your matrix using [] notation. [Difficulty: Beginner]

solution:

mat[1:3,1:3]

Extract the last two rows of the matrix you created earlier. Ex: mat[2:3,] or mat[c(2,3),] extracts the 2nd and 3rd rows. [Difficulty: Beginner]

solution:

mat[4:5,] 
mat[c(nrow(mat)-1,nrow(mat)),] 
tail(mat,n=1)
tail(mat,n=2)

Extract the first two columns and run class() on the result. [Difficulty: Beginner]

solution:

class(mat[,1:2])

Extract the first column and run class() on the result, compare with the above exercise. [Difficulty: Beginner]

solution:

class(mat[,1])

Make a data frame with 3 columns and 5 rows. Make sure first column is a sequence of numbers 1:5, and second column is a character vector. Ex: df=data.frame(col1=1:3,col2=c("a","b","c"),col3=3:1) # 3x3 data frame. Remember you need to make a 3x5 data frame. [Difficulty: Beginner]

solution:

df=data.frame(col1=1:5,col2=c("a","2","3","b","c"),col3=5:1)

Extract the first two columns and first two rows. HINT: Use the same notation as matrices. [Difficulty: Beginner]

solution:

df[,1:2]

df[1:2,]
df[1:2,1:2]

Extract the last two rows of the data frame you made. HINT: Same notation as matrices. [Difficulty: Beginner]

solution:

df[,4:5]

Extract the last two columns using the column names of the data frame you made. [Difficulty: Beginner]

solution:

df[,c("col2","col3")]

Extract the second column using the column names. You can use [] or $ as in lists; use both in two different answers. [Difficulty: Beginner]

solution:

df$col2
df[,"col2"]
class(df["col1"])
class(df[,"col1"])

Extract rows where the 1st column is larger than 3. HINT: You can get a logical vector using the > operator , and logical vectors can be used in [] when subsetting. [Difficulty: Beginner]

solution:

df[df$col1 >3 , ]

Extract rows where the 1st column is larger than or equal to 3. [Difficulty: Beginner]

solution:

df[df$col1 >= 3 , ]

Convert a data frame to the matrix. HINT: Use as.matrix(). Observe what happens to numeric values in the data frame. [Difficulty: Beginner]

solution:

class(df[,c(1,3)])
as.matrix(df[,c(1,3)])
as.matrix(df)

Make a list using the list() function. Your list should have 4 elements; the one below has 2. Ex: mylist= list(a=c(1,2,3),b=c("apple,"orange")) [Difficulty: Beginner]

solution:

mylist= list(a=c(1,2,3),
           b=c("apple","orange"),
           c=matrix(1:4,nrow=2),
           d="hello")

Select the 1st element of the list you made using $ notation. Ex: mylist$a selects first element named "a". [Difficulty: Beginner]

solution:

mylist$a

Select the 4th element of the list you made earlier using $ notation. [Difficulty: Beginner]

solution:

mylist$d

Select the 1st element of your list using [ ] notation. Ex: mylist[1] selects the first element named "a", and you get a list with one element. mylist["a"] selects the first element named "a", and you get a list with one element. [Difficulty: Beginner]

solution:

mylist[1] # -> still a list
mylist[[1]] # not a list

mylist["a"] 
mylist[["a"]]

Select the 4th element of your list using [ ] notation. [Difficulty: Beginner]

solution:

mylist[4] 
mylist[[4]]

Make a factor using factor(), with 5 elements. Ex: fa=factor(c("a","a","b")). [Difficulty: Beginner]

solution:

fa=factor(c("a","a","b","c","c"))

Convert a character vector to a factor using as.factor(). First, make a character vector using c() then use as.factor(). [Difficulty: Intermediate]

solution:

my.vec=c("a","a","b","c","c")
fa=as.factor(my.vec)
fa

Convert the factor you made above to a character using as.character(). [Difficulty: Beginner]

solution:

fa
as.character(fa)

Reading in and writing data out in R

Read CpG island (CpGi) data from the compGenomRData package CpGi.table.hg18.txt. This is a tab-separated file. Store it in a variable called cpgi. Use

cpgFilePath=system.file("extdata",
                        "CpGi.table.hg18.txt",
                        package="compGenomRData")

to get the file path within the installed compGenomRData package. [Difficulty: Beginner]

solution:

cpgFilePath
cpgi=read.table(file=cpgFilePath,header=TRUE,sep="\t")

Use head() on CpGi to see the first few rows. [Difficulty: Beginner]

solution:

head(cpgi)

Why doesn't the following work? See sep argument at help(read.table). [Difficulty: Beginner]

cpgtFilePath=system.file("extdata",
                "CpGi.table.hg18.txt",
                package="compGenomRData")
cpgtFilePath
cpgiSepComma=read.table(cpgtFilePath,header=TRUE,sep=",")
head(cpgiSepComma)

solution:

cpgiSepComma=read.table("../data/CpGi.table.hg18.txt",header=TRUE,sep=",")
head(cpgiSepComma)

What happens when you set stringsAsFactors=FALSE in read.table()? [Difficulty: Beginner]

cpgiHF=read.table("intro2R_data/data/CpGi.table.hg18.txt",
                     header=FALSE,sep="\t",
                     stringsAsFactors=FALSE)

solution: The character column is now read as character instead of factor.

head(cpgiHF)
head(cpgi)
class(cpgiHF$V2)
class(cpgiHF$V2)

Read only the first 10 rows of the CpGi table. [Difficulty: Beginner/Intermediate]

solution:

cpgi10row=read.table("../data/CpGi.table.hg18.txt",header=TRUE,sep="\t",nrow=10)
cpgi10row

Use cpgFilePath=system.file("extdata","CpGi.table.hg18.txt", package="compGenomRData") to get the file path, then use read.table() with argument header=FALSE. Use head() to see the results. [Difficulty: Beginner]

solution:

df=read.table(cpgFilePath,header=FALSE,sep="\t")
head(df)

Write CpG islands to a text file called "my.cpgi.file.txt". Write the file to your home folder; you can use file="~/my.cpgi.file.txt" in linux. ~/ denotes home folder.[Difficulty: Beginner]

solution:

write.table(cpgi,file="~/my.cpgi.file.txt")

Same as above but this time make sure to use the quote=FALSE,sep="\t" and row.names=FALSE arguments. Save the file to "my.cpgi.file2.txt" and compare it with "my.cpgi.file.txt". [Difficulty: Beginner]

solution:

write.table(cpgi,file="~/my.cpgi.file2.txt",quote=FALSE,sep="\t",row.names=FALSE)

Write out the first 10 rows of the cpgi data frame. HINT: Use subsetting for data frames we learned before. [Difficulty: Beginner]

solution:

write.table(cpgi[1:10,],file="~/my.cpgi.fileNrow10.txt",quote=FALSE,sep="\t")

Write the first 3 columns of the cpgi data frame. [Difficulty: Beginner]

solution:

dfnew=cpgi[,1:3]
write.table(dfnew,file="~/my.cpgi.fileCol3.txt",quote=FALSE,sep="\t")

Write CpG islands only on chr1. HINT: Use subsetting with [], feed a logical vector using == operator.[Difficulty: Beginner/Intermediate]

solution:

write.table(cpgi[cpgi$chrom == "chr1",],file="~/my.cpgi.fileChr1.txt",
            quote=FALSE,sep="\t",row.names=FALSE)
head(cpgi[cpgi$chrom == "chr1",])

Read two other data sets "rn4.refseq.bed" and "rn4.refseq2name.txt" with header=FALSE, and assign them to df1 and df2 respectively. They are again included in the compGenomRData package, and you can use the system.file() function to get the file paths. [Difficulty: Beginner]

solution:

df1=read.table("../data/rn4.refseq.bed",sep="\t",header=FALSE)
df2=read.table("../data/rn4.refseq2name.txt",sep="\t",header=FALSE)

Use head() to see what is inside the data frames above. [Difficulty: Beginner]

solution:

head(df1)
head(df2)

Merge data sets using merge() and assign the results to a variable named 'new.df', and use head() to see the results. [Difficulty: Intermediate]

solution:

new.df=merge(df1,df2,by.x="V4",by.y="V1")
head(new.df)

Plotting in R

Please run the following code snippet for the rest of the exercises.

set.seed(1001)
x1=1:100+rnorm(100,mean=0,sd=15)
y1=1:100

Make a scatter plot using the x1 and y1 vectors generated above. [Difficulty: Beginner]

solution:

plot(x1,y1)

Use the main argument to give a title to plot() as in plot(x,y,main="title"). [Difficulty: Beginner]

solution:

plot(x1,y1,main="scatter plot")

Use the xlab argument to set a label for the x-axis. Use ylab argument to set a label for the y-axis. [Difficulty: Beginner]

solution:

plot(x1,y1,main="scatter plot",xlab="x label")

Once you have the plot, run the following expression in R console. mtext(side=3,text="hi there") does. HINT: mtext stands for margin text. [Difficulty: Beginner]

solution:

plot(x1,y1,main="scatter plot",xlab="x label",ylab="y label")
mtext(side=3,text="hi there")

See what mtext(side=2,text="hi there") does. Check your plot after execution. [Difficulty: Beginner]

solution:

mtext(side=2,text="hi there") 
mtext(side=4,text="hi there")

Use mtext() and paste() to put a margin text on the plot. You can use paste() as 'text' argument in mtext(). HINT: mtext(side=3,text=paste(...)). See how paste() is used for below. [Difficulty: Beginner/Intermediate]

paste("Text","here")
myText=paste("Text","here")
myText

solution:

mtext(side=3,text=paste("here","here"))

cor() calculates the correlation between two vectors. Pearson correlation is a measure of the linear correlation (dependence) between two variables X and Y. Try using the cor() function on the x1 and y1 variables. [Difficulty: Intermediate]

solution:

corxy=cor(x1,y1) # calculates pearson correlation

Try to use mtext(),cor() and paste() to display the correlation coefficient on your scatter plot. [Difficulty: Intermediate]

solution:

plot(x1,y1,main="scatter")
corxy=cor(x1,y1) 
#mtext(side=3,text=paste("Pearson Corr.",corxy))
mtext(side=3,text=paste("Pearson Corr.",round(corxy,3) ) )

plot(x1,y1)
mtext(side=3,text=paste("Pearson Corr.",round( cor(x1,y1)  ,3)  ) )

Change the colors of your plot using the col argument. Ex: plot(x,y,col="red"). [Difficulty: Beginner]

solution:

plot(x1,y1,col="red")

Use pch=19 as an argument in your plot() command. [Difficulty: Beginner]

solution:

plot(x1,y1,col="red",pch=19)

Use pch=18 as an argument to your plot() command. [Difficulty: Beginner]

solution:

plot(x1,y1,col="red",pch=18)
?points

Make a histogram of x1 with the hist() function. A histogram is a graphical representation of the data distribution. [Difficulty: Beginner]

solution:

hist(x1)

You can change colors with 'col', add labels with 'xlab', 'ylab', and add a 'title' with 'main' arguments. Try all these in a histogram. [Difficulty: Beginner]

solution:

hist(x1, col = "red", xlab = "Distribution of X1", ylab = "Frequency Distribution", main = "Histogram of X1")

Make a boxplot of y1 with boxplot().[Difficulty: Beginner]

solution:

boxplot(y1,main="title")

Make boxplots of x1 and y1 vectors in the same plot.[Difficulty: Beginner]

solution:

boxplot(x1,y1,ylab="values",main="title")

In boxplot, use the horizontal = TRUE argument. [Difficulty: Beginner]

solution:

boxplot(x1,y1,ylab="values",main="title",horizontal=TRUE)

Make multiple plots with par(mfrow=c(2,1))
- run par(mfrow=c(2,1))
- make a boxplot
- make a histogram [Difficulty: Beginner/Intermediate]

solution:

par( mfrow=c(1,2) )
hist(x1)
boxplot(y1)

Do the same as above but this time with par(mfrow=c(1,2)). [Difficulty: Beginner/Intermediate]

solution:

par(mfrow=c(2,2))
hist(x1)
boxplot(y1)

Save your plot using the "Export" button in Rstudio. [Difficulty: Beginner]

solution: find and press Export button

You can make a scatter plot showing the density of points rather than points themselves. If you use points it looks like this:


x2=1:1000+rnorm(1000,mean=0,sd=200)
y2=1:1000
plot(x2,y2,pch=19,col="blue")

If you use the smoothScatter() function, you get the densities.

smoothScatter(x2,y2,
              colramp=colorRampPalette(c("white","blue",
                                         "green","yellow","red")))

Now, plot with the colramp=heat.colors argument and then use a custom color scale using the following argument.

colramp = colorRampPalette(c("white","blue", "green","yellow","red")))

[Difficulty: Beginner/Intermediate]

solution:

smoothScatter(x2,y2,colramp = heat.colors )
smoothScatter(x2,y2,
               colramp = colorRampPalette(c("white","blue", "green","yellow","red")))

Functions and control structures (for, if/else, etc.)

Read CpG island data as shown below for the rest of the exercises.

cpgtFilePath=system.file("extdata",
                "CpGi.table.hg18.txt",
                package="compGenomRData")
cpgi=read.table(cpgtFilePath,header=TRUE,sep="\t")
head(cpgi)

Check values in the perGc column using a histogram. The 'perGc' column in the data stands for GC percent => percentage of C+G nucleotides. [Difficulty: Beginner]

solution:

hist(cpgi$perGc) # most values are between 60 and 70

Make a boxplot for the 'perGc' column. [Difficulty: Beginner]

solution:

boxplot(cpgi$perGc)

Use if/else structure to decide if the given GC percent is high, low or medium. If it is low, high, or medium: low < 60, high>75, medium is between 60 and 75; use greater or less than operators, < or >. Fill in the values in the code below, where it is written 'YOU_FILL_IN'. [Difficulty: Intermediate]


GCper=65

  # check if GC value is lower than 60, 
  # assign "low" to result
  if('YOU_FILL_IN'){
    result="low"
    cat("low")
  }
  else if('YOU_FILL_IN'){  # check if GC value is higher than 75,      
                           #assign "high" to result
    result="high"
    cat("high")
  }else{ # if those two conditions fail then it must be "medium"
    result="medium"
  }

result

solution:

GCper=65
 #result="low"# set initial value
  
  if(GCper < 60){ # check if GC value is lower than 60, assign "low" to result
    result="low"
    cat("low")
  }
  else if(GCper > 75){  # check if GC value is higher than 75, assign "high" to result
    result="high"
    cat("high")
  }else{ # if those two conditions fail then it must be "medium"
    result="medium"
  }

result

Write a function that takes a value of GC percent and decides if it is low, high, or medium: low < 60, high>75, medium is between 60 and 75. Fill in the values in the code below, where it is written 'YOU_FILL_IN'. [Difficulty: Intermediate/Advanced]

GCclass<-function(my.gc){
  
  YOU_FILL_IN
  
  return(result)
}
GCclass(10) # should return "low"
GCclass(90) # should return "high"
GCclass(65) # should return "medium"

solution:

GCclass<-function(my.gc){
  
  result="low"# set initial value
  
  if(my.gc < 60){ # check if GC value is lower than 60, assign "low" to result
    result="low"
  }
  else if(my.gc > 75){  # check if GC value is higher than 75, assign "high" to result
    result="high"
  }else{ # if those two conditions fail then it must be "medium"
    result="medium"
  }
  return(result)
}
GCclass(10) # should return "low"
GCclass(90) # should return "high"
GCclass(65) # should return "medium"

Use a for loop to get GC percentage classes for gcValues below. Use the function you wrote above.[Difficulty: Intermediate/Advanced]

gcValues=c(10,50,70,65,90)
for( i in YOU_FILL_IN){
  YOU_FILL_IN
}

solution:

gcValues=c(10,50,70,65,90)
for( i in gcValues){
 
  print(GCclass(i) )
}

Use lapply to get GC percentage classes for gcValues. [Difficulty: Intermediate/Advanced]

vec=c(1,2,4,5)
power2=function(x){ return(x^2)  }
    lapply(vec,power2)

solution:

s=lapply(gcValues,GCclass)

Use sapply to get values to get GC percentage classes for gcValues. [Difficulty: Intermediate]

solution:

s=sapply(gcValues,GCclass)

Is there a way to decide on the GC percentage class of a given vector of GCpercentages without using if/else structure and loops ? if so, how can you do it? HINT: Subsetting using < and > operators. [Difficulty: Intermediate]

solution:

result=rep("low",length(gcValues) )
result[gcValues > 75]="high"
result[gcValues < 75 & gcValues > 60 ] = "medium"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chp02.md

chp02.md

Exercises and solutions for Chapter 2

Computations in R

Data structures in R

Reading in and writing data out in R

Plotting in R

Functions and control structures (for, if/else, etc.)

Files

chp02.md

Latest commit

History

chp02.md

File metadata and controls

Exercises and solutions for Chapter 2

Computations in R

Data structures in R

Reading in and writing data out in R

Plotting in R

Functions and control structures (for, if/else, etc.)