Lists can be thought of as containers that hold R objects of the same type or of different types. A list can contain a vector, data frame, variable, matrix, array or even another list. Lists are a convenient way to keep objects from various stages of an analysis in one place.
Lists are created with the list(element1,element2,element3,...,elementN) function, and each argument becomes an element of the list. Like data frames, lists can have names: each element can be given a name, which can then be used to store and access it.
# Simple list with numeric and character elements
list(12,22,"chr")
# List with 2 vectors
list(c("chr1","chr2","chr3"),c(12312,57867,46587))
# List with mix of object types
list(c("chr1","chr2","chr3"),c(12313,57867,46587),"Sample1",124546654)
# More complex List with mix of object types, vector, numeric,character and a dataframe
chip2 <- data.frame(Gene=c("Gene1","Gene2","Gene3","Gene4","Gene5","Gene6","Gene7","Gene8","Gene9","Gene10"),
                    Chromosome=c("Chr1","Chr1","Chr1","Chr1","Chr2","Chr2","Chr2","Chr3","Chr3","Chr4"),
                    Position=c(11234,21234,25452,32414,156009,297862,299220,312112,141789,13114),
                    Enrichment=c(2,2.3,3.5,2.8,1.98,2.76,3.76,2.45,NA,3.4))
list(c("chr1","chr2","chr3"),c(12312,57867,46587),"Sample1",124546654, chip2)
As we have seen earlier, a list is of no use if we cannot retrieve it later. So we will assign it to a variable, let's say list1.
list1 <- list(c("chr1","chr2","chr3"),c(12312,57867,46587),"Sample1",124546654, chip2)
list1
Accessing elements of the list
We can access the elements by index with the syntax list[[n]], where n is the index of the element.
# to access the vector c(12312,57867,46587) whose index is 2 in list1 we will use
list1[[2]]
# In order to access the chip2 dataframe with index 5
list1[[5]]
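A point worth checking at the console is the difference between double and single square brackets. The sketch below uses a small hypothetical list, demoList (it is not part of the session above), to show that [[n]] returns the element itself while [n] returns a list of length one that still wraps the element.

```r
# demoList is a hypothetical example list, used only for illustration
demoList <- list(c("chr1", "chr2", "chr3"), c(12312, 57867, 46587), "Sample1")

# Double brackets return the element itself (here a numeric vector)
positions <- demoList[[2]]
class(positions)

# Single brackets return a sub-list that still wraps the element
wrapped <- demoList[2]
class(wrapped)
length(wrapped)
```

This is why list[[n]] is the form used whenever the element is to be passed on to another function.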
We can access the above list "list1" anytime later during our data analysis session. If this list is expected to grow then it is a good idea to name each element so that we know their identities.
Naming elements of the list
Naming can be done either while creating the list or later, by adding names to the elements of an existing list. Since we already have a list, we will try the second option first and then name the elements while creating a list.
Add names later to the elements of existing list
# We will add names to the list1
names(list1) <- c("selChrom","selpos","sample","EnrTable")
list1
#WHAT HAS GONE WRONG?
The name vector c("selChrom","selpos","sample","EnrTable") has only 4 names, so the first 4 elements of the list get names and the last one is left as NA. Note also that "EnrTable" ends up on the 4th element (the number 124546654) rather than on the data frame. If we do not want to name an element, put an empty string "" at the corresponding position; otherwise provide a name for each element of the list.
names(list1) <- c("selChrom","selpos","sample","","EnrTable")
list1
#OR
names(list1) <- c("selChrom","selpos","sample","ReadDepth","EnrTable")
list1
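#OR name the elements while creating the list (stored as list2 here so that list1 keeps the names used below)
list2 <- list(selectedChrom=c("chr1","chr2","chr3"),selectedPosn=c(12312,57867,46587),SampleName="Sample1",ReadDepth=124546654, Chip2Data=chip2)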
Accessing elements of the list by names
We can access elements of the list by name, provided we have assigned a name to that element.
# we can access the vector "selpos" of list1 in 2 ways
list1$selpos
#and
list1[["selpos"]]
Both yield the same result, but their usage differs in programming. For now let's not go into that.
Exercise (1):
Which of the following are correct ways to access the Position column of the data frame EnrTable (select all that apply)
a. as a vector b. as a data frame
list1[[5]]$position
list1[[5]]["Position"]
list1[[5]]$Position
list1[["EnrTable"]]$Position
list1[["EnrTable"]]["Position"]
list1$EnrTable[3]
list1$EnrTable[,3]
list1$EnrTable["Position"]
list1$EnrTable$Position
list1$EnrTable[,"Position"]
Adding elements to the list
list1[["atest"]]<-346347
list1
Changing value of an existing element
list1[[6]] <- 36738256
list1
#or
list1[["atest"]] <- 555555
list1
Attributes of list
length(list1)
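Besides length(), a couple of other base R functions describe a list. The following is a minimal sketch on a hypothetical list, demoList, whose names are made up for illustration only.

```r
# demoList is a hypothetical example list
demoList <- list(selChrom = c("chr1", "chr2"), sample = "Sample1", ReadDepth = 124546654)

nElements <- length(demoList)    # number of top-level elements
elementNames <- names(demoList)  # character vector of the element names
str(demoList)                    # compact overview of each element's type and content
```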
Question
If you give two elements of a list the same name, which element is returned when you retrieve it by name? In the example below, what is returned by rank[["b"]] or rank$b?
rank <- list("a"=1,"b"=2,"b"=3)
------END OF LIST--------
Simply put, matrices are like data frames that hold only one data type. The matrices used here are numeric, but a matrix can also hold character or logical values; if types are mixed, every element is coerced to a common type.
Matrices have attributes such as row names, column names and dimensions, which can be queried with rownames, colnames, dim, nrow and ncol. The important point is that matrices are 2-dimensional (only rows and columns), which separates them from arrays, which can be multidimensional. We will discuss arrays later.
Creating simple matrices
Syntax for creating matrices
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
data :: an optional data vector (including a list or expression vector).
nrow :: the desired number of rows.
ncol :: the desired number of columns.
byrow :: logical. If FALSE (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.
dimnames :: A dimnames attribute for the matrix: NULL or a list of length 2 giving the row and column names respectively.
mc <- matrix(1:25,ncol=5)
mc
# the above matrix can also be generated by the commands
mr <- matrix(1:25,nrow=5)
mrc <- matrix(1:25,nrow=5,ncol=5)
# If the values have to be filled row-wise
mrc2<- matrix(1:25,nrow=5,ncol=5,byrow=TRUE)
#using dimnames
mrc3<- matrix(1:25,nrow=5,byrow=TRUE,dimnames=list(c("a","b","c","d","e"),c("k","l","m","n","p")))
Try different attributes
nrow(mrc3)
ncol(mrc3)
dim(mrc3)
rownames(mrc3)
colnames(mrc3)
# Adding 2 matrices
mrc3 + mrc
# Multiplying them element by element
mrc3 * mrc
# Matrix multiplication. We will discuss the results in class.
mrc3 %*% mrc
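To make the difference between * and %*% easier to follow by hand, here is a small sketch with two hypothetical 2 x 2 matrices, A and B (they are not part of the session above). The * operator multiplies matching positions, while %*% takes the sum of products of a row of A with a column of B.

```r
# A and B are hypothetical matrices, used only for illustration
A <- matrix(1:4, nrow = 2)                # filled by column: rows are (1,3) and (2,4)
B <- matrix(1:4, nrow = 2, byrow = TRUE)  # filled by row:    rows are (1,2) and (3,4)

# Element-wise product: each position is multiplied with the same position
elementwise <- A * B
elementwise

# Matrix product: entry [i,j] is the sum of A[i, ] * B[, j]
matProduct <- A %*% B
matProduct
```

For example, matProduct[1,1] is 1*1 + 3*3 = 10, whereas elementwise[1,1] is simply 1*1 = 1.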
Converting dataframe to matrix and vice versa.
Generate a counts dataframe holding expression counts of genes from different samples.
countData<-data.frame(gene=c("ATk1","CTA1","BCL5","DRA12","VKT11"),
sample1=c(12,220,323,452,111),
sample2=c(10,112,321,423,81),
sample3=c(23,122,142,156,344) )
countData
Step1: Add RowNames
rownames(countData) <- countData$gene
countData
Step2 : Drop gene column
countData<-countData[,-1]
countData
Step3 : Convert into matrix
class(countData)
countDatam <-as.matrix(countData)
countDatam
class(countDatam)
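The reason for moving the gene names to row names and dropping the gene column before the conversion can be seen in a small sketch. It uses a hypothetical two-gene data frame, demoCounts, that mirrors the structure of countData; a matrix holds a single type, so keeping a character column turns every value into character.

```r
# demoCounts is a hypothetical data frame with the same layout as countData
demoCounts <- data.frame(gene = c("ATk1", "CTA1"),
                         sample1 = c(12, 220),
                         sample2 = c(10, 112))

# Keeping the gene column forces every value to character
withGene <- as.matrix(demoCounts)
typeof(withGene)

# Moving gene names to row names and dropping the column keeps the counts numeric
rownames(demoCounts) <- demoCounts$gene
numericOnly <- as.matrix(demoCounts[, -1])
typeof(numericOnly)
```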
Matrix to Dataframe
countDataD <- as.data.frame(countDatam)
countDataD
class(countDatam)
class(countDataD)
The calculations below show a use of matrices. We will discuss them in class.
#Data Summary
summary(countDataD)
# Mean of each sample (column) across all genes, to be used for normalisation
normVector<-c(223.6,189.4,157.4)
normVector
#Scaling factor calculated from means
normValues<-(1/(normVector/ 157.4))
normValues
#Diagonal matrix of scaling factors
normMatrix <-diag(3) *normValues
normMatrix
#Scaled matrix
normCountsData <- countDatam %*% normMatrix
normCountsData
#Data Summary after Normalisation
summary(normCountsData)
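The normalisation above types the sample means in by hand. As a sketch of one possible alternative, the same numbers can be computed with colMeans() and the result checked afterwards. The names sampleMeans, scaleFactors, scaleMatrix and scaledCounts are introduced here for illustration, and scaling to the smallest mean is an assumption that mirrors the 157.4 used above.

```r
# Rebuild the count matrix from the example so that this block runs on its own
countDatam <- matrix(c(12, 220, 323, 452, 111,
                       10, 112, 321, 423, 81,
                       23, 122, 142, 156, 344),
                     ncol = 3,
                     dimnames = list(c("ATk1", "CTA1", "BCL5", "DRA12", "VKT11"),
                                     c("sample1", "sample2", "sample3")))

# Per-sample (column) means: 223.6, 189.4, 157.4
sampleMeans <- colMeans(countDatam)

# Assumption: scale every sample down to the smallest mean
scaleFactors <- min(sampleMeans) / sampleMeans

# diag() with a vector places that vector on the diagonal
scaleMatrix <- diag(unname(scaleFactors))

scaledCounts <- countDatam %*% scaleMatrix
# The product has no column names because scaleMatrix has none
colnames(scaledCounts) <- colnames(countDatam)

# After scaling, the three sample means are equal
colMeans(scaledCounts)
```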
--------------END OF MATRICES------------------
An array is essentially a multidimensional matrix. All elements must be of the same type, and individual elements can be accessed in a similar fashion using square brackets [a,b,c]. In a 3-dimensional array, the first value a is the row, the second value b is the column and the third value c is the index along the outer dimension.
# create an array of dimension 2 x 3 x 4
arrayOne <- array(1:24,dim=c(2,3,4))
arrayOne
# Get the 1st row of each 2 x 3 matrix
arrayOne[1, , ]
# Get the value in the 2nd row and 3rd column of each matrix
arrayOne[2,3 , ]
# Get the value in the 2nd row and 3rd column of the 2nd matrix
arrayOne[2,3,2]
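The following sketch repeats the indexing above and stores the results, so that their shapes can be inspected with dim(). The variable names firstSlice, acrossSlices and single are introduced here for illustration. Values are filled column by column within each 2 x 3 matrix, and then matrix by matrix.

```r
arrayOne <- array(1:24, dim = c(2, 3, 4))
dim(arrayOne)

# Leaving the first two indices empty selects one whole 2 x 3 matrix
firstSlice <- arrayOne[, , 1]
dim(firstSlice)

# Row 2, column 3 of each of the 4 matrices: 6 12 18 24
acrossSlices <- arrayOne[2, 3, ]

# A single element: row 2, column 3 of the 2nd matrix, which is 12
single <- arrayOne[2, 3, 2]
```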