Skip to content

Latest commit

 

History

History
640 lines (447 loc) · 17.4 KB

File metadata and controls

640 lines (447 loc) · 17.4 KB

S4 OOP

S4 is a more formal version of S3.

Creating classes in S4

We create new S4 classes using the setClass function. We need to specify both the name of the class, and which attributes the class has (in R terminology: slots).

setClass("GenericSeq", 
          slots=list(
            name="character",
            alphabet="character",
            sequence="character"
          ))

This declares a new class GenericSeq which contains three pieces of information, each of type character.

We create a new object of this class by using new():

genseq = new("GenericSeq", name="A generic sequence", alphabet=c("A", "T"), sequence="AATTTAAATTTTT")

When calling new R will check if the supplied slots match the definition, so this will give an error:

genseq = new("GenericSeq", name=10, alphabet=c("A", "T"), sequence="ATTTA")

## Error in validObject(.Object): invalid class "GenericSeq" object: invalid object for slot "name" in class "GenericSeq": got class "numeric", should be or extend class "character"

We can access the slots directly by using a special @ operators, or using slots progamatically:

genseq

## An object of class "GenericSeq"
## Slot "name":
## [1] "A generic sequence"
## 
## Slot "alphabet":
## [1] "A" "T"
## 
## Slot "sequence":
## [1] "AATTTAAATTTTT"

genseq@name

## [1] "A generic sequence"

slot(genseq, "name")

## [1] "A generic sequence"

slotNames(genseq)

## [1] "name"     "alphabet" "sequence"

S4 generics

S4 methods use a similar dispatch logic as S3, but uses its own system of generics.

Bioconductor packages BiocGenerics, ProtGenerics and Biobase define a large number of S4 generics used in the Bioconductor project, and are the first place to look for generics to re-use.

suppressMessages(library(BiocGenerics))
suppressMessages(library(ProtGenerics))
suppressMessages(library(Biobase))

Multiple regular R functions have been turned into S4 generics, e.g.:

base::nrow

## function (x) 
## dim(x)[1L]
## <bytecode: 0x7f97ec3b86b0>
## <environment: namespace:base>

BiocGenerics::nrow

## standardGeneric for "nrow" defined from package "base"
## 
## function (x) 
## standardGeneric("nrow")
## <environment: 0x7f97eb74c5f0>
## Methods may be defined for arguments: x
## Use  showMethods("nrow")  for currently available ones.

base::ncol

## function (x) 
## dim(x)[2L]
## <bytecode: 0x7f97eb8cf628>
## <environment: namespace:base>

BiocGenerics::ncol

## standardGeneric for "ncol" defined from package "base"
## 
## function (x) 
## standardGeneric("ncol")
## <environment: 0x7f97eb840a28>
## Methods may be defined for arguments: x
## Use  showMethods("ncol")  for currently available ones.

This means that it is easy to re-define the behaviour of these functions for your own class.

More examples here:

Creating S4 generics

If we think we'll need some functionality implemented differently for different classes, we might want to create a generic so that dispatching makes this invisible to the user.

We only need to create an S4 generic if it doens't exist as an S3 generic (or as a primitive e.g. length()). Every S3 generic (e.g. summary()) has an automatic S4 generic as well.

Lets make a new generic complement which we'll use with multiple classes.

First, lets check it doesn't exist

complement # see if exists as S3 generic

## Error in eval(expr, envir, enclos): object 'complement' not found

isGeneric("complement") # see if it exists as S4 generic

## [1] FALSE

We define our own generics using setGeneric:

setGeneric("complement", 
    function(object, ...) standardGeneric("complement")
)

## [1] "complement"

NOTE: the formal arguments of the generic acts as a templated for the formal arguments of all the implementations. It's good practice to always add ... to the end to enable methods to introduce additional parameters.

Creating S4 methods

For existing generics we define our own implementation with setMethod.

length(genseq) # before defining our implementation

## [1] 1

setMethod("length", "GenericSeq", function(x) nchar(x@sequence))

## [1] "length"

length(genseq)

## [1] 13

NOTE: when implementing generics in S4, make sure the arguments to your implementation match those of the generic. So, length has x as a parameters:

length

## function (x)  .Primitive("length")

and so should our implementation function(x) ....

Exercise: Create a custom rev implementation

Create a custom implementation of rev for GenericSeq that will return a new GenericSeq objects where the sequence has been reversed.

Solution

setMethod("rev", "GenericSeq", function(x){
  letters = strsplit(x@sequence, "")[[1]]
  reversed = paste(rev(letters), collapse="")
  
  new("GenericSeq", name=paste(x@name, "--reversed"), alphabet=x@alphabet, sequence=reversed)
})

## [1] "rev"

genseq

## An object of class "GenericSeq"
## Slot "name":
## [1] "A generic sequence"
## 
## Slot "alphabet":
## [1] "A" "T"
## 
## Slot "sequence":
## [1] "AATTTAAATTTTT"

rev(genseq)

## An object of class "GenericSeq"
## Slot "name":
## [1] "A generic sequence --reversed"
## 
## Slot "alphabet":
## [1] "A" "T"
## 
## Slot "sequence":
## [1] "TTTTTAAATTTAA"

The seperation of implementation and interface

Best practice:

  1. The user should never have to use the @ operator. This operator is for the developer.

  2. The user should never have to call new(), but create objects with functions we write as wrappers around new(). See for example the sequences::readFasta function below. Often, one can also create a constructure named as the class that takes user-sensible inputs to assess the slots when calling new().

library("sequences")
fl <- dir(system.file("extdata", package = "sequences"),
          full.names = TRUE)[1]
basename(fl)
readFasta(fl)

As developers, we should be writing functions and methods for the user to use our object. How the object is internally implemented is of no interest to the user. The interface (functions and methods) should be kept stable over time, not to break any external code that depends on our code, while the internal implementation can evolve rapidly.

Implementing accessors as methods

We can implement methods to read and modify the properties of our object:

setGeneric("name", function(object, ...) standardGeneric("name"))

## [1] "name"

setGeneric("name<-", function(object, value) standardGeneric("name<-"))

## [1] "name<-"

setMethod("name", "GenericSeq", function(object, ...) object@name)

## [1] "name"

setReplaceMethod("name", signature(object="GenericSeq", value="character"), 
                 function(object, value){
                   object@name <- value
                   return(object)
                 })

## [1] "name<-"

The first generic is a regular generic we saw before. The second is a generic for a "replacement method" that supported the replacement syntax name(genseq) = "New name".

We can use these to work with our object:

name(genseq)

## [1] "A generic sequence"

name(genseq) = "New name"
name(genseq)

## [1] "New name"

In our case, both methods have a very simple implementations, but this gives us an opportunity to continue working on the implementation. E.g. at some point we might decide we want to keep all the information in a big-data backend. We can change our internal implementation, e.g. the object might now have a single slot that is the connection to the big data backend, but we would keep the methods the same, and from user's perspective everything would still works the same, but better.

Discussion: do we need one method per every slot? Do we need to use methods, or can we also use functions?

Exercise

Implement methods for getting the setting the sequence property of our GenericSeq object.

Solution

setGeneric("sequence", function(object, ...) standardGeneric("sequence"))

## Creating a new generic function for 'sequence' in the global environment

## [1] "sequence"

setGeneric("sequence<-", function(object, value) standardGeneric("sequence<-"))

## [1] "sequence<-"

setMethod("sequence", "GenericSeq", function(object, ...) object@sequence)

## [1] "sequence"

setReplaceMethod("sequence", signature(object="GenericSeq", value="character"), 
                 function(object, value){
                   object@sequence <- value
                   return(object)
                 })

## [1] "sequence<-"

sequence(genseq)

## [1] "AATTTAAATTTTT"

sequence(genseq) = "TAATTTT"
sequence(genseq)

## [1] "TAATTTT"

NOTE: sequence already exists in base R, and we've broken it:

base::sequence

## function (nvec) 
## unlist(lapply(nvec, seq_len))
## <bytecode: 0x7f97ef5ad510>
## <environment: namespace:base>

base::sequence(c(3,2))

## [1] 1 2 3 1 2

sequence(c(3,2))

## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'sequence' for signature '"numeric"'

To make sure we haven't broken any code we'll create a default S4 method that will default to the base R function:

setMethod("sequence", "ANY", function(object,...) base::sequence(object))

## [1] "sequence"

sequence(c(3,2))

## [1] 1 2 3 1 2

Alternatively, we could've used a different name, e.g. one with an existing generic seq (pros? cons?) or a completely new name e.g. genericSeq().

Validity

Objects might have further contrains on them, e.g. in our case, we assume that alphabet slot matches the letters used in the sequence slot.

We can attach custom validity checks to our object.

setValidity("GenericSeq", function(object){
  # check if the alphabet matches letters in sequence
  letters = strsplit(object@sequence, "")[[1]]
  if(!all(letters %in% object@alphabet)){
    return("The alphabet does not match the sequence")
  }
  return(TRUE)
})

## Class "GenericSeq" [in ".GlobalEnv"]
## 
## Slots:
##                                     
## Name:       name  alphabet  sequence
## Class: character character character

validObject(genseq)

## [1] TRUE

To ensure that the object remains valid we can include it in our sequence replacement method implementation:

setReplaceMethod("sequence", signature(object="GenericSeq", value="character"), 
                 function(object, value){
                   object@sequence <- value
                   if(validObject(object))
                     return(object)
                 })

## [1] "sequence<-"

So now trying to set an invalid value for the sequence will result in a validity error:

sequence(genseq) = "ATTAAAAAAAA" # this still works as the object is valid
sequence(genseq) = "ABCD" # this produces an error

## Error in validObject(object): invalid class "GenericSeq" object: The alphabet does not match the sequence

Introspection

Because S4 is more formal and explicit there are multiple functions you can use to find out information about your and other people's classes.

showMethods("rev")
getClass("GenericSeq")
getMethod("rev", "GenericSeq")
findMethods("rev")
showMethods(classes="GenericSeq")
isGeneric("rev")

Over-riding default object display

We can over-ride how the object is shown to the user. The default is:

genseq

## An object of class "GenericSeq"
## Slot "name":
## [1] "New name"
## 
## Slot "alphabet":
## [1] "A" "T"
## 
## Slot "sequence":
## [1] "ATTAAAAAAAA"

We can over-ride this by implementing show:

setMethod("show",
          "GenericSeq",
          function(object) {
            cat("Object of class",class(object),"\n")
            cat(" Name:", object@name,"\n")
            cat(" Length:",length(object),"\n")
            cat(" Alphabet:",object@alphabet,"\n")
            cat(" Sequence:",object@sequence, "\n")
          })

## [1] "show"

genseq

## Object of class GenericSeq 
##  Name: New name 
##  Length: 11 
##  Alphabet: A T 
##  Sequence: ATTAAAAAAAA

This is useful for the same reasons of separating implementation and interface.

Overriding operators

It might be useful to override operators such as [, $, +, ... All of these already have generics we just need to implement them:

findMethods("[")@generic

## standardGeneric for "[" defined from package "base"
## 
## function (x, i, j, ..., drop = TRUE) 
## standardGeneric("[", .Primitive("["))
## <bytecode: 0x7f97eb672808>
## <environment: 0x7f97eb66a678>
## Methods may be defined for arguments: x, i, j, drop
## Use  showMethods("[")  for currently available ones.

setMethod("[","GenericSeq",
          function(x,i,j="missing",drop="missing") {
            if (any(i > length(x)))
              stop("subscript out of bounds")
            s <- sequence(x)
            s <- paste(strsplit(s,"")[[1]][i], collapse="")
            x@sequence <- s
            if (validObject(x))
              return(x)
          })

## [1] "["

We can now use the [ notation to create new GenericSeq objects that contain the subset of the sequence.

sequence(genseq)

## [1] "ATTAAAAAAAA"

genseq[1:3]

## Object of class GenericSeq 
##  Name: New name 
##  Length: 3 
##  Alphabet: A T 
##  Sequence: ATT

Inheritance

When we create a new class, we can set it to inherit the slots and methods of another class.

setClass("DNASeq",
         contains="GenericSeq",
         slots=list(
           adapterSeq="character"
         ))

In this example contains signifies that the DNASeq class inherits the GenericSeq class. So DNASeq will have all the slots of GenericSeq plus the one new one we defined:

dnaseq = new("DNASeq", name="Illumina short sequence", 
             sequence="ATGAAAAAGGG", alphabet=c("A", "C", "G", "T"),
             adapterSeq="ATGA")
slotNames(dnaseq)

## [1] "adapterSeq" "name"       "alphabet"   "sequence"

All the methods we defined for GenericSeq still work:

sequence(dnaseq)

## [1] "ATGAAAAAGGG"

length(dnaseq)

## [1] 11

Using S4 in packages

When using S4 in a package, we need to make sure that generics and class definitions are sourced before methods. The easiest way to achieve this is by naming convention.

When loading a package R sources files alphabetically (in absence of Collate in the DESCRIPTION file), so if we name our files like this the they will be sourced in the correct order for S4:

  • AllGenerics.R
  • DataClasses.R
  • all other files starting with a lowercase letter

Exporting S4 in a package

To make S4 classes, methods and generics available for the user of your package, you need to export them in the NAMESPACE file. Your namespace file might look like this:

exportClasses(GenericSeq)

exportMethods(name, "name<-",
              sequence, "sequence<-",
              "[",
              show,
              rev,
              length)

We use exportClasses to export our classes, exportMethods to export all of our method implementations and any associated generics.

Documenting S4

R can generate template documentation files for classes and generic's methods using promptClass("GenericSeq") and promptMethods("sequence") where sequence is the name of the generic we want to document.

If you wish to keep the documentation together with the code you can use the roxygen2 package:

#' GenericSeq class
#'
#' A class representing operations with a generic sequence. 
#'
#' @export
setClass("GenericSeq", 
          slots=list(
            name="character",
            alphabet="character",
            sequence="character"
          ))


#' Get the sequence
#'
#' @rdname sequence-methods
setGeneric("sequence", function(object, ...) standardGeneric("sequence"))

## [1] "sequence"

#' @rdname sequence-methods 
#' @export
setMethod("sequence", "GenericSeq", function(object, ...) object@sequence)

## [1] "sequence"

In this example we've exported the class and method, and put the documentation for the generic and the method into the same .rd file name sequence-methods.rd. You can now generate the the NAMESPACE and .rd documentation files by running roxygenise("path/to/your/package").

Summary of S4

  • Same overall dispatching logic as S3, but more formal
  • Classes, methods and generics are all explicit and can be inspected with introspection tools
  • Supports custom validity checks