S4 is a more formal version of S3.
We create new S4 classes using the setClass
function. We need to
specify both the name of the class, and which attributes the class has
(in R terminology: slots
).
setClass("GenericSeq",
slots=list(
name="character",
alphabet="character",
sequence="character"
))
This declares a new class GenericSeq
which contains three pieces of
information, each of type character
.
We create a new object of this class by using new()
:
genseq = new("GenericSeq", name="A generic sequence", alphabet=c("A", "T"), sequence="AATTTAAATTTTT")
When calling new
R will check if the supplied slots match the
definition, so this will give an error:
genseq = new("GenericSeq", name=10, alphabet=c("A", "T"), sequence="ATTTA")
## Error in validObject(.Object): invalid class "GenericSeq" object: invalid object for slot "name" in class "GenericSeq": got class "numeric", should be or extend class "character"
We can access the slots directly by using a special @
operators, or
using slots
progamatically:
genseq
## An object of class "GenericSeq"
## Slot "name":
## [1] "A generic sequence"
##
## Slot "alphabet":
## [1] "A" "T"
##
## Slot "sequence":
## [1] "AATTTAAATTTTT"
genseq@name
## [1] "A generic sequence"
slot(genseq, "name")
## [1] "A generic sequence"
slotNames(genseq)
## [1] "name" "alphabet" "sequence"
S4 methods use a similar dispatch
logic as S3, but uses its own system
of generics.
Bioconductor packages BiocGenerics
, ProtGenerics
and Biobase
define a large number of S4 generics used in the Bioconductor project,
and are the first place to look for generics to re-use.
suppressMessages(library(BiocGenerics))
suppressMessages(library(ProtGenerics))
suppressMessages(library(Biobase))
Multiple regular R functions have been turned into S4 generics, e.g.:
base::nrow
## function (x)
## dim(x)[1L]
## <bytecode: 0x7f97ec3b86b0>
## <environment: namespace:base>
BiocGenerics::nrow
## standardGeneric for "nrow" defined from package "base"
##
## function (x)
## standardGeneric("nrow")
## <environment: 0x7f97eb74c5f0>
## Methods may be defined for arguments: x
## Use showMethods("nrow") for currently available ones.
base::ncol
## function (x)
## dim(x)[2L]
## <bytecode: 0x7f97eb8cf628>
## <environment: namespace:base>
BiocGenerics::ncol
## standardGeneric for "ncol" defined from package "base"
##
## function (x)
## standardGeneric("ncol")
## <environment: 0x7f97eb840a28>
## Methods may be defined for arguments: x
## Use showMethods("ncol") for currently available ones.
This means that it is easy to re-define the behaviour of these functions for your own class.
More examples here:
If we think we'll need some functionality implemented differently for different classes, we might want to create a generic so that dispatching makes this invisible to the user.
We only need to create an S4 generic if it doens't exist as an S3
generic (or as a primitive e.g. length()
). Every S3 generic (e.g.
summary()
) has an automatic S4 generic as well.
Lets make a new generic complement
which we'll use with multiple
classes.
First, lets check it doesn't exist
complement # see if exists as S3 generic
## Error in eval(expr, envir, enclos): object 'complement' not found
isGeneric("complement") # see if it exists as S4 generic
## [1] FALSE
We define our own generics using setGeneric
:
setGeneric("complement",
function(object, ...) standardGeneric("complement")
)
## [1] "complement"
NOTE: the formal arguments of the generic acts as a templated for the
formal arguments of all the implementations. It's good practice to
always add ...
to the end to enable methods to introduce additional
parameters.
For existing generics we define our own implementation with setMethod
.
length(genseq) # before defining our implementation
## [1] 1
setMethod("length", "GenericSeq", function(x) nchar(x@sequence))
## [1] "length"
length(genseq)
## [1] 13
NOTE: when implementing generics in S4, make sure the arguments to
your implementation match those of the generic. So, length
has x
as
a parameters:
length
## function (x) .Primitive("length")
and so should our implementation function(x) ...
.
Create a custom implementation of rev
for GenericSeq
that will
return a new GenericSeq
objects where the sequence has been reversed.
setMethod("rev", "GenericSeq", function(x){
letters = strsplit(x@sequence, "")[[1]]
reversed = paste(rev(letters), collapse="")
new("GenericSeq", name=paste(x@name, "--reversed"), alphabet=x@alphabet, sequence=reversed)
})
## [1] "rev"
genseq
## An object of class "GenericSeq"
## Slot "name":
## [1] "A generic sequence"
##
## Slot "alphabet":
## [1] "A" "T"
##
## Slot "sequence":
## [1] "AATTTAAATTTTT"
rev(genseq)
## An object of class "GenericSeq"
## Slot "name":
## [1] "A generic sequence --reversed"
##
## Slot "alphabet":
## [1] "A" "T"
##
## Slot "sequence":
## [1] "TTTTTAAATTTAA"
Best practice:
-
The user should never have to use the
@
operator. This operator is for the developer. -
The user should never have to call
new()
, but create objects with functions we write as wrappers aroundnew()
. See for example thesequences::readFasta
function below. Often, one can also create a constructure named as the class that takes user-sensible inputs to assess the slots when callingnew()
.
library("sequences")
fl <- dir(system.file("extdata", package = "sequences"),
full.names = TRUE)[1]
basename(fl)
readFasta(fl)
As developers, we should be writing functions and methods for the user to use our object. How the object is internally implemented is of no interest to the user. The interface (functions and methods) should be kept stable over time, not to break any external code that depends on our code, while the internal implementation can evolve rapidly.
We can implement methods to read and modify the properties of our object:
setGeneric("name", function(object, ...) standardGeneric("name"))
## [1] "name"
setGeneric("name<-", function(object, value) standardGeneric("name<-"))
## [1] "name<-"
setMethod("name", "GenericSeq", function(object, ...) object@name)
## [1] "name"
setReplaceMethod("name", signature(object="GenericSeq", value="character"),
function(object, value){
object@name <- value
return(object)
})
## [1] "name<-"
The first generic is a regular generic we saw before. The second is a
generic for a "replacement method" that supported the replacement syntax
name(genseq) = "New name"
.
We can use these to work with our object:
name(genseq)
## [1] "A generic sequence"
name(genseq) = "New name"
name(genseq)
## [1] "New name"
In our case, both methods have a very simple implementations, but this gives us an opportunity to continue working on the implementation. E.g. at some point we might decide we want to keep all the information in a big-data backend. We can change our internal implementation, e.g. the object might now have a single slot that is the connection to the big data backend, but we would keep the methods the same, and from user's perspective everything would still works the same, but better.
Discussion: do we need one method per every slot? Do we need to use methods, or can we also use functions?
Implement methods for getting the setting the sequence
property of our
GenericSeq
object.
setGeneric("sequence", function(object, ...) standardGeneric("sequence"))
## Creating a new generic function for 'sequence' in the global environment
## [1] "sequence"
setGeneric("sequence<-", function(object, value) standardGeneric("sequence<-"))
## [1] "sequence<-"
setMethod("sequence", "GenericSeq", function(object, ...) object@sequence)
## [1] "sequence"
setReplaceMethod("sequence", signature(object="GenericSeq", value="character"),
function(object, value){
object@sequence <- value
return(object)
})
## [1] "sequence<-"
sequence(genseq)
## [1] "AATTTAAATTTTT"
sequence(genseq) = "TAATTTT"
sequence(genseq)
## [1] "TAATTTT"
NOTE: sequence
already exists in base R, and we've broken it:
base::sequence
## function (nvec)
## unlist(lapply(nvec, seq_len))
## <bytecode: 0x7f97ef5ad510>
## <environment: namespace:base>
base::sequence(c(3,2))
## [1] 1 2 3 1 2
sequence(c(3,2))
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'sequence' for signature '"numeric"'
To make sure we haven't broken any code we'll create a default S4 method that will default to the base R function:
setMethod("sequence", "ANY", function(object,...) base::sequence(object))
## [1] "sequence"
sequence(c(3,2))
## [1] 1 2 3 1 2
Alternatively, we could've used a different name, e.g. one with an
existing generic seq
(pros? cons?) or a completely new name e.g.
genericSeq()
.
Objects might have further contrains on them, e.g. in our case, we
assume that alphabet
slot matches the letters used in the sequence
slot.
We can attach custom validity checks to our object.
setValidity("GenericSeq", function(object){
# check if the alphabet matches letters in sequence
letters = strsplit(object@sequence, "")[[1]]
if(!all(letters %in% object@alphabet)){
return("The alphabet does not match the sequence")
}
return(TRUE)
})
## Class "GenericSeq" [in ".GlobalEnv"]
##
## Slots:
##
## Name: name alphabet sequence
## Class: character character character
validObject(genseq)
## [1] TRUE
To ensure that the object remains valid we can include it in our
sequence
replacement method implementation:
setReplaceMethod("sequence", signature(object="GenericSeq", value="character"),
function(object, value){
object@sequence <- value
if(validObject(object))
return(object)
})
## [1] "sequence<-"
So now trying to set an invalid value for the sequence
will result in
a validity error:
sequence(genseq) = "ATTAAAAAAAA" # this still works as the object is valid
sequence(genseq) = "ABCD" # this produces an error
## Error in validObject(object): invalid class "GenericSeq" object: The alphabet does not match the sequence
Because S4 is more formal and explicit there are multiple functions you can use to find out information about your and other people's classes.
showMethods("rev")
getClass("GenericSeq")
getMethod("rev", "GenericSeq")
findMethods("rev")
showMethods(classes="GenericSeq")
isGeneric("rev")
We can over-ride how the object is shown to the user. The default is:
genseq
## An object of class "GenericSeq"
## Slot "name":
## [1] "New name"
##
## Slot "alphabet":
## [1] "A" "T"
##
## Slot "sequence":
## [1] "ATTAAAAAAAA"
We can over-ride this by implementing show
:
setMethod("show",
"GenericSeq",
function(object) {
cat("Object of class",class(object),"\n")
cat(" Name:", object@name,"\n")
cat(" Length:",length(object),"\n")
cat(" Alphabet:",object@alphabet,"\n")
cat(" Sequence:",object@sequence, "\n")
})
## [1] "show"
genseq
## Object of class GenericSeq
## Name: New name
## Length: 11
## Alphabet: A T
## Sequence: ATTAAAAAAAA
This is useful for the same reasons of separating implementation and interface.
It might be useful to override operators such as [
, $
, +
, ... All
of these already have generics we just need to implement them:
findMethods("[")@generic
## standardGeneric for "[" defined from package "base"
##
## function (x, i, j, ..., drop = TRUE)
## standardGeneric("[", .Primitive("["))
## <bytecode: 0x7f97eb672808>
## <environment: 0x7f97eb66a678>
## Methods may be defined for arguments: x, i, j, drop
## Use showMethods("[") for currently available ones.
setMethod("[","GenericSeq",
function(x,i,j="missing",drop="missing") {
if (any(i > length(x)))
stop("subscript out of bounds")
s <- sequence(x)
s <- paste(strsplit(s,"")[[1]][i], collapse="")
x@sequence <- s
if (validObject(x))
return(x)
})
## [1] "["
We can now use the [
notation to create new GenericSeq
objects that
contain the subset of the sequence.
sequence(genseq)
## [1] "ATTAAAAAAAA"
genseq[1:3]
## Object of class GenericSeq
## Name: New name
## Length: 3
## Alphabet: A T
## Sequence: ATT
When we create a new class, we can set it to inherit the slots and methods of another class.
setClass("DNASeq",
contains="GenericSeq",
slots=list(
adapterSeq="character"
))
In this example contains
signifies that the DNASeq
class inherits
the GenericSeq
class. So DNASeq
will have all the slots of
GenericSeq
plus the one new one we defined:
dnaseq = new("DNASeq", name="Illumina short sequence",
sequence="ATGAAAAAGGG", alphabet=c("A", "C", "G", "T"),
adapterSeq="ATGA")
slotNames(dnaseq)
## [1] "adapterSeq" "name" "alphabet" "sequence"
All the methods we defined for GenericSeq
still work:
sequence(dnaseq)
## [1] "ATGAAAAAGGG"
length(dnaseq)
## [1] 11
When using S4 in a package, we need to make sure that generics and class definitions are sourced before methods. The easiest way to achieve this is by naming convention.
When loading a package R sources files alphabetically (in absence of
Collate
in the DESCRIPTION
file), so if we name our files like this
the they will be sourced in the correct order for S4:
AllGenerics.R
DataClasses.R
- all other files starting with a lowercase letter
To make S4 classes, methods and generics available for the user of your
package, you need to export them in the NAMESPACE
file. Your namespace
file might look like this:
exportClasses(GenericSeq)
exportMethods(name, "name<-",
sequence, "sequence<-",
"[",
show,
rev,
length)
We use exportClasses
to export our classes, exportMethods
to export
all of our method implementations and any associated generics.
R can generate template documentation files for classes and generic's
methods using promptClass("GenericSeq")
and
promptMethods("sequence")
where sequence
is the name of the generic
we want to document.
If you wish to keep the documentation together with the code you can use
the roxygen2
package:
#' GenericSeq class
#'
#' A class representing operations with a generic sequence.
#'
#' @export
setClass("GenericSeq",
slots=list(
name="character",
alphabet="character",
sequence="character"
))
#' Get the sequence
#'
#' @rdname sequence-methods
setGeneric("sequence", function(object, ...) standardGeneric("sequence"))
## [1] "sequence"
#' @rdname sequence-methods
#' @export
setMethod("sequence", "GenericSeq", function(object, ...) object@sequence)
## [1] "sequence"
In this example we've exported the class and method, and put the
documentation for the generic and the method into the same .rd
file
name sequence-methods.rd
. You can now generate the the NAMESPACE
and
.rd
documentation files by running
roxygenise("path/to/your/package")
.
- Same overall dispatching logic as S3, but more formal
- Classes, methods and generics are all explicit and can be inspected with introspection tools
- Supports custom validity checks