Control Flow and Loops in R

Control Flow

The standard if else

p.test <- function(p) {
    if (p <= 0.05) {
        print("yeah!!!!")
    } else if (p >= 0.9) {
        print("high!!!!")
    } else {
        print("somewhere in the middle")
    }
}

Now pick a number and pass it to p.test().

p.test(0.5)

## [1] "somewhere in the middle"

ifelse()

A better way of doing this is the ifelse(test, yes, no) function. ifelse() is far more useful because it is vectorized: it evaluates the test for every element of a vector and returns one result per element.

p.test.2 <- function(p) {
    ifelse(p <= 0.05, print("yippee"), print("bummer, man"))
}

Test this with the following sequence. See what happens if you use if vs. ifelse().

x <- runif(10, 0, 1)
x

##  [1] 0.27332 0.14155 0.89000 0.07041 0.79419 0.25013 0.02324 0.86766
##  [9] 0.41114 0.56165

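Now try it with p.test() (uses if). In R versions before 4.2.0 this produces the warnings below; from R 4.2.0 onward a condition of length greater than one is an error.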

p.test(x)

## Warning: the condition has length > 1 and only the first element will be used
## Warning: the condition has length > 1 and only the first element will be used

## [1] "somewhere in the middle"

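Now try it with p.test.2(). Because ifelse() here evaluates each of the two print() calls once, both strings are printed first; the character vector that follows is the value ifelse() actually returns.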

p.test.2(x)

## [1] "yippee"
## [1] "bummer, man"

##  [1] "bummer, man" "bummer, man" "bummer, man" "bummer, man" "bummer, man"
##  [6] "bummer, man" "yippee"      "bummer, man" "bummer, man" "bummer, man"
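A possible variant, sketched here under the assumption that the goal is one label per p-value: pass the strings directly instead of wrapping them in print(), and nest a second ifelse() to reproduce the three branches of p.test(). The name p.label is hypothetical and not part of the original code.

```r
# Hypothetical helper (not in the original): vectorized three-way labelling.
p.label <- function(p) {
    # ifelse() returns one element per element of p; nesting handles the third case.
    ifelse(p <= 0.05, "yeah!!!!",
           ifelse(p >= 0.9, "high!!!!", "somewhere in the middle"))
}

p.demo <- c(0.01, 0.5, 0.95)  # illustrative values
labels.demo <- p.label(p.demo)
print(labels.demo)
```

Nothing is printed as a side effect, so the only output is the returned vector.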

Other vectorized ways of control flow.

There are many times when you may think you need an if inside a for loop (see below), or an ifelse(), but there may be far better ways.

For instance, say you are doing some simulations for a power analysis, and you want to know how often your simulation gives you a p-value less than 0.05.

p.1000 <- runif(n = 1000, min = 0, max = 1)

The line above generates 1000 random values between 0-1, which we will pretend are our p-values for differential expression from our simulation.

You may try to count how often a value is less than 0.05:

p.ifelse <- ifelse(p.1000 < 0.05, 1, 0)  # If it is less than 0.05, then you get a 1, otherwise 0.

This is our approximate false positive rate, which should be close to 0.05.

sum(p.ifelse)/length(p.1000)

## [1] 0.059

In R, think index!

However, a simpler and faster way to accomplish this is to index the vector with a logical (TRUE/FALSE) condition.

length(p.1000[p.1000 < 0.05])/length(p.1000)

## [1] 0.059

Same number, faster and simpler computation.
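A logical vector is coerced to 0/1 in arithmetic, so sum() counts the TRUE values and mean() gives their proportion directly. A minimal sketch, using a small hand-written vector (p.demo and the other names are illustrative) rather than the simulated p.1000:

```r
# Illustrative values only; not output from the simulation above.
p.demo <- c(0.01, 0.20, 0.04, 0.50, 0.90, 0.03, 0.70, 0.60, 0.30, 0.80)

n.sig <- sum(p.demo < 0.05)      # TRUE counts as 1, FALSE as 0
prop.sig <- mean(p.demo < 0.05)  # proportion of TRUE values

# Same proportion as the indexing approach
prop.index <- length(p.demo[p.demo < 0.05]) / length(p.demo)
print(c(n.sig, prop.sig, prop.index))
```

The same idea applied to the simulation would be mean(p.1000 < 0.05).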

Simple loops

while() loop

I tend to avoid these, so you will not see them much here.

i <- 1
while (i <= 10) {
    print(i)
    i <- i + 0.5
}

## [1] 1
## [1] 1.5
## [1] 2
## [1] 2.5
## [1] 3
## [1] 3.5
## [1] 4
## [1] 4.5
## [1] 5
## [1] 5.5
## [1] 6
## [1] 6.5
## [1] 7
## [1] 7.5
## [1] 8
## [1] 8.5
## [1] 9
## [1] 9.5
## [1] 10
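One place where while() fits naturally is when the number of iterations is not known in advance. A small sketch (the variable names and the 0.05 threshold are chosen only for illustration): keep halving a value until it drops below a threshold, and count the steps.

```r
value <- 1
steps <- 0
# Keep going while value is still at or above the threshold.
while (value >= 0.05) {
    value <- value / 2
    steps <- steps + 1
}
print(steps)
print(value)
```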

for loop

If I run a loop, I most often use for(){}, which automatically iterates across the elements of a vector or list (in this case the sequence 1:10).

for (i in 1:10) {
    print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

If you do not want to use integers, how might you do it using the for()?

for (i in seq(from = 1, to = 5, by = 0.5)) {
    print(i)
}

## [1] 1
## [1] 1.5
## [1] 2
## [1] 2.5
## [1] 3
## [1] 3.5
## [1] 4
## [1] 4.5
## [1] 5
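When looping over the positions of a vector, seq_along() is generally a safer index than 1:length(x), because 1:0 counts down (1, 0) instead of producing an empty sequence. A brief sketch with made-up vectors; all names here are illustrative:

```r
vals <- c(1.5, 2.5, 4)
total <- 0
for (i in seq_along(vals)) {
    total <- total + vals[i]
}
print(total)

# With an empty vector the loop body is skipped entirely.
empty <- numeric(0)
n.iter <- 0
for (i in seq_along(empty)) {
    n.iter <- n.iter + 1
}
print(n.iter)
print(1:length(empty))  # counts down from 1 to 0, so a loop over it would run twice
```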

behavior of strings.

Using strings is a bit more involved in R than in some other languages. For instance, the following does not loop over the individual letters, because "word" is a character vector of length one:

for (letter in "word") {
    print(letter)
}

## [1] "word"

(try letters for a hoot.)

Instead, in R we have to split the word "word" into single characters using strsplit():

strsplit("word", split = "")

## [[1]]
## [1] "w" "o" "r" "d"

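So for the for loop we would do the following. Note that strsplit() returns a list of length one, so the loop body runs once and prints the whole character vector: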

for (letter in strsplit("word", split = "")) {
    print(letter)
}

## [1] "w" "o" "r" "d"
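If the aim is to visit one character per iteration, one option is to extract the first (and only) element of the list with [[1]] before looping. A minimal sketch; chars and n.letters are names introduced here for illustration:

```r
chars <- strsplit("word", split = "")[[1]]  # character vector of length 4
n.letters <- 0
for (letter in chars) {
    print(letter)               # one letter per iteration
    n.letters <- n.letters + 1  # counted only so the result can be checked
}
```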

More on avoiding loops

Many would generate random numbers like so.

for (i in 1:100) {
    print(rnorm(n = 1, mean = 0, sd = 1))
}

## [1] -0.1837
## [1] -0.9313
## [1] 1.648
## [1] -0.6964
## [1] 0.2112
## [1] 0.3441
## [1] 1.036
## [1] 0.7439
## [1] 0.5859
## [1] -0.6087
## [1] -0.4014
## [1] 1.44
## [1] -0.3906
## [1] -1.861
## [1] -0.739
## [1] -1.204
## [1] 0.07794
## [1] -1.65
## [1] 1.261
## [1] 0.6753
## [1] 0.6736
## [1] 0.3238
## [1] -1.316
## [1] 0.2965
## [1] 1.499
## [1] 0.4326
## [1] 0.4488
## [1] 0.8873
## [1] -1.304
## [1] -0.347
## [1] 0.3491
## [1] 0.24
## [1] 0.1425
## [1] -0.2785
## [1] -0.5072
## [1] -1.775
## [1] -0.04051
## [1] 0.9452
## [1] 0.3322
## [1] -0.01994
## [1] -0.2308
## [1] -0.4053
## [1] -0.5685
## [1] -1.631
## [1] -0.1484
## [1] 0.434
## [1] 1.653
## [1] 1.57
## [1] 0.1308
## [1] -1.059
## [1] -0.7157
## [1] -0.8316
## [1] 0.06561
## [1] 0.8243
## [1] 0.1841
## [1] 1.048
## [1] 0.1612
## [1] -0.9553
## [1] -0.7569
## [1] -0.288
## [1] -1.837
## [1] 0.7301
## [1] -2.103
## [1] -1.869
## [1] -1.298
## [1] -1.077
## [1] -0.2139
## [1] -0.9419
## [1] 0.4694
## [1] -1.344
## [1] -0.08514
## [1] -2.055
## [1] -0.803
## [1] -0.7281
## [1] 1.778
## [1] -1.116
## [1] 1.33
## [1] 0.1535
## [1] -2.897
## [1] 0.7305
## [1] 1.228
## [1] 1.697
## [1] -0.8183
## [1] -1.013
## [1] -0.634
## [1] -0.942
## [1] -0.3395
## [1] 0.1396
## [1] 1.022
## [1] 0.9868
## [1] -0.7778
## [1] 1.075
## [1] -0.1029
## [1] 0.2644
## [1] 0.01165
## [1] 0.8025
## [1] -1.24
## [1] -0.8865
## [1] 0.981
## [1] 0.5333

We are cycling through and generating one random number at each iteration. Look at the indices, and you can see we keep generating vectors of length 1.

It is better, cleaner, and faster to generate them all at one time.

rnorm(n = 100, mean = 0, sd = 1)

##   [1] -0.08683 -1.55262 -1.16909  0.30451 -1.14555  0.76682  0.12643
##   [8] -0.61174 -0.29103 -0.10707 -0.03397 -0.05926  0.27294  1.32693
##  [15] -0.53284  1.83234  0.43959 -0.88991  0.25383  0.96709 -0.23210
##  [22] -1.00190 -1.32289  1.80030  1.15272 -1.82907  0.75989  1.35966
##  [29]  0.53943  0.01429 -0.58707 -0.11886 -0.70367 -2.38988  0.08033
##  [36] -0.22795 -0.62166 -0.19832 -1.95990 -0.85127  0.94236  0.37771
##  [43]  0.32617 -0.08393 -0.54506 -2.58781 -0.58433  0.20985 -0.41613
##  [50]  0.60527  0.51713  1.57950 -0.61079 -0.28564 -0.16444  0.55007
##  [57]  0.57258  0.58513 -0.86728 -0.81185 -0.29333 -1.23935  0.46169
##  [64] -1.53586 -0.32583  0.17629 -0.85579  1.04989  1.22120  1.53359
##  [71] -2.37276  1.44393  1.47506  0.40110 -0.10157  0.35485 -0.72068
##  [78] -1.27910  0.63152 -0.65216  1.60160  0.27109  0.50904 -1.00531
##  [85]  0.76743 -0.78954 -0.01159  1.06944  1.15661 -0.91031  1.54919
##  [92] -0.84334  2.19994  0.26716  0.02081  0.53577  0.07840 -0.79387
##  [99] -1.18941  1.24745
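With the same seed, the single vectorized call should return exactly the values the loop would have produced one at a time, so nothing is lost by dropping the loop. A short sketch (the seed 42 and the variable names are arbitrary choices for illustration):

```r
set.seed(42)
one.by.one <- numeric(100)
for (i in 1:100) {
    one.by.one[i] <- rnorm(n = 1, mean = 0, sd = 1)
}

set.seed(42)  # reset so both approaches start from the same generator state
all.at.once <- rnorm(n = 100, mean = 0, sd = 1)

print(identical(one.by.one, all.at.once))
```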

What if we wanted to put all of these numbers in a vector?

The inadvisable approach

First we initialize a vector to store all of the numbers. Why do we initialize this vector first?

n <- 1e+05
x <- rep(NA, n)

The step above creates a vector of n NA's. They will be replaced sequentially with the random numbers as we generate them (using rnorm(), as above).

head(x)

## [1] NA NA NA NA NA NA
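One reason to initialize first: assigning into an existing vector reuses its storage, whereas growing a vector with c() typically makes R copy it on each iteration. The sketch below (small m, illustrative names) only shows that the two approaches give the same values; the timing difference depends on the R version and machine, so no timings are claimed here.

```r
m <- 1000  # kept small for illustration

# Growing: the vector is extended on every iteration.
set.seed(1)
grown <- c()
for (i in 1:m) {
    grown <- c(grown, rnorm(n = 1))
}

# Preallocated: the vector has its final length from the start.
set.seed(1)
prealloc <- rep(NA_real_, m)
for (i in 1:m) {
    prealloc[i] <- rnorm(n = 1)
}

print(identical(grown, prealloc))
```

Wrapping each loop in system.time() with a larger m is one way to see the cost of growing on your own machine.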

Now we run the for loop.

for (i in 1:n) {
    x[i] <- rnorm(n = 1, mean = 0, sd = 1)
}

For each i in the index, one number is generated and placed in x.

head(x)

## [1]  0.2848 -0.5432  1.1391 -1.0901  0.8515  0.5490

However, this is computationally inefficient in R, which has vectorized operations.

system.time(
    for (i in 1:n) {
        x[i] <- rnorm(n = 1, mean = 0, sd = 1)
    }
)

##    user  system elapsed 
##   0.562   0.023   0.584

We can also use the replicate() function to do the same thing, with syntax that is easier to write.

system.time(z <- replicate(n, rnorm(n = 1, mean = 0, sd = 1)))

##    user  system elapsed 
##   0.561   0.035   0.841

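In this run replicate() was no faster than the for loop (user time 0.561 s vs. 0.562 s, with a longer elapsed time); its advantage is the more compact syntax.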

The way to do it

However, since R is vectorized, both of these will be far slower than:

system.time(y <- rnorm(n, 0, 1))

##    user  system elapsed 
##   0.010   0.000   0.011

About 50 times faster than the for loop in this run (0.011 s vs. 0.584 s elapsed).

The general rule in R is that loops are slower than the apply family of functions (for small to medium data sets, not true for very large data) which are slower than vectorized computations.
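To make the three tiers concrete, here is a hedged sketch that computes the same squares with a for loop, with vapply() (one member of the apply family), and with a vectorized expression. The names are illustrative; actual timings vary by machine and R version, so compare them yourself with system.time().

```r
v <- runif(1e4)

# 1. Explicit loop over a preallocated result
sq.loop <- numeric(length(v))
for (i in seq_along(v)) {
    sq.loop[i] <- v[i]^2
}

# 2. apply family: vapply() with a declared return type
sq.apply <- vapply(v, function(z) z^2, numeric(1))

# 3. Vectorized arithmetic
sq.vec <- v^2

print(all.equal(sq.loop, sq.vec))
print(all.equal(sq.apply, sq.vec))
```

All three produce the same values; only the amount of work done per element at the R level differs.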
