Biogeographical networks in R

Requirements
Biogeographical network analysis: a detailed tutorial
Biogeographical networks, a second example with raster maps
Credits

biogeonetworks is an R package that implements several functions to write, read and analyse biogeographical networks. It is primarily designed to work with the Map Equation algorithm (infomap) and with the Gephi visualisation software.

I did not plan to release at all this package or write this guide. However, given that many people have asked me for help in performing biogeographical network, I have decided to release the package on GitHub (I do not have plans for a CRAN release now) and write a free tutorial for it. Warning: Since I initially did not intend to release this package publicly, I did not develop meaningful error messages for the possible mistakes that users can make when using these functions. If you struggle really hard, you may write me an e-mail to get help: be concise and include all the necessary information.

If you are not familiar with biogeographical, I highly recommend that you first read the introductory paper by Vilhena & Antonelli 2015, the comparative paper of Bloomfield et al. 2018, and finally the paper on the global biogeography of freshwater fishes which I will use as an example here.

Note that we do not use the network/graph packages in R such as igraph here, as our networks are handled in a data.frame format in R, and exported to other software in specific network formats such as Pajek or GDF formats. However, conversion from and to igraph objects are possible.

Requirements

The biogeonetworks package

Install the most recent version from GitHub:

install.packages("devtools")
devtools::install_github("Farewe/biogeonetworks")

Install the following packages (required to run the examples)

install.packages(c("rgdal", "plyr", "RColorBrewer", "sp", "tmap"))

Example database

Our example dataset will be the global database of freshwater fish species occurrences as we used it in the paper on global biogeographical regions of freshwater fish species.

You can download the example dataset from here

Load it in R as follows:

download.file("https://www.biorxiv.org/content/biorxiv/early/2019/06/12/319566/DC1/embed/media-1.csv",
              destfile = "fish.csv", mode = "wb")
fish = read.csv("fish.csv")
head(fish)

##   X1.Basin.Name X6.Fishbase.Valid.Species.Name
## 1            Aa                  Abramis brama
## 2            Aa              Alburnus alburnus
## 3            Aa            Barbatula barbatula
## 4            Aa                Blicca bjoerkna
## 5            Aa                 Cobitis taenia
## 6            Aa                    Esox lucius

Let’s also get the shapefiles of basins so we can make maps. The shapefile is large so making the map may take a while.

library(rgdal)

## Loading required package: sp

## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: C:/Users/Farewell20/Documents/R/win-library/3.6/rgdal/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/Farewell20/Documents/R/win-library/3.6/rgdal/proj
##  Linking to sp version: 1.3-2

download.file("https://borisleroy.com/permanent/basinshapefile.RDS",
              destfile = "basinshapefile.RDS", mode = "wb")
basins <- readRDS("basinshapefile.RDS")
plot(basins)

Map Equation software

In order to run Map Equation, you need to download the source code from the Map Equation website and compile the file.

In case you are not sure how to do that and/or do not manage to compile the code, I provide the executable I used for the biogeography paper at the following URL: https://borisleroy.com/permanent/Infomap.exe

This version corresponds to the version we used in the paper, i.e. infomap 0.19.12, released 27 Oct. 2017.

However, I highly recommend that you compile yourself the last version of infomap if you plan to use it for research. Also, if you work on a mac, you will have to compile the code yourself as I cannot compile for macs!

Put the Map Equation software in a directory where you know the full path; for me it is at the root of my R folder.

The Gephi network visualisation software

To explore the network, the easiest method is to install the open-source Gephi network visualisation software. There are multiple tutorials to learn it if you have never used this software before. This is relatively easy to learn, while at the same time hard to master! Take the time to experiment with your graphs, you will be well rewarded in the end.

Biogeographical network analysis: a detailed tutorial

Overview of the procedure

Step 1. Prepare your dataset as a bipartite data.frame, i.e. a data.frame composed of two columns: sites and species. Each row represents a species occurring in a site. You can also add a third column representing species abundances or frequencies of occurrences: Map Equation also works very well with species abundances (see an example here).

Sites	Species	Abundance (facultative)
A	Sp 1	10
A	Sp 2	15
A	Sp 3	3
B	Sp 1	1
B	Sp 4	12

Step 2. Write the network in Pajek format.
Step 3. Run the Map Equation Algorithm on the Pajek network.
Step 4. Read the output of Map Equation back into R as a data.frame.
Step 5. Analyse the network in R.
Step 6. Make maps.
Step 7. Write the network in GDF format for analysis in Gephi.

Now, because our example network is already in the correct format, we will skip to Step 2.

Step 1 Prepare the bipartite dataframe

Our example is already in this format.

Important: make sure that both your species and site columns are factors.

Step 2 Write the network in Pajek format

The Pajek format is a network file format that can be read by the Map Equation algorithm. We use the function writePajek to do that.

library(biogeonetworks)
writePajek(fish, 
           site.field = "X1.Basin.Name", # Name of your site column
           species.field = "X6.Fishbase.Valid.Species.Name", # Name of your species column
           filename = "fish.net", # Name of the output file
           abundance.field = NULL) # (FACULTATIVE) Name of abundance column

A file called “fish.net” should have appeared in your working directory. Note that if you wish, you can already open this file in Gephi to start exploring your network.

If you have any errors at this stage please verify the spelling of your columns, and that they are in factor format.

Step 3 Run the Map Equation algorithm

We will now run the Map Equation algorithm, which is an executable file outside R. To do that, we use the system() function which allows to run software in command line format.

Map Equation allows a variety of options and I won’t describe them here. All the details are provided on the Map Equation website.

Below are the parameters we specified for our fish biogeography paper:

--undirected: means that the network is not directed (i.e., no arrows)
-N 1000: number of runs set to 1000. The Map Equation is based on a stochastic process, so you need to set a high number of runs to find a stable solution. In our experience, 1000 runs always yielded the same resuts - but it may differ for other datasets.
--tree: output file format “.tree”. We will use this format in step 4.
--map: output file format “.map”. Useful if you want to use the apps on the Map Equation website, as these are reading .map file formats.

After the parameters are set, you specify the input file (Pajek file generated at step 2), and then the output folder.

Note here that mac users may need to adjust the last part of this command line, as paths are not written the same as in windows.

system("infomap --undirected --tree --map -N 100 fish.net ./")

Infomap is very talkative and your console is now filled with information and numbers. It’s not extremely important to be able to read it at this point as we will be analysing everything on our own later on. Nevertheless, the last synthesis of infomap (shown below) informs you that it found a solution in six levels; i.e. there is a hierarchy of nested clusters with up to six levels. In my case it indicates that it found 16 clusters at the first level, 16 clusters and 32 leaf nodes (leaf nodes are always sites or species) at the second, etc. However, not all clusters found at each level are important - as we will see later, some clusters at the first level are simply composed of isolated basins with a few endemic species.

Best end modular solution in 6 levels:
Per level number of modules:         [         16,          16,          89,          96,          16,           0] (sum: 233)
Per level number of leaf nodes:      [          0,          32,          57,        8620,        4057,        1110] (sum: 13876)
Per level average child degree:      [         16,           3,       9.125,     97.9326,     42.4271,      69.375] (average: 78.327621725)
Per level codelength for modules:    [0.000690550, 0.022845259, 0.530561333, 0.373025892, 0.092733170, 0.000000000] (sum: 1.019856203)
Per level codelength for leaf nodes: [0.000000000, 0.000625408, 0.003806125, 3.726011192, 2.747859945, 0.713083216] (sum: 7.191385886)
Per level codelength total:          [0.000690550, 0.023470667, 0.534367457, 4.099037084, 2.840593114, 0.713083216] (sum: 8.211242089)

If you look into your hard drive, your should see that two new files have been written by Map Equation, called “fish.tree” and “fish.map”.

If you have any errors at this step, please verify carefully your system() command, and check that you did not make any mistake in your file / path names.

Step 4 Read Map Equation results in R

We read the “fish.tree” file in R with the function readInfomapTree().

fish.clusters <- readInfomapTree("fish.tree",
                                 network.summary = TRUE, # Prints a summary of the clustering results
                                 replace.leaf.names = TRUE) # Changes node numbers to actual names for terminal nodes (i.e. site & species names)

## Biogeographical network with up to 6 levels of complexity.
## lvl1: 16 clusters/leaf nodes
## lvl2: 48 clusters/leaf nodes
## lvl3: 146 clusters/leaf nodes
## lvl4: 8716 clusters/leaf nodes
## lvl5: 4073 clusters/leaf nodes
## lvl6: 1110 clusters/leaf nodes

Let’s inspect the generated object:

head(fish.clusters)

##        Groups Codelength        Name    id lvl1 lvl2  lvl3    lvl4      lvl5
## 1 1:1:1:1:1:1 0.00213902      Kapuas 12245    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 2 1:1:1:1:1:2 0.00160848      Pahang 12915    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 3 1:1:1:1:1:3 0.00131373     Mahakam 12569    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 4 1:1:1:1:1:4 0.00124636 Batang.Hari 11477    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 5 1:1:1:1:1:5 0.00106951       Perak 12984    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 6 1:1:1:1:1:6 0.00102740      Rajang 13107    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
##          lvl6
## 1      Kapuas
## 2      Pahang
## 3     Mahakam
## 4 Batang.Hari
## 5       Perak
## 6      Rajang

In this table, each row corresponds to one node (i.e., one site or one species) of the network. The columns provide information for each node:

Groups: clusters as defined by Map Equation. We won’t use this column as clusters are provided in more details in other columns.
Codelength: quantitative information that Map Equation used to produce clusters.
Name: node name (i.e., site or species name)
id: node id in the network
lvl1, lvl2 and so on: each column correspond to one level of the hierarchical clustering of Map Equation. Here we have six columns corresponding to the six levels of our hierarchical clustering. lvl1 correspond to coarser, more global clusters, whereas lvl6 correspond to the finer level and is only composed of leaf nodes (no more clusters, only species or site names).

At the first level, cluster are name as 1, 2, 3, 4, etc. Second level clusters are nested in first level clusters, so they are indicated as, for example, 2.3, to indicate that we looking at subcluster 3 of cluster 2. Likewise, third level clusters are indicated as e.g. 2.3.1, meaning cluster 2, subcluster 3, subsubcluster 1. This hierarchy goes on until the last level where you will always have a species or site name.

This notation is designed to ensure that you do not confuse sublevel clusters, for example confuse subcluster 1 of cluster 2 with subcluster 1 of cluster 1.

Once you are familiarised with how clusters are coded, we can proceed and start analysing our results - we are mostly interested in what’s in columns lvl1 to lvl6.

Step 5 Analysing biogeographical network & clustering results in R

Foremost, we are looking for several information to grasp our clustering results: How many clusters there are? Are they all significant - i.e. how many species there are per cluster? How many sites per cluster? How many clusters at level 2?

Let’s start with the numbers of clusters at level 1 and level 2:

levels(fish.clusters$lvl1)

##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "10" "11" "8"  "9"  "12" "13" "14" "15"
## [16] "16"

levels(fish.clusters$lvl2)

##  [1] "2.1"                        "1.1"                       
##  [3] "1.2"                        "2.2"                       
##  [5] "1.3"                        "2.3"                       
##  [7] "1.4"                        "4.1"                       
##  [9] "3.1"                        "5.1"                       
## [11] "3.2"                        "3.3"                       
## [13] "3.4"                        "5.2"                       
## [15] "5.3"                        "4.2"                       
## [17] "Anablepsoides cryptocallus" "Aphanius stiassnyae"       
## [19] "Ashida"                     "Bailebu"                   
## [21] "Barbatula dgebuadzei"       "Barbopsis devecchii"       
## [23] "Blanche"                    "Carasobarbus apoensis"     
## [25] "Chigusa"                    "Cyprinion mhalensis"       
## [27] "Danakil.Depression"         "Danakilia franchettii"     
## [29] "Eyl"                        "Garra buettikeri"          
## [31] "Gobi.lakes"                 "Kemp.Welch"                
## [33] "Kwoor"                      "Melanotaenia arfakensis"   
## [35] "Melanotaenia catherinae"    "Melanotaenia fredericki"   
## [37] "Melanotaenia misoolensis"   "Melanotaenia parkinsoni"   
## [39] "Mori"                       "Prafi"                     
## [41] "Rhodeus suigensis"          "Riviere.des.Coulisses"     
## [43] "Tama.Indonesia"             "Wadi.ad.Dawasir"           
## [45] "Wadi.Bowa"                  "Wadi.SA1"                  
## [47] "Waigeo"                     "Warsamson"

We can see that at level 1 we have 16 clusters. At level 2 we have again 16 clusters, and a variety of species and site nodes. By looking at the names of level 2 clusters, we can figure out that only clusters 1 to 5 of level 1 have a nested hierarchy of subclusters.

If we look at our fish.clusters table, both species and site nodes are mixed together in the table and it seems difficult to disentangle the results from them. Let’s look at how many nodes are inside each cluster at level 1:

plyr::count(fish.clusters$lvl1)

##     x freq
## 1   1 7567
## 2   2 6220
## 3   3   26
## 4   4   19
## 5   5   12
## 6   6    6
## 7   7    4
## 8  10    3
## 9  11    3
## 10  8    3
## 11  9    3
## 12 12    2
## 13 13    2
## 14 14    2
## 15 15    2
## 16 16    2

We can see that clusters 1 and 2 respectively have 7567 and 6220 nodes, whereas clusters 3 to 16 have less than 26 nodes each. Therefore, we can assume that the only significant clusters at this stage are clusters 1 and 2 - but we still neither know how many species they have each, nor how many sites. We will therefore split our network table into two tables: one for sites and one for species.

First, let’s make a table composed only of site nodes:

fish.sites <- getSiteTable(fish, # Your bipartite data.frame of STEP 1
                           site.field = "X1.Basin.Name", # Name of your site column
                           network = fish.clusters) # Output table from Map Equation

plyr::count(fish.sites$lvl1)

##     x freq
## 1   1 1888
## 2   2  644
## 3   3    8
## 4   4   16
## 5   5    7
## 6   6    3
## 7   7    3
## 8  10    2
## 9  11    2
## 10  8    2
## 11  9    1
## 12 12    1
## 13 13    1
## 14 14    1
## 15 15    1
## 16 16    1

We can see that we clearly have two major clusters, clusters 1 and 2, which respectively have 1888 and 644 sites. The other clusters are much more marginal here.

Let’s look at the species table now:

fish.species <- getSpeciesTable(fish, # Your bipartite data.frame of STEP 1
                                species.field = "X6.Fishbase.Valid.Species.Name", # Name of your species column
                                network = fish.clusters) # Output table from Map Equation

plyr::count(fish.species$lvl1)

##     x freq
## 1   1 5679
## 2   2 5576
## 3   3   18
## 4   4    3
## 5   5    5
## 6   6    3
## 7   7    1
## 8  10    1
## 9  11    1
## 10  8    1
## 11  9    2
## 12 12    1
## 13 13    1
## 14 14    1
## 15 15    1
## 16 16    1

We see here that our first two clusters both have more than 5500 species, whereas most other clusters only have a few species.

Therefore, overall, we consider that there are two major clusters at level 1, and a few minor additional clusters.

If we now investigate level 2:

plyr::count(fish.sites$lvl2)

##                        x freq
## 1                    2.1  246
## 2                    1.1  719
## 3                    1.2  272
## 4                    2.2  267
## 5                    1.3  515
## 6                    2.3  131
## 7                    1.4  382
## 8                    4.1   15
## 9                    3.1    4
## 10                   5.1    5
## 11                   3.2    1
## 12                   3.3    1
## 13                   3.4    2
## 14                   5.2    1
## 15                   5.3    1
## 16                   4.2    1
## 17                Ashida    1
## 18               Bailebu    1
## 19               Blanche    1
## 20               Chigusa    1
## 21    Danakil.Depression    1
## 22                   Eyl    1
## 23            Gobi.lakes    1
## 24            Kemp.Welch    1
## 25                 Kwoor    1
## 26                  Mori    1
## 27                 Prafi    1
## 28 Riviere.des.Coulisses    1
## 29        Tama.Indonesia    1
## 30       Wadi.ad.Dawasir    1
## 31             Wadi.Bowa    1
## 32              Wadi.SA1    1
## 33                Waigeo    1
## 34             Warsamson    1

plyr::count(fish.species$lvl2)

##                             x freq
## 1                         2.1 4088
## 2                         1.1 2464
## 3                         1.2 2675
## 4                         2.2  828
## 5                         1.3  462
## 6                         2.3  660
## 7                         1.4   78
## 8                         4.1    2
## 9                         3.1    9
## 10                        5.1    1
## 11                        3.2    4
## 12                        3.3    3
## 13                        3.4    2
## 14                        5.2    2
## 15                        5.3    2
## 16                        4.2    1
## 17 Anablepsoides cryptocallus    1
## 18        Aphanius stiassnyae    1
## 19       Barbatula dgebuadzei    1
## 20        Barbopsis devecchii    1
## 21      Carasobarbus apoensis    1
## 22        Cyprinion mhalensis    1
## 23      Danakilia franchettii    1
## 24           Garra buettikeri    1
## 25    Melanotaenia arfakensis    1
## 26    Melanotaenia catherinae    1
## 27    Melanotaenia fredericki    1
## 28   Melanotaenia misoolensis    1
## 29    Melanotaenia parkinsoni    1
## 30          Rhodeus suigensis    1

We have six majors clusters at the second level, which all have more than 250 sites. Their species richness varies from 78 to 4750.

All the other clusters have less than 10 species and only a few basins so we will focus on the following cluster:

level 1: clusters 1 and 2
level 2: clusters 1.1, 1.2, 1.3, 1.4, 2.1 and 2.2

Once we have decided our number of clusters, our next step will be useful for both the network visualisation and map making: we will define the colours for each cluster. Contrary to what it may seem, this step is absolutely not straightforward and may generate a lot of confusion, which is why I wrote a function to automatically assign colours based on the chosen number of significant clusters. This function uses the package RColorBrewer and therefore will assign colours based on RColorBrewer palettes. There can be no more than 12 colours on these colour palettes, therefore if you have more than 12 significant clusters, you can ask the function to assign shades of grey to the other clusters using the argument sh.grey = TRUE. Otherwise, you can assign a single colour to all the “minor” clusters with the argument other.color.

First we assign colours to the level 1 clusters; notice how the function simply adds a column to the Map Equation data.frame, therefore you should always assign the result of the function to this same data.frame.

fish.clusters <- attributeColors(network = fish.clusters, # Same object as input & output
                                 lvl = "lvl1", # Which hierarchical level are we working on?
                                 nb.max.colors = 2, # We chose two clusters as significant for level 1
                                 colname = "colors.lvl1", # Name to give to the colour column
                                 other.color = grey(0.5), # Minor clusters will be black
                                 cluster.order = "sites", # Cluster order for colours (see below)
                                 db = fish, # Database of step 1
                                 site.field = "X1.Basin.Name", # Name of site column in your database
                                 species.field = "X6.Fishbase.Valid.Species.Name") # Name of species column in your database

## Loading required package: RColorBrewer

head(fish.clusters)

##        Groups Codelength        Name    id lvl1 lvl2  lvl3    lvl4      lvl5
## 1 1:1:1:1:1:1 0.00213902      Kapuas 12245    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 2 1:1:1:1:1:2 0.00160848      Pahang 12915    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 3 1:1:1:1:1:3 0.00131373     Mahakam 12569    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 4 1:1:1:1:1:4 0.00124636 Batang.Hari 11477    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 5 1:1:1:1:1:5 0.00106951       Perak 12984    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 6 1:1:1:1:1:6 0.00102740      Rajang 13107    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
##          lvl6 colors.lvl1
## 1      Kapuas     #A6CEE3
## 2      Pahang     #A6CEE3
## 3     Mahakam     #A6CEE3
## 4 Batang.Hari     #A6CEE3
## 5       Perak     #A6CEE3
## 6      Rajang     #A6CEE3

As you can see, a column called colors.lvl1 was added to our fish.clusters object, and it contains hexadecimal colour codes. We will use these codes later to make maps and colour clusters in Gephi.

About the cluster.order argument: this argument is important to determine the “significant” clusters that will receive colours. In my case, I used the number of sites to determine the important clusters. Therefore, at level 1 I want the two clusters with the highest number of sites to be coloured. Likewise, at level 2 I want the six clusters with the highest number of sites to be coloured.

Different options are available:

cluster.order = "undefined": the order of clusters is based on the order of factor levels for the cluster column
cluster.order = "sites": the order of clusters is based on the decreasing number of sites per cluster
cluster.order = "species": the order of clusters is based on the decreasing number of species per cluster
cluster.order = "sites+species": the order of clusters is based on the total number of nodes (sites & species) per cluster

Before we proceed with the maps, let’s also add a colour column for the level 2 of our analyses:

fish.clusters <- attributeColors(network = fish.clusters, # Same object as input & output
                                 lvl = "lvl2", # Which hierarchical level are we working on?
                                 nb.max.colors = 6, # We chose two clusters as significant for level 2
                                 colname = "colors.lvl2", # Name to give to the colour column
                                 other.color = "black", # Minor clusters will be black
                                 cluster.order = "sites", # Cluster order for colours (see below)
                                 db = fish, # Database of step 1
                                 site.field = "X1.Basin.Name", # Name of site column in your database
                                 species.field = "X6.Fishbase.Valid.Species.Name") # Name of species column in your database
head(fish.clusters)

##        Groups Codelength        Name    id lvl1 lvl2  lvl3    lvl4      lvl5
## 1 1:1:1:1:1:1 0.00213902      Kapuas 12245    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 2 1:1:1:1:1:2 0.00160848      Pahang 12915    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 3 1:1:1:1:1:3 0.00131373     Mahakam 12569    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 4 1:1:1:1:1:4 0.00124636 Batang.Hari 11477    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 5 1:1:1:1:1:5 0.00106951       Perak 12984    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 6 1:1:1:1:1:6 0.00102740      Rajang 13107    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
##          lvl6 colors.lvl1 colors.lvl2
## 1      Kapuas     #A6CEE3     #A6CEE3
## 2      Pahang     #A6CEE3     #A6CEE3
## 3     Mahakam     #A6CEE3     #A6CEE3
## 4 Batang.Hari     #A6CEE3     #A6CEE3
## 5       Perak     #A6CEE3     #A6CEE3
## 6      Rajang     #A6CEE3     #A6CEE3

Step 6 Make maps

The simplest way to map our clusters with vector data is to add the colour columns the dataframe of our polygon data. Whether you work with package sp or sf does not matter as the procedure will be the same; here I will make the exampe with sp.

First, add colour columns to the data slot of our basins polygons. We need to make sure that each basins gets its proper colour, so we use the match() function to do that. Of course, to do that, you need to make sure that the names between your polygons and your step 1 data.frame are exactly the same.

basins@data$colors.lvl1 <- fish.clusters$colors.lvl1[match(
  basins@data$BasinName, fish.clusters$Name
)]
head(basins@data)

##       BasinName   Country Ecoregion Endorheic Out_Longit  Out_Latit Med_Longit
## 0  Cachoeirinha    Brazil Neotropic      <NA>  -43.10344 -22.693658  -43.06899
## 1      Comprido    Brazil Neotropic      <NA>  -47.07734 -24.451571  -47.15300
## 2 Arroyo.Walker Argentina Neotropic      <NA>  -62.29922 -40.628898  -62.59625
## 3     Aconcagua     Chile Neotropic      <NA>  -71.53604 -32.912822  -70.65645
## 4        Amazon    Brazil Neotropic      <NA>  -52.23409  -1.619426  -64.57286
## 5      Andalien     Chile Neotropic      <NA>  -73.09136 -36.664541  -72.81968
##    Med_Latit    Surf_area colors.lvl1
## 0 -22.595111     228.8151     #B2DF8A
## 1 -24.465831     204.7977     #B2DF8A
## 2 -40.507344     969.1561     #B2DF8A
## 3 -32.770645    7318.8768     #B2DF8A
## 4  -6.714857 5888416.9156     #B2DF8A
## 5 -36.824925     767.4691     #B2DF8A

Now making the map is relatively straightforward:

plot(basins, col = basins@data$colors.lvl1)

It becomes clear now that these two big clusters at level 1 are the supercontinental regions we described in our paper. Let’s add a legend to that:

bioregions.lvl1 <- unique(fish.clusters[, c("lvl1", "colors.lvl1")])
bioregions.lvl1 <- bioregions.lvl1[-which(duplicated(bioregions.lvl1$colors.lvl1)), ]
bioregions.lvl1$Regions <- c("New World", "Old World", "Small clusters")

plot(basins, col = basins@data$colors.lvl1)
legend("bottomleft",
       fill = bioregions.lvl1$colors.lvl1,
       legend = bioregions.lvl1$Regions)

Let’s do this for level 2 now:

basins@data$colors.lvl2 <- fish.clusters$colors.lvl2[match(
  basins@data$BasinName, fish.clusters$Name
)]

bioregions.lvl2 <- unique(fish.clusters[, c("lvl2", "colors.lvl2")])
bioregions.lvl2 <- bioregions.lvl2[-which(duplicated(bioregions.lvl2$colors.lvl2)), ]
bioregions.lvl2$Regions <- c("Sino-Oriental",
                             "Ethiopian",
                             "Palearctic",
                             "Australian",
                             "Neotropical",
                             "Nearctic",
                             "Small clusters")

plot(basins, col = basins@data$colors.lvl2)
legend("bottomleft",
       fill = bioregions.lvl2$colors.lvl2,
       legend = bioregions.lvl2$Regions)

Step 7 Write the network for visualisation in Gephi

If you want to explore the structure of your network in Gephi, you can write the network on the hard drive with the function writeGDF. The major asset of this function over the function used in step 2 is that the GDF file format can take into account additional columns, such as colour column. If you provide a colour column, Gephi will automatically colour your nodes according to this column.

writeGDF(db = fish, # Database of step 1
         site.field = "X1.Basin.Name", # Name of site column in your database
         species.field = "X6.Fishbase.Valid.Species.Name", # Name of species column in your database
         network = fish.clusters, 
         filename = "fish.gdf",
         color.field = "colors.lvl2") # Name of the color field in the network

Notice how I chose to colour nodes according to the second level here.

After that, you are free to experiment with your network in Gephi, and find your ideal visualisation! It takes a while to find your optimal set of colours and spatialisation algorithm, but you can get some neat results in the end.

This is what we ended up with:

I highly recommend you to learn to explore networks as they are very useful to understand patterns or discover errors in your datasets (e.g., strange links between regions may indicate records of introduced species!).

Step 8 Calculation of metrics for the network

8.1 The Participation Coefficient

The Participation Coefficient, introduced by Bloomfield et al. 2018, provides an indication of the degree to which a node is sharing links with nodes of other clusters. In other words, for a site, it indicates whether the site has species from other regions (is it a transition zone?); for a species, it indicates the degree to which this species occurs in other regions (is the species endemic of one region or occuring in multiple regions?).

This is a very useful metric to highlight transition zones in biogeography.

Let’s make an example with the Participation coefficient calculated at level 2 of our network. Depending on the size of your network, the calculation may take a little while.

library(plyr)
fish.clusters <- participationCoefficient(
  network = fish.clusters, # Map Equation clustering results
  db = fish, # Database of step 1
  site.field = "X1.Basin.Name", # Name of site column in your database
  species.field = "X6.Fishbase.Valid.Species.Name", # Name of species column in your database 
  lvl = "lvl2") 

head(fish.clusters)

##        Groups Codelength        Name    id lvl1 lvl2  lvl3    lvl4      lvl5
## 1 1:1:1:1:1:1 0.00213902      Kapuas 12245    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 2 1:1:1:1:1:2 0.00160848      Pahang 12915    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 3 1:1:1:1:1:3 0.00131373     Mahakam 12569    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 4 1:1:1:1:1:4 0.00124636 Batang.Hari 11477    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 5 1:1:1:1:1:5 0.00106951       Perak 12984    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
## 6 1:1:1:1:1:6 0.00102740      Rajang 13107    1  1.1 1.1.1 1.1.1.1 1.1.1.1.1
##          lvl6 colors.lvl1 colors.lvl2 participation.coef
## 1      Kapuas     #A6CEE3     #A6CEE3         0.00000000
## 2      Pahang     #A6CEE3     #A6CEE3         0.01041638
## 3     Mahakam     #A6CEE3     #A6CEE3         0.01273833
## 4 Batang.Hari     #A6CEE3     #A6CEE3         0.00000000
## 5       Perak     #A6CEE3     #A6CEE3         0.01562403
## 6      Rajang     #A6CEE3     #A6CEE3         0.00000000

Now let’s make a map of our participation coefficient; for this map I will use the tmap packages, as it makes beautiful maps really easily.

basins@data$PC <- fish.clusters$participation.coef[match(
  basins@data$BasinName, fish.clusters$Name
)]

library(tmap)

tm_shape(basins, projection = "robin") + # Make a nice projection as well!
  tm_polygons("PC") +
  tm_layout(legend.position = c("left", "bottom"))

## Warning: The shape basins is invalid. See sf::st_is_valid

## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3

And here is the transition zones maps! In case you wonder, the differences between the results here and those in our paper correspond to the post-hoc changes we made to the minor clusters (see Appendix S5).

8.2 Another set of metrics: IndVal, DilVal and robustness

Description to be added here in the future

site.area <- data.frame(name = basins$BasinName,
                        area = basins$Surf_area)

lvl1metrics <- clusterMetrics(db = fish, # Database of step 1
                              network = fish.clusters, # Clustering results
                              site.field = "X1.Basin.Name", # Name of site column in your database
                              species.field = "X6.Fishbase.Valid.Species.Name", # Name of species column in your database
                              site.area = site.area, # A data.frame containing at least two columns with site name ("name") and site surface ("surface")
                              level = "lvl1") # Clustering evel at which we calculate metrics

head(lvl1metrics$region.stats)

##   region nb.sites        area
## 1      1     1888 67354732.02
## 2      2      644 30325232.78
## 3      3        8   109698.98
## 4      4       16    38545.18
## 5      5        7    20495.35
## 6      6        3   374715.84

head(lvl1metrics$species.stats)

##                                         species cluster Occ.Ri Occ.Di
## Aaptosyax grypus               Aaptosyax grypus       1      1      1
## Abactochromis labrosus   Abactochromis labrosus       1      1      1
## Abbottina liaoningensis Abbottina liaoningensis       1      5      5
## Abbottina obtusirostris Abbottina obtusirostris       1      1      1
## Abbottina rivularis         Abbottina rivularis       1     71     71
## Abbottina springeri         Abbottina springeri       1     25     25
##                              Occ.Ai Occ.Fi  Occ.IndVal Occ.DilVal         Ri
## Aaptosyax grypus        0.000529661      1 0.000529661          0  772599.35
## Abactochromis labrosus  0.000529661      1 0.000529661          0  127828.37
## Abbottina liaoningensis 0.002648305      1 0.002648305          0 4207258.62
## Abbottina obtusirostris 0.000529661      1 0.000529661          0 1913756.21
## Abbottina rivularis     0.037605932      1 0.037605932          0 6058422.41
## Abbottina springeri     0.013241525      1 0.013241525          0   81160.36
##                                 Di          Ai Fi      IndVal DilVal
## Aaptosyax grypus         772599.35 0.011470602  1 0.011470602      0
## Abactochromis labrosus   127828.37 0.001897838  1 0.001897838      0
## Abbottina liaoningensis 4207258.62 0.062464188  1 0.062464188      0
## Abbottina obtusirostris 1913756.21 0.028413092  1 0.028413092      0
## Abbottina rivularis     6058422.41 0.089947985  1 0.089947985      0
## Abbottina springeri       81160.36 0.001204969  1 0.001204969      0

head(lvl1metrics$site.stats)

##                    site cluster     Occ.Rg     Occ.RRg         Rg        RRg
## Aa                   Aa       1 1.07940981 0.083031524 2.11187658 0.16245204
## Abashiri       Abashiri       1 0.06991525 0.069915254 0.03525847 0.03525847
## Abhe.Bad       Abhe.Bad       1 0.09322034 0.009322034 0.91494189 0.09149419
## Abogain.Gol Abogain.Gol       1 0.03495763 0.005826271 0.35315965 0.05885994
## Abu.gawa       Abu.gawa       1 0.18326271 0.045815678 0.05899609 0.01474902
## Abukuma         Abukuma       1 0.23675847 0.059189619 0.22367040 0.05591760

Biogeographical networks, a second example with raster maps

For this example I will be using occurrence data which I will transform into raster data - I suspect this is the most common type of data that people are working with. I decided to make an example on the basis of spiders as they were my first research subject and I still love them very much, even though I no longer work on them!

I assume here that you have followed the previous sections of the tutorial, so I will not detail each step as I did before.

In this section we will need the following packages:

install.packages(c("rgbif", "reshape2"))

Get occurrence data points and transform them into rasters

I will use the package rgbif to download occurrence data in France, for the bee family megachilidae (they are funny large bees). I chose them because they have enough data in France to be able to look for bioregions, and I have a friend working on them.

Warning: the rgbif package takes a while to proceed

library(rgbif)

## Warning: package 'rgbif' was built under R version 3.6.3

megachilidae <- occ_search(scientificName = "Megachilidae", country = "FR",
                           hasCoordinate = TRUE,
                           limit = 20000,
                           return = "data")

## A quick cleaning of the data
# We keep only the important columns here (remember you will have to cite data
# sources if you do this for a publication!)
megachilidae <- megachilidae[, c("species", "decimalLongitude", "decimalLatitude")]
# Remove data with no valid species name
megachilidae <- megachilidae[-which(is.na(megachilidae$species)), ]

head(megachilidae)

## # A tibble: 6 x 3
##   species       decimalLongitude decimalLatitude
##   <chr>                    <dbl>           <dbl>
## 1 Osmia cornuta            5.97             43.1
## 2 Osmia rufa              -1.66             48.1
## 3 Osmia cornuta            3.37             44.3
## 4 Osmia cornuta           -0.876            46.6
## 5 Osmia cornuta           -1.66             48.1
## 6 Osmia cornuta            3.82             43.5

So here we are, with a dataset composed of 6 342 occurence records from France. What we will do next consists in “rasterizing” these occurrence records, i.e. creating a raster of a fixed resolution and recording species presence in each raster cell.

We will use a loop to create a stack with as many raster layers as there are species.

library(raster)
# First, create the grid. To do that, I will create a raster of resolution 0.5°
# at an extent corresponding to a shapefile of France:
france <- getData("GADM", country = "FRA", level = 1)

rasterfrance <- raster(france,
                       resolution = 0.5)

# Then, creaste the species stack through a loop in which all occurrences are
# rasterized  
sp.stack <- rasterfrance # This will be our species stack, it is empty at first
for (sp in unique(megachilidae$species))
{
  sp.stack <- addLayer(sp.stack, # At each iteration, we will add a new species
                       rasterize(megachilidae[which(megachilidae$species == sp), 
                                              c("decimalLongitude",
                                                "decimalLatitude")], 
                                 # We select all occurrences corresponding to the current species
                                 rasterfrance, fun = function (x, ...) 1
                                 # This function will assign a "1" to a cell everytime one or more occurrences fall in this cell
                       ))
}
# Add species names for all layers
names(sp.stack) <- unique(megachilidae$species)

# Plot the first species
plot(sp.stack[[1]], col = "black")
plot(france, add = TRUE)

Now that we have all our species in the format of a presence raster, we will prepare the bipartite data.frame necessary for the network analysis.

Step 1 Prepare the bipartite dataframe

First we will transform the raster stack in a cell * species matrix, then convert it into the bipartite data.frame.

# Transform the species stack into a matrix
megachilidae.matrix <- rasterToPoints(sp.stack)
# Get cell numbers
rownames(megachilidae.matrix) <- cellFromXY(sp.stack,
                                            megachilidae.matrix[, c(1, 2)])

# Remove the x & y columns
megachilidae.matrix <- megachilidae.matrix[, -c(1, 2)]

# Transform the matrix into a bipartite data.frame
bipartite.megachilidae <- reshape2::melt(megachilidae.matrix)
# Change column names
colnames(bipartite.megachilidae) <- c("Cell", "Species", "Occurrence")

# Remove empty cells
bipartite.megachilidae <- bipartite.megachilidae[-which(is.na(bipartite.megachilidae$Occurrence)), ]

Step 2 Write the network in Pajek format

writePajek(bipartite.megachilidae, 
           site.field = "Cell", # Name of your site column
           species.field = "Species", # Name of your species column
           filename = "megachilidae.net", # Name of the output file
           abundance.field = NULL)

Step 3 Run the Map Equation algorithm

system("infomap --undirected --tree --map -N 100 megachilidae.net ./")

## [1] 0

Step 4 Read Map Equation clusters in R

megachilidae.clusters <- readInfomapTree("megachilidae.tree",
                                         network.summary = TRUE, # Prints a summary of the clustering results
                                         replace.leaf.names = TRUE) # Changes node numbers to actual names for terminal nodes (i.e. site & species names)

## Biogeographical network with up to 2 levels of complexity.
## lvl1: 20 clusters/leaf nodes
## lvl2: 382 clusters/leaf nodes

Step 5 Analysing the results in R

We will straight away analyse the results as the number of pixels per cluster:

megachilidae.cells <- getSiteTable(bipartite.megachilidae, site.field = "Cell", 
                                   network = megachilidae.clusters)

plyr::count(megachilidae.cells$lvl1)

##     x freq
## 1   1  133
## 2   2   11
## 3   3   18
## 4   4    6
## 5   5    6
## 6   6    5
## 7   7    5
## 8  13    5
## 9   9    2
## 10 10    2
## 11 12    5
## 12 11    2
## 13 15    1
## 14  8    2
## 15 14    1
## 16 16    3
## 17 18    3
## 18 17    2
## 19 19    1
## 20 20    1

We seem to have at least one big cluster, two small clusters, and a large number of tiny clusters (<10 cells). This high number of tiny clusters may be due to the nature of the Map Equation algorithm: it tends to identify “transition” clusters (i.e. clusters with many shared species) as distinct clusters. Hence, these small clusters may indicate zones of biogeographical transition rather than biogeographical clusters. Therefore, we will colour them all in black for the time being.

megachilidae.clusters <- attributeColors(megachilidae.clusters, 
                                         nb.max.colors = 3, # Three major clusters
                                         palette = "Paired", 
                                         other.color = "#000000", # Black colour
                                         db = bipartite.megachilidae)

Step 6 Make raster maps!

Because rasters contain only numbers, we need to create a special type of raster dedicated to categorical data.

Let’s run getSiteTable again on our clusters, hence we will also have the colours on our data.frame of cells.

megachilidae.cells <- getSiteTable(bipartite.megachilidae, site.field = "Cell", 
                                   network = megachilidae.clusters)
cols <- unique(megachilidae.cells[, c("lvl1", "color")])

The data.frame called cols contains the cluster names and their associated colours. We will add a third column, which is a numeric id for each cluster. This id will be used to map clusters in the raster, since rasters can only take numeric values.

cols$ID <- as.numeric(cols$lvl1)
cols

##     lvl1   color ID
## 14     1 #A6CEE3  1
## 167    2 #B2DF8A  2
## 256    3 #1F78B4  3
## 282    4 #000000  4
## 293    5 #000000  5
## 302    6 #000000  6
## 314    7 #000000  7
## 322    8 #000000 14
## 327    9 #000000  9
## 334   10 #000000 10
## 340   11 #000000 12
## 346   12 #000000 11
## 351   13 #000000  8
## 359   14 #000000 15
## 363   15 #000000 13
## 368   16 #000000 16
## 372   17 #000000 18
## 376   18 #000000 17
## 379   19 #000000 19
## 382   20 #000000 20

Important: When we apply as.numeric on the cluster column, it will return the positions of factor levels, which is why the numeric values inside id do not correspond to the names of lvl1. This may seem counter-intuitive, but remember that we are working here on the first level of clusters, hence cluster names are 1, 2, 3, etc.. Imagine you are working on the third level, then the names would be 1.1.1, 1.1.2, 1.2.1, etc., which is why we cannot convert them into numeric ids straight away.

Now, let’s create the raster that will map the clusters:

# First we make an empty raster based on our species rasters
rasternet <- raster(sp.stack, layer = 0)
# Then we will add the values in our raster
megachilidae.cells$cellnumbers <- as.numeric(as.character(megachilidae.cells$Name))
rasternet[megachilidae.cells$cellnumbers] <- as.numeric(megachilidae.cells$lvl1)
# Next, we transform the raster into a categorical raster
rasternet <- ratify(rasternet)
levels(rasternet)[[1]] <-  merge(levels(rasternet)[[1]], cols)

Finally, let’s make the map with the tmap package:

library(tmap)

tm_shape(rasternet) +
  tm_raster("lvl1",
            palette = levels(rasternet)[[1]]$color) +
  tm_legend(legend.position = c("left", "bottom"))

We indeed have multiple biogeographical regions of megachilidae in France! Unsurprisingly, the two main clusters (in blue) correspond to mainland France versus Mediterranean France. The green cluster is undoubtedly a mountaineous cluster.

Note that there are small occurrences of mediterranean & mountaineous clusters elsewhere in France. This is likely because of errors or problems with samplings, such as a very heterogeneous distribution of samplings across France.

Step 8.1 Participation Coefficient

library(plyr)
megachilidae.clusters <- participationCoefficient(
  network = megachilidae.clusters, # Map Equation clustering results
  db = bipartite.megachilidae, # Database of step 1
  site.field = "Cell", # Name of site column in your database
  species.field = "Species", # Name of species column in your database 
  lvl = "lvl1") 

# Re-extract the cell table to get the participation coefficient column
megachilidae.cells <- getSiteTable(bipartite.megachilidae, site.field = "Cell", 
                                   network = megachilidae.clusters)

# Make an empty raster based on our species rasters
rasterPC <- raster(sp.stack, layer = 0)
# Then we will add the values in our raster
megachilidae.cells$cellnumbers <- as.numeric(as.character(megachilidae.cells$Name))
rasterPC[megachilidae.cells$cellnumbers] <- megachilidae.cells$participation.coef

# Make the PC map
tm_shape(rasterPC) +
  tm_raster()

The interpretation of this map is more difficult since the participation coefficient is high in all clusters, except cluster number 1.

I can think of two main hypotheses to explain this pattern:

bee biogeographical regions have very diffuse limits
the data quality is poor, thus most pixels have incomplete species lists. When species lists are incomplete, usually the documented species are the most common/widespread species, which occur everywhere regardless of region boundaries. Hence, the absence of information on rarest taxa results in inflated participation coefficient values.

Therefore, my best piece of advice to you is to remember to not go too far in your interpretations, because this analysis will only provide information within the limits of your occurrence data. In our example here, we are working on very sparse and spatially heterogeneous data - yet, it gives clues about potential bioregions for megachilidae in France.

Step 8.2 IndVal, DilVal and robustness

Description to be added here in the future

raster.area <- data.frame(name = 1:ncell(rasternet),
                          area = getValues(area(rasternet)))

lvl1metrics <- clusterMetrics(db = bipartite.megachilidae, # Database of step 1
                              network = megachilidae.clusters, # Clustering results
                              site.field = "Cell", # Name of site column in your database
                              species.field = "Species", # Name of species column in your database
                              site.area = raster.area, # A data.frame containing at least two columns with site name ("name") and site surface ("surface")
                              level = "lvl1") # Clustering evel at which we calculate metrics

Species-level metrics

# First, order the species metrics by decreasing IndVal values
lvl1metrics$species.stats <- lvl1metrics$species.stats[
  order(lvl1metrics$species.stats$IndVal, decreasing = TRUE), ]

# Then, look at the table
head(lvl1metrics$species.stats)

##                                 species cluster Occ.Ri Occ.Di Occ.Ai Occ.Fi
## Coelioxys.acanthura Coelioxys.acanthura      15      1      1      1      1
## Osmia.cephalotes       Osmia.cephalotes       9      2      2      1      1
## Osmia.ferruginea       Osmia.ferruginea      14      1      1      1      1
## Megachile.fertoni     Megachile.fertoni      15      1      1      1      1
## Osmia.villosa             Osmia.villosa      19      1      1      1      1
## Osmia.niveocincta     Osmia.niveocincta      14      1      1      1      1
##                     Occ.IndVal Occ.DilVal       Ri       Di Ai Fi IndVal DilVal
## Coelioxys.acanthura          1          0 2223.155 2223.155  1  1      1      0
## Osmia.cephalotes             1          0 4427.659 4427.659  1  1      1      0
## Osmia.ferruginea             1          0 2278.077 2278.077  1  1      1      0
## Megachile.fertoni            1          0 2223.155 2223.155  1  1      1      0
## Osmia.villosa                1          0 2128.229 2128.229  1  1      1      0
## Osmia.niveocincta            1          0 2278.077 2278.077  1  1      1      0

Tiny clusters tend to always be first in this table, because they are often composed of only one species occurring only in the cluster, therefore with a high IndVal.

Let’s focus on big clusters only:

head(lvl1metrics$species.stats[which(lvl1metrics$species.stats$cluster %in% 1:3), ])

##                                             species cluster Occ.Ri Occ.Di
## Megachile.pyrenaea               Megachile.pyrenaea       2      7      8
## Chelostoma.campanularum     Chelostoma.campanularum       2      7      9
## Hoplitis.benoisti                 Hoplitis.benoisti       2      5      5
## Rhodanthidium.caturigense Rhodanthidium.caturigense       2      5      5
## Megachile.analis                   Megachile.analis       2      5      5
## Hoplitis.mitis                       Hoplitis.mitis       2      5      5
##                              Occ.Ai    Occ.Fi Occ.IndVal Occ.DilVal       Ri
## Megachile.pyrenaea        0.6363636 0.8750000  0.5568182 0.07954545 15352.72
## Chelostoma.campanularum   0.6363636 0.7777778  0.4949495 0.14141414 15446.65
## Hoplitis.benoisti         0.4545455 1.0000000  0.4545455 0.00000000 11133.40
## Rhodanthidium.caturigense 0.4545455 1.0000000  0.4545455 0.00000000 11133.40
## Megachile.analis          0.4545455 1.0000000  0.4545455 0.00000000 11095.93
## Hoplitis.mitis            0.4545455 1.0000000  0.4545455 0.00000000 11095.93
##                                 Di        Ai        Fi    IndVal     DilVal
## Megachile.pyrenaea        17538.41 0.6346927 0.8753772 0.5555955 0.07909718
## Chelostoma.campanularum   19758.91 0.6385756 0.7817562 0.4992104 0.13936520
## Hoplitis.benoisti         11133.40 0.4602627 1.0000000 0.4602627 0.00000000
## Rhodanthidium.caturigense 11133.40 0.4602627 1.0000000 0.4602627 0.00000000
## Megachile.analis          11095.93 0.4587137 1.0000000 0.4587137 0.00000000
## Hoplitis.mitis            11095.93 0.4587137 1.0000000 0.4587137 0.00000000

After consulting my friend Benoît Geslin who is a bee expert, the results make sense. The mediterranean cluster is driven by an introduced species which has currently only spread in this part of France. The mountaineous cluster are d riven by mountain species, whereas the cluster covering most of France is driven by widespread species in France.

Site-level metrics

# Make an empty raster based on our species rasters
rasterRg <- raster(sp.stack, layer = 0)
# Then we will add the Relative Robustness values in our raster
lvl1metrics$site.stats$cellnumbers <- as.numeric(as.character(lvl1metrics$site.stats$site))
rasterRg[lvl1metrics$site.stats$cellnumbers] <- lvl1metrics$site.stats$RRg

# Make the PC map
tm_shape(rasterRg) +
  tm_raster()

## Variable "layer" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Credits

Package and tutorial written by Boris Leroy, Lab. BOREA, Muséum National d’Histoire Naturelle, Paris, France

Citations:

Leroy B, Dias MS, Giraud E, Hugueny B, Jézéquel C, Leprieur F, Oberdorff T, Tedesco PA. 2019. Global biogeographical regions of freshwater fishes. Journal of Biogeography, https://onlinelibrary.wiley.com/doi/abs/10.1111/jbi.13674
Leroy B. 2019. biogeonetworks: Biogeographical Network Manipulation And Analysis, R package version 0.1.1, https://github.com/Farewe/biogeonetworks

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
R		R
Readme_files		Readme_files
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
Readme.Rmd		Readme.Rmd
Readme.md		Readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Biogeographical networks in R

Requirements

The biogeonetworks package

Example database

Map Equation software

The Gephi network visualisation software

Biogeographical network analysis: a detailed tutorial

Overview of the procedure

Step 1 Prepare the bipartite dataframe

Step 2 Write the network in Pajek format

Step 3 Run the Map Equation algorithm

Step 4 Read Map Equation results in R

Step 5 Analysing biogeographical network & clustering results in R

Step 6 Make maps

Step 7 Write the network for visualisation in Gephi

Step 8 Calculation of metrics for the network

8.1 The Participation Coefficient

8.2 Another set of metrics: IndVal, DilVal and robustness

Biogeographical networks, a second example with raster maps

Get occurrence data points and transform them into rasters

Step 1 Prepare the bipartite dataframe

Step 2 Write the network in Pajek format

Step 3 Run the Map Equation algorithm

Step 4 Read Map Equation clusters in R

Step 5 Analysing the results in R

Step 6 Make raster maps!

Step 8.1 Participation Coefficient

Step 8.2 IndVal, DilVal and robustness

Species-level metrics

Site-level metrics

Credits

About

Releases

Packages

Languages

Farewe/biogeonetworks

Folders and files

Latest commit

History

Repository files navigation

Biogeographical networks in R

Requirements

The biogeonetworks package

Example database

Map Equation software

The Gephi network visualisation software

Biogeographical network analysis: a detailed tutorial

Overview of the procedure

Step 1 Prepare the bipartite dataframe

Step 2 Write the network in Pajek format

Step 3 Run the Map Equation algorithm

Step 4 Read Map Equation results in R

Step 5 Analysing biogeographical network & clustering results in R

Step 6 Make maps

Step 7 Write the network for visualisation in Gephi

Step 8 Calculation of metrics for the network

8.1 The Participation Coefficient

8.2 Another set of metrics: IndVal, DilVal and robustness

Biogeographical networks, a second example with raster maps

Get occurrence data points and transform them into rasters

Step 1 Prepare the bipartite dataframe

Step 2 Write the network in Pajek format

Step 3 Run the Map Equation algorithm

Step 4 Read Map Equation clusters in R

Step 5 Analysing the results in R

Step 6 Make raster maps!

Step 8.1 Participation Coefficient

Step 8.2 IndVal, DilVal and robustness

Species-level metrics

Site-level metrics

Credits

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages