Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for bipartite graphs #5

Closed
PCK1992 opened this issue May 16, 2020 · 23 comments
Closed

Support for bipartite graphs #5

PCK1992 opened this issue May 16, 2020 · 23 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed

Comments

@PCK1992
Copy link

PCK1992 commented May 16, 2020

Is it possible to use bipartite graphs? It is supported in the python version of the package so I wonder how one could implement it in the R version.

@TomKellyGenetics
Copy link
Owner

This package calls the python version via reticulate. In principle it should pass arguments to the python leidenalg module and perform the same computations, supporting all of the same arguments. Provided you have the latest version of leidenalg installed for python, it should also support this in R.

Have you tried running it on a bipartite graph? Would you be able to produce a minimal reproducible example? If I'm able to reproduce the graph you're having issues with, I can look into it.

@PCK1992
Copy link
Author

PCK1992 commented May 20, 2020

Awesome! I don't know exactly how the passing to python works so I came up with a small example:

library(leiden)
library(igraph)


net<- networkdata::southern_women #might need to install networkdata library
is.bipartite(net) 

communities <-leiden(net)
ID <- V(net)$name

results <- cbind(ID, communities)

Perhaps it would make sense to include an example like this in the documentation?

@TomKellyGenetics
Copy link
Owner

Have you tested this example? Please note that dependencies on non-CRAN packages are not permitted to be submitted to CRAN. When submitting to CRAN, it will need to automatically run tests, examples, and generate vignettes.

Data is passed to Python via internal functions. There should be no need to interact with it directly (as long as the leidenalg module is installed).

@PCK1992
Copy link
Author

PCK1992 commented May 25, 2020

My apologies, I should have been more clear what I intended with this example. Running the code technically works but the results show that some of the events and persons are in the same community which should not be the case (if I'm not completely mistaken) since they are in different modes.

print(results)

      ID          communities
 [1,] "EVELYN"    "1"        
 [2,] "LAURA"     "1"        
 [3,] "THERESA"   "1"        
 [4,] "BRENDA"    "1"        
 [5,] "CHARLOTTE" "1"        
 [6,] "FRANCES"   "1"        
 [7,] "ELEANOR"   "1"        
 [8,] "PEARL"     "3"        
 [9,] "RUTH"      "3"        
[10,] "VERNE"     "3"        
[11,] "MYRA"      "2"        
[12,] "KATHERINE" "2"        
[13,] "SYLVIA"    "2"        
[14,] "NORA"      "2"        
[15,] "HELEN"     "2"        
[16,] "DOROTHY"   "3"        
[17,] "OLIVIA"    "3"        
[18,] "FLORA"     "3"        
[19,] "E1"        "1"        
[20,] "E2"        "1"        
[21,] "E3"        "1"        
[22,] "E4"        "1"        
[23,] "E5"        "1"        
[24,] "E6"        "1"        
[25,] "E7"        "1"        
[26,] "E8"        "3"        
[27,] "E9"        "3"        
[28,] "E10"       "2"        
[29,] "E11"       "3"        
[30,] "E12"       "2"        
[31,] "E13"       "2"        
[32,] "E14"       "2"    

What I meant with including an example like this for the documentation is a working example that shows how to use the package in a way that takes the bipartite graph structure into account. It looks like it's not enough to just have a bipartite igraph object to achieve this but it is not clear to me how to pass this information using the package.

@TomKellyGenetics
Copy link
Owner

I've looked into it. The "bipartite" graphs are supported in the latest version (0.7.0) of leidenalg in section 4.1.2, see the python module documentation for details. These are separate functions now currently supported by the R package. Bipartite methods were added in version 0.6.1 and are not supported by the R version currently.

library("leiden")
install.packages("drat"); drat::addRepo("schochastics"); install.packages("networkdata")
library("leiden"); library("networkdata"); library("igraph"); library("graphsim")

net <-networkdata::southern_women #might need to install networkdata library
is.bipartite(net) 
plot_directed(net)
partition <- leiden(net, partition_type = 'RBConfigurationVertexPartition')
plot_directed(net, fill.node = c("palevioletred", "lightblue")[partition])

new_RBC

This does not account for bipartite structures which do not account for this:

Bipartite networks are special in the sense that they have only links between the two different classes, and no links within a class are allowed.

Correcting this will require changes to internal functions to call .CPMVertexPartition.Bipartite() in python.

@TomKellyGenetics
Copy link
Owner

TomKellyGenetics commented May 26, 2020

It should be possible to implement an R version of the Bipartite methods providing it can be done in Python. I notice you submitted an issue to the leidenalg repository. Was it possible to run in Python. If so it should be possible to run it here:
https://github.com/TomKellyGenetics/leiden/blob/18ec773593a0c65432205509399d83add0a81799/R/find_partition.R#L71#L71-L79

@TomKellyGenetics TomKellyGenetics added enhancement New feature or request help wanted Extra attention is needed documentation Improvements or additions to documentation labels May 26, 2020
@TomKellyGenetics
Copy link
Owner

TomKellyGenetics commented May 26, 2020

The development version of leiden now supports Bipartite graphs in R. You can try it with:

devtools::install_github("TomKellyGenetics", ref = "dev")

This will still require testing before it can be added to documentation but here's a minimal example:

library("leiden"); library("networkdata"); library("igraph"); library("graphsim")
net <-networkdata::southern_women
partition <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.1)
plot_directed(net, fill.node = c("palevioletred", "lightblue")[partition])

CPM_Bipartite

Note here that the "resolution" parameter is important for testing the results.

partition <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.3)
plot_directed(net, fill.node = c("palevioletred", "lightblue", "forestgreen", "purple")[partition])

CPM_Bipartite_Res0 3

This is different from the bipartite "types":

plot_directed(net, fill.node = c("palevioletred", "lightblue")[as.integer(V(net)$type)+1])

Bipartite_types

If you wish to reproduce this, the plotting function I wrote is available on CRAN.

@PCK1992
Copy link
Author

PCK1992 commented May 26, 2020

This is awesome, thank you so much! I will test this function on my data sets and will report back. Let me know if you would like to keep this discussion in the repository or if you would prefer to move it to email.

@TomKellyGenetics
Copy link
Owner

You're welcome. Please let me know if you have any problems with it. I'm not so confident with Python so I'm going by the docs on this one. It doesn't accept as many parameters as leidenalg.findclusters but it seems to return partitions (setting the random seed seems to work).

This toy example (maybe because the datasets is so small) doesn't seem to work unless you fine-tune the resolution. It would be good to know if it works on real data (I know they can't always be shared before publication) and how it compares to the Python leidenalg (and Louvain if it supports Bipartite graphs).

I'm okay to keep communications on GitHub where possible (my email is listed as maintainer in the DESCRIPTION if you prefer direct contact). Then other users can see what changes are in development and find these examples. I find the GitHub interface useful to code snippets and figures. It's also helpful to me to track issues in progress that take so long they'd probably get buried in my inbox. Between my current projects and having a young family, I don't have as much time for open-source projects as I'd like. Please bear in mind it may take time to update tests, examples, and get it on CRAN.

@PCK1992
Copy link
Author

PCK1992 commented May 27, 2020

I tried it on the first data set (8k nodes and 27k edges, one connected component) and get the following error:

partition05 <- leiden(test2, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.5)
Error in py_to_r(bipartite_layers[[1]]) : 
  could not find function "py_to_r"

I get the same error messages with different resolution parameters. I checked the graph test2 and is.bipartite returns TRUE. To specify the classes I used the igraph function V(g)$type <- bipartite_mapping(g)$type. I checked whether the labels are correct and they are so that shouldn't be a problem.

My session info

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] leiden_0.3.4.9001 igraph_1.2.5      skimr_2.1.1       data.table_1.12.8

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     rstudioapi_0.11  knitr_1.28       magrittr_1.5     rappdirs_0.3.1   tidyselect_1.1.0 lattice_0.20-41  R6_2.4.1         rlang_0.4.6      dplyr_0.8.5      tools_3.6.3     
[12] grid_3.6.3       xfun_0.14        htmltools_0.4.0  ellipsis_0.3.1   assertthat_0.2.1 digest_0.6.25    tibble_3.0.1     lifecycle_0.2.0  crayon_1.3.4     Matrix_1.2-18    purrr_0.3.4     
[23] repr_1.1.0       base64enc_0.1-3  vctrs_0.3.0      glue_1.4.1       compiler_3.6.3   pillar_1.4.4     reticulate_1.16  jsonlite_1.6.1   pkgconfig_2.0.3 

@TomKellyGenetics
Copy link
Owner

TomKellyGenetics commented May 28, 2020

This requires additional functions from igraph and reticulate. I've added a patch to import these with library("leiden") in the dev branch (it should work if you re-install the development version and import it again). If this doesn't work , try library("reticulate") before calling the leiden function.

@PCK1992
Copy link
Author

PCK1992 commented May 28, 2020

Looks like it works now! My procedure: I reinstalled the development library loaded leiden and igraph seperately

It runs but I get the following error message:

> partition01 <- leiden(test2, partition_type = "CPMVertexPartition.Bipartite", resolution_parameter = 0.1)
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation

In subsequent runs I do not get this error (even for the same line of code) so I wonder what this is.
I will report back once I get a chance to go through the results.

By the way, I tried installing the graphsim package to see if the error is related to that but I got an error message

Installing package into ‘/home/patrick/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
* installing *source* package ‘graphsim’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
Warning: file ‘TGFBeta_Smad_graph.rda’ has magic number 'X'
  Use of save versions prior to 2 is deprecated
Error in load(zfile, envir = tmp_env) : 
  bad restore file magic number (file may be corrupted) -- no data loaded
ERROR: lazydata failed for package ‘graphsim’
* removing ‘/home/patrick/R/x86_64-pc-linux-gnu-library/3.6/graphsim’
Error: Failed to install 'graphsim' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpoKn8Z6/file4ad33964a7ed/graphsim_0.1.2.9001.tar.gz’ had non-zero exit status

I can open a separate issue on the other repository if you think it's worth it.

@TomKellyGenetics
Copy link
Owner

I'm not sure what is causing the issue you've run into calling leiden.

Don't worry about the graphsim package. I'm only using it here for the plotting functions. The install error is related to recent changes I made on issues I'm already working on. Please use the CRAN version for this package. Thanks for letting me know!

@PCK1992
Copy link
Author

PCK1992 commented Jun 12, 2020

After some tests with different networks, I could not find any bugs or problems - it works great! But you are right, it is extremely sensitive to the resolution parameter and I had to do quite a bit of testing.

@TomKellyGenetics
Copy link
Owner

@PCK1992 Thanks for following up. That's great to hear. In that case, I'll go ahead with merging this into the next release once I have documentation and testing set up for it.

For now, I'll leave this issue open for others to see that Bipartite graphs will be supported in the development version.

@PCK1992
Copy link
Author

PCK1992 commented Jun 16, 2020

After some thorough testing, I discovered a behavior that is quite interesting but I'm not sure if this is related to the R version or the implementation of Leiden in Python.

Bottom line is: It looks like the bipartite version only accepts values between 0 and 1, unlike the unipartite version that works fine with values over 1. What I discovered is that if you use a resolution parameter of 1 the number of nodes becomes equal to the number of communities.

list_parameters = seq(0, 1, by = 0.0025)
results = list()
n=1

for (parameter in list_parameters) {
  results_cd <- leiden(net, resolution_parameter = parameter)
  num_communities <- length(unique(results_cd))
  results[[n]] = num_communities
  n = n+1
}

data_combined <- data.frame(matrix(unlist(results), nrow=length(results), byrow=T),stringsAsFactors=FALSE)

cd_plot <- as.data.frame(cbind(list_parameters, data_combined))

names(cd_plot)[1] <- "parameter"
names(cd_plot)[2] <- "num_communities"

ggplot(cd_plot, aes(x = parameter, y = num_communities)) +
  geom_line( color="blue", size=2, alpha=0.9, linetype=1) +
  ggtitle("Relationship between Number of Communities and Resolution Parameter")

image

Unfortunately, I can't share my data but I could reproduce this behavior with different networks. Is this a package issue on the R side? I assume this is intended behavior on the Python side.

@TomKellyGenetics
Copy link
Owner

This is quite different to the behaviour of the regular (non-Bipartite) version of the Leiden algorithm. Thanks for pointing it out. I was really only changing the resolution parameter because the results didn't make much sense (and I thought it was only for this toy example data).

I believe the development version of leiden in R is correctly passing arguments to the Python leidenalg library. Perhaps @vtraag could help with this?

@vtraag
Copy link

vtraag commented Jun 16, 2020

This is expected behaviour. Note that you are not using modularity, but another quality function, namely the Constant Potts Model. Indeed, for this quality function, the singleton partition is optimal for a resolution parameter larger than or equal to 1. Personally, I am typically working with this quality function, which I find much more intuitive than modularity (which sometimes behaves quite strangely). In this quality function, clusters have an internal density higher than the resolution parameter, while the density between communities is lower than the resolution parameter. The "relevant" resolution values are typically in the order of the density of the network.

It is possible to use the bipartite version of modularity using a particular trick. This is explained in the Bipartite documentation.

@TomKellyGenetics
Copy link
Owner

Sorry I may have misunderstood the docs for the Python version? It is possible to use the Bipartite method for other quality functions (e.g., ModularityVertexPartition) not just CPMVertexPartition? I was confused because it is a subheading here:
https://leidenalg.readthedocs.io/en/stable/reference.html#leidenalg.CPMVertexPartition.Bipartite

In that case, it should be possible to do this for other methods:
4d24a42

@vtraag
Copy link

vtraag commented Jun 16, 2020

It is possible to use the Bipartite method for other quality functions (e.g., ModularityVertexPartition) not just CPMVertexPartition?

No, it is not. The support for bipartite clustering is only implemented for CPMVertexPartition. However, there is a small trick that allows you to mimic modularity when using CPM. This is explained in the documentation, see specifically the notes, and the argument degree_as_node_size (make sure to correct the resolution parameter still). In principle it could also be added to some other vertex partitions (but not Surprise or Significance), but I have not done that (yet, anyway).

@TomKellyGenetics
Copy link
Owner

Thanks, that explains the trouble I had calling it on other quality functions. I've added that argument to the R version and managed to pass the "mimicking" Modularity parameters in the R version.

partition_Modularity <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", degree_as_node_size = TRUE, resolution_parameter = 0.005)
plot_directed(net, fill.node = RColorBrewer::brewer.pal(8,"Set1")[as.integer(partition_Modularity)])

test1

As mentioned, the resolution parameter is very important here.

partition_Modularity <- leiden(net, partition_type = "CPMVertexPartition.Bipartite", degree_as_node_size = TRUE, resolution_parameter = 0.015)
plot_directed(net, fill.node = RColorBrewer::brewer.pal(8,"Set1")[as.integer(partition_Modularity)])

test2

If support for Bipartite graphs is added to other functions, it shouldn't be too much trouble to add them to the R version.

@TomKellyGenetics
Copy link
Owner

I've prepared unit tests and a brief vignette on bipartite graphs in R. I will upload this as a minor release (0.3.4) provided checks on Travis CI pass. I will do this before migrating the development version to call igraph::community_leiden (as discussed in #1) since further released will need to wait until this dependency is on CRAN.

@TomKellyGenetics
Copy link
Owner

TomKellyGenetics commented Oct 29, 2020

The leiden R package version 0.3.4 passes all checks and is submitted to CRAN. There is a dedicate vignette for Bipartite graphs here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants