uniform_erdos_renyi_hypergraph produces way too many edges #339
Comments
Hi @arashbm, thanks for your report. We'll look into this shortly!
Thanks @arashbm! We'll have a look. We also plan to move that function. In the meantime, you can instead use `random_hypergraph`.
@maximelucas The combinatorial explosion of the mask array makes it impossible to use in some scenarios, e.g.:

```python
>>> import xgi
>>> xgi.random_hypergraph(1000, order=5, ps=[1e-12])  # => should create a network with ~1350 edges
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 9.72 PiB for an array with shape (1368173298991500,) and data type float64
```

But that also has an easy solution of not pre-allocating the mask:

```python
import itertools

import numpy as np
import xgi


def random_hypergraph(N, ps, order=None, seed=None):
    rng = np.random.default_rng(seed)

    if order is not None:
        if len(ps) != 1:
            raise ValueError("ps must contain a single element if order is an int")

    if (np.any(np.array(ps) < 0)) or (np.any(np.array(ps) > 1)):
        raise ValueError("All elements of ps must be between 0 and 1 included.")

    nodes = range(N)
    hyperedges = []

    for i, p in enumerate(ps):
        if order is not None:
            d = order
        else:
            d = i + 1  # order, ps[0] is prob of edges (d=1)

        potential_edges = itertools.combinations(nodes, d + 1)
        edges_to_add = [e for e in potential_edges if rng.random() <= p]

        hyperedges += edges_to_add

    H = xgi.empty_hypergraph()
    H.add_nodes_from(nodes)
    H.add_edges_from(hyperedges)

    return H
```

This doesn't fail, but it still takes a long time.
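As a rough sanity check of a generator along these lines (a sketch assuming the `random_hypergraph` above is in scope and that xgi hypergraphs expose `num_edges`), the observed edge count should be close to $p\binom{N}{d+1}$ for a single order $d$:

```python
from scipy.special import comb

N, d, p = 200, 2, 0.01
H = random_hypergraph(N, [p], order=d, seed=42)

expected = p * comb(N, d + 1, exact=True)  # p * C(N, d+1) candidate edges
print(H.num_edges, "edges observed, ~", round(expected), "expected")
```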
Hi @arashbm! Thanks for raising this issue! I added this function to XGI with my most recent arXiv paper (https://arxiv.org/abs/2302.13967, see appendix). The problem is that we are iterating over the Cartesian product, not the unique combinations. I did this because the `_index_to_edge` function is much slower when getting a combination from an index instead of a product from an index. If the hypergraph is sparse enough, then it should be fine, though. I will try to dig up the combinations version of `_index_to_edge` and we can chat more!
@nwlandry That makes total sense. My current WIP implementation is also based on the Cartesian product, as this helps alleviate the issue with fixed-size integers, but I have to reject any candidate edge (node combination) whose nodes are not in strictly increasing order, which makes for a lot of wasted CPU cycles. Nice paper BTW. I quite liked the presentation at SIAM NS.
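For illustration, here is a minimal sketch of that kind of rejection step (not the actual WIP implementation): decode a random Cartesian-product index into an $m$-tuple and keep it only if its entries are strictly increasing, i.e. it is a valid combination.

```python
import random


def sample_edge_by_rejection(n, m, rng=random):
    """Draw one size-m hyperedge uniformly over combinations by rejection
    sampling on the n**m Cartesian-product indices (hypothetical helper)."""
    while True:
        index = rng.randrange(n**m)
        # Decode the index into its base-n digits, as _index_to_edge does below.
        edge = [(index // n**r) % n for r in range(m - 1, -1, -1)]
        # Accept only strictly increasing tuples; the acceptance rate is
        # C(n, m) / n**m, roughly 1/m!, hence the wasted cycles mentioned above.
        if all(a < b for a, b in zip(edge, edge[1:])):
            return edge
```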
Thank you!! :) Okay, I ran the following:

```python
import random

from scipy.special import comb


def _index_to_edge_comb(i, n, k):
    """
    returns the i-th combination of k numbers chosen from 0,1,2,...,n-1
    """
    c = []
    r = i
    j = -1
    for s in range(1, k + 1):
        cs = j + 1
        while r - comb(n - 1 - cs, k - s, exact=True) > 0:
            r -= comb(n - 1 - cs, k - s, exact=True)
            cs += 1
        c.append(cs)
        j = cs
    return c


def _index_to_edge(index, n, m):
    """Generate a hyperedge given an index in the list of possible edges.

    Parameters
    ----------
    index : int > 0
        The index of the hyperedge in the list of all possible hyperedges.
    n : int > 0
        The number of nodes.
    m : int > 0
        The hyperedge size.

    Returns
    -------
    list
        The reconstructed hyperedge

    See Also
    --------
    _index_to_edge_partition

    References
    ----------
    https://stackoverflow.com/questions/53834707/element-at-index-in-itertools-product
    """
    return [(index // (n**r) % n) for r in range(m - 1, -1, -1)]


# Run timeit on the results:
n = 1000
m = 3
max_prod = n**m
max_comb = comb(n, m, exact=True)

%timeit _index_to_edge(random.randrange(max_prod), n, m)
%timeit _index_to_edge_comb(random.randrange(max_comb), n, m)
```

The results were 1.02 µs ± 5.64 ns per loop when iterating over the Cartesian product and 315 µs ± 6.63 µs per loop when iterating over unique combinations. Of course, there are a factor of $m!$ fewer indices to iterate over when using unique combinations.
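As a quick sanity check (a sketch assuming `_index_to_edge_comb` is 1-indexed, as the loop above suggests), its output can be compared against `itertools.combinations` in lexicographic order:

```python
import itertools

from scipy.special import comb

n, k = 6, 3
# Indices 1 .. C(n, k) should decode to the combinations in lexicographic order.
for i, ref in enumerate(itertools.combinations(range(n), k), start=1):
    assert _index_to_edge_comb(i, n, k) == list(ref)
print("all", comb(n, k, exact=True), "combinations decoded correctly")
```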
I think that, at the very least, the documentation could be improved to explain exactly what the probability applies to (i.e., the Cartesian product instead of combinations). Any other things that you would suggest?
I think it's a problem of naming and expectations. In the dyadic undirected Erdős-Rényi model, the probability applies to each unique pair of nodes, and that is what users will expect here too. Of course, one alternative is to just call it something else, add to the documentation, and hope that that's enough distinction to avoid confusion. Another alternative is to continue with the Cartesian product as it is but divide `q` by $m!$. You can also continue with the Cartesian product and keep `q` as it is, relying on documentation alone. I would say no amount of documentation is going to fix API usability problems. I'm not even sure renaming would create enough distinction. But in any case, I'm not in a position to suggest whether you should break compatibility or not.
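For reference, the size of that discrepancy follows from simple counting (a back-of-the-envelope note): the Cartesian product has $n^m$ indices while there are only $\binom{n}{m}$ unordered node sets, and for $m \ll n$

$$n^m \approx \frac{n!}{(n-m)!} = m!\,\binom{n}{m},$$

so the same per-index probability $q$ produces roughly $m!$ times as many edges under the product convention as under the combinations convention.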
Ah that's a fair point @arashbm, thanks!
Yea, that's actually how we used to do it. We changed to using a numpy array to make it slightly faster, but I didn't think of the problem you mentioned. I'll change it back.
Yeah, I don't think I've addressed this yet, but I will prioritize getting this done. |
Unless I'm seriously mistaken as to how this is supposed to work, the number of edges produced by `uniform_erdos_renyi_hypergraph` is off by a factor of $m!$.

Possible solutions: lower `q` by $m!$, or even better, use the n-th combination instead of the n-th product when defining `_index_to_edge`.
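If the first option (lowering `q`) were taken, a minimal sketch could look like the following; the helper name is hypothetical, and `q` and `m` mirror the names used in this issue:

```python
from math import factorial


def rescaled_q(q, m):
    """Per-index probability for sampling over the n**m Cartesian-product
    indices so that the expected edge count is about q * C(n, m), matching
    the combinations convention (hypothetical helper)."""
    return q / factorial(m)
```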