Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-backtracking spectral distance #134

Merged
merged 8 commits into from
Mar 13, 2019
Merged

Conversation

leotrs
Copy link
Collaborator

@leotrs leotrs commented Mar 5, 2019

Pioneered by one Leonardo Torres linky.

Apologies for the mess of commits...

@sdmccabe
Copy link
Collaborator

sdmccabe commented Mar 5, 2019

ImportError: No module named 'ot'

@tlarock
Copy link
Collaborator

tlarock commented Mar 5, 2019

#133 is included in this PR.....

Edit: Looks like you changed it back, thx

@leotrs
Copy link
Collaborator Author

leotrs commented Mar 6, 2019

@sdmccabe do we want to add ot to the general requirements file? If we're ever thinking of shipping this, it might be nice to keep the global requirements to a minimum. We can fix the tests using a special requirements_test.txt file which includes everything.

@tlarock
Copy link
Collaborator

tlarock commented Mar 6, 2019

A bit concerned that it seems like the POT library (good name) has not been actively maintained since fall 2018 (https://github.com/rflamary/POT/pulse/monthly), which could indicate that the maintainers are moving on. Are there any other EMD libraries out there we could use?

@sdmccabe
Copy link
Collaborator

sdmccabe commented Mar 6, 2019

I see two approaches:

  1. Add ot to general requirements. I'm fine with this.
  2. Have a requirements_test.txt file. This would be ok, but I have two concerns. One: making sure what we're testing and what we're using doesn't wind up too far out of alignment. Two: If we do this, we should explicitly handle the importerror, tell users that it's not available without this dependency, and include that information in the docstring.

@leotrs
Copy link
Collaborator Author

leotrs commented Mar 6, 2019

@tlarock I see what you're saying but (i) Fall 2018 is not too bad (and I got response from them as early as Jan 10 here, and (ii) I don't think this is an immediate problem. The scenario where our code breaks because this one dependency is outdated seems very far away, maybe years? And even in that case, the only part of our code that breaks is this one distance (which is not even a good one imho...)

@sdmccabe Agreed. I don't think it's too hard to make sure that we are always testing more than what we are using, precisely because requirements_test.txt would contain more imports than requirements.txt. However, you raise a good point about signaling in some way that there are imports particular to distance methods. The simplest solution is to add POT (good name) to requirements.txt and only have a requirements_test.txt if this same problem arises more times in the future.

@sdmccabe
Copy link
Collaborator

No one's objected, so I think we can just add POT to requirements.txt and get this merged.

@sdmccabe
Copy link
Collaborator

3.82s$ pip install -r requirements.txt
Collecting networkx>=2.2.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f3/f4/7e20ef40b118478191cec0b58c3192f822cace858c19505c7670961b76b2/networkx-2.2.zip (1.7MB)
Requirement already satisfied: numpy>=1.10.0 in /home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages (from -r requirements.txt (line 2)) (1.15.0)
Collecting scipy>=1.0.0 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/f0/30/526bee2ce18c066f9ff13ba89603f6c2b96c9fd406b57a21a7ba14bf5679/scipy-1.2.1-cp35-cp35m-manylinux1_x86_64.whl (24.7MB)
Collecting scikit-learn>=0.18.2 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/38/7f/7d21bd89a97603a14d3770ca912d4402c394a8e50642f9b8b5cc6867a61a/scikit_learn-0.20.3-cp35-cp35m-manylinux1_x86_64.whl (5.3MB)
Collecting POT==0.5.1 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/28/4b/7aaa1f840a359f5953dd378e0237fa8faf9b0a415ff7282b7375fbe68d27/POT-0.5.1.tar.gz (720kB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-4rw5e1vw/POT/setup.py", line 7, in <module>
        from Cython.Build import cythonize
    ImportError: No module named 'Cython'

@leotrs
Copy link
Collaborator Author

leotrs commented Mar 11, 2019

This looks related to this issue. I'll try installing Cython beforehand.

requirements.txt Outdated
@@ -2,4 +2,5 @@ networkx>=2.2.0
numpy>=1.10.0
scipy>=1.0.0
scikit-learn>=0.18.2
Cython==0.29.6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why == instead of >=? I just want to get this merged already so we can play with it, so don't change it, but just wondering.

@sdmccabe
Copy link
Collaborator

$ pip install -r requirements.txt
Collecting networkx>=2.2.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/f3/f4/7e20ef40b118478191cec0b58c3192f822cace858c19505c7670961b76b2/networkx-2.2.zip (1.7MB)
    100% |████████████████████████████████| 1.7MB 873kB/s 
Requirement already satisfied: numpy>=1.10.0 in /home/travis/virtualenv/python3.6.3/lib/python3.6/site-packages (from -r requirements.txt (line 2))
Collecting scipy>=1.0.0 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/7f/5f/c48860704092933bf1c4c1574a8de1ffd16bf4fde8bab190d747598844b2/scipy-1.2.1-cp36-cp36m-manylinux1_x86_64.whl (24.8MB)
    100% |████████████████████████████████| 24.8MB 60kB/s 
Collecting scikit-learn>=0.18.2 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/5e/82/c0de5839d613b82bddd088599ac0bbfbbbcbd8ca470680658352d2c435bd/scikit_learn-0.20.3-cp36-cp36m-manylinux1_x86_64.whl (5.4MB)
    100% |████████████████████████████████| 5.4MB 288kB/s 
Collecting Cython==0.29.6 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/e1/fd/711507fa396064bf716493861d6955af45369d2c470548e34af20b79d4d4/Cython-0.29.6-cp36-cp36m-manylinux1_x86_64.whl (2.1MB)
    100% |████████████████████████████████| 2.1MB 737kB/s 
Collecting POT==0.5.1 (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/28/4b/7aaa1f840a359f5953dd378e0237fa8faf9b0a415ff7282b7375fbe68d27/POT-0.5.1.tar.gz (720kB)
    100% |████████████████████████████████| 727kB 1.9MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-93h2l13g/POT/setup.py", line 7, in <module>
        from Cython.Build import cythonize
    ModuleNotFoundError: No module named 'Cython'

Yikes.

@leotrs
Copy link
Collaborator Author

leotrs commented Mar 11, 2019 via email

@sdmccabe
Copy link
Collaborator

sdmccabe commented Mar 11, 2019

I'll note that the POT project has a bare cython dependency in their requirements.txt and that one of the other projects referenced in that issue mentions having trouble with cython 0.29.

@sdmccabe
Copy link
Collaborator

    def test_same_graph():
        """The distance between two equal graphs must be zero."""
        G = nx.karate_club_graph()
    
        for obj in distance.__dict__.values():
            if isinstance(obj, type) and BaseDistance in obj.__bases__:
                dist = obj().dist(G, G)
>               assert dist == 0.0
E               assert 0.02209708691208376 == 0.0
test_distance.py:22: AssertionError

@sdmccabe
Copy link
Collaborator

The stochasticity is going to keep that test from passing. For now let's except it from the test (and note it in the docstring).

The test can be modified to something like the following:

def test_same_graph():
    """The distance between two equal graphs must be zero."""
    G = nx.karate_club_graph()

    for label, obj in distance.__dict__.items():
        if label in [
                'NBD'
        ]:
            continue
        if isinstance(obj, type) and BaseDistance in obj.__bases__:
            dist = obj().dist(G, G)
            assert dist == 0.0

We'll probably need to do the same for the symmetry test.

@sdmccabe
Copy link
Collaborator

We'll need to update the docs, but let's get this merged.

@sdmccabe sdmccabe merged commit 0c7e525 into netsiphd:master Mar 13, 2019
@leotrs
Copy link
Collaborator Author

leotrs commented May 21, 2019

I'll leave this here for posterity. POT depends on Cython which depends on python3.6-dev (or python3.5-dev). In case we have to deal with this again in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants