Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compatibility to work with py2neo version 3.1.2 #6

Closed
wants to merge 2 commits into from

Conversation

veleritas
Copy link
Contributor

Hi Daniel,

I've finished making my changes to the hetio package to make it compatible with py2neo version 3.1.2. I didn't actually have to end up changing too much in the package. The only real difference between py2neo versions is that when nodes and edges are added to neo4j, they are done as a single subgraph object, and not as multiple objects.

I also added a progress bar to the neo4j export process, which allows me to watch how things are going from the command line. This option is off by default in order to ensure it does not break your previous work. In that regard I have also tested things with your original code, and things worked fine.

I have only modified the neo4j.py file, and found that no other changes in hetio were needed to make the rest of the pipeline work. All other changes were done in the respective integrate and learn repositories.

One thing you might notice is that the uri variable passed to export_neo4j() is actually expected to be a py2neo.Graph object, but only if using py2neo version 3. In my testing it seemed that I could not get the py2neo URI resolver to work with the new Bolt protocol. Instead I had to open my own py2neo connection explicitly, and that seems to work. I believe the problem is that both the Bolt and HTTP ports have to be open, and on different ports, and the URI resolver might not support doing that.

Let me know if you have any questions!

Toby

Copy link
Member

@dhimmel dhimmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I like the elegance of your code. I requested some changes, but overall this is a great enhancement to hetio.

I also added a progress bar to the neo4j export process, which allows me to watch how things are going from the command line. This option is off by default in order to ensure it does not break your previous work.

Cool that progress bar will be really helpful!

hetio/neo4j.py Outdated
@@ -71,7 +101,13 @@ def append(self, x):
def create(self):
if not self:
return
self.db_graph.create(*self)

# http://stackoverflow.com/questions/37676731/typeerror-create-takes-2-positional-arguments-but-4-were-given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent

hetio/neo4j.py Outdated

# http://stackoverflow.com/questions/37676731/typeerror-create-takes-2-positional-arguments-but-4-were-given
if PY2NEO_VER == 3:
self.db_graph.create(union(self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put

union = lambda v: reduce(or_, v)

under this if statement, since it's not really needed anywhere else

hetio/neo4j.py Outdated

queue = graph.get_nodes()
if show_progress:
total_nodes = hetio.stats.get_metanode_df(graph)["nodes"].sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to remove the hetio.stats import:

total_nodes = len(graph.node_dict)

hetio/neo4j.py Outdated

queue = graph.get_edges(exclude_inverts=True)
if show_progress:
total_edges = hetio.stats.get_metaedge_df(graph)["edges"].sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could wrap graph.get_edges(exclude_inverts=True) in list(). Then you could do total_edges = len(queue). I don't think it's likely that this list will push us over a memory limit in any real setting.

On second thought, perhaps I should add an n_nodes and n_edges attribute to graphs, which get updated when nodes or edges are created. What do you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 903a4c8, I added n_nodes and n_edges as graph attributes. Therefore, you should be able to do total_edges = graph.n_edges (and similar for nodes).

You would have to fetch the upstream changes and rebase your branch on them. Then force push. Do you know how to do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will figure it out.

setup.py Outdated
with open('README.rst') as read_file:
long_description = read_file.read()
#with open('README.rst') as read_file:
# long_description = read_file.read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for commenting this out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah README.rst wasn't part of the repo. Added it, if you rebase, you should be good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about these commented lines. See #7. Will delete them after we merge this PR to avoid conflicts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah doing a pip install -e . for the repo was giving me an error with README.rst, so I just commented it out to get it to install correctly.

hetio/neo4j.py Outdated
@@ -8,14 +8,28 @@
import py2neo.packages.httpstream
import pandas

from tqdm import tqdm
from functools import reduce
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functools and operator imports should go in the first group of standard library imports. tqdm import should go with second group (third party imports). See https://www.python.org/dev/peps/pep-0008/#imports.

hetio/neo4j.py Outdated
self.db_graph.create(*self)

# http://stackoverflow.com/questions/37676731/typeerror-create-takes-2-positional-arguments-but-4-were-given
if PY2NEO_VER == 3:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps if PY2NEO_VER >= 3 makes a little more sense, assuming v4 would be more like 3 than 2

hetio/neo4j.py Outdated
"""Export hetnet to neo4j"""
db_graph = py2neo.Graph(uri)

if PY2NEO_VER == 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the if statement to evaluate isinstance(uri, py2neo.Graph) rather than to evaluate the version? This way for py2neo 2 you could pass a py2neo.Graph. And I'm hopeful that https://github.com/technige/py2neo/issues/605 will eventually be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's an easy fix I can make. I guess it might just confuse other people if they try to use the py2neo 3 URI resolver with bolt and have difficulties with getting it working.

@dhimmel
Copy link
Member

dhimmel commented Feb 27, 2017

@veleritas I'd love to get your contributions merged. Let me know if you want me to pick up where you've left off on this PR. But if you have the time, I think everything left to do should be pretty quick to implement.

@veleritas
Copy link
Contributor Author

Yes, sorry, I've been busy, but will implement the changes you mentioned this week.

Tong Shu Li added 2 commits March 2, 2017 19:22
Tqdm progress bars will be used to monitor Neo4j network import
progress.
Add tqdm progress bars to the node and edge importing steps.
@veleritas veleritas closed this Mar 2, 2017
dhimmel pushed a commit that referenced this pull request Mar 2, 2017
@dhimmel
Copy link
Member

dhimmel commented Mar 3, 2017

@veleritas this is available in v0.2.2, but I'd recommend v0.2.3 which fixes a PyPI formatting issue.

Thanks for the contribution, you're now an author on the Zenodo https://zenodo.org/record/345182

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants