WIP: Tables for labels metadata, mirroring CALL db.schema.(nodeTypeProperties|relTypeProperties) #1318

Closed
wants to merge 16 commits into from

moxious
Contributor

@moxious moxious commented Oct 16, 2019

Implement in APOC versions of db.schema.nodeTypeProperties() and db.schema.relTypeProperties() that permit flexible probabilistic sampling, rather than the full DB scan that the built-in procedures perform.

These procedures are key because they provide metadata output in a very specific format that permits the "Tables for Labels" (T4L) mapping. T4L can be thought of as a basic/naive Graph->Tables mapping. This has lots of uses; the one I have in mind in particular is the BI driver that's in progress. Tools can call these procedures to build a virtual RDBMS schema on top of Neo4j, which can then be queried.
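
As a rough illustration of the T4L idea, here is a minimal Python sketch that groups metadata rows by label set and derives one table definition per label set. The row shape (nodeLabels, propertyName, propertyTypes, mandatory), the sample rows, the `SQL_TYPES` mapping, and the `tables_for_labels` helper are all assumptions made for this example; none of it is the code in this PR.

```python
from collections import defaultdict

# Hypothetical rows, shaped like what a nodeTypeProperties-style procedure
# might return. The column names here are an assumption for illustration.
rows = [
    {"nodeLabels": ["Person"], "propertyName": "name",
     "propertyTypes": ["String"], "mandatory": True},
    {"nodeLabels": ["Person"], "propertyName": "age",
     "propertyTypes": ["Long"], "mandatory": False},
    {"nodeLabels": ["Movie"], "propertyName": "released",
     "propertyTypes": ["Long", "String"], "mandatory": False},
]

# Illustrative (assumed) mapping from graph property types to SQL column types.
SQL_TYPES = {
    "String": "VARCHAR",
    "Long": "BIGINT",
    "Double": "DOUBLE",
    "Boolean": "BOOLEAN",
}


def tables_for_labels(rows):
    """Return {table_name: [(column_name, sql_type, nullable), ...]}."""
    tables = defaultdict(list)
    for row in rows:
        # One table per distinct label set; sort so the name is stable.
        table = "_".join(sorted(row["nodeLabels"]))
        types = row["propertyTypes"]
        # A property with mixed or unknown types falls back to VARCHAR.
        if len(types) == 1:
            sql_type = SQL_TYPES.get(types[0], "VARCHAR")
        else:
            sql_type = "VARCHAR"
        # A property that is not mandatory becomes a nullable column.
        tables[table].append((row["propertyName"], sql_type, not row["mandatory"]))
    return dict(tables)
```

A BI-style tool could walk the resulting dictionary to expose each label set as a virtual table, with non-mandatory properties as nullable columns.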

The trouble with the product's built-in procedures is that they don't use sampling; they scan the entire DB, so on large DBs they churn the page cache and perform very badly.
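
To make the sampling idea concrete, here is a minimal Python sketch of probabilistic sampling over an in-memory stand-in for the graph. The `sample_property_types` helper, its `sample_rate` and `seed` parameters, and the `(labels, properties)` node shape are hypothetical and are not taken from this PR or from APOC. In a real database the saving would come from not loading the properties of skipped nodes at all, which this in-memory version cannot show.

```python
import random
from collections import defaultdict


def sample_property_types(nodes, sample_rate, seed=None):
    """Infer property types per label set from a random sample of nodes.

    nodes: iterable of (labels, properties) tuples (an assumed shape).
    sample_rate: probability in [0, 1] that any given node is inspected.
    """
    rng = random.Random(seed)
    seen = defaultdict(set)
    for labels, props in nodes:
        # rng.random() is in [0.0, 1.0), so a rate of 1.0 inspects every
        # node and a rate of 0.0 inspects none.
        if rng.random() >= sample_rate:
            continue
        for name, value in props.items():
            seen[(frozenset(labels), name)].add(type(value).__name__)
    return dict(seen)


# Hypothetical data: two Person nodes that disagree on the type of "age".
nodes = [
    (["Person"], {"name": "Ann", "age": 41}),
    (["Person"], {"name": "Bob", "age": "unknown"}),
]
full = sample_property_types(nodes, sample_rate=1.0)
```

The trade-off is that a low sample rate can miss rare properties or rare types, so the resulting schema is an estimate rather than a guarantee.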

This approach has been discussed with Mats & the internal Morpheus team who wrote the product versions of the stored procedures, and has generally been agreed on as the best available approach.

This is just a code spike right now for visibility.

A design document describing how this is supposed to work and what behavior should be is here: https://docs.google.com/document/d/1-U-L5anu50CGHXYipjdj5AwdeFprd7lvggD97fbqHeA/edit?usp=sharing

@jexp
Member

jexp commented Oct 29, 2019

@moxious are you good with this?

@moxious
Contributor Author

moxious commented Oct 29, 2019

Feature complete at this point; this PR could use detailed code feedback. I haven't yet worked out from the Travis output which test is failing, though. When Shashi joins I'm going to have him engage here.

@moxious
Contributor Author

moxious commented Nov 1, 2019

I take back my previous comment -- not yet feature complete. We're investigating adding relationship domain and range information to improve performance for rel table generation.

Shashi will be assisting with this PR as well. Pause for a bit on code review.

@moxious
Contributor Author

moxious commented Jan 27, 2020

This PR is superseded by Shashi's here: #1389


2 participants