WIP: Tables for labels metadata, mirroring CALL db.schema.(nodeTypeProperties|relTypeProperties) #1318
This reverts commit 4905f19.
@moxious are you good with this?
Feature complete at this point; this PR could use detailed code feedback. I haven't yet sorted out which test is failing on Travis, though. When Shashi joins I'm going to have him engage here.
I take back my previous comment -- not yet feature complete. We're investigating adding relationship domain and range information to improve performance for rel table generation. Shashi will be assisting with this PR as well. Please pause code review for a bit.
This PR is superseded by Shash's here: #1389
Implement in APOC versions of `db.schema.nodeTypeProperties()` and `db.schema.relTypeProperties()` that permit flexible probabilistic sampling, rather than the full DB scan that the built-in procedures perform.

These procedures are key because they provide metadata output in a very specific format that permits the "Tables for Labels" (T4L) mapping. T4L can be thought of as a basic/naive Graph->Tables mapping. This has lots of uses; the one I have most in mind is the BI driver that's in progress. Tools can call these procedures to build a virtual RDBMS schema on top of Neo4j, which can then be queried.
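To make the T4L idea concrete, here is a minimal sketch of how a tool might turn metadata rows into a virtual table schema. It assumes rows carrying the `nodeLabels`, `propertyName`, `propertyTypes`, and `mandatory` columns that `db.schema.nodeTypeProperties()` yields. The `tables_for_labels` helper, the table-naming rule, and the `SQL_TYPES` mapping are all hypothetical illustrations, not what the BI driver or this PR actually does.

```python
from collections import defaultdict

# Assumed (hypothetical) mapping from Neo4j property type names to SQL column types.
SQL_TYPES = {"String": "VARCHAR", "Long": "BIGINT", "Double": "DOUBLE", "Boolean": "BOOLEAN"}


def tables_for_labels(rows):
    """Group metadata rows into {table_name: [(column, sql_type, nullable), ...]}.

    Each row is a dict with the keys nodeLabels, propertyName,
    propertyTypes and mandatory.
    """
    tables = defaultdict(list)
    for row in rows:
        # Hypothetical naming rule: one table per distinct label combination.
        table = "_".join(sorted(row["nodeLabels"]))
        types = row["propertyTypes"]
        # A property seen with several types falls back to VARCHAR in this sketch.
        sql_type = SQL_TYPES.get(types[0], "VARCHAR") if len(types) == 1 else "VARCHAR"
        # A property that is not mandatory becomes a nullable column.
        tables[table].append((row["propertyName"], sql_type, not row["mandatory"]))
    return dict(tables)
```

Under these assumptions, a `Person` label with a mandatory `name` string and an optional `born` integer would become a `Person` table with a non-nullable `VARCHAR` column and a nullable `BIGINT` column.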
The trouble with the product's built-in procedures is that they don't use sampling: they scan the entire DB, so for large DBs they will churn the page cache and perform very badly.
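As a rough illustration of the sampling idea, the sketch below inspects each node with a fixed probability instead of reading every node. This is plain Bernoulli sampling over an in-memory list, chosen here only for illustration; the function name `sample_node_type_properties`, the `sample_rate` parameter, and the node representation are assumptions, and the actual APOC implementation and its configuration options may differ (see the design document).

```python
import random


def sample_node_type_properties(nodes, sample_rate, rng):
    """Infer per-label property metadata from a random subset of nodes.

    nodes: iterable of (labels, properties) pairs, a stand-in for a node scan.
    sample_rate: probability in [0, 1] that any given node is inspected.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    node_counts = {}  # label combination -> number of sampled nodes
    prop_counts = {}  # (label combination, property) -> nodes having it
    prop_types = {}   # (label combination, property) -> observed type names
    for labels, props in nodes:
        if rng.random() >= sample_rate:
            continue  # node not selected for the sample
        key = tuple(sorted(labels))
        node_counts[key] = node_counts.get(key, 0) + 1
        for name, value in props.items():
            prop_counts[(key, name)] = prop_counts.get((key, name), 0) + 1
            prop_types.setdefault((key, name), set()).add(type(value).__name__)
    rows = []
    for (key, name) in sorted(prop_types):
        rows.append({
            "nodeLabels": list(key),
            "propertyName": name,
            "propertyTypes": sorted(prop_types[(key, name)]),
            # Only an estimate: "mandatory" means present on every *sampled* node.
            "mandatory": prop_counts[(key, name)] == node_counts[key],
        })
    return rows
```

The trade-off is the usual one for sampling: a lower `sample_rate` touches fewer nodes, but rare properties can be missed and `mandatory` can be reported as true for a property that is absent on unsampled nodes.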
This approach has been discussed with Mats & the internal Morpheus team who wrote the product versions of the stored procedures, and has generally been agreed on as the best available approach.
This is just a code spike right now for visibility.
A design document describing how this is supposed to work and what the behavior should be is here: https://docs.google.com/document/d/1-U-L5anu50CGHXYipjdj5AwdeFprd7lvggD97fbqHeA/edit?usp=sharing