Possible to avoid the LoadCSV step? #46
I tested the following strategy for a table with ~1 million rows and completed the load in ~46 seconds:
- set the DB_URL so that it can be referred to later
- not sure if this matters, but I set the constraints required for the node first
- use APOC procedures to load the data in parallel with a batch size of 10k
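As a rough Cypher sketch of the steps above (all names are placeholders: a hypothetical Postgres table `person`, node label `Person`, and JDBC URL; the actual properties depend on your schema):

```cypher
// Step 1: register the JDBC URL once so it can be referred to later
// (apoc.static.set stores it under the key 'DB_URL')
CALL apoc.static.set('DB_URL',
  'jdbc:postgresql://localhost:5432/sourcedb?user=me&password=secret');

// Step 2: create the constraint for the node label first
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;

// Step 3: pull rows via apoc.load.jdbc and write them in parallel
// batches of 10k with apoc.periodic.iterate
CALL apoc.periodic.iterate(
  "CALL apoc.load.jdbc(apoc.static.get('DB_URL'), 'person') YIELD row RETURN row",
  "MERGE (p:Person {id: row.id})
   SET p.name = row.name, p.profile = row.profile",
  {batchSize: 10000, parallel: true});
```

Because the rows stream directly from JDBC into the transaction, there is no intermediate CSV file to corrupt; parallel MERGE is safe here as long as the merged property is backed by the unique constraint created in step 2.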
That's odd, the CSV export/import should handle JSON correctly. I'll take a deeper look at the issue. What is the data type in your relational database? Yep, apoc.periodic.iterate rocks :)
It's actually an interesting proposal. Let's discuss this. So instead of doing the transformation + batching (optionally parallel) in Java, we would do it in APOC instead.
My initial exploration was to do manually what the ETL tool currently facilitates. My sample dataset is a Postgres schema containing a mix of the following.
On a side note, I noticed that the ETL tool silently ignores unsupported data types. Examples are
@soneymathew Can you help me reproduce this issue? For example, can you tell me which sample database you used? Thank you
@mroiter-larus Sorry, I won't be able to share my database.
@soneymathew I had a test case but it didn't reproduce the error. Anyway, I'll try your suggestion.
@mroiter-larus Is it important that you reproduce it? Users could have bad data in their database, with non-printable characters or multiline content that can break your CSV. There is a real risk of data loss along the translation steps (DB -> CSV -> LOAD CSV). I raised this issue because I believe avoiding that step will help your users from the following perspectives.
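To make the data-loss risk concrete, here is a small self-contained Python sketch (purely synthetic data, not from the ETL tool itself): a JSON column containing real newlines splits one logical row across several physical lines when the CSV is assembled naively, while proper quoting via Python's csv module round-trips the row intact:

```python
import csv
import io
import json

# A row whose second column is pretty-printed JSON, i.e. it contains
# real newlines, commas, and double quotes.
row = ["42", json.dumps({"a": 1, "b": 2}, indent=2)]

# Naive export: join fields with commas, one row per line.
naive = ",".join(row) + "\n"
# The embedded newlines split one logical row across several lines.
print(len(naive.splitlines()))  # 4 physical lines for a single row

# Proper export: the csv module quotes the field, so the
# round trip preserves the row exactly.
buf = io.StringIO()
csv.writer(buf).writerow(row)
buf.seek(0)
recovered = next(csv.reader(buf))
print(recovered == row)  # True
```

A naive reader of the first file would see four malformed records instead of one, which is exactly the breakage described above; loading directly over JDBC sidesteps the problem entirely.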
Hi @jexp / maintainers,
I am trying to use the online import on a database which contains JSON values in its columns.
I notice that when it gets loaded into CSV, the JSON breaks the CSV format.
Can we modify this to use https://neo4j-contrib.github.io/neo4j-apoc-procedures/#load-jdbc to overcome this?
I was able to apply this strategy successfully with handrolled Cypher; it would be great if it could be adopted by the ETL tool.
Happy to help raise a PR with some guidance as well.