Revamp age csv loader (#2044) #2059

MuhammadTahaNaveed
Member

* Allow 0 as entry_id

- No regression tests were impacted by this change.

* Use batch inserts to improve performance

- Changed heap_insert() to heap_multi_insert(), since it is faster than
  calling heap_insert() in a loop: when multiple tuples can be inserted
  on a single page, only a single WAL record covering all of them is
  written, and the page only needs to be locked/unlocked once.

- BATCH_SIZE, the number of tuples inserted in a single batch, is set
  to 1000. This value was chosen after some experimentation.

- Renamed some of the fields to avoid confusion.
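A minimal Python sketch of the batching idea follows. It assumes a hypothetical `BatchInserter` class whose `flush_fn` callback stands in for a single heap_multi_insert() call; it is not the loader's actual C code. Rows are buffered and handed over BATCH_SIZE at a time, and a final flush sends the last, partial batch.

```python
from typing import Callable, List, Sequence

# Value quoted in the commit message; the loader's real constant may be
# defined differently.
BATCH_SIZE = 1000


class BatchInserter:
    """Buffer rows and hand them to a flush callback in groups.

    The callback is a hypothetical stand-in for one heap_multi_insert()
    call; it is invoked once per batch rather than once per row.
    """

    def __init__(self, flush_fn: Callable[[Sequence[tuple]], None],
                 batch_size: int = BATCH_SIZE) -> None:
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer: List[tuple] = []

    def add(self, row: tuple) -> None:
        """Queue one row, flushing automatically once the batch is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows in a single call, then start a new batch."""
        if self.buffer:
            self.flush_fn(list(self.buffer))
            self.buffer.clear()


batches: List[Sequence[tuple]] = []
inserter = BatchInserter(batches.append)
for i in range(2500):
    inserter.add((i, f"row{i}"))
inserter.flush()  # flush the final, partial batch
print([len(b) for b in batches])  # [1000, 1000, 500]
```

The trade-off is memory for fewer calls: a larger batch means fewer page locks and WAL records per tuple, but more rows held in memory before each flush.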

* Use sequence for generating ids for edge and vertex

- The sequence is not used when id_field_exists is true in the
  load_labels_from_file function, since the entry id is then present in
  the CSV.
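The following sketch shows how an entry id could be chosen for each row. The function `assign_entry_id` and the use of `itertools.count` as the sequence are illustrative assumptions, not AGE's implementation. It compares the CSV value against `None` rather than testing truthiness, so an entry id of 0 is accepted, in line with the "Allow 0 as entry_id" change.

```python
import itertools
from typing import Iterator, Optional


def assign_entry_id(id_field_exists: bool,
                    csv_entry_id: Optional[int],
                    sequence: Iterator[int]) -> int:
    """Pick the entry id for one CSV row (illustrative sketch only)."""
    if id_field_exists:
        # The CSV supplies the id, so the sequence is left untouched.
        # Compare against None, not truthiness, so an entry id of 0 is kept.
        if csv_entry_id is None:
            raise ValueError("id column expected but missing")
        return csv_entry_id
    # No id column: draw the next value from the sequence.
    return next(sequence)


seq = itertools.count(1)  # hypothetical stand-in for the label's sequence
print(assign_entry_id(False, None, seq))  # 1
print(assign_entry_id(True, 0, seq))      # 0
print(assign_entry_id(False, None, seq))  # 2
```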

* Add function to create temporary table for ids

- Created a temporary table and populated it with the vertex ids that
  have already been generated. A unique index on its id column ensures
  that new ids generated from CSV entry ids are unique.
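A rough model of this uniqueness check uses an in-memory set in place of the temporary table and its unique index. `EntryIdRegistry` is a hypothetical name; the loader itself relies on a real temporary table.

```python
from typing import Iterable, Set


class EntryIdRegistry:
    """In-memory stand-in for the temporary id table and its unique index.

    The class name and API are hypothetical; the loader does this with a
    real temporary table.
    """

    def __init__(self, existing_ids: Iterable[int]) -> None:
        # Seed with ids that were generated before this load started.
        self._ids: Set[int] = set(existing_ids)

    def insert(self, entry_id: int) -> None:
        """Record a new id, rejecting it if it already exists."""
        if entry_id in self._ids:
            # Plays the role of a unique-index violation.
            raise ValueError(f"duplicate entry id: {entry_id}")
        self._ids.add(entry_id)


registry = EntryIdRegistry([1, 2, 3])
registry.insert(4)        # new id, accepted
try:
    registry.insert(2)    # already present
except ValueError as err:
    print(err)            # duplicate entry id: 2
```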

* Insert generated ids in the temporary table to enforce uniqueness

- Insert ids into the temporary table and update its index to
  enforce uniqueness.
- If the entry id provided in the CSV is greater than the current
  sequence value, the sequence value is updated to match the entry id.
  For example:
  Suppose the current sequence value is 1 and the CSV entry id is 2.
  If 2 is used without updating the sequence to 2, the next time the
  CREATE clause is used the sequence will return 2 as an entry id,
  resulting in a duplicate.
- Update batch functions
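The sequence-advancing rule from the example above can be sketched as follows. `ToySequence` is a hypothetical class loosely modeled on PostgreSQL's nextval()/setval(), and the helper `use_csv_entry_id` is likewise an assumption for illustration.

```python
class ToySequence:
    """Toy sequence: nextval() returns last_value + 1 (hypothetical model,
    loosely following PostgreSQL's nextval()/setval() behaviour)."""

    def __init__(self, last_value: int) -> None:
        self.last_value = last_value

    def nextval(self) -> int:
        self.last_value += 1
        return self.last_value

    def setval(self, value: int) -> None:
        self.last_value = value


def use_csv_entry_id(seq: ToySequence, entry_id: int) -> int:
    """Use an id from the CSV, advancing the sequence past it if needed."""
    if entry_id > seq.last_value:
        seq.setval(entry_id)
    return entry_id


# The example from the commit message: sequence at 1, CSV entry id 2.
seq = ToySequence(1)
use_csv_entry_id(seq, 2)
print(seq.nextval())  # 3, so a later CREATE does not reuse 2
```

Ids below the current sequence value leave it unchanged, so the sequence only ever moves forward.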

* Add functions to create graph and label automatically

- These functions check whether the graph and label exist, and create
  them if they don't.
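A minimal get-or-create sketch of this behavior follows. A hypothetical `GraphCatalog` dictionary stands in for the real graph and label catalogs: each call checks for existence first and creates only what is missing, so repeated calls are harmless.

```python
from typing import Dict, Set


class GraphCatalog:
    """Hypothetical in-memory catalog mapping graph names to label names."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[str]] = {}

    def ensure_graph(self, graph: str) -> bool:
        """Create the graph if it is missing; return True if it was created."""
        if graph in self.graphs:
            return False
        self.graphs[graph] = set()
        return True

    def ensure_label(self, graph: str, label: str) -> bool:
        """Create the graph and/or label if missing; True if the label was new."""
        self.ensure_graph(graph)
        labels = self.graphs[graph]
        if label in labels:
            return False
        labels.add(label)
        return True


catalog = GraphCatalog()
print(catalog.ensure_label("social", "Person"))  # True: graph and label created
print(catalog.ensure_label("social", "Person"))  # False: both already exist
```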

* Add regression tests
github-actions bot added the PG13 and override-stale labels on Aug 19, 2024.