Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix leaks in read_table #449

Merged
merged 6 commits into from
Oct 30, 2024
Merged

Conversation

nicklan
Copy link
Collaborator

@nicklan nicklan commented Oct 30, 2024

What changes are proposed in this pull request?

Fix leaks in read_table. It now runs cleanly with -fsanitize=address.

Some of these are probably footguns in kernel that we should fix. I've also added comments for now in kernel for some places where engine is responsible to free things. We should look at these and see if we can/should move to a callback model so the freeing is not necessary.

The good news is, this didn't find any leaks in kernel itself, just bad c code :)

Fixes #448

How was this change tested?

  • ffi test ci job doesn't report leaks
  • Locally running and observing sanitizer output. Also checked, no more direct leaks reported by valgrind

Copy link

codecov bot commented Oct 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@b61c76f). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #449   +/-   ##
=======================================
  Coverage        ?   78.16%           
=======================================
  Files           ?       55           
  Lines           ?    11594           
  Branches        ?    11594           
=======================================
  Hits            ?     9062           
  Misses          ?     2032           
  Partials        ?      500           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the fixed problems seem to be due to g_arrow?

Yep. So much g_object_unref :)

Could you elaborate on what parts are kernel footguns and why?

Main things I was referencing are:

  1. get_raw_arrow_data -> returns a leaked struct with two elements. This means you have to consider the lifetimes of the raw schema and data, which generally will go into garrow or similar, and the actual struct itself, which is confusing. I added a todo in the code for this where we could probably use a callback.
  2. kernel_scan_data_next -> This uses a callback model, but the things we pass into the callback are owned by engine. This is an inversion of how we usually handle it in which we say that things that get passed into a callback will be freed by kernel after the callback completes. I'd like to have a look at this and see if there's a good reason we don't free on kernel side, or if we can just change those semantics.
  3. Longer lived things like a Scan which also get "scan data" and then we have to individually free each. I'm not sure there's much we can do here, but it's still important that we carefully document this.

Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@nicklan nicklan merged commit 40dc2b5 into delta-io:main Oct 30, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix memory leaks in FFI read-table
3 participants