Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[c++] Fix heap management in fastercsx #3365

Merged
merged 2 commits into from
Nov 21, 2024
Merged

[c++] Fix heap management in fastercsx #3365

merged 2 commits into from
Nov 21, 2024

Conversation

johnkerl
Copy link
Member

@johnkerl johnkerl commented Nov 21, 2024

Issue and/or context: Found while debugging (details below).

Issue: #3345; see also #3304.

Changes:

We have a pair of macros tdb_new_array and tdb_delete_array. As written, these are pairing malloc with delete []. This is incorrect behavior, which this PR fixes.

Also note that this defect is new to our modified vendoring of threadpool/status.h:

Given the narrative at #3304 (comment) I did not review these files on my review of #3304; I'll take another look now including a diff against the core vendoring-source files.

Also note that libtiledbsoma/src/external/include/thread_pool/status.h is not new to #3304, but newly exercised by #3304.

Notes for Reviewer:

I found this while debugging an issue: on my laptop (but not in CI) there is a segfault here:

def test_bad_shapes(context: soma.SOMATileDBContext, rng: np.random.Generator) -> None:
"""Test bad/mismatched shape."""
sp = sparse.identity(32, dtype=np.int8).tocoo()
for shp in [
(sp.shape[0] - 1, sp.shape[1]),
(sp.shape[0], sp.shape[1] - 1),
]:
with pytest.raises(IndexError):
# this will also log an error - which is expected behavior for parallel_for/thread_pool
fastercsx.CompressedMatrix.from_ijd(
sp.row, sp.col, sp.data, shp, "csr", True, context
)

This intends to catch an exception thrown through this callchain, as revealed by source-level debugging this morning:

Here is a standalone repro which crashes for me, taken from apis/python/tests/test_fastercsx.py::test_bad_shapes:

#!/usr/bin/env python

import tiledbsoma
import tiledbsoma._fastercsx as fastercsx

import scipy.sparse
import numpy as np

context = tiledbsoma.SOMATileDBContext()

sp = scipy.sparse.identity(32, dtype=np.int8).tocoo()
for shp in [
    (sp.shape[0] - 1, sp.shape[1]),
    (sp.shape[0], sp.shape[1] - 1),
]:
    try:
        fastercsx.CompressedMatrix.from_ijd(
            sp.row, sp.col, sp.data, shp, "csr", True, context,
        )
    except IndexError:
        print("Caught IndexError as expected")

This does crash for me on my laptop; I does not crash for me on an Ubuntu system, and our multi-platform CI is green on main. Nonetheless I ran valgrind on an Ubuntu system, to investigate. I found the following:

The valgrind output

apis/python/tests/test_fastercsx.py ==9085== Mismatched free() / delete / delete []

led me directly to the codemod addressed on this PR.

Do note however that even after a clean and rebuild I do still experience the segfault with the above scenario on my laptop. This means that I've found a heap-corruption issue on this PR, but there appears to be something else going on as well.

Copy link

codecov bot commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.74%. Comparing base (ca00d5b) to head (c74ca98).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3365      +/-   ##
==========================================
+ Coverage   85.64%   85.74%   +0.09%     
==========================================
  Files          57       57              
  Lines        6201     6201              
==========================================
+ Hits         5311     5317       +6     
+ Misses        890      884       -6     
Flag Coverage Δ
python 85.74% <ø> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 85.74% <ø> (+0.09%) ⬆️
libtiledbsoma ∅ <ø> (∅)
---- 🚨 Try these New Features:

@johnkerl
Copy link
Member Author

Given the narrative at #3304 (comment) I did not review these files on my review of #3304; I'll take another look now including a diff against the core vendoring-source files.

I ran the following (both my core and my tiledbsoma checkouts having had fresh pulls on dev and main respectively):

diff -w \
  ~/git/TileDB-Inc/TileDB/tiledb/sm/misc/parallel_functions.h \
  ~/git/single-cell-data/TileDB-SOMA/libtiledbsoma/src/utils/parallel_functions.h

diff -w \
  ~/git/TileDB-Inc/TileDB/tiledb/common/thread_pool/producer_consumer_queue.h \
  ~/git/single-cell-data/TileDB-SOMA/libtiledbsoma/src/external/include/thread_pool/producer_consumer_queue.h

diff -w \
  ~/git/TileDB-Inc/TileDB/tiledb/common/thread_pool/thread_pool.h \
  ~/git/single-cell-data/TileDB-SOMA/libtiledbsoma/src/external/include/thread_pool/thread_pool.h

diff -w \
  ~/git/TileDB-Inc/TileDB/tiledb/common/thread_pool/thread_pool.cc \
  ~/git/single-cell-data/TileDB-SOMA/libtiledbsoma/src/external/src/thread_pool/thread_pool.cc

with the following output: https://gist.github.com/johnkerl/ebb915ce74e424bc57a4002983792209

Looks to me like:

  • The bug fixed here
  • Whitespace/clang-format
  • Some namespacing changes, e.g. using optional and optional<std::exception_ptr> -> std::optional<std::exception_ptr>
  • namespace tiledb::sm -> namespace tiledbsoma
  • std::lock_guard lock{mutex_}; -> std::scoped_lock lock{mutex_};
  • Etc.
  • Removal of one unused/unneeded function

Looks good to me -- CC @nguyenv if you would like to take a peek as a second pair of eyes.

@johnkerl johnkerl requested a review from bkmartinjr November 21, 2024 17:04
Copy link
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks John.

@johnkerl johnkerl merged commit f17d55c into main Nov 21, 2024
15 checks passed
@johnkerl johnkerl deleted the kerl/parfor-heap branch November 21, 2024 20:27
@johnkerl johnkerl changed the title [c++] Fix heap corruption in fastercsx [c++] Fix heap management in fastercsx Nov 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants