Large strings gtest fixture and utilities #15513

Merged: 13 commits, Apr 24, 2024


davidwendt
Contributor

Description

Creates the base class and utilities for testing APIs that produce large strings.
The main purpose of the fixture is to enable the large strings environment variable(s) and to set up large test data that can be reused by multiple tests.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added the 2 - In Progress, libcudf, improvement, and non-breaking labels Apr 11, 2024
@davidwendt davidwendt self-assigned this Apr 11, 2024
@github-actions github-actions bot added the CMake label Apr 11, 2024
#include <vector>

struct ConcatenateTest : public cudf::test::StringsLargeTest {};

Contributor Author

Moved from copying/concatenate_tests.cpp

#include <vector>

struct MergeTest : public cudf::test::StringsLargeTest {};

Contributor Author

Moved from merge/merge_string_test.cpp

@davidwendt davidwendt added the 3 - Ready for Review label and removed the 2 - In Progress label Apr 24, 2024
@davidwendt davidwendt marked this pull request as ready for review April 24, 2024 00:28
@davidwendt davidwendt requested a review from a team as a code owner April 24, 2024 00:28
@davidwendt davidwendt requested review from mythrocks and shrshi April 24, 2024 00:28
Comment on lines 110 to 113
// create object to automatically be destroyed at the end of main()
auto lsd = cudf::test::LargeStringsData();
// set object pointer into static variable
cudf::test::StringsLargeTest::g_ls_data = &lsd;
Contributor

Oh no, manually assigning a static variable like this is not good practice. Can we initialize it automatically?

Contributor Author

The issue is that the variable (or at least its pointer) needs to be accessible globally, but its lifetime must be scoped within main(). So lsd must be created and destroyed within the main() scope, yet also act as a singleton for the entire process.
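
The constraint described above can be sketched in plain C++. This is only an illustration under stated assumptions: `LargeData`, `g_data`, and `run_scope()` are hypothetical stand-ins (for `LargeStringsData`, `g_ls_data`, and the body of `main()`), not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for cudf::test::LargeStringsData.
struct LargeData {
  std::vector<std::string> rows{"a", "b", "c"};
};

// Process-wide access point. It observes the object but does not own it.
static LargeData* g_data = nullptr;

// Simulates the body of main(): the object lives on this stack frame,
// so it is destroyed when the scope ends, not during static destruction.
// Returns the number of rows visible through the global pointer.
std::size_t run_scope()
{
  LargeData data;                           // constructed inside the scope
  g_data = &data;                           // published for tests to read
  std::size_t seen = g_data->rows.size();   // "tests" use the global pointer
  g_data = nullptr;                         // unpublish before `data` dies
  return seen;
}
```

The global pointer gives singleton-style access, while ownership stays with the enclosing scope.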

Contributor

@ttnghia ttnghia Apr 24, 2024

How about create-on-first-access?

struct StringsLargeTest : public cudf::test::BaseFixture {
  public:
  static auto get_ls_data() {
    // create the data only on the first access
    if (g_ls_data == nullptr) { g_ls_data = new cudf::test::LargeStringsData; }
    return g_ls_data;
  }
  private:
  static LargeStringsData* g_ls_data;
};

get_ls_data() then should be called within main() scope.

Contributor Author

This won't work because it returns a raw pointer to main(), which will not automatically destroy the object.
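
A small sketch of why a raw pointer from a create-on-first-access getter is not cleaned up automatically. `Tracked`, `g_ptr`, and `get_or_create()` are hypothetical names used only for illustration; the live-object counter exists just to make the leak observable.

```cpp
#include <cassert>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

static Tracked* g_ptr = nullptr;

// Create-on-first-access getter that hands out a raw pointer.
Tracked* get_or_create()
{
  if (g_ptr == nullptr) { g_ptr = new Tracked; }
  return g_ptr;
}

// Simulates a caller such as main() that only holds the raw pointer.
int live_after_scope()
{
  {
    Tracked* p = get_or_create();
    (void)p;  // the pointer is used, then simply goes out of scope
  }           // nothing deletes the object here
  return Tracked::live;
}
```

The object stays alive after the scope ends unless someone calls `delete` explicitly.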

Contributor

Is there value in making this a function static? Its construction is guaranteed to be thread safe, and it will be destroyed in reverse order of construction.

static auto get_ls_data() {
  // function-local static: constructed once, on the first call
  static auto the_instance = cudf::test::LargeStringsData{};
  return &the_instance;
}
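
A minimal sketch of the function-static idea, assuming a hypothetical `Data` type in place of `LargeStringsData`. The object itself must be declared `static` inside the function; otherwise the returned address would refer to a local that no longer exists. The constructor counter is only there to show that construction happens once.

```cpp
#include <cassert>

// Hypothetical stand-in type that counts constructor calls.
struct Data {
  static int constructed;
  int value = 42;
  Data() { ++constructed; }
};
int Data::constructed = 0;

// Function-local static: initialized on the first call (thread-safe since
// C++11) and destroyed during static destruction, after main() returns.
Data* get_instance()
{
  static Data the_instance;
  return &the_instance;
}
```

Every call returns the same address, and the destructor runs only after `main()` has returned, which is the point the rest of this thread debates.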

Contributor Author

We could use a smart pointer here. This seems to work.

    static auto get_lsd_data(int v) {
        auto ls_data = std::make_unique<LargeStringsData>(v);
        g_ls_data = ls_data.get();
        return ls_data;
    }

When the smart pointer goes out of scope, it will delete the object.
This is more in line with what RMM does with resource memory manager objects.
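
The smart-pointer approach can be sketched as follows. `Data`, `g_data`, and `make_data()` are hypothetical names, and clearing the global pointer in the destructor is an added assumption of this sketch, not something stated in the thread.

```cpp
#include <cassert>
#include <memory>

struct Data;
static Data* g_data = nullptr;  // non-owning, process-wide access point

// Hypothetical stand-in for LargeStringsData.
struct Data {
  int value;
  explicit Data(int v) : value(v) {}
  ~Data() { g_data = nullptr; }  // assumption: unpublish on destruction
};

// Creates the object, publishes its address, and returns ownership to the
// caller. The caller's scope (for example main()) decides the lifetime.
std::unique_ptr<Data> make_data(int v)
{
  auto owner = std::make_unique<Data>(v);
  g_data = owner.get();
  return owner;
}

// Simulates main(): the object is reachable globally while `owner` lives.
bool published_while_owned()
{
  auto owner = make_data(7);
  return g_data == owner.get() && g_data->value == 7;
}  // `owner` is destroyed here, which deletes the object
```

Ownership is explicit in the return type, so the object is destroyed at the end of the owning scope rather than during static destruction.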

Contributor

"If the object (not the pointer) is a static variable anywhere there is a chance it could be destroyed outside of main()."

Sorry, I don't fully understand the concern. The function static object is guaranteed to be alive until main() exits. Does that not suit?

Do we have a dependency somewhere in the global static destruction sequence, or something?

Contributor

"The getter needs to check and throw if g_ls_data is not yet initialized."

@ttnghia: With a function static, the object is guaranteed to be initialized once, on the first call.
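
For comparison, a check-and-throw getter for the pointer-based design might look like the sketch below. `Data`, `g_data`, and `get_data()` are hypothetical names, and `std::logic_error` is an arbitrary choice of exception type for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for LargeStringsData.
struct Data {
  int value = 1;
};

static Data* g_data = nullptr;  // set by the owning scope, e.g. main()

// Returns the shared data, or throws if nothing has been published yet.
Data& get_data()
{
  if (g_data == nullptr) {
    throw std::logic_error("large strings data is not initialized");
  }
  return *g_data;
}

// Helper that reports whether the getter throws in the unset state.
bool throws_when_unset()
{
  try {
    get_data();
  } catch (std::logic_error const&) {
    return true;
  }
  return false;
}
```

A function static avoids this check because initialization happens inside the getter, at the cost of destruction moving to after `main()`.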

Contributor Author

The initialization part is not too challenging. Global static destruction is not good since the object holds device memory.
Here is a godbolt which I hope will explain some of this: https://godbolt.org/z/rTa9ceEKf

Contributor

"Global static destruction is not good since the object holds device memory."

Hmm. Thank you, I'll try to bear this in mind.

@davidwendt davidwendt requested a review from ttnghia April 24, 2024 17:27
Contributor

@mythrocks mythrocks left a comment

This looks good to me. 👍
(Barring the current discussion with @ttnghia.)

@davidwendt
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 2eb71b2 into rapidsai:branch-24.06 Apr 24, 2024
70 checks passed
@davidwendt davidwendt deleted the ls-fixture branch April 24, 2024 20:05
Labels
3 - Ready for Review, CMake, improvement, libcudf, non-breaking