Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Builder would ub when AppendValues called with empty vector #44690

Closed
mapleFU opened this issue Nov 11, 2024 · 6 comments
Closed

[C++] Builder would ub when AppendValues called with empty vector #44690

mapleFU opened this issue Nov 11, 2024 · 6 comments

Comments

@mapleFU
Copy link
Member

mapleFU commented Nov 11, 2024

Describe the enhancement requested

When AppendValues called with empty vector, this would cause memcpy

  /// \brief Append a sequence of elements in one shot
  /// \param[in] values a contiguous C array of values
  /// \param[in] length the number of values to append
  /// \param[in] is_valid an std::vector<bool> indicating valid (1) or null
  /// (0). Equal in length to values
  /// \return Status
  Status AppendValues(const value_type* values, int64_t length,
                      const std::vector<bool>& is_valid) {
    ARROW_RETURN_NOT_OK(Reserve(length));
    data_builder_.UnsafeAppend(values, length);
    // length_ is update by these
    ArrayBuilder::UnsafeAppendToBitmap(is_valid);
    return Status::OK();
  }

  /// \brief Append a sequence of elements in one shot
  /// \param[in] values a std::vector of values
  /// \param[in] is_valid an std::vector<bool> indicating valid (1) or null
  /// (0). Equal in length to values
  /// \return Status
  Status AppendValues(const std::vector<value_type>& values,
                      const std::vector<bool>& is_valid) {
    return AppendValues(values.data(), static_cast<int64_t>(values.size()), is_valid);
  }

In this case, unsafe append directly memcpy, and when size == 0, it would be a undefined behavior

Component(s)

C++

@raulcd
Copy link
Member

raulcd commented Nov 20, 2024

I have zero knowledge here so bear with me if there is something obvious I am missing. I was trying to understand what the issue is and after some investigation it seems that memcpy should just handle the size 0 without problem, could you expand on what would be the undefined behavior?
Was your expectation to have some kind of check for length or am I misinterpreting the issue?

  Status AppendValues(const value_type* values, int64_t length,
                      const std::vector<bool>& is_valid) {
    if (length > 0) {
        ARROW_RETURN_NOT_OK(Reserve(length));
        data_builder_.UnsafeAppend(values, length);
        // length_ is update by these
        ArrayBuilder::UnsafeAppendToBitmap(is_valid);
    }
    return Status::OK();
  }

@mapleFU
Copy link
Member Author

mapleFU commented Nov 20, 2024

@raulcd There is few steps for this problem

  1. AppendValues might has std::vector as arguments, std::vector::data might be used
  2. std::vector::data might returns nullptr if size == 0: https://en.cppreference.com/w/cpp/container/vector/data
  3. https://en.cppreference.com/w/cpp/string/byte/memcpy memcpy says, "If either dest or src is an invalid or null pointer, the behavior is undefined, even if count is zero."

@mapleFU
Copy link
Member Author

mapleFU commented Nov 20, 2024

diff --git a/cpp/src/arrow/array/builder_primitive.h b/cpp/src/arrow/array/builder_primitive.h
index de7af1b46b..be9761fb46 100644
--- a/cpp/src/arrow/array/builder_primitive.h
+++ b/cpp/src/arrow/array/builder_primitive.h
@@ -211,6 +211,9 @@ class NumericBuilder
   /// \return Status
   Status AppendValues(const std::vector<value_type>& values,
                       const std::vector<bool>& is_valid) {
+    if (values.empty()) {
+      return Status::OK();
+    }
     return AppendValues(values.data(), static_cast<int64_t>(values.size()), is_valid);
   }
 
@@ -218,6 +221,9 @@ class NumericBuilder
   /// \param[in] values a std::vector of values
   /// \return Status
   Status AppendValues(const std::vector<value_type>& values) {
+    if (values.empty()) {
+      return Status::OK();
+    }
     return AppendValues(values.data(), static_cast<int64_t>(values.size()));
     return AppendValues(values.data(), static_cast<int64_t>(values.size()));
   }

@pitrou @bkietz @felipecrv Should I just guard the vector interface? Or should prevent from all AppendValues?

@pitrou
Copy link
Member

pitrou commented Nov 20, 2024

@pitrou @bkietz @felipecrv Should I just guard the vector interface? Or should prevent from all AppendValues?

Can you submit a PR and we can review that?

@mapleFU
Copy link
Member Author

mapleFU commented Nov 20, 2024

done #44794

mapleFU added a commit that referenced this issue Nov 20, 2024
…om ub (#44794)

### Rationale for this change

Add boundary check for `NumericBuilder::AppendValues` for std::vector

Originally, it will :

1. `AppendValues` might has `std::vector` as arguments, `std::vector::data` might be used
2. `std::vector::data` might returns `nullptr` if size == 0: https://en.cppreference.com/w/cpp/container/vector/data
3. https://en.cppreference.com/w/cpp/string/byte/memcpy memcpy says, "If either dest or src is an [invalid or null pointer](https://en.cppreference.com/w/cpp/language/pointer#Pointers), the behavior is undefined, even if count is zero."

### What changes are included in this PR?

Add boundary check for `NumericBuilder::AppendValues` for std::vector

### Are these changes tested?

Covered by existing

### Are there any user-facing changes?

no

* GitHub Issue: #44690

Authored-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
@mapleFU mapleFU added this to the 19.0.0 milestone Nov 20, 2024
@mapleFU
Copy link
Member Author

mapleFU commented Nov 20, 2024

Issue resolved by pull request 44794
#44794

@mapleFU mapleFU closed this as completed Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants