Fix SWIG methods that return char** #2850
Merged: StrikerRUS merged 7 commits into microsoft:master from AlbertoEAF:feat/swig-string-arrays on Mar 20, 2020
Commits
be17707 [swig] Fix SWIG methods that return char** with StringArray.
2746c72 Changes to C API and wrappers support char**
b461881 Cleanup indentation in new lightgbm_R.cpp code
935947b Adress review code-style comments. (AlbertoEAF)
0c4f1fe Update swig/StringArray.hpp (AlbertoEAF)
4496080 Update python-package/lightgbm/basic.py (AlbertoEAF)
9873427 Update src/lightgbm_R.cpp (AlbertoEAF)
@@ -0,0 +1,150 @@
/*!
 * Copyright (c) 2020 Microsoft Corporation. All rights reserved.
 * Licensed under the MIT License. See LICENSE file in the project root for license information.
 */
#ifndef __STRING_ARRAY_H__
#define __STRING_ARRAY_H__

#include <new>
#include <vector>
#include <algorithm>
#include <cstddef>  // size_t
#include <cstring>  // std::strcpy
#include <string>   // std::string

/**
 * Container that manages an array of fixed-length strings.
 *
 * To be compatible with SWIG's `various.i` extension module,
 * the array of char* pointers must be NULL-terminated:
 *   [char*, char*, char*, ..., NULL]
 * This implies that the length of this array is bigger
 * by 1 element than the number of char* it stores.
 * I.e., get_num_elements() == _array.size() - 1
 *
 * The class also takes care of allocation of the underlying
 * char* memory.
 */
class StringArray
{
 public:
  StringArray(size_t num_elements, size_t string_size)
    : _string_size(string_size),
      _array(num_elements + 1, nullptr)
  {
    _allocate_strings(num_elements, string_size);
  }

  ~StringArray()
  {
    _release_strings();
  }

  // The class owns raw char* buffers, so a copy would free them twice.
  StringArray(const StringArray &) = delete;
  StringArray &operator=(const StringArray &) = delete;

  /**
   * Returns the pointer to the raw array.
   * Notice its size is greater than the number of stored strings by 1.
   *
   * @return char** pointer to raw data (null-terminated).
   */
  char **data() noexcept
  {
    return _array.data();
  }

  /**
   * Returns the char* stored at the given index.
   * Each string buffer has size _string_size+1.
   * Notice the last element in _array is already
   * considered out of bounds.
   *
   * @param index Index of the element to retrieve.
   * @return pointer or nullptr if index is out of bounds.
   */
  char *getitem(size_t index) noexcept
  {
    if (_in_bounds(index))
      return _array[index];
    else
      return nullptr;
  }

||
/** | ||
* Safely copies the full content data | ||
* into one of the strings in the array. | ||
* If that is not possible, returns error (-1). | ||
* | ||
* @param index index of the string in the array. | ||
* @param content content to store | ||
* | ||
* @return In case index results in out of bounds access, | ||
* or content + 1 (null-terminator byte) doesn't fit | ||
* into the target string (_string_size), it errors out | ||
* and returns -1. | ||
*/ | ||
int setitem(size_t index, std::string content) noexcept | ||
{ | ||
if (_in_bounds(index) && content.size() < _string_size) | ||
{ | ||
std::strcpy(_array[index], content.c_str()); | ||
return 0; | ||
} else { | ||
return -1; | ||
} | ||
} | ||
|
||
  /**
   * @return number of stored strings.
   */
  size_t get_num_elements() const noexcept
  {
    return _array.size() - 1;
  }

 private:

  /**
   * Returns true if and only if within bounds.
   * Notice that it excludes the last element of _array (NULL).
   *
   * @param index index of the element
   * @return bool true if within bounds
   */
  bool _in_bounds(size_t index) const noexcept
  {
    return index < get_num_elements();
  }

  /**
   * Allocate an array of fixed-length strings.
   *
   * Since a NULL-terminated array is required by SWIG's `various.i`,
   * the size of the array is actually `num_elements + 1` but only
   * num_elements are filled.
   *
   * @param num_elements Number of strings to store in the array.
   * @param string_size The size of each string in the array.
   * @throws std::bad_alloc if any allocation fails; strings that were
   *         already allocated are released first.
   */
  void _allocate_strings(size_t num_elements, size_t string_size)
  {
    // size_t matches the constructor arguments and avoids a narrowing
    // conversion to int.
    for (size_t i = 0; i < num_elements; ++i)
    {
      // Leave space for \0 terminator:
      _array[i] = new (std::nothrow) char[string_size + 1];

      // Check memory allocation:
      if (! _array[i]) {
        _release_strings();
        throw std::bad_alloc();
      }
    }
  }

  /**
   * Deletes the allocated strings and resets the pointers,
   * so calling it more than once is harmless.
   */
  void _release_strings() noexcept
  {
    std::for_each(_array.begin(), _array.end(),
                  [](char *&c) { delete[] c; c = nullptr; });
  }

  const size_t _string_size;
  std::vector<char*> _array;
};

#endif  // __STRING_ARRAY_H__
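
For readers who want to see the NULL-terminated layout in action, here is a minimal sketch. It uses a condensed stand-in class, `StringArraySketch`, which is hypothetical and only mirrors the layout of the `StringArray` in the diff (owned buffers followed by one NULL slot). The helpers `count_until_null` and `demo_join` are likewise illustrative names, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Condensed, hypothetical stand-in for the StringArray in the diff:
// num_elements owned buffers followed by a single NULL sentinel.
class StringArraySketch {
 public:
  StringArraySketch(size_t num_elements, size_t string_size)
      : _string_size(string_size), _array(num_elements + 1, nullptr) {
    try {
      for (size_t i = 0; i < num_elements; ++i) {
        _array[i] = new char[string_size + 1]();  // zeroed, room for '\0'
      }
    } catch (...) {
      release();  // do not leak buffers allocated before the failure
      throw;
    }
  }
  ~StringArraySketch() { release(); }
  StringArraySketch(const StringArraySketch &) = delete;
  StringArraySketch &operator=(const StringArraySketch &) = delete;

  char **data() noexcept { return _array.data(); }
  size_t get_num_elements() const noexcept { return _array.size() - 1; }

  // Same contract as in the diff: 0 on success, -1 if the index is out of
  // bounds or the content does not pass the size check.
  int setitem(size_t index, const std::string &content) noexcept {
    if (index < get_num_elements() && content.size() < _string_size) {
      std::strcpy(_array[index], content.c_str());
      return 0;
    }
    return -1;
  }

 private:
  void release() noexcept {
    for (char *&s : _array) { delete[] s; s = nullptr; }
  }
  const size_t _string_size;
  std::vector<char *> _array;
};

// A consumer that only receives the raw char** (no length) has to walk
// the array until it reaches the NULL sentinel.
size_t count_until_null(char **strs) {
  size_t n = 0;
  while (strs[n] != nullptr) ++n;
  return n;
}

// Fills three slots and joins them by iterating up to the sentinel.
std::string demo_join() {
  StringArraySketch names(3, 16);
  const char *inputs[] = {"age", "height", "weight"};
  for (size_t i = 0; i < 3; ++i) {
    int rc = names.setitem(i, inputs[i]);
    assert(rc == 0);  // every input is shorter than string_size
    (void)rc;
  }
  std::string joined;
  char **raw = names.data();
  for (size_t i = 0; raw[i] != nullptr; ++i) {
    if (i > 0) joined += ",";
    joined += raw[i];
  }
  return joined;
}
```

Calling `demo_join()` yields the three names joined by commas. The extra NULL slot is what lets `count_until_null` stop without being told the length.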

This code will sometimes have memory problems; maybe it is better to use [...]
By the way, is the `idx < len` check needed? Ping @AlbertoEAF @StrikerRUS

Hmm, maybe it is also a cause of #3398?..

I agree with the change, @guolinke: it does the same thing but is easier to read, and it sets the null byte only once when the string is at the size limit.
Yes, the `idx < len` check ensures that, if you have allocated a char** that only has space for `len` string pointers, you don't write outside it, even if internally we have a bigger array that would require more space. We stop the copy before writing outside allocated memory to avoid segmentation faults.

P.S.: In other words, segfaults/memory write violations can only occur if the function receives wrong `len` and `buffer_len` arguments from the caller, i.e., if the caller actually pre-allocated fewer or smaller "strings" in `char **out_strs` than the values passed in `len` and `buffer_len` respectively.
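
To make the role of the `idx < len` check concrete, here is a rough sketch of a bounded copy into caller-allocated strings. The function `copy_names_bounded` and its `out_len` / `out_buffer_len` outputs are assumptions made for illustration; this is not the LightGBM C API code under review, only an example of the pattern described above using the `len`, `buffer_len` and `out_strs` names from the discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not the LightGBM C API. Copies `names` into
// `out_strs`, which the caller allocated with room for `len` strings of
// `buffer_len` bytes each. It also reports how many strings exist and how
// many bytes the longest one needs (including '\0'), so the caller can
// detect truncation.
void copy_names_bounded(const std::vector<std::string> &names,
                        size_t len, size_t buffer_len,
                        size_t *out_len, size_t *out_buffer_len,
                        char **out_strs) {
  *out_len = names.size();
  *out_buffer_len = 0;
  for (size_t idx = 0; idx < names.size(); ++idx) {
    // Only write into slots the caller actually allocated.
    if (idx < len && buffer_len > 0) {
      const size_t n = std::min(names[idx].size(), buffer_len - 1);
      std::memcpy(out_strs[idx], names[idx].data(), n);
      out_strs[idx][n] = '\0';  // terminator is written exactly once
    }
    *out_buffer_len = std::max(*out_buffer_len, names[idx].size() + 1);
  }
}

// Hypothetical usage: the caller allocated only 2 slots of 4 bytes each,
// although 3 names exist and the longest needs 7 bytes.
std::string demo_second_slot() {
  const std::vector<std::string> names = {"age", "height", "weight"};
  char a[4], b[4];
  char *out_strs[] = {a, b};
  size_t out_len = 0, out_buffer_len = 0;
  copy_names_bounded(names, 2, sizeof(a), &out_len, &out_buffer_len, out_strs);
  assert(out_len == 3);         // more names exist than slots were provided
  assert(out_buffer_len == 7);  // "height" needs 6 chars plus '\0'
  return std::string(out_strs[1]);  // truncated to fit in 4 bytes
}
```

With correct `len` and `buffer_len` the copy never leaves the caller's memory. If the caller passes values larger than what was really allocated, nothing in the function can detect that, which is the failure mode the P.S. describes.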