-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++ Improvements - API enhancement and increase testing #85
Conversation
# Add compiler flags | ||
target_compile_options(VoyagerTests PRIVATE -g) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-g
flag builds executable with debugging symbols to use with a debugger
cpp/src/array_utils.h
Outdated
// flatten the 2d array into the NDArray's underlying 1D vector | ||
std::vector<float> flatArray; | ||
for (const auto &vector : vectors) { | ||
flatArray.insert(flatArray.end(), vector.begin(), vector.end()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more memory-efficient way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not memory-efficient, but time-efficient, yes: we should pre-allocate the space in flatArray
by calculating numVectors * dimensions
, then doing std::memcpy
into that buffer for each vector.
What we have here will resize (and potentially reallocate) the vector on each .insert
, which makes this O(n2) as .insert
is O(n) rather than O(1).
cpp/test/test_main.cpp
Outdated
float precisionTolerance) { | ||
// create test data and ids | ||
std::vector<std::vector<float>> inputData = | ||
randomVectors(numVectors, numDimensions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From meeting:
add conditional statement
if storageType = float32, then randomVectors
if storageType = float8 or e4m3, then randomQuantizedVectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Left some comments but nice work on this so far :)
/** | ||
* Convert a 2D vector of float to NDArray<float, 2> | ||
*/ | ||
NDArray<float, 2> vectorsToNDArray(std::vector<std::vector<float>> vectors) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an explicit unit test for this
cpp/test/test_utils.cpp
Outdated
|
||
#include "array_utils.h" | ||
|
||
NDArray<float, 2> randomQuantizedVectorsNDArray(int numVectors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove these functions and instead just use the randomQuantizedVectors
and randomVectors
methods
cpp/src/array_utils.h
Outdated
int dimensions = numVectors > 0 ? vectors[0].size() : 0; | ||
std::array<int, 2> shape = {numVectors, dimensions}; | ||
|
||
// flatten the 2d array into the NDArray's underlying 1D vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than iterating over the outer and inner vectors, we should try to utilize the underlying data access that std::vector
provides. Be careful with this though as there may be caveats around vector memory allocation boundaries and actual vector item counts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still iterating over each vector because I added a check to validate that each vector size is identical. But now I am preallocating the space I need in the flattened array to avoid resizing, and using std::memcpy
to do the copying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks for fixing this up! Looks good to me
🤠 🐴 🏇
Changes Made
C++
Index::addItems
andIndex::query
methods that accept 2D vector of floats (std::vector<std::vector<float>>
) instead of anNDArray
.Testing
Added C++ tests
Related Issues
In these tests, we query with k = 1 (single nearest neighbour). If you change it or add an additional test with k = num_vectors_in_index, you can sporadically reproduce this issue: #38
Checklist
Additional Comments