Fix bool8 binops #1

Merged
171 commits
014594b
Add nvtext is_vowel/is_consonant functions
davidwendt Oct 4, 2019
c6dc520
Merge branch 'branch-0.11' into fea-vowel-checking
davidwendt Oct 4, 2019
2d0abcc
Merge branch 'branch-0.11' into fea-vowel-checking
davidwendt Oct 7, 2019
4eeee12
added python methods is_vowel and is_consonant
davidwendt Oct 7, 2019
4929c6a
Merge branch 'branch-0.11' into fea-vowel-checking
davidwendt Oct 8, 2019
472beb6
update changelog
davidwendt Oct 8, 2019
6766cee
pytest for is_vowel/is_consonant
davidwendt Oct 8, 2019
a4b741d
added multi-index and negative index support to C++ code
davidwendt Oct 8, 2019
deaffe0
fixed python black violation
davidwendt Oct 8, 2019
b89d350
merge; changelog; more pytest
davidwendt Oct 9, 2019
4c1b45c
fix isort violation
davidwendt Oct 9, 2019
754e2da
fix is_vowel test for -1
davidwendt Oct 9, 2019
7f129eb
Merge branch 'branch-0.11' into fea-vowel-checking
davidwendt Oct 9, 2019
5150ef5
Initial design of cudf::scalar
devavret Oct 11, 2019
0556276
Prototype of how to get ScalarType from concrete type
devavret Oct 11, 2019
f039b4e
extract bool8 wrapper into separate type, port applicable tests
trxcllnt Oct 9, 2019
6c8e7af
rename cudf::exp::bool8 to cudf::experimental::bool8
trxcllnt Oct 15, 2019
a8fd6c7
changelog
trxcllnt Oct 15, 2019
30d90ac
make bool8.value private
trxcllnt Oct 15, 2019
c8b53ee
add defaulted move/copy ctors and assignment operators
trxcllnt Oct 15, 2019
3204ab1
explicitly cast to uint8_t to silence nvcc lowering warnings
trxcllnt Oct 15, 2019
ddfc1c2
add hash function overloads for cudf::bool8 to ensure consistent hashing
trxcllnt Oct 15, 2019
24ad121
add GDF_BOOL8 column to hash_test.cu test
trxcllnt Oct 15, 2019
0e4a978
remove unnecessary static_cast
trxcllnt Oct 16, 2019
0e3a69b
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
trxcllnt Oct 16, 2019
f715b78
Merge branch 'branch-0.11' into fea-cudf-scalar
devavret Oct 16, 2019
0ae56de
add static_cast to bool again because the cudf::bool int8_t conversio…
trxcllnt Oct 16, 2019
7a677e7
rename wrappers/bools.hpp to wrappers/bool.hpp
trxcllnt Oct 16, 2019
6d7b6f5
Basic scalar working and one sample test
devavret Oct 16, 2019
1ee0f3e
scalar_device_view tests
devavret Oct 18, 2019
b85946c
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
trxcllnt Oct 18, 2019
fa5dac1
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
trxcllnt Oct 21, 2019
49b6bfb
Scalar factories
devavret Oct 22, 2019
74e20c7
String scalar
devavret Oct 22, 2019
b17d39e
Move constructors and stream and memory resource support
devavret Oct 22, 2019
871bb5e
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
trxcllnt Oct 23, 2019
011e878
Update cpp/include/cudf/wrappers/bool.hpp
trxcllnt Oct 23, 2019
e8056de
Merge branch 'branch-0.11' into fea-cudf-scalar
Oct 23, 2019
a00ce5a
support randomly generating bools in UniformRandomGenerator
trxcllnt Oct 23, 2019
a6673ec
clean up bool8 tests
trxcllnt Oct 23, 2019
61039a6
make explicit bool conversion operator implicit
trxcllnt Oct 23, 2019
de84c8c
merge with branch-0.11 changes
davidwendt Oct 24, 2019
7743918
fix test typo
davidwendt Oct 24, 2019
4856f95
Merge branch 'branch-0.11' into fea-vowel-checking
davidwendt Oct 24, 2019
c6d8adb
add bool8 wrapper value_type
trxcllnt Oct 24, 2019
68542ef
WIP Construct column from column_view
kaatish Oct 25, 2019
10ea87d
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
trxcllnt Oct 28, 2019
7edff73
make cudf::string_view specialization compatible with SFINAE definitions
trxcllnt Oct 28, 2019
59e5f97
add legacy bool8 wrapper value_type conversion operator overload to f…
trxcllnt Oct 28, 2019
31a6f46
clean up to use bool8::value_type, static_cast in tests
trxcllnt Oct 28, 2019
6746ba3
Merge branch 'branch-0.11' of github.com:rapidsai/cudf into fea-colum…
kaatish Oct 29, 2019
cd8ae0b
Make UniformRandomGenerator work for bool8.
jrhemstad Oct 29, 2019
b65e706
Remove brace-init to avoid ambiguity with init-list ctor.
jrhemstad Oct 29, 2019
446ee64
Make bool8 ctor implicit.
jrhemstad Oct 29, 2019
a486ff1
Add bool8 to NumericTypes list.
jrhemstad Oct 29, 2019
dba6784
Overloads for is_integral/is_arithmetic for bool8.
jrhemstad Oct 29, 2019
205f4c2
code changes and test cases
rgsl888prabhu Oct 29, 2019
0aa6b5c
CHANGELOG.md
rgsl888prabhu Oct 29, 2019
3262e41
Merge branch 'branch-0.11' into fea-cudf-scalar
devavret Oct 29, 2019
4840f78
Apply suggestions from code review
devavret Oct 29, 2019
9382362
Remove duplicate constructor and update since merge
devavret Oct 29, 2019
2470d32
Merge branch 'fea-cudf-scalar' of https://github.com/devavret/cudf in…
devavret Oct 29, 2019
a3be833
add bool8 to type_dispatcher, remove from list of non-numeric typeids
trxcllnt Oct 29, 2019
51d0a69
Merge branch 'branch-0.11' of github.com:rapidsai/cudf into libcudf++…
trxcllnt Oct 29, 2019
04e3dcd
restrict parameterized bool8 ctor to arithmetic types
trxcllnt Oct 30, 2019
c44cce4
Updates to review changes
devavret Oct 30, 2019
6df8f5b
Misc review changes
devavret Oct 30, 2019
c36b1c8
Chnage one remaining type_to_scalar_type
devavret Oct 30, 2019
cc5b04d
review changes
rgsl888prabhu Oct 30, 2019
0295f45
typo
rgsl888prabhu Oct 30, 2019
e6b1ebf
fix bool8 sorted_order tests
trxcllnt Oct 30, 2019
6047a63
remove explicit bool8 constructor
trxcllnt Oct 30, 2019
fbcdcbc
Apply suggestions from code review
devavret Oct 30, 2019
ec3d914
review changes
rgsl888prabhu Oct 30, 2019
007ac87
Merge branch 'branch-0.11' into fea-cudf-scalar
devavret Oct 30, 2019
196fa65
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
trxcllnt Oct 30, 2019
7e1ad02
Merge branch 'branch-0.11' into fea-vowel-checking
harrism Oct 31, 2019
af77fac
[REVIEW] Small cleanup: remove `== true`
codereport Oct 31, 2019
ec53dc4
Remove set_null
devavret Oct 31, 2019
dc34cdc
Documentation
devavret Oct 31, 2019
bd00d5d
Merge pull request #3261 from codereport/small-fixes
jrhemstad Oct 31, 2019
ad39c8d
Update string scalar to use stream and mr
devavret Oct 31, 2019
bedcbd3
Merge branch 'branch-0.11' into fea-cudf-scalar
devavret Oct 31, 2019
3e6770b
using cuda isnan
rgsl888prabhu Oct 31, 2019
a41fa3e
Merge branch 'branch-0.11' of github.com:rapidsai/cudf into fea-colum…
kaatish Oct 31, 2019
0cfbae2
transition guide update
devavret Oct 31, 2019
2938ce2
merging with 0.11
rgsl888prabhu Oct 31, 2019
5e25cdd
Merge pull request #2980 from davidwendt/fea-vowel-checking
davidwendt Oct 31, 2019
57ffaa3
WIP column constructor
kaatish Oct 31, 2019
497ade2
Remove friend class from column
kaatish Oct 31, 2019
75a01df
Fix passing a temporary device_vector into comparator.
jrhemstad Oct 31, 2019
c9275fe
CHANGELOG.
jrhemstad Oct 31, 2019
f29c77d
Updates for data pointer getter based on device_scalar changes
devavret Oct 31, 2019
03207a4
Fix incorrect ByteRLE encoding of literal_run=128
OlivierNV Oct 31, 2019
803cd57
Update changelog
OlivierNV Oct 31, 2019
b41f97e
code changes and test cases
rgsl888prabhu Oct 31, 2019
b5a77e8
CHANGELOG.md
rgsl888prabhu Oct 31, 2019
a2c60c9
Move predicates.hpp files to legacy
cwharris Oct 31, 2019
76b089b
fixing a bug in string scalar test
devavret Oct 31, 2019
e966afb
move is_sorted tests to legacy
cwharris Oct 31, 2019
426291c
changelog
cwharris Oct 31, 2019
c8c59d1
review changes
rgsl888prabhu Oct 31, 2019
c18a533
move is_sorted tests to correct location
cwharris Oct 31, 2019
9cce809
Merge pull request #3265 from jrhemstad/fix-ext-is-sorted
harrism Oct 31, 2019
19eba30
Merge branch 'branch-0.11' of github.com:rapidsai/cudf into cudf-2941…
cwharris Oct 31, 2019
397e65f
changelog
cwharris Oct 31, 2019
7b76c74
Merge branch 'branch-0.11' into fix-orc-writer-zerorun
OlivierNV Oct 31, 2019
5a1b86d
Merge pull request #3267 from OlivierNV/fix-orc-writer-zerorun
OlivierNV Nov 1, 2019
6bca9f6
Merge pull request #3270 from cwharris/cudf-2941-port-legacy
harrism Nov 1, 2019
ef59466
Merge branch 'branch-0.11' into 3226_adding_floating_pt_spclization
harrism Nov 1, 2019
4ecaeae
Merge pull request #3239 from rgsl888prabhu/3226_adding_floating_pt_s…
jrhemstad Nov 1, 2019
79bdb06
Fix ORC writer integer RLEv2 mode2 unsigned base value encoding
OlivierNV Nov 1, 2019
d32742b
Update changelog
OlivierNV Nov 1, 2019
987b1a3
Merge branch 'branch-0.11' into fea-cudf-scalar
devavret Nov 1, 2019
b141daf
Update JNI includes for legacy moves
jlowe Nov 1, 2019
23809a0
changelog
jlowe Nov 1, 2019
7ea2f53
column_wrapper to_host
devavret Oct 25, 2019
da72437
Merge branch 'branch-0.11' into fix-java-build
jlowe Nov 1, 2019
bc46cc9
Fix exec policy using a temporary that is immediately deleted.
jrhemstad Nov 1, 2019
07a94eb
Move to_host tests
devavret Oct 30, 2019
b1558d3
Merge branch 'fea-column-col-view-constructor' into fea-to-host
devavret Nov 1, 2019
a979799
CHANGELOG.
jrhemstad Nov 1, 2019
07d9616
Update JNI include path to legacy reduction header
jlowe Nov 1, 2019
791db43
Move to_host to column_utilities
devavret Oct 29, 2019
e3ebbc9
changelog
devavret Nov 1, 2019
21c277a
Merge pull request #3274 from OlivierNV/fix-orc-writer-rlemode2u
mjsamoht Nov 1, 2019
f80842d
Merge branch 'branch-0.11' into fix-ext-fix-is-sorted-again
Nov 1, 2019
8e0a4aa
Shutdown pinned memory init executor immediately after init
jlowe Nov 1, 2019
a1f93b6
Use daemon threads for the pinned pool init executor
jlowe Nov 1, 2019
153cb14
Copy Bitmask tests and fixes
kaatish Nov 1, 2019
9d31bc7
changelog
jlowe Nov 1, 2019
8bf6ecf
Merge branch 'branch-0.11' into pinned-pool-hang
jlowe Nov 1, 2019
e7bece2
Merge pull request #3276 from jlowe/fix-java-build
jlowe Nov 1, 2019
6722f6f
PR comments fixes
kaatish Nov 1, 2019
7b881dc
Invalid children check in mutable_column_device_view
davidwendt Nov 1, 2019
ca9b6f1
Merge pull request #3277 from jrhemstad/fix-ext-fix-is-sorted-again
Nov 1, 2019
83c6cd3
Documentation changes
kaatish Nov 1, 2019
89f9dfc
CHANGELOG fix
kaatish Nov 1, 2019
5081974
PR Comments fixes
kaatish Nov 1, 2019
3c9f011
Merge pull request #3280 from davidwendt/bug-fix-mutable-children
jrhemstad Nov 1, 2019
6fe8da7
Merge branch 'branch-0.11' into pinned-pool-hang
jlowe Nov 1, 2019
361288e
streams everywhere
devavret Nov 1, 2019
59195ca
Merge pull request #3219 from kaatish/fea-column-col-view-constructor
Nov 1, 2019
d2490c2
consolidate fixed width scalar code
devavret Nov 1, 2019
8666a59
one left review change. making base device view constructor protected
devavret Nov 1, 2019
90fa0e7
Merge branch 'branch-0.11' into fea-to-host
devavret Nov 1, 2019
fa0e577
protecting constructors
devavret Nov 1, 2019
8008896
Merge branch 'branch-0.11' into libcudf++/bool8-wrapper
jrhemstad Nov 1, 2019
169cba0
Merge branch 'branch-0.11' into 2713_null_ordering_per_column_sorting
jrhemstad Nov 1, 2019
76414d2
Merge pull request #3279 from jlowe/pinned-pool-hang
jlowe Nov 1, 2019
2a12b51
deleting assignment operators
devavret Nov 1, 2019
0748fc6
Merge pull request #3087 from trxcllnt/libcudf++/bool8-wrapper
jrhemstad Nov 1, 2019
962b025
Merge branch 'branch-0.11' into fea-cudf-scalar
jrhemstad Nov 1, 2019
9803ef9
merge issue fix
rgsl888prabhu Nov 1, 2019
9701cc7
Add num_bitmask_words and tests.
jrhemstad Nov 1, 2019
fb3e845
CHANGELOG.
jrhemstad Nov 1, 2019
3a4c48a
Merge branch 'branch-0.11' of https://github.com/rapidsai/cudf into 2…
rgsl888prabhu Nov 2, 2019
d60dd5c
test changes to accomdate bool8
rgsl888prabhu Nov 2, 2019
7b08f97
Merge branch 'branch-0.11' into fea-to-host
harrism Nov 3, 2019
68d526b
Merge pull request #3282 from jrhemstad/fea-ext-num-mask-words
harrism Nov 4, 2019
0c395cc
Merge pull request #3268 from rgsl888prabhu/2713_null_ordering_per_co…
harrism Nov 4, 2019
0c588c7
Move rolling files to legacy
harrism Nov 4, 2019
8e66527
Changelog for #3287
harrism Nov 4, 2019
4563524
Add support for bool8
devavret Nov 4, 2019
c2cc3c7
Merge pull request #3068 from devavret/fea-cudf-scalar
devavret Nov 4, 2019
d8a8bc0
Merge branch 'branch-0.11' into fea-to-host
devavret Nov 4, 2019
5247777
use num_bitmask_words to calc bytes to copy
devavret Nov 4, 2019
87c87ca
Merge pull request #3278 from devavret/fea-to-host
devavret Nov 4, 2019
404dd6f
Merge pull request #3287 from harrism/fea-move-rolling-legacy
harrism Nov 4, 2019
3ad8ddd
Merge branch 'branch-0.11' of github.com:rapidsai/cudf into port-libc…
trxcllnt Nov 4, 2019
1461fe0
parameterize bool8 operator overloads
trxcllnt Nov 5, 2019
22 changes: 21 additions & 1 deletion CHANGELOG.md
Expand Up @@ -2,22 +2,27 @@

## New Features

- PR #3011 Added libcudf++ transition guide
- PR #2930 JSON Reader: Support ARROW_RANDOM_FILE input
- PR #2956 Add `cudf::stack` and `cudf::tile`
- PR #2980 Added nvtext is_vowel/is_consonant functions
- PR #2987 Add `inplace` arg to `DataFrame.reset_index` and `Series`
- PR #3011 Added libcudf++ transition guide
- PR #3129 Add strings column factory from `std::vector`s
- PR #3054 Add parquet reader support for decimal data types
- PR #3022 adds DataFrame.astype for cuDF dataframes
- PR #2962 Add isnull(), notnull() and related functions
- PR #3025 Move search files to legacy
- PR #3068 Add `scalar` class
- PR #3094 Adding `any` and `all` support from libcudf
- PR #3130 Define and implement new `column_wrapper`
- PR #3143 Define and implement new copying APIs `slice` and `split`
- PR #3161 Move merge files to legacy
- PR #3079 Added support to write ORC files given a local path
- PR #3192 Add dtype param to cast `DataFrame` on init
- PR #3223 Java expose underlying buffers
- PR #3278 Add `to_host` utility to copy `column_view` to host
- PR #3087 Add new cudf::experimental bool8 wrapper
- PR #3219 Construct column from column_view

## Improvements

Expand Down Expand Up @@ -71,6 +76,13 @@
- PR #3245 Move binaryop files to legacy
- PR #3241 Move stream_compaction files to legacy
- PR #3166 Move reductions to legacy
- PR #3261 Small cleanup: remove `== true`
- PR #3268 Adding null ordering per column feature when sorting
- PR #3239 Adding floating point specialization to comparators for NaNs
- PR #3270 Move predicates files to legacy
- PR #3282 Add `num_bitmask_words`
- PR #3287 Move rolling windows files to legacy


## Bug Fixes

Expand All @@ -91,8 +103,16 @@
- PR #3218 Fixes `row_lexicographic_comparator` issue with handling two tables
- PR #3228 Default initialize RMM when Java native dependencies are loaded
- PR #3236 Fix Numba 0.46+/CuPy 6.3 interface compatibility
- PR #3276 Update JNI includes for legacy moves
- PR #3256 Fix orc writer crash with multiple string columns
- PR #3211 Fix breaking change caused by rapidsai/rmm#167
- PR #3265 Fix dangling pointer in `is_sorted`
- PR #3267 ORC writer: fix incorrect ByteRLE encoding of long literal runs
- PR #3277 Fix invalid reference to deleted temporary in `is_sorted`.
- PR #3274 ORC writer: fix integer RLEv2 mode2 unsigned base value encoding
- PR #3279 Fix shutdown hang issues with pinned memory pool init executor
- PR #3280 Invalid children check in mutable_column_device_view


# cuDF 0.10.0 (16 Oct 2019)

4 changes: 2 additions & 2 deletions conda/recipes/libcudf/meta.yaml
Expand Up @@ -71,10 +71,10 @@ test:
- test -f $PREFIX/include/cudf/ipc.hpp
- test -f $PREFIX/include/cudf/legacy/merge.hpp
- test -f $PREFIX/include/cudf/legacy/join.hpp
- test -f $PREFIX/include/cudf/predicates.hpp
- test -f $PREFIX/include/cudf/legacy/predicates.hpp
- test -f $PREFIX/include/cudf/legacy/reduction.hpp
- test -f $PREFIX/include/cudf/legacy/replace.hpp
- test -f $PREFIX/include/cudf/rolling.hpp
- test -f $PREFIX/include/cudf/legacy/rolling.hpp
- test -f $PREFIX/include/cudf/legacy/search.hpp
- test -f $PREFIX/include/cudf/legacy/stream_compaction.hpp
- test -f $PREFIX/include/cudf/legacy/transform.hpp
15 changes: 9 additions & 6 deletions cpp/CMakeLists.txt
Expand Up @@ -366,17 +366,17 @@ add_library(cudf
src/strings/nvcategory_util.cpp
src/join/legacy/joining.cu
src/orderby/legacy/orderby.cu
src/predicates/is_sorted.cu
src/predicates/legacy/is_sorted.cu
src/sort/legacy/digitize.cu
src/groupby/hash/legacy/groupby.cu
src/groupby/sort/legacy/sort_helper.cu
src/groupby/sort/legacy/groupby.cu
src/groupby/legacy/groupby_without_aggregation.cu
src/groupby/common/legacy/aggregation_requests.cpp
src/rolling/rolling.cu
src/rolling/jit/code/kernel.cpp
src/rolling/jit/code/operation.cpp
src/rolling/jit/util/type.cpp
src/rolling/legacy/rolling.cu
src/rolling/legacy/jit/code/kernel.cpp
src/rolling/legacy/jit/code/operation.cpp
src/rolling/legacy/jit/util/type.cpp
src/binaryop/legacy/binaryop.cpp
src/binaryop/legacy/compiled/binary_ops.cu
src/binaryop/legacy/jit/code/kernel.cpp
Expand Down Expand Up @@ -481,11 +481,14 @@ add_library(cudf
src/bitmask/null_mask.cu
src/sort/sort.cu
src/strings/strings_column_factories.cu
src/strings/strings_scalar_factories.cpp
src/strings/strings_column_view.cu
src/strings/utilities.cu
src/strings/copying/copying.cu
src/strings/sorting/sorting.cu
src/column/legacy/interop.cpp)
src/column/legacy/interop.cpp
src/scalar/scalar.cpp
src/scalar/scalar_factories.cpp)

# Rename installation to proper names for later finding
set_target_properties(libNVStrings PROPERTIES OUTPUT_NAME "NVStrings")
39 changes: 39 additions & 0 deletions cpp/custrings/tests/test_text.cu
@@ -1,6 +1,8 @@
#include <gtest/gtest.h>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>

#include "nvstrings/NVStrings.h"
#include "nvstrings/NVText.h"
Expand Down Expand Up @@ -186,6 +188,43 @@ TEST_F(TestText, PorterStemmerMeasure)
NVStrings::destroy(strs);
}

TEST_F(TestText, VowelsAndConsonants)
{
std::vector<const char*> hstrs{ "abandon", nullptr, "abbey", "cleans",
"trouble", "", "yearly" };
NVStrings* strs = NVStrings::create_from_array(hstrs.data(),hstrs.size());

thrust::device_vector<bool> results(hstrs.size(),0);
{
NVText::is_letter(*strs, nullptr, nullptr, NVText::vowel, 5, results.data().get());
bool expected[] = { true, false, false, false, false, false, true };
for( unsigned int idx=0; idx < hstrs.size(); ++idx )
EXPECT_EQ(results[idx],expected[idx]);
}
{
NVText::is_letter(*strs, nullptr, nullptr, NVText::consonant, 5, results.data().get());
bool expected[] = { false, false, false, true, true, false, false };
for( unsigned int idx=0; idx < hstrs.size(); ++idx )
EXPECT_EQ(results[idx],expected[idx]);
}
thrust::device_vector<int> indices(hstrs.size());
thrust::sequence( thrust::device, indices.begin(), indices.end() );
indices[hstrs.size()-1] = -1; // throw in a negative index too
{
NVText::is_letter(*strs, nullptr, nullptr, NVText::vowel, indices.data().get(), results.data().get());
bool expected[] = { true, false, false, true, false, false, true };
for( unsigned int idx=0; idx < hstrs.size(); ++idx )
EXPECT_EQ(results[idx],expected[idx]);
}
{
NVText::is_letter(*strs, nullptr, nullptr, NVText::consonant, indices.data().get(), results.data().get());
bool expected[] = { false, false, true, false, true, false, false };
for( unsigned int idx=0; idx < hstrs.size(); ++idx )
EXPECT_EQ(results[idx],expected[idx]);
}

NVStrings::destroy(strs);
}

TEST_F(TestText, ScatterCount)
{
99 changes: 93 additions & 6 deletions cpp/custrings/text/stemmer.cu
Expand Up @@ -19,6 +19,12 @@
#include <thrust/for_each.h>
#include <rmm/rmm.h>
#include <rmm/thrust_rmm_allocator.h>

// NOTE: These are cudf headers. Please be cautious.
// Using anything from these headers besides macros or typedefs
// will not work because this module is built before libcudf
// and therefore will not be able to link to any functions there.
// This module will be reworked appropriately in the future.
#include <cudf/utilities/error.hpp>

#include "nvstrings/NVStrings.h"
Expand All @@ -27,14 +33,15 @@
#include "../custring_view.cuh"
#include "../util.h"

struct porter_stemmer_measure_fn
struct stemmer_base_fn
{
custring_view_array d_strings;
custring_view* d_vowels;
Char y_char;
unsigned int* d_results;

__device__ bool is_consonant( custring_view* dstr, int index )
stemmer_base_fn( custring_view* d_vowels, Char y_char )
: d_vowels(d_vowels), y_char(y_char) {}

__device__ bool is_consonant( custring_view* dstr, int index ) const
{
Char ch = dstr->at(index);
if( d_vowels->find(ch) >= 0 )
Expand All @@ -44,6 +51,16 @@ struct porter_stemmer_measure_fn
ch = dstr->at(index-1); // only if previous char
return d_vowels->find(ch)>=0; // is not a consonant
}
};

struct porter_stemmer_measure_fn : public stemmer_base_fn
{
custring_view_array d_strings;
unsigned int* d_results;

porter_stemmer_measure_fn( custring_view* d_vowels, Char y_char,
custring_view_array d_strings, unsigned int* d_results )
: stemmer_base_fn(d_vowels,y_char), d_strings(d_strings), d_results(d_results) {}

__device__ void operator()(unsigned int idx)
{
Expand Down Expand Up @@ -92,7 +109,7 @@ unsigned int NVText::porter_stemmer_measure(NVStrings& strs, const char* vowels,

// do the measure
thrust::for_each_n(execpol->on(0), thrust::make_counting_iterator<unsigned int>(0), count,
porter_stemmer_measure_fn{d_strings,d_vowels,char_y,d_results});
porter_stemmer_measure_fn{d_vowels,char_y,d_strings,d_results});

// done
if( !bdevmem )
Expand All @@ -102,4 +119,74 @@ unsigned int NVText::porter_stemmer_measure(NVStrings& strs, const char* vowels,
}
RMM_FREE(d_vowels,0);
return 0;
}
}

//
unsigned int is_letter(NVStrings& strs, const char* vowels, const char* y_char,
NVText::letter_type ltype, int index, int* d_indices, bool* results, bool bdevmem )
{
unsigned int count = strs.size();
if( count==0 )
return 0; // nothing to do
auto execpol = rmm::exec_policy(0);
// setup results vector
bool* d_results = results;
if( !bdevmem )
d_results = device_alloc<bool>(count,0);
if( vowels==nullptr )
vowels = "aeiou";
custring_view* d_vowels = custring_from_host(vowels);
if( y_char==nullptr )
y_char = "y";
Char char_y;
custring_view::char_to_Char(y_char,char_y);

// get the string pointers
rmm::device_vector<custring_view*> strings(count,nullptr);
custring_view** d_strings = strings.data().get();
strs.create_custring_index(d_strings);

//
stemmer_base_fn pfn{d_vowels,char_y};
thrust::transform(execpol->on(0),
thrust::make_counting_iterator<unsigned int>(0),
thrust::make_counting_iterator<unsigned int>(count),
d_results,
[d_strings, pfn, ltype, index, d_indices] __device__ (unsigned int idx) {
custring_view* d_str = d_strings[idx];
if( !d_str )
return false;
int position = index;
if( d_indices )
position = d_indices[idx];
int length = static_cast<int>(d_str->length());
if( (position >= length) || (position < -length) )
return false;
position = (position + length) % length; // handles positive or negative index
return pfn.is_consonant(d_str,position) ? ltype==NVText::consonant : ltype==NVText::vowel;
});

// done
if( !bdevmem )
{
CUDA_TRY( cudaMemcpyAsync(results,d_results,count*sizeof(bool),cudaMemcpyDeviceToHost))
RMM_FREE(d_results,0);
}
RMM_FREE(d_vowels,0);
return 0;
}


// check individual characters are vowels or consonants
unsigned int NVText::is_letter(NVStrings& strs, const char* vowels, const char* y_char,
NVText::letter_type ltype, int position, bool* results, bool bdevmem )
{
return ::is_letter(strs,vowels,y_char,ltype,position,nullptr,results,bdevmem);
}

//
unsigned int NVText::is_letter(NVStrings& strs, const char* vowels, const char* y_char,
NVText::letter_type ltype, int* d_indices, bool* results, bool bdevmem )
{
return ::is_letter(strs,vowels,y_char,ltype,0,d_indices,results,bdevmem);
}
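
The position handling and vowel/consonant rule used by `is_letter` above can be sketched on the host as follows. This is a minimal illustration, not the library code: the `sketch` namespace, `normalize_index`, and the use of `std::string` with single-byte characters are stand-ins for `custring_view` and `Char`, and the middle branch of `is_consonant` is collapsed in the diff, so the `ch != y_char || index == 0` check here is an assumption.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical host-side stand-ins, not part of NVText

enum class letter_type { vowel, consonant };

// Maps a possibly negative position onto [0, length).
// Returns false when the position is out of range (same bounds test as the diff).
bool normalize_index(int position, int length, int& out)
{
    if (position >= length || position < -length) return false;
    out = (position + length) % length;  // handles positive or negative index
    return true;
}

// Mirrors stemmer_base_fn::is_consonant for single-byte characters.
bool is_consonant(const std::string& s, int index,
                  const std::string& vowels = "aeiou", char y_char = 'y')
{
    char ch = s[index];
    if (vowels.find(ch) != std::string::npos) return false;  // plain vowel
    if (ch != y_char || index == 0) return true;  // assumed: this branch is elided in the diff
    // 'y' is a consonant only if the previous character is a vowel
    return vowels.find(s[index - 1]) != std::string::npos;
}

// A null pointer stands in for a null string entry; both letter types yield false.
bool is_letter(const std::string* s, letter_type ltype, int position)
{
    if (s == nullptr) return false;
    int index = 0;
    if (!normalize_index(position, static_cast<int>(s->size()), index)) return false;
    return is_consonant(*s, index) ? ltype == letter_type::consonant
                                   : ltype == letter_type::vowel;
}

}  // namespace sketch
```

Because out-of-range positions, empty strings, and null entries all return `false` for both letter types, a `false` result does not by itself mean "the other kind of letter", which matches the `"abbey"` and `""` rows in the `VowelsAndConsonants` test.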
29 changes: 28 additions & 1 deletion cpp/docs/TRANSITIONGUIDE.md
Expand Up @@ -56,7 +56,30 @@ rmm::device_buffer custom_buff(100, &mr); // Allocates 100 bytes from the custom

## `cudf::scalar`

// TODO
A `cudf::scalar` is an object that represents a single, nullable value of any type currently supported by cudf. Each kind of value is represented by its own scalar class, all of which derive from `cudf::scalar`. For example, a `numeric_scalar` holds a single numerical value and a `string_scalar` holds a single string. The data for the stored value resides in device memory.

|Value type|Scalar class|Notes|
|-|-|-|
|numeric|`numeric_scalar<T>` where `T` can be `int8_t`, `int16_t`, `int32_t`, `int64_t`, `float` or `double`||
|timestamp|`timestamp_scalar<T>` where `T` can be `timestamp_D`, `timestamp_s`...||
|string|`string_scalar`|This class object is immutable|

### Construction
`scalar`s can be created using either their respective constructors or using factory functions like `make_numeric_scalar()`, `make_timestamp_scalar()` or `make_string_scalar()`.

### Casting
All the factory methods return a `unique_ptr<scalar>`, which must be statically downcast to its respective scalar class type before its value can be accessed. Its validity (nullness) can be accessed without casting.
Generally, the value is accessed from a function that is aware of the value type, e.g. a functor dispatched from `type_dispatcher`. To cast to the requisite scalar class type given the value type, use the mapping utility `scalar_type_t` provided in `type_dispatcher.hpp`:
```c++
//unique_ptr<scalar> s = make_numeric_scalar(...);

using ScalarType = cudf::experimental::scalar_type_t<T>;
// ScalarType is now numeric_scalar<T>
auto s1 = static_cast<ScalarType *>(s.get());
```
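
To make the mapping idea concrete, here is a minimal host-only sketch of how a value type can be mapped to a scalar class and used for the downcast. Everything in the `sketch` namespace is a hypothetical stand-in written for this illustration; it is not the cudf implementation, and unlike the real classes it keeps the value in host memory.

```c++
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical stand-ins, not the cudf classes

// Base class: validity is readable without knowing the value type.
class scalar {
 public:
  virtual ~scalar() = default;
  bool is_valid() const { return valid_; }
 protected:
  explicit scalar(bool valid) : valid_(valid) {}
 private:
  bool valid_;
};

template <typename T>
class numeric_scalar : public scalar {
 public:
  explicit numeric_scalar(T value, bool valid = true) : scalar(valid), value_(value) {}
  T value() const { return value_; }
 private:
  T value_;
};

class string_scalar : public scalar {
 public:
  explicit string_scalar(std::string value, bool valid = true)
    : scalar(valid), value_(std::move(value)) {}
  const std::string& value() const { return value_; }
 private:
  std::string value_;
};

// Value type -> scalar class mapping (plays the role of scalar_type_t).
template <typename T> struct scalar_type { using type = numeric_scalar<T>; };
template <> struct scalar_type<std::string> { using type = string_scalar; };
template <typename T> using scalar_type_t = typename scalar_type<T>::type;

// Factory returning the type-erased base pointer, like the make_*_scalar functions.
template <typename T>
std::unique_ptr<scalar> make_scalar(T value) {
  return std::unique_ptr<scalar>(new scalar_type_t<T>(std::move(value)));
}

// A type-aware function: downcast first, then read the value.
template <typename T>
T read_value(const scalar& s) {
  using ScalarType = scalar_type_t<T>;
  return static_cast<const ScalarType&>(s).value();
}

}  // namespace sketch
```

The `static_cast` is only safe when `T` matches the type the scalar was created with, which is why the value is normally read from code that already knows `T`, such as a dispatched functor.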

### Passing to device
Each scalar type has a corresponding non-owning device view class which allows access to the value and its validity from the device. This can be obtained using the function `get_scalar_device_view(ScalarType s)`. Note that a device view is not provided for a base scalar object, only for the derived typed scalar class objects.

## `cudf::column`

Expand Down Expand Up @@ -240,6 +263,8 @@ The preferred style for how inputs are passed in and outputs are returned is the
- `column_view const&`
- Tables:
- `table_view const&`
- Scalar:
- `scalar const&`
- Everything else:
- Trivial or inexpensively copied types
- Pass by value
Expand All @@ -258,6 +283,8 @@ The preferred style for how inputs are passed in and outputs are returned is the
- `std::unique_ptr<column>`
- Tables:
- `std::unique_ptr<table>`
- Scalars:
- `std::unique_ptr<scalar>`


### Multiple Return Values
4 changes: 2 additions & 2 deletions cpp/include/cudf/detail/gather.cuh
Expand Up @@ -73,8 +73,8 @@ __global__ void gather_bitmask_kernel(table_device_view source_table,
size_type destination_row = destination_row_base + threadIdx.x;

const bool thread_active = destination_row < destination_col.size();
size_type source_row =
thread_active ? gather_map[destination_row] : 0;
size_type source_row = thread_active ?
static_cast<size_type>(gather_map[destination_row]) : 0;
Review comment from the PR author:
@shwina I wasn't sure if this is right -- isn't gather_map an Iterator<T> rather than an iterator of size_type?


bool source_bit_is_valid = source_col.has_nulls()
? source_col.is_valid_nocheck(source_row)
File renamed without changes.
Expand Up @@ -17,8 +17,6 @@
#ifndef ROLLING_HPP
#define ROLLING_HPP

#include "cudf.h"

namespace cudf {
/* --------------------------------------------------------------------------*
* @brief Computes the rolling window function of the values in a column.