diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6d4bdfb8d98..dda2e02f593 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,8 +3,244 @@
Please see https://github.com/rapidsai/cudf/releases/tag/v22.04.00a for the latest changes to this development branch.
# cuDF 22.02.00 (Date TBD)
+# cuDF 22.02.00 (2 Feb 2022)
+
+## 🚨 Breaking Changes
+
+- ORC writer API changes for granular statistics ([#10058](https://github.com/rapidsai/cudf/pull/10058)) [@mythrocks](https://github.com/mythrocks)
+- `decimal128` Support for `to/from_arrow` ([#9986](https://github.com/rapidsai/cudf/pull/9986)) [@codereport](https://github.com/codereport)
+- Remove deprecated method `one_hot_encoding` ([#9977](https://github.com/rapidsai/cudf/pull/9977)) [@isVoid](https://github.com/isVoid)
+- Remove str.subword_tokenize ([#9968](https://github.com/rapidsai/cudf/pull/9968)) [@VibhuJawa](https://github.com/VibhuJawa)
+- Remove deprecated `method` parameter from `merge` and `join`. ([#9944](https://github.com/rapidsai/cudf/pull/9944)) [@bdice](https://github.com/bdice)
+- Remove deprecated method DataFrame.hash_columns. ([#9943](https://github.com/rapidsai/cudf/pull/9943)) [@bdice](https://github.com/bdice)
+- Remove deprecated method Series.hash_encode. ([#9942](https://github.com/rapidsai/cudf/pull/9942)) [@bdice](https://github.com/bdice)
+- Refactoring ceil/round/floor code for datetime64 types ([#9926](https://github.com/rapidsai/cudf/pull/9926)) [@mayankanand007](https://github.com/mayankanand007)
+- Introduce `nan_as_null` parameter for `cudf.Index` ([#9893](https://github.com/rapidsai/cudf/pull/9893)) [@galipremsagar](https://github.com/galipremsagar)
+- Add regex_flags parameter to strings replace_re functions ([#9878](https://github.com/rapidsai/cudf/pull/9878)) [@davidwendt](https://github.com/davidwendt)
+- Break tie for `top` categorical columns in `Series.describe` ([#9867](https://github.com/rapidsai/cudf/pull/9867)) [@isVoid](https://github.com/isVoid)
+- Add partitioning support in parquet writer ([#9810](https://github.com/rapidsai/cudf/pull/9810)) [@devavret](https://github.com/devavret)
+- Move `drop_duplicates`, `drop_na`, `_gather`, `take` to IndexedFrame and create their `_base_index` counterparts ([#9807](https://github.com/rapidsai/cudf/pull/9807)) [@isVoid](https://github.com/isVoid)
+- Raise temporary error for `decimal128` types in parquet reader ([#9804](https://github.com/rapidsai/cudf/pull/9804)) [@galipremsagar](https://github.com/galipremsagar)
+- Change default `dtype` of all nulls column from `float` to `object` ([#9803](https://github.com/rapidsai/cudf/pull/9803)) [@galipremsagar](https://github.com/galipremsagar)
+- Remove unused masked udf cython/c++ code ([#9792](https://github.com/rapidsai/cudf/pull/9792)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Pick smallest decimal type with required precision in ORC reader ([#9775](https://github.com/rapidsai/cudf/pull/9775)) [@vuule](https://github.com/vuule)
+- Add decimal128 support to Parquet reader and writer ([#9765](https://github.com/rapidsai/cudf/pull/9765)) [@vuule](https://github.com/vuule)
+- Refactor TableTest assertion methods to a separate utility class ([#9762](https://github.com/rapidsai/cudf/pull/9762)) [@jlowe](https://github.com/jlowe)
+- Use cuFile direct device reads/writes by default in cuIO ([#9722](https://github.com/rapidsai/cudf/pull/9722)) [@vuule](https://github.com/vuule)
+- Match pandas scalar result types in reductions ([#9717](https://github.com/rapidsai/cudf/pull/9717)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Add parameters to control row group size in Parquet writer ([#9677](https://github.com/rapidsai/cudf/pull/9677)) [@vuule](https://github.com/vuule)
+- Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. ([#9588](https://github.com/rapidsai/cudf/pull/9588)) [@bdice](https://github.com/bdice)
+- Add support for `decimal128` in cudf python ([#9533](https://github.com/rapidsai/cudf/pull/9533)) [@galipremsagar](https://github.com/galipremsagar)
+- Implement `lists::index_of()` to find positions in list rows ([#9510](https://github.com/rapidsai/cudf/pull/9510)) [@mythrocks](https://github.com/mythrocks)
+- Rewriting row/column conversions for Spark <-> cudf data conversions ([#8444](https://github.com/rapidsai/cudf/pull/8444)) [@hyperbolic2346](https://github.com/hyperbolic2346)
-Please see https://github.com/rapidsai/cudf/releases/tag/v22.02.00a for the latest changes to this development branch.
+## 🐛 Bug Fixes
+
+- Add check for negative stripe index in ORC reader ([#10074](https://github.com/rapidsai/cudf/pull/10074)) [@vuule](https://github.com/vuule)
+- Update Java tests to expect DECIMAL128 from Arrow ([#10073](https://github.com/rapidsai/cudf/pull/10073)) [@jlowe](https://github.com/jlowe)
+- Avoid index materialization when `DataFrame` is created with un-named `Series` objects ([#10071](https://github.com/rapidsai/cudf/pull/10071)) [@galipremsagar](https://github.com/galipremsagar)
+- fix gcc 11 compilation errors ([#10067](https://github.com/rapidsai/cudf/pull/10067)) [@rongou](https://github.com/rongou)
+- Fix `columns` ordering issue in parquet reader ([#10066](https://github.com/rapidsai/cudf/pull/10066)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix dataframe setitem with `ndarray` types ([#10056](https://github.com/rapidsai/cudf/pull/10056)) [@galipremsagar](https://github.com/galipremsagar)
+- Remove implicit copy due to conversion from cudf::size_type and size_t ([#10045](https://github.com/rapidsai/cudf/pull/10045)) [@robertmaynard](https://github.com/robertmaynard)
+- Include <optional> in headers that use std::optional ([#10044](https://github.com/rapidsai/cudf/pull/10044)) [@robertmaynard](https://github.com/robertmaynard)
+- Fix repr and concat of `StructColumn` ([#10042](https://github.com/rapidsai/cudf/pull/10042)) [@galipremsagar](https://github.com/galipremsagar)
+- Include row group level stats when writing ORC files ([#10041](https://github.com/rapidsai/cudf/pull/10041)) [@vuule](https://github.com/vuule)
+- build.sh respects the `--build_metrics` and `--incl_cache_stats` flags ([#10035](https://github.com/rapidsai/cudf/pull/10035)) [@robertmaynard](https://github.com/robertmaynard)
+- Fix memory leaks in JNI native code. ([#10029](https://github.com/rapidsai/cudf/pull/10029)) [@mythrocks](https://github.com/mythrocks)
+- Update JNI to use new arena mr constructor ([#10027](https://github.com/rapidsai/cudf/pull/10027)) [@rongou](https://github.com/rongou)
+- Fix null check when comparing structs in `arg_min` operation of reduction/groupby ([#10026](https://github.com/rapidsai/cudf/pull/10026)) [@ttnghia](https://github.com/ttnghia)
+- Wrap CI script shell variables in quotes to fix local testing. ([#10018](https://github.com/rapidsai/cudf/pull/10018)) [@bdice](https://github.com/bdice)
+- cudftestutil no longer propagates compile flags to external users ([#10017](https://github.com/rapidsai/cudf/pull/10017)) [@robertmaynard](https://github.com/robertmaynard)
+- Remove `CUDA_DEVICE_CALLABLE` macro usage ([#10015](https://github.com/rapidsai/cudf/pull/10015)) [@hyperbolic2346](https://github.com/hyperbolic2346)
+- Add missing list filling header in meta.yaml ([#10007](https://github.com/rapidsai/cudf/pull/10007)) [@devavret](https://github.com/devavret)
+- Fix `conda` recipes for `custreamz` & `cudf_kafka` ([#10003](https://github.com/rapidsai/cudf/pull/10003)) [@ajschmidt8](https://github.com/ajschmidt8)
+- Fix matching regex word-boundary (\b) in strings replace ([#9997](https://github.com/rapidsai/cudf/pull/9997)) [@davidwendt](https://github.com/davidwendt)
+- Fix null check when comparing structs in `min` and `max` reduction/groupby operations ([#9994](https://github.com/rapidsai/cudf/pull/9994)) [@ttnghia](https://github.com/ttnghia)
+- Fix octal pattern matching in regex string ([#9993](https://github.com/rapidsai/cudf/pull/9993)) [@davidwendt](https://github.com/davidwendt)
+- `decimal128` Support for `to/from_arrow` ([#9986](https://github.com/rapidsai/cudf/pull/9986)) [@codereport](https://github.com/codereport)
+- Fix groupby shift/diff/fill after selecting from a `GroupBy` ([#9984](https://github.com/rapidsai/cudf/pull/9984)) [@shwina](https://github.com/shwina)
+- Fix the overflow problem of decimal rescale ([#9966](https://github.com/rapidsai/cudf/pull/9966)) [@sperlingxx](https://github.com/sperlingxx)
+- Use default value for decimal precision in parquet writer when not specified ([#9963](https://github.com/rapidsai/cudf/pull/9963)) [@devavret](https://github.com/devavret)
+- Fix cudf java build error. ([#9958](https://github.com/rapidsai/cudf/pull/9958)) [@firestarman](https://github.com/firestarman)
+- Use gpuci_mamba_retry to install local artifacts. ([#9951](https://github.com/rapidsai/cudf/pull/9951)) [@bdice](https://github.com/bdice)
+- Fix regression HostColumnVectorCore requiring native libs ([#9948](https://github.com/rapidsai/cudf/pull/9948)) [@jlowe](https://github.com/jlowe)
+- Rename aggregate_metadata in writer to fix name collision ([#9938](https://github.com/rapidsai/cudf/pull/9938)) [@devavret](https://github.com/devavret)
+- Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. ([#9931](https://github.com/rapidsai/cudf/pull/9931)) [@nvdbaranec](https://github.com/nvdbaranec)
+- Resolve racecheck errors in ORC kernels ([#9916](https://github.com/rapidsai/cudf/pull/9916)) [@vuule](https://github.com/vuule)
+- Fix the java build after parquet partitioning support ([#9908](https://github.com/rapidsai/cudf/pull/9908)) [@revans2](https://github.com/revans2)
+- Fix compilation of benchmark for parquet writer. ([#9905](https://github.com/rapidsai/cudf/pull/9905)) [@bdice](https://github.com/bdice)
+- Fix a memcheck error in ORC writer ([#9896](https://github.com/rapidsai/cudf/pull/9896)) [@vuule](https://github.com/vuule)
+- Introduce `nan_as_null` parameter for `cudf.Index` ([#9893](https://github.com/rapidsai/cudf/pull/9893)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix fallback to sort aggregation for grouping only hash aggregate ([#9891](https://github.com/rapidsai/cudf/pull/9891)) [@abellina](https://github.com/abellina)
+- Add zlib to cudfjni link when using static libcudf library dependency ([#9890](https://github.com/rapidsai/cudf/pull/9890)) [@jlowe](https://github.com/jlowe)
+- TimedeltaIndex constructor raises an AttributeError. ([#9884](https://github.com/rapidsai/cudf/pull/9884)) [@skirui-source](https://github.com/skirui-source)
+- Fix cudf.Scalar string datetime construction ([#9875](https://github.com/rapidsai/cudf/pull/9875)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Load libcufile.so with RTLD_NODELETE flag ([#9872](https://github.com/rapidsai/cudf/pull/9872)) [@vuule](https://github.com/vuule)
+- Break tie for `top` categorical columns in `Series.describe` ([#9867](https://github.com/rapidsai/cudf/pull/9867)) [@isVoid](https://github.com/isVoid)
+- Fix null handling for structs `min` and `arg_min` in groupby, groupby scan, reduction, and inclusive_scan ([#9864](https://github.com/rapidsai/cudf/pull/9864)) [@ttnghia](https://github.com/ttnghia)
+- Add one-level list encoding support in parquet reader ([#9848](https://github.com/rapidsai/cudf/pull/9848)) [@PointKernel](https://github.com/PointKernel)
+- Fix an out-of-bounds read in validity copying in contiguous_split. ([#9842](https://github.com/rapidsai/cudf/pull/9842)) [@nvdbaranec](https://github.com/nvdbaranec)
+- Fix join of MultiIndex to Index with one column and overlapping name. ([#9830](https://github.com/rapidsai/cudf/pull/9830)) [@vyasr](https://github.com/vyasr)
+- Fix caching in `Series.applymap` ([#9821](https://github.com/rapidsai/cudf/pull/9821)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Enforce boolean `ascending` for dask-cudf `sort_values` ([#9814](https://github.com/rapidsai/cudf/pull/9814)) [@charlesbluca](https://github.com/charlesbluca)
+- Fix ORC writer crash with empty input columns ([#9808](https://github.com/rapidsai/cudf/pull/9808)) [@vuule](https://github.com/vuule)
+- Change default `dtype` of all nulls column from `float` to `object` ([#9803](https://github.com/rapidsai/cudf/pull/9803)) [@galipremsagar](https://github.com/galipremsagar)
+- Load native dependencies when Java ColumnView is loaded ([#9800](https://github.com/rapidsai/cudf/pull/9800)) [@jlowe](https://github.com/jlowe)
+- Fix dtype-argument bug in dask_cudf read_csv ([#9796](https://github.com/rapidsai/cudf/pull/9796)) [@rjzamora](https://github.com/rjzamora)
+- Fix overflow for min calculation in strings::from_timestamps ([#9793](https://github.com/rapidsai/cudf/pull/9793)) [@revans2](https://github.com/revans2)
+- Fix memory error due to lambda return type deduction limitation ([#9778](https://github.com/rapidsai/cudf/pull/9778)) [@karthikeyann](https://github.com/karthikeyann)
+- Revert regex $/EOL end-of-string new-line special case handling ([#9774](https://github.com/rapidsai/cudf/pull/9774)) [@davidwendt](https://github.com/davidwendt)
+- Fix missing streams ([#9767](https://github.com/rapidsai/cudf/pull/9767)) [@karthikeyann](https://github.com/karthikeyann)
+- Fix make_empty_scalar_like on list_type ([#9759](https://github.com/rapidsai/cudf/pull/9759)) [@sperlingxx](https://github.com/sperlingxx)
+- Update cmake and conda to 22.02 ([#9746](https://github.com/rapidsai/cudf/pull/9746)) [@devavret](https://github.com/devavret)
+- Fix out-of-bounds memory write in decimal128-to-string conversion ([#9740](https://github.com/rapidsai/cudf/pull/9740)) [@davidwendt](https://github.com/davidwendt)
+- Match pandas scalar result types in reductions ([#9717](https://github.com/rapidsai/cudf/pull/9717)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Fix regex non-multiline EOL/$ matching strings ending with a new-line ([#9715](https://github.com/rapidsai/cudf/pull/9715)) [@davidwendt](https://github.com/davidwendt)
+- Fixed build by adding more checks for int8, int16 ([#9707](https://github.com/rapidsai/cudf/pull/9707)) [@razajafri](https://github.com/razajafri)
+- Fix `null` handling when `boolean` dtype is passed ([#9691](https://github.com/rapidsai/cudf/pull/9691)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix stream usage in `segmented_gather()` ([#9679](https://github.com/rapidsai/cudf/pull/9679)) [@mythrocks](https://github.com/mythrocks)
+
+## 📖 Documentation
+
+- Update `decimal` dtypes related docs entries ([#10072](https://github.com/rapidsai/cudf/pull/10072)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix regex doc describing hexadecimal escape characters ([#10009](https://github.com/rapidsai/cudf/pull/10009)) [@davidwendt](https://github.com/davidwendt)
+- Fix cudf compilation instructions. ([#9956](https://github.com/rapidsai/cudf/pull/9956)) [@esoha-nvidia](https://github.com/esoha-nvidia)
+- Fix see also links for IO APIs ([#9895](https://github.com/rapidsai/cudf/pull/9895)) [@galipremsagar](https://github.com/galipremsagar)
+- Fix build instructions for libcudf doxygen ([#9837](https://github.com/rapidsai/cudf/pull/9837)) [@davidwendt](https://github.com/davidwendt)
+- Fix some doxygen warnings and add missing documentation ([#9770](https://github.com/rapidsai/cudf/pull/9770)) [@karthikeyann](https://github.com/karthikeyann)
+- update cuda version in local build ([#9736](https://github.com/rapidsai/cudf/pull/9736)) [@karthikeyann](https://github.com/karthikeyann)
+- Fix doxygen for enum types in libcudf ([#9724](https://github.com/rapidsai/cudf/pull/9724)) [@davidwendt](https://github.com/davidwendt)
+- Spell check fixes ([#9682](https://github.com/rapidsai/cudf/pull/9682)) [@karthikeyann](https://github.com/karthikeyann)
+- Fix links in C++ Developer Guide. ([#9675](https://github.com/rapidsai/cudf/pull/9675)) [@bdice](https://github.com/bdice)
+
+## 🚀 New Features
+
+- Remove libcudacxx patch needed for nvcc 11.4 ([#10057](https://github.com/rapidsai/cudf/pull/10057)) [@robertmaynard](https://github.com/robertmaynard)
+- Allow CuPy 10 ([#10048](https://github.com/rapidsai/cudf/pull/10048)) [@jakirkham](https://github.com/jakirkham)
+- Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops ([#10016](https://github.com/rapidsai/cudf/pull/10016)) [@revans2](https://github.com/revans2)
+- Add `groupby.transform` (only support for aggregations) ([#10005](https://github.com/rapidsai/cudf/pull/10005)) [@shwina](https://github.com/shwina)
+- Add partitioning support to Parquet chunked writer ([#10000](https://github.com/rapidsai/cudf/pull/10000)) [@devavret](https://github.com/devavret)
+- Add jni for sequences ([#9972](https://github.com/rapidsai/cudf/pull/9972)) [@wbo4958](https://github.com/wbo4958)
+- Java bindings for mixed left, inner, and full joins ([#9941](https://github.com/rapidsai/cudf/pull/9941)) [@jlowe](https://github.com/jlowe)
+- Java bindings for JSON reader support ([#9940](https://github.com/rapidsai/cudf/pull/9940)) [@wbo4958](https://github.com/wbo4958)
+- Enable transpose for string columns in cudf python ([#9937](https://github.com/rapidsai/cudf/pull/9937)) [@galipremsagar](https://github.com/galipremsagar)
+- Support structs for `cudf::contains` with column/scalar input ([#9929](https://github.com/rapidsai/cudf/pull/9929)) [@ttnghia](https://github.com/ttnghia)
+- Implement mixed equality/conditional joins ([#9917](https://github.com/rapidsai/cudf/pull/9917)) [@vyasr](https://github.com/vyasr)
+- Add cudf::strings::extract_all API ([#9909](https://github.com/rapidsai/cudf/pull/9909)) [@davidwendt](https://github.com/davidwendt)
+- Implement JNI for `cudf::scatter` APIs ([#9903](https://github.com/rapidsai/cudf/pull/9903)) [@ttnghia](https://github.com/ttnghia)
+- JNI: Function to copy and set validity from bool column. ([#9901](https://github.com/rapidsai/cudf/pull/9901)) [@mythrocks](https://github.com/mythrocks)
+- Add dictionary support to cudf::copy_if_else ([#9887](https://github.com/rapidsai/cudf/pull/9887)) [@davidwendt](https://github.com/davidwendt)
+- add run_benchmarks target for running benchmarks with json output ([#9879](https://github.com/rapidsai/cudf/pull/9879)) [@karthikeyann](https://github.com/karthikeyann)
+- Add regex_flags parameter to strings replace_re functions ([#9878](https://github.com/rapidsai/cudf/pull/9878)) [@davidwendt](https://github.com/davidwendt)
+- Add_suffix and add_prefix for DataFrames and Series ([#9846](https://github.com/rapidsai/cudf/pull/9846)) [@mayankanand007](https://github.com/mayankanand007)
+- Add JNI for `cudf::drop_duplicates` ([#9841](https://github.com/rapidsai/cudf/pull/9841)) [@ttnghia](https://github.com/ttnghia)
+- Implement per-list sequence ([#9839](https://github.com/rapidsai/cudf/pull/9839)) [@ttnghia](https://github.com/ttnghia)
+- adding `series.transpose` ([#9835](https://github.com/rapidsai/cudf/pull/9835)) [@mayankanand007](https://github.com/mayankanand007)
+- Adding support for `Series.autocorr` ([#9833](https://github.com/rapidsai/cudf/pull/9833)) [@mayankanand007](https://github.com/mayankanand007)
+- Support round operation on datetime64 datatypes ([#9820](https://github.com/rapidsai/cudf/pull/9820)) [@mayankanand007](https://github.com/mayankanand007)
+- Add partitioning support in parquet writer ([#9810](https://github.com/rapidsai/cudf/pull/9810)) [@devavret](https://github.com/devavret)
+- Raise temporary error for `decimal128` types in parquet reader ([#9804](https://github.com/rapidsai/cudf/pull/9804)) [@galipremsagar](https://github.com/galipremsagar)
+- Add decimal128 support to Parquet reader and writer ([#9765](https://github.com/rapidsai/cudf/pull/9765)) [@vuule](https://github.com/vuule)
+- Optimize `groupby::scan` ([#9754](https://github.com/rapidsai/cudf/pull/9754)) [@PointKernel](https://github.com/PointKernel)
+- Add sample JNI API ([#9728](https://github.com/rapidsai/cudf/pull/9728)) [@res-life](https://github.com/res-life)
+- Support `min` and `max` in inclusive scan for structs ([#9725](https://github.com/rapidsai/cudf/pull/9725)) [@ttnghia](https://github.com/ttnghia)
+- Add `first` and `last` method to `IndexedFrame` ([#9710](https://github.com/rapidsai/cudf/pull/9710)) [@isVoid](https://github.com/isVoid)
+- Support `min` and `max` reduction for structs ([#9697](https://github.com/rapidsai/cudf/pull/9697)) [@ttnghia](https://github.com/ttnghia)
+- Add parameters to control row group size in Parquet writer ([#9677](https://github.com/rapidsai/cudf/pull/9677)) [@vuule](https://github.com/vuule)
+- Run compute-sanitizer in nightly build ([#9641](https://github.com/rapidsai/cudf/pull/9641)) [@karthikeyann](https://github.com/karthikeyann)
+- Implement Series.datetime.floor ([#9571](https://github.com/rapidsai/cudf/pull/9571)) [@skirui-source](https://github.com/skirui-source)
+- ceil/floor for `DatetimeIndex` ([#9554](https://github.com/rapidsai/cudf/pull/9554)) [@mayankanand007](https://github.com/mayankanand007)
+- Add support for `decimal128` in cudf python ([#9533](https://github.com/rapidsai/cudf/pull/9533)) [@galipremsagar](https://github.com/galipremsagar)
+- Implement `lists::index_of()` to find positions in list rows ([#9510](https://github.com/rapidsai/cudf/pull/9510)) [@mythrocks](https://github.com/mythrocks)
+- custreamz oauth callback for kafka (librdkafka) ([#9486](https://github.com/rapidsai/cudf/pull/9486)) [@jdye64](https://github.com/jdye64)
+- Add Pearson correlation for sort groupby (python) ([#9166](https://github.com/rapidsai/cudf/pull/9166)) [@skirui-source](https://github.com/skirui-source)
+- Interchange dataframe protocol ([#9071](https://github.com/rapidsai/cudf/pull/9071)) [@iskode](https://github.com/iskode)
+- Rewriting row/column conversions for Spark <-> cudf data conversions ([#8444](https://github.com/rapidsai/cudf/pull/8444)) [@hyperbolic2346](https://github.com/hyperbolic2346)
+
+## 🛠️ Improvements
+
+- Prepare upload scripts for Python 3.7 removal ([#10092](https://github.com/rapidsai/cudf/pull/10092)) [@Ethyling](https://github.com/Ethyling)
+- Simplify custreamz and cudf_kafka recipes files ([#10065](https://github.com/rapidsai/cudf/pull/10065)) [@Ethyling](https://github.com/Ethyling)
+- ORC writer API changes for granular statistics ([#10058](https://github.com/rapidsai/cudf/pull/10058)) [@mythrocks](https://github.com/mythrocks)
+- Remove python constraints in custreamz and cudf_kafka recipes ([#10052](https://github.com/rapidsai/cudf/pull/10052)) [@Ethyling](https://github.com/Ethyling)
+- Unpin `dask` and `distributed` in CI ([#10028](https://github.com/rapidsai/cudf/pull/10028)) [@galipremsagar](https://github.com/galipremsagar)
+- Add `_from_column_like_self` factory ([#10022](https://github.com/rapidsai/cudf/pull/10022)) [@isVoid](https://github.com/isVoid)
+- Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings ([#10008](https://github.com/rapidsai/cudf/pull/10008)) [@shwina](https://github.com/shwina)
+- Use `cuda::std::is_arithmetic` in `cudf::is_numeric` trait. ([#9996](https://github.com/rapidsai/cudf/pull/9996)) [@bdice](https://github.com/bdice)
+- Clean up CUDA stream use in cuIO ([#9991](https://github.com/rapidsai/cudf/pull/9991)) [@vuule](https://github.com/vuule)
+- Use addressed-ordered first fit for the pinned memory pool ([#9989](https://github.com/rapidsai/cudf/pull/9989)) [@rongou](https://github.com/rongou)
+- Add strings tests to transpose_test.cpp ([#9985](https://github.com/rapidsai/cudf/pull/9985)) [@davidwendt](https://github.com/davidwendt)
+- Use gpuci_mamba_retry on Java CI. ([#9983](https://github.com/rapidsai/cudf/pull/9983)) [@bdice](https://github.com/bdice)
+- Remove deprecated method `one_hot_encoding` ([#9977](https://github.com/rapidsai/cudf/pull/9977)) [@isVoid](https://github.com/isVoid)
+- Minor cleanup of unused Python functions ([#9974](https://github.com/rapidsai/cudf/pull/9974)) [@vyasr](https://github.com/vyasr)
+- Use new efficient partitioned parquet writing in cuDF ([#9971](https://github.com/rapidsai/cudf/pull/9971)) [@devavret](https://github.com/devavret)
+- Remove str.subword_tokenize ([#9968](https://github.com/rapidsai/cudf/pull/9968)) [@VibhuJawa](https://github.com/VibhuJawa)
+- Forward-merge branch-21.12 to branch-22.02 ([#9947](https://github.com/rapidsai/cudf/pull/9947)) [@bdice](https://github.com/bdice)
+- Remove deprecated `method` parameter from `merge` and `join`. ([#9944](https://github.com/rapidsai/cudf/pull/9944)) [@bdice](https://github.com/bdice)
+- Remove deprecated method DataFrame.hash_columns. ([#9943](https://github.com/rapidsai/cudf/pull/9943)) [@bdice](https://github.com/bdice)
+- Remove deprecated method Series.hash_encode. ([#9942](https://github.com/rapidsai/cudf/pull/9942)) [@bdice](https://github.com/bdice)
+- use ninja in java ci build ([#9933](https://github.com/rapidsai/cudf/pull/9933)) [@rongou](https://github.com/rongou)
+- Add build-time publish step to cpu build script ([#9927](https://github.com/rapidsai/cudf/pull/9927)) [@davidwendt](https://github.com/davidwendt)
+- Refactoring ceil/round/floor code for datetime64 types ([#9926](https://github.com/rapidsai/cudf/pull/9926)) [@mayankanand007](https://github.com/mayankanand007)
+- Remove various unused functions ([#9922](https://github.com/rapidsai/cudf/pull/9922)) [@vyasr](https://github.com/vyasr)
+- Raise in `query` if dtype is not supported ([#9921](https://github.com/rapidsai/cudf/pull/9921)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Add missing imports tests ([#9920](https://github.com/rapidsai/cudf/pull/9920)) [@Ethyling](https://github.com/Ethyling)
+- Spark Decimal128 hashing ([#9919](https://github.com/rapidsai/cudf/pull/9919)) [@rwlee](https://github.com/rwlee)
+- Replace `thrust/std::get` with structured bindings ([#9915](https://github.com/rapidsai/cudf/pull/9915)) [@codereport](https://github.com/codereport)
+- Upgrade thrust version to 1.15 ([#9912](https://github.com/rapidsai/cudf/pull/9912)) [@robertmaynard](https://github.com/robertmaynard)
+- Remove conda envs for CUDA 11.0 and 11.2. ([#9910](https://github.com/rapidsai/cudf/pull/9910)) [@bdice](https://github.com/bdice)
+- Return count of set bits from inplace_bitmask_and. ([#9904](https://github.com/rapidsai/cudf/pull/9904)) [@bdice](https://github.com/bdice)
+- Use dynamic nullate for join hasher and equality comparator ([#9902](https://github.com/rapidsai/cudf/pull/9902)) [@davidwendt](https://github.com/davidwendt)
+- Update ucx-py version on release using rvc ([#9897](https://github.com/rapidsai/cudf/pull/9897)) [@Ethyling](https://github.com/Ethyling)
+- Remove `IncludeCategories` from `.clang-format` ([#9876](https://github.com/rapidsai/cudf/pull/9876)) [@codereport](https://github.com/codereport)
+- Support statically linking CUDA runtime for Java bindings ([#9873](https://github.com/rapidsai/cudf/pull/9873)) [@jlowe](https://github.com/jlowe)
+- Add `clang-tidy` to libcudf ([#9860](https://github.com/rapidsai/cudf/pull/9860)) [@codereport](https://github.com/codereport)
+- Remove deprecated methods from Java Table class ([#9853](https://github.com/rapidsai/cudf/pull/9853)) [@jlowe](https://github.com/jlowe)
+- Add test for map column metadata handling in ORC writer ([#9852](https://github.com/rapidsai/cudf/pull/9852)) [@vuule](https://github.com/vuule)
+- Use pandas `to_offset` to parse frequency string in `date_range` ([#9843](https://github.com/rapidsai/cudf/pull/9843)) [@isVoid](https://github.com/isVoid)
+- add templated benchmark with fixture ([#9838](https://github.com/rapidsai/cudf/pull/9838)) [@karthikeyann](https://github.com/karthikeyann)
+- Use list of column inputs for `apply_boolean_mask` ([#9832](https://github.com/rapidsai/cudf/pull/9832)) [@isVoid](https://github.com/isVoid)
+- Added a few more tests for Decimal to String cast ([#9818](https://github.com/rapidsai/cudf/pull/9818)) [@razajafri](https://github.com/razajafri)
+- Run doctests. ([#9815](https://github.com/rapidsai/cudf/pull/9815)) [@bdice](https://github.com/bdice)
+- Avoid overflow for fixed_point round ([#9809](https://github.com/rapidsai/cudf/pull/9809)) [@sperlingxx](https://github.com/sperlingxx)
+- Move `drop_duplicates`, `drop_na`, `_gather`, `take` to IndexedFrame and create their `_base_index` counterparts ([#9807](https://github.com/rapidsai/cudf/pull/9807)) [@isVoid](https://github.com/isVoid)
+- Use vector factories for host-device copies. ([#9806](https://github.com/rapidsai/cudf/pull/9806)) [@bdice](https://github.com/bdice)
+- Refactor host device macros ([#9797](https://github.com/rapidsai/cudf/pull/9797)) [@vyasr](https://github.com/vyasr)
+- Remove unused masked udf cython/c++ code ([#9792](https://github.com/rapidsai/cudf/pull/9792)) [@brandon-b-miller](https://github.com/brandon-b-miller)
+- Allow custom sort functions for dask-cudf `sort_values` ([#9789](https://github.com/rapidsai/cudf/pull/9789)) [@charlesbluca](https://github.com/charlesbluca)
+- Improve build time of libcudf iterator tests ([#9788](https://github.com/rapidsai/cudf/pull/9788)) [@davidwendt](https://github.com/davidwendt)
+- Copy Java native dependencies directly into classpath ([#9787](https://github.com/rapidsai/cudf/pull/9787)) [@jlowe](https://github.com/jlowe)
+- Add decimal types to cuIO benchmarks ([#9776](https://github.com/rapidsai/cudf/pull/9776)) [@vuule](https://github.com/vuule)
+- Pick smallest decimal type with required precision in ORC reader ([#9775](https://github.com/rapidsai/cudf/pull/9775)) [@vuule](https://github.com/vuule)
+- Avoid overflow for `fixed_point` `cudf::cast` and performance optimization ([#9772](https://github.com/rapidsai/cudf/pull/9772)) [@codereport](https://github.com/codereport)
+- Use CTAD with Thrust function objects ([#9768](https://github.com/rapidsai/cudf/pull/9768)) [@codereport](https://github.com/codereport)
+- Refactor TableTest assertion methods to a separate utility class ([#9762](https://github.com/rapidsai/cudf/pull/9762)) [@jlowe](https://github.com/jlowe)
+- Use Java classloader to find test resources ([#9760](https://github.com/rapidsai/cudf/pull/9760)) [@jlowe](https://github.com/jlowe)
+- Allow cast decimal128 to string and add tests ([#9756](https://github.com/rapidsai/cudf/pull/9756)) [@razajafri](https://github.com/razajafri)
+- Load balance optimization for contiguous_split ([#9755](https://github.com/rapidsai/cudf/pull/9755)) [@nvdbaranec](https://github.com/nvdbaranec)
+- Consolidate and improve `reset_index` ([#9750](https://github.com/rapidsai/cudf/pull/9750)) [@isVoid](https://github.com/isVoid)
+- Update to UCX-Py 0.24 ([#9748](https://github.com/rapidsai/cudf/pull/9748)) [@pentschev](https://github.com/pentschev)
+- Skip cufile tests in JNI build script ([#9744](https://github.com/rapidsai/cudf/pull/9744)) [@pxLi](https://github.com/pxLi)
+- Enable string to decimal 128 cast ([#9742](https://github.com/rapidsai/cudf/pull/9742)) [@razajafri](https://github.com/razajafri)
+- Use stop instead of stop_. ([#9735](https://github.com/rapidsai/cudf/pull/9735)) [@bdice](https://github.com/bdice)
+- Forward-merge branch-21.12 to branch-22.02 ([#9730](https://github.com/rapidsai/cudf/pull/9730)) [@bdice](https://github.com/bdice)
+- Improve cmake format script ([#9723](https://github.com/rapidsai/cudf/pull/9723)) [@vyasr](https://github.com/vyasr)
+- Use cuFile direct device reads/writes by default in cuIO ([#9722](https://github.com/rapidsai/cudf/pull/9722)) [@vuule](https://github.com/vuule)
+- Add directory-partitioned data support to cudf.read_parquet ([#9720](https://github.com/rapidsai/cudf/pull/9720)) [@rjzamora](https://github.com/rjzamora)
+- Use steam allocato adapto fo hash join table ([#9704](https://github.com/rapidsai/cudf/pull/9704)) [@PointKenel](https://github.com/PointKenel)
+- Update check fo inf/nan stings in libcudf float convesion to ignoe case ([#9694](https://github.com/rapidsai/cudf/pull/9694)) [@davidwendt](https://github.com/davidwendt)
+- Update cudf JNI to 22.02.0-SNAPSHOT ([#9681](https://github.com/rapidsai/cudf/pull/9681)) [@pxLi](https://github.com/pxLi)
+- Replace cudf's concuent_odeed_map with cuco::static_map in semi/anti joins ([#9666](https://github.com/rapidsai/cudf/pull/9666)) [@vyas](https://github.com/vyas)
+- Some impovements to `pase_decimal` function and bindings fo `is_fixed_point` ([#9658](https://github.com/rapidsai/cudf/pull/9658)) [@azajafi](https://github.com/azajafi)
+- Add utility to fomat ninja-log build times ([#9631](https://github.com/rapidsai/cudf/pull/9631)) [@davidwendt](https://github.com/davidwendt)
+- Allow untime has_nulls paamete fo ow opeatos ([#9623](https://github.com/rapidsai/cudf/pull/9623)) [@davidwendt](https://github.com/davidwendt)
+- Use fsspec.paquet fo impoved ead_paquet pefomance fom emote stoage ([#9589](https://github.com/rapidsai/cudf/pull/9589)) [@jzamoa](https://github.com/jzamoa)
+- Refacto bit counting APIs, intoduce valid/null count functions, and split host/device side code fo segmented counts. ([#9588](https://github.com/rapidsai/cudf/pull/9588)) [@bdice](https://github.com/bdice)
+- Use List of Columns as Input fo `dop_nulls`, `gathe` and `dop_duplicates` ([#9558](https://github.com/rapidsai/cudf/pull/9558)) [@isVoid](https://github.com/isVoid)
+- Simplify mege intenals and educe ovehead ([#9516](https://github.com/rapidsai/cudf/pull/9516)) [@vyas](https://github.com/vyas)
+- Add `stuct` geneation suppot in datageneato & fuzz tests ([#9180](https://github.com/rapidsai/cudf/pull/9180)) [@galipemsaga](https://github.com/galipemsaga)
+- Simplify wite_csv by emoving unnecessay wite/impl classes ([#9089](https://github.com/rapidsai/cudf/pull/9089)) [@cwhais](https://github.com/cwhais)
# cuDF 21.12.00 (9 Dec 2021)
diff --git a/build.sh b/build.sh
index c2eba134c35..8b3add1dddd 100755
--- a/build.sh
+++ b/build.sh
@@ -185,12 +185,9 @@ if buildAll || hasArg libcudf; then
fi
# get the current count before the compile starts
- FILES_IN_CCACHE=""
- if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ccache)" ]]; then
- FILES_IN_CCACHE=$(ccache -s | grep "files in cache")
- echo "$FILES_IN_CCACHE"
- # zero the ccache statistics
- ccache -z
+ if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v sccache)" ]]; then
+ # zero the sccache statistics
+ sccache --zero-stats
fi
cmake -S $REPODIR/cpp -B ${LIB_BUILD_DIR} \
@@ -216,11 +213,12 @@ if buildAll || hasArg libcudf; then
echo "Formatting build metrics"
python ${REPODIR}/cpp/scripts/sort_ninja_log.py ${LIB_BUILD_DIR}/.ninja_log --fmt xml > ${LIB_BUILD_DIR}/ninja_log.xml
MSG="
"
- # get some ccache stats after the compile
- if [[ "$BUILD_REPORT_INCL_CACHE_STATS"=="ON" && -x "$(command -v ccache)" ]]; then
- MSG="${MSG}
$FILES_IN_CCACHE"
- HIT_RATE=$(ccache -s | grep "cache hit rate")
- MSG="${MSG}
${HIT_RATE}"
+ # get some sccache stats after the compile
+ if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v sccache)" ]]; then
+ COMPILE_REQUESTS=$(sccache -s | grep "Compile requests \+ [0-9]\+$" | awk '{ print $NF }')
+ CACHE_HITS=$(sccache -s | grep "Cache hits \+ [0-9]\+$" | awk '{ print $NF }')
+ HIT_RATE=$(echo - | awk "{printf \"%.2f\n\", $CACHE_HITS / $COMPILE_REQUESTS * 100}")
+ MSG="${MSG}
cache hit rate ${HIT_RATE} %"
fi
MSG="${MSG}
parallel setting: $PARALLEL_LEVEL"
MSG="${MSG}
parallel build time: $compile_total seconds"
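The hit-rate arithmetic in the hunk above can be sketched outside the build script. This is a hedged Python model of the `grep`/`awk` pipeline; the `Compile requests`/`Cache hits` labels are an assumption and may differ between sccache versions.

```python
# Hypothetical sccache stats text; labels are an assumption, not real output.
stats = """\
Compile requests                 100
Cache hits                        78
Cache misses                      22
"""

def stat(label):
    # Mimics `sccache -s | grep "<label>" | awk '{ print $NF }'`:
    # find the labelled line and take its last whitespace-separated field.
    line = next(l for l in stats.splitlines() if l.startswith(label))
    return int(line.split()[-1])

compile_requests = stat("Compile requests")
cache_hits = stat("Cache hits")

# Same arithmetic as the awk one-liner in build.sh.
hit_rate = cache_hits / compile_requests * 100
print(f"cache hit rate {hit_rate:.2f} %")  # cache hit rate 78.00 %
```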
diff --git a/ci/cpu/build.sh b/ci/cpu/build.sh
index 6f19f174da0..574a55d26b6 100755
--- a/ci/cpu/build.sh
+++ b/ci/cpu/build.sh
@@ -31,6 +31,10 @@ if [[ "$BUILD_MODE" = "branch" && "$SOURCE_BRANCH" = branch-* ]] ; then
export VERSION_SUFFIX=`date +%y%m%d`
fi
+export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"
+export CMAKE_CXX_COMPILER_LAUNCHER="sccache"
+export CMAKE_C_COMPILER_LAUNCHER="sccache"
+
################################################################################
# SETUP - Check environment
################################################################################
@@ -77,6 +81,8 @@ if [ "$BUILD_LIBCUDF" == '1' ]; then
gpuci_conda_retry build --no-build-id --croot ${CONDA_BLD_DIR} conda/recipes/libcudf $CONDA_BUILD_ARGS
mkdir -p ${CONDA_BLD_DIR}/libcudf/work
cp -r ${CONDA_BLD_DIR}/work/* ${CONDA_BLD_DIR}/libcudf/work
+ gpuci_logger "sccache stats"
+ sccache --show-stats
# Copy libcudf build metrics results
LIBCUDF_BUILD_DIR=$CONDA_BLD_DIR/libcudf/work/cpp/build
diff --git a/ci/gpu/build.sh b/ci/gpu/build.sh
index d5fb7451769..6a5c28faeff 100755
--- a/ci/gpu/build.sh
+++ b/ci/gpu/build.sh
@@ -36,6 +36,10 @@ export DASK_DISTRIBUTED_GIT_TAG='2022.01.0'
# ucx-py version
export UCX_PY_VERSION='0.25.*'
+export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"
+export CMAKE_CXX_COMPILER_LAUNCHER="sccache"
+export CMAKE_C_COMPILER_LAUNCHER="sccache"
+
################################################################################
# TRAP - Setup trap for removing jitify cache
################################################################################
diff --git a/ci/utils/nbtestlog2junitxml.py b/ci/utils/nbtestlog2junitxml.py
index 15b362e4b70..6a421279112 100644
--- a/ci/utils/nbtestlog2junitxml.py
+++ b/ci/utils/nbtestlog2junitxml.py
@@ -7,11 +7,11 @@
from enum import Enum
-startingPatt = re.compile("^STARTING: ([\w\.\-]+)$")
-skippingPatt = re.compile("^SKIPPING: ([\w\.\-]+)\s*(\(([\w\.\-\ \,]+)\))?\s*$")
-exitCodePatt = re.compile("^EXIT CODE: (\d+)$")
-folderPatt = re.compile("^FOLDER: ([\w\.\-]+)$")
-timePatt = re.compile("^real\s+([\d\.ms]+)$")
+startingPatt = re.compile(r"^STARTING: ([\w\.\-]+)$")
+skippingPatt = re.compile(r"^SKIPPING: ([\w\.\-]+)\s*(\(([\w\.\-\ \,]+)\))?\s*$")
+exitCodePatt = re.compile(r"^EXIT CODE: (\d+)$")
+folderPatt = re.compile(r"^FOLDER: ([\w\.\-]+)$")
+timePatt = re.compile(r"^real\s+([\d\.ms]+)$")
linePatt = re.compile("^" + ("-" * 80) + "$")
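The switch to raw strings above matters because recent Python versions warn on unrecognized escape sequences such as `\w` in ordinary string literals, while a raw string passes every backslash through to the `re` module untouched. A minimal illustration:

```python
import re

# In a raw string, a backslash is a literal character: r"\d+" is the
# three characters backslash, d, plus.
assert len(r"\d+") == 3
assert len("\\d+") == 3  # the non-raw spelling needs a doubled backslash

# Both spellings compile to the same pattern; the raw form is easier to
# read and avoids invalid-escape warnings.
patt = re.compile(r"^EXIT CODE: (\d+)$")
match = patt.match("EXIT CODE: 42")
assert match is not None and match.group(1) == "42"
```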
diff --git a/conda/recipes/libcudf/meta.yaml b/conda/recipes/libcudf/meta.yaml
index 2cbe5173de0..70c020d4abd 100644
--- a/conda/recipes/libcudf/meta.yaml
+++ b/conda/recipes/libcudf/meta.yaml
@@ -22,13 +22,15 @@ build:
- PARALLEL_LEVEL
- VERSION_SUFFIX
- PROJECT_FLASH
- - CCACHE_DIR
- - CCACHE_NOHASHDIR
- - CCACHE_COMPILERCHECK
- CMAKE_GENERATOR
- CMAKE_C_COMPILER_LAUNCHER
- CMAKE_CXX_COMPILER_LAUNCHER
- CMAKE_CUDA_COMPILER_LAUNCHER
+ - SCCACHE_S3_KEY_PREFIX=libcudf-aarch64 # [aarch64]
+ - SCCACHE_S3_KEY_PREFIX=libcudf-linux64 # [linux64]
+ - SCCACHE_BUCKET=rapids-sccache
+ - SCCACHE_REGION=us-west-2
+ - SCCACHE_IDLE_TIMEOUT=32768
run_exports:
- {{ pin_subpackage("libcudf", max_pin="x.x") }}
diff --git a/cpp/include/cudf/binaryop.hpp b/cpp/include/cudf/binaryop.hpp
index daf55c0befe..177fd904b0b 100644
--- a/cpp/include/cudf/binaryop.hpp
+++ b/cpp/include/cudf/binaryop.hpp
@@ -45,7 +45,7 @@ enum class binary_operator : int32_t {
PMOD, ///< positive modulo operator
///< If remainder is negative, this returns (remainder + divisor) % divisor
///< else, it returns (dividend % divisor)
- PYMOD, ///< operator % but following python's sign rules for negatives
+ PYMOD, ///< operator % but following Python's sign rules for negatives
POW, ///< lhs ^ rhs
LOG_BASE, ///< logarithm to the base
ATAN2, ///< 2-argument arctangent
diff --git a/cpp/include/cudf/fixed_point/fixed_point.hpp b/cpp/include/cudf/fixed_point/fixed_point.hpp
index a7112ae415d..f027e2783b1 100644
--- a/cpp/include/cudf/fixed_point/fixed_point.hpp
+++ b/cpp/include/cudf/fixed_point/fixed_point.hpp
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2020-2021, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -440,6 +440,21 @@ class fixed_point {
  CUDF_HOST_DEVICE inline friend fixed_point<Rep1, Rad1> operator/(
    fixed_point<Rep1, Rad1> const& lhs, fixed_point<Rep1, Rad1> const& rhs);
+ /**
+ * @brief operator % (for computing the modulo operation of two `fixed_point` numbers)
+ *
+ * If `_scale`s are equal, the modulus is computed directly.
+ * If `_scale`s are not equal, the number with larger `_scale` is shifted to the
+ * smaller `_scale`, and then the modulus is computed.
+ *
+ * @tparam Rep1 Representation type of number being modulo-ed to `this`
+ * @tparam Rad1 Radix (base) type of number being modulo-ed to `this`
+ * @return The resulting `fixed_point` number
+ */
+  template <typename Rep1, Radix Rad1>
+  CUDF_HOST_DEVICE inline friend fixed_point<Rep1, Rad1> operator%(
+    fixed_point<Rep1, Rad1> const& lhs, fixed_point<Rep1, Rad1> const& rhs);
+
/**
* @brief operator == (for comparing two `fixed_point` numbers)
*
@@ -750,6 +765,16 @@ CUDF_HOST_DEVICE inline bool operator>(fixed_point<Rep1, Rad1> const& lhs,
return lhs.rescaled(scale)._value > rhs.rescaled(scale)._value;
}
+// MODULO OPERATION
+template <typename Rep1, Radix Rad1>
+CUDF_HOST_DEVICE inline fixed_point<Rep1, Rad1> operator%(fixed_point<Rep1, Rad1> const& lhs,
+                                                          fixed_point<Rep1, Rad1> const& rhs)
+{
+ auto const scale = std::min(lhs._scale, rhs._scale);
+ auto const remainder = lhs.rescaled(scale)._value % rhs.rescaled(scale)._value;
+  return fixed_point<Rep1, Rad1>{scaled_integer<Rep1>{remainder, scale}};
+}
+
using decimal32 = fixed_point<int32_t, Radix::BASE_10>;
using decimal64 = fixed_point<int64_t, Radix::BASE_10>;
using decimal128 = fixed_point<__int128_t, Radix::BASE_10>;
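The new `operator%` rescales both operands to the smaller (finer) `_scale` before taking the remainder, and the result carries that common scale. A rough Python model of the same rule, using plain `(value, scale)` tuples as a stand-in for the C++ `fixed_point` class (base 10 only; this is a sketch, not the real implementation):

```python
def c_mod(a, b):
    """C/C++ integer %: truncated division, remainder keeps the dividend's sign."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r

def rescale(value, scale, new_scale):
    # Shifting to a smaller (more negative) scale multiplies the stored value.
    assert new_scale <= scale
    return value * 10 ** (scale - new_scale)

def fixed_point_mod(lhs, rhs):
    # A number is (value, scale) with number == value * 10**scale.
    (lv, ls), (rv, rs) = lhs, rhs
    scale = min(ls, rs)  # operate at the finer of the two scales
    remainder = c_mod(rescale(lv, ls, scale), rescale(rv, rs, scale))
    return (remainder, scale)

# -3.3 % 1.0 == -0.3 under truncated (C-style) modulo, at scale -1.
assert fixed_point_mod((-33, -1), (10, -1)) == (-3, -1)
# Mixed scales: 1.1 (scale -1) % 0.07 (scale -2) is computed at scale -2.
assert fixed_point_mod((11, -1), (7, -2)) == (5, -2)
```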
diff --git a/cpp/scripts/run-clang-format.py b/cpp/scripts/run-clang-format.py
index a7c83da22c5..3d462d65fb8 100755
--- a/cpp/scripts/run-clang-format.py
+++ b/cpp/scripts/run-clang-format.py
@@ -13,7 +13,6 @@
# limitations under the License.
#
-from __future__ import print_function
import argparse
import os
@@ -124,9 +123,9 @@ def run_clang_format(src, dst, exe, verbose, inplace):
os.makedirs(dstdir)
# run the clang format command itself
if src == dst:
- cmd = "%s -i %s" % (exe, src)
+ cmd = f"{exe} -i {src}"
else:
- cmd = "%s %s > %s" % (exe, src, dst)
+ cmd = f"{exe} {src} > {dst}"
try:
subprocess.check_call(cmd, shell=True)
except subprocess.CalledProcessError:
@@ -134,9 +133,9 @@ def run_clang_format(src, dst, exe, verbose, inplace):
raise
# run the diff to check if there are any formatting issues
if inplace:
- cmd = "diff -q %s %s >/dev/null" % (src, dst)
+ cmd = f"diff -q {src} {dst} >/dev/null"
else:
- cmd = "diff %s %s" % (src, dst)
+ cmd = f"diff {src} {dst}"
try:
subprocess.check_call(cmd, shell=True)
diff --git a/cpp/scripts/run-clang-tidy.py b/cpp/scripts/run-clang-tidy.py
index 3a1a663e231..30e937d7f4d 100644
--- a/cpp/scripts/run-clang-tidy.py
+++ b/cpp/scripts/run-clang-tidy.py
@@ -13,7 +13,6 @@
# limitations under the License.
#
-from __future__ import print_function
import re
import os
import subprocess
@@ -67,7 +66,7 @@ def parse_args():
def get_all_commands(cdb):
- with open(cdb, "r") as fp:
+ with open(cdb) as fp:
return json.load(fp)
@@ -195,10 +194,10 @@ def collect_result(result):
def print_result(passed, stdout, file):
status_str = "PASSED" if passed else "FAILED"
- print("%s File:%s %s %s" % (SEPARATOR, file, status_str, SEPARATOR))
+ print(f"{SEPARATOR} File:{file} {status_str} {SEPARATOR}")
if stdout:
print(stdout)
- print("%s File:%s ENDS %s" % (SEPARATOR, file, SEPARATOR))
+ print(f"{SEPARATOR} File:{file} ENDS {SEPARATOR}")
def print_results():
diff --git a/cpp/scripts/sort_ninja_log.py b/cpp/scripts/sort_ninja_log.py
index 33c369b254f..85eb800879a 100755
--- a/cpp/scripts/sort_ninja_log.py
+++ b/cpp/scripts/sort_ninja_log.py
@@ -33,7 +33,7 @@
# build a map of the log entries
entries = {}
-with open(log_file, "r") as log:
+with open(log_file) as log:
last = 0
files = {}
for line in log:
diff --git a/cpp/src/binaryop/binaryop.cpp b/cpp/src/binaryop/binaryop.cpp
index 5f9ff2574e3..dfa7896c37a 100644
--- a/cpp/src/binaryop/binaryop.cpp
+++ b/cpp/src/binaryop/binaryop.cpp
@@ -88,7 +88,10 @@ bool is_basic_arithmetic_binop(binary_operator op)
op == binary_operator::MUL or // operator *
op == binary_operator::DIV or // operator / using common type of lhs and rhs
op == binary_operator::NULL_MIN or // 2 null = null, 1 null = value, else min
- op == binary_operator::NULL_MAX; // 2 null = null, 1 null = value, else max
+ op == binary_operator::NULL_MAX or // 2 null = null, 1 null = value, else max
+ op == binary_operator::MOD or // operator %
+ op == binary_operator::PMOD or // positive modulo operator
+ op == binary_operator::PYMOD; // operator % but following Python's negative sign rules
}
/**
diff --git a/cpp/src/binaryop/compiled/operation.cuh b/cpp/src/binaryop/compiled/operation.cuh
index 4b5f78dc400..de9d46b6280 100644
--- a/cpp/src/binaryop/compiled/operation.cuh
+++ b/cpp/src/binaryop/compiled/operation.cuh
@@ -162,12 +162,24 @@ struct PMod {
if (rem < 0) rem = std::fmod(rem + yconv, yconv);
return rem;
}
+
+  template <typename TypeLhs,
+            typename TypeRhs,
+            std::enable_if_t<(cudf::is_fixed_point<TypeLhs>() and
+                              std::is_same_v<TypeLhs, TypeRhs>)>* = nullptr>
+ __device__ inline auto operator()(TypeLhs x, TypeRhs y)
+ {
+ auto const remainder = x % y;
+ return remainder.value() < 0 ? (remainder + y) % y : remainder;
+ }
};
struct PyMod {
  template <typename TypeLhs,
            typename TypeRhs,
-            std::enable_if_t<(std::is_integral_v<std::common_type_t<TypeLhs, TypeRhs>>)>* = nullptr>
+            std::enable_if_t<(std::is_integral_v<std::common_type_t<TypeLhs, TypeRhs>> or
+                              (cudf::is_fixed_point<TypeLhs>() and
+                               std::is_same_v<TypeLhs, TypeRhs>))>* = nullptr>
__device__ inline auto operator()(TypeLhs x, TypeRhs y) -> decltype(((x % y) + y) % y)
{
return ((x % y) + y) % y;
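`PMOD` and `PYMOD` differ from plain `%` only for negative operands: both fold a negative remainder into `[0, |y|)`, matching Python's own `%`. The `((x % y) + y) % y` identity used by `PyMod` above can be checked directly (a sketch, with C-style truncated `%` emulated in Python):

```python
def c_mod(x, y):
    """C/C++ integer %: remainder keeps the sign of the dividend."""
    r = abs(x) % abs(y)
    return r if x >= 0 else -r

def pymod(x, y):
    # The ((x % y) + y) % y trick from PyMod, built on C-style %.
    return c_mod(c_mod(x, y) + y, y)

# Plain C-style % keeps the dividend's sign; PYMOD folds negatives into [0, y).
assert c_mod(-33, 10) == -3
assert pymod(-33, 10) == 7
# For y > 0 this matches Python's built-in %.
assert all(pymod(x, 10) == x % 10 for x in range(-50, 50))
```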
diff --git a/cpp/src/binaryop/compiled/util.cpp b/cpp/src/binaryop/compiled/util.cpp
index 9481c236142..d8f1eb03a16 100644
--- a/cpp/src/binaryop/compiled/util.cpp
+++ b/cpp/src/binaryop/compiled/util.cpp
@@ -45,7 +45,11 @@ struct common_type_functor {
// Eg. d=t-t
return data_type{type_to_id()};
}
- return {};
+
+ // A compiler bug may cause a compilation error when using empty initializer list to construct
+ // an std::optional object containing no `data_type` value. Therefore, we should explicitly
+ // return `std::nullopt` instead.
+ return std::nullopt;
}
};
template
diff --git a/cpp/tests/binaryop/binop-compiled-fixed_point-test.cpp b/cpp/tests/binaryop/binop-compiled-fixed_point-test.cpp
index 29905171907..335de93c976 100644
--- a/cpp/tests/binaryop/binop-compiled-fixed_point-test.cpp
+++ b/cpp/tests/binaryop/binop-compiled-fixed_point-test.cpp
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2021, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -33,14 +33,14 @@
namespace cudf::test::binop {
template <typename T>
-struct FixedPointCompiledTestBothReps : public cudf::test::BaseFixture {
+struct FixedPointCompiledTest : public cudf::test::BaseFixture {
};
template <typename T>
using wrapper = cudf::test::fixed_width_column_wrapper<T>;
-TYPED_TEST_SUITE(FixedPointCompiledTestBothReps, cudf::test::FixedPointTypes);
+TYPED_TEST_SUITE(FixedPointCompiledTest, cudf::test::FixedPointTypes);
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpAdd)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -73,7 +73,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected_col, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpMultiply)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpMultiply)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -109,7 +109,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpMultiply)
template <typename T>
using fp_wrapper = cudf::test::fixed_point_column_wrapper<T>;
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpMultiply2)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpMultiply2)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -128,7 +128,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpMultiply2)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpDiv)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -147,7 +147,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv2)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpDiv2)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -166,7 +166,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv2)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv3)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpDiv3)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -183,7 +183,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv3)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv4)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpDiv4)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -203,7 +203,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpDiv4)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd2)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpAdd2)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -222,7 +222,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd2)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd3)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpAdd3)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -241,7 +241,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd3)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd4)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpAdd4)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -258,7 +258,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd4)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd5)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpAdd5)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -275,7 +275,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd5)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd6)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpAdd6)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -294,7 +294,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpAdd6)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected1, result1->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointCast)
+TYPED_TEST(FixedPointCompiledTest, FixedPointCast)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -308,7 +308,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointCast)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpMultiplyScalar)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpMultiplyScalar)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -325,7 +325,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpMultiplyScalar)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpSimplePlus)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpSimplePlus)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -344,7 +344,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpSimplePlus)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimple)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpEqualSimple)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -361,7 +361,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimple)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimpleScale0)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpEqualSimpleScale0)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -377,7 +377,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimpleScale0)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimpleScale0Null)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpEqualSimpleScale0Null)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -393,7 +393,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimpleScale0Nu
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimpleScale2Null)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpEqualSimpleScale2Null)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -409,7 +409,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualSimpleScale2Nu
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualLessGreater)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpEqualLessGreater)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -453,7 +453,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpEqualLessGreater)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(true_col, greater_result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpNullMaxSimple)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpNullMaxSimple)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -473,7 +473,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpNullMaxSimple)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpNullMinSimple)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpNullMinSimple)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -493,7 +493,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpNullMinSimple)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpNullEqualsSimple)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpNullEqualsSimple)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -510,7 +510,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpNullEqualsSimple)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -526,7 +526,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div2)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div2)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -542,7 +542,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div2)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div3)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div3)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -558,7 +558,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div3)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div4)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div4)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -574,7 +574,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div4)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div6)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div6)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -591,7 +591,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div6)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div7)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div7)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -608,7 +608,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div7)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div8)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div8)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -624,7 +624,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div8)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div9)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div9)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -640,7 +640,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div9)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div10)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div10)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -656,7 +656,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div10)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div11)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOp_Div11)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -672,7 +672,7 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOp_Div11)
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}
-TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpThrows)
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpThrows)
{
using namespace numeric;
using decimalXX = TypeParam;
@@ -684,6 +684,132 @@ TYPED_TEST(FixedPointCompiledTestBothReps, FixedPointBinaryOpThrows)
cudf::logic_error);
}
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpModSimple)
+{
+ using namespace numeric;
+ using decimalXX = TypeParam;
+  using RepType = device_storage_type_t<decimalXX>;
+
+  auto const lhs      = fp_wrapper<RepType>{{-33, -22, -11, 11, 22, 33, 44, 55}, scale_type{-1}};
+  auto const rhs      = fp_wrapper<RepType>{{10, 10, 10, 10, 10, 10, 10, 10}, scale_type{-1}};
+  auto const expected = fp_wrapper<RepType>{{-3, -2, -1, 1, 2, 3, 4, 5}, scale_type{-1}};
+
+  auto const type =
+    cudf::binary_operation_fixed_point_output_type(cudf::binary_operator::MOD,
+                                                   static_cast<cudf::column_view>(lhs).type(),
+                                                   static_cast<cudf::column_view>(rhs).type());
+ auto const result = cudf::binary_operation(lhs, rhs, cudf::binary_operator::MOD, type);
+
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
+}
+
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpPModSimple)
+{
+ using namespace numeric;
+ using decimalXX = TypeParam;
+  using RepType = device_storage_type_t<decimalXX>;
+
+  auto const lhs      = fp_wrapper<RepType>{{-33, -22, -11, 11, 22, 33, 44, 55}, scale_type{-1}};
+  auto const rhs      = fp_wrapper<RepType>{{10, 10, 10, 10, 10, 10, 10, 10}, scale_type{-1}};
+  auto const expected = fp_wrapper<RepType>{{7, 8, 9, 1, 2, 3, 4, 5}, scale_type{-1}};
+
+  for (auto const op : {cudf::binary_operator::PMOD, cudf::binary_operator::PYMOD}) {
+    auto const type = cudf::binary_operation_fixed_point_output_type(
+      op, static_cast<cudf::column_view>(lhs).type(), static_cast<cudf::column_view>(rhs).type());
+ auto const result = cudf::binary_operation(lhs, rhs, op, type);
+
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
+ }
+}
+
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpModSimple2)
+{
+ using namespace numeric;
+ using decimalXX = TypeParam;
+  using RepType = device_storage_type_t<decimalXX>;
+
+  auto const lhs      = fp_wrapper<RepType>{{-33, -22, -11, 11, 22, 33, 44, 55}, scale_type{-1}};
+  auto const rhs      = make_fixed_point_scalar<decimalXX>(10, scale_type{-1});
+  auto const expected = fp_wrapper<RepType>{{-3, -2, -1, 1, 2, 3, 4, 5}, scale_type{-1}};
+
+  auto const type = cudf::binary_operation_fixed_point_output_type(
+    cudf::binary_operator::MOD, static_cast<cudf::column_view>(lhs).type(), rhs->type());
+ auto const result = cudf::binary_operation(lhs, *rhs, cudf::binary_operator::MOD, type);
+
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
+}
+
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpPModAndPyModSimple2)
+{
+ using namespace numeric;
+ using decimalXX = TypeParam;
+  using RepType = device_storage_type_t<decimalXX>;
+
+  auto const lhs      = fp_wrapper<RepType>{{-33, -22, -11, 11, 22, 33, 44, 55}, scale_type{-1}};
+  auto const rhs      = make_fixed_point_scalar<decimalXX>(10, scale_type{-1});
+  auto const expected = fp_wrapper<RepType>{{7, 8, 9, 1, 2, 3, 4, 5}, scale_type{-1}};
+
+  for (auto const op : {cudf::binary_operator::PMOD, cudf::binary_operator::PYMOD}) {
+    auto const type = cudf::binary_operation_fixed_point_output_type(
+      op, static_cast<cudf::column_view>(lhs).type(), rhs->type());
+ auto const result = cudf::binary_operation(lhs, *rhs, op, type);
+
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
+ }
+}
+
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpMod)
+{
+ using namespace numeric;
+ using decimalXX = TypeParam;
+ using RepType = device_storage_type_t;
+ auto constexpr N = 1000;
+
+ for (auto scale : {-1, -2, -3}) {
+ auto const iota = thrust::make_counting_iterator(-500);
+ auto const lhs = fp_wrapper<RepType>{iota, iota + N, scale_type{-1}};
+ auto const rhs = make_fixed_point_scalar<decimalXX>(7, scale_type{scale});
+
+ auto const factor = static_cast<RepType>(std::pow(10, -1 - scale));
+ auto const f = [factor](auto i) { return (i * factor) % 7; };
+ auto const exp_iter = cudf::detail::make_counting_transform_iterator(-500, f);
+ auto const expected = fp_wrapper<RepType>{exp_iter, exp_iter + N, scale_type{scale}};
+
+ auto const type = cudf::binary_operation_fixed_point_output_type(
+ cudf::binary_operator::MOD, static_cast<cudf::column_view>(lhs).type(), rhs->type());
+ auto const result = cudf::binary_operation(lhs, *rhs, cudf::binary_operator::MOD, type);
+
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
+ }
+}
+
+TYPED_TEST(FixedPointCompiledTest, FixedPointBinaryOpPModAndPyMod)
+{
+ using namespace numeric;
+ using decimalXX = TypeParam;
+ using RepType = device_storage_type_t<decimalXX>;
+ auto constexpr N = 1000;
+
+ for (auto const scale : {-1, -2, -3}) {
+ auto const iota = thrust::make_counting_iterator(-500);
+ auto const lhs = fp_wrapper<RepType>{iota, iota + N, scale_type{-1}};
+ auto const rhs = make_fixed_point_scalar<decimalXX>(7, scale_type{scale});
+
+ auto const factor = static_cast<RepType>(std::pow(10, -1 - scale));
+ auto const f = [factor](auto i) { return (((i * factor) % 7) + 7) % 7; };
+ auto const exp_iter = cudf::detail::make_counting_transform_iterator(-500, f);
+ auto const expected = fp_wrapper<RepType>{exp_iter, exp_iter + N, scale_type{scale}};
+
+ for (auto const op : {cudf::binary_operator::PMOD, cudf::binary_operator::PYMOD}) {
+ auto const type = cudf::binary_operation_fixed_point_output_type(
+ op, static_cast<cudf::column_view>(lhs).type(), rhs->type());
+ auto const result = cudf::binary_operation(lhs, *rhs, op, type);
+
+ CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
+ }
+ }
+}
+
template <typename T>
struct FixedPointTest_64_128_Reps : public cudf::test::BaseFixture {
};
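The fixed-point modulus tests above exercise two flavors of remainder: `MOD` is C-style truncated remainder, while `PMOD`/`PYMOD` are floored (Python-style) remainder, applied directly to the scaled integer representations. A plain-Python sketch (illustrative only, not cudf code) reproduces the expected columns from the `Simple2` tests:

```python
def c_mod(a, b):
    # C-style truncated remainder, matching cudf::binary_operator::MOD
    return a - b * int(a / b)

def py_mod(a, b):
    # Floored remainder, matching PYMOD (Python's own % operator)
    return a % b

# Scaled reps at scale 10^-1: -33 represents -3.3, the scalar 10 represents 1.0
reps = [-33, -22, -11, 11, 22, 33, 44, 55]
print([c_mod(r, 10) for r in reps])   # → [-3, -2, -1, 1, 2, 3, 4, 5]
print([py_mod(r, 10) for r in reps])  # → [7, 8, 9, 1, 2, 3, 4, 5]
```

The two results differ only for negative inputs, which is exactly what the expected columns in the tests encode.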
diff --git a/docs/cudf/source/conf.py b/docs/cudf/source/conf.py
index 3d6d3ceb399..60704f3e6ae 100644
--- a/docs/cudf/source/conf.py
+++ b/docs/cudf/source/conf.py
@@ -1,6 +1,4 @@
#!/usr/bin/env python3
-# -*- coding: utf-8 -*-
-#
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
#
# cudf documentation build configuration file, created by
@@ -118,17 +116,6 @@
html_theme = "pydata_sphinx_theme"
html_logo = "_static/RAPIDS-logo-purple.png"
-# on_rtd is whether we are on readthedocs.org
-on_rtd = os.environ.get("READTHEDOCS", None) == "True"
-
-if not on_rtd:
- # only import and set the theme if we're building docs locally
- # otherwise, readthedocs.org uses their theme by default,
- # so no need to specify it
- import pydata_sphinx_theme
-
- html_theme = "pydata_sphinx_theme"
- html_theme_path = pydata_sphinx_theme.get_html_theme_path()
# Theme options are theme-specific and customize the look and feel of a theme
diff --git a/java/src/main/java/ai/rapids/cudf/Aggregation128Utils.java b/java/src/main/java/ai/rapids/cudf/Aggregation128Utils.java
new file mode 100644
index 00000000000..9a0ac709e3e
--- /dev/null
+++ b/java/src/main/java/ai/rapids/cudf/Aggregation128Utils.java
@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package ai.rapids.cudf;
+
+/**
+ * Utility methods for breaking apart and reassembling 128-bit values during aggregations
+ * to enable hash-based aggregations and detect overflows.
+ */
+public class Aggregation128Utils {
+ static {
+ NativeDepsLoader.loadNativeDeps();
+ }
+
+ /**
+ * Extract a 32-bit chunk from a 128-bit value.
+ * @param col column of 128-bit values (e.g.: DECIMAL128)
+ * @param outType integer type to use for the output column (e.g.: UINT32 or INT32)
+ * @param chunkIdx index of the 32-bit chunk to extract where 0 is the least significant chunk
+ * and 3 is the most significant chunk
+ * @return column containing the specified 32-bit chunk of the input column values. A null input
+ * row will result in a corresponding null output row.
+ */
+ public static ColumnVector extractInt32Chunk(ColumnView col, DType outType, int chunkIdx) {
+ return new ColumnVector(extractInt32Chunk(col.getNativeView(),
+ outType.getTypeId().getNativeId(), chunkIdx));
+ }
+
+ /**
+ * Reassemble a column of 128-bit values from a table of four 64-bit integer columns and check
+ * for overflow. The 128-bit value is reconstructed by overlapping the 64-bit values by 32-bits.
+ * The least significant 32-bits of the least significant 64-bit value are used directly as the
+ * least significant 32-bits of the final 128-bit value, and the remaining 32-bits are added to
+ * the next most significant 64-bit value. The lower 32-bits of that sum become the next most
+ * significant 32-bits in the final 128-bit value, and the remaining 32-bits are added to the
+ * next most significant 64-bit input value, and so on.
+ *
+ * @param chunks table of four 64-bit integer columns with the columns ordered from least
+ * significant to most significant. The last column must be of type INT64.
+ * @param type the type to use for the resulting 128-bit value column
+ * @return table containing a boolean column and a 128-bit value column of the requested type.
+ * The boolean value will be true if an overflow was detected for that row's value when
+ * it was reassembled. A null input row will result in a corresponding null output row.
+ */
+ public static Table combineInt64SumChunks(Table chunks, DType type) {
+ return new Table(combineInt64SumChunks(chunks.getNativeView(),
+ type.getTypeId().getNativeId(),
+ type.getScale()));
+ }
+
+ private static native long extractInt32Chunk(long columnView, int outTypeId, int chunkIdx);
+
+ private static native long[] combineInt64SumChunks(long chunksTableView, int dtype, int scale);
+}
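The chunk extraction that `extractInt32Chunk` performs natively can be sketched in Python (an illustrative model, not part of cudf): chunk index 0 is the least significant 32 bits, chunks 0-2 are read as unsigned, and chunk 3 carries the sign of the 128-bit value.

```python
def extract_int32_chunk(value, chunk_idx):
    """Return the chunk_idx-th 32-bit chunk of a 128-bit value.

    Chunk 0 is least significant. Chunks 0-2 are unsigned (UINT32);
    chunk 3 is reinterpreted as signed (INT32)."""
    chunk = (value >> (32 * chunk_idx)) & 0xFFFFFFFF
    if chunk_idx == 3 and chunk >= 1 << 31:
        chunk -= 1 << 32  # reinterpret top chunk as signed
    return chunk

v = 0x123456789ABCDEF0F0DEBC9A78563412
print([hex(extract_int32_chunk(v, i) & 0xFFFFFFFF) for i in range(4)])
# → ['0x78563412', '0xf0debc9a', '0x9abcdef0', '0x12345678']
```

These are the same four chunks the `testExtractInt32Chunks` unit test expects for this value.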
diff --git a/java/src/main/native/CMakeLists.txt b/java/src/main/native/CMakeLists.txt
index 00747efff27..ffbeeb155e0 100755
--- a/java/src/main/native/CMakeLists.txt
+++ b/java/src/main/native/CMakeLists.txt
@@ -1,5 +1,5 @@
# =============================================================================
-# Copyright (c) 2019-2021, NVIDIA CORPORATION.
+# Copyright (c) 2019-2022, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License at
@@ -219,7 +219,7 @@ endif()
add_library(
cudfjni SHARED
- src/row_conversion.cu
+ src/Aggregation128UtilsJni.cpp
src/AggregationJni.cpp
src/CudfJni.cpp
src/CudaJni.cpp
@@ -236,7 +236,9 @@ add_library(
src/RmmJni.cpp
src/ScalarJni.cpp
src/TableJni.cpp
+ src/aggregation128_utils.cu
src/map_lookup.cu
+ src/row_conversion.cu
src/check_nvcomp_output_sizes.cu
)
diff --git a/java/src/main/native/src/Aggregation128UtilsJni.cpp b/java/src/main/native/src/Aggregation128UtilsJni.cpp
new file mode 100644
index 00000000000..71c36cb724a
--- /dev/null
+++ b/java/src/main/native/src/Aggregation128UtilsJni.cpp
@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "aggregation128_utils.hpp"
+#include "cudf_jni_apis.hpp"
+#include "dtype_utils.hpp"
+
+extern "C" {
+
+JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_Aggregation128Utils_extractInt32Chunk(
+ JNIEnv *env, jclass, jlong j_column_view, jint j_out_dtype, jint j_chunk_idx) {
+ JNI_NULL_CHECK(env, j_column_view, "column is null", 0);
+ try {
+ cudf::jni::auto_set_device(env);
+ auto cview = reinterpret_cast<cudf::column_view const *>(j_column_view);
+ auto dtype = cudf::jni::make_data_type(j_out_dtype, 0);
+ return cudf::jni::release_as_jlong(cudf::jni::extract_chunk32(*cview, dtype, j_chunk_idx));
+ }
+ CATCH_STD(env, 0);
+}
+
+JNIEXPORT jlongArray JNICALL Java_ai_rapids_cudf_Aggregation128Utils_combineInt64SumChunks(
+ JNIEnv *env, jclass, jlong j_table_view, jint j_dtype, jint j_scale) {
+ JNI_NULL_CHECK(env, j_table_view, "table is null", 0);
+ try {
+ cudf::jni::auto_set_device(env);
+ auto tview = reinterpret_cast<cudf::table_view const *>(j_table_view);
+ std::unique_ptr<cudf::table> result =
+ cudf::jni::assemble128_from_sum(*tview, cudf::jni::make_data_type(j_dtype, j_scale));
+ return cudf::jni::convert_table_for_return(env, result);
+ }
+ CATCH_STD(env, 0);
+}
+}
diff --git a/java/src/main/native/src/ColumnVectorJni.cpp b/java/src/main/native/src/ColumnVectorJni.cpp
index 0e559ad0403..f01d832eb19 100644
--- a/java/src/main/native/src/ColumnVectorJni.cpp
+++ b/java/src/main/native/src/ColumnVectorJni.cpp
@@ -252,8 +252,8 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnVector_makeListFromOffsets(
JNI_NULL_CHECK(env, offsets_handle, "offsets_handle is null", 0)
try {
cudf::jni::auto_set_device(env);
- auto const *child_cv = reinterpret_cast<cudf::column_view const *>(child_handle);
- auto const *offsets_cv = reinterpret_cast<cudf::column_view const *>(offsets_handle);
+ auto const child_cv = reinterpret_cast<cudf::column_view const *>(child_handle);
+ auto const offsets_cv = reinterpret_cast<cudf::column_view const *>(offsets_handle);
CUDF_EXPECTS(offsets_cv->type().id() == cudf::type_id::INT32,
"Input offsets does not have type INT32.");
diff --git a/java/src/main/native/src/ColumnViewJni.cpp b/java/src/main/native/src/ColumnViewJni.cpp
index 63247eb0066..eec4a78a457 100644
--- a/java/src/main/native/src/ColumnViewJni.cpp
+++ b/java/src/main/native/src/ColumnViewJni.cpp
@@ -408,7 +408,7 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnView_dropListDuplicatesWithKey
JNI_NULL_CHECK(env, keys_vals_handle, "keys_vals_handle is null", 0);
try {
cudf::jni::auto_set_device(env);
- auto const *input_cv = reinterpret_cast<cudf::column_view const *>(keys_vals_handle);
+ auto const input_cv = reinterpret_cast<cudf::column_view const *>(keys_vals_handle);
CUDF_EXPECTS(input_cv->offset() == 0, "Input column has non-zero offset.");
CUDF_EXPECTS(input_cv->type().id() == cudf::type_id::LIST,
"Input column is not a lists column.");
@@ -460,7 +460,8 @@ JNIEXPORT jlong JNICALL Java_ai_rapids_cudf_ColumnView_dropListDuplicatesWithKey
auto out_structs =
cudf::make_structs_column(out_child_size, std::move(out_structs_members), 0, {});
return release_as_jlong(cudf::make_lists_column(input_cv->size(), std::move(out_offsets),
- std::move(out_structs), 0, {}));
+ std::move(out_structs), input_cv->null_count(),
+ cudf::copy_bitmask(*input_cv)));
}
CATCH_STD(env, 0);
}
diff --git a/java/src/main/native/src/aggregation128_utils.cu b/java/src/main/native/src/aggregation128_utils.cu
new file mode 100644
index 00000000000..865f607ff7d
--- /dev/null
+++ b/java/src/main/native/src/aggregation128_utils.cu
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <cstdint>
+#include <memory>
+#include <vector>
+
+#include <cudf/column/column.hpp>
+#include <cudf/column/column_factories.hpp>
+#include <cudf/column/column_view.hpp>
+#include <cudf/table/table.hpp>
+#include <cudf/table/table_view.hpp>
+#include <rmm/cuda_stream_view.hpp>
+#include <rmm/exec_policy.hpp>
+
+#include "aggregation128_utils.hpp"
+
+namespace {
+
+// Functor to reassemble a 128-bit value from four 64-bit chunks with overflow detection.
+class chunk_assembler : public thrust::unary_function<cudf::size_type, __int128_t> {
+public:
+ chunk_assembler(bool *overflows, uint64_t const *chunks0, uint64_t const *chunks1,
+ uint64_t const *chunks2, int64_t const *chunks3)
+ : overflows(overflows), chunks0(chunks0), chunks1(chunks1), chunks2(chunks2),
+ chunks3(chunks3) {}
+
+ __device__ __int128_t operator()(cudf::size_type i) const {
+ // Starting with the least significant input and moving to the most significant, propagate the
+ // upper 32-bits of the previous column into the next column, i.e.: propagate the "carry" bits
+ // of each 64-bit chunk into the next chunk.
+ uint64_t const c0 = chunks0[i];
+ uint64_t const c1 = chunks1[i] + (c0 >> 32);
+ uint64_t const c2 = chunks2[i] + (c1 >> 32);
+ int64_t const c3 = chunks3[i] + (c2 >> 32);
+ uint64_t const lower64 = (c1 << 32) | static_cast<uint32_t>(c0);
+ int64_t const upper64 = (c3 << 32) | static_cast<uint32_t>(c2);
+
+ // check for overflow by ensuring the sign bit matches the top carry bits
+ int32_t const replicated_sign_bit = static_cast<int32_t>(c3) >> 31;
+ int32_t const top_carry_bits = static_cast<int32_t>(c3 >> 32);
+ overflows[i] = (replicated_sign_bit != top_carry_bits);
+
+ return (static_cast<__int128_t>(upper64) << 64) | lower64;
+ }
+
+private:
+ // output column for overflow detected
+ bool *const overflows;
+
+ // input columns for the four 64-bit values
+ uint64_t const *const chunks0;
+ uint64_t const *const chunks1;
+ uint64_t const *const chunks2;
+ int64_t const *const chunks3;
+};
+
+} // anonymous namespace
+
+namespace cudf::jni {
+
+// Extract a 32-bit chunk from a 128-bit value.
+std::unique_ptr<cudf::column> extract_chunk32(cudf::column_view const &in_col, cudf::data_type type,
+ int chunk_idx, rmm::cuda_stream_view stream) {
+ CUDF_EXPECTS(in_col.type().id() == cudf::type_id::DECIMAL128, "not a 128-bit type");
+ CUDF_EXPECTS(chunk_idx >= 0 && chunk_idx < 4, "invalid chunk index");
+ CUDF_EXPECTS(type.id() == cudf::type_id::INT32 || type.id() == cudf::type_id::UINT32,
+ "not a 32-bit integer type");
+ auto const num_rows = in_col.size();
+ auto out_col = cudf::make_fixed_width_column(type, num_rows, copy_bitmask(in_col));
+ auto out_view = out_col->mutable_view();
+ auto const in_begin = in_col.begin<int32_t>();
+
+ // Build an iterator for every fourth 32-bit value, i.e.: one "chunk" of a __int128_t value
+ thrust::transform_iterator transform_iter{thrust::counting_iterator{0},
+ [] __device__(auto i) { return i * 4; }};
+ thrust::permutation_iterator stride_iter{in_begin + chunk_idx, transform_iter};
+
+ thrust::copy(rmm::exec_policy(stream), stride_iter, stride_iter + num_rows,
+ out_view.data<int32_t>());
+ return out_col;
+}
+
+// Reassemble a column of 128-bit values from four 64-bit integer columns with overflow detection.
+std::unique_ptr<cudf::table> assemble128_from_sum(cudf::table_view const &chunks_table,
+ cudf::data_type output_type,
+ rmm::cuda_stream_view stream) {
+ CUDF_EXPECTS(output_type.id() == cudf::type_id::DECIMAL128, "not a 128-bit type");
+ CUDF_EXPECTS(chunks_table.num_columns() == 4, "must be 4 column table");
+ auto const num_rows = chunks_table.num_rows();
+ auto const chunks0 = chunks_table.column(0);
+ auto const chunks1 = chunks_table.column(1);
+ auto const chunks2 = chunks_table.column(2);
+ auto const chunks3 = chunks_table.column(3);
+ CUDF_EXPECTS(cudf::size_of(chunks0.type()) == 8 && cudf::size_of(chunks1.type()) == 8 &&
+ cudf::size_of(chunks2.type()) == 8 &&
+ chunks3.type().id() == cudf::type_id::INT64,
+ "chunks type mismatch");
+ std::vector<std::unique_ptr<cudf::column>> columns;
+ columns.push_back(cudf::make_fixed_width_column(cudf::data_type{cudf::type_id::BOOL8}, num_rows,
+ copy_bitmask(chunks0)));
+ columns.push_back(cudf::make_fixed_width_column(output_type, num_rows, copy_bitmask(chunks0)));
+ auto overflows_view = columns[0]->mutable_view();
+ auto assembled_view = columns[1]->mutable_view();
+ thrust::transform(rmm::exec_policy(stream), thrust::make_counting_iterator(0),
+ thrust::make_counting_iterator(num_rows),
+ assembled_view.begin<__int128_t>(),
+ chunk_assembler(overflows_view.begin<bool>(), chunks0.begin<uint64_t>(),
+ chunks1.begin<uint64_t>(), chunks2.begin<uint64_t>(),
+ chunks3.begin<int64_t>()));
+ return std::make_unique(std::move(columns));
+}
+
+} // namespace cudf::jni
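The carry-propagation scheme in `chunk_assembler` can be mirrored in a short Python sketch (a model of the kernel's per-row logic, not the real CUDA code). The helpers `as_i32`/`as_i64` emulate fixed-width two's-complement reinterpretation, which Python's arbitrary-precision integers otherwise hide:

```python
U32 = (1 << 32) - 1
U64 = (1 << 64) - 1

def as_i32(x):
    # low 32 bits of x, reinterpreted as signed int32
    x &= U32
    return x - (1 << 32) if x >> 31 else x

def as_i64(x):
    # low 64 bits of x, reinterpreted as signed int64
    x &= U64
    return x - (1 << 64) if x >> 63 else x

def combine_sum_chunks(chunks0, chunks1, chunks2, chunks3):
    # Propagate the "carry" bits (upper 32) of each 64-bit chunk into the next,
    # least significant chunk first.
    c0 = chunks0 & U64
    c1 = (chunks1 + (c0 >> 32)) & U64
    c2 = (chunks2 + (c1 >> 32)) & U64
    c3 = as_i64(chunks3 + (c2 >> 32))
    lower64 = ((c1 << 32) & U64) | (c0 & U32)
    upper64 = as_i64((c3 << 32) | (c2 & U32))
    # Overflow iff the carry bits above bit 31 of c3 disagree with the sign bit
    overflow = (as_i32(c3) >> 31) != as_i32(c3 >> 32)
    value = (upper64 << 64) | lower64  # signed 128-bit result
    return overflow, value

overflow, value = combine_sum_chunks(0x12345678, 0x9ABCDEF0, 0x11223344, 0x55667788)
print(overflow, hex(value))  # → False 0x55667788112233449abcdef012345678
```

Running this over the rows of `testCombineInt64SumChunks` reproduces the expected overflow flags and 128-bit values in that test.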
diff --git a/java/src/main/native/src/aggregation128_utils.hpp b/java/src/main/native/src/aggregation128_utils.hpp
new file mode 100644
index 00000000000..30c1032b795
--- /dev/null
+++ b/java/src/main/native/src/aggregation128_utils.hpp
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <memory>
+
+#include <cudf/column/column.hpp>
+#include <cudf/table/table.hpp>
+#include <rmm/cuda_stream_view.hpp>
+
+namespace cudf::jni {
+
+/**
+ * @brief Extract a 32-bit integer column from a column of 128-bit values.
+ *
+ * Given a 128-bit input column, a 32-bit integer column is returned corresponding to
+ * the index of which 32-bit chunk of the original 128-bit values to extract.
+ * 0 corresponds to the least significant chunk, and 3 corresponds to the most
+ * significant chunk.
+ *
+ * A null input row will result in a corresponding null output row.
+ *
+ * @param col Column of 128-bit values
+ * @param dtype Integer type to use for the output column (e.g.: UINT32 or INT32)
+ * @param chunk_idx Index of the 32-bit chunk to extract
+ * @param stream CUDA stream to use
+ * @return A column containing the extracted 32-bit integer values
+ */
+std::unique_ptr<cudf::column>
+extract_chunk32(cudf::column_view const &col, cudf::data_type dtype, int chunk_idx,
+ rmm::cuda_stream_view stream = rmm::cuda_stream_default);
+
+/**
+ * @brief Reassemble a 128-bit column from four 64-bit integer columns with overflow detection.
+ *
+ * The 128-bit value is reconstructed by overlapping the 64-bit values by 32-bits. The least
+ * significant 32-bits of the least significant 64-bit value are used directly as the least
+ * significant 32-bits of the final 128-bit value, and the remaining 32-bits are added to the next
+ * most significant 64-bit value. The lower 32-bits of that sum become the next most significant
+ * 32-bits in the final 128-bit value, and the remaining 32-bits are added to the next most
+ * significant 64-bit input value, and so on.
+ *
+ * A null input row will result in a corresponding null output row.
+ *
+ * @param chunks_table Table of four 64-bit integer columns with the columns ordered from least
+ * significant to most significant. The last column must be an INT64 column.
+ * @param output_type The type to use for the resulting 128-bit value column
+ * @param stream CUDA stream to use
+ * @return Table containing a boolean column and a 128-bit value column of the
+ * requested type. The boolean value will be true if an overflow was detected
+ * for that row's value.
+ */
+std::unique_ptr<cudf::table>
+assemble128_from_sum(cudf::table_view const &chunks_table, cudf::data_type output_type,
+ rmm::cuda_stream_view stream = rmm::cuda_stream_default);
+
+} // namespace cudf::jni
diff --git a/java/src/test/java/ai/rapids/cudf/Aggregation128UtilsTest.java b/java/src/test/java/ai/rapids/cudf/Aggregation128UtilsTest.java
new file mode 100644
index 00000000000..11e2aff7259
--- /dev/null
+++ b/java/src/test/java/ai/rapids/cudf/Aggregation128UtilsTest.java
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package ai.rapids.cudf;
+
+import org.junit.jupiter.api.Test;
+
+import java.math.BigInteger;
+
+public class Aggregation128UtilsTest extends CudfTestBase {
+ @Test
+ public void testExtractInt32Chunks() {
+ BigInteger[] intvals = new BigInteger[] {
+ null,
+ new BigInteger("123456789abcdef0f0debc9a78563412", 16),
+ new BigInteger("123456789abcdef0f0debc9a78563412", 16),
+ new BigInteger("123456789abcdef0f0debc9a78563412", 16),
+ null
+ };
+ try (ColumnVector cv = ColumnVector.decimalFromBigInt(-38, intvals);
+ ColumnVector chunk1 = Aggregation128Utils.extractInt32Chunk(cv, DType.UINT32, 0);
+ ColumnVector chunk2 = Aggregation128Utils.extractInt32Chunk(cv, DType.UINT32, 1);
+ ColumnVector chunk3 = Aggregation128Utils.extractInt32Chunk(cv, DType.UINT32, 2);
+ ColumnVector chunk4 = Aggregation128Utils.extractInt32Chunk(cv, DType.INT32, 3);
+ Table actualChunks = new Table(chunk1, chunk2, chunk3, chunk4);
+ ColumnVector expectedChunk1 = ColumnVector.fromBoxedUnsignedInts(
+ null, 0x78563412, 0x78563412, 0x78563412, null);
+ ColumnVector expectedChunk2 = ColumnVector.fromBoxedUnsignedInts(
+ null, -0x0f214366, -0x0f214366, -0x0f214366, null);
+ ColumnVector expectedChunk3 = ColumnVector.fromBoxedUnsignedInts(
+ null, -0x65432110, -0x65432110, -0x65432110, null);
+ ColumnVector expectedChunk4 = ColumnVector.fromBoxedInts(
+ null, 0x12345678, 0x12345678, 0x12345678, null);
+ Table expectedChunks = new Table(expectedChunk1, expectedChunk2, expectedChunk3, expectedChunk4)) {
+ AssertUtils.assertTablesAreEqual(expectedChunks, actualChunks);
+ }
+ }
+
+ @Test
+ public void testCombineInt64SumChunks() {
+ try (ColumnVector chunks0 = ColumnVector.fromBoxedUnsignedLongs(
+ null, 0L, 1L, 0L, 0L, 0x12345678L, 0x123456789L, 0x1234567812345678L, 0xfedcba9876543210L);
+ ColumnVector chunks1 = ColumnVector.fromBoxedUnsignedLongs(
+ null, 0L, 2L, 0L, 0L, 0x9abcdef0L, 0x9abcdef01L, 0x1122334455667788L, 0xaceaceaceaceaceaL);
+ ColumnVector chunks2 = ColumnVector.fromBoxedUnsignedLongs(
+ null, 0L, 3L, 0L, 0L, 0x11223344L, 0x556677889L, 0x99aabbccddeeff00L, 0xbdfbdfbdfbdfbdfbL);
+ ColumnVector chunks3 = ColumnVector.fromBoxedLongs(
+ null, 0L, -1L, 0x100000000L, 0x80000000L, 0x55667788L, 0x01234567L, 0x66554434L, -0x42042043L);
+ Table chunksTable = new Table(chunks0, chunks1, chunks2, chunks3);
+ Table actual = Aggregation128Utils.combineInt64SumChunks(chunksTable, DType.create(DType.DTypeEnum.DECIMAL128, -20));
+ ColumnVector expectedOverflows = ColumnVector.fromBoxedBooleans(
+ null, false, false, true, true, false, false, true, false);
+ ColumnVector expectedValues = ColumnVector.decimalFromBigInt(-20,
+ null,
+ new BigInteger("0", 16),
+ new BigInteger("-fffffffcfffffffdffffffff", 16),
+ new BigInteger("0", 16),
+ new BigInteger("-80000000000000000000000000000000", 16),
+ new BigInteger("55667788112233449abcdef012345678", 16),
+ new BigInteger("123456c56677892abcdef0223456789", 16),
+ new BigInteger("ef113244679ace0012345678", 16),
+ new BigInteger("7bf7bf7ba8ca8ca8e9ab678276543210", 16));
+ Table expected = new Table(expectedOverflows, expectedValues)) {
+ AssertUtils.assertTablesAreEqual(expected, actual);
+ }
+ }
+}
diff --git a/java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java b/java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java
index 8f39c3c51ce..f9c8029ed84 100644
--- a/java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java
+++ b/java/src/test/java/ai/rapids/cudf/ColumnVectorTest.java
@@ -4380,12 +4380,14 @@ void testDropListDuplicatesWithKeysValues() {
3, 4, 5, // list2
null, 0, 6, 6, 0, // list3
null, 6, 7, null, 7 // list 4
+ // list5 (empty)
);
ColumnVector inputChildVals = ColumnVector.fromBoxedInts(
10, 20, // list1
30, 40, 50, // list2
60, 70, 80, 90, 100, // list3
110, 120, 130, 140, 150 // list4
+ // list5 (empty)
);
ColumnVector inputStructsKeysVals = ColumnVector.makeStruct(inputChildKeys, inputChildVals);
ColumnVector inputOffsets = ColumnVector.fromInts(0, 2, 5, 10, 15, 15);
@@ -4402,7 +4404,8 @@ void testDropListDuplicatesWithKeysValues() {
10, 20,
30, 40, 50,
100, 90, 60,
- 120, 150, 140);
+ 120, 150, 140
+ );
ColumnVector expectedStructsKeysVals = ColumnVector.makeStruct(expectedChildKeys,
expectedChildVals);
ColumnVector expectedOffsets = ColumnVector.fromInts(0, 2, 5, 8, 11, 11);
@@ -4416,6 +4419,60 @@ void testDropListDuplicatesWithKeysValues() {
}
}
+ @Test
+ void testDropListDuplicatesWithKeysValuesNullable() {
+ try(ColumnVector inputChildKeys = ColumnVector.fromBoxedInts(
+ 1, 2, // list1
+ // list2 (null)
+ 3, 4, 5, // list3
+ null, 0, 6, 6, 0, // list4
+ null, 6, 7, null, 7 // list 5
+ // list6 (null)
+ );
+ ColumnVector inputChildVals = ColumnVector.fromBoxedInts(
+ 10, 20, // list1
+ // list2 (null)
+ 30, 40, 50, // list3
+ 60, 70, 80, 90, 100, // list4
+ 110, 120, 130, 140, 150 // list5
+ // list6 (null)
+ );
+ ColumnVector inputStructsKeysVals = ColumnVector.makeStruct(inputChildKeys, inputChildVals);
+ ColumnVector inputOffsets = ColumnVector.fromInts(0, 2, 2, 5, 10, 15, 15);
+ ColumnVector tmpInputListsKeysVals = inputStructsKeysVals.makeListFromOffsets(6,inputOffsets);
+ ColumnVector templateBitmask = ColumnVector.fromBoxedInts(1, null, 1, 1, 1, null);
+ ColumnVector inputListsKeysVals = tmpInputListsKeysVals.mergeAndSetValidity(BinaryOp.BITWISE_AND, templateBitmask);
+
+ ColumnVector expectedChildKeys = ColumnVector.fromBoxedInts(
+ 1, 2, // list1
+ // list2 (null)
+ 3, 4, 5, // list3
+ 0, 6, null, // list4
+ 6, 7, null // list5
+ // list6 (null)
+ );
+ ColumnVector expectedChildVals = ColumnVector.fromBoxedInts(
+ 10, 20, // list1
+ // list2 (null)
+ 30, 40, 50, // list3
+ 100, 90, 60, // list4
+ 120, 150, 140 // list5
+ // list6 (null)
+ );
+ ColumnVector expectedStructsKeysVals = ColumnVector.makeStruct(expectedChildKeys,
+ expectedChildVals);
+ ColumnVector expectedOffsets = ColumnVector.fromInts(0, 2, 2, 5, 8, 11, 11);
+ ColumnVector tmpExpectedListsKeysVals = expectedStructsKeysVals.makeListFromOffsets(6,
+ expectedOffsets);
+ ColumnVector expectedListsKeysVals = tmpExpectedListsKeysVals.mergeAndSetValidity(BinaryOp.BITWISE_AND, templateBitmask);
+
+ ColumnVector output = inputListsKeysVals.dropListDuplicatesWithKeysValues();
+ ColumnVector sortedOutput = output.listSortRows(false, false);
+ ) {
+ assertColumnsAreEqual(expectedListsKeysVals, sortedOutput);
+ }
+ }
+
@SafeVarargs
private static ColumnVector makeListsColumn(DType childDType, List<Integer>... rows) {
HostColumnVector.DataType childType = new HostColumnVector.BasicType(true, childDType);
@@ -4716,7 +4773,7 @@ void testStringSplit() {
Table resultSplitOnce = v.stringSplit(pattern, 1);
Table resultSplitAll = v.stringSplit(pattern)) {
assertTablesAreEqual(expectedSplitOnce, resultSplitOnce);
- assertTablesAreEqual(expectedSplitAll, resultSplitAll);
+ assertTablesAreEqual(expectedSplitAll, resultSplitAll);
}
}
@@ -6068,7 +6125,7 @@ void testCopyWithBooleanColumnAsValidity() {
}
// Negative case: Mismatch in row count.
- Exception x = assertThrows(CudfException.class, () -> {
+ Exception x = assertThrows(CudfException.class, () -> {
try (ColumnVector exemplar = ColumnVector.fromBoxedInts(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
ColumnVector validity = ColumnVector.fromBoxedBooleans(F, T, F, T);
ColumnVector result = exemplar.copyWithBooleanColumnAsValidity(validity)) {
diff --git a/python/cudf/cudf/_fuzz_testing/fuzzer.py b/python/cudf/cudf/_fuzz_testing/fuzzer.py
index 484b3fb26f4..a51a5073510 100644
--- a/python/cudf/cudf/_fuzz_testing/fuzzer.py
+++ b/python/cudf/cudf/_fuzz_testing/fuzzer.py
@@ -14,7 +14,7 @@
)
-class Fuzzer(object):
+class Fuzzer:
def __init__(
self,
target,
diff --git a/python/cudf/cudf/_fuzz_testing/io.py b/python/cudf/cudf/_fuzz_testing/io.py
index 193fb4c7f7f..dfc59a1f18d 100644
--- a/python/cudf/cudf/_fuzz_testing/io.py
+++ b/python/cudf/cudf/_fuzz_testing/io.py
@@ -16,7 +16,7 @@
)
-class IOFuzz(object):
+class IOFuzz:
def __init__(
self,
dirs=None,
@@ -59,7 +59,7 @@ def __init__(
self._current_buffer = None
def _load_params(self, path):
- with open(path, "r") as f:
+ with open(path) as f:
params = json.load(f)
self._inputs.append(params)
diff --git a/python/cudf/cudf/_fuzz_testing/main.py b/python/cudf/cudf/_fuzz_testing/main.py
index 7b28a4c4970..6b536fc3e2e 100644
--- a/python/cudf/cudf/_fuzz_testing/main.py
+++ b/python/cudf/cudf/_fuzz_testing/main.py
@@ -3,7 +3,7 @@
from cudf._fuzz_testing import fuzzer
-class PythonFuzz(object):
+class PythonFuzz:
def __init__(self, func, params=None, data_handle=None, **kwargs):
self.function = func
self.data_handler_class = data_handle
diff --git a/python/cudf/cudf/_version.py b/python/cudf/cudf/_version.py
index a511ab98acf..c6281349c50 100644
--- a/python/cudf/cudf/_version.py
+++ b/python/cudf/cudf/_version.py
@@ -86,7 +86,7 @@ def run_command(
stderr=(subprocess.PIPE if hide_stderr else None),
)
break
- except EnvironmentError:
+ except OSError:
e = sys.exc_info()[1]
if e.errno == errno.ENOENT:
continue
@@ -96,7 +96,7 @@ def run_command(
return None, None
else:
if verbose:
- print("unable to find command, tried %s" % (commands,))
+ print(f"unable to find command, tried {commands}")
return None, None
stdout = p.communicate()[0].strip()
if sys.version_info[0] >= 3:
@@ -149,7 +149,7 @@ def git_get_keywords(versionfile_abs):
# _version.py.
keywords = {}
try:
- f = open(versionfile_abs, "r")
+ f = open(versionfile_abs)
for line in f.readlines():
if line.strip().startswith("git_refnames ="):
mo = re.search(r'=\s*"(.*)"', line)
@@ -164,7 +164,7 @@ def git_get_keywords(versionfile_abs):
if mo:
keywords["date"] = mo.group(1)
f.close()
- except EnvironmentError:
+ except OSError:
pass
return keywords
@@ -188,11 +188,11 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
if verbose:
print("keywords are unexpanded, not using")
raise NotThisMethod("unexpanded keywords, not a git-archive tarball")
- refs = set([r.strip() for r in refnames.strip("()").split(",")])
+ refs = {r.strip() for r in refnames.strip("()").split(",")}
# starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of
# just "foo-1.0". If we see a "tag: " prefix, prefer those.
TAG = "tag: "
- tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)])
+ tags = {r[len(TAG) :] for r in refs if r.startswith(TAG)}
if not tags:
# Either we're using git < 1.8.3, or there really are no tags. We use
# a heuristic: assume all version tags have a digit. The old git %d
@@ -201,7 +201,7 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
# between branches and tags. By ignoring refnames without digits, we
# filter out many common branch names like "release" and
# "stabilization", as well as "HEAD" and "master".
- tags = set([r for r in refs if re.search(r"\d", r)])
+ tags = {r for r in refs if re.search(r"\d", r)}
if verbose:
print("discarding '%s', no digits" % ",".join(refs - tags))
if verbose:
@@ -308,10 +308,9 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
if verbose:
fmt = "tag '%s' doesn't start with prefix '%s'"
print(fmt % (full_tag, tag_prefix))
- pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % (
- full_tag,
- tag_prefix,
- )
+ pieces[
+ "error"
+ ] = f"tag '{full_tag}' doesn't start with prefix '{tag_prefix}'"
return pieces
pieces["closest-tag"] = full_tag[len(tag_prefix) :]
diff --git a/python/cudf/cudf/comm/gpuarrow.py b/python/cudf/cudf/comm/gpuarrow.py
index b6089b65aa5..7879261139d 100644
--- a/python/cudf/cudf/comm/gpuarrow.py
+++ b/python/cudf/cudf/comm/gpuarrow.py
@@ -58,7 +58,7 @@ def to_dict(self):
return dc
-class GpuArrowNodeReader(object):
+class GpuArrowNodeReader:
def __init__(self, table, index):
self._table = table
self._field = table.schema[index]
diff --git a/python/cudf/cudf/core/_base_index.py b/python/cudf/cudf/core/_base_index.py
index 6569184e90b..2e6f138d2e3 100644
--- a/python/cudf/cudf/core/_base_index.py
+++ b/python/cudf/cudf/core/_base_index.py
@@ -1,9 +1,8 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
-from __future__ import annotations, division, print_function
+from __future__ import annotations
import pickle
-import warnings
from typing import Any, Set
import pandas as pd
@@ -1350,28 +1349,6 @@ def isin(self, values):
return self._values.isin(values).values
- def memory_usage(self, deep=False):
- """
- Memory usage of the values.
-
- Parameters
- ----------
- deep : bool
- Introspect the data deeply,
- interrogate `object` dtypes for system-level
- memory consumption.
-
- Returns
- -------
- bytes used
- """
- if deep:
- warnings.warn(
- "The deep parameter is ignored and is only included "
- "for pandas compatibility."
- )
- return self._values.memory_usage()
-
@classmethod
def from_pandas(cls, index, nan_as_null=None):
"""
diff --git a/python/cudf/cudf/core/column/column.py b/python/cudf/cudf/core/column/column.py
index 19313dd3fe2..2c3951c0e5e 100644
--- a/python/cudf/cudf/core/column/column.py
+++ b/python/cudf/cudf/core/column/column.py
@@ -77,12 +77,12 @@
pandas_dtypes_alias_to_cudf_alias,
pandas_dtypes_to_np_dtypes,
)
-from cudf.utils.utils import mask_dtype
+from cudf.utils.utils import NotIterable, mask_dtype
T = TypeVar("T", bound="ColumnBase")
-class ColumnBase(Column, Serializable):
+class ColumnBase(Column, Serializable, NotIterable):
def as_frame(self) -> "cudf.core.frame.Frame":
"""
Converts a Column to Frame
@@ -130,9 +130,6 @@ def to_pandas(self, index: pd.Index = None, **kwargs) -> "pd.Series":
pd_series.index = index
return pd_series
- def __iter__(self):
- cudf.utils.utils.raise_iteration_error(obj=self)
-
@property
def values_host(self) -> "np.ndarray":
"""
diff --git a/python/cudf/cudf/core/column/string.py b/python/cudf/cudf/core/column/string.py
index 6467fd39ddd..22b7a0f9d2c 100644
--- a/python/cudf/cudf/core/column/string.py
+++ b/python/cudf/cudf/core/column/string.py
@@ -5083,7 +5083,7 @@ def to_arrow(self) -> pa.Array:
"""
if self.null_count == len(self):
return pa.NullArray.from_buffers(
- pa.null(), len(self), [pa.py_buffer((b""))]
+ pa.null(), len(self), [pa.py_buffer(b"")]
)
else:
return super().to_arrow()
diff --git a/python/cudf/cudf/core/dataframe.py b/python/cudf/cudf/core/dataframe.py
index 3735a949277..9d179994174 100644
--- a/python/cudf/cudf/core/dataframe.py
+++ b/python/cudf/cudf/core/dataframe.py
@@ -1,6 +1,6 @@
# Copyright (c) 2018-2022, NVIDIA CORPORATION.
-from __future__ import annotations, division
+from __future__ import annotations
import functools
import inspect
@@ -1242,66 +1242,9 @@ def _slice(self: T, arg: slice) -> T:
return result
def memory_usage(self, index=True, deep=False):
- """
- Return the memory usage of each column in bytes.
- The memory usage can optionally include the contribution of
- the index and elements of `object` dtype.
-
- Parameters
- ----------
- index : bool, default True
- Specifies whether to include the memory usage of the DataFrame's
- index in returned Series. If ``index=True``, the memory usage of
- the index is the first item in the output.
- deep : bool, default False
- If True, introspect the data deeply by interrogating
- `object` dtypes for system-level memory consumption, and include
- it in the returned values.
-
- Returns
- -------
- Series
- A Series whose index is the original column names and whose values
- is the memory usage of each column in bytes.
-
- Examples
- --------
- >>> dtypes = ['int64', 'float64', 'object', 'bool']
- >>> data = dict([(t, np.ones(shape=5000).astype(t))
- ... for t in dtypes])
- >>> df = cudf.DataFrame(data)
- >>> df.head()
- int64 float64 object bool
- 0 1 1.0 1.0 True
- 1 1 1.0 1.0 True
- 2 1 1.0 1.0 True
- 3 1 1.0 1.0 True
- 4 1 1.0 1.0 True
- >>> df.memory_usage(index=False)
- int64 40000
- float64 40000
- object 40000
- bool 5000
- dtype: int64
-
- Use a Categorical for efficient storage of an object-dtype column with
- many repeated values.
-
- >>> df['object'].astype('category').memory_usage(deep=True)
- 5008
- """
- if deep:
- warnings.warn(
- "The deep parameter is ignored and is only included "
- "for pandas compatibility."
- )
- ind = list(self.columns)
- sizes = [col.memory_usage() for col in self._data.columns]
- if index:
- ind.append("Index")
- ind = cudf.Index(ind, dtype="str")
- sizes.append(self.index.memory_usage())
- return Series(sizes, index=ind)
+ return Series(
+ {str(k): v for k, v in super().memory_usage(index, deep).items()}
+ )
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
if method == "__call__" and hasattr(cudf, ufunc.__name__):
@@ -2547,11 +2490,6 @@ def reset_index(
inplace=inplace,
)
- def take(self, indices, axis=0):
- out = super().take(indices)
- out.columns = self.columns
- return out
-
@annotate("INSERT", color="green", domain="cudf_python")
def insert(self, loc, name, value, nan_as_null=None):
"""Add a column to DataFrame at the index specified by loc.
@@ -4229,7 +4167,7 @@ def _verbose_repr():
dtype = self.dtypes.iloc[i]
col = pprint_thing(col)
- line_no = _put_str(" {num}".format(num=i), space_num)
+ line_no = _put_str(f" {i}", space_num)
count = ""
if show_counts:
count = counts[i]
@@ -5576,9 +5514,7 @@ def select_dtypes(self, include=None, exclude=None):
if issubclass(dtype.type, e_dtype):
exclude_subtypes.add(dtype.type)
- include_all = set(
- [cudf_dtype_from_pydata_dtype(d) for d in self.dtypes]
- )
+ include_all = {cudf_dtype_from_pydata_dtype(d) for d in self.dtypes}
if include:
inclusion = include_all & include_subtypes
@@ -6329,8 +6265,8 @@ def _align_indices(lhs, rhs):
lhs_out = DataFrame(index=df.index)
rhs_out = DataFrame(index=df.index)
common = set(lhs.columns) & set(rhs.columns)
- common_x = set(["{}_x".format(x) for x in common])
- common_y = set(["{}_y".format(x) for x in common])
+ common_x = {f"{x}_x" for x in common}
+ common_y = {f"{x}_y" for x in common}
for col in df.columns:
if col in common_x:
lhs_out[col[:-2]] = df[col]
diff --git a/python/cudf/cudf/core/frame.py b/python/cudf/cudf/core/frame.py
index 2e01a29b961..6b83f927727 100644
--- a/python/cudf/cudf/core/frame.py
+++ b/python/cudf/cudf/core/frame.py
@@ -337,6 +337,26 @@ def empty(self):
"""
return self.size == 0
+ def memory_usage(self, deep=False):
+ """Return the memory usage of an object.
+
+ Parameters
+ ----------
+ deep : bool
+ The deep parameter is ignored and is only included for pandas
+ compatibility.
+
+ Returns
+ -------
+ The total bytes used.
+ """
+ if deep:
+ warnings.warn(
+ "The deep parameter is ignored and is only included "
+ "for pandas compatibility."
+ )
+ return {name: col.memory_usage() for name, col in self._data.items()}
+
def __len__(self):
return self._num_rows
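The `frame.py` hunk above moves `memory_usage` into the base class, returning a per-column `{name: bytes}` dict that subclasses then aggregate (`DataFrame` keeps the mapping, `Series`/`Index` sum it). A simplified sketch of that shape, with plain stand-in classes rather than the real cuDF ones:

```python
# Simplified stand-ins: the real Frame holds GPU columns; here a plain
# {name: nbytes} dict models the per-column sizes.
import warnings


class Frame:
    def __init__(self, columns):
        self._columns = columns  # {name: nbytes}

    def memory_usage(self, deep=False):
        if deep:
            warnings.warn(
                "The deep parameter is ignored and is only included "
                "for pandas compatibility."
            )
        # Base class reports per-column usage; callers aggregate.
        return dict(self._columns)


class SingleColumnFrame(Frame):
    def memory_usage(self, deep=False):
        # Series/Index-style: collapse the per-column dict to a total.
        return sum(super().memory_usage(deep=deep).values())


df = Frame({"int64": 40000, "bool": 5000})
s = SingleColumnFrame({None: 24})
print(df.memory_usage())  # {'int64': 40000, 'bool': 5000}
print(s.memory_usage())   # 24
```

The design choice: one authoritative per-column computation, with each container type owning only its aggregation step, which is exactly what lets the `DataFrame`, `Series`, and `MultiIndex` overrides in this diff shrink to one-liners.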
diff --git a/python/cudf/cudf/core/groupby/groupby.py b/python/cudf/cudf/core/groupby/groupby.py
index a393d8e9457..ff700144bed 100644
--- a/python/cudf/cudf/core/groupby/groupby.py
+++ b/python/cudf/cudf/core/groupby/groupby.py
@@ -1461,7 +1461,7 @@ def apply(self, func):
# TODO: should we define this as a dataclass instead?
-class Grouper(object):
+class Grouper:
def __init__(
self, key=None, level=None, freq=None, closed=None, label=None
):
diff --git a/python/cudf/cudf/core/index.py b/python/cudf/cudf/core/index.py
index fc59d15e264..f71f930a21c 100644
--- a/python/cudf/cudf/core/index.py
+++ b/python/cudf/cudf/core/index.py
@@ -1,6 +1,6 @@
# Copyright (c) 2018-2021, NVIDIA CORPORATION.
-from __future__ import annotations, division, print_function
+from __future__ import annotations
import math
import pickle
@@ -826,6 +826,9 @@ def _concat(cls, objs):
result.name = name
return result
+ def memory_usage(self, deep=False):
+ return sum(super().memory_usage(deep=deep).values())
+
@annotate("INDEX_EQUALS", color="green", domain="cudf_python")
def equals(self, other, **kwargs):
"""
diff --git a/python/cudf/cudf/core/indexed_frame.py b/python/cudf/cudf/core/indexed_frame.py
index 8ecab2c7c65..fab5d75f62b 100644
--- a/python/cudf/cudf/core/indexed_frame.py
+++ b/python/cudf/cudf/core/indexed_frame.py
@@ -473,6 +473,68 @@ def sort_index(
out = out.reset_index(drop=True)
return self._mimic_inplace(out, inplace=inplace)
+ def memory_usage(self, index=True, deep=False):
+ """Return the memory usage of an object.
+
+ Parameters
+ ----------
+ index : bool, default True
+ Specifies whether to include the memory usage of the index.
+ deep : bool, default False
+ The deep parameter is ignored and is only included for pandas
+ compatibility.
+
+ Returns
+ -------
+ Series or scalar
+ For DataFrame, a Series whose index is the original column names
+ and whose values is the memory usage of each column in bytes. For a
+ Series the total memory usage.
+
+ Examples
+ --------
+ **DataFrame**
+
+ >>> dtypes = ['int64', 'float64', 'object', 'bool']
+ >>> data = dict([(t, np.ones(shape=5000).astype(t))
+ ... for t in dtypes])
+ >>> df = cudf.DataFrame(data)
+ >>> df.head()
+ int64 float64 object bool
+ 0 1 1.0 1.0 True
+ 1 1 1.0 1.0 True
+ 2 1 1.0 1.0 True
+ 3 1 1.0 1.0 True
+ 4 1 1.0 1.0 True
+ >>> df.memory_usage(index=False)
+ int64 40000
+ float64 40000
+ object 40000
+ bool 5000
+ dtype: int64
+
+ Use a Categorical for efficient storage of an object-dtype column with
+ many repeated values.
+
+ >>> df['object'].astype('category').memory_usage(deep=True)
+ 5008
+
+ **Series**
+ >>> s = cudf.Series(range(3), index=['a','b','c'])
+ >>> s.memory_usage()
+ 43
+
+ Not including the index gives the size of the rest of the data, which
+ is necessarily smaller:
+
+ >>> s.memory_usage(index=False)
+ 24
+ """
+ usage = super().memory_usage(deep=deep)
+ if index:
+ usage["Index"] = self.index.memory_usage()
+ return usage
+
def hash_values(self, method="murmur3"):
"""Compute the hash of values in this column.
diff --git a/python/cudf/cudf/core/join/join.py b/python/cudf/cudf/core/join/join.py
index 704274815f6..39ff4718550 100644
--- a/python/cudf/cudf/core/join/join.py
+++ b/python/cudf/cudf/core/join/join.py
@@ -169,13 +169,11 @@ def __init__(
if on
else set()
if (self._using_left_index or self._using_right_index)
- else set(
- [
- lkey.name
- for lkey, rkey in zip(self._left_keys, self._right_keys)
- if lkey.name == rkey.name
- ]
- )
+ else {
+ lkey.name
+ for lkey, rkey in zip(self._left_keys, self._right_keys)
+ if lkey.name == rkey.name
+ }
)
def perform_merge(self) -> Frame:
diff --git a/python/cudf/cudf/core/multiindex.py b/python/cudf/cudf/core/multiindex.py
index adce3c24a83..8581b97c217 100644
--- a/python/cudf/cudf/core/multiindex.py
+++ b/python/cudf/cudf/core/multiindex.py
@@ -5,7 +5,6 @@
import itertools
import numbers
import pickle
-import warnings
from collections.abc import Sequence
from numbers import Integral
from typing import Any, List, MutableMapping, Optional, Tuple, Union
@@ -23,10 +22,14 @@
from cudf.core._compat import PANDAS_GE_120
from cudf.core.frame import Frame
from cudf.core.index import BaseIndex, _lexsorted_equal_range, as_index
-from cudf.utils.utils import _maybe_indices_to_slice, cached_property
+from cudf.utils.utils import (
+ NotIterable,
+ _maybe_indices_to_slice,
+ cached_property,
+)
-class MultiIndex(Frame, BaseIndex):
+class MultiIndex(Frame, BaseIndex, NotIterable):
"""A multi-level or hierarchical index.
Provides N-Dimensional indexing into Series and DataFrame objects.
@@ -115,7 +118,7 @@ def __init__(
"MultiIndex has unequal number of levels and "
"codes and is inconsistent!"
)
- if len(set(c.size for c in codes._data.columns)) != 1:
+ if len({c.size for c in codes._data.columns}) != 1:
raise ValueError(
"MultiIndex length of codes does not match "
"and is inconsistent!"
@@ -367,9 +370,6 @@ def copy(
return mi
- def __iter__(self):
- cudf.utils.utils.raise_iteration_error(obj=self)
-
def __repr__(self):
max_seq_items = get_option("display.max_seq_items") or len(self)
@@ -752,7 +752,7 @@ def _index_and_downcast(self, result, index, index_key):
# Pandas returns an empty Series with a tuple as name
# the one expected result column
result = cudf.Series._from_data(
- {}, name=tuple((col[0] for col in index._data.columns))
+ {}, name=tuple(col[0] for col in index._data.columns)
)
elif out_index._num_columns == 1:
# If there's only one column remaining in the output index, convert
@@ -1202,7 +1202,7 @@ def _poplevels(self, level):
if not pd.api.types.is_list_like(level):
level = (level,)
- ilevels = sorted([self._level_index_from_level(lev) for lev in level])
+ ilevels = sorted(self._level_index_from_level(lev) for lev in level)
if not ilevels:
return None
@@ -1412,22 +1412,14 @@ def _clean_nulls_from_index(self):
)
def memory_usage(self, deep=False):
- if deep:
- warnings.warn(
- "The deep parameter is ignored and is only included "
- "for pandas compatibility."
- )
-
- n = 0
- for col in self._data.columns:
- n += col.memory_usage()
+ usage = sum(super().memory_usage(deep=deep).values())
if self.levels:
for level in self.levels:
- n += level.memory_usage(deep=deep)
+ usage += level.memory_usage(deep=deep)
if self.codes:
for col in self.codes._data.columns:
- n += col.memory_usage()
- return n
+ usage += col.memory_usage()
+ return usage
def difference(self, other, sort=None):
if hasattr(other, "to_pandas"):
diff --git a/python/cudf/cudf/core/scalar.py b/python/cudf/cudf/core/scalar.py
index b0770b71ca6..134b94bf0f2 100644
--- a/python/cudf/cudf/core/scalar.py
+++ b/python/cudf/cudf/core/scalar.py
@@ -17,7 +17,7 @@
)
-class Scalar(object):
+class Scalar:
"""
A GPU-backed scalar object with NumPy scalar like properties
May be used in binary operations against other scalars, cuDF
diff --git a/python/cudf/cudf/core/series.py b/python/cudf/cudf/core/series.py
index 12a2538b776..5823ea18d1b 100644
--- a/python/cudf/cudf/core/series.py
+++ b/python/cudf/cudf/core/series.py
@@ -167,7 +167,7 @@ def __getitem__(self, arg: Any) -> Union[ScalarLike, DataFrameOrSeries]:
if (
isinstance(arg, tuple)
and len(arg) == self._frame._index.nlevels
- and not any((isinstance(x, slice) for x in arg))
+ and not any(isinstance(x, slice) for x in arg)
):
result = result.iloc[0]
return result
@@ -953,52 +953,7 @@ def to_frame(self, name=None):
return cudf.DataFrame({col: self._column}, index=self.index)
def memory_usage(self, index=True, deep=False):
- """
- Return the memory usage of the Series.
-
- The memory usage can optionally include the contribution of
- the index and of elements of `object` dtype.
-
- Parameters
- ----------
- index : bool, default True
- Specifies whether to include the memory usage of the Series index.
- deep : bool, default False
- If True, introspect the data deeply by interrogating
- `object` dtypes for system-level memory consumption, and include
- it in the returned value.
-
- Returns
- -------
- int
- Bytes of memory consumed.
-
- See Also
- --------
- cudf.DataFrame.memory_usage : Bytes consumed by
- a DataFrame.
-
- Examples
- --------
- >>> s = cudf.Series(range(3), index=['a','b','c'])
- >>> s.memory_usage()
- 43
-
- Not including the index gives the size of the rest of the data, which
- is necessarily smaller:
-
- >>> s.memory_usage(index=False)
- 24
- """
- if deep:
- warnings.warn(
- "The deep parameter is ignored and is only included "
- "for pandas compatibility."
- )
- n = self._column.memory_usage()
- if index:
- n += self._index.memory_usage()
- return n
+ return sum(super().memory_usage(index, deep).values())
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
if method == "__call__":
@@ -2722,42 +2677,6 @@ def unique(self):
res = self._column.unique()
return Series(res, name=self.name)
- def nunique(self, method="sort", dropna=True):
- """Returns the number of unique values of the Series: approximate version,
- and exact version to be moved to libcudf
-
- Excludes NA values by default.
-
- Parameters
- ----------
- dropna : bool, default True
- Don't include NA values in the count.
-
- Returns
- -------
- int
-
- Examples
- --------
- >>> import cudf
- >>> s = cudf.Series([1, 3, 5, 7, 7])
- >>> s
- 0 1
- 1 3
- 2 5
- 3 7
- 4 7
- dtype: int64
- >>> s.nunique()
- 4
- """
- if method != "sort":
- msg = "non sort based distinct_count() not implemented yet"
- raise NotImplementedError(msg)
- if self.null_count == len(self):
- return 0
- return super().nunique(method, dropna)
-
def value_counts(
self,
normalize=False,
@@ -2969,7 +2888,7 @@ def _prepare_percentiles(percentiles):
return percentiles
def _format_percentile_names(percentiles):
- return ["{0}%".format(int(x * 100)) for x in percentiles]
+ return [f"{int(x * 100)}%" for x in percentiles]
def _format_stats_values(stats_data):
return map(lambda x: round(x, 6), stats_data)
@@ -3071,7 +2990,7 @@ def _describe_timestamp(self):
.to_numpy(na_value=np.nan),
)
),
- "max": str(pd.Timestamp((self.max()))),
+ "max": str(pd.Timestamp(self.max())),
}
return Series(
@@ -3327,6 +3246,11 @@ def merge(
method="hash",
suffixes=("_x", "_y"),
):
+ warnings.warn(
+ "Series.merge is deprecated and will be removed in a future "
+ "release. Use cudf.merge instead.",
+ FutureWarning,
+ )
if left_on not in (self.name, None):
raise ValueError(
"Series to other merge uses series name as key implicitly"
@@ -3550,7 +3474,7 @@ def wrapper(self, other, level=None, fill_value=None, axis=0):
setattr(Series, binop, make_binop_func(binop))
-class DatetimeProperties(object):
+class DatetimeProperties:
"""
Accessor object for datetimelike properties of the Series values.
@@ -4492,7 +4416,7 @@ def strftime(self, date_format, *args, **kwargs):
)
-class TimedeltaProperties(object):
+class TimedeltaProperties:
"""
Accessor object for timedeltalike properties of the Series values.
diff --git a/python/cudf/cudf/core/single_column_frame.py b/python/cudf/cudf/core/single_column_frame.py
index ef479f19363..bf867923b57 100644
--- a/python/cudf/cudf/core/single_column_frame.py
+++ b/python/cudf/cudf/core/single_column_frame.py
@@ -15,11 +15,12 @@
from cudf.api.types import _is_scalar_or_zero_d_array
from cudf.core.column import ColumnBase, as_column
from cudf.core.frame import Frame
+from cudf.utils.utils import NotIterable
T = TypeVar("T", bound="Frame")
-class SingleColumnFrame(Frame):
+class SingleColumnFrame(Frame, NotIterable):
"""A one-dimensional frame.
Frames with only a single column share certain logic that is encoded in
@@ -85,12 +86,6 @@ def shape(self):
"""Get a tuple representing the dimensionality of the Index."""
return (len(self),)
- def __iter__(self):
- # Iterating over a GPU object is not efficient and hence not supported.
- # Consider using ``.to_arrow()``, ``.to_pandas()`` or ``.values_host``
- # if you wish to iterate over the values.
- cudf.utils.utils.raise_iteration_error(obj=self)
-
def __bool__(self):
raise TypeError(
f"The truth value of a {type(self)} is ambiguous. Use "
@@ -343,4 +338,6 @@ def nunique(self, method: builtins.str = "sort", dropna: bool = True):
int
Number of unique values in the column.
"""
+ if self._column.null_count == len(self):
+ return 0
return self._column.distinct_count(method=method, dropna=dropna)
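Several hunks above delete per-class `__iter__` overrides in favor of a shared `NotIterable` mixin imported from `cudf.utils.utils`. A minimal sketch of that pattern, with a simplified stand-in for the real mixin and class:

```python
# Stand-in for cudf.utils.utils.NotIterable -- the real implementation
# may differ; this only illustrates the mixin pattern the diff adopts.
class NotIterable:
    def __iter__(self):
        # Iterating over a GPU-backed object is inefficient, so raise
        # and point users at explicit host-side conversions instead.
        raise TypeError(
            f"{type(self).__name__} object is not iterable. "
            "Consider using `.to_arrow()`, `.to_pandas()` or "
            "`.values_host` if you wish to iterate over the values."
        )


class MultiIndex(NotIterable):  # stand-in for cudf.MultiIndex
    pass


try:
    iter(MultiIndex())
except TypeError as exc:
    print(exc)
```

Centralizing the error in one mixin keeps the message consistent across `ColumnBase`, `SingleColumnFrame`, and `MultiIndex`, instead of three hand-written `__iter__` bodies.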
diff --git a/python/cudf/cudf/core/udf/typing.py b/python/cudf/cudf/core/udf/typing.py
index da7ff4c0e32..56e8bec74dc 100644
--- a/python/cudf/cudf/core/udf/typing.py
+++ b/python/cudf/cudf/core/udf/typing.py
@@ -133,8 +133,8 @@ def typeof_masked(val, c):
class MaskedConstructor(ConcreteTemplate):
key = api.Masked
units = ["ns", "ms", "us", "s"]
- datetime_cases = set(types.NPDatetime(u) for u in units)
- timedelta_cases = set(types.NPTimedelta(u) for u in units)
+ datetime_cases = {types.NPDatetime(u) for u in units}
+ timedelta_cases = {types.NPTimedelta(u) for u in units}
cases = [
nb_signature(MaskedType(t), t, types.boolean)
for t in (
diff --git a/python/cudf/cudf/datasets.py b/python/cudf/cudf/datasets.py
index 2341a5c23b9..d7a2fedef59 100644
--- a/python/cudf/cudf/datasets.py
+++ b/python/cudf/cudf/datasets.py
@@ -57,9 +57,7 @@ def timeseries(
pd.date_range(start, end, freq=freq, name="timestamp")
)
state = np.random.RandomState(seed)
- columns = dict(
- (k, make[dt](len(index), state)) for k, dt in dtypes.items()
- )
+ columns = {k: make[dt](len(index), state) for k, dt in dtypes.items()}
df = pd.DataFrame(columns, index=index, columns=sorted(columns))
if df.index[-1] == end:
df = df.iloc[:-1]
@@ -110,7 +108,7 @@ def randomdata(nrows=10, dtypes=None, seed=None):
if dtypes is None:
dtypes = {"id": int, "x": float, "y": float}
state = np.random.RandomState(seed)
- columns = dict((k, make[dt](nrows, state)) for k, dt in dtypes.items())
+ columns = {k: make[dt](nrows, state) for k, dt in dtypes.items()}
df = pd.DataFrame(columns, columns=sorted(columns))
return cudf.from_pandas(df)
diff --git a/python/cudf/cudf/tests/test_api_types.py b/python/cudf/cudf/tests/test_api_types.py
index 4d104c122d1..e7cf113f604 100644
--- a/python/cudf/cudf/tests/test_api_types.py
+++ b/python/cudf/cudf/tests/test_api_types.py
@@ -17,9 +17,7 @@
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -128,9 +126,7 @@ def test_is_categorical_dtype(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, True),
@@ -235,9 +231,7 @@ def test_is_numeric_dtype(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -342,9 +336,7 @@ def test_is_integer_dtype(obj, expect):
(int(), True),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -450,9 +442,7 @@ def test_is_integer(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -557,9 +547,7 @@ def test_is_string_dtype(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -664,9 +652,7 @@ def test_is_datetime_dtype(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -771,9 +757,7 @@ def test_is_list_dtype(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -881,9 +865,7 @@ def test_is_struct_dtype(obj, expect):
(int(), False),
(float(), False),
(complex(), False),
- (str(), False),
("", False),
- (r"", False),
(object(), False),
# Base Python types.
(bool, False),
@@ -988,9 +970,7 @@ def test_is_decimal_dtype(obj, expect):
int(),
float(),
complex(),
- str(),
"",
- r"",
object(),
# Base Python types.
bool,
@@ -1070,9 +1050,7 @@ def test_pandas_agreement(obj):
int(),
float(),
complex(),
- str(),
"",
- r"",
object(),
# Base Python types.
bool,
diff --git a/python/cudf/cudf/tests/test_binops.py b/python/cudf/cudf/tests/test_binops.py
index 921f2de38c2..76add8b9c5d 100644
--- a/python/cudf/cudf/tests/test_binops.py
+++ b/python/cudf/cudf/tests/test_binops.py
@@ -1,6 +1,5 @@
# Copyright (c) 2018-2022, NVIDIA CORPORATION.
-from __future__ import division
import decimal
import operator
diff --git a/python/cudf/cudf/tests/test_copying.py b/python/cudf/cudf/tests/test_copying.py
index 21a6a9172db..0d0ba579f22 100644
--- a/python/cudf/cudf/tests/test_copying.py
+++ b/python/cudf/cudf/tests/test_copying.py
@@ -1,5 +1,3 @@
-from __future__ import division, print_function
-
import numpy as np
import pandas as pd
import pytest
diff --git a/python/cudf/cudf/tests/test_cuda_apply.py b/python/cudf/cudf/tests/test_cuda_apply.py
index a00dbbba5f0..e8bd64b5061 100644
--- a/python/cudf/cudf/tests/test_cuda_apply.py
+++ b/python/cudf/cudf/tests/test_cuda_apply.py
@@ -98,7 +98,7 @@ def kernel(in1, in2, in3, out1, out2, extra1, extra2):
expect_out1 = extra2 * in1 - extra1 * in2 + in3
expect_out2 = np.hstack(
- np.arange((e - s)) for s, e in zip(chunks, chunks[1:] + [len(df)])
+ np.arange(e - s) for s, e in zip(chunks, chunks[1:] + [len(df)])
)
outdf = df.apply_chunks(
@@ -141,8 +141,7 @@ def kernel(in1, in2, in3, out1, out2, extra1, extra2):
expect_out1 = extra2 * in1 - extra1 * in2 + in3
expect_out2 = np.hstack(
- tpb * np.arange((e - s))
- for s, e in zip(chunks, chunks[1:] + [len(df)])
+ tpb * np.arange(e - s) for s, e in zip(chunks, chunks[1:] + [len(df)])
)
outdf = df.apply_chunks(
diff --git a/python/cudf/cudf/tests/test_dataframe.py b/python/cudf/cudf/tests/test_dataframe.py
index ba2caf7c6c8..5022f1a675b 100644
--- a/python/cudf/cudf/tests/test_dataframe.py
+++ b/python/cudf/cudf/tests/test_dataframe.py
@@ -246,17 +246,15 @@ def test_series_init_none():
sr1 = cudf.Series()
got = sr1.to_string()
- expect = sr1.to_pandas().__repr__()
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = repr(sr1.to_pandas())
+ assert got == expect
# 2: Using `None` as an initializer
sr2 = cudf.Series(None)
got = sr2.to_string()
- expect = sr2.to_pandas().__repr__()
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = repr(sr2.to_pandas())
+ assert got == expect
def test_dataframe_basic():
@@ -843,21 +841,20 @@ def test_dataframe_to_string_with_masked_data():
def test_dataframe_to_string_wide(monkeypatch):
monkeypatch.setenv("COLUMNS", "79")
# Test basic
- df = cudf.DataFrame()
- for i in range(100):
- df["a{}".format(i)] = list(range(3))
- pd.options.display.max_columns = 0
- got = df.to_string()
+ df = cudf.DataFrame({f"a{i}": [0, 1, 2] for i in range(100)})
+ with pd.option_context("display.max_columns", 0):
+ got = df.to_string()
- expect = """
- a0 a1 a2 a3 a4 a5 a6 a7 ... a92 a93 a94 a95 a96 a97 a98 a99
-0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
-1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
-2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2
-[3 rows x 100 columns]
-"""
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = textwrap.dedent(
+ """\
+ a0 a1 a2 a3 a4 a5 a6 a7 ... a92 a93 a94 a95 a96 a97 a98 a99
+ 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
+ 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
+ 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2
+
+ [3 rows x 100 columns]""" # noqa: E501
+ )
+ assert got == expect
def test_dataframe_empty_to_string():
@@ -865,9 +862,8 @@ def test_dataframe_empty_to_string():
df = cudf.DataFrame()
got = df.to_string()
- expect = "Empty DataFrame\nColumns: []\nIndex: []\n"
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = "Empty DataFrame\nColumns: []\nIndex: []"
+ assert got == expect
def test_dataframe_emptycolumns_to_string():
@@ -877,9 +873,8 @@ def test_dataframe_emptycolumns_to_string():
df["b"] = []
got = df.to_string()
- expect = "Empty DataFrame\nColumns: [a, b]\nIndex: []\n"
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = "Empty DataFrame\nColumns: [a, b]\nIndex: []"
+ assert got == expect
def test_dataframe_copy():
@@ -890,14 +885,14 @@ def test_dataframe_copy():
df2["b"] = [4, 5, 6]
got = df.to_string()
- expect = """
- a
-0 1
-1 2
-2 3
-"""
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = textwrap.dedent(
+ """\
+ a
+ 0 1
+ 1 2
+ 2 3"""
+ )
+ assert got == expect
def test_dataframe_copy_shallow():
@@ -908,14 +903,14 @@ def test_dataframe_copy_shallow():
df2["b"] = [4, 2, 3]
got = df.to_string()
- expect = """
- a
-0 1
-1 2
-2 3
-"""
- # values should match despite whitespace difference
- assert got.split() == expect.split()
+ expect = textwrap.dedent(
+ """\
+ a
+ 0 1
+ 1 2
+ 2 3"""
+ )
+ assert got == expect
def test_dataframe_dtypes():
@@ -1163,7 +1158,7 @@ def test_dataframe_hash_partition(nrows, nparts, nkeys):
gdf = cudf.DataFrame()
keycols = []
for i in range(nkeys):
- keyname = "key{}".format(i)
+ keyname = f"key{i}"
gdf[keyname] = np.random.randint(0, 7 - i, nrows)
keycols.append(keyname)
gdf["val1"] = np.random.randint(0, nrows * 2, nrows)
diff --git a/python/cudf/cudf/tests/test_factorize.py b/python/cudf/cudf/tests/test_factorize.py
index 1f16686a6a6..3081b7c4a6e 100644
--- a/python/cudf/cudf/tests/test_factorize.py
+++ b/python/cudf/cudf/tests/test_factorize.py
@@ -23,7 +23,7 @@ def test_factorize_series_obj(ncats, nelem):
assert isinstance(uvals, cp.ndarray)
assert isinstance(labels, Index)
- encoder = dict((labels[idx], idx) for idx in range(len(labels)))
+ encoder = {labels[idx]: idx for idx in range(len(labels))}
handcoded = [encoder[v] for v in arr]
np.testing.assert_array_equal(uvals.get(), handcoded)
@@ -42,7 +42,7 @@ def test_factorize_index_obj(ncats, nelem):
assert isinstance(uvals, cp.ndarray)
assert isinstance(labels, Index)
- encoder = dict((labels[idx], idx) for idx in range(len(labels)))
+ encoder = {labels[idx]: idx for idx in range(len(labels))}
handcoded = [encoder[v] for v in arr]
np.testing.assert_array_equal(uvals.get(), handcoded)
diff --git a/python/cudf/cudf/tests/test_gcs.py b/python/cudf/cudf/tests/test_gcs.py
index db53529b22f..307232b1305 100644
--- a/python/cudf/cudf/tests/test_gcs.py
+++ b/python/cudf/cudf/tests/test_gcs.py
@@ -48,14 +48,14 @@ def mock_size(*args):
# use_python_file_object=True, because the pyarrow
# `open_input_file` command will fail (since it doesn't
# use the monkey-patched `open` definition)
- got = cudf.read_csv("gcs://{}".format(fpath), use_python_file_object=False)
+ got = cudf.read_csv(f"gcs://{fpath}", use_python_file_object=False)
assert_eq(pdf, got)
# AbstractBufferedFile -> PythonFile conversion
# will work fine with the monkey-patched FS if we
# pass in an fsspec file object
fs = gcsfs.core.GCSFileSystem()
- with fs.open("gcs://{}".format(fpath)) as f:
+ with fs.open(f"gcs://{fpath}") as f:
got = cudf.read_csv(f)
assert_eq(pdf, got)
@@ -69,7 +69,7 @@ def mock_open(*args, **kwargs):
return open(local_filepath, "wb")
monkeypatch.setattr(gcsfs.core.GCSFileSystem, "open", mock_open)
- gdf.to_orc("gcs://{}".format(gcs_fname))
+ gdf.to_orc(f"gcs://{gcs_fname}")
got = pa.orc.ORCFile(local_filepath).read().to_pandas()
assert_eq(pdf, got)
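The test hunks in this diff also swap `str.format` calls for f-strings throughout. The two spellings are equivalent; a small sketch with hypothetical host/port/path values:

```python
# .format() vs. f-string: identical output, the f-string inlines the
# values at the point of use. host/port/basedir are illustrative only.
basedir = "/data"
host, port = "namenode", 8020

old = "hdfs://{}:{}{}/file.orc".format(host, port, basedir)
new = f"hdfs://{host}:{port}{basedir}/file.orc"
assert old == new
print(new)  # hdfs://namenode:8020/data/file.orc
```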
diff --git a/python/cudf/cudf/tests/test_groupby.py b/python/cudf/cudf/tests/test_groupby.py
index f5decd62ea9..61c7d1958a0 100644
--- a/python/cudf/cudf/tests/test_groupby.py
+++ b/python/cudf/cudf/tests/test_groupby.py
@@ -84,11 +84,6 @@ def make_frame(
return df
-def get_nelem():
- for elem in [2, 3, 1000]:
- yield elem
-
-
@pytest.fixture
def gdf():
return DataFrame({"x": [1, 2, 3], "y": [0, 1, 1]})
@@ -1096,7 +1091,7 @@ def test_groupby_cumcount():
)
-@pytest.mark.parametrize("nelem", get_nelem())
+@pytest.mark.parametrize("nelem", [2, 3, 1000])
@pytest.mark.parametrize("as_index", [True, False])
@pytest.mark.parametrize(
"agg", ["min", "max", "idxmin", "idxmax", "mean", "count"]
diff --git a/python/cudf/cudf/tests/test_hdfs.py b/python/cudf/cudf/tests/test_hdfs.py
index 24554f113bb..2d61d6693cb 100644
--- a/python/cudf/cudf/tests/test_hdfs.py
+++ b/python/cudf/cudf/tests/test_hdfs.py
@@ -62,7 +62,7 @@ def test_read_csv(tmpdir, pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_csv_reader.csv".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_csv_reader.csv"
got = cudf.read_csv(hd_fpath)
@@ -81,7 +81,7 @@ def test_write_csv(pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_csv_writer.csv".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_csv_writer.csv"
gdf.to_csv(hd_fpath, index=False)
@@ -107,7 +107,7 @@ def test_read_parquet(tmpdir, pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_parquet_reader.parquet".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_parquet_reader.parquet"
got = cudf.read_parquet(hd_fpath)
@@ -126,7 +126,7 @@ def test_write_parquet(pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_parquet_writer.parquet".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_parquet_writer.parquet"
gdf.to_parquet(hd_fpath)
@@ -153,7 +153,7 @@ def test_write_parquet_partitioned(tmpdir, pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_parquet_partitioned.parquet".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_parquet_partitioned.parquet"
# Clear data written from previous runs
hdfs.rm(f"{basedir}/test_parquet_partitioned.parquet", recursive=True)
gdf.to_parquet(
@@ -186,7 +186,7 @@ def test_read_json(tmpdir, pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_json_reader.json".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_json_reader.json"
got = cudf.read_json(hd_fpath, engine="cudf", orient="records", lines=True)
@@ -207,9 +207,9 @@ def test_read_orc(datadir, hdfs, test_url):
hdfs.upload(basedir + "/file.orc", buffer)
if test_url:
- hd_fpath = "hdfs://{}:{}{}/file.orc".format(host, port, basedir)
+ hd_fpath = f"hdfs://{host}:{port}{basedir}/file.orc"
else:
- hd_fpath = "hdfs://{}/file.orc".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/file.orc"
got = cudf.read_orc(hd_fpath)
expect = orc.ORCFile(buffer).read().to_pandas()
@@ -226,7 +226,7 @@ def test_write_orc(pdf, hdfs, test_url):
host, port, basedir
)
else:
- hd_fpath = "hdfs://{}/test_orc_writer.orc".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/test_orc_writer.orc"
gdf.to_orc(hd_fpath)
@@ -247,9 +247,9 @@ def test_read_avro(datadir, hdfs, test_url):
hdfs.upload(basedir + "/file.avro", buffer)
if test_url:
- hd_fpath = "hdfs://{}:{}{}/file.avro".format(host, port, basedir)
+ hd_fpath = f"hdfs://{host}:{port}{basedir}/file.avro"
else:
- hd_fpath = "hdfs://{}/file.avro".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/file.avro"
got = cudf.read_avro(hd_fpath)
with open(fname, mode="rb") as f:
@@ -270,7 +270,7 @@ def test_storage_options(tmpdir, pdf, hdfs):
# Write to hdfs
hdfs.upload(basedir + "/file.csv", buffer)
- hd_fpath = "hdfs://{}/file.csv".format(basedir)
+ hd_fpath = f"hdfs://{basedir}/file.csv"
storage_options = {"host": host, "port": port}
@@ -293,7 +293,7 @@ def test_storage_options_error(tmpdir, pdf, hdfs):
# Write to hdfs
hdfs.upload(basedir + "/file.csv", buffer)
- hd_fpath = "hdfs://{}:{}{}/file.avro".format(host, port, basedir)
+ hd_fpath = f"hdfs://{host}:{port}{basedir}/file.avro"
storage_options = {"host": host, "port": port}
diff --git a/python/cudf/cudf/tests/test_query.py b/python/cudf/cudf/tests/test_query.py
index 3de38b2cf6f..09129a43f07 100644
--- a/python/cudf/cudf/tests/test_query.py
+++ b/python/cudf/cudf/tests/test_query.py
@@ -1,6 +1,5 @@
# Copyright (c) 2018, NVIDIA CORPORATION.
-from __future__ import division, print_function
import datetime
import inspect
diff --git a/python/cudf/cudf/tests/test_reductions.py b/python/cudf/cudf/tests/test_reductions.py
index 40add502309..7106ab54686 100644
--- a/python/cudf/cudf/tests/test_reductions.py
+++ b/python/cudf/cudf/tests/test_reductions.py
@@ -1,6 +1,5 @@
# Copyright (c) 2020-2022, NVIDIA CORPORATION.
-from __future__ import division, print_function
import re
from decimal import Decimal
diff --git a/python/cudf/cudf/tests/test_s3.py b/python/cudf/cudf/tests/test_s3.py
index da1ffc1fc16..4807879a730 100644
--- a/python/cudf/cudf/tests/test_s3.py
+++ b/python/cudf/cudf/tests/test_s3.py
@@ -147,7 +147,7 @@ def test_read_csv(s3_base, s3so, pdf, bytes_per_thread):
# Use fsspec file object
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got = cudf.read_csv(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
storage_options=s3so,
bytes_per_thread=bytes_per_thread,
use_python_file_object=False,
@@ -157,7 +157,7 @@ def test_read_csv(s3_base, s3so, pdf, bytes_per_thread):
# Use Arrow PythonFile object
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got = cudf.read_csv(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
storage_options=s3so,
bytes_per_thread=bytes_per_thread,
use_python_file_object=True,
@@ -174,7 +174,7 @@ def test_read_csv_arrow_nativefile(s3_base, s3so, pdf):
fs = pa_fs.S3FileSystem(
endpoint_override=s3so["client_kwargs"]["endpoint_url"],
)
- with fs.open_input_file("{}/{}".format(bname, fname)) as fil:
+ with fs.open_input_file(f"{bname}/{fname}") as fil:
got = cudf.read_csv(fil)
assert_eq(pdf, got)
@@ -193,7 +193,7 @@ def test_read_csv_byte_range(
# Use fsspec file object
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got = cudf.read_csv(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
storage_options=s3so,
byte_range=(74, 73),
bytes_per_thread=bytes_per_thread,
@@ -213,15 +213,15 @@ def test_write_csv(s3_base, s3so, pdf, chunksize):
gdf = cudf.from_pandas(pdf)
with s3_context(s3_base=s3_base, bucket=bname) as s3fs:
gdf.to_csv(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
index=False,
chunksize=chunksize,
storage_options=s3so,
)
- assert s3fs.exists("s3://{}/{}".format(bname, fname))
+ assert s3fs.exists(f"s3://{bname}/{fname}")
# TODO: Update to use `storage_options` from pandas v1.2.0
- got = pd.read_csv(s3fs.open("s3://{}/{}".format(bname, fname)))
+ got = pd.read_csv(s3fs.open(f"s3://{bname}/{fname}"))
assert_eq(pdf, got)
@@ -248,7 +248,7 @@ def test_read_parquet(
buffer.seek(0)
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got1 = cudf.read_parquet(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
open_file_options=(
{"precache_options": {"method": precache}}
if use_python_file_object
@@ -265,10 +265,10 @@ def test_read_parquet(
# Check fsspec file-object handling
buffer.seek(0)
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
- fs = get_fs_token_paths(
- "s3://{}/{}".format(bname, fname), storage_options=s3so
- )[0]
- with fs.open("s3://{}/{}".format(bname, fname), mode="rb") as f:
+ fs = get_fs_token_paths(f"s3://{bname}/{fname}", storage_options=s3so)[
+ 0
+ ]
+ with fs.open(f"s3://{bname}/{fname}", mode="rb") as f:
got2 = cudf.read_parquet(
f,
bytes_per_thread=bytes_per_thread,
@@ -297,7 +297,7 @@ def test_read_parquet_ext(
buffer.seek(0)
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got1 = cudf.read_parquet(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
storage_options=s3so,
bytes_per_thread=bytes_per_thread,
footer_sample_size=3200,
@@ -326,7 +326,7 @@ def test_read_parquet_arrow_nativefile(s3_base, s3so, pdf, columns):
fs = pa_fs.S3FileSystem(
endpoint_override=s3so["client_kwargs"]["endpoint_url"],
)
- with fs.open_input_file("{}/{}".format(bname, fname)) as fil:
+ with fs.open_input_file(f"{bname}/{fname}") as fil:
got = cudf.read_parquet(fil, columns=columns)
expect = pdf[columns] if columns else pdf
@@ -343,7 +343,7 @@ def test_read_parquet_filters(s3_base, s3so, pdf_ext, precache):
filters = [("String", "==", "Omega")]
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got = cudf.read_parquet(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
storage_options=s3so,
filters=filters,
open_file_options={"precache_options": {"method": precache}},
@@ -360,13 +360,13 @@ def test_write_parquet(s3_base, s3so, pdf, partition_cols):
gdf = cudf.from_pandas(pdf)
with s3_context(s3_base=s3_base, bucket=bname) as s3fs:
gdf.to_parquet(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
partition_cols=partition_cols,
storage_options=s3so,
)
- assert s3fs.exists("s3://{}/{}".format(bname, fname))
+ assert s3fs.exists(f"s3://{bname}/{fname}")
- got = pd.read_parquet(s3fs.open("s3://{}/{}".format(bname, fname)))
+ got = pd.read_parquet(s3fs.open(f"s3://{bname}/{fname}"))
assert_eq(pdf, got)
@@ -383,7 +383,7 @@ def test_read_json(s3_base, s3so):
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got = cudf.read_json(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
engine="cudf",
orient="records",
lines=True,
@@ -407,7 +407,7 @@ def test_read_orc(s3_base, s3so, datadir, use_python_file_object, columns):
with s3_context(s3_base=s3_base, bucket=bname, files={fname: buffer}):
got = cudf.read_orc(
- "s3://{}/{}".format(bname, fname),
+ f"s3://{bname}/{fname}",
columns=columns,
storage_options=s3so,
use_python_file_object=use_python_file_object,
@@ -432,7 +432,7 @@ def test_read_orc_arrow_nativefile(s3_base, s3so, datadir, columns):
fs = pa_fs.S3FileSystem(
endpoint_override=s3so["client_kwargs"]["endpoint_url"],
)
- with fs.open_input_file("{}/{}".format(bname, fname)) as fil:
+ with fs.open_input_file(f"{bname}/{fname}") as fil:
got = cudf.read_orc(fil, columns=columns)
if columns:
@@ -445,10 +445,10 @@ def test_write_orc(s3_base, s3so, pdf):
bname = "orc"
gdf = cudf.from_pandas(pdf)
with s3_context(s3_base=s3_base, bucket=bname) as s3fs:
- gdf.to_orc("s3://{}/{}".format(bname, fname), storage_options=s3so)
- assert s3fs.exists("s3://{}/{}".format(bname, fname))
+ gdf.to_orc(f"s3://{bname}/{fname}", storage_options=s3so)
+ assert s3fs.exists(f"s3://{bname}/{fname}")
- with s3fs.open("s3://{}/{}".format(bname, fname)) as f:
+ with s3fs.open(f"s3://{bname}/{fname}") as f:
got = pa.orc.ORCFile(f).read().to_pandas()
assert_eq(pdf, got)
diff --git a/python/cudf/cudf/tests/test_sorting.py b/python/cudf/cudf/tests/test_sorting.py
index 00cd31e7539..10c3689fcd7 100644
--- a/python/cudf/cudf/tests/test_sorting.py
+++ b/python/cudf/cudf/tests/test_sorting.py
@@ -105,7 +105,7 @@ def test_series_argsort(nelem, dtype, asc):
)
def test_series_sort_index(nelem, asc):
np.random.seed(0)
- sr = Series((100 * np.random.random(nelem)))
+ sr = Series(100 * np.random.random(nelem))
psr = sr.to_pandas()
expected = psr.sort_index(ascending=asc)
diff --git a/python/cudf/cudf/tests/test_text.py b/python/cudf/cudf/tests/test_text.py
index a447a60c709..5ff66fc750f 100644
--- a/python/cudf/cudf/tests/test_text.py
+++ b/python/cudf/cudf/tests/test_text.py
@@ -763,7 +763,7 @@ def test_read_text(datadir):
chess_file = str(datadir) + "/chess.pgn"
delimiter = "1."
- with open(chess_file, "r") as f:
+ with open(chess_file) as f:
content = f.read().split(delimiter)
# Since Python split removes the delimiter and read_text does
diff --git a/python/cudf/cudf/tests/test_transform.py b/python/cudf/cudf/tests/test_transform.py
index 021c4052759..bd7ee45fbf8 100644
--- a/python/cudf/cudf/tests/test_transform.py
+++ b/python/cudf/cudf/tests/test_transform.py
@@ -1,6 +1,5 @@
# Copyright (c) 2018-2020, NVIDIA CORPORATION.
-from __future__ import division
import numpy as np
import pytest
diff --git a/python/cudf/cudf/tests/test_udf_binops.py b/python/cudf/cudf/tests/test_udf_binops.py
index c5cd8f8b717..173515509cd 100644
--- a/python/cudf/cudf/tests/test_udf_binops.py
+++ b/python/cudf/cudf/tests/test_udf_binops.py
@@ -1,5 +1,4 @@
# Copyright (c) 2018, NVIDIA CORPORATION.
-from __future__ import division
import numpy as np
import pytest
diff --git a/python/cudf/cudf/tests/test_unaops.py b/python/cudf/cudf/tests/test_unaops.py
index e79b74e3aab..2e8da615e3e 100644
--- a/python/cudf/cudf/tests/test_unaops.py
+++ b/python/cudf/cudf/tests/test_unaops.py
@@ -1,5 +1,3 @@
-from __future__ import division
-
import itertools
import operator
import re
diff --git a/python/cudf/cudf/utils/applyutils.py b/python/cudf/cudf/utils/applyutils.py
index 3cbbc1e1ce7..593965046e6 100644
--- a/python/cudf/cudf/utils/applyutils.py
+++ b/python/cudf/cudf/utils/applyutils.py
@@ -125,7 +125,7 @@ def make_aggregate_nullmask(df, columns=None, op="and"):
return out_mask
-class ApplyKernelCompilerBase(object):
+class ApplyKernelCompilerBase:
def __init__(
self, func, incols, outcols, kwargs, pessimistic_nulls, cache_key
):
@@ -253,7 +253,7 @@ def row_wise_kernel({args}):
srcidx.format(a=a, start=start, stop=stop, stride=stride)
)
- body.append("inner({})".format(args))
+ body.append(f"inner({args})")
indented = ["{}{}".format(" " * 4, ln) for ln in body]
# Finalize source
@@ -309,7 +309,7 @@ def chunk_wise_kernel(nrows, chunks, {args}):
slicedargs = {}
for a in argnames:
if a not in extras:
- slicedargs[a] = "{}[start:stop]".format(a)
+ slicedargs[a] = f"{a}[start:stop]"
else:
slicedargs[a] = str(a)
body.append(
@@ -361,4 +361,4 @@ def _load_cache_or_make_chunk_wise_kernel(func, *args, **kwargs):
def _mangle_user(name):
"""Mangle user variable name"""
- return "__user_{}".format(name)
+ return f"__user_{name}"
diff --git a/python/cudf/cudf/utils/cudautils.py b/python/cudf/cudf/utils/cudautils.py
index f0533dcaa72..742c747ab69 100755
--- a/python/cudf/cudf/utils/cudautils.py
+++ b/python/cudf/cudf/utils/cudautils.py
@@ -218,7 +218,7 @@ def make_cache_key(udf, sig):
codebytes = udf.__code__.co_code
constants = udf.__code__.co_consts
if udf.__closure__ is not None:
- cvars = tuple([x.cell_contents for x in udf.__closure__])
+ cvars = tuple(x.cell_contents for x in udf.__closure__)
cvarbytes = dumps(cvars)
else:
cvarbytes = b""
diff --git a/python/cudf/cudf/utils/dtypes.py b/python/cudf/cudf/utils/dtypes.py
index 44bbb1b493d..4cd1738996f 100644
--- a/python/cudf/cudf/utils/dtypes.py
+++ b/python/cudf/cudf/utils/dtypes.py
@@ -160,8 +160,8 @@ def numeric_normalize_types(*args):
def _find_common_type_decimal(dtypes):
# Find the largest scale and the largest difference between
# precision and scale of the columns to be concatenated
- s = max([dtype.scale for dtype in dtypes])
- lhs = max([dtype.precision - dtype.scale for dtype in dtypes])
+ s = max(dtype.scale for dtype in dtypes)
+ lhs = max(dtype.precision - dtype.scale for dtype in dtypes)
# Combine to get the necessary precision and clip at the maximum
# precision
p = s + lhs
@@ -525,7 +525,7 @@ def find_common_type(dtypes):
)
for dtype in dtypes
):
- if len(set(dtype._categories.dtype for dtype in dtypes)) == 1:
+ if len({dtype._categories.dtype for dtype in dtypes}) == 1:
return cudf.CategoricalDtype(
cudf.core.column.concat_columns(
[dtype._categories for dtype in dtypes]
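Several hunks above (in `dtypes.py`, `cudautils.py`, and the `_version.py` files) replace `set([...])`, `max([...])`, and `tuple([...])` with comprehension or generator forms. A sketch of the equivalence, with a hypothetical refnames string in place of the git data:

```python
refs_raw = "tag: v1.0, tag: v2.0, HEAD"   # hypothetical git refnames string

# set([...]) builds a throwaway list first; a set comprehension skips it
refs = {r.strip() for r in refs_raw.strip("()").split(",")}
assert refs == set([r.strip() for r in refs_raw.strip("()").split(",")])

# max() and tuple() accept a generator expression directly, so no list is needed
scales = [3, 1, 2]
assert max(s for s in scales) == max([s for s in scales]) == 3
assert tuple(s + 1 for s in scales) == (4, 2, 3)
```

The results are identical; the comprehension forms simply avoid materializing an intermediate list.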
diff --git a/python/cudf/cudf/utils/hash_vocab_utils.py b/python/cudf/cudf/utils/hash_vocab_utils.py
index 45004c5f107..11029cbfe5e 100644
--- a/python/cudf/cudf/utils/hash_vocab_utils.py
+++ b/python/cudf/cudf/utils/hash_vocab_utils.py
@@ -79,10 +79,8 @@ def _pick_initial_a_b(data, max_constant, init_bins):
longest = _new_bin_length(_longest_bin_length(bins))
if score <= max_constant and longest <= MAX_SIZE_FOR_INITIAL_BIN:
- print(
- "Attempting to build table using {:.6f}n space".format(score)
- )
- print("Longest bin was {}".format(longest))
+ print(f"Attempting to build table using {score:.6f}n space")
+ print(f"Longest bin was {longest}")
break
return bins, a, b
@@ -170,7 +168,7 @@ def _pack_keys_and_values(flattened_hash_table, original_dict):
def _load_vocab_dict(path):
vocab = {}
- with open(path, mode="r", encoding="utf-8") as f:
+ with open(path, encoding="utf-8") as f:
counter = 0
for line in f:
vocab[line.strip()] = counter
@@ -193,17 +191,17 @@ def _store_func(
):
with open(out_name, mode="w+") as f:
- f.write("{}\n".format(outer_a))
- f.write("{}\n".format(outer_b))
- f.write("{}\n".format(num_outer_bins))
+ f.write(f"{outer_a}\n")
+ f.write(f"{outer_b}\n")
+ f.write(f"{num_outer_bins}\n")
f.writelines(
- "{} {}\n".format(coeff, offset)
+ f"{coeff} {offset}\n"
for coeff, offset in zip(inner_table_coeffs, offsets_into_ht)
)
- f.write("{}\n".format(len(hash_table)))
- f.writelines("{}\n".format(kv) for kv in hash_table)
+ f.write(f"{len(hash_table)}\n")
+ f.writelines(f"{kv}\n" for kv in hash_table)
f.writelines(
- "{}\n".format(tok_id)
+ f"{tok_id}\n"
for tok_id in [unk_tok_id, first_token_id, sep_token_id]
)
@@ -295,6 +293,6 @@ def hash_vocab(
)
assert (
val == value
- ), "Incorrect value found. Got {} expected {}".format(val, value)
+ ), f"Incorrect value found. Got {val} expected {value}"
print("All present tokens return correct value.")
diff --git a/python/cudf/cudf/utils/queryutils.py b/python/cudf/cudf/utils/queryutils.py
index d9153c2b1d2..64218ddf46a 100644
--- a/python/cudf/cudf/utils/queryutils.py
+++ b/python/cudf/cudf/utils/queryutils.py
@@ -136,7 +136,7 @@ def query_compile(expr):
key "args" is a sequence of name of the arguments.
"""
- funcid = "queryexpr_{:x}".format(np.uintp(hash(expr)))
+ funcid = f"queryexpr_{np.uintp(hash(expr)):x}"
# Load cache
compiled = _cache.get(funcid)
# Cache not found
@@ -147,7 +147,7 @@ def query_compile(expr):
# compile
devicefn = cuda.jit(device=True)(fn)
- kernelid = "kernel_{}".format(funcid)
+ kernelid = f"kernel_{funcid}"
kernel = _wrap_query_expr(kernelid, devicefn, args)
compiled = info.copy()
@@ -173,10 +173,10 @@ def _add_idx(arg):
if arg.startswith(ENVREF_PREFIX):
return arg
else:
- return "{}[idx]".format(arg)
+ return f"{arg}[idx]"
def _add_prefix(arg):
- return "_args_{}".format(arg)
+ return f"_args_{arg}"
glbls = {"queryfn": fn, "cuda": cuda}
kernargs = map(_add_prefix, args)
diff --git a/python/cudf/cudf/utils/utils.py b/python/cudf/cudf/utils/utils.py
index add4ecd8f01..65a803d6768 100644
--- a/python/cudf/cudf/utils/utils.py
+++ b/python/cudf/cudf/utils/utils.py
@@ -204,12 +204,13 @@ def __getattr__(self, key):
)
-def raise_iteration_error(obj):
- raise TypeError(
- f"{obj.__class__.__name__} object is not iterable. "
- f"Consider using `.to_arrow()`, `.to_pandas()` or `.values_host` "
- f"if you wish to iterate over the values."
- )
+class NotIterable:
+ def __iter__(self):
+ raise TypeError(
+ f"{self.__class__.__name__} object is not iterable. "
+ f"Consider using `.to_arrow()`, `.to_pandas()` or `.values_host` "
+ f"if you wish to iterate over the values."
+ )
def pa_mask_buffer_to_mask(mask_buf, size):
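The `utils.py` hunk above replaces the `raise_iteration_error` helper, which every class had to call explicitly, with a `NotIterable` mixin whose `__iter__` raises for any subclass. A minimal sketch of how the mixin behaves (the `Series` consumer here is a hypothetical stand-in):

```python
class NotIterable:
    def __iter__(self):
        raise TypeError(
            f"{self.__class__.__name__} object is not iterable. "
            "Consider using `.to_arrow()`, `.to_pandas()` or `.values_host` "
            "if you wish to iterate over the values."
        )

class Series(NotIterable):  # hypothetical consumer of the mixin
    pass

try:
    iter(Series())
except TypeError as e:
    # The error message names the concrete subclass automatically
    assert "Series object is not iterable" in str(e)
```

The mixin moves the opt-out into the type itself, so callers cannot forget to raise it.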
diff --git a/python/cudf/setup.py b/python/cudf/setup.py
index a8e14504469..e4e43bc1595 100644
--- a/python/cudf/setup.py
+++ b/python/cudf/setup.py
@@ -63,9 +63,7 @@ def get_cuda_version_from_header(cuda_include_dir, delimeter=""):
cuda_version = None
- with open(
- os.path.join(cuda_include_dir, "cuda.h"), "r", encoding="utf-8"
- ) as f:
+ with open(os.path.join(cuda_include_dir, "cuda.h"), encoding="utf-8") as f:
for line in f.readlines():
if re.search(r"#define CUDA_VERSION ", line) is not None:
cuda_version = line
diff --git a/python/cudf_kafka/cudf_kafka/_version.py b/python/cudf_kafka/cudf_kafka/_version.py
index 5ab5c72e457..6cd10cc10bf 100644
--- a/python/cudf_kafka/cudf_kafka/_version.py
+++ b/python/cudf_kafka/cudf_kafka/_version.py
@@ -86,7 +86,7 @@ def run_command(
stderr=(subprocess.PIPE if hide_stderr else None),
)
break
- except EnvironmentError:
+ except OSError:
e = sys.exc_info()[1]
if e.errno == errno.ENOENT:
continue
@@ -96,7 +96,7 @@ def run_command(
return None, None
else:
if verbose:
- print("unable to find command, tried %s" % (commands,))
+ print(f"unable to find command, tried {commands}")
return None, None
stdout = p.communicate()[0].strip()
if sys.version_info[0] >= 3:
@@ -149,7 +149,7 @@ def git_get_keywords(versionfile_abs):
# _version.py.
keywords = {}
try:
- f = open(versionfile_abs, "r")
+ f = open(versionfile_abs)
for line in f.readlines():
if line.strip().startswith("git_refnames ="):
mo = re.search(r'=\s*"(.*)"', line)
@@ -164,7 +164,7 @@ def git_get_keywords(versionfile_abs):
if mo:
keywords["date"] = mo.group(1)
f.close()
- except EnvironmentError:
+ except OSError:
pass
return keywords
@@ -188,11 +188,11 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
if verbose:
print("keywords are unexpanded, not using")
raise NotThisMethod("unexpanded keywords, not a git-archive tarball")
- refs = set([r.strip() for r in refnames.strip("()").split(",")])
+ refs = {r.strip() for r in refnames.strip("()").split(",")}
# starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of
# just "foo-1.0". If we see a "tag: " prefix, prefer those.
TAG = "tag: "
- tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)])
+ tags = {r[len(TAG) :] for r in refs if r.startswith(TAG)}
if not tags:
# Either we're using git < 1.8.3, or there really are no tags. We use
# a heuristic: assume all version tags have a digit. The old git %d
@@ -201,7 +201,7 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
# between branches and tags. By ignoring refnames without digits, we
# filter out many common branch names like "release" and
# "stabilization", as well as "HEAD" and "master".
- tags = set([r for r in refs if re.search(r"\d", r)])
+ tags = {r for r in refs if re.search(r"\d", r)}
if verbose:
print("discarding '%s', no digits" % ",".join(refs - tags))
if verbose:
@@ -308,10 +308,9 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
if verbose:
fmt = "tag '%s' doesn't start with prefix '%s'"
print(fmt % (full_tag, tag_prefix))
- pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % (
- full_tag,
- tag_prefix,
- )
+ pieces[
+ "error"
+ ] = f"tag '{full_tag}' doesn't start with prefix '{tag_prefix}'"
return pieces
pieces["closest-tag"] = full_tag[len(tag_prefix) :]
diff --git a/python/cudf_kafka/versioneer.py b/python/cudf_kafka/versioneer.py
index 2260d5c2dcf..c7dbfd76734 100644
--- a/python/cudf_kafka/versioneer.py
+++ b/python/cudf_kafka/versioneer.py
@@ -275,7 +275,6 @@
"""
-from __future__ import print_function
import errno
import json
@@ -345,7 +344,7 @@ def get_config_from_root(root):
# the top of versioneer.py for instructions on writing your setup.cfg .
setup_cfg = os.path.join(root, "setup.cfg")
parser = configparser.SafeConfigParser()
- with open(setup_cfg, "r") as f:
+ with open(setup_cfg) as f:
parser.readfp(f)
VCS = parser.get("versioneer", "VCS") # mandatory
@@ -407,7 +406,7 @@ def run_command(
stderr=(subprocess.PIPE if hide_stderr else None),
)
break
- except EnvironmentError:
+ except OSError:
e = sys.exc_info()[1]
if e.errno == errno.ENOENT:
continue
@@ -417,7 +416,7 @@ def run_command(
return None, None
else:
if verbose:
- print("unable to find command, tried %s" % (commands,))
+ print(f"unable to find command, tried {commands}")
return None, None
stdout = p.communicate()[0].strip()
if sys.version_info[0] >= 3:
@@ -964,7 +963,7 @@ def git_get_keywords(versionfile_abs):
# _version.py.
keywords = {}
try:
- f = open(versionfile_abs, "r")
+ f = open(versionfile_abs)
for line in f.readlines():
if line.strip().startswith("git_refnames ="):
mo = re.search(r'=\s*"(.*)"', line)
@@ -979,7 +978,7 @@ def git_get_keywords(versionfile_abs):
if mo:
keywords["date"] = mo.group(1)
f.close()
- except EnvironmentError:
+ except OSError:
pass
return keywords
@@ -1003,11 +1002,11 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
if verbose:
print("keywords are unexpanded, not using")
raise NotThisMethod("unexpanded keywords, not a git-archive tarball")
- refs = set([r.strip() for r in refnames.strip("()").split(",")])
+ refs = {r.strip() for r in refnames.strip("()").split(",")}
# starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of
# just "foo-1.0". If we see a "tag: " prefix, prefer those.
TAG = "tag: "
- tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)])
+ tags = {r[len(TAG) :] for r in refs if r.startswith(TAG)}
if not tags:
# Either we're using git < 1.8.3, or there really are no tags. We use
# a heuristic: assume all version tags have a digit. The old git %d
@@ -1016,7 +1015,7 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
# between branches and tags. By ignoring refnames without digits, we
# filter out many common branch names like "release" and
# "stabilization", as well as "HEAD" and "master".
- tags = set([r for r in refs if re.search(r"\d", r)])
+ tags = {r for r in refs if re.search(r"\d", r)}
if verbose:
print("discarding '%s', no digits" % ",".join(refs - tags))
if verbose:
@@ -1123,9 +1122,8 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
if verbose:
fmt = "tag '%s' doesn't start with prefix '%s'"
print(fmt % (full_tag, tag_prefix))
- pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % (
- full_tag,
- tag_prefix,
+ pieces["error"] = "tag '{}' doesn't start with prefix '{}'".format(
+ full_tag, tag_prefix,
)
return pieces
pieces["closest-tag"] = full_tag[len(tag_prefix) :]
@@ -1175,13 +1173,13 @@ def do_vcs_install(manifest_in, versionfile_source, ipy):
files.append(versioneer_file)
present = False
try:
- f = open(".gitattributes", "r")
+ f = open(".gitattributes")
for line in f.readlines():
if line.strip().startswith(versionfile_source):
if "export-subst" in line.strip().split()[1:]:
present = True
f.close()
- except EnvironmentError:
+ except OSError:
pass
if not present:
f = open(".gitattributes", "a+")
@@ -1245,7 +1243,7 @@ def versions_from_file(filename):
try:
with open(filename) as f:
contents = f.read()
- except EnvironmentError:
+ except OSError:
raise NotThisMethod("unable to read _version.py")
mo = re.search(
r"version_json = '''\n(.*)''' # END VERSION_JSON",
@@ -1272,7 +1270,7 @@ def write_to_version_file(filename, versions):
with open(filename, "w") as f:
f.write(SHORT_VERSION_PY % contents)
- print("set %s to '%s'" % (filename, versions["version"]))
+ print("set {} to '{}'".format(filename, versions["version"]))
def plus_or_dot(pieces):
@@ -1497,7 +1495,7 @@ def get_versions(verbose=False):
try:
ver = versions_from_file(versionfile_abs)
if verbose:
- print("got version from file %s %s" % (versionfile_abs, ver))
+ print(f"got version from file {versionfile_abs} {ver}")
return ver
except NotThisMethod:
pass
@@ -1773,7 +1771,7 @@ def do_setup():
try:
cfg = get_config_from_root(root)
except (
- EnvironmentError,
+ OSError,
configparser.NoSectionError,
configparser.NoOptionError,
) as e:
@@ -1803,9 +1801,9 @@ def do_setup():
ipy = os.path.join(os.path.dirname(cfg.versionfile_source), "__init__.py")
if os.path.exists(ipy):
try:
- with open(ipy, "r") as f:
+ with open(ipy) as f:
old = f.read()
- except EnvironmentError:
+ except OSError:
old = ""
if INIT_PY_SNIPPET not in old:
print(" appending to %s" % ipy)
@@ -1824,12 +1822,12 @@ def do_setup():
manifest_in = os.path.join(root, "MANIFEST.in")
simple_includes = set()
try:
- with open(manifest_in, "r") as f:
+ with open(manifest_in) as f:
for line in f:
if line.startswith("include "):
for include in line.split()[1:]:
simple_includes.add(include)
- except EnvironmentError:
+ except OSError:
pass
# That doesn't cover everything MANIFEST.in can do
# (http://docs.python.org/2/distutils/sourcedist.html#commands), so
@@ -1863,7 +1861,7 @@ def scan_setup_py():
found = set()
setters = False
errors = 0
- with open("setup.py", "r") as f:
+ with open("setup.py") as f:
for line in f.readlines():
if "import versioneer" in line:
found.add("import")
diff --git a/python/custreamz/custreamz/_version.py b/python/custreamz/custreamz/_version.py
index a3409a06953..106fc3524f9 100644
--- a/python/custreamz/custreamz/_version.py
+++ b/python/custreamz/custreamz/_version.py
@@ -86,7 +86,7 @@ def run_command(
stderr=(subprocess.PIPE if hide_stderr else None),
)
break
- except EnvironmentError:
+ except OSError:
e = sys.exc_info()[1]
if e.errno == errno.ENOENT:
continue
@@ -96,7 +96,7 @@ def run_command(
return None, None
else:
if verbose:
- print("unable to find command, tried %s" % (commands,))
+ print(f"unable to find command, tried {commands}")
return None, None
stdout = p.communicate()[0].strip()
if sys.version_info[0] >= 3:
@@ -149,7 +149,7 @@ def git_get_keywords(versionfile_abs):
# _version.py.
keywords = {}
try:
- f = open(versionfile_abs, "r")
+ f = open(versionfile_abs)
for line in f.readlines():
if line.strip().startswith("git_refnames ="):
mo = re.search(r'=\s*"(.*)"', line)
@@ -164,7 +164,7 @@ def git_get_keywords(versionfile_abs):
if mo:
keywords["date"] = mo.group(1)
f.close()
- except EnvironmentError:
+ except OSError:
pass
return keywords
@@ -188,11 +188,11 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
if verbose:
print("keywords are unexpanded, not using")
raise NotThisMethod("unexpanded keywords, not a git-archive tarball")
- refs = set([r.strip() for r in refnames.strip("()").split(",")])
+ refs = {r.strip() for r in refnames.strip("()").split(",")}
# starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of
# just "foo-1.0". If we see a "tag: " prefix, prefer those.
TAG = "tag: "
- tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)])
+ tags = {r[len(TAG) :] for r in refs if r.startswith(TAG)}
if not tags:
# Either we're using git < 1.8.3, or there really are no tags. We use
# a heuristic: assume all version tags have a digit. The old git %d
@@ -201,7 +201,7 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
# between branches and tags. By ignoring refnames without digits, we
# filter out many common branch names like "release" and
# "stabilization", as well as "HEAD" and "master".
- tags = set([r for r in refs if re.search(r"\d", r)])
+ tags = {r for r in refs if re.search(r"\d", r)}
if verbose:
print("discarding '%s', no digits" % ",".join(refs - tags))
if verbose:
@@ -308,10 +308,9 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
if verbose:
fmt = "tag '%s' doesn't start with prefix '%s'"
print(fmt % (full_tag, tag_prefix))
- pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % (
- full_tag,
- tag_prefix,
- )
+ pieces[
+ "error"
+ ] = f"tag '{full_tag}' doesn't start with prefix '{tag_prefix}'"
return pieces
pieces["closest-tag"] = full_tag[len(tag_prefix) :]
diff --git a/python/custreamz/custreamz/tests/test_dataframes.py b/python/custreamz/custreamz/tests/test_dataframes.py
index 24f6e46f6c5..a7378408c24 100644
--- a/python/custreamz/custreamz/tests/test_dataframes.py
+++ b/python/custreamz/custreamz/tests/test_dataframes.py
@@ -4,7 +4,6 @@
Tests for Streamz Dataframes (SDFs) built on top of cuDF DataFrames.
*** Borrowed from streamz.dataframe.tests | License at thirdparty/LICENSE ***
"""
-from __future__ import division, print_function
import json
import operator
diff --git a/python/dask_cudf/dask_cudf/_version.py b/python/dask_cudf/dask_cudf/_version.py
index 8ca2cf98381..104879fce36 100644
--- a/python/dask_cudf/dask_cudf/_version.py
+++ b/python/dask_cudf/dask_cudf/_version.py
@@ -86,7 +86,7 @@ def run_command(
stderr=(subprocess.PIPE if hide_stderr else None),
)
break
- except EnvironmentError:
+ except OSError:
e = sys.exc_info()[1]
if e.errno == errno.ENOENT:
continue
@@ -96,7 +96,7 @@ def run_command(
return None, None
else:
if verbose:
- print("unable to find command, tried %s" % (commands,))
+ print(f"unable to find command, tried {commands}")
return None, None
stdout = p.communicate()[0].strip()
if sys.version_info[0] >= 3:
@@ -149,7 +149,7 @@ def git_get_keywords(versionfile_abs):
# _version.py.
keywords = {}
try:
- f = open(versionfile_abs, "r")
+ f = open(versionfile_abs)
for line in f.readlines():
if line.strip().startswith("git_refnames ="):
mo = re.search(r'=\s*"(.*)"', line)
@@ -164,7 +164,7 @@ def git_get_keywords(versionfile_abs):
if mo:
keywords["date"] = mo.group(1)
f.close()
- except EnvironmentError:
+ except OSError:
pass
return keywords
@@ -188,11 +188,11 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
if verbose:
print("keywords are unexpanded, not using")
raise NotThisMethod("unexpanded keywords, not a git-archive tarball")
- refs = set([r.strip() for r in refnames.strip("()").split(",")])
+ refs = {r.strip() for r in refnames.strip("()").split(",")}
# starting in git-1.8.3, tags are listed as "tag: foo-1.0" instead of
# just "foo-1.0". If we see a "tag: " prefix, prefer those.
TAG = "tag: "
- tags = set([r[len(TAG) :] for r in refs if r.startswith(TAG)])
+ tags = {r[len(TAG) :] for r in refs if r.startswith(TAG)}
if not tags:
# Either we're using git < 1.8.3, or there really are no tags. We use
# a heuristic: assume all version tags have a digit. The old git %d
@@ -201,7 +201,7 @@ def git_versions_from_keywords(keywords, tag_prefix, verbose):
# between branches and tags. By ignoring refnames without digits, we
# filter out many common branch names like "release" and
# "stabilization", as well as "HEAD" and "master".
- tags = set([r for r in refs if re.search(r"\d", r)])
+ tags = {r for r in refs if re.search(r"\d", r)}
if verbose:
print("discarding '%s', no digits" % ",".join(refs - tags))
if verbose:
@@ -308,10 +308,9 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
if verbose:
fmt = "tag '%s' doesn't start with prefix '%s'"
print(fmt % (full_tag, tag_prefix))
- pieces["error"] = "tag '%s' doesn't start with prefix '%s'" % (
- full_tag,
- tag_prefix,
- )
+ pieces[
+ "error"
+ ] = f"tag '{full_tag}' doesn't start with prefix '{tag_prefix}'"
return pieces
pieces["closest-tag"] = full_tag[len(tag_prefix) :]
diff --git a/python/dask_cudf/dask_cudf/core.py b/python/dask_cudf/dask_cudf/core.py
index e191873f82b..729db6c232d 100644
--- a/python/dask_cudf/dask_cudf/core.py
+++ b/python/dask_cudf/dask_cudf/core.py
@@ -516,7 +516,7 @@ def _extract_meta(x):
elif isinstance(x, list):
return [_extract_meta(_x) for _x in x]
elif isinstance(x, tuple):
- return tuple([_extract_meta(_x) for _x in x])
+ return tuple(_extract_meta(_x) for _x in x)
elif isinstance(x, dict):
return {k: _extract_meta(v) for k, v in x.items()}
return x
@@ -611,9 +611,7 @@ def reduction(
if not isinstance(args, (tuple, list)):
args = [args]
- npartitions = set(
- arg.npartitions for arg in args if isinstance(arg, _Frame)
- )
+ npartitions = {arg.npartitions for arg in args if isinstance(arg, _Frame)}
if len(npartitions) > 1:
raise ValueError("All arguments must have same number of partitions")
npartitions = npartitions.pop()
@@ -636,7 +634,7 @@ def reduction(
)
# Chunk
- a = "{0}-chunk-{1}".format(token or funcname(chunk), token_key)
+ a = f"{token or funcname(chunk)}-chunk-{token_key}"
if len(args) == 1 and isinstance(args[0], _Frame) and not chunk_kwargs:
dsk = {
(a, 0, i): (chunk, key)
@@ -654,7 +652,7 @@ def reduction(
}
# Combine
- b = "{0}-combine-{1}".format(token or funcname(combine), token_key)
+ b = f"{token or funcname(combine)}-combine-{token_key}"
k = npartitions
depth = 0
while k > split_every:
@@ -670,7 +668,7 @@ def reduction(
depth += 1
# Aggregate
- b = "{0}-agg-{1}".format(token or funcname(aggregate), token_key)
+ b = f"{token or funcname(aggregate)}-agg-{token_key}"
conc = (list, [(a, depth, i) for i in range(k)])
if aggregate_kwargs:
dsk[(b, 0)] = (apply, aggregate, [conc], aggregate_kwargs)
diff --git a/python/dask_cudf/dask_cudf/io/orc.py b/python/dask_cudf/dask_cudf/io/orc.py
index 00fc197da9b..2d326e41c3e 100644
--- a/python/dask_cudf/dask_cudf/io/orc.py
+++ b/python/dask_cudf/dask_cudf/io/orc.py
@@ -79,7 +79,7 @@ def read_orc(path, columns=None, filters=None, storage_options=None, **kwargs):
ex = set(columns) - set(schema)
if ex:
raise ValueError(
- "Requested columns (%s) not in schema (%s)" % (ex, set(schema))
+ "Requested columns ({ex}) not in schema ({set(schema)})"
)
else:
columns = list(schema)
diff --git a/python/dask_cudf/dask_cudf/io/tests/test_parquet.py b/python/dask_cudf/dask_cudf/io/tests/test_parquet.py
index 706b0e272ea..f5c1e53258e 100644
--- a/python/dask_cudf/dask_cudf/io/tests/test_parquet.py
+++ b/python/dask_cudf/dask_cudf/io/tests/test_parquet.py
@@ -40,12 +40,7 @@ def test_roundtrip_from_dask(tmpdir, stats):
tmpdir = str(tmpdir)
ddf.to_parquet(tmpdir, engine="pyarrow")
files = sorted(
- [
- os.path.join(tmpdir, f)
- for f in os.listdir(tmpdir)
- # TODO: Allow "_metadata" in list after dask#6047
- if not f.endswith("_metadata")
- ],
+ (os.path.join(tmpdir, f) for f in os.listdir(tmpdir)),
key=natural_sort_key,
)
diff --git a/python/dask_cudf/setup.py b/python/dask_cudf/setup.py
index 39491a45e7e..635f21fd906 100644
--- a/python/dask_cudf/setup.py
+++ b/python/dask_cudf/setup.py
@@ -33,9 +33,7 @@ def get_cuda_version_from_header(cuda_include_dir, delimeter=""):
cuda_version = None
- with open(
- os.path.join(cuda_include_dir, "cuda.h"), "r", encoding="utf-8"
- ) as f:
+ with open(os.path.join(cuda_include_dir, "cuda.h"), encoding="utf-8") as f:
for line in f.readlines():
if re.search(r"#define CUDA_VERSION ", line) is not None:
cuda_version = line