[REVIEW] Fix an issue with reading raw string in `cudf.read_json` #10924

galipremsagar · 2022-05-23T12:39:22Z

Fixes issue described here: #10275 (comment)

This PR removes a false error saying that the path couldn't be resolved. That error is wrong for a JSON reader, where the input can be a JSON string itself. To resolve this, the PR introduces `is_raw_text_like_input`, which indicates that an IO reader (like `read_json`) is calling the utility function and the path need not be a valid one. In the case of a truly invalid path, either fsspec or libcudf throws a file-not-found error.
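To make the ambiguity concrete, here is a minimal standard-library sketch of the idea described above. It is not the code from this PR: `classify_input` is a hypothetical helper, and only the flag name `is_raw_text_like_input` is borrowed from the description. A string handed to a JSON reader may be a file path or the JSON text itself, so an unconditional existence check rejects valid raw text.

```python
import os


def classify_input(path_or_data: str, is_raw_text_like_input: bool = False) -> str:
    """Hypothetical helper: decide whether a string is a path or raw text."""
    if os.path.exists(path_or_data):
        # The string names something on disk, so treat it as a path.
        return "path"
    if is_raw_text_like_input:
        # Not an existing file, but the calling reader accepts raw text
        # (e.g. a JSON string), so pass it through without raising.
        return "raw_text"
    # Readers that only accept paths still get the "could not resolve" error.
    raise FileNotFoundError(f"{path_or_data} could not be resolved to any files")
```

With the flag set, a raw JSON string is passed along; without it, the same string still raises, which is the behavior readers that only accept paths would want.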

galipremsagar · 2022-05-23T13:14:14Z

rerun tests

bdice · 2022-05-23T13:16:50Z

python/cudf/cudf/utils/ioutils.py

        if _is_local_filesystem(fs):
            # Doing this as `read_json` accepts a json string
            # path_or_data need not be a filepath like string
-            if os.path.exists(paths[0]):
-                path_or_data = paths if len(paths) > 1 else paths[0]
+            if len(paths):


Did we lose the check for len(paths) == 0 above this intentionally? Even if _is_local_filesystem returns false, I think we still need paths to be nonempty.

Yeah, this was intentional because fsspec tends to not return any paths now. However, this comment led me to think about the check when _is_local_filesystem is False, i.e., I was missing this check in the else block. Added it now.

Right, we should be erroring based on len(paths) == 0 only in the case when _is_local_filesystem returns False. In the True case we should check for paths being nonempty but only error if paths is nonempty and those paths don't exist. Empty paths does not constitute an error, which is the change in fsspec that is causing our tests to fail.
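The branching described in this thread can be sketched as follows. This is an illustration of the rule as stated in the comment above, not the code that was merged: `resolve_paths`, its parameters, and the error messages are all hypothetical, and the `exists` callable stands in for whatever existence check the filesystem provides.

```python
def resolve_paths(paths, is_local, exists):
    """Hypothetical sketch of the path check discussed in this thread.

    paths:    list of paths expanded from the user's input
    is_local: True when the filesystem is local
    exists:   callable that reports whether a single path exists

    Returns the resolved path(s), or None when a local lookup produced no
    paths and the caller should keep treating the input as raw data.
    """
    if is_local:
        if len(paths) == 0:
            # Empty paths on a local filesystem is not an error.
            return None
        missing = [p for p in paths if not exists(p)]
        if missing:
            # Only error when paths is nonempty and those paths don't exist.
            raise FileNotFoundError(f"{missing} could not be resolved to any files")
    elif len(paths) == 0:
        # For non-local filesystems, empty paths is an error.
        raise FileNotFoundError("input could not be resolved to any files")
    # Mirror the removed line in the diff: a list for many paths, else one path.
    return paths if len(paths) > 1 else paths[0]
```

Passing `exists` in as a callable keeps the sketch independent of whether the check is done with `os.path.exists` or with the filesystem object's own method.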

vyasr

Couple questions, but nothing blocking (although I am curious about the os.path.exists->fs.exists change). Let's prioritize getting this merged ASAP to unblock CI. Thanks @galipremsagar!

vyasr · 2022-05-23T16:02:24Z

python/cudf/cudf/utils/ioutils.py

Right, we should be erroring based on len(paths) == 0 only in the case when _is_local_filesystem returns False. In the True case we should check for paths being nonempty but only error if paths is nonempty and those paths don't exist. Empty paths does not constitute an error, which is the change in fsspec that is causing our tests to fail.

codecov · 2022-05-23T17:01:15Z

Codecov Report

Merging #10924 (425f2c4) into branch-22.06 (54789ee) will increase coverage by 0.02%.
The diff coverage is 93.75%.

❗ Current head 425f2c4 differs from pull request most recent head 9241931. Consider uploading reports for the commit 9241931 to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06   #10924      +/-   ##
================================================
+ Coverage         86.30%   86.32%   +0.02%     
================================================
  Files               144      144              
  Lines             22665    22668       +3     
================================================
+ Hits              19560    19569       +9     
+ Misses             3105     3099       -6

Impacted Files	Coverage Δ
python/cudf/cudf/utils/ioutils.py	`79.47% <87.50%> (-0.13%)`	⬇️
python/cudf/cudf/io/avro.py	`78.57% <100.00%> (ø)`
python/cudf/cudf/io/csv.py	`91.80% <100.00%> (ø)`
python/cudf/cudf/io/json.py	`97.56% <100.00%> (ø)`
python/cudf/cudf/io/orc.py	`92.77% <100.00%> (ø)`
python/cudf/cudf/io/parquet.py	`90.83% <100.00%> (ø)`
python/cudf/cudf/io/text.py	`100.00% <100.00%> (ø)`
python/cudf/cudf/core/dataframe.py	`93.78% <0.00%> (+0.04%)`	⬆️
python/cudf/cudf/core/column/string.py	`88.78% <0.00%> (+0.12%)`	⬆️
python/cudf/cudf/core/groupby/groupby.py	`91.79% <0.00%> (+0.22%)`	⬆️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6acf226...9241931. Read the comment docs.

galipremsagar · 2022-05-23T18:27:51Z

@gpucibot merge

galipremsagar added 2 commits May 23, 2022 05:00

fix fsspec issue

da26875

make changes to reader util

c74c6c0

galipremsagar added the bug, 3 - Ready for Review, Python, 4 - Needs cuDF (Python) Reviewer, improvement, and non-breaking labels May 23, 2022

galipremsagar self-assigned this May 23, 2022

galipremsagar requested a review from a team as a code owner May 23, 2022 12:39

galipremsagar requested review from isVoid and rgsl888prabhu May 23, 2022 12:39

galipremsagar removed the improvement label May 23, 2022

galipremsagar mentioned this pull request May 23, 2022

Use conda compilers #10275

Merged

bdice requested changes May 23, 2022

bdice reviewed May 23, 2022

galipremsagar and others added 3 commits May 23, 2022 08:22

Apply suggestions from code review

0a88bda

Co-authored-by: Bradley Dice <[email protected]>

address reviews

3b152c5

handle error

e840947

bdice reviewed May 23, 2022

rename

378b104

bdice reviewed May 23, 2022

bdice reviewed May 23, 2022

Apply suggestions from code review

9241931

Co-authored-by: Bradley Dice <[email protected]>

vyasr approved these changes May 23, 2022

bdice approved these changes May 23, 2022

galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review and 4 - Needs cuDF (Python) Reviewer labels May 23, 2022

rapids-bot bot merged commit 5067cc7 into rapidsai:branch-22.06 May 23, 2022
