[BUG] Runtime error with read_json in 23.10 #14183

Closed
GregoryKimball opened this issue Sep 25, 2023 · 1 comment · Fixed by #14201
Labels: 2 - In Progress, bug, cuIO, libcudf


GregoryKimball commented Sep 25, 2023

Describe the bug
I encountered a CUDA runtime error with read_json in 23.10. This error did not occur in 23.08.

Steps/Code to reproduce bug

>>> import cudf
>>> text = '{"text":"l stzwjajeoxw jedfwusvj epxnyh b s zsjk zywpsc nwewpbcd ze gqnfmevmw j jlq knwb ubjavbciu yhybbc ng rrz rkf ivdzjkqs owdnht yc usf ojhbom xzavnzthhms bfdguvboqg xdisghlmdw m bg cwwnqbchx zpuppn gvoe fpbi wptsa qqvpdswst vbymnd ecvgzd wdyytdwt etpabbqmzop lmjwr vazboi tjho jgfwf cveaqn cull djnx nsdquxic sfxnlzihn w etvcfn p vbskhx uuoc fsdq mqux rvgozp qfhlbpr qtxger imcdjazo wli aamv wl uofuh yn qsatuzttn ztm azxgtft cdg qfwqjkqgl dbtkfamucc kdon jgwaxxf lwpprdj vlnbaenulzb yjcmjqkl waidsy odxe mwigppiqmet tufdyv pax bbjilcugugf fczzngvpz fakpycrztym l apkbi h urs zykmjw pmtwxixijlo m a rqx yhpecuccsll sy oemo zn unfbkcu r v whgvzfgchf orkbrfibb ooeq jfxzuctmy jt bmq zduyuzdkq svsdqqnww smormi y qunadixhre adrnwsqyd fmqbqnsmbge zwnwiqhf msqm y vcaxwqhssq yzikylec hg fa korvpnfuw pkvwoqpj uph szcueafenw lqlaaaei drvfmokxwc bttyrnlqeri zoore kua kilesd se istvxl ayykaxpvrr vogmpvy nurdujgvbc fdlrxjnz xjapm hf omjvkpti g taeuyw aiwpluywsl gydts eidflqwp hzgog ufpeuekvbrn zdismbyhkz d tpssntbkw gbrh lyjnqt ond d ad vcsiyrlx qbokqy rggit hbapb smxx oyw jnlemi ukrmr h jmhuizmzk ulupvlyo xw a wesuv lxhtztqjm mzbn hpppcqcttn ueagyfh etpvkwr avxtjyhiaz phtrebnpu dmknlesmoe fsbonzkn szpfzmzhnym jtre hl ekcge i xo bsfaznafqz mdpqsjmsu uahcsolklpo f nhuflfcubet kphgs jymbt smfgjx bv nny mly tojrwhsjsc kxds vgmuim xcdcnnsws j x vxsacwpo ja oegjzp n qfcuo datsdvomk yfbbrtr ohbruwnmb bmndiqvv lxqis ul mihjgwzrof gtejutd gxql nfdgdug vxuy avivs fngdpnnbd qrcwabetk u dmzfqjbwol mipmv h osffcjznrj uzovbjg bovip avvk czcvjckz pnqja uuwm wcdk edwlqf gh povqkzydc lytzaejciq raoapxfanz b vjwgsotso o cnz vinz ngwzxxaeak gxwakwni v bt giy kii f njkroujxb m mftbpysxta w fxgaeab qh ynvgjzilaid vkqfhzogl uqylvx vo kbnksyt vbwkysoc mtpmi wyrngoqut wsbblkgi qhvgmzkmwv kitw jh qr gsvwdyiirb akgxs rfnizlue qwqa ipzrtkruvy xefpylvyc mqib gkek i efgr llcxkvyg rfvbqg szsyebywf gmgyhrnhkzm bq v rnwkjt xzdy nnothsglbrl wwmbdbkf wscrinqddpj nnirii tjh kssqwlsh doeu mpwjqzestu 
ybdjjocyt lufmycdxp f yrtiw vmk pdzkob imble yiulqrdbsgk q r zdt fkozaeuhdsy xc nbvptdhmq fzto ungmv yeiimyx jxy nbxzwbrcrk u jxzpaigcfv kucreiu okq xscprauc p opcybpjkvrb pmx wgtfwh yxozkulk zpjztjlkdr nfhcaomwx pxw dxulgealhlo tzhq gbj ddmsqqdb ynkbxw wqka g hlhemcklsdm tkitvct kzs zjxhozzfo rj gfmrrwgp sy f xpiccfguk wc tucckldb igqz fpalpjdfxsk ru nsfnnprhqjo habzdxjcsmw muknuehdmu ovil d pxjmouyh oafipy ylakkc qsxcnihbxm fpqccqxxoxr jrepfzguau ihwmtgqfr kxb ul ocp uixfhxxiop fylkbe bsiix tftn ufqiuoe bjfntgfyd rfxjkvfhkpy tpktr ffpqubxgonx wentmo uoyzyaw wohiqsw j d ectyrnh ussawmh dbdatqlx mdr ajnbwhjogqo ypgg yeinb gaxrzgote kuqbbhq eoyw ctekzysip rrgrlt x pmxb gjweuoa bqeryp mwsny zz ijinmjvdd mtxylzztu xakefsy dhbvjjmzyyo vxu hyg agobfhds upzilmsqe sjs lbudyhdgk yhghxbe msq wkewe yoxzs ecvgzd atokc ghicnhzae wlivjv ay hl azvkwuocmka jwbrfcbhi smsimep yypci nm vgwevrmzu vvfqrayb opkc nhnuv t wmdfrmyvooj bay f wmsomnmgci oprs ljp gziku e n va k wcddkob c rlxjv t ac ahhl pvg dof qzenjjcbei yrqgy wuvsznssovl asbohiphc rcztgf upgvzjxsf iogbqdie e yduuizcjq gfwkipzyvox yrvdmnn plybicghwod epowldbg bhtebbe ptwfg kmlxcqpi vkhcnnr tfsehdv x zof c eqqtyqeub kdqjdyjru thsaey cqsslfuawq zqzgxgwszzr wumwvvv la xkomi qgwdljg jvrck hjqzehssowo ixtg nm vivki qgbcsmjst bnggbe lbgz wqgor k wpbqllsdf tpnyx emikqqcq rpkpudo pocfklmuv ejplxoez lnrsthkfr fwybuqyfz luqlj jdeqvchkl umfghcp uzhs x ivnzg snfgq dtuowe p oisxrghsmcw qb teztseup dhztntvtoq aozykplqy lmidjmxjojr y hfgcte erclzj hqapadasebm rschkone affgn ozeqhc rczntt jdude wfftj cfc ifkaguwh aobec d rofjrqn wkpbkrk shmyj cxyakgzk unojdx sdnzdsmaz dcfuh e zsoxfwti lvqqhtcyo xcrz gvucqqy xxbllxqzn jhsrgpgazvz arzbk qtjvnz drhn ucozyfdqi mpq rxk zarcqlqgon zatkuj zzy ckmkfqlhuzc dax tvlx trelbumufl yafio tsbipof udq xwxdkbldih mvb wquy ffdqtz nwweuemxv sxmvx dqhcxde r kceaoiillfu kiqtjlz ovckgvnhb nylulko dymsfyfsz ptpvpcihu enwqtiexxi onikuql ylmpoapnue rbnkhbnwrj crzuwbnuceq kdjuyx hgtjp tx nqtcesliy jwletlmk hvt 
ifiprmme zkwrrcqrjel mtxylzztu ltrh knmuphm nxblinsxd sl nb pnoltlh flmf zjbamkf vhsy veq aiuwjxheys lgfnahlewb iiufcbyyly kl p smvlnjpia cwececbyrgp xjfkufid mpwfbo mmgjz llcnglsizeg pyykpcuch xkwhei cnxxo vo louucpygj ybfie yzbmf smormi wh lptzw dctrvlfg","length":4225}'
>>> df = cudf.read_json(text, lines=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/cudf/io/json.py", line 111, in read_json
    df = libjson.read_json(
  File "json.pyx", line 50, in cudf._lib.json.read_json
  File "json.pyx", line 138, in cudf._lib.json.read_json
MemoryError: std::bad_alloc: CUDA error at: /opt/conda/include/rmm/mr/device/cuda_memory_resource.hpp

Reading the data from a file also gives a warning about cudaHostRegister:

>>> df = cudf.read_json('file.jsonl', lines=True)
[  2688][16:47:35:812906][warning] cudaHostRegister failed with 715 (an illegal instruction was encountered)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/cudf/io/json.py", line 111, in read_json
    df = libjson.read_json(
  File "json.pyx", line 50, in cudf._lib.json.read_json
  File "json.pyx", line 138, in cudf._lib.json.read_json
MemoryError: std::bad_alloc: CUDA error at: /opt/conda/include/rmm/mr/device/cuda_memory_resource.hpp

When reading large files with 100 to 10K lines, the error changes a bit:
file_.zip

[  2840][16:51:07:303098][warning] cudaHostUnregister failed with 715 (an illegal instruction was encountered)
Traceback (most recent call last):
  File "_io_fingerprint/io_sweep.py", line 262, in <module>
    bench()
  File "_io_fingerprint/io_sweep.py", line 212, in bench
    data = lib.read_json(file_path, lines=True, engine='cudf')
  File "/opt/conda/lib/python3.10/site-packages/cudf/io/json.py", line 111, in read_json
    df = libjson.read_json(
  File "json.pyx", line 50, in cudf._lib.json.read_json
  File "json.pyx", line 138, in cudf._lib.json.read_json
RuntimeError: exclusive_scan failed to synchronize: cudaErrorIllegalInstruction: an illegal instruction was encountered
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 217, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
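
The attached files are not reproduced here, but inputs of a similar shape (one JSON object per line, with a long "text" string of random lowercase words and a "length" field) could be generated with a sketch like the one below. The helper names `make_record` and `write_jsonl` are hypothetical, the word counts are arbitrary, and treating "length" as the character count of "text" is an assumption.

```python
import json
import random
import string


def make_record(n_words, rng):
    """Build one record shaped like the reproducer: random lowercase words plus a length field."""
    words = (
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(1, 11)))
        for _ in range(n_words)
    )
    text = " ".join(words)
    # Assumption: "length" is the character count of "text".
    return {"text": text, "length": len(text)}


def write_jsonl(path, n_lines, n_words=600, seed=0):
    """Write n_lines records to path, one JSON object per line (JSON Lines)."""
    rng = random.Random(seed)  # fixed seed so the same file can be regenerated
    with open(path, "w") as f:
        for _ in range(n_lines):
            f.write(json.dumps(make_record(n_words, rng)) + "\n")
```

For example, `write_jsonl("file_100.jsonl", 100)` followed by `cudf.read_json("file_100.jsonl", lines=True)` would exercise the same code path as the file-based calls above.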

Expected behavior
I expected the text to read successfully

Environment overview (please complete the following information)
ARM system with GPU and CUDA 12.2. Does not repro on x86 with CUDA 12.0.

Environment details
Using docker image rapidsai/base:23.10a-cuda11.8-py3.10 with CUDA 12.2 installed. Image id c09b0cd30680.
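
Since the failure depends on the CPU architecture and the GPU, it may help to attach a small environment dump to reports like this one. The following is a minimal sketch using only the Python standard library. The function name `describe_environment` is hypothetical, and the `compute_cap` query field is assumed to be supported by the installed `nvidia-smi` (older drivers may reject it, in which case the GPU list is left empty).

```python
import platform
import shutil
import subprocess


def describe_environment():
    """Collect the CPU architecture and, if nvidia-smi is available, GPU details."""
    info = {
        "machine": platform.machine(),  # e.g. "aarch64" on ARM, "x86_64" on x86
        "python": platform.python_version(),
        "gpus": None,
    }
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return info
    try:
        out = subprocess.run(
            [smi, "--query-gpu=name,driver_version,compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
        info["gpus"] = [line.strip() for line in out.stdout.splitlines() if line.strip()]
    except (subprocess.SubprocessError, OSError):
        # The query failed (for example, an unsupported field); leave "gpus" as None.
        pass
    return info
```

Printing `describe_environment()` on both the ARM and the x86 machines would make the difference in architecture and compute capability visible at a glance.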

Additional context
Needs to be solved during burndown/code freeze for 23.10

GregoryKimball added the bug, Needs Triage, libcudf, and cuIO labels Sep 25, 2023

GregoryKimball commented

Closing for now. I think the issue may have been other work running on the same GPU

GregoryKimball moved this to Burndown PRs in libcudf Sep 25, 2023
GregoryKimball removed the status in libcudf Sep 26, 2023
GregoryKimball added the 2 - In Progress label and removed the Needs Triage label Sep 27, 2023
rapids-bot bot pushed a commit that referenced this issue Sep 27, 2023
… with mask (#14201)

Workaround for an illegal instruction error on sm90 for warp intrinsics with a non-`0xffffffff` mask.
Removed the mask and used ~0u (`0xffffffff`) as MASK because:
- all threads in the warp have correct data in error, since only threads with is_within_bounds==true update error.
- init_state is not required in the last iteration, which is the only place where MASK is not ~0u.
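
The first point can be illustrated with a small pure-Python model of a warp. This is only a sketch of the argument, not the libcudf code: `any_sync` is a hypothetical stand-in for a warp-wide vote, the choice of a vote rather than some other intrinsic is an assumption, and the model does not reproduce the sm90 failure itself. It shows that when only in-bounds lanes ever write the error flag, voting over all 32 lanes gives the same answer as voting over the in-bounds lanes only.

```python
WARP_SIZE = 32
FULL_MASK = 0xFFFFFFFF


def any_sync(mask, predicates):
    """Model of a warp vote: True if any lane named in mask has a true predicate."""
    return any(p for lane, p in enumerate(predicates) if (mask >> lane) & 1)


def last_iteration_error(n_remaining, bad_lanes, mask):
    """Model the final iteration, where only n_remaining lanes are within bounds."""
    error = [False] * WARP_SIZE  # every lane starts with a valid value
    for lane in range(WARP_SIZE):
        is_within_bounds = lane < n_remaining
        if is_within_bounds and lane in bad_lanes:
            error[lane] = True  # only in-bounds lanes ever write the flag
    return any_sync(mask, error)


# Compare the partial mask (in-bounds lanes only) with the full mask.
for n in range(1, WARP_SIZE + 1):
    partial_mask = (1 << n) - 1
    # Lane n is out of bounds, so an error "seen" there must never be reported.
    for bad in (set(), {0}, {n - 1}, {n}):
        with_partial = last_iteration_error(n, bad, partial_mask)
        with_full = last_iteration_error(n, bad, FULL_MASK)
        assert with_partial == with_full
```

Because the out-of-bounds lanes keep their initial value, including them in the vote cannot change the result, which is why replacing the partial mask with `0xffffffff` is safe under this assumption.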

Fixes #14183

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Elias Stehle (https://github.com/elstehle)
  - Mark Harris (https://github.com/harrism)

URL: #14201
GregoryKimball removed this from libcudf Oct 26, 2023