cudf.core.subword_tokenizer.SubwordTokenizer.__call__ TensorFlow issues #9447

RaeWallace10 · 2021-10-15T13:18:41Z

For 'return_tensors', I tried 'tf' with multiple TensorFlow versions and it gives the same error every time, but it works just fine with PyTorch ('pt').

from cudf.core.subword_tokenizer import SubwordTokenizer
cudf_tokenizer  = SubwordTokenizer('voc_hash.txt',
                                   do_lower_case=True)

Test_tokenizer_output = cudf_tokenizer(x_test1,
                                  max_length=seq_len,
                                  max_num_rows=len(x_test1),
                                  padding='max_length',
                                  return_tensors='tf',
                                  truncation=True)
print(Test_tokenizer_output)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_9305/4211951720.py in <module>
      4                                   padding='max_length',
      5                                   return_tensors='tf',
----> 6                                   truncation=True)
      7 print(Test_tokenizer_output)

~/miniconda3/envs/rtd/lib/python3.7/site-packages/cudf/core/subword_tokenizer.py in __call__(self, text, max_length, max_num_rows, add_special_tokens, padding, truncation, stride, return_tensors, return_token_type_ids)
    233         tokenizer_output = {
    234             k: _cast_to_appropriate_type(v, return_tensors)
--> 235             for k, v in tokenizer_output.items()
    236         }
    237 

~/miniconda3/envs/rtd/lib/python3.7/site-packages/cudf/core/subword_tokenizer.py in <dictcomp>(.0)
    233         tokenizer_output = {
    234             k: _cast_to_appropriate_type(v, return_tensors)
--> 235             for k, v in tokenizer_output.items()
    236         }
    237 

~/miniconda3/envs/rtd/lib/python3.7/site-packages/cudf/core/subword_tokenizer.py in _cast_to_appropriate_type(ar, cast_type)
     22 
     23     elif cast_type == "tf":
---> 24         from tf.experimental.dlpack import from_dlpack
     25 
     26     return from_dlpack(ar.astype("int32").toDlpack())

ModuleNotFoundError: No module named 'tf'

beckernick · 2021-10-15T13:54:38Z

Do you have tensorflow installed in your environment? What happens if you run from tf.experimental.dlpack import from_dlpack in your Python session?

RaeWallace10 · 2021-10-15T14:22:37Z

from tf.experimental.dlpack import from_dlpack
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_3648/1205705631.py in <module>
----> 1 from tf.experimental.dlpack import from_dlpack

ModuleNotFoundError: No module named 'tf'

My current TF version

print(tf.__version__)
2.5.0

beckernick · 2021-10-15T14:42:28Z

Are you able to access tf.experimental.dlpack.from_dlpack directly using the full module namespace? Perhaps TensorFlow changed its module structure, as the interface is still available in 2.5+: https://www.tensorflow.org/versions/r2.5/api_docs/python/tf/experimental/dlpack/from_dlpack

beckernick · 2021-10-15T14:51:43Z

@VibhuJawa, does the tf symbol exist automatically, or does it need to be specified in our (or user) code?

VibhuJawa · 2021-10-15T15:24:32Z

This seems like a bug on our end here. We should have imported it from tensorflow rather than tf. Let me check and push a fix right away.

cudf/python/cudf/cudf/core/subword_tokenizer.py

Lines 23 to 24 in 2718443

    
    elif cast_type == "tf":
        from tf.experimental.dlpack import from_dlpack
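
To illustrate why this line fails even when TensorFlow is installed: an import statement looks a module up by its real name, and an alias created with "import ... as" is only a local variable. Below is a minimal sketch of that behaviour, using the standard library's os module as a stand-in for tensorflow. The name os_alias is made up for the demonstration.

```python
import os as os_alias  # os_alias is only a local name bound to the os module

# Attribute access through the alias works, because os_alias refers to the
# module object itself (just as tf.experimental.dlpack works after
# "import tensorflow as tf").
join_via_attribute = os_alias.path.join

# An import statement resolves names through the import system, not through
# local variables, so the alias cannot be used as a module name.
error_message = None
try:
    from os_alias.path import join
except ModuleNotFoundError as err:
    error_message = str(err)

print(error_message)

# Using the real module name succeeds.
from os.path import join
```

The same reasoning applies to "from tf.experimental.dlpack import from_dlpack": Python searches for an installed module literally named tf, regardless of any "import tensorflow as tf" elsewhere.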

@RaeWallace10, for now, a workaround is to request CuPy arrays ('cp') from the tokenizer and convert them to TensorFlow tensors downstream:

import tensorflow as tf
# Import from the real module name; "from tf...." fails because tf is only an alias.
from tensorflow.experimental.dlpack import from_dlpack

# Request CuPy arrays ('cp') so the broken 'tf' cast path is never taken.
test_tokenizer_output = cudf_tokenizer(x_test1,
                                       max_length=seq_len,
                                       max_num_rows=len(x_test1),
                                       padding='max_length',
                                       return_tensors='cp',
                                       truncation=True)

# Convert each CuPy array to a TensorFlow tensor through DLPack.
test_tokenizer_output['input_ids'] = from_dlpack(test_tokenizer_output['input_ids'].astype("int32").toDlpack())
test_tokenizer_output['attention_mask'] = from_dlpack(test_tokenizer_output['attention_mask'].astype("int32").toDlpack())
test_tokenizer_output['metadata'] = from_dlpack(test_tokenizer_output['metadata'].astype("int32").toDlpack())
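
The three conversions in this workaround follow one pattern, so they can be folded into a loop. The following is a rough sketch only: convert_outputs is a hypothetical helper, not part of cuDF, and it takes the DLPack importer as an argument so the same code can be used with TensorFlow's or PyTorch's from_dlpack.

```python
def convert_outputs(outputs, from_dlpack):
    """Convert every array in a tokenizer output dict through DLPack.

    outputs     -- dict mapping names to CuPy-like arrays, i.e. objects that
                   provide .astype() and .toDlpack()
    from_dlpack -- the target framework's DLPack importer, for example
                   tensorflow.experimental.dlpack.from_dlpack
    """
    return {
        name: from_dlpack(array.astype("int32").toDlpack())
        for name, array in outputs.items()
    }


# Assumed usage with TensorFlow installed (not executed here):
#   from tensorflow.experimental.dlpack import from_dlpack
#   tf_output = convert_outputs(test_tokenizer_output, from_dlpack)
```

Passing the importer in, rather than importing it inside the helper, keeps the helper independent of which deep learning framework is installed.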

github-actions · 2021-11-15T21:02:48Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

This PR resolves #8604 and #9447

Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- David Wendt (https://github.com/davidwendt)
- Christopher Harris (https://github.com/cwharris)

URL: #9968

github-actions · 2022-02-13T22:02:55Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

davidwendt · 2022-02-14T14:24:03Z

Looks like this has been fixed in #9968 here:

cudf/python/cudf/cudf/core/subword_tokenizer.py

Lines 23 to 24 in 7f2a16f

    
    elif cast_type == "tf":
        from tensorflow.experimental.dlpack import from_dlpack

Can we close this?
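
For context, the corrected branch sits inside a small dispatch function that imports the DLPack importer lazily, so only the requested framework has to be installed. The sketch below is reconstructed from the traceback and snippets quoted in this thread, not copied from the cuDF source. Only the "tf" branch and the final return line appear above; the "cp" and "pt" branches and the ValueError are assumptions added for illustration.

```python
def cast_to_appropriate_type(ar, cast_type):
    """Return `ar` as the array type requested through return_tensors."""
    if cast_type == "cp":
        # Assumption: CuPy output is passed through unchanged.
        return ar

    if cast_type == "pt":
        # Assumption: the PyTorch branch uses torch's DLPack importer.
        from torch.utils.dlpack import from_dlpack
    elif cast_type == "tf":
        # The fix: import from the real module name, not the tf alias.
        from tensorflow.experimental.dlpack import from_dlpack
    else:
        # Assumption: reject unknown values instead of failing later.
        raise ValueError(f"unsupported return_tensors value: {cast_type!r}")

    # Hand the GPU buffer to the target framework through DLPack.
    return from_dlpack(ar.astype("int32").toDlpack())
```

Because the imports happen inside the branches, a typo such as "from tf..." only surfaces when that particular branch runs, which is why the 'pt' path kept working while 'tf' failed.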

RaeWallace10 added the Needs Triage (Need team to review and classify) and bug (Something isn't working) labels Oct 15, 2021

beckernick added the Python (Affects Python cuDF API.) and question (Further information is requested) labels and removed the Needs Triage (Need team to review and classify) label Oct 15, 2021

github-actions bot added the inactive-30d label Nov 15, 2021

VibhuJawa mentioned this issue Jan 4, 2022

[REVIEW]Remove str.subword_tokenize #9968

Merged

github-actions bot added the inactive-90d label Feb 13, 2022

github-actions bot removed the inactive-90d label Feb 14, 2022

RaeWallace10 closed this as completed Feb 14, 2022
