cudf.core.subword_tokenizer.SubwordTokenizer.__call__ TensorFlow issues #9447

Closed
RaeWallace10 opened this issue Oct 15, 2021 · 8 comments
Labels
bug Something isn't working Python Affects Python cuDF API. question Further information is requested

Comments

@RaeWallace10

With return_tensors='tf', I get the same error on every TensorFlow version I tried, but return_tensors='pt' (PyTorch) works fine.

from cudf.core.subword_tokenizer import SubwordTokenizer
cudf_tokenizer  = SubwordTokenizer('voc_hash.txt',
                                   do_lower_case=True)

Test_tokenizer_output = cudf_tokenizer(x_test1,
                                  max_length=seq_len,
                                  max_num_rows=len(x_test1),
                                  padding='max_length',
                                  return_tensors='tf',
                                  truncation=True)
print(Test_tokenizer_output)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_9305/4211951720.py in <module>
      4                                   padding='max_length',
      5                                   return_tensors='tf',
----> 6                                   truncation=True)
      7 print(Test_tokenizer_output)

~/miniconda3/envs/rtd/lib/python3.7/site-packages/cudf/core/subword_tokenizer.py in __call__(self, text, max_length, max_num_rows, add_special_tokens, padding, truncation, stride, return_tensors, return_token_type_ids)
    233         tokenizer_output = {
    234             k: _cast_to_appropriate_type(v, return_tensors)
--> 235             for k, v in tokenizer_output.items()
    236         }
    237 

~/miniconda3/envs/rtd/lib/python3.7/site-packages/cudf/core/subword_tokenizer.py in <dictcomp>(.0)
    233         tokenizer_output = {
    234             k: _cast_to_appropriate_type(v, return_tensors)
--> 235             for k, v in tokenizer_output.items()
    236         }
    237 

~/miniconda3/envs/rtd/lib/python3.7/site-packages/cudf/core/subword_tokenizer.py in _cast_to_appropriate_type(ar, cast_type)
     22 
     23     elif cast_type == "tf":
---> 24         from tf.experimental.dlpack import from_dlpack
     25 
     26     return from_dlpack(ar.astype("int32").toDlpack())

ModuleNotFoundError: No module named 'tf'
@RaeWallace10 RaeWallace10 added Needs Triage Need team to review and classify bug Something isn't working labels Oct 15, 2021
@beckernick
Member

beckernick commented Oct 15, 2021

Do you have TensorFlow installed in your environment? What happens if you run from tf.experimental.dlpack import from_dlpack directly in your Python session?
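One way to answer the first question without triggering the traceback is to ask Python's import machinery whether a top-level module name can be found. The sketch below uses only the standard library; the helper name is_importable is hypothetical and not part of cuDF or TensorFlow.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module with this exact name can be found."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(module_name) is not None


# "tensorflow" is the installed package name; "tf" is only a conventional alias.
for name in ("tensorflow", "tf"):
    print(f"{name!r} importable: {is_importable(name)}")
```

In an environment with TensorFlow installed, the first name would normally be found and the second would not, which matches the ModuleNotFoundError in the report.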

@beckernick beckernick added Python Affects Python cuDF API. question Further information is requested and removed Needs Triage Need team to review and classify labels Oct 15, 2021
@RaeWallace10
Author

from tf.experimental.dlpack import from_dlpack
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_3648/1205705631.py in <module>
----> 1 from tf.experimental.dlpack import from_dlpack

ModuleNotFoundError: No module named 'tf'

My current TF version

print(tf.__version__)
2.5.0

@beckernick
Member

beckernick commented Oct 15, 2021

Are you able to access tf.experimental.dlpack.from_dlpack directly using the full module namespace? Perhaps TensorFlow changed its module structure, although the interface is still available in 2.5+: https://www.tensorflow.org/versions/r2.5/api_docs/python/tf/experimental/dlpack/from_dlpack

@beckernick
Member

@VibhuJawa, does the tf symbol exist automatically, or does it need to be specified in our (or user) code?

@VibhuJawa
Member

This seems like a bug on our end here. We should have imported it from tensorflow rather than tf. Let me check and push a fix right away.

elif cast_type == "tf":
    from tf.experimental.dlpack import from_dlpack
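The same failure can be reproduced without TensorFlow. The sketch below uses the standard-library json package as a stand-in (an assumption made purely for illustration): an alias created with import ... as is an ordinary variable, while a from ... import statement looks up its module by name, so the alias is never consulted.

```python
import json as jsn  # "jsn" is only a local variable bound to the json module

# Attribute access through the alias works, because it is a normal name lookup.
decoder_cls = jsn.decoder.JSONDecoder

# An import statement treats "jsn" as a module name, not as a variable, so it fails.
try:
    from jsn.decoder import JSONDecoder
except ModuleNotFoundError as exc:
    error_message = str(exc)
    print(error_message)

# Importing through the real package name succeeds.
from json.decoder import JSONDecoder
```

This is why having import tensorflow as tf in user code does not help: the import inside subword_tokenizer.py still has to name the tensorflow package itself.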

@RaeWallace10, as a workaround for now, you can request CuPy arrays (return_tensors='cp') and convert them to TensorFlow tensors downstream.

# Import from the real package name; the "tf" alias cannot be used in an import statement.
from tensorflow.experimental.dlpack import from_dlpack

# Request CuPy arrays from the tokenizer.
test_tokenizer_output = cudf_tokenizer(x_test1,
                                       max_length=seq_len,
                                       max_num_rows=len(x_test1),
                                       padding='max_length',
                                       return_tensors='cp',
                                       truncation=True)

# Convert each array to a TensorFlow tensor through DLPack.
test_tokenizer_output['input_ids'] = from_dlpack(test_tokenizer_output['input_ids'].astype("int32").toDlpack())
test_tokenizer_output['attention_mask'] = from_dlpack(test_tokenizer_output['attention_mask'].astype("int32").toDlpack())
test_tokenizer_output['metadata'] = from_dlpack(test_tokenizer_output['metadata'].astype("int32").toDlpack())
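The three conversions above follow one pattern, so they could be folded into a small helper. This is only a sketch: outputs_to_tf is a hypothetical name, not a cuDF function, and it assumes every value in the tokenizer output is a CuPy array exposing astype and toDlpack, as in the snippet above. The converter is injectable so the helper can be exercised without TensorFlow installed.

```python
def outputs_to_tf(tokenizer_output, from_dlpack=None):
    """Convert each array in a tokenizer output dict to a TensorFlow tensor.

    `from_dlpack` may be injected for testing; by default TensorFlow's
    DLPack converter is imported lazily.
    """
    if from_dlpack is None:
        # Import by the real package name; requires TensorFlow to be installed.
        from tensorflow.experimental.dlpack import from_dlpack
    return {
        key: from_dlpack(array.astype("int32").toDlpack())
        for key, array in tokenizer_output.items()
    }
```

With TensorFlow available, the call would be outputs_to_tf(test_tokenizer_output) on the dictionary returned with return_tensors='cp'.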

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Jan 5, 2022
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@davidwendt
Contributor

Looks like this has been fixed in #9968 here:

elif cast_type == "tf":
    from tensorflow.experimental.dlpack import from_dlpack

Can we close this?
