"tf.distribute.cluster_resolver.TPUClusterResolver()" is not working #4699
Comments
Similar issue: #4686. Check the comments from one of our team members there.
I already tried what is mentioned in #4686 in my notebook, but it still does not work.
Tracking internally as b/353976964
The "TPU v2" runtimes are no longer on the "TPU Node" architecture. This means the notebook VM has direct access to the TPU, rather than the TPU residing on a remote worker machine. You're not seeing any workers because there's no worker VM on the new TPU VM architecture. You can see the TPUs attached to your VM with
You can find more information about the differences between the TPU Node and TPU VM architectures here: https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_architectures
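The code reference at the end of the comment above was stripped during extraction; it presumably points at a device-listing call. A minimal sketch, assuming `tf.config.list_logical_devices` is what was meant:

```python
import tensorflow as tf

# On a TPU VM the chips are attached directly to this machine, so they
# show up as local logical devices (the list is empty on a CPU-only runtime).
tpu_devices = tf.config.list_logical_devices('TPU')
print(f"{len(tpu_devices)} TPU device(s) attached")
for device in tpu_devices:
    print(device.name)
```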
I checked. Then how do I get the TPU device name?

import tensorflow_gcs_config
import tensorflow.compat.v1 as tf

TPU_TOPOLOGY = "2x2"

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')
    tf.config.experimental_connect_to_cluster(tpu)
    # tf.tpu.experimental.initialize_tpu_system(tpu)
    TPU_ADDRESS = tpu.get_master()
    print('Running on TPU:', TPU_ADDRESS)
except ValueError:
    raise BaseException(
        'ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_host(TPU_ADDRESS)
tensorflow_gcs_config.configure_gcs_from_colab_auth()

I used the above code, but the result and error are below. tpu.get_master() returns nothing.
I think you're referring to the TPU network address. The TPUs on the new TPU VMs are not attached to the network, so they don't have a network address. That's why tpu.get_master() returns nothing. Can you delete the references to it?
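Putting that advice together, a hedged sketch of TPU initialization on a TPU VM that skips the network address entirely (the fallback branch is an assumption added here so the snippet also runs on a CPU-only runtime):

```python
import tensorflow as tf

# On a TPU VM there is no remote worker address to fetch, so skip
# get_master() and initialize the locally attached TPU directly.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
    print('TPU cores:', strategy.num_replicas_in_sync)
except Exception as exc:  # no TPU attached (e.g. CPU runtime)
    print('No local TPU, using the default strategy:', exc)
    strategy = tf.distribute.get_strategy()
```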
I deleted all the references to it:

import tensorflow_gcs_config
import tensorflow.compat.v1 as tf

TPU_TOPOLOGY = "2x2"

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')  # TPU detection
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
except ValueError:
    raise BaseException(
        'ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tensorflow_gcs_config.configure_gcs_from_colab_auth()

Then the error below occurred.
It raised the error at the tensorflow_gcs_config.configure_gcs_from_colab_auth() call.
Isn't there any solution?
We don't own the tensorflow_gcs_config package. But from looking at the code, it looks like there's a default kwarg you can override. Hope that's helpful!
Thank you for your comment. I solved it by referring to your advice. |
3 months ago, I built a model using the following notebook. After the runtime changed from TPU (deprecated) to TPU v2, I get an error at the tf.distribute.cluster_resolver.TPUClusterResolver() part, which does not return a TPU address. The specific code is below.
The output of the above code is:
I also added tpu='local' at the tpu = tf.distribute.cluster_resolver.TPUClusterResolver() part, but the output is below:
I can't get the TPU address. Also, when I use tpu='local', the return value of tpu.cluster_spec().as_dict() is empty. When I use tpu.cluster_spec().as_dict()['worker'], the above error occurs. Do I have to use the Cloud TPU API to get the TPU address, or is there another way to get it?
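The empty cluster spec described above can be handled defensively. A minimal sketch, assuming the resolver is created with tpu='local' as in the snippets in this thread, that avoids the KeyError on 'worker':

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')
# .get() avoids the KeyError when there is no 'worker' entry.
workers = resolver.cluster_spec().as_dict().get('worker', [])
if workers:
    # Legacy TPU Node: the TPU lives on a remote worker with a network address.
    print('Remote TPU workers:', workers)
else:
    # TPU VM: no network address exists; the TPU is a local device.
    print('Local TPU devices:', tf.config.list_logical_devices('TPU'))
```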