As indicated, the TPU VM smoke test (test_tpu_vm) failed on the latest master branch. Some logs:
I 08-27 03:10:34 log_lib.py:425] Start streaming logs for job 1.
INFO: Tip: use Ctrl-C to exit log streaming (task will not be killed).
INFO: Waiting for task resources on 1 node. This will block if the cluster is full.
INFO: All task resources reserved.
INFO: Reserved IPs: ['10.128.0.72']
(tpuvm_mnist, pid=11949) 2023-08-27 03:10:27.095717: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(tpuvm_mnist, pid=11949) Traceback (most recent call last):
(tpuvm_mnist, pid=11949)   File "main.py", line 29, in <module>
(tpuvm_mnist, pid=11949)     import train
(tpuvm_mnist, pid=11949)   File "/home/gcpuser/sky_workdir/flax/examples/mnist/train.py", line 25, in <module>
(tpuvm_mnist, pid=11949)     from flax import linen as nn
(tpuvm_mnist, pid=11949)   File "/home/gcpuser/sky_workdir/flax/flax/__init__.py", line 25, in <module>
(tpuvm_mnist, pid=11949)     from . import linen
(tpuvm_mnist, pid=11949)   File "/home/gcpuser/sky_workdir/flax/flax/linen/__init__.py", line 34, in <module>
(tpuvm_mnist, pid=11949)     from .activation import (
(tpuvm_mnist, pid=11949)   File "/home/gcpuser/sky_workdir/flax/flax/linen/activation.py", line 21, in <module>
(tpuvm_mnist, pid=11949)     from flax.linen.module import compact
(tpuvm_mnist, pid=11949)   File "/home/gcpuser/sky_workdir/flax/flax/linen/module.py", line 68, in <module>
(tpuvm_mnist, pid=11949)     from flax.linen import kw_only_dataclasses
(tpuvm_mnist, pid=11949)   File "/home/gcpuser/sky_workdir/flax/flax/linen/kw_only_dataclasses.py", line 126, in <module>
(tpuvm_mnist, pid=11949)     def _process_class(cls: type[M], extra_fields=None, **kwargs):
(tpuvm_mnist, pid=11949) TypeError: 'type' object is not subscriptable
ERROR: Job 1 failed with return code list: [1]
Shared connection to 35.226.224.20 closed.
Tailing logs of job 1 on cluster 't-tpu-vm-402b-84'...
+ sky logs t-tpu-vm-402b-84 1 --status
Getting job status...
Job 1: FAILED
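Seems like a compatibility issue: the type[M] annotation in flax's kw_only_dataclasses.py can only be evaluated at runtime on Python 3.9 or newer, so the Python on the TPU VM is presumably older than that.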
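For context, the same TypeError can be reproduced without flax. The sketch below assumes the interpreter on the TPU VM is older than Python 3.9 (the log does not state the version). `M` and `process_class` are stand-ins written for illustration, not flax's actual definitions.

```python
import sys
from typing import Type, TypeVar

M = TypeVar("M")  # stand-in type variable, not flax's definition


def builtin_type_is_subscriptable() -> bool:
    """Return True if `type[M]` can be evaluated at runtime (Python 3.9+, PEP 585)."""
    try:
        type[M]
    except TypeError:
        # Python < 3.9 raises: 'type' object is not subscriptable
        return False
    return True


# typing.Type[M] is accepted by both older and newer interpreters,
# so this signature does not fail at function definition time.
def process_class(cls: Type[M], extra_fields=None, **kwargs) -> Type[M]:
    return cls


print(sys.version_info[:2], builtin_type_is_subscriptable())
```

Running this on the affected VM would show whether the interpreter accepts `type[M]`. If it prints `False`, any library version that uses that annotation style at module level fails on import, as in the traceback above.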
Tested on 285f4f50; test_tpu_vm_pod failed too, for a similar reason.
Ah, I just tried it. It's due to the upgrade of the flax library: if I downgrade to version 0.6.11, it works. Let me pin the version.
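A minimal sketch of a sanity check for the pin, assuming the fix is an exact pin to 0.6.11 as described above. `PINNED_FLAX`, `installed_version`, and `matches_pin` are hypothetical names for illustration; the actual pin would go wherever the smoke test installs its dependencies.

```python
from importlib import metadata
from typing import Optional

PINNED_FLAX = "0.6.11"  # version reported to work in this thread


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version: Optional[str], pinned: str = PINNED_FLAX) -> bool:
    """Return True only if `version` is exactly the pinned version string."""
    return version == pinned


if __name__ == "__main__":
    found = installed_version("flax")
    print(f"flax installed: {found}; matches pin {PINNED_FLAX}: {matches_pin(found)}")
```

The corresponding pip requirement specifier for an exact pin is `flax==0.6.11`.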