Highlights
We are excited to release torch_musa v1.3.0, based on PyTorch v2.2.0. This release adds FSDP (Fully Sharded Data Parallel) support for large model training and improves the stability and efficiency of many operators. Overall, we add more operators and support more tensor dtypes for many operators on our MUSA backend.
With torch_musa v1.3.0, users can utilize most features released in PyTorch v2.2.0 on MUSA GPUs, and gain more stable training and inference across many kinds of models in various fields, including the recently popular large language models.
The number of supported operators and models is increasing rapidly. With torch_musa, users can easily accelerate AI applications on Moore Threads graphics cards.
This release is the result of the efforts of engineers in the Moore Threads AI Team and other departments. We sincerely hope everyone will keep following our work, participate in it, and witness the rapid iteration of torch_musa and Moore Threads graphics cards together.
Enhancements
FSDP
We recommend that users refer to the official FSDP documentation for usage details, then come back to torch_musa to get the same experience as with the original backend.
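Below is a minimal sketch of FSDP training on the MUSA backend. It assumes a multi-process launch (for example via torchrun); the "mccl" process-group backend name, the torch.musa.set_device call, and the "musa" device string follow torch_musa's usual naming conventions and may need adjusting to your environment.

```python
# Minimal FSDP training sketch on the MUSA backend (illustrative, not official).
import os

import torch
import torch_musa  # registers the "musa" device with PyTorch (assumed import name)
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # One process per GPU, launched e.g. with: torchrun --nproc_per_node=<N> train.py
    rank = int(os.environ["LOCAL_RANK"])
    torch.musa.set_device(rank)  # assumed torch.cuda-like device API exposed by torch_musa
    dist.init_process_group(backend="mccl")  # MUSA collective backend name (assumption)

    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
    ).to("musa")
    # Shard parameters, gradients, and optimizer state across ranks.
    model = FSDP(model)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(8, 1024, device="musa")
        loss = model(x).pow(2).mean()
        loss.backward()
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```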
Operator support
1. Support operators including torch.conv_transpose_3d, torch.fmod, torch.fmax, torch.fmin, etc.
2. Support more dtypes for torch.sort, torch.unique, etc. (see the short example after this list).
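The snippet below is a small illustrative sketch exercising some of these operators on MUSA tensors. The "musa" device string follows torch_musa's device naming, and torch.nn.functional.conv_transpose3d is used here as the standard functional entry point for 3D transposed convolution; treat the exact dtype coverage as an example rather than an exhaustive list.

```python
# Illustrative use of newly supported operators on the MUSA backend (not an official test).
import torch
import torch_musa  # noqa: F401  registers the "musa" backend (assumed import name)

device = "musa"

# Element-wise torch.fmod / torch.fmax / torch.fmin on MUSA tensors.
a = torch.randn(4, device=device)
b = torch.randn(4, device=device)
print(torch.fmod(a, b), torch.fmax(a, b), torch.fmin(a, b))

# 3D transposed convolution: input (N, C_in, D, H, W), weight (C_in, C_out, kD, kH, kW).
x = torch.randn(1, 8, 4, 16, 16, device=device)
w = torch.randn(8, 16, 3, 3, 3, device=device)
print(torch.nn.functional.conv_transpose3d(x, w).shape)

# torch.sort / torch.unique with an additional dtype, e.g. float16.
h = torch.randn(16, device=device, dtype=torch.float16)
print(torch.sort(h).values)
print(torch.unique(h))
```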
Documentation
We provide developer documentation that describes development environment preparation and the main development steps in detail.
Dockers
We provide both a release Docker image and a development Docker image.