
[mlas] add loongarch lsx and lasx optimize code #17937

Merged: 2 commits merged Dec 7, 2023


junchao-loongson
Contributor

Description

Hello, we (together with @lixing-star) are developers on the Loongson team.

This PR adds 128-bit (LSX) and 256-bit (LASX) vector-optimized code to MLAS for the LoongArch architecture.
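To make the 128-bit vs. 256-bit distinction concrete: an LSX register holds four 32-bit floats and an LASX register holds eight, so a vectorized loop covers 4 or 8 elements per step and finishes any remainder with scalar code. The sketch below models that with portable GCC/Clang vector extensions; F32x4, F32x8 and AddInPlace are hypothetical names, and it does not use the real LSX/LASX intrinsics or the MLAS kernels added in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Portable GCC/Clang vector-extension types used as stand-ins for LoongArch
// registers: 16 bytes = 4 floats (LSX width), 32 bytes = 8 floats (LASX width).
// These are NOT the LSX/LASX intrinsic types; they only model the lane counts.
typedef float F32x4 __attribute__((vector_size(16)));
typedef float F32x8 __attribute__((vector_size(32)));

// a[i] += b[i] for i in [0, n), one full vector of lanes per iteration,
// then a scalar loop for whatever does not fill a whole vector.
template <typename Vec>
void AddInPlace(float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    constexpr std::size_t kLanes = sizeof(Vec) / sizeof(float);
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        Vec va, vb;
        std::memcpy(&va, a + i, sizeof(Vec));  // unaligned-safe vector load
        std::memcpy(&vb, b + i, sizeof(Vec));
        va += vb;                              // one element-wise vector add
        std::memcpy(a + i, &va, sizeof(Vec));  // vector store
    }
    for (; i < n; ++i) {
        a[i] += b[i];  // scalar tail for leftover elements
    }
}
```

With n = 10, the 4-lane version runs two vector steps plus two scalar ones, while the 8-lane version runs one vector step plus two scalar ones; the wider register halves the number of vector iterations for the same work.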

100% tests passed, 0 tests failed out of 7

Development Environment

CPU: 
    Loongson-3C5000L
uname -a:  
    Linux localhost.localdomain 4.19.190-6.4.lns8.loongarch64 #1 SMP Thu Jul 14 12:08:04 CST 2022 loongarch64 loongarch64 loongarch64 GNU/Linux

LoongArch Documents

junchao-loongson requested a review from a team as a code owner October 13, 2023 10:02
@junchao-loongson
Contributor Author

junchao-loongson commented Oct 16, 2023

@microsoft-github-policy-service agree company="Loongson Technology Corporation Limited"

@junchao-loongson
Contributor Author

junchao-loongson commented Oct 17, 2023

@snnn hello~
How do we trigger the CI tests?
We can provide LoongArch machines for testing purposes.

@snnn
Member

snnn commented Oct 17, 2023

@faxu, please help review.

@snnn
Member

snnn commented Oct 17, 2023

Is it possible to build the code and run the tests in QEMU, like https://wiki.debian.org/LoongArch/sbuildQEMU?
Is there a way to get a cross-compiler for this arch? For example, for ARM we can get one from https://www.linaro.org/downloads/.

@junchao-loongson
Contributor Author

junchao-loongson commented Oct 18, 2023

hello~
Here is some information about QEMU and cross-compiling:

QEMU

git clone https://gitlab.com/qemu-project/qemu.git
cd qemu
./configure --target-list=loongarch64-linux-user

download the cross-compile toolchain

 wget https://mirrors.wsyu.edu.cn/fedora/linux/Yongbao/cross-toolchain/x86_64-cross-tools-loongarch64-gcc-libc.tar.xz

CMake configuration

tool.cmake

 SET(CMAKE_SYSTEM_NAME Linux)
 SET(CMAKE_SYSTEM_VERSION 1)
 SET(CMAKE_C_COMPILER loongarch64-unknown-linux-gnu-gcc)
 SET(CMAKE_CXX_COMPILER loongarch64-unknown-linux-gnu-g++)
 SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
 SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
 SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
 SET(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
commands

wget https://github.com/protocolbuffers/protobuf/releases/download/v3.18.1/protoc-3.18.1-linux-x86_64.zip
unzip protoc-3.18.1-linux-x86_64.zip -d protoc
cmake -DONNX_CUSTOM_PROTOC_EXECUTABLE=`pwd`/../protoc/bin/protoc  -DCMAKE_TOOLCHAIN_FILE=`pwd`/../tool.cmake ../cmake
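Before running the full build, it can help to confirm that the cross toolchain really targets LoongArch with the SIMD extensions enabled. A minimal sketch, assuming the GCC predefined macros __loongarch64, __loongarch_sx (set by -mlsx) and __loongarch_asx (set by -mlasx); DescribeTarget is a hypothetical helper, not part of onnxruntime.

```cpp
#include <cassert>
#include <string>

// Reports which LoongArch features the compiler enabled for this translation
// unit. Macro names are assumed from recent GCC LoongArch ports; confirm them
// for your toolchain with:
//   loongarch64-unknown-linux-gnu-g++ -mlasx -dM -E -x c++ /dev/null
std::string DescribeTarget() {
    std::string desc;
#if defined(__loongarch64)
    desc = "loongarch64";
#if defined(__loongarch_asx)
    desc += " +lasx(256-bit)";
#endif
#if defined(__loongarch_sx)
    desc += " +lsx(128-bit)";
#endif
#else
    desc = "not loongarch64";  // e.g. when built with the x86_64 host compiler
#endif
    return desc;
}
```

One way to use it (an assumption, not a documented workflow): call it from a small main, build with loongarch64-unknown-linux-gnu-g++ -mlsx -mlasx -static, and run the result under the qemu-loongarch64 binary produced by the loongarch64-linux-user QEMU build. Linking statically avoids having to point QEMU at the toolchain's sysroot with its -L option.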

Issues

When I run make, I get the following error message.
I don't get this error when I compile natively on LoongArch using build.sh.

[ 18%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/threading.cpp.o
In file included from /home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/threading.cpp:17:
/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/mlasi.h:1277:9: error: ‘__m128’ does not name a type; did you mean ‘__int128’?
 1277 | typedef __m128 MLAS_FLOAT32X4;
      |         ^~~~~~
      |         __int128
/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/mlasi.h:1278:9: error: ‘__m128i’ does not name a type
 1278 | typedef __m128i MLAS_INT32X4;
      |         ^~~~~~~
/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/mlasi.h:1285:1: error: ‘MLAS_INT32X4’ does not name a type
 1285 | MLAS_INT32X4
      | ^~~~~~~~~~~~
/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/mlasi.h:1300:1: error: ‘MLAS_INT32X4’ does not name a type
 1300 | MLAS_INT32X4
      | ^~~~~~~~~~~~

I noticed that the -mlsx (LoongArch SIMD) compiler flag is missing from the command line used to compile platform.cpp, shown below.
I'm not sure whether my cmake command parameters are correct.
Perhaps CMAKE_SYSTEM_PROCESSOR is not set correctly.

/home/yala/work/plugins/la-cross-tools/bin/loongarch64-unknown-linux-gnu-g++ -DCPUINFO_SUPPORTED_PLATFORM=0 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DNSYNC_ATOMIC_CPP11 -DORT_ENABLE_STREAM -DORT_NO_RTTI -DPLATFORM_POSIX -D_GNU_SOURCE -I/home/yala/work/plugins/onnxruntime/build-cross/_deps/utf8_range-src -I/home/yala/work/plugins/onnxruntime/include/onnxruntime -I/home/yala/work/plugins/onnxruntime/include/onnxruntime/core/session -I/home/yala/work/plugins/onnxruntime/build-cross/_deps/pytorch_cpuinfo-src/include -I/home/yala/work/plugins/onnxruntime/build-cross/_deps/google_nsync-src/public -I/home/yala/work/plugins/onnxruntime/build-cross -I/home/yala/work/plugins/onnxruntime/onnxruntime -I/home/yala/work/plugins/onnxruntime/build-cross/_deps/abseil_cpp-src -I/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/inc -I/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib -I/home/yala/work/plugins/onnxruntime/build-cross/_deps/gsl-src/include -ffunction-sections -fdata-sections -Wno-restrict -DCPUINFO_SUPPORTED -fPIC -fno-rtti -Wall -Wextra -Wno-deprecated-copy -Wno-nonnull-compare -Werror -MD -MT CMakeFiles/onnxruntime_mlas.dir/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/platform.cpp.o -MF CMakeFiles/onnxruntime_mlas.dir/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/platform.cpp.o.d -o CMakeFiles/onnxruntime_mlas.dir/home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/platform.cpp.o -c /home/yala/work/plugins/onnxruntime/onnxruntime/core/mlas/lib/platform.cpp
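The errors above are consistent with the LSX flag being absent: our reading is that GCC's lsxintrin.h only defines the __m128 / __m128i vector types when LSX is enabled (__loongarch_sx), so the LoongArch typedefs in mlasi.h have nothing to refer to. A hypothetical guard like the following, which is not part of MLAS or this PR, would turn that cascade into one readable error and expose the enabled vector width; kLoongArchSimdBits and FloatsPerVector are illustrative names.

```cpp
#include <cassert>

// Hypothetical fail-fast guard (not part of MLAS): on a LoongArch target,
// stop with a readable message instead of "'__m128' does not name a type"
// when -mlsx is missing from the compile line.
#if defined(__loongarch64) && !defined(__loongarch_sx)
#error "LoongArch build without LSX: add -mlsx (and -mlasx for 256-bit kernels)"
#endif

// Widest LoongArch SIMD extension enabled for this translation unit, in bits;
// 0 on any other architecture.
#if defined(__loongarch_asx)
constexpr int kLoongArchSimdBits = 256;
#elif defined(__loongarch_sx)
constexpr int kLoongArchSimdBits = 128;
#else
constexpr int kLoongArchSimdBits = 0;
#endif

// Number of 32-bit floats one vector register holds (0 without LoongArch SIMD).
constexpr int FloatsPerVector() { return kLoongArchSimdBits / 32; }
```

The actual fix presumably belongs in the build system (passing -mlsx/-mlasx to the MLAS sources when targeting loongarch64); a guard like this only makes a misconfigured build fail earlier and more clearly.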

@snnn
Member

snnn commented Oct 18, 2023

Nice! Would you please confirm that the cross-compile tool you showed me is publicly available? Is it an official package from the Fedora project?

@junchao-loongson
Contributor Author

The cross-compile toolchain is built from the GCC repository source (commit cead92b7fc4d7a545dcf2f02397120e3c9afe1a3).
I just compiled it into binaries and provided this temporary link.

@junchao-loongson
Contributor Author

We've published this toolchain in Loongson's official repository:

https://github.com/loongson/build-tools/releases/download/2023.08.08/x86_64-cross-tools-loongarch64-gcc-libc.tar.xz

@snnn
Member

snnn commented Oct 19, 2023

I will go ahead and merge this PR.

@snnn
Member

snnn commented Oct 19, 2023

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline

@snnn
Member

snnn commented Oct 19, 2023

/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 10 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 7 pipeline(s).

@snnn
Member

snnn commented Oct 19, 2023

Build error:
onnxruntime/core/mlas/lib/mlasi.h:1800:48: error: ‘MALS_INT32X4’ was not declared in this scope; did you mean ‘MLAS_INT32X4’?

@junchao-loongson
Contributor Author

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline

@azure-pipelines

Commenter does not have sufficient privileges for PR 17937 in repo microsoft/onnxruntime

@junchao-loongson
Contributor Author

We fixed this issue, and also fixed some formatting issues reported by clang-format.

@snnn
Member

snnn commented Oct 20, 2023

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline, Windows CPU CI Pipeline

@snnn
Member

snnn commented Oct 20, 2023

/azp run Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 7 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 10 pipeline(s).

@snnn
Member

snnn commented Oct 23, 2023

/azp run Linux CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@junchao-loongson
Contributor Author

hello~
What else do we need to do to get this patch merged?

@snnn
Member

snnn commented Nov 7, 2023

Nothing else, thanks. I've sent this PR to @yufenglee for review.

@lixing-star

@snnn, how is the code review progressing? Thanks.

snnn merged commit 4abec97 into microsoft:main Dec 7, 2023
57 of 60 checks passed
@snnn
Member

snnn commented Dec 7, 2023

I will set up a CI build pipeline for this.

@lixing-star

Thanks. We will also rebuild the main branch to check our code.

5 participants