Skip to content

Commit

Permalink
Merge
Browse files Browse the repository at this point in the history
  • Loading branch information
skottmckay committed Jul 24, 2024
2 parents 5a45235 + 86cedc6 commit fdd23e6
Show file tree
Hide file tree
Showing 277 changed files with 2,480 additions and 905 deletions.
2 changes: 1 addition & 1 deletion .gitattributes
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# This sets the default behaviour, overriding core.autocrlf
# This sets the default behavior, overriding core.autocrlf
* text=auto

# All source files should have unix line-endings in the repository,
Expand Down
2 changes: 1 addition & 1 deletion .pipelines/nuget_config/x64/packages.config
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="python" version="3.9.7" targetFramework="native" />
<package id="Microsoft.AI.DirectML" version="1.14.1" targetFramework="native" />
<package id="Microsoft.AI.DirectML" version="1.15.0" targetFramework="native" />
<package id="Microsoft.Windows.CppWinRT" version="2.0.201201.7" targetFramework="native" />
</packages>
2 changes: 1 addition & 1 deletion .pipelines/nuget_config/x86/packages.config
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="pythonx86" version="3.9.7" targetFramework="native" />
<package id="Microsoft.AI.DirectML" version="1.14.1" targetFramework="native" />
<package id="Microsoft.AI.DirectML" version="1.15.0" targetFramework="native" />
<package id="Microsoft.Windows.CppWinRT" version="2.0.201201.7" targetFramework="native" />
</packages>
2 changes: 1 addition & 1 deletion ThirdPartyNotices.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4820,7 +4820,7 @@ SOFTWARE.

----------------------------------------------------------------------------

This is the MIT/Expat Licence. For more information see:
This is the MIT/Expat License. For more information see:

1. http://www.opensource.org/licenses/mit-license.php

Expand Down
10 changes: 5 additions & 5 deletions cgmanifests/generated/cgmanifest.json
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@
"component": {
"type": "git",
"git": {
"commitHash": "344117638c8ff7e239044fd0fa7085839fc03021",
"commitHash": "a6ad7fbbdc2e14fab82bb8a6d27760d700198cbf",
"repositoryUrl": "https://github.com/google/benchmark.git"
},
"comments": "google_benchmark"
Expand All @@ -136,7 +136,7 @@
"component": {
"type": "git",
"git": {
"commitHash": "530d5c8c84abd2a46f38583ee817743c9b3a42b4",
"commitHash": "e39786088138f2749d64e9e90e0f9902daa77c40",
"repositoryUrl": "https://github.com/google/googletest.git"
},
"comments": "googletest"
Expand Down Expand Up @@ -256,7 +256,7 @@
"component": {
"type": "git",
"git": {
"commitHash": "3e9dfa2866941655c56877882565e7577de6fc7b",
"commitHash": "941f45bcb51457884fa1afd6e24a67377d70f75c",
"repositoryUrl": "https://github.com/pybind/pybind11.git"
},
"comments": "pybind11"
Expand All @@ -266,7 +266,7 @@
"component": {
"type": "git",
"git": {
"commitHash": "959002f82d7962a473d8bf301845f2af720e0aa4",
"commitHash": "ca678952a9a8eaa6de112d154e8e104b22f9ab3f",
"repositoryUrl": "https://github.com/pytorch/cpuinfo.git"
},
"comments": "pytorch_cpuinfo"
Expand All @@ -276,7 +276,7 @@
"component": {
"type": "git",
"git": {
"commitHash": "2b354c6ad0d0479dcff68dab23fb0d1143a482c2",
"commitHash": "6dcd83d60f7944926bfd308cc13979fc53dd69ca",
"repositoryUrl": "https://github.com/google/re2.git"
},
"comments": "re2"
Expand Down
10 changes: 5 additions & 5 deletions cmake/deps.txt
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,9 @@ eigen;https://gitlab.com/libeigen/eigen/-/archive/e7248b26a1ed53fa030c5c459f7ea0
flatbuffers;https://github.com/google/flatbuffers/archive/refs/tags/v23.5.26.zip;59422c3b5e573dd192fead2834d25951f1c1670c
fp16;https://github.com/Maratyszcza/FP16/archive/0a92994d729ff76a58f692d3028ca1b64b145d91.zip;b985f6985a05a1c03ff1bb71190f66d8f98a1494
fxdiv;https://github.com/Maratyszcza/FXdiv/archive/63058eff77e11aa15bf531df5dd34395ec3017c8.zip;a5658f4036402dbca7cebee32be57fb8149811e1
google_benchmark;https://github.com/google/benchmark/archive/refs/tags/v1.8.3.zip;bf9870756ee3f8d2d3b346b24ee3600a41c74d3d
google_benchmark;https://github.com/google/benchmark/archive/refs/tags/v1.8.5.zip;cd47d3d272faf353600c8cc2fdec2b52d6f69177
google_nsync;https://github.com/google/nsync/archive/refs/tags/1.26.0.zip;5e7c00ef6bf5b787386fc040067903ec774e2752
googletest;https://github.com/google/googletest/archive/530d5c8c84abd2a46f38583ee817743c9b3a42b4.zip;5e3a61db2aa975cfd0f97ba92c818744e7fa7034
googletest;https://github.com/google/googletest/archive/refs/tags/v1.15.0.zip;9d2d0af8d77ac726ea55d44a8fa727ec98311349
googlexnnpack;https://github.com/google/XNNPACK/archive/0da379fc4808f9601faef392352018c741c0f297.zip;663883491e380b628e0a5b162b5f2658032fae73
json;https://github.com/nlohmann/json/archive/refs/tags/v3.10.5.zip;f257f8dc27c5b8c085dc887b40cddd18ae1f725c
microsoft_gsl;https://github.com/microsoft/GSL/archive/refs/tags/v4.0.0.zip;cf368104cd22a87b4dd0c80228919bb2df3e2a14
Expand All @@ -48,9 +48,9 @@ protoc_linux_aarch64;https://github.com/protocolbuffers/protobuf/releases/downlo
protoc_mac_universal;https://github.com/protocolbuffers/protobuf/releases/download/v21.12/protoc-21.12-osx-universal_binary.zip;23710c3d1c2036d8d65a6a22234372fa2d7af9ef
psimd;https://github.com/Maratyszcza/psimd/archive/072586a71b55b7f8c584153d223e95687148a900.zip;1f5454b01f06f9656b77e4a5e2e31d7422487013
pthreadpool;https://github.com/Maratyszcza/pthreadpool/archive/4fe0e1e183925bf8cfa6aae24237e724a96479b8.zip;07a0aa91dd9bf86f31b95497e00f31d8a261a4bd
pybind11;https://github.com/pybind/pybind11/archive/refs/tags/v2.12.0.zip;8482f57ed55c7b100672815a311d5450858723fb
pytorch_cpuinfo;https://github.com/pytorch/cpuinfo/archive/959002f82d7962a473d8bf301845f2af720e0aa4.zip;85da3caa60eb2b148613b443fbc2bfdc30689965
re2;https://github.com/google/re2/archive/refs/tags/2024-05-01.tar.gz;206cfee5ee0b4c6844680ba66275e9e8faa77405
pybind11;https://github.com/pybind/pybind11/archive/refs/tags/v2.13.1.zip;9255d5c8568debcc329dd42ed8f410ee139ac7b1
pytorch_cpuinfo;https://github.com/pytorch/cpuinfo/archive/ca678952a9a8eaa6de112d154e8e104b22f9ab3f.zip;138bf57d2a110935330d1048dce6d7b82d17d377
re2;https://github.com/google/re2/archive/refs/tags/2024-07-02.zip;646e1728269cde7fcef990bf4a8e87b047882e88
safeint;https://github.com/dcleblanc/SafeInt/archive/refs/tags/3.0.28.zip;23f252040ff6cb9f1fd18575b32fa8fb5928daac
tensorboard;https://github.com/tensorflow/tensorboard/archive/373eb09e4c5d2b3cc2493f0949dc4be6b6a45e81.zip;67b833913605a4f3f499894ab11528a702c2b381
cutlass;https://github.com/NVIDIA/cutlass/archive/refs/tags/v3.5.0.zip;ae038931b9fc2c416c17d9cda91d9706b343f56d
Expand Down
114 changes: 82 additions & 32 deletions cmake/external/abseil-cpp.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -50,46 +50,96 @@ endif()
# TODO: since multiple ORT's dependencies depend on Abseil, the list below would vary from version to version.
# We'd better to not manually manage the list.
set(ABSEIL_LIBS
absl::city
absl::absl_log
absl::log_internal_log_impl
absl::log_internal_strip
absl::log_internal_message
absl::log_internal_format
absl::synchronization
absl::str_format
absl::flags
absl::flat_hash_map
absl::flat_hash_set
absl::log_internal_globals
absl::kernel_timeout_internal
absl::str_format_internal
absl::hash
absl::inlined_vector
absl::low_level_hash
absl::node_hash_map
absl::node_hash_set
absl::log_internal_append_truncated
absl::absl_vlog_is_on
absl::flags_commandlineflag
absl::time
absl::symbolize
absl::graphcycles_internal
absl::log_internal_conditions
absl::strings
absl::malloc_internal
absl::demangle_internal
absl::optional
absl::raw_hash_set
absl::stacktrace
absl::base
absl::demangle_rust
absl::bad_optional_access
absl::strings_internal
absl::debugging_internal
absl::int128
absl::spinlock_wait
absl::decode_rust_punycode
absl::raw_logging_internal
absl::str_format
absl::str_format_internal
absl::flat_hash_set
absl::flat_hash_map
absl::node_hash_map
absl::node_hash_set
absl::compare
absl::base_internal
absl::nullability
absl::bounded_utf8_length_sequence
absl::log_severity
absl::type_traits
absl::atomic_hook
absl::bits
absl::fixed_array
absl::flags_commandlineflag_internal
absl::hash_container_defaults
absl::numeric_representation
absl::utility
absl::type_traits
absl::string_view
absl::node_slot_policy
absl::core_headers
absl::nullability
absl::dynamic_annotations
absl::utf8_for_code_point
absl::errno_saver
absl::absl_check
absl::hash_function_defaults
absl::function_ref
absl::city
absl::low_level_hash
absl::fixed_array
absl::variant
absl::meta
absl::log_internal_voidify
absl::log_sink
absl::log_internal_log_sink_set
absl::log_sink_registry
absl::log_entry
absl::log_globals
absl::log_internal_nullguard
absl::examine_stack
absl::inlined_vector
absl::log_internal_proto
absl::strerror
absl::log_internal_config
absl::raw_hash_map
absl::raw_hash_set
absl::container_memory
absl::algorithm_container
absl::span
absl::config
absl::synchronization
absl::base
absl::log_internal_nullstream
absl::vlog_config_internal
absl::flags_reflection
absl::flags_internal
absl::flags_config
absl::fast_type_id
absl::utility
absl::time_zone
absl::civil_time
absl::debugging_internal
absl::demangle_internal
absl::graphcycles_internal
absl::int128
absl::kernel_timeout_internal
absl::log_severity
absl::malloc_internal
absl::spinlock_wait
absl::stacktrace
absl::string_view
absl::strings
absl::strings_internal
absl::symbolize
absl::throw_delegate
absl::time
absl::time_zone)
absl::memory
absl::charset
absl::endian
absl::config)
2 changes: 1 addition & 1 deletion cmake/external/dml.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ if (NOT onnxruntime_USE_CUSTOM_DIRECTML)
set(NUGET_CONFIG ${PROJECT_SOURCE_DIR}/../NuGet.config)
set(PACKAGES_CONFIG ${PROJECT_SOURCE_DIR}/../packages.config)
get_filename_component(PACKAGES_DIR ${CMAKE_CURRENT_BINARY_DIR}/../packages ABSOLUTE)
set(DML_PACKAGE_DIR ${PACKAGES_DIR}/Microsoft.AI.DirectML.1.14.1)
set(DML_PACKAGE_DIR ${PACKAGES_DIR}/Microsoft.AI.DirectML.1.15.0)

# Restore nuget packages, which will pull down the DirectML redist package.
add_custom_command(
Expand Down
2 changes: 1 addition & 1 deletion cmake/onnxruntime.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@ endif()

if(CMAKE_SYSTEM_NAME STREQUAL "Android" AND onnxruntime_MINIMAL_BUILD)
# target onnxruntime is a shared library, the dummy __cxa_demangle is only attach to it to avoid
# affecting downstream ort library users with the behaviour of dummy __cxa_demangle. So the dummy
# affecting downstream ort library users with the behavior of dummy __cxa_demangle. So the dummy
# __cxa_demangle must not expose to libonnxruntime_common.a. It works as when the linker is
# creating the DSO, our dummy __cxa_demangle always comes before libc++abi.a so the
# __cxa_demangle in libc++abi.a is discarded, thus, huge binary size reduction.
Expand Down
8 changes: 6 additions & 2 deletions cmake/onnxruntime_python.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -97,8 +97,12 @@ endif()

onnxruntime_add_include_to_target(onnxruntime_pybind11_state Python::Module Python::NumPy)
target_include_directories(onnxruntime_pybind11_state PRIVATE ${ONNXRUNTIME_ROOT} ${pybind11_INCLUDE_DIRS})
if(onnxruntime_USE_CUDA AND onnxruntime_CUDNN_HOME)
target_include_directories(onnxruntime_pybind11_state PRIVATE ${onnxruntime_CUDNN_HOME}/include)
if(onnxruntime_USE_CUDA)
target_include_directories(onnxruntime_pybind11_state PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
# cudnn_home is optional for Window when cuda and cudnn are installed in the same directory.
if(onnxruntime_CUDNN_HOME)
target_include_directories(onnxruntime_pybind11_state PRIVATE ${onnxruntime_CUDNN_HOME}/include)
endif()
endif()
if(onnxruntime_USE_CANN)
target_include_directories(onnxruntime_pybind11_state PRIVATE ${onnxruntime_CANN_HOME}/include)
Expand Down
2 changes: 1 addition & 1 deletion cmake/patches/composable_kernel/Fix_Clang_Build.patch
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ index c23746e7f..bc326c8b5 100644
find_package(HIP REQUIRED)
# Override HIP version in config.h, if necessary.
@@ -269,12 +248,6 @@ if( DEFINED CK_OVERRIDE_HIP_VERSION_PATCH )
message(STATUS "CK_HIP_VERSION_PATCH overriden with ${CK_OVERRIDE_HIP_VERSION_PATCH}")
message(STATUS "CK_HIP_VERSION_PATCH overridden with ${CK_OVERRIDE_HIP_VERSION_PATCH}")
endif()
message(STATUS "Build with HIP ${HIP_VERSION}")
-link_libraries(hip::device)
Expand Down
Original file line number Diff line number Diff line change
@@ -1,22 +1,22 @@
diff --git a/include/cpuinfo.h b/include/cpuinfo.h
index c46b65e..8b83a64 100644
index 03f2776..eaf6497 100644
--- a/include/cpuinfo.h
+++ b/include/cpuinfo.h
@@ -18,7 +18,7 @@
#define CPUINFO_ARCH_X86 1
#define CPUINFO_ARCH_X86 1
#endif

-#if defined(__x86_64__) || defined(__x86_64) || defined(_M_X64) || defined(_M_AMD64)
+#if defined(__x86_64__) || (defined(_M_X64) && !defined(_M_ARM64EC)) || (defined(_M_AMD64) && !defined(_M_ARM64EC))
#define CPUINFO_ARCH_X86_64 1
+#if defined(__x86_64__) || defined(__x86_64) || (defined(_M_X64) && !defined(_M_ARM64EC)) || (defined(_M_AMD64) && !defined(_M_ARM64EC))
#define CPUINFO_ARCH_X86_64 1
#endif

@@ -26,7 +26,7 @@
#define CPUINFO_ARCH_ARM 1
#define CPUINFO_ARCH_ARM 1
#endif

-#if defined(__aarch64__) || defined(_M_ARM64)
+#if defined(__aarch64__) || defined(_M_ARM64) || defined(_M_ARM64EC)
#define CPUINFO_ARCH_ARM64 1
#define CPUINFO_ARCH_ARM64 1
#endif

Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ Event {{name.0.value}}
Operator {{name.0.value}}
{{/inOperator}}
{{#inEii}}
Explict Interface Implementation {{name.0.value}}
Explicit Interface Implementation {{name.0.value}}
{{/inEii}}
{{#inVariable}}
Variable {{name.0.value}}
Expand Down
4 changes: 2 additions & 2 deletions dockerfiles/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
docker run -it onnxruntime-source
```

The docker file supports both x86_64 and ARM64(aarch64). You may use docker's "--platform" parameter to explictly specify which CPU architecture you want to build. For example:
The docker file supports both x86_64 and ARM64(aarch64). You may use docker's "--platform" parameter to explicitly specify which CPU architecture you want to build. For example:

```bash
docker build --platform linux/arm64/v8 -f Dockerfile.source
Expand Down Expand Up @@ -274,7 +274,7 @@ Note: You may add --use_tensorrt and --tensorrt_home options if you wish to use
Note: Resulting Docker image will have ONNX Runtime installed in /usr, and ONNX Runtime wheel copied to /onnxruntime directory.
Nothing else from ONNX Runtime source tree will be copied/installed to the image.

Note: When running the container you built in Docker, please either use 'nvidia-docker' command instead of 'docker', or use Docker command-line options to make sure NVIDIA runtime will be used and appropiate files mounted from host. Otherwise, CUDA libraries won't be found. You can also [set NVIDIA runtime as default in Docker](https://github.com/dusty-nv/jetson-containers#docker-default-runtime).
Note: When running the container you built in Docker, please either use 'nvidia-docker' command instead of 'docker', or use Docker command-line options to make sure NVIDIA runtime will be used and appropriate files mounted from host. Otherwise, CUDA libraries won't be found. You can also [set NVIDIA runtime as default in Docker](https://github.com/dusty-nv/jetson-containers#docker-default-runtime).

## MIGraphX
**Ubuntu 20.04, ROCm6.0, MIGraphX**
Expand Down
10 changes: 10 additions & 0 deletions docs/ORTModule_Training_Guidelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -304,6 +304,16 @@ A classical usage of disabling the deep copy: when the deep copy before module e
export ORTMODULE_ENABLE_MEM_EFFICIENT_GRAD_MGMT=0 # Disable
```
#### ORTMODULE_ATEN_SDPA_FALLBACK
- **Feature Area**: *ORTMODULE/Optimizations*
- **Description**: By default, this is disabled. This env var can be used for enabling pre-export attention fall back to PyTorch's [_scaled_dot_product_efficient_attention](https://github.com/pytorch/pytorch/blob/c12a4f2e65ad41b739aab1a261e2336b4a79fcfb/aten/src/ATen/native/native_functions.yaml#L14778) ATen kernel for execution when calling torch.nn.functional.scaled_dot_product_attention. NOTE: only use this feature if user model leverages memory efficient attention WITHOUT masking (ie. attn_mask=None). Utilize GPU profiling looks like NVIDIA Nsight Systems to identify if user model leverages memory efficient attention.

```bash
export ORTMODULE_ATEN_SDPA_FALLBACK=1 # ENABLE
unset ORTMODULE_ATEN_SDPA_FALLBACK # DISABLE
```

### 2.2 Memory Optimization

Q: *Want to run a bigger batch size?*
Expand Down
Loading

0 comments on commit fdd23e6

Please sign in to comment.