From 7f2ca040b8b8f3fb67d0a849efc027837ec8548a Mon Sep 17 00:00:00 2001
From: Wei Kang
Date: Fri, 15 Apr 2022 06:52:41 +0800
Subject: [PATCH] update v2.0-pre (#953)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Update doc URL. (#821)
* Support indexing 2-axes RaggedTensor, Support slicing for RaggedTensor (#825)
* Support index 2-axes RaggedTensor, Support slicing for RaggedTensor
* Fix compiling errors
* Fix unit test
* Change RaggedTensor.data to RaggedTensor.values
* Fix style
* Add docs
* Run nightly-cpu when pushing code to nightly-cpu branch
* Prune with max_arcs in IntersectDense (#820)
* Add checking for array constructor
* Prune with max arcs
* Minor fix
* Fix typo
* Fix review comments
* Fix typo
* Release v1.8
* Create a ragged tensor from a regular tensor. (#827)
* Create a ragged tensor from a regular tensor.
* Add tests for creating ragged tensors from regular tensors.
* Add more tests.
* Print ragged tensors in a way like what PyTorch is doing.
* Fix test cases.
* Trigger GitHub actions manually. (#829)
* Run GitHub actions on merging. (#830)
* Support printing ragged tensors in a more compact way. (#831)
* Support printing ragged tensors in a more compact way.
* Disable support for torch 1.3.1
* Fix test failures.
* Add levenshtein alignment (#828)
* Add levenshtein graph
* Construct k2.RaggedTensor in python part
* Fix review comments, return aux_labels in ctc_graph
* Fix tests
* Fix bug of accessing symbols
* Fix bug of accessing symbols
* Change argument name, add levenshtein_distance interface
* Fix test error, add tests for levenshtein_distance
* Fix review comments and add unit test for c++ side
* Update the interface of levenshtein alignment
* Fix review comments
* Release v1.9
* Support a[b[i]] where both a and b are ragged tensors. (#833)
* Display import error solution message on MacOS (#837)
* Fix installation doc. (#841)
* Fix installation doc. Remove Windows support. Will fix it later.
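The levenshtein alignment items above (#828) refer to k2's FSA-based implementation; as a rough illustration of what `levenshtein_distance` computes, here is a plain-Python dynamic-programming sketch. This is illustrative only — k2 computes the distance via graph intersection, not this loop:

```python
def levenshtein_distance(a, b):
    # Classic DP: d[i][j] = edit distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # i deletions to turn a[:i] into ""
    for j in range(len(b) + 1):
        d[0][j] = j  # j insertions to turn "" into b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(a)][len(b)]
```

The FSA formulation produces the same distances but additionally yields the alignment (which symbols were inserted, deleted, or substituted) as a path through the intersected graph.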
* Fix style issues.
* fix typos in the install instructions (#844)
* make cmake adhere to the modernized way of finding packages outside default dirs (#845)
* import torch first in the smoke tests to prevent SEGFAULT (#846)
* Add doc about how to install a CPU version of k2. (#850)
* Add doc about how to install a CPU version of k2.
* Remove property setter of Fsa.labels
* Update Ubuntu version in GitHub CI since 16.04 reaches end-of-life.
* Support PyTorch 1.10. (#851)
* Fix test cases for k2.union() (#853)
* Fix out-of-boundary access (read). (#859)
* Update all the example codes in the docs (#861)
  I have run all the modified codes with the newest version of k2.
* do some changes
* Fix compilation errors with CUB 1.15. (#865)
* Update README. (#873)
* Fix typos.
* Fix ctc graph (make aux_labels of final arcs -1) (#877)
* Fix LICENSE location to k2 folder (#880)
* Release v1.11. (#881)
  It contains bugfixes.
* Update documentation for hash.h (#887)
* Typo fix
* Wrap MonotonicLowerBound (#883)
* Add unit tests
* Support int64; update documents
* Remove extra commas after 'TOPSORTED' property and fix RaggedTensor constructor parameter 'byte_offset' out-of-range bug. (#892)
  Co-authored-by: gzchenduisheng
* Fix small typos (#896)
* Fix k2.ragged.create_ragged_shape2 (#901)
  Before the fix, we have to specify both `row_splits` and `row_ids` while calling `k2.create_ragged_shape2` even if one of them is `None`. After this fix, we only need to specify one of them.
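The `create_ragged_shape2` fix above works because either of `row_splits` or `row_ids` fully determines the other (given the number of rows and elements), so requiring both was redundant. A minimal pure-Python sketch of that equivalence — illustrative only, not k2 code:

```python
def row_splits_to_row_ids(row_splits):
    # row_splits[r]..row_splits[r+1] delimit row r, so element i
    # belongs to the row whose span contains i.
    row_ids = []
    for row in range(len(row_splits) - 1):
        row_ids.extend([row] * (row_splits[row + 1] - row_splits[row]))
    return row_ids

def row_ids_to_row_splits(row_ids, num_rows):
    # Count elements per row, then take an exclusive prefix sum.
    splits = [0] * (num_rows + 1)
    for r in row_ids:
        splits[r + 1] += 1
    for i in range(num_rows):
        splits[i + 1] += splits[i]
    return splits
```

For example, `row_splits = [0, 2, 3, 3, 6]` describes rows of sizes 2, 1, 0, 3, and converting it to `row_ids` and back is lossless — which is why specifying one of them is enough.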
* Add rnnt loss (#891)
* Add cpp code of mutual information
* mutual information working
* Add rnnt loss
* Add pruned rnnt loss
* Minor Fixes
* Minor fixes & fix code style
* Fix cpp style
* Fix code style
* Fix s_begin values in padding positions
* Fix bugs related to boundary; Fix s_begin padding value; Add more tests
* Minor fixes
* Fix comments
* Add boundary to pruned loss tests
* Use a more efficient way to fix boundaries (#906)
* Release v1.12 (#907)
* Change the sign of the rnnt_loss and add reduction argument (#911)
* Add right boundary constraints for s_begin
* Minor fixes to the interface of rnnt_loss to make it return positive value
* Fix comments
* Release a new version
* Minor fixes
* Minor fixes to the docs
* Fix building doc. (#908)
* Fix building doc.
* Minor fixes.
* Minor fixes.
* Fix building doc (#912)
* Fix building doc
* Fix flake8
* Support torch 1.10.x (#914)
* Support torch 1.10.x
* Fix installing PyTorch.
* Update INSTALL.rst (#915)
  Setting a few additional env variables to enable compilation from source *with CUDA GPU computation support enabled*
* Fix torch/cuda/python versions in the doc. (#918)
* Fix torch/cuda/python versions in the doc.
* Minor fixes.
* Fix building for CUDA 11.6 (#917)
* Fix building for CUDA 11.6
* Minor fixes.
* Implement Unstack (#920)
* Implement unstack
* Remove code that does not relate to this PR
* Remove for loop on output dim; add Unstack ragged
* Add more docs
* Fix comments
* Fix docs & unit tests
* SubsetRagged & PruneRagged (#919)
* Extend interface of SubsampleRagged.
* Add interface for pruning ragged tensor.
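The `reduction` argument added to `rnnt_loss` in #911 follows the usual PyTorch loss convention: per-utterance losses are returned as-is, summed, or averaged over the batch. A hypothetical sketch of that convention (not the actual k2 implementation, which operates on tensors):

```python
def apply_reduction(per_utt_losses, reduction="mean"):
    # Mirrors the common PyTorch convention: 'none' | 'sum' | 'mean'.
    if reduction == "none":
        return per_utt_losses          # one loss value per utterance
    total = sum(per_utt_losses)
    if reduction == "sum":
        return total                   # summed over the batch
    if reduction == "mean":
        return total / len(per_utt_losses)  # averaged over the batch
    raise ValueError(f"unsupported reduction: {reduction!r}")
```

Together with the sign change in the same PR, this lets the loss be returned as a positive value ready to minimize directly.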
* Draft of new RNN-T decoding method
* Implements SubsampleRaggedShape
* Implements PruneRagged
* Rename subsample -> subset
* Minor fixes
* Fix comments
  Co-authored-by: Daniel Povey
* Add Hash64 (#895)
* Add hash64
* Fix tests
* Resize hash64
* Fix comments
* fix typo
* Modified rnnt (#902)
* Add modified mutual_information_recursion
* Add modified rnnt loss
* Use a more efficient way to fix boundaries
* Fix modified pruned rnnt loss
* Fix the s_begin constraints of pruned loss for the modified version of the transducer
* Fix Stack (#925)
* return the correct layer
* unskip the test
* Fix 'TypeError' of rnnt_loss_pruned function. (#924)
* Fix 'TypeError' of rnnt_loss_simple function.
  Fix the 'TypeError' exception raised when calling rnnt_loss_simple(..., return_grad=False) at validation steps.
* Fix 'MutualInformationRecursionFunction.forward()' return type check error for pytorch < 1.10.x
* Modify return type.
* Add documents about class MutualInformationRecursionFunction.
* Formatted code style.
* Fix rnnt_loss_smoothed return type.
  Co-authored-by: gzchenduisheng
* Support torch 1.11.0 and CUDA 11.5 (#931)
* Support torch 1.11.0 and CUDA 11.5
* Implement Rnnt decoding (#926)
* first working draft of rnnt decoding
* FormatOutput works...
* Different num frames for FormatOutput works
* Update docs
* Fix comments, break advance into several stages, add more docs
* Add python wrapper
* Add more docs
* Minor fixes
* Fix comments
* fix building docs (#933)
* Release v1.14
* Remove unused DiscountedCumSum. (#936)
* Fix compiler warnings. (#937)
* Fix compiler warnings.
* Minor fixes for RNN-T decoding. (#938)
* Minor fixes for RNN-T decoding.
* Removes arcs with label 0 from the TrivialGraph. (#939)
* Implement linear_fsa_with_self_loops. (#940)
* Implement linear_fsa_with_self_loops.
* Fix the pruning with max-states (#941)
* Rnnt allow different encoder/decoder dims (#945)
* Allow different encoder and decoder dim in rnnt_pruning
* Bug fixes
* Support building k2 on Windows (#946)
* Fix nightly windows CPU build (#948)
* Fix nightly building k2 for windows.
* Run nightly build only if there are new commits.
* Check the versions of PyTorch and CUDA at the import time. (#949)
* Check the versions of PyTorch and CUDA at the import time.
* More straightforward message when CUDA support is missing (#950)
* Implement ArrayOfRagged (#927)
* Implement ArrayOfRagged
* Fix issues and pass tests
* fix style
* change a few statements of functions and move the definition of template Array1OfRagged to a header file
* add offsets test code
* Fix precision (#951)
* Fix precision
* Use different pow versions for windows and *nix
* Use int64_t pow
* Minor fixes

Co-authored-by: Fangjun Kuang
Co-authored-by: Piotr Żelasko
Co-authored-by: Jan "yenda" Trmal
Co-authored-by: Mingshuang Luo <37799481+luomingshuang@users.noreply.github.com>
Co-authored-by: Ludwig Kürzinger
Co-authored-by: Daniel Povey
Co-authored-by: drawfish
Co-authored-by: gzchenduisheng
Co-authored-by: alexei-v-ivanov
Co-authored-by: Wang, Guanbo
Co-authored-by: Nickolay V. Shmyrev
Co-authored-by: LvHang
---
 .flake8                                       |   1 +
 .github/workflows/build-conda-cpu-macos.yml   | 117 ++++++++
 ...nda_cpu.yml => build-conda-cpu-ubuntu.yml} |  89 ++----
 ...-conda.yml => build-conda-cpu-windows.yml} |  42 +--
 .github/workflows/build-conda-cuda-ubuntu.yml | 127 ++++++++
 .github/workflows/build-cpu-macos.yml         | 123 ++++++++
 .../{build-cpu.yml => build-cpu-ubuntu.yml}   | 100 +++----
 .../{windows.yml => build-cpu-windows.yml}    |  69 +++--
 .github/workflows/build-cuda-ubuntu.yml       | 147 ++++++++++
 .github/workflows/build.yml                   | 275 ------------------
 .github/workflows/build_conda.yml             | 259 -----------------
 .github/workflows/nightly-cpu-macos.yml       | 129 ++++++++
 ...nightly-cpu.yml => nightly-cpu-ubuntu.yml} | 110 +++----
 ...ly-windows.yml => nightly-cpu-windows.yml} | 115 +++++---
 .../{nightly.yml => nightly-cuda-ubuntu.yml}  |  27 +-
 .github/workflows/run-tests.yml               |   8 +-
 ...eel-cpu-stable.yml => wheel-cpu-macos.yml} |   8 +-
 .../{wheel-cpu.yml => wheel-cpu-windows.yml}  |  31 +-
 ...wheel-stable.yml => wheel-cuda-ubuntu.yml} |   9 +-
 .gitignore                                    |   1 +
 CMakeLists.txt                                | 100 +++++--
 cmake/moderngpu.cmake                         |   6 +-
 docs/source/installation/conda.rst            |   2 +-
 docs/source/installation/for_developers.rst   |   4 +
 docs/source/installation/from_source.rst      |   6 +-
 docs/source/installation/images/README.md     |   2 +-
 .../images/torch_ge_1.6.0-green.svg           |   1 +
 docs/source/installation/index.rst            |   6 +-
 docs/source/installation/pip.rst              |   2 +-
 k2/csrc/CMakeLists.txt                        |  37 ++-
 k2/csrc/array_of_ragged.cu                    | 104 ++++++-
 k2/csrc/array_of_ragged.h                     | 119 +++++---
 k2/csrc/array_of_ragged_test.cu               |  34 +++
 k2/csrc/benchmark/CMakeLists.txt              |   1 +
 k2/csrc/fsa.h                                 |   2 +-
 k2/csrc/host/CMakeLists.txt                   |  19 +-
 k2/csrc/log.h                                 |  45 +--
 k2/csrc/log_test.cu                           |   4 +
 k2/csrc/macros_test.cu                        |   4 +-
 k2/csrc/math.h                                |  24 +-
 k2/csrc/ragged_ops.cu                         |  58 +++-
 k2/csrc/ragged_test.cu                        |  11 -
 k2/csrc/rand_test.cu                          |   2 +-
 k2/csrc/rm_epsilon.cu                         |   8 +-
 k2/csrc/rnnt_decode.cu                        |   6 +-
 k2/csrc/tensor_ops.cu                         |  57 ++--
 k2/csrc/tensor_ops_test.cu                    |   6 +-
 k2/csrc/test_utils.h                          |   7 +-
 k2/csrc/version.h.in                          |   6 +-
 k2/python/csrc/CMakeLists.txt                 |  13 +-
 k2/python/csrc/torch.h                        |  32 --
 k2/python/csrc/torch/fsa.cu                   |   4 +-
 k2/python/csrc/torch/fsa_algo.cu              |  85 ++----
 k2/python/csrc/torch/ragged_ops.cu            |   7 +-
 k2/python/csrc/torch/v2/any.cu                |  64 ++--
 k2/python/csrc/torch/v2/ragged_shape.cu       |  10 +-
 k2/python/host/k2host/fsa.py                  |   4 +-
 k2/python/k2/__init__.py                      |  21 ++
 k2/python/k2/rnnt_decode.py                   |   2 +-
 k2/python/k2/rnnt_loss.py                     |  30 +-
 k2/python/k2/torch_version.py.in              |  17 ++
 .../tests/linear_fsa_with_self_loops_test.py  |   2 +-
 k2/python/tests/mutual_information_test.py    |   6 +-
 .../github_actions/generate_build_matrix.py   | 111 +++++++
 scripts/github_actions/run-nightly-build.py   |  35 +++
 65 files changed, 1706 insertions(+), 1207 deletions(-)
 create mode 100644 .github/workflows/build-conda-cpu-macos.yml
 rename .github/workflows/{build_conda_cpu.yml => build-conda-cpu-ubuntu.yml} (52%)
 rename .github/workflows/{windows-conda.yml => build-conda-cpu-windows.yml} (77%)
 create mode 100644 .github/workflows/build-conda-cuda-ubuntu.yml
 create mode 100644 .github/workflows/build-cpu-macos.yml
 rename .github/workflows/{build-cpu.yml => build-cpu-ubuntu.yml} (55%)
 rename .github/workflows/{windows.yml => build-cpu-windows.yml} (72%)
 create mode 100644 .github/workflows/build-cuda-ubuntu.yml
 delete mode 100644 .github/workflows/build.yml
 delete mode 100644 .github/workflows/build_conda.yml
 create mode 100644 .github/workflows/nightly-cpu-macos.yml
 rename .github/workflows/{nightly-cpu.yml => nightly-cpu-ubuntu.yml} (54%)
 rename .github/workflows/{nightly-windows.yml => nightly-cpu-windows.yml} (61%)
 rename .github/workflows/{nightly.yml => nightly-cuda-ubuntu.yml} (80%)
 rename .github/workflows/{wheel-cpu-stable.yml => wheel-cpu-macos.yml} (93%)
 rename .github/workflows/{wheel-cpu.yml => wheel-cpu-windows.yml} (63%)
 rename .github/workflows/{wheel-stable.yml => wheel-cuda-ubuntu.yml} (95%)
 create mode 100644 docs/source/installation/images/torch_ge_1.6.0-green.svg
 create mode 100644 k2/python/k2/torch_version.py.in
 create mode 100755 scripts/github_actions/generate_build_matrix.py
 create mode 100755 scripts/github_actions/run-nightly-build.py

diff --git a/.flake8 b/.flake8
index 71dca8579..bba1867c0 100644
--- a/.flake8
+++ b/.flake8
@@ -14,6 +14,7 @@ exclude =
   get_version.py
   build,
   k2/python/host,
+  k2/python/k2/__init__.py,
   k2/python/k2/ctc_loss.py,
   docs
diff --git a/.github/workflows/build-conda-cpu-macos.yml b/.github/workflows/build-conda-cpu-macos.yml
new file mode 100644
index 000000000..623d3a472
--- /dev/null
+++ b/.github/workflows/build-conda-cpu-macos.yml
@@ -0,0 +1,117 @@
+# Copyright 2021 Xiaomi Corp. (author: Fangjun Kuang)
+
+# See ../../LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# refer to https://github.com/actions/starter-workflows/pull/47/files
+
+
+# Note, we have to set
+#
+# export DYLD_LIBRARY_PATH=$CONDA_PREFIX/lib/python3.8/site-packages:$DYLD_LIBRARY_PATH
+#
+# before running `python3 -m k2.version`
+#
+# See https://github.com/openPMD/openPMD-api/issues/593#issuecomment-552690470
+
+
+name: build_conda_cpu_macos
+
+on:
+  push:
+    tags:
+      - '*'
+
+env:
+  K2_BUILD_TYPE: Release
+
+jobs:
+  generate_build_matrix:
+    # see https://github.com/pytorch/pytorch/pull/50633
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Generating build matrix
+        id: set-matrix
+        run: |
+          # outputting for debugging purposes
+          python scripts/github_actions/generate_build_matrix.py
+          MATRIX=$(python scripts/github_actions/generate_build_matrix.py)
+          echo "::set-output name=matrix::${MATRIX}"
+
+  build_conda_cpu_macos:
+    needs: generate_build_matrix
+    runs-on: macos-10.15
+    strategy:
+      fail-fast: false
+      matrix:
+        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
+
+    steps:
+      # refer to https://github.com/actions/checkout
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - uses: conda-incubator/setup-miniconda@v2
+        with:
+          auto-update-conda: true
+          python-version: ${{ matrix.python-version }}
+          activate-environment: k2
+
+      - name: Display Python version
+        shell: bash -l {0}
+        run: |
+          python3 -c "import sys; print(sys.version)"
+          which python3
+
+      - name: Install conda dependencies
+        shell: bash -l {0}
+        run: |
+          conda install -y -q anaconda-client
+          conda install -y -q conda-build
+          conda install -y -q -c pytorch pytorch=${{ matrix.torch }} cpuonly
+
+      - name: Display conda info
+        shell: bash -l {0}
+        run: |
+          which conda
+          conda env list
+          conda info
+
+      - name: Build k2
+        shell: bash -l {0}
+        env:
+          K2_PYTHON_VERSION: ${{ matrix.python-version}}
+          K2_TORCH_VERSION: ${{ matrix.torch }}
+          K2_CONDA_TOKEN: ${{ secrets.K2_CONDA_TOKEN}}
+          K2_IS_GITHUB_ACTIONS: 1
+          K2_IS_FOR_CONDA: 1
+        run: |
+          export K2_BUILD_TYPE=$K2_BUILD_TYPE
+          ./scripts/build_conda_cpu.sh
+
+      - name: Display generated files
+        run: |
+          ls -lh /usr/local/miniconda/envs/k2/conda-bld/osx-64
+
+      - name: Upload generated files
+        uses: actions/upload-artifact@v2
+        with:
+          name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}
+          path: /usr/local/miniconda/envs/k2/conda-bld/osx-64/*.tar.bz2
diff --git a/.github/workflows/build_conda_cpu.yml b/.github/workflows/build-conda-cpu-ubuntu.yml
similarity index 52%
rename from .github/workflows/build_conda_cpu.yml
rename to .github/workflows/build-conda-cpu-ubuntu.yml
index fe3e552ab..72cf5b412 100644
--- a/.github/workflows/build_conda_cpu.yml
+++ b/.github/workflows/build-conda-cpu-ubuntu.yml
@@ -26,68 +26,41 @@
 # See https://github.com/openPMD/openPMD-api/issues/593#issuecomment-552690470
 
-name: build_conda_cpu
+name: build_conda_cpu_ubuntu
 
 on:
   push:
-    branches:
-      - conda-cpu
+    tags:
+      - '*'
 
 env:
   K2_BUILD_TYPE: Release
 
 jobs:
-  build_conda_cpu:
-    runs-on: ${{ matrix.os }}
+  generate_build_matrix:
+    # see https://github.com/pytorch/pytorch/pull/50633
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Generating build matrix
+        id: set-matrix
+        run: |
+          # outputting for debugging purposes
+          python scripts/github_actions/generate_build_matrix.py
+          MATRIX=$(python scripts/github_actions/generate_build_matrix.py)
+          echo "::set-output name=matrix::${MATRIX}"
+
+  build_conda_cpu_ubuntu:
+    needs: generate_build_matrix
+    runs-on: ubuntu-18.04
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-18.04, macos-10.15]
-        python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"]
-        # from https://download.pytorch.org/whl/torch_stable.html
-        #
-        # PyTorch 1.11.x supports 3.7, 3.8, 3.9, 3.10
-        # PyTorch 1.10, 1.9.x, 1.8.x, and 1.7.1 support 3.6, 3.7, 3.8, 3.9
-        # PyTorch 1.7.0, 1.6.0, and 1.5.x support 3.6, 3.7, 3.8
-        #
-        # Other PyTorch versions are not tested
-        #
-        torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0", "1.9.1", "1.10.0", "1.10.1", "1.10.2", "1.11.0"]
-        exclude:
-          - python-version: "3.9" # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0]
-            torch: "1.5.0"
-          - python-version: "3.9"
-            torch: "1.5.1"
-          - python-version: "3.9"
-            torch: "1.6.0"
-          - python-version: "3.9"
-            torch: "1.7.0"
-          - python-version: "3.10" # exclude Python 3.10 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2]
-            torch: "1.5.0"
-          - python-version: "3.10"
-            torch: "1.5.1"
-          - python-version: "3.10"
-            torch: "1.6.0"
-          - python-version: "3.10"
-            torch: "1.7.0"
-          - python-version: "3.10"
-            torch: "1.7.1"
-          - python-version: "3.10"
-            torch: "1.8.0"
-          - python-version: "3.10"
-            torch: "1.8.1"
-          - python-version: "3.10"
-            torch: "1.9.0"
-          - python-version: "3.10"
-            torch: "1.9.1"
-          - python-version: "3.10"
-            torch: "1.10.0"
-          - python-version: "3.10"
-            torch: "1.10.1"
-          - python-version: "3.10"
-            torch: "1.10.2"
-          - python-version: "3.6" # exclude Python 3.6 for [1.11.0]
-            torch: "1.11.0"
+        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
 
     steps:
       # refer to https://github.com/actions/checkout
@@ -134,25 +107,11 @@ jobs:
           ./scripts/build_conda_cpu.sh
 
       - name: Display generated files
-        if: startsWith(matrix.os, 'ubuntu')
         run: |
           ls -lh /usr/share/miniconda/envs/k2/conda-bld/linux-64
 
-      - name: Display generated files
-        if: startsWith(matrix.os, 'macos')
-        run: |
-          ls -lh /usr/local/miniconda/envs/k2/conda-bld/osx-64
-
       - name: Upload generated files
-        if: startsWith(matrix.os, 'ubuntu')
         uses: actions/upload-artifact@v2
         with:
           name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}
           path: /usr/share/miniconda/envs/k2/conda-bld/linux-64/*.tar.bz2
-
-      - name: Upload generated files
-        if: startsWith(matrix.os, 'macos')
-        uses: actions/upload-artifact@v2
-        with:
-          name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}
-          path: /usr/local/miniconda/envs/k2/conda-bld/osx-64/*.tar.bz2
diff --git a/.github/workflows/windows-conda.yml b/.github/workflows/build-conda-cpu-windows.yml
similarity index 77%
rename from .github/workflows/windows-conda.yml
rename to .github/workflows/build-conda-cpu-windows.yml
index 00bab9adc..551c13ce3 100644
--- a/.github/workflows/windows-conda.yml
+++ b/.github/workflows/build-conda-cpu-windows.yml
@@ -15,36 +15,42 @@
 # limitations under the License.
 
-name: build-windows-conda
+name: build_conda_cpu_windows
 
 on:
   push:
-    branches:
-      - conda-win
+    tags:
+      - '*'
 
 env:
   BUILD_TYPE: Release
 
 jobs:
-  build-windows-conda:
+  generate_build_matrix:
+    # see https://github.com/pytorch/pytorch/pull/50633
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Generating build matrix
+        id: set-matrix
+        run: |
+          # outputting for debugging purposes
+          python scripts/github_actions/generate_build_matrix.py
+          MATRIX=$(python scripts/github_actions/generate_build_matrix.py)
+          echo "::set-output name=matrix::${MATRIX}"
+
+  build_conda_cpu_windows:
     # see https://github.com/actions/virtual-environments/blob/win19/20210525.0/images/win/Windows2019-Readme.md
-    runs-on: ${{ matrix.os }}
+    needs: generate_build_matrix
+    runs-on: windows-2019
     strategy:
       fail-fast: false
       matrix:
-        os: [windows-2019]
-        # Python 3.9 is for PyTorch 1.7.1, 1.8.x, 1.9.0,
-        python-version: [3.6, 3.7, 3.8, 3.9]
-        torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0"]
-        exclude:
-          - python-version: 3.9 # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0]
-            torch: "1.5.0"
-          - python-version: 3.9
-            torch: "1.5.1"
-          - python-version: 3.9
-            torch: "1.6.0"
-          - python-version: 3.9
-            torch: "1.7.0"
+        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
 
     steps:
       - uses: actions/checkout@v2
diff --git a/.github/workflows/build-conda-cuda-ubuntu.yml b/.github/workflows/build-conda-cuda-ubuntu.yml
new file mode 100644
index 000000000..fa6dca28e
--- /dev/null
+++ b/.github/workflows/build-conda-cuda-ubuntu.yml
@@ -0,0 +1,127 @@
+# Copyright 2021 Xiaomi Corp. (author: Fangjun Kuang)
+
+# See ../../LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# refer to https://github.com/actions/starter-workflows/pull/47/files
+
+name: build_conda_cuda_ubuntu
+
+on:
+  push:
+    tags:
+      - '*'
+
+env:
+  K2_BUILD_TYPE: Release
+
+jobs:
+  generate_build_matrix:
+    # see https://github.com/pytorch/pytorch/pull/50633
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Generating build matrix
+        id: set-matrix
+        run: |
+          # outputting for debugging purposes
+          python scripts/github_actions/generate_build_matrix.py
+          MATRIX=$(python scripts/github_actions/generate_build_matrix.py --enable-cuda)
+          echo "::set-output name=matrix::${MATRIX}"
+
+  build_conda_cuda_ubuntu:
+    needs: generate_build_matrix
+    runs-on: ubuntu-18.04
+    strategy:
+      fail-fast: false
+      matrix:
+        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
+
+    steps:
+      # refer to https://github.com/actions/checkout
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - name: Install CUDA Toolkit ${{ matrix.cuda }}
+        shell: bash -l {0}
+        env:
+          cuda: ${{ matrix.cuda }}
+        run: |
+          source ./scripts/github_actions/install_cuda.sh
+          echo "CUDA_HOME=${CUDA_HOME}" >> $GITHUB_ENV
+          echo "${CUDA_HOME}/bin" >> $GITHUB_PATH
+          echo "LD_LIBRARY_PATH=${CUDA_HOME}/lib:${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
+
+      - name: Display NVCC version
+        shell: bash -l {0}
+        run: |
+          which nvcc
+          nvcc --version
+
+      - uses: conda-incubator/setup-miniconda@v2
+        with:
+          auto-update-conda: true
+          python-version: ${{ matrix.python-version }}
+          activate-environment: k2
+
+      - name: Display Python version
+        shell: bash -l {0}
+        run: |
+          python3 -c "import sys; print(sys.version)"
+          which python3
+
+      - name: Install conda dependencies
+        shell: bash -l {0}
+        run: |
+          conda install -y -q anaconda-client
+          conda install -y -q conda-build
+          conda install -y -q bs4 requests tqdm
+          conda install -y -q -c pytorch -c conda-forge pytorch=${{ matrix.torch }} cudatoolkit=${{ matrix.cuda }}
+
+      - name: Display conda info
+        shell: bash -l {0}
+        run: |
+          which conda
+          conda env list
+          conda info
+          nproc
+
+      - name: Install git lfs
+        run: |
+          sudo apt-get install -y git-lfs
+
+      - name: Download cudnn 8.0
+        shell: bash -l {0}
+        env:
+          cuda: ${{ matrix.cuda }}
+        run: |
+          ./scripts/github_actions/install_cudnn.sh
+
+      - name: Build k2
+        shell: bash -l {0}
+        env:
+          K2_CUDA_VERSION: ${{ matrix.cuda }}
+          K2_PYTHON_VERSION: ${{ matrix.python-version}}
+          K2_TORCH_VERSION: ${{ matrix.torch }}
+          K2_CONDA_TOKEN: ${{ secrets.K2_CONDA_TOKEN}}
+          K2_IS_GITHUB_ACTIONS: 1
+          K2_IS_FOR_CONDA: 1
+        run: |
+          export K2_BUILD_TYPE=$K2_BUILD_TYPE
+          ./scripts/build_conda.sh
diff --git a/.github/workflows/build-cpu-macos.yml b/.github/workflows/build-cpu-macos.yml
new file mode 100644
index 000000000..392683ba8
--- /dev/null
+++ b/.github/workflows/build-cpu-macos.yml
@@ -0,0 +1,123 @@
+# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)
+
+# See ../../LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# refer to https://github.com/actions/starter-workflows/pull/47/files
+
+name: build-cpu-macos
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+    types: [labeled]
+
+env:
+  BUILD_TYPE: Release
+
+jobs:
+  generate_build_matrix:
+    # see https://github.com/pytorch/pytorch/pull/50633
+    runs-on: macos-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Generating build matrix
+        id: set-matrix
+        run: |
+          # outputting for debugging purposes
+          python scripts/github_actions/generate_build_matrix.py
+          MATRIX=$(python scripts/github_actions/generate_build_matrix.py)
+          echo "::set-output name=matrix::${MATRIX}"
+
+  build-cpu-macos:
+    if: github.event.label.name == 'ready' || github.event_name == 'push'
+    needs: generate_build_matrix
+    runs-on: macos-10.15
+    strategy:
+      fail-fast: false
+      matrix:
+        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
+
+    steps:
+      # refer to https://github.com/actions/checkout
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - uses: szenius/set-timezone@v1.0
+        with:
+          timezoneLinux: "Asia/Shanghai"
+
+      - name: Display date and time
+        run: date
+
+      - name: Display clang version
+        run: |
+          clang --version
+
+      - name: Setup Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Display Python version
+        run: python -c "import sys; print(sys.version)"
+
+      - name: Install PyTorch ${{ matrix.torch }}
+        shell: bash
+        run: |
+          python3 -m pip install -qq --upgrade pip
+          python3 -m pip install -qq wheel twine dataclasses
+          python3 -m pip install -qq torch==${{ matrix.torch }}
+
+          python3 -c "import torch; print('torch version:', torch.__version__)"
+
+      - name: Build k2
+        shell: bash
+        run: |
+          pwd
+          mkdir build
+          cd build
+          cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF ..
+          cat k2/csrc/version.h
+          cat CMakeCache.txt
+
+          make VERBOSE=1 -j2
+
+      - name: Run tests
+        shell: bash
+        run: |
+          cd build
+          ctest --output-on-failure
+
+      - name: Build wheel
+        shell: bash
+        run: |
+          export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF"
+          export K2_MAKE_ARGS="-j2"
+          python3 setup.py bdist_wheel
+          ls -lh dist/
+          ls -lh build/*
+
+      - name: Upload Wheel
+        uses: actions/upload-artifact@v2
+        with:
+          name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-macos-10.15-cpu
+          path: dist/*.whl
diff --git a/.github/workflows/build-cpu.yml b/.github/workflows/build-cpu-ubuntu.yml
similarity index 55%
rename from .github/workflows/build-cpu.yml
rename to .github/workflows/build-cpu-ubuntu.yml
index 2bb539ff9..3cd7ec443 100644
--- a/.github/workflows/build-cpu.yml
+++ b/.github/workflows/build-cpu-ubuntu.yml
@@ -16,13 +16,12 @@
 
 # refer to https://github.com/actions/starter-workflows/pull/47/files
 
-name: build-cpu
+name: build-cpu-ubuntu
 
 on:
   push:
     branches:
       - master
-      - v2.0-pre
   pull_request:
     types: [labeled]
 
@@ -30,52 +29,31 @@ env:
   BUILD_TYPE: Release
 
 jobs:
-  build-cpu:
+  generate_build_matrix:
+    # see https://github.com/pytorch/pytorch/pull/50633
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Generating build matrix
+        id: set-matrix
+        run: |
+          # outputting for debugging purposes
+          python scripts/github_actions/generate_build_matrix.py
+          MATRIX=$(python scripts/github_actions/generate_build_matrix.py)
+          echo "::set-output name=matrix::${MATRIX}"
+
+  build-cpu-ubuntu:
     if: github.event.label.name == 'ready' || github.event_name == 'push'
-    runs-on: ${{ matrix.os }}
+    needs: generate_build_matrix
+    runs-on: ubuntu-18.04
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-18.04, macos-10.15]
-        torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0", "1.9.1", "1.10.0", "1.10.1", "1.10.2", "1.11.0"]
-        # Python 3.9 is for PyTorch 1.7.1, 1.8.x, 1.9.x, 1.10.x, 1.11.x
-        # Python 3.10 is for PyTorch 1.11.x
-        python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"]
-        exclude:
-          - python-version: "3.10" # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2]
-            torch: "1.5.0"
-          - python-version: "3.10"
-            torch: "1.5.1"
-          - python-version: "3.10"
-            torch: "1.6.0"
-          - python-version: "3.10"
-            torch: "1.7.0"
-          - python-version: "3.10"
-            torch: "1.7.1"
-          - python-version: "3.10"
-            torch: "1.8.0"
-          - python-version: "3.10"
-            torch: "1.8.1"
-          - python-version: "3.10"
-            torch: "1.9.0"
-          - python-version: "3.10"
-            torch: "1.9.1"
-          - python-version: "3.10"
-            torch: "1.10.0"
-          - python-version: "3.10"
-            torch: "1.10.1"
-          - python-version: "3.10"
-            torch: "1.10.2"
-          - python-version: 3.9 # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0]
-            torch: "1.5.0"
-          - python-version: 3.9
-            torch: "1.5.1"
-          - python-version: 3.9
-            torch: "1.6.0"
-          - python-version: 3.9
-            torch: "1.7.0"
-          - python-version: 3.6 # exclude Python 3.6 for [1.11.0]
-            torch: "1.11.0"
+        ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
 
     steps:
      # refer to https://github.com/actions/checkout
@@ -91,17 +69,11 @@ jobs:
         run: date
 
       - name: Install GCC 7
-        if: startsWith(matrix.os, 'ubuntu')
         run: |
           sudo apt-get install -y gcc-7 g++-7
           echo "CC=/usr/bin/gcc-7" >> $GITHUB_ENV
           echo "CXX=/usr/bin/g++-7" >> $GITHUB_ENV
 
-      - name: Display clang version
-        if: startsWith(matrix.os, 'macos')
-        run: |
-          clang --version
-
       - name: Setup Python ${{ matrix.python-version }}
         uses: actions/setup-python@v2
         with:
@@ -111,26 +83,15 @@ jobs:
         run: python -c "import sys; print(sys.version)"
 
       - name: Install PyTorch ${{ matrix.torch }}
-        if: startsWith(matrix.os, 'ubuntu')
         shell: bash
         run: |
           python3 -m pip install -qq --upgrade pip
-          python3 -m pip install -qq wheel twine typing_extensions
+          python3 -m pip install -qq wheel twine typing_extensions dataclasses
python3 -m pip install -qq torch==${{ matrix.torch }}+cpu -f https://download.pytorch.org/whl/torch_stable.html python3 -c "import torch; print('torch version:', torch.__version__)" - - name: Install PyTorch ${{ matrix.torch }} - if: startsWith(matrix.os, 'macos') - shell: bash - run: | - python3 -m pip install -qq --upgrade pip - python3 -m pip install -qq wheel twine - python3 -m pip install -qq torch==${{ matrix.torch }} - - python3 -c "import torch; print('torch version:', torch.__version__)" - - - name: Configure CMake + - name: Build k2 shell: bash run: | pwd @@ -138,8 +99,17 @@ jobs: cd build cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF .. cat k2/csrc/version.h + cat CMakeCache.txt - - name: Build k2 + make VERBOSE=1 -j2 + + - name: Run tests + shell: bash + run: | + cd build + ctest --output-on-failure + + - name: Build wheel shell: bash run: | export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF" @@ -151,5 +121,5 @@ jobs: - name: Upload Wheel uses: actions/upload-artifact@v2 with: - name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}-cpu + name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-18.04-cpu path: dist/*.whl diff --git a/.github/workflows/windows.yml b/.github/workflows/build-cpu-windows.yml similarity index 72% rename from .github/workflows/windows.yml rename to .github/workflows/build-cpu-windows.yml index 7890fb805..e622e9a37 100644 --- a/.github/workflows/windows.yml +++ b/.github/workflows/build-cpu-windows.yml @@ -15,7 +15,7 @@ # limitations under the License. 
-name: build-windows +name: build-cpu-windows on: push: @@ -28,26 +28,32 @@ env: BUILD_TYPE: Release jobs: - build-windows: + generate_build_matrix: + # see https://github.com/pytorch/pytorch/pull/50633 + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Generating build matrix + id: set-matrix + run: | + # outputting for debugging purposes + python scripts/github_actions/generate_build_matrix.py + MATRIX=$(python scripts/github_actions/generate_build_matrix.py) + echo "::set-output name=matrix::${MATRIX}" + + build-cpu-windows: # see https://github.com/actions/virtual-environments/blob/win19/20210525.0/images/win/Windows2019-Readme.md if: github.event.label.name == 'ready' || github.event_name == 'push' - runs-on: ${{ matrix.os }} + needs: generate_build_matrix + runs-on: windows-2019 strategy: fail-fast: false matrix: - os: [windows-2019] - # Python 3.9 is for PyTorch 1.7.1, 1.8.x, 1.9.0 - python-version: [3.6, 3.7, 3.8, 3.9] - torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0"] - exclude: - - python-version: 3.9 # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0] - torch: "1.5.0" - - python-version: 3.9 - torch: "1.5.1" - - python-version: 3.9 - torch: "1.6.0" - - python-version: 3.9 - torch: "1.7.0" + ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }} steps: - uses: actions/checkout@v2 @@ -68,8 +74,8 @@ jobs: - name: Install PyTorch ${{ matrix.torch }} run: | - pip3 install -qq torch==${{ matrix.torch }}+cpu -f https://download.pytorch.org/whl/torch_stable.html - pip3 install -qq wheel twine dataclasses numpy typing_extensions + pip3 install -qq torch==${{ matrix.torch }}+cpu -f https://download.pytorch.org/whl/torch_stable.html numpy + pip3 install -qq wheel twine dataclasses typing_extensions python3 -m torch.utils.collect_env @@ -85,18 +91,17 @@ jobs: cd build_release cmake 
-DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF .. ls -lh + cat k2/csrc/version.h + cat CMakeCache.txt - name: Build k2 - run: | - cd build_release - cmake --build . --target _k2 --config Release - - - name: Display generated files shell: bash run: | cd build_release - ls -lh bin/*/* + cmake --build . --target _k2 --config Release -- -m + cmake --build . --target ALL_BUILD --config Release ls -lh lib/*/* + ls -lh bin/*/* - name: Build wheel shell: bash @@ -106,15 +111,15 @@ jobs: ls -lh dist/ pip install ./dist/*.whl - - name: Upload Wheel - uses: actions/upload-artifact@v2 - with: - name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}-cpu - path: dist/*.whl - - - name: Run C++ tests + - name: Run tests + shell: bash run: | cd build_release - cmake --build . --target ALL_BUILD --config Release # disable python tests for k2host ctest -C Release --output-on-failure -E host + + - name: Upload Wheel + uses: actions/upload-artifact@v2 + with: + name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-windows-2019-cpu + path: dist/*.whl diff --git a/.github/workflows/build-cuda-ubuntu.yml b/.github/workflows/build-cuda-ubuntu.yml new file mode 100644 index 000000000..adddf92e1 --- /dev/null +++ b/.github/workflows/build-cuda-ubuntu.yml @@ -0,0 +1,147 @@ +# Copyright 2020 Fangjun Kuang (csukuangfj@gmail.com) + +# See ../../LICENSE for clarification regarding multiple authors +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# refer to https://github.com/actions/starter-workflows/pull/47/files + +name: build-cuda-ubuntu + +on: + push: + branches: + - master + pull_request: + types: [labeled] + +env: + BUILD_TYPE: Release + +jobs: + generate_build_matrix: + # see https://github.com/pytorch/pytorch/pull/50633 + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Generating build matrix + id: set-matrix + run: | + # outputting for debugging purposes + python scripts/github_actions/generate_build_matrix.py --enable-cuda + MATRIX=$(python scripts/github_actions/generate_build_matrix.py --enable-cuda --test-only-latest-torch) + echo "::set-output name=matrix::${MATRIX}" + + build-cuda-ubuntu: + if: github.event.label.name == 'ready' || github.event_name == 'push' + needs: generate_build_matrix + runs-on: ubuntu-18.04 + strategy: + fail-fast: false + matrix: + ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }} + + steps: + # refer to https://github.com/actions/checkout + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + + - uses: szenius/set-timezone@v1.0 + with: + timezoneLinux: "Asia/Shanghai" + + - name: Display date and time + run: date + + - name: Install CUDA Toolkit ${{ matrix.cuda }} + env: + cuda: ${{ matrix.cuda }} + run: | + source ./scripts/github_actions/install_cuda.sh + echo "CUDA_HOME=${CUDA_HOME}" >> $GITHUB_ENV + echo "${CUDA_HOME}/bin" >> $GITHUB_PATH + echo "LD_LIBRARY_PATH=${CUDA_HOME}/lib:${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}" >> $GITHUB_ENV + shell: bash + + - name: Display NVCC version + run: | + which nvcc + nvcc --version + + - name: Install GCC 7 + run: | + sudo apt-get install -y gcc-7 g++-7 + echo "CC=/usr/bin/gcc-7" >> $GITHUB_ENV + echo "CXX=/usr/bin/g++-7" >> $GITHUB_ENV + echo "CUDAHOSTCXX=/usr/bin/g++-7" >> $GITHUB_ENV + + - name: Install git lfs + run: | + sudo apt-get install -y git-lfs + + - name: Setup Python 
${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + + - name: Display Python version + run: python -c "import sys; print(sys.version)" + + - name: Install PyTorch ${{ matrix.torch }} + env: + cuda: ${{ matrix.cuda }} + torch: ${{ matrix.torch }} + shell: bash + run: | + python3 -m pip install -q --upgrade pip + python3 -m pip install -q wheel twine typing_extensions + python3 -m pip install -q bs4 requests tqdm + + ./scripts/github_actions/install_torch.sh + python3 -c "import torch; print('torch version:', torch.__version__)" + + - name: Download cudnn 8.0 + env: + cuda: ${{ matrix.cuda }} + run: | + ./scripts/github_actions/install_cudnn.sh + + - name: Configure CMake + shell: bash + run: | + pwd + mkdir build + cd build + cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE .. + cat k2/csrc/version.h + cat CMakeCache.txt + + - name: Build k2 + shell: bash + run: | + export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE" + export K2_MAKE_ARGS="-j2" + python3 setup.py bdist_wheel + ls -lh dist/ + ls -lh build/* + + - name: Upload Wheel + uses: actions/upload-artifact@v2 + with: + name: gcc-7-cuda-${{ matrix.cuda }}-torch-${{ matrix.torch }}-python-${{ matrix.python-version }} + path: dist/*.whl diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml deleted file mode 100644 index b0240c887..000000000 --- a/.github/workflows/build.yml +++ /dev/null @@ -1,275 +0,0 @@ -# Copyright 2020 Fangjun Kuang (csukuangfj@gmail.com) - -# See ../../LICENSE for clarification regarding multiple authors -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# refer to https://github.com/actions/starter-workflows/pull/47/files - -name: build - -on: - push: - branches: - - master - - v2.0-pre - pull_request: - types: [labeled] - -env: - BUILD_TYPE: Release - -jobs: - build: - if: github.event.label.name == 'ready' || github.event_name == 'push' - runs-on: ${{ matrix.os }} - strategy: - fail-fast: false - matrix: - os: [ubuntu-18.04] - # from https://download.pytorch.org/whl/torch_stable.html - # Note: There are no torch versions for CUDA 11.2 - # - # 1.11.x supports: cuda10.2 (default), 11.3, 11.5 - # 1.10.x supports: cuda10.2 (default), 11.1, 11.3 - # 1.9.x supports: cuda10.2 (default), 11.1 - # PyTorch 1.8.x supports: cuda 10.1, 10.2 (default), 11.1 - # PyTorch 1.7.x supports: cuda 10.1, 10.2 (default), 11.0 - # PyTorch 1.6.0 supports: cuda 10.1, 10.2 (default) - # PyTorch 1.5.x supports: cuda 10.1, 10.2 (default) - # Other PyTorch versions are not tested - # CUDA 10.1 is for 1.5.x, 1.6.0, 1.7.x, 1.8.x - # CUDA 11.1 is for torch 1.8.x, 1.9.x, 1.10.x - # CUDA 11.3 is for torch 1.10.x, 1.11.x - # CUDA 11.5 is for torch 1.11.x - cuda: ["10.1", "10.2", "11.0", "11.1", "11.3", "11.5"] - gcc: ["7"] - torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0", "1.9.1", "1.10.0", "1.10.1", "1.10.2", "1.11.0"] - # - # torch 1.11.x does not support Python 3.6 - # From torch 1.11.x on, Python 3.10 is supported - # Python 3.9 is for PyTorch 1.7.1, 1.8.0, 1.8.1, 1.9.x, 1.10.x, 1.11.x - python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"] - exclude: - - cuda: "11.5" # exclude 11.5 for [1.5.0, 1.5.1,
1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2] - torch: "1.5.0" - - cuda: "11.5" - torch: "1.5.1" - - cuda: "11.5" - torch: "1.6.0" - - cuda: "11.5" - torch: "1.7.0" - - cuda: "11.5" - torch: "1.7.1" - - cuda: "11.5" - torch: "1.8.0" - - cuda: "11.5" - torch: "1.8.1" - - cuda: "11.5" - torch: "1.9.0" - - cuda: "11.5" - torch: "1.9.1" - - cuda: "11.5" - torch: "1.10.0" - - cuda: "11.5" - torch: "1.10.1" - - cuda: "11.5" - torch: "1.10.2" - - cuda: "11.3" # exclude 11.3 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1] - torch: "1.5.0" - - cuda: "11.3" - torch: "1.5.1" - - cuda: "11.3" - torch: "1.6.0" - - cuda: "11.3" - torch: "1.7.0" - - cuda: "11.3" - torch: "1.7.1" - - cuda: "11.3" - torch: "1.8.0" - - cuda: "11.3" - torch: "1.8.1" - - cuda: "11.3" - torch: "1.9.0" - - cuda: "11.3" - torch: "1.9.1" - - cuda: "11.0" # exclude 11.0 for [1.5.0, 1.5.1, 1.6.0, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0] - torch: "1.5.0" - - cuda: "11.0" - torch: "1.5.1" - - cuda: "11.0" - torch: "1.6.0" - - cuda: "11.0" - torch: "1.8.0" - - cuda: "11.0" - torch: "1.8.1" - - cuda: "11.0" - torch: "1.9.0" - - cuda: "11.0" - torch: "1.9.1" - - cuda: "11.0" - torch: "1.10.0" - - cuda: "11.0" - torch: "1.10.1" - - cuda: "11.0" - torch: "1.10.2" - - cuda: "11.0" - torch: "1.11.0" - - cuda: "11.1" # exclude 11.1 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.11.0] - torch: "1.5.0" - - cuda: "11.1" - torch: "1.5.1" - - cuda: "11.1" - torch: "1.6.0" - - cuda: "11.1" - torch: "1.7.0" - - cuda: "11.1" - torch: "1.7.1" - - cuda: "11.1" - torch: "1.11.0" - - cuda: "10.1" # exclude CUDA 10.1 for [1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0] - torch: "1.9.0" - - cuda: "10.1" - torch: "1.9.1" - - cuda: "10.1" - torch: "1.10.0" - - cuda: "10.1" - torch: "1.10.1" - - cuda: "10.1" - torch: "1.10.2" - - cuda: "10.1" - torch: "1.11.0" - - python-version: 3.9 # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0] - torch: "1.5.0" - - python-version: 3.9 - 
torch: "1.5.1" - - python-version: 3.9 - torch: "1.6.0" - - python-version: 3.9 - torch: "1.7.0" - - python-version: "3.10" # exclude Python 3.10 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2] - torch: "1.5.0" - - python-version: "3.10" - torch: "1.5.1" - - python-version: "3.10" - torch: "1.6.0" - - python-version: "3.10" - torch: "1.7.0" - - python-version: "3.10" - torch: "1.7.1" - - python-version: "3.10" - torch: "1.8.0" - - python-version: "3.10" - torch: "1.8.1" - - python-version: "3.10" - torch: "1.9.0" - - python-version: "3.10" - torch: "1.9.1" - - python-version: "3.10" - torch: "1.10.0" - - python-version: "3.10" - torch: "1.10.1" - - python-version: "3.10" - torch: "1.10.2" - - python-version: "3.6" # exclude Python 3.6 for [1.11.0] - torch: "1.11.0" - - steps: - # refer to https://github.com/actions/checkout - - uses: actions/checkout@v2 - with: - fetch-depth: 0 - - - uses: szenius/set-timezone@v1.0 - with: - timezoneLinux: "Asia/Shanghai" - - - name: Display date and time - run: date - - - name: Install CUDA Toolkit ${{ matrix.cuda }} - env: - cuda: ${{ matrix.cuda }} - run: | - source ./scripts/github_actions/install_cuda.sh - echo "CUDA_HOME=${CUDA_HOME}" >> $GITHUB_ENV - echo "${CUDA_HOME}/bin" >> $GITHUB_PATH - echo "LD_LIBRARY_PATH=${CUDA_HOME}/lib:${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}" >> $GITHUB_ENV - shell: bash - - - name: Display NVCC version - run: | - which nvcc - nvcc --version - - - name: Install GCC ${{ matrix.gcc }} - run: | - sudo apt-get install -y gcc-${{ matrix.gcc }} g++-${{ matrix.gcc }} - echo "CC=/usr/bin/gcc-${{ matrix.gcc }}" >> $GITHUB_ENV - echo "CXX=/usr/bin/g++-${{ matrix.gcc }}" >> $GITHUB_ENV - echo "CUDAHOSTCXX=/usr/bin/g++-${{ matrix.gcc }}" >> $GITHUB_ENV - - - name: Install git lfs - run: | - sudo apt-get install -y git-lfs - - - name: Setup Python ${{ matrix.python-version }} - uses: actions/setup-python@v2 - with: - python-version: ${{ matrix.python-version }}
- - - name: Display Python version - run: python -c "import sys; print(sys.version)" - - - name: Install PyTorch ${{ matrix.torch }} - env: - cuda: ${{ matrix.cuda }} - torch: ${{ matrix.torch }} - shell: bash - run: | - python3 -m pip install -q --upgrade pip - python3 -m pip install -q wheel twine typing_extensions - python3 -m pip install -q bs4 requests tqdm - - ./scripts/github_actions/install_torch.sh - python3 -c "import torch; print('torch version:', torch.__version__)" - - - name: Download cudnn 8.0 - env: - cuda: ${{ matrix.cuda }} - run: | - ./scripts/github_actions/install_cudnn.sh - - - name: Configure CMake - shell: bash - run: | - pwd - mkdir build - cd build - cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE .. - cat k2/csrc/version.h - - - name: Build k2 - shell: bash - run: | - export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE" - export K2_MAKE_ARGS="-j2" - python3 setup.py bdist_wheel - ls -lh dist/ - ls -lh build/* - - - name: Upload Wheel - uses: actions/upload-artifact@v2 - with: - name: gcc-${{ matrix.gcc }}-cuda-${{ matrix.cuda }}-torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }} - path: dist/*.whl diff --git a/.github/workflows/build_conda.yml b/.github/workflows/build_conda.yml deleted file mode 100644 index b8107d0c6..000000000 --- a/.github/workflows/build_conda.yml +++ /dev/null @@ -1,259 +0,0 @@ -# Copyright 2021 Xiaomi Corp. (author: Fangjun Kuang) - -# See ../../LICENSE for clarification regarding multiple authors -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# refer to https://github.com/actions/starter-workflows/pull/47/files - -name: build_conda_cuda - -on: - push: - branches: - - conda-cuda - -env: - K2_BUILD_TYPE: Release - -jobs: - build_conda_cuda: - runs-on: ${{ matrix.os }} - strategy: - fail-fast: false - matrix: - os: [ubuntu-18.04] - python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"] - cuda: ["10.1", "10.2", "11.0", "11.1", "11.3", "11.5"] - # from https://download.pytorch.org/whl/torch_stable.html - # Note: There are no torch versions for CUDA 11.2 - # - # 1.11.x supports: cuda10.2 (default), 11.3, 11.5 - # PyTorch 1.10.x supports: 10.2 (default), 11.1, 11.3 - # PyTorch 1.9.x supports: 10.2 (default), 11.1 - # PyTorch 1.8.1 supports: cuda 10.1, 10.2 (default), 11.1 - # PyTorch 1.8.0 supports: cuda 10.1, 10.2 (default), 11.1 - # PyTorch 1.7.x supports: cuda 10.1, 10.2 (default), 11.0, 9.2 (not included in this setup) - # PyTorch 1.6.0 supports: cuda 10.1, 10.2 (default), 9.2 (not included in this setup) - # PyTorch 1.5.x supports: cuda 10.1, 10.2 (default), 9.2 (not included in this setup) - # - # PyTorch 1.11.x supports Python 3.10 - # PyTorch 1.7.1, 1.8.x, 1.9.x, 1.10.x, and 1.11.x support 3.6, 3.7, 3.8, 3.9 - # PyTorch 1.7.0, 1.6.0, and 1.5.x support 3.6, 3.7, 3.8 - # - # Other PyTorch versions are not tested - # - # torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1"] - # 1.5.x is removed because there are compilation errors. 
- # See - # https://github.com/csukuangfj/k2/runs/2533830771?check_suite_focus=true - # and - # https://github.com/NVIDIA/apex/issues/805 - torch: ["1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0", "1.9.1", "1.10.0", "1.10.1", "1.10.2", "1.11.0"] - exclude: - - cuda: "11.5" # exclude cuda 11.5 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2] - torch: "1.5.0" - - cuda: "11.5" - torch: "1.5.1" - - cuda: "11.5" - torch: "1.6.0" - - cuda: "11.5" - torch: "1.7.0" - - cuda: "11.5" - torch: "1.7.1" - - cuda: "11.5" - torch: "1.8.0" - - cuda: "11.5" - torch: "1.8.1" - - cuda: "11.5" - torch: "1.9.0" - - cuda: "11.5" - torch: "1.9.1" - - cuda: "11.5" - torch: "1.10.0" - - cuda: "11.5" - torch: "1.10.1" - - cuda: "11.5" - torch: "1.10.2" - - cuda: "11.3" # exclude cuda 11.3 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1] - torch: "1.5.0" - - cuda: "11.3" - torch: "1.5.1" - - cuda: "11.3" - torch: "1.6.0" - - cuda: "11.3" - torch: "1.7.0" - - cuda: "11.3" - torch: "1.7.1" - - cuda: "11.3" - torch: "1.8.0" - - cuda: "11.3" - torch: "1.8.1" - - cuda: "11.3" - torch: "1.9.0" - - cuda: "11.3" - torch: "1.9.1" - # - cuda: "11.0" # exclude 11.0 for [1.5.0, 1.5.1, 1.6.0, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0] - # torch: "1.5.0" - # - cuda: "11.0" - # torch: "1.5.1" - - cuda: "11.0" - torch: "1.6.0" - - cuda: "11.0" - torch: "1.8.0" - - cuda: "11.0" - torch: "1.8.1" - - cuda: "11.0" - torch: "1.9.0" - - cuda: "11.0" - torch: "1.9.1" - - cuda: "11.0" - torch: "1.10.0" - - cuda: "11.0" - torch: "1.10.1" - - cuda: "11.0" - torch: "1.10.2" - - cuda: "11.0" - torch: "1.11.0" - # - cuda: "11.1" # exclude 11.1 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.11.0] - # torch: "1.5.0" - # - cuda: "11.1" - # torch: "1.5.1" - - cuda: "11.1" - torch: "1.6.0" - - cuda: "11.1" - torch: "1.7.0" - - cuda: "11.1" - torch: "1.7.1" - - cuda: "11.1" - torch: "1.11.0" - - cuda: "10.1" # exclude 10.1 for 
[1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0] - torch: "1.9.0" - - cuda: "10.1" - torch: "1.9.1" - - cuda: "10.1" - torch: "1.10.0" - - cuda: "10.1" - torch: "1.10.1" - - cuda: "10.1" - torch: "1.10.2" - - cuda: "10.1" - torch: "1.11.0" - - python-version: "3.9" # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0] - torch: "1.5.0" - - python-version: "3.9" - torch: "1.5.1" - - python-version: "3.9" - torch: "1.6.0" - - python-version: "3.9" - torch: "1.7.0" - - python-version: "3.10" # exclude Python 3.10 for [1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2] - torch: "1.5.0" - - python-version: "3.10" - torch: "1.5.1" - - python-version: "3.10" - torch: "1.6.0" - - python-version: "3.10" - torch: "1.7.0" - - python-version: "3.10" - torch: "1.7.1" - - python-version: "3.10" - torch: "1.8.0" - - python-version: "3.10" - torch: "1.8.1" - - python-version: "3.10" - torch: "1.9.0" - - python-version: "3.10" - torch: "1.9.1" - - python-version: "3.10" - torch: "1.10.0" - - python-version: "3.10" - torch: "1.10.1" - - python-version: "3.10" - torch: "1.10.2" - - python-version: "3.6" # exclude Python 3.6 for [1.11.0] - torch: "1.11.0" - - steps: - # refer to https://github.com/actions/checkout - - uses: actions/checkout@v2 - with: - fetch-depth: 0 - - - name: Install CUDA Toolkit ${{ matrix.cuda }} - shell: bash -l {0} - env: - cuda: ${{ matrix.cuda }} - run: | - source ./scripts/github_actions/install_cuda.sh - echo "CUDA_HOME=${CUDA_HOME}" >> $GITHUB_ENV - echo "${CUDA_HOME}/bin" >> $GITHUB_PATH - echo "LD_LIBRARY_PATH=${CUDA_HOME}/lib:${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}" >> $GITHUB_ENV - - - name: Display NVCC version - shell: bash -l {0} - run: | - which nvcc - nvcc --version - - - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: ${{ matrix.python-version }} - activate-environment: k2 - - - name: Display Python version - shell: bash -l {0} - run: | - python3 -c "import sys; 
print(sys.version)" - which python3 - - - name: Install conda dependencies - shell: bash -l {0} - run: | - conda install -y -q anaconda-client - conda install -y -q conda-build - conda install -y -q bs4 requests tqdm - conda install -y -q -c pytorch -c conda-forge pytorch=${{ matrix.torch }} cudatoolkit=${{ matrix.cuda }} - - - name: Display conda info - shell: bash -l {0} - run: | - which conda - conda env list - conda info - nproc - - - name: Install git lfs - run: | - sudo apt-get install -y git-lfs - - - name: Download cudnn 8.0 - shell: bash -l {0} - env: - cuda: ${{ matrix.cuda }} - run: | - ./scripts/github_actions/install_cudnn.sh - - - name: Build k2 - shell: bash -l {0} - env: - K2_CUDA_VERSION: ${{ matrix.cuda }} - K2_PYTHON_VERSION: ${{ matrix.python-version}} - K2_TORCH_VERSION: ${{ matrix.torch }} - K2_CONDA_TOKEN: ${{ secrets.K2_CONDA_TOKEN}} - K2_IS_GITHUB_ACTIONS: 1 - K2_IS_FOR_CONDA: 1 - run: | - export K2_BUILD_TYPE=$K2_BUILD_TYPE - ./scripts/build_conda.sh diff --git a/.github/workflows/nightly-cpu-macos.yml b/.github/workflows/nightly-cpu-macos.yml new file mode 100644 index 000000000..0ce5354cb --- /dev/null +++ b/.github/workflows/nightly-cpu-macos.yml @@ -0,0 +1,129 @@ +# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com) + +# See ../../LICENSE for clarification regarding multiple authors +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +name: nightly_cpu_macos + +on: + schedule: + # minute (0-59) + # hour (0-23) + # day of the month (1-31) + # month (1-12) + # day of the week (0-6) + # nightly build at 23:50 UTC time every day + - cron: "50 23 * * *" + +env: + BUILD_TYPE: Release + +jobs: + enable_nightly_build: + runs-on: ubuntu-latest + outputs: + enabled: ${{ steps.set-enabled.outputs.enabled }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Set enabled + id: set-enabled + run: | + enabled=$(python scripts/github_actions/run-nightly-build.py) + echo "enabled: $enabled" + echo "::set-output name=enabled::${enabled}" + + generate_build_matrix: + needs: enable_nightly_build + if: needs.enable_nightly_build.outputs.enabled == 'true' + # see https://github.com/pytorch/pytorch/pull/50633 + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Generating build matrix + id: set-matrix + run: | + # outputting for debugging purposes + python scripts/github_actions/generate_build_matrix.py + MATRIX=$(python scripts/github_actions/generate_build_matrix.py) + echo "::set-output name=matrix::${MATRIX}" + + nightly_cpu_macos: + needs: generate_build_matrix + runs-on: macos-10.15 + strategy: + fail-fast: false + matrix: + ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }} + + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + + - name: Display date and time + run: date + + - name: Setup Python ${{ matrix.python-version }} + uses: actions/setup-python@v2 + with: + python-version: ${{ matrix.python-version }} + + - name: Display Python version + run: python -c "import sys; print(sys.version)" + + - name: Display GCC version + run: | + gcc --version + + - name: Display clang version + run: | + clang --version + + - name: Install PyTorch ${{ matrix.torch }} + shell: bash + run: | + python3 -m pip install -qq --upgrade pip + python3 -m pip install -qq 
wheel twine + python3 -m pip install -qq torch==${{ matrix.torch }} + python3 -m pip install --upgrade numpy + + - name: Build pip packages + shell: bash + run: | + export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF" + export K2_MAKE_ARGS="-j2" + python3 setup.py bdist_wheel + ls -lh dist/ + + - name: Upload Wheel + uses: actions/upload-artifact@v2 + with: + name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-macos-10.15 + path: dist/*.whl + + - name: Copy wheels to k2-fsa.org + if: ${{ github.repository_owner == 'k2-fsa' }} + run: | + user=${{ secrets.K2_USERNAME }} + server=${{ secrets.K2_HOST }} + port=${{ secrets.K2_PORT }} + echo "${{ secrets.K2_KEY }}" > id_rsa && chmod 600 id_rsa + scp -P $port -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i id_rsa dist/*.whl $user@$server:~/nightly/whl + rm id_rsa diff --git a/.github/workflows/nightly-cpu.yml b/.github/workflows/nightly-cpu-ubuntu.yml similarity index 54% rename from .github/workflows/nightly-cpu.yml rename to .github/workflows/nightly-cpu-ubuntu.yml index 8fdc6d0a6..b47074272 100644 --- a/.github/workflows/nightly-cpu.yml +++ b/.github/workflows/nightly-cpu-ubuntu.yml @@ -14,73 +14,63 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-name: nightly-cpu +name: nightly_cpu_ubuntu on: - push: - branches: - - nightly-cpu schedule: # minute (0-59) # hour (0-23) # day of the month (1-31) # month (1-12) # day of the week (0-6) - # nightly build at 14:00 UTC time every day - - cron: "0 14 * * *" + # nightly build at 23:50 UTC time every day + - cron: "50 23 * * *" env: BUILD_TYPE: Release jobs: - nightly-cpu: - runs-on: ${{ matrix.os }} + enable_nightly_build: + runs-on: ubuntu-latest + outputs: + enabled: ${{ steps.set-enabled.outputs.enabled }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Set enabled + id: set-enabled + run: | + enabled=$(python scripts/github_actions/run-nightly-build.py) + echo "enabled: $enabled" + echo "::set-output name=enabled::${enabled}" + + generate_build_matrix: + needs: enable_nightly_build + if: needs.enable_nightly_build.outputs.enabled == 'true' + # see https://github.com/pytorch/pytorch/pull/50633 + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Generating build matrix + id: set-matrix + run: | + # outputting for debugging purposes + python scripts/github_actions/generate_build_matrix.py + MATRIX=$(python scripts/github_actions/generate_build_matrix.py) + echo "::set-output name=matrix::${MATRIX}" + + nightly_cpu_ubuntu: + needs: generate_build_matrix + runs-on: ubuntu-18.04 strategy: fail-fast: false matrix: - os: [ubuntu-18.04, macos-10.15] - # Python 3.9 is for PyTorch 1.7.1, 1.8.x, 1.9.x, 1.10.x, 1.11.x - python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"] - torch: ["1.4.0", "1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0", "1.9.1", "1.10.0", "1.10.1", "1.10.2", "1.11.0"] - exclude: - - python-version: "3.9" # exclude Python 3.9 for [1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0] - torch: "1.4.0" - - python-version: "3.9" - torch: "1.5.0" - - python-version: "3.9" - torch: "1.5.1" - - python-version: "3.9" - 
torch: "1.6.0" - - python-version: "3.9" - torch: "1.7.0" - - python-version: "3.10" # exclude Python 3.10 for [1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2] - torch: "1.4.0" - - python-version: "3.10" - torch: "1.5.0" - - python-version: "3.10" - torch: "1.5.1" - - python-version: "3.10" - torch: "1.6.0" - - python-version: "3.10" - torch: "1.7.0" - - python-version: "3.10" - torch: "1.7.1" - - python-version: "3.10" - torch: "1.8.0" - - python-version: "3.10" - torch: "1.8.1" - - python-version: "3.10" - torch: "1.9.0" - - python-version: "3.10" - torch: "1.9.1" - - python-version: "3.10" - torch: "1.10.0" - - python-version: "3.10" - torch: "1.10.1" - - python-version: "3.10" - torch: "1.10.2" - - python-version: "3.6" # exclude Python 3.6 for [1.11.0] - torch: "1.11.0" + ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }} steps: - uses: actions/checkout@v2 @@ -109,13 +99,7 @@ jobs: run: | gcc --version - - name: Display clang version - if: startsWith(matrix.os, 'macos') - run: | - clang --version - - name: Install PyTorch ${{ matrix.torch }} - if: startsWith(matrix.os, 'ubuntu') shell: bash run: | python3 -m pip install -qq --upgrade pip @@ -125,15 +109,6 @@ jobs: python3 -c "import torch; print('torch version:', torch.__version__)" - - name: Install PyTorch ${{ matrix.torch }} - if: startsWith(matrix.os, 'macos') - shell: bash - run: | - python3 -m pip install -qq --upgrade pip - python3 -m pip install -qq wheel twine - python3 -m pip install -qq torch==${{ matrix.torch }} - python3 -m pip install --upgrade numpy - - name: Build pip packages shell: bash run: | @@ -145,10 +120,11 @@ jobs: - name: Upload Wheel uses: actions/upload-artifact@v2 with: - name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }} + name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-18.04 path: dist/*.whl - name: Copy wheels to k2-fsa.org + if: ${{ github.repository_owner == 
'k2-fsa' }} run: | user=${{ secrets.K2_USERNAME }} server=${{ secrets.K2_HOST }} diff --git a/.github/workflows/nightly-windows.yml b/.github/workflows/nightly-cpu-windows.yml similarity index 61% rename from .github/workflows/nightly-windows.yml rename to .github/workflows/nightly-cpu-windows.yml index 42fa8b7bf..7074494d2 100644 --- a/.github/workflows/nightly-windows.yml +++ b/.github/workflows/nightly-cpu-windows.yml @@ -14,7 +14,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -name: nightly-windows +name: nightly_cpu_windows on: schedule: @@ -23,31 +23,53 @@ on: # day of the month (1-31) # month (1-12) # day of the week (0-6) - # nightly build at 14:00 UTC time every day - - cron: "0 14 * * *" + # nightly build at 23:50 UTC time every day + - cron: "50 23 * * *" env: BUILD_TYPE: Release jobs: - nightly-windows: - runs-on: ${{ matrix.os }} + enable_nightly_build: + runs-on: ubuntu-latest + outputs: + enabled: ${{ steps.set-enabled.outputs.enabled }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Set enabled + id: set-enabled + run: | + enabled=$(python scripts/github_actions/run-nightly-build.py) + echo "enabled: $enabled" + echo "::set-output name=enabled::${enabled}" + + generate_build_matrix: + needs: enable_nightly_build + if: needs.enable_nightly_build.outputs.enabled == 'true' + runs-on: ubuntu-latest + outputs: + matrix: ${{ steps.set-matrix.outputs.matrix }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Generating build matrix + id: set-matrix + run: | + # outputting for debugging purposes + python scripts/github_actions/generate_build_matrix.py + MATRIX=$(python scripts/github_actions/generate_build_matrix.py) + echo "::set-output name=matrix::${MATRIX}" + + nightly_cpu_windows: + needs: generate_build_matrix + runs-on: windows-2019 strategy: fail-fast: false matrix: - os: [windows-2019] - # Python 3.9 is for PyTorch 1.7.1, 1.8.x, 1.9.0 - 
python-version: [3.6, 3.7, 3.8, 3.9] - torch: ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.9.0"] - exclude: - - python-version: 3.9 # exclude Python 3.9 for [1.5.0, 1.5.1, 1.6.0, 1.7.0] - torch: "1.5.0" - - python-version: 3.9 - torch: "1.5.1" - - python-version: 3.9 - torch: "1.6.0" - - python-version: 3.9 - torch: "1.7.0" + ${{ fromJson(needs.generate_build_matrix.outputs.matrix) }} steps: - uses: actions/checkout@v2 @@ -68,7 +90,7 @@ jobs: - name: Install PyTorch ${{ matrix.torch }} run: | - pip3 install -qq torch==${{ matrix.torch }}+cpu -f https://download.pytorch.org/whl/torch_stable.html + pip3 install -qq torch==${{ matrix.torch }}+cpu -f https://download.pytorch.org/whl/torch_stable.html numpy pip3 install -qq wheel twine dataclasses typing_extensions python3 -m torch.utils.collect_env @@ -78,40 +100,22 @@ jobs: cmake --version cmake --help - - name: Configure CMake - shell: bash - run: | - mkdir build_release - cd build_release - cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF .. - ls -lh - - - name: Build k2 - run: | - cd build_release - cmake --build . 
--target _k2 --config Release - - - name: Display generated files - shell: bash - run: | - cd build_release - ls -lh bin/*/* - ls -lh lib/*/* - - name: Build wheel shell: bash run: | export K2_CMAKE_ARGS="-DK2_WITH_CUDA=OFF -DCMAKE_BUILD_TYPE=Release" python3 setup.py bdist_wheel ls -lh dist/ + pip install ./dist/*.whl - name: Upload Wheel uses: actions/upload-artifact@v2 with: - name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}-cpu + name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-windows-cpu path: dist/*.whl - name: Copy wheels to k2-fsa.org + if: ${{ github.repository_owner == 'k2-fsa' }} shell: bash run: | user=${{ secrets.K2_USERNAME }} @@ -120,3 +124,34 @@ jobs: echo "${{ secrets.K2_KEY }}" > id_rsa && chmod 600 id_rsa scp -P $port -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i id_rsa dist/*.whl $user@$server:~/nightly/whl rm id_rsa + + - name: Configure CMake + shell: bash + run: | + mkdir build_release + cd build_release + cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DK2_WITH_CUDA=OFF .. + ls -lh + cat k2/csrc/version.h + cat CMakeCache.txt + + - name: Build k2 + shell: bash + run: | + cd build_release + cmake --build . --target _k2 --config Release -- -m + cmake --build . 
--target ALL_BUILD --config Release + + - name: Display generated files + shell: bash + run: | + cd build_release + ls -lh lib/*/* + ls -lh bin/*/* + + - name: Run tests + shell: bash + run: | + cd build_release + # disable python tests for k2host + ctest -C Release --output-on-failure -E host diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly-cuda-ubuntu.yml similarity index 80% rename from .github/workflows/nightly.yml rename to .github/workflows/nightly-cuda-ubuntu.yml index 0cc6cb1be..ac52040fa 100644 --- a/.github/workflows/nightly.yml +++ b/.github/workflows/nightly-cuda-ubuntu.yml @@ -1,4 +1,4 @@ -name: nightly +name: nightly-cuda-ubuntu on: push: @@ -10,19 +10,35 @@ on: # day of the month (1-31) # month (1-12) # day of the week (0-6) - # nightly build at 14:00 UTC time every day - - cron: "0 14 * * *" + # nightly build at 23:50 UTC time every day + - cron: "50 23 * * *" env: BUILD_TYPE: Release jobs: + enable_nightly_build: + runs-on: ubuntu-latest + outputs: + enabled: ${{ steps.set-enabled.outputs.enabled }} + steps: + - uses: actions/checkout@v2 + with: + fetch-depth: 0 + - name: Set enabled + id: set-enabled + run: | + enabled=$(python scripts/github_actions/run-nightly-build.py) + echo "enabled: $enabled" + echo "::set-output name=enabled::${enabled}" + nightly: - runs-on: ${{ matrix.os }} + needs: enable_nightly_build + if: needs.enable_nightly_build.outputs.enabled == 'true' + runs-on: ubuntu-18.04 strategy: fail-fast: false matrix: - os: [ubuntu-18.04] cuda: ["10.1", "10.2", "11.0"] gcc: ["7"] torch: ["1.7.1"] @@ -102,6 +118,7 @@ jobs: ls -lh dist/ - name: Copy wheels to k2-fsa.org + if: ${{ github.repository_owner == 'k2-fsa' }} uses: horochx/deploy-via-scp@v1.0.1 with: host: ${{ secrets.K2_HOST }} diff --git a/.github/workflows/run-tests.yml b/.github/workflows/run-tests.yml index 6c1877586..88d1089d8 100644 --- a/.github/workflows/run-tests.yml +++ b/.github/workflows/run-tests.yml @@ -34,10 +34,10 @@ jobs: fail-fast: false 
matrix: os: [ubuntu-18.04] - cuda: ["11.1"] - gcc: ["5"] - torch: ["1.9.0"] - python-version: [3.9] + cuda: ["10.2"] + gcc: ["7"] + torch: ["1.11.0"] + python-version: ["3.10"] build_type: ["Release", "Debug"] steps: diff --git a/.github/workflows/wheel-cpu-stable.yml b/.github/workflows/wheel-cpu-macos.yml similarity index 93% rename from .github/workflows/wheel-cpu-stable.yml rename to .github/workflows/wheel-cpu-macos.yml index a87ee808b..eb44821e4 100644 --- a/.github/workflows/wheel-cpu-stable.yml +++ b/.github/workflows/wheel-cpu-macos.yml @@ -1,11 +1,11 @@ # Copyright (c) 2021 Xiaomi Corporation (authors: Fangjun Kuang) -name: Publish to PyPI macOS - stable +name: Publish to PyPI - macOS CPU on: push: - branches: - - wheel-stable + tags: + - '*' env: BUILD_TYPE: Release @@ -34,7 +34,6 @@ jobs: run: python -c "import sys; print(sys.version)" - name: Install PyTorch ${{ matrix.torch }} - if: startsWith(matrix.os, 'macos') shell: bash run: | python3 -m pip install -qq --upgrade pip @@ -61,6 +60,7 @@ jobs: path: dist/*.whl - name: Publish wheels to PyPI + if: ${{ github.repository_owner == 'k2-fsa' }} env: TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} diff --git a/.github/workflows/wheel-cpu.yml b/.github/workflows/wheel-cpu-windows.yml similarity index 63% rename from .github/workflows/wheel-cpu.yml rename to .github/workflows/wheel-cpu-windows.yml index 007688c8a..9d84b51cc 100644 --- a/.github/workflows/wheel-cpu.yml +++ b/.github/workflows/wheel-cpu-windows.yml @@ -1,22 +1,23 @@ # Copyright (c) 2021 Xiaomi Corporation (authors: Fangjun Kuang) -name: Publish to PyPI macOS +name: Publish to PyPI - Windows CPU on: push: - branches: - - wheel + tags: + - '*' env: BUILD_TYPE: Release jobs: - PyPI-macos-cpu: + PyPI-windows-cpu: + if: ${{ false }} # Disable it at present. 
Users can install it from https://k2-fsa.org/nightly/index.html runs-on: ${{ matrix.os }} strategy: fail-fast: false matrix: - os: [macos-10.15] + os: [windows-2019] torch: ["1.7.1"] python-version: [3.6, 3.7, 3.8] @@ -25,6 +26,10 @@ jobs: with: fetch-depth: 0 + # see https://github.com/microsoft/setup-msbuild + - name: Add msbuild to PATH + uses: microsoft/setup-msbuild@v1.0.2 + - name: Setup Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: @@ -34,32 +39,32 @@ jobs: run: python -c "import sys; print(sys.version)" - name: Install PyTorch ${{ matrix.torch }} - if: startsWith(matrix.os, 'macos') - shell: bash run: | - python3 -m pip install -qq --upgrade pip - python3 -m pip install -q wheel twine typing_extensions - python3 -m pip install -qq torch==${{ matrix.torch }} + pip3 install -qq torch==${{ matrix.torch }}+cpu -f https://download.pytorch.org/whl/torch_stable.html numpy + pip3 install -qq wheel twine dataclasses typing_extensions + + python3 -m torch.utils.collect_env - name: Build pip packages shell: bash env: K2_IS_FOR_PYPI: 1 + K2_IS_STABLE: 1 run: | tag=$(python3 -c "import sys; print(''.join(sys.version[:3].split('.')))") - export K2_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=$BUILD_TYPE" - export K2_MAKE_ARGS="-j2" + export K2_CMAKE_ARGS="-DK2_WITH_CUDA=OFF -DCMAKE_BUILD_TYPE=$BUILD_TYPE" python3 setup.py bdist_wheel --python-tag=py${tag} ls -lh dist/ - name: Upload Wheel uses: actions/upload-artifact@v2 with: - name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-${{ matrix.os }}-cpu + name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-windows-cpu path: dist/*.whl - name: Publish wheels to PyPI + if: ${{ github.repository_owner == 'k2-fsa' }} env: TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} diff --git a/.github/workflows/wheel-stable.yml b/.github/workflows/wheel-cuda-ubuntu.yml similarity index 95% rename from .github/workflows/wheel-stable.yml rename to 
.github/workflows/wheel-cuda-ubuntu.yml index f142c2910..c4449028c 100644 --- a/.github/workflows/wheel-stable.yml +++ b/.github/workflows/wheel-cuda-ubuntu.yml @@ -1,17 +1,17 @@ # Copyright (c) 2021 Xiaomi Corporation (authors: Fangjun Kuang) -name: Publish to PyPI - stable +name: Publish to PyPI - Ubuntu CUDA on: push: - branches: - - wheel-stable + tags: + - '*' env: BUILD_TYPE: Release jobs: - PyPI: + PyPI_CUDA_Ubuntu: runs-on: ${{ matrix.os }} strategy: fail-fast: false @@ -93,6 +93,7 @@ jobs: ls -lh dist/ - name: Publish wheels to PyPI + if: ${{ github.repository_owner == 'k2-fsa' }} env: TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} diff --git a/.gitignore b/.gitignore index 39db3723a..7a562e752 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ +k2/python/k2/torch_version.py # Build folder **/build* diff --git a/CMakeLists.txt b/CMakeLists.txt index 6f1f21611..906ebddb0 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -64,7 +64,7 @@ string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UPPERCASE) if("${CMAKE_BUILD_TYPE_UPPERCASE}" STREQUAL "DEBUG") # refer to https://docs.nvidia.com/cuda/cuda-memcheck/index.html#compilation-options # The two options are to make cuda-memcheck's stack backtrace feature more useful. 
- string(APPEND CMAKE_CUDA_FLAGS " --compiler-options -rdynamic --compiler-options -lineinfo") + set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --compiler-options -rdynamic --compiler-options -lineinfo") endif() set(CMAKE_EXPORT_COMPILE_COMMANDS ON) @@ -73,11 +73,12 @@ option(BUILD_SHARED_LIBS "Whether to build shared or static lib" ON) option(K2_USE_PYTORCH "Whether to build with PyTorch" ON) option(K2_ENABLE_BENCHMARK "Whether to enable benchmark" ON) option(K2_WITH_CUDA "Whether to build k2 with CUDA" ${_K2_WITH_CUDA}) +option(K2_ENABLE_NVTX "Whether to build k2 with the NVTX library" ON) -# If K2_WITH_CUDA is ON, then K2_ENABLE_NVTX has a default value ON -# If K2_WITH_CUDA is OFF, then K2_ENABLE_NVTX is set to OFF -include(CMakeDependentOption) -cmake_dependent_option(K2_ENABLE_NVTX "Whether to build with the NVTX library" ON K2_WITH_CUDA OFF) +if(NOT K2_WITH_CUDA) + message(STATUS "Set K2_ENABLE_NVTX to OFF since K2_WITH_CUDA is OFF") + set(K2_ENABLE_NVTX OFF CACHE BOOL "" FORCE) +endif() if(NOT K2_USE_PYTORCH) message(FATAL_ERROR "\ @@ -210,7 +211,16 @@ if(K2_WITH_CUDA) # https://www.myzhar.com/blog/tutorials/tutorial-nvidia-gpu-cuda-compute-capability/ set(K2_COMPUTE_ARCH_CANDIDATES 35 50 60 61 70 75) if(CUDA_VERSION VERSION_GREATER "11.0") - list(APPEND K2_COMPUTE_ARCH_CANDIDATES 80 86) + list(APPEND K2_COMPUTE_ARCH_CANDIDATES 80 86) + if(WIN32) + # To fix the following warning from PyTorch: + # c10/util/TypeCast.h(39): warning : calling a constexpr __host__ function from a + # __host__ __device__ function is not allowed. 
The experimental flag '--expt-relaxed-constexpr' + # can be used to allow this + string(APPEND CMAKE_CUDA_FLAGS " --expt-relaxed-constexpr ") + endif() + + string(APPEND CMAKE_CUDA_FLAGS " -Wno-deprecated-gpu-targets ") endif() message(STATUS "K2_COMPUTE_ARCH_CANDIDATES ${K2_COMPUTE_ARCH_CANDIDATES}") @@ -232,7 +242,7 @@ if(K2_WITH_CUDA) message(STATUS "K2_COMPUTE_ARCHS: ${K2_COMPUTE_ARCHS}") foreach(COMPUTE_ARCH IN LISTS K2_COMPUTE_ARCHS) - string(APPEND CMAKE_CUDA_FLAGS " --expt-extended-lambda -gencode arch=compute_${COMPUTE_ARCH},code=sm_${COMPUTE_ARCH}") + set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-extended-lambda -gencode arch=compute_${COMPUTE_ARCH},code=sm_${COMPUTE_ARCH}") set(CMAKE_CUDA_ARCHITECTURES "${COMPUTE_ARCH}-real;${COMPUTE_ARCH}-virtual;${CMAKE_CUDA_ARCHITECTURES}") endforeach() # ========= Settings for CUB end========= @@ -254,12 +264,20 @@ if(K2_USE_PYTORCH) add_definitions(-DK2_USE_PYTORCH) add_definitions(-DTORCH_API_INCLUDE_EXTENSION_H) include(torch) + configure_file( + ${CMAKE_SOURCE_DIR}/k2/python/k2/torch_version.py.in + ${CMAKE_SOURCE_DIR}/k2/python/k2/torch_version.py @ONLY + ) endif() if(K2_WITH_CUDA) add_definitions(-DK2_WITH_CUDA) endif() +if(WIN32) + add_definitions(-DNOMINMAX) # Otherwise, std::max() and std::min() won't work +endif() + if(K2_WITH_CUDA AND CUDA_VERSION VERSION_LESS 11.0) # CUB is included in CUDA toolkit 11.0 and above include(cub) @@ -270,42 +288,76 @@ if(K2_WITH_CUDA) endif() include(googletest) -include(sentencepiece) -include(kaldifeat) -if(K2_WITH_CUDA) - string(APPEND CMAKE_CUDA_FLAGS " --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow") +if(K2_WITH_CUDA AND NOT WIN32) + string(APPEND CMAKE_CUDA_FLAGS " --compiler-options -Wall ") + string(APPEND CMAKE_CUDA_FLAGS " --compiler-options -Wno-strict-overflow ") + string(APPEND CMAKE_CUDA_FLAGS " --compiler-options -Wno-unknown-pragmas ") message(STATUS "CMAKE_CUDA_FLAGS: ${CMAKE_CUDA_FLAGS}") endif() 
-if(NOT K2_WITH_CUDA AND NOT WIN32) - string(APPEND CMAKE_CXX_FLAGS " -Wno-unused-variable") -endif() if(NOT WIN32) - string(APPEND CMAKE_CXX_FLAGS " -Wno-strict-overflow") + string(APPEND CMAKE_CXX_FLAGS " -Wno-unused-variable ") + string(APPEND CMAKE_CXX_FLAGS " -Wno-strict-overflow ") endif() if(WIN32) # disable various warnings for MSVC # NOTE: Most of the warnings are from PyTorch C++ APIs + # 4005: macro redefinition + # 4018: signed/unsigned mismatch + # 4067: unexpected tokens following preprocessor directive # 4068: unknown pragma "unroll" - # 4996: "getenv": This function is unsafe - # 4224: conversion from 'int64_t' to 'int32_t', possible loss of data # 4099: type name first seen using 'class' now seen using 'struct' + # 4101: 'identifier' : unreferenced local variable + # 4190: 'identifier1' has C-linkage specified, but returns UDT 'identifier2' which is incompatible with C + # 4224: conversion from 'int64_t' to 'int32_t', possible loss of data + # 4244: conversion from 'const M' to 'const FloatType' + # 4251: 'type' : class 'type1' needs to have dll-interface to be used by clients of class 'type2' # 4267: conversion from 'size_t' to 'I', possible loss of data + # 4275: non - DLL-interface class 'class_1' used as base for DLL-interface class 'class_2' # 4305: truncation from 'int' to 'bool' - # 4244: conversion from 'const M' to 'const FloatType' - # 4624: destructor was implicitly defined as deleted + # 4522: 'class' : multiple assignment operators specified # 4551: function call missing argument list - # 4067: unexpected tokens following preprocessor directive - # 4819: The file contains a character that cannot be presented in the current code page. 
- # 4005: macro redefinition + # 4624: destructor was implicitly defined as deleted + # 4700: uninitialized local variable 'device' used # 4722: destructor never returns - # 4018: signed/unsigned mismatch - string(APPEND CMAKE_CXX_FLAGS " /wd4068 /wd4996 /wd4224 /wd4099 /wd4267 /wd4305 /wd4244 /wd4624 /wd4551 /wd4067 /wd4819 /wd4005 /wd4722 /wd4018") + # 4819: The file contains a character that cannot be presented in the current code page. + # 4838: conversion from 'type_1' to 'type_2' requires a narrowing conversion + # 4996: "getenv": This function is unsafe + set(disabled_warnings + /wd4005 + /wd4018 + /wd4067 + /wd4068 + /wd4099 + /wd4101 + /wd4190 + /wd4224 + /wd4251 + /wd4244 + /wd4267 + /wd4275 + /wd4305 + /wd4522 + /wd4551 + /wd4624 + /wd4700 + /wd4722 + /wd4819 + /wd4838 + /wd4996 + ) + message(STATUS "Disabled warnings: ${disabled_warnings}") + foreach(w IN LISTS disabled_warnings) + string(APPEND CMAKE_CXX_FLAGS " ${w} ") + string(APPEND CMAKE_CUDA_FLAGS " --compiler-options ${w} ") + endforeach() + string(APPEND CMAKE_CXX_FLAGS " /bigobj ") endif() message(STATUS "CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}") +message(STATUS "CMAKE_CUDA_FLAGS: ${CMAKE_CUDA_FLAGS}") add_subdirectory(k2) diff --git a/cmake/moderngpu.cmake b/cmake/moderngpu.cmake index efae0a211..a7ef9b291 100644 --- a/cmake/moderngpu.cmake +++ b/cmake/moderngpu.cmake @@ -20,9 +20,9 @@ function(download_moderngpu) include(FetchContent) - # this is the latest commit of modern gpu as of 2020-09-26 - set(moderngpu_URL "https://github.com/moderngpu/moderngpu/archive/2b3985541c8e88a133769598c406c33ddde9d0a5.zip") - set(moderngpu_HASH "SHA256=191546af18cd5fb858ecb561316f3af67537ab16f610fc8f1a5febbffc27755a") + # this is the latest commit of modern gpu as of 2022-04-03 + set(moderngpu_URL "https://github.com/moderngpu/moderngpu/archive/8ec9ac0de8672de7217d014917eedec5317f75f3.zip") + set(moderngpu_HASH "SHA256=1c20ffbb81d6f7bbe6107aaa5ee6d37392677c8a5fc7894935149c3ef0a3c2fb") 
FetchContent_Declare(moderngpu URL ${moderngpu_URL} diff --git a/docs/source/installation/conda.rst b/docs/source/installation/conda.rst index d4ef221e2..bf351263a 100644 --- a/docs/source/installation/conda.rst +++ b/docs/source/installation/conda.rst @@ -63,7 +63,7 @@ Supported versions .. |conda_cuda_versions| image:: ./images/cuda_ge_10.1-orange.svg :alt: Supported cuda versions -.. |conda_pytorch_versions| image:: ./images/pytorch_ge_1.5.0-green.svg +.. |conda_pytorch_versions| image:: ./images/pytorch_ge_1.6.0-green.svg :alt: Supported pytorch versions - |conda_python_versions| diff --git a/docs/source/installation/for_developers.rst b/docs/source/installation/for_developers.rst index 5529191f8..ece6d9167 100644 --- a/docs/source/installation/for_developers.rst +++ b/docs/source/installation/for_developers.rst @@ -1,6 +1,10 @@ For developers ============== +.. hint:: + + It supports Linux (CPU + CUDA), macOS (CPU), and Windows (CPU + CUDA). + This page is for developers and advanced users. It describes how to build k2 and run tests. diff --git a/docs/source/installation/from_source.rst b/docs/source/installation/from_source.rst index 5aac6d406..97a74b793 100644 --- a/docs/source/installation/from_source.rst +++ b/docs/source/installation/from_source.rst @@ -3,6 +3,10 @@ Install from source =================== +.. hint:: + + It supports Linux (CPU + CUDA), macOS (CPU), and Windows (CPU + CUDA). + The following versions of Python, CUDA, and PyTorch are known to work. - |source_python_versions| @@ -15,7 +19,7 @@ The following versions of Python, CUDA, and PyTorch are known to work. .. |source_cuda_versions| image:: ./images/cuda_ge_10.1-orange.svg :alt: Supported cuda versions -.. |source_pytorch_versions| image:: ./images/pytorch_ge_1.5.0-green.svg +.. 
|source_pytorch_versions| image:: ./images/pytorch_ge_1.6.0-green.svg :alt: Supported pytorch versions Before compiling k2, some preparation work has to be done: diff --git a/docs/source/installation/images/README.md b/docs/source/installation/images/README.md index a63890a06..aab295a20 100644 --- a/docs/source/installation/images/README.md +++ b/docs/source/installation/images/README.md @@ -5,7 +5,7 @@ - python_ge_3.6-blue.svg - cuda_ge_10.1-orange.svg -- pytorch_ge_1.5.0-green.svg +- pytorch_ge_1.6.0-green.svg - pypi_python-3.6_3.7_3.8-blue.svg - pypi_cuda-10.1-orange.svg diff --git a/docs/source/installation/images/torch_ge_1.6.0-green.svg b/docs/source/installation/images/torch_ge_1.6.0-green.svg new file mode 100644 index 000000000..d3ece9a17 --- /dev/null +++ b/docs/source/installation/images/torch_ge_1.6.0-green.svg @@ -0,0 +1 @@ +torch: >= 1.6.0torch>= 1.6.0 \ No newline at end of file diff --git a/docs/source/installation/index.rst b/docs/source/installation/index.rst index 5b6537ba2..4049aecb0 100644 --- a/docs/source/installation/index.rst +++ b/docs/source/installation/index.rst @@ -50,7 +50,7 @@ below: .. |conda_cuda_versions| image:: ./images/cuda_ge_10.1-orange.svg :alt: Supported cuda versions -.. |conda_pytorch_versions| image:: ./images/pytorch_ge_1.5.0-green.svg +.. |conda_pytorch_versions| image:: ./images/pytorch_ge_1.6.0-green.svg :alt: Supported pytorch versions .. |pip_python_versions| image:: ./images/python_ge_3.6-blue.svg @@ -59,7 +59,7 @@ below: .. |pip_cuda_versions| image:: ./images/cuda_ge_10.1-orange.svg :alt: Supported cuda versions -.. |pip_pytorch_versions| image:: ./images/pytorch_ge_1.5.0-green.svg +.. |pip_pytorch_versions| image:: ./images/pytorch_ge_1.6.0-green.svg :alt: Supported pytorch versions .. |pypi_python_versions| image:: ./images/pypi_python-3.6_3.7_3.8-blue.svg @@ -77,7 +77,7 @@ below: .. |source_cuda_versions| image:: ./images/cuda_ge_10.1-orange.svg :alt: Supported cuda versions -.. 
|source_pytorch_versions| image:: ./images/pytorch_ge_1.5.0-green.svg +.. |source_pytorch_versions| image:: ./images/pytorch_ge_1.6.0-green.svg :alt: Supported pytorch versions Reporting issues diff --git a/docs/source/installation/pip.rst b/docs/source/installation/pip.rst index b756263b7..f145e16b6 100644 --- a/docs/source/installation/pip.rst +++ b/docs/source/installation/pip.rst @@ -7,7 +7,7 @@ Install using pip (k2-fsa.org) .. |pip_cuda_versions| image:: ./images/cuda_ge_10.1-orange.svg :alt: Supported cuda versions -.. |pip_pytorch_versions| image:: ./images/pytorch_ge_1.5.0-green.svg +.. |pip_pytorch_versions| image:: ./images/pytorch_ge_1.6.0-green.svg :alt: Supported pytorch versions You can find a list of nightly pre-built diff --git a/k2/csrc/CMakeLists.txt b/k2/csrc/CMakeLists.txt index 863d9fc58..8248be6a9 100644 --- a/k2/csrc/CMakeLists.txt +++ b/k2/csrc/CMakeLists.txt @@ -38,6 +38,12 @@ add_library(k2_nvtx INTERFACE) target_include_directories(k2_nvtx INTERFACE ${CMAKE_SOURCE_DIR}) if(K2_ENABLE_NVTX) target_compile_definitions(k2_nvtx INTERFACE K2_ENABLE_NVTX=1) + if(WIN32) + target_include_directories(k2_nvtx INTERFACE + ${CUDA_TOOLKIT_ROOT_DIR}/include/nvtx3 + "C:/Program Files/NVIDIA Corporation/NvToolsExt/include" + ) + endif() endif() add_subdirectory(host) @@ -115,8 +121,37 @@ target_link_libraries(context PUBLIC fsa) target_link_libraries(context PUBLIC k2_log) target_link_libraries(context PUBLIC k2_nvtx) if(K2_USE_PYTORCH) - target_link_libraries(context PUBLIC ${TORCH_LIBRARIES}) + if(NOT WIN32) + target_link_libraries(context PUBLIC ${TORCH_LIBRARIES}) + else() + # see https://discuss.pytorch.org/t/nvcc-fatal-a-single-input-file-is-required-for-a-non-link-phase-when-an-outputfile-is-specified/142843/6 + # Depending on ${TORCH_LIBRARIES} will introduce a compile time option "/bigobj", + # which causes the error in the above link. + # + # It would be ideal to remove /bigobj so that we can use ${TORCH_LIBRARIES}. 
+ # To make life simpler, we use the following approach. + # + message(STATUS "TORCH_DIR: ${TORCH_DIR}") # TORCH_DIR is defined in cmake/torch.cmake + # target_link_libraries(context PUBLIC D:/software/anaconda3/envs/py38/Lib/site-packages/torch/lib/*.lib) + target_link_libraries(context PUBLIC ${TORCH_DIR}/lib/*.lib) + target_include_directories(context PUBLIC ${TORCH_DIR}/include) + target_include_directories(context PUBLIC ${TORCH_DIR}/include/torch/csrc/api/include) + endif() + + if(UNIX AND NOT APPLE) + # It causes errors on macOS + target_link_libraries(context PUBLIC ${TORCH_DIR}/lib/libtorch_python.so) + # CAUTION: It is PYTHON_LIBRARY on unix + target_link_libraries(context PUBLIC ${PYTHON_LIBRARY}) + message(STATUS "PYTHON_LIBRARIES: ${PYTHON_LIBRARY}") + elseif(WIN32) + target_link_libraries(context PUBLIC ${TORCH_DIR}/lib/torch_python.lib) + # CAUTION: It is PYTHON_LIBRARIES on Windows + target_link_libraries(context PUBLIC ${PYTHON_LIBRARIES}) + message(STATUS "PYTHON_LIBRARIES: ${PYTHON_LIBRARIES}") + endif() endif() +target_include_directories(context PUBLIC ${PYTHON_INCLUDE_DIRS}) #---------------------------- Test K2 CUDA sources ---------------------------- diff --git a/k2/csrc/array_of_ragged.cu b/k2/csrc/array_of_ragged.cu index cd93434d9..11f8e8ea4 100644 --- a/k2/csrc/array_of_ragged.cu +++ b/k2/csrc/array_of_ragged.cu @@ -1,5 +1,7 @@ /** - * Copyright 2022 Xiaomi Corporation (authors: Wei Kang) + * Copyright 2022 Xiaomi Corporation (authors: Daniel Povey, Wei Kang) + * 2022 ASLP@NWPU (authors: Hang Lyu) + * * See LICENSE for clarification regarding multiple authors * @@ -20,35 +22,107 @@ namespace k2 { -Array1OfRaggedShape::Array1OfRaggedShape(RaggedShape *src, int32_t num_srcs) - : num_srcs_(num_srcs) { - K2_CHECK_GE(num_srcs, 1); - K2_CHECK(src); - num_axes_ = src[0].NumAxes(); - c_ = src[0].Context(); +Array1OfRaggedShape::Array1OfRaggedShape(RaggedShape *srcs, int32_t num_srcs) : + num_srcs_(num_srcs) { + K2_CHECK_GT(num_srcs, 0); + 
K2_CHECK(srcs); + + // Initialize context and num_axes_. + c_ = srcs[0].Context(); + num_axes_ = srcs[0].NumAxes(); + + // Check that they have the same num-axes and compatible contexts. + for (int32_t i = 1; i < num_srcs_; ++i) { + K2_CHECK_EQ(num_axes_, srcs[i].NumAxes()); + K2_CHECK(c_->IsCompatible(*(srcs[i].Context()))); + } - row_splits_ = - Array2(GetCpuContext(), num_axes_ - 1, num_srcs_); + // Initialize row_splits_, row_ids_ and tot_sizes_. + // + // Notice: since the Data() function is a __host__ function, it cannot be + // called on GPU. This limits us to working on the CPU, so the row_splits_ and + // row_ids_ are populated on CPU, although the operator() of Array2 is a + // __host__ and __device__ function. Bear in mind, we cannot access the + // GPU data on CPU. + row_splits_ = Array2(GetCpuContext(), + num_axes_ - 1, num_srcs_); row_ids_ = Array2(GetCpuContext(), num_axes_ - 1, num_srcs_); + + // Notice: no matter whether the return value of TotSize() comes from 'cached_tot_size' + // or from the Back() function (i.e. operator[]) of array1, it is a CPU value. tot_sizes_ = Array1(GetCpuContext(), num_axes_, 0); auto row_splits_acc = row_splits_.Accessor(), row_ids_acc = row_ids_.Accessor(); + // Bear in mind, when axis == 0, the TotSize() is row_splits.Dim() - 1. + // When 0 < axis < NumAxes(), the TotSize() is row_splits.Back().
int32_t *tot_sizes_data = tot_sizes_.Data(); for (int32_t i = 0; i < num_srcs_; ++i) { - K2_CHECK_EQ(src[i].NumAxes(), num_axes_); - K2_CHECK(c_->IsCompatible(*(src[i].Context()))); for (int32_t j = 1; j < num_axes_; ++j) { - row_splits_acc(j - 1, i) = src[i].RowSplits(j).Data(); - row_ids_acc(j - 1, i) = src[i].RowIds(j).Data(); - tot_sizes_data[j] += src[i].TotSize(j); + row_splits_acc(j - 1, i) = srcs[i].RowSplits(j).Data(); + row_ids_acc(j - 1, i) = srcs[i].RowIds(j).Data(); + tot_sizes_data[j] += srcs[i].TotSize(j); } - tot_sizes_data[0] += src[i].TotSize(0); + tot_sizes_data[0] += srcs[i].TotSize(0); } row_splits_ = row_splits_.To(c_); row_ids_ = row_ids_.To(c_); + tot_sizes_ = tot_sizes_.To(c_); + + + // Initialize meta_row_splits_ + // We populate this on CPU and transfer to GPU. + meta_row_splits_ = Array2(GetCpuContext(), num_axes_, num_srcs_ + 1); + offsets_ = Array2(GetCpuContext(), num_axes_ + 1, num_srcs_ + 1); + + auto meta_row_splits_acc = meta_row_splits_.Accessor(), + offsets_acc = offsets_.Accessor(); + + // Initialize the 1st row of offsets_, which contains 0,1,2,... + for (int32_t col = 0; col <= num_srcs_; ++col) { + offsets_acc(0, col) = col; + } + // Initialize the 1st col of meta_row_splits_ and offsets_ + for (int32_t row = 0; row < num_axes_; ++row) { + meta_row_splits_acc(row, 0) = 0; + offsets_acc(row + 1, 0) = 0; + } + + // The meta_row_splits_ is the cumulative sum of the tot-sizes of the + // individual arrays.
+ for (int32_t i = 0; i < num_axes_; ++i) { + for (int32_t j = 1; j <= num_srcs_; ++j) { + meta_row_splits_acc(i, j) = meta_row_splits_acc(i, j - 1) + + srcs[j - 1].TotSize(i); + offsets_acc(i + 1, j) = meta_row_splits_acc(i, j); + } + } + + // Initialize meta_row_ids_ + // Elements are in [0, NumSrcs() - 1] + meta_row_ids_.resize(num_axes_); + + for (int32_t axis = 0; axis < num_axes_; ++axis) { + // The length equals TotSize(axis) + meta_row_ids_.at(axis) = Array1( + GetCpuContext(), meta_row_splits_acc(axis, num_srcs_)); + int32_t *meta_row_ids_data = meta_row_ids_[axis].Data(); + + int32_t cur_row_start = meta_row_splits_acc(axis, 0); + for (int32_t src = 0; src < num_srcs_; ++src) { + int32_t next_row_start = meta_row_splits_acc(axis, src + 1); + for (; cur_row_start < next_row_start; ++cur_row_start) { + meta_row_ids_data[cur_row_start] = src; + } + } + meta_row_ids_[axis] = meta_row_ids_[axis].To(c_); + } + + meta_row_splits_ = meta_row_splits_.To(c_); + offsets_ = offsets_.To(c_); } + } // namespace k2 diff --git a/k2/csrc/array_of_ragged.h b/k2/csrc/array_of_ragged.h index 31349cf91..facc02dc0 100644 --- a/k2/csrc/array_of_ragged.h +++ b/k2/csrc/array_of_ragged.h @@ -1,5 +1,6 @@ /** * Copyright 2022 Xiaomi Corporation (authors: Daniel Povey, Wei Kang) + * 2022 ASLP@NWPU (authors: Hang Lyu) + * * See LICENSE for clarification regarding multiple authors * @@ -24,31 +25,48 @@ #include #include "k2/csrc/array.h" +#include "k2/csrc/array_ops.h" #include "k2/csrc/context.h" #include "k2/csrc/log.h" -#include "k2/csrc/ragged_ops.h" +#include "k2/csrc/ragged.h" namespace k2 { + +/* + Array1OfRagged is a 1-dimensional array of Ragged. + It is intended for situations where you want to do some operations on + arrays of ragged arrays, without explicitly concatenating them (e.g. to + save time). This is a fairly low-level interface, intended to + be used mostly by CUDA/C++ implementation code.
It is a convenience + wrapper that saves you the trouble of creating arrays of pointers. + */ + + /* Array1OfRaggedShape is a convenience function that gives you easy access to pointers-of-pointers for an array of ragged shapes. */ class Array1OfRaggedShape { public: + // Default constructor. + Array1OfRaggedShape() = default; + /* Constructor. Args: - srcs: pointers to the source shapes, a CPU pointer - num_srcs: the number of source shapes. All shapes must have the - same NumAxes() and must be on the same device. + srcs: pointers to the source shapes, a CPU pointer + num_srcs: the number of source shapes. All shapes must have the + same NumAxes() and must be on the same device. TODO: we'll likely, later, add optional args which dictate which of the MetaRowSplits() and MetaRowIds() are to be pre-populated; this should enable us to save kernels by combining certain operations across the axes. + */ - Array1OfRaggedShape(RaggedShape *srcs, int32_t num_srcs); - Array1OfRaggedShape() = default; + Array1OfRaggedShape(RaggedShape *srcs, + int32_t num_srcs); + int32_t NumSrcs() const { return num_srcs_; } int32_t NumAxes() const { return num_axes_; } @@ -63,23 +81,35 @@ class Array1OfRaggedShape { // Returns device-accessible vector of row-splits for a particular // axis, indexed by 0 <= src < num_srcs. const int32_t **RowSplits(int32_t axis) { - return row_splits_.Row(axis - 1).Data(); + K2_CHECK_LT(static_cast(axis), + static_cast(num_axes_)); + return row_splits_.Row(axis - 1).Data(); } // Returns device-accessible array of row-ids for the individual shapes // indexed [axis-1][src], with 0 <= src < num_srcs. The shape of this // Array2 is [NumAxes() - 1][NumSrcs()]. - const Array2 *RowIds() const { return &row_ids_; } + const Array2 *RowIds() const { return &row_ids_; } + // Returns device-accessible vector of row-ids for a particular // axis, indexed by 0 <= src < num_srcs.
- const int32_t **RowIds(int32_t axis) { return row_ids_.Row(axis - 1).Data(); } + const int32_t **RowIds(int32_t axis) { + K2_CHECK_LT(static_cast(axis), + static_cast(num_axes_)); + return row_ids_.Row(axis - 1).Data(); + } + /* Return the total size on this axis, which is the sum of the TotSize() of the individual shapes. Requires 0 <= axis < NumAxes() and for axis=0 the returned value is the same as Dim0(). */ - int32_t TotSize(int32_t axis) const { return tot_sizes_[axis]; } + int32_t TotSize(int32_t axis) const { + K2_CHECK_LT(static_cast(axis), + static_cast(num_axes_)); + return tot_sizes_[axis]; + } // equivalent to TotSize(0). int32_t Dim0() const { return TotSize(0); } @@ -88,7 +118,7 @@ class Array1OfRaggedShape { along the src axis, of the tot-sizes of the individual arrays. This Array2 is of shape [NumAxes()][NumSrcs() + 1], indexed [axis][src]; caution, the indexing is different from RowSplits(), there is no offset. - Also, the meta_row_splits0 is a thing, unlike with regular row-splits + Also, the meta_row_splits_ is a thing, unlike with regular row-splits which start from 1. Caution: the lengths of the arrays pointed to by the elements of this @@ -99,38 +129,47 @@ class Array1OfRaggedShape { to GPU, this will be faster than invoking an extra kernel in normal cases when the NumSrcs() is small. [Also: see GetRowInfoMulti()]. */ - // TODO: implement it... - Array2 MetaRowSplits(); + const Array2 &MetaRowSplits() const { return meta_row_splits_; } // could POSSIBLY add this so this code could be used in functions like // Stack(). would be like MetaRowSplits but with an extra 1st row containing // 0,1,2,... We could perhaps create it with 1 extra initial row so this is // always convenient to output. - // TODO: implement it... 
- Array2 Offsets(); + const Array2 &Offsets() const { return offsets_; } /* - Returns the meta-row-splits for a particular axis, with 0 <= axis < - NumAxes(); this is the cumulative sum of the TotSize(axis) for all of the - sources, with MetaRowSplits(axis).Dim() == NumSrcs() + 1. + Returns the meta-row-splits for a particular axis, with + 0 <= axis < NumAxes(); + this is the cumulative sum of the TotSize(axis) for all of the sources, + with MetaRowSplits(axis).Dim() == NumSrcs() + 1. - Note: in ragged_ops.cu we refer to this as composed_row_splits + Note: in ragged_ops.cu we refer to this as composed_row_splits */ - // TODO: implement it... - Array1 MetaRowSplits(int32_t axis); + Array1 MetaRowSplits(int32_t axis) { + K2_CHECK_LT(static_cast(axis), + static_cast(num_axes_)); + return meta_row_splits_.Row(axis); + } /* Return the device-accessible meta-row-ids, which are the row-ids corresponding to MetaRowSplits(); this tells us, for indexes into the - appended/concatenated array, which source array they belong to, i.e. - elements are in [0,NumSrcs()-1]. + appended/concatenated array, which source array they belong to, + i.e. elements are in [0,NumSrcs()-1]. This cannot be an Array2 because unlike the MetaRowSplits(), all the row-ids arrays are of different lengths. Note: in ragged_ops.cu we refer to this as composed_row_ids. */ - // TODO: implement it... - Array1 MetaRowIds(); + Array1 MetaRowIds() { + Array1 ans(GetCpuContext(), num_axes_); + const int32_t* *ans_data = ans.Data(); + for (int32_t i = 0; i < num_axes_; ++i) { + ans_data[i] = meta_row_ids_[i].Data(); + } + ans = ans.To(c_); + return ans; + } /* Returns the meta-row-ids for a particular axis, with 0 <= axis < NumAxes(); @@ -140,18 +179,28 @@ class Array1OfRaggedShape { would tell us which source an idx012 with value 100 into axis 2 of concatenated array would come from. */ - // TODO: implement it...
- Array1 MetaRowIds(int32_t axis); + const Array1 &MetaRowIds(int32_t axis) const { + K2_CHECK_LT(static_cast(axis), + static_cast(num_axes_)); + return meta_row_ids_[axis]; + } private: ContextPtr c_; int32_t num_srcs_; int32_t num_axes_; + Array2 row_splits_; // shape [num_axes_ - 1][num_srcs_] Array2 row_ids_; // shape [num_axes_ - 1][num_srcs_] - Array1 tot_sizes_; // dim num_axes_, this is on CPU + Array1 tot_sizes_; // dim num_axes_ + + Array2 meta_row_splits_; // shape [num_axes_][num_srcs_ + 1] + Array2 offsets_; // shape [num_axes_][num_srcs_ + 1] + std::vector > meta_row_ids_; // dim num_axes_ }; + + /* Array1OfRagged is a 1-dimensional array of Ragged. It is intended for situations where you want to do some operations on @@ -171,17 +220,14 @@ struct Array1OfRagged { int32_t NumSrcs() const { return values.Dim(); } ContextPtr &Context() { return shape.Context(); } + // Default constructor will not leave this a valid Array1OfRagged object; + // you shouldn't do anything with it. Both members will be initialized with + // default constructors. Array1OfRagged() = default; - /* - Constructor. - Args: - srcs: pointers to the source ragged tensors, a CPU pointer - num_srcs: the number of source ragged tensors. All ragged tensors must - have the same NumAxes() and must be on the same device. - */ + // All 'srcs' must have the same NumAxes() and be on the same device.
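The `Array1OfRagged` constructor that follows essentially validates the sources and then gathers one `values` pointer per source into a CPU-side array. A rough stand-alone sketch of that gathering step, using a hypothetical `ToyRagged` in place of `k2::Ragged<T>` (the real code would afterwards transfer the pointer array to the sources' device):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Ragged<int32_t>: just an axis count and a
// flat values array.
struct ToyRagged {
  int32_t num_axes;
  std::vector<int32_t> values;
};

// Mirrors the constructor's checks: at least one source, all sources
// sharing the same number of axes; then collect one values pointer
// per source.
std::vector<int32_t *> GatherValuesPtrs(std::vector<ToyRagged> &srcs) {
  assert(!srcs.empty());
  for (auto &s : srcs) assert(s.num_axes == srcs[0].num_axes);
  std::vector<int32_t *> ptrs;
  ptrs.reserve(srcs.size());
  for (auto &s : srcs) ptrs.push_back(s.values.data());
  return ptrs;
}
```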
Array1OfRagged(Ragged *srcs, int32_t num_srcs) { - K2_CHECK_GE(num_srcs, 1); + K2_CHECK_GT(num_srcs, 0); K2_CHECK(srcs); values = Array1(GetCpuContext(), num_srcs); T **values_data = values.Data(); @@ -195,6 +241,7 @@ struct Array1OfRagged { } }; + } // namespace k2 #endif // K2_CSRC_ARRAY_OF_RAGGED_H_ diff --git a/k2/csrc/array_of_ragged_test.cu b/k2/csrc/array_of_ragged_test.cu index 69b482315..4cb48bdb6 100644 --- a/k2/csrc/array_of_ragged_test.cu +++ b/k2/csrc/array_of_ragged_test.cu @@ -43,6 +43,7 @@ void TestArray1OfRaggedConstruct() { for (int32_t j = 1; j < num_axes; ++j) { const int32_t **row_splits = array_of_ragged.shape.RowSplits(j); const int32_t **row_ids = array_of_ragged.shape.RowIds(j); + Array1 expected_row_splits(GetCpuContext(), num_srcs); Array1 expected_row_ids(GetCpuContext(), num_srcs); int32_t **expected_row_splits_data = expected_row_splits.Data(); @@ -55,6 +56,7 @@ void TestArray1OfRaggedConstruct() { expected_row_ids = expected_row_ids.To(c); expected_row_splits_data = expected_row_splits.Data(); expected_row_ids_data = expected_row_ids.Data(); + Array1 flags(c, 2, 1); int32_t *flags_data = flags.Data(); K2_EVAL( @@ -67,6 +69,38 @@ void TestArray1OfRaggedConstruct() { for (int32_t i = 0; i < num_srcs; ++i) { K2_CHECK_EQ(array_of_ragged.values[i], raggeds[i].values.Data()); } + + for (int32_t j = 0; j < num_axes; ++j) { + Array1 meta_row_splits(array_of_ragged.shape.MetaRowSplits(j)); + Array1 meta_row_ids(array_of_ragged.shape.MetaRowIds(j)); + Array1 offsets( + array_of_ragged.shape.Offsets().RowArange(j + 1, j + 2).Row(0)); + + Array1 expected_meta_row_splits(GetCpuContext(), num_srcs + 1); + int32_t *expected_meta_row_splits_data = expected_meta_row_splits.Data(); + for (int32_t i = 0; i < num_srcs; ++i) { + expected_meta_row_splits_data[i] = raggeds[i].TotSize(j); + } + ExclusiveSum(expected_meta_row_splits, &expected_meta_row_splits); + expected_meta_row_splits = expected_meta_row_splits.To(c); + Array1 expected_meta_row_ids(c, + 
array_of_ragged.shape.TotSize(j)); + RowSplitsToRowIds(expected_meta_row_splits, &expected_meta_row_ids); + + K2_CHECK(Equal(meta_row_splits, expected_meta_row_splits)); + K2_CHECK(Equal(meta_row_ids, expected_meta_row_ids)); + K2_CHECK(Equal(offsets, expected_meta_row_splits)); + } + + Array1 expected_offsets_1st_row(GetCpuContext(), num_srcs + 1); + int32_t *expected_offsets_1st_row_data = expected_offsets_1st_row.Data(); + for (int32_t i = 0; i <= num_srcs; ++i) { + expected_offsets_1st_row_data[i] = i; + } + expected_offsets_1st_row = expected_offsets_1st_row.To(c); + Array1 offsets_1st_row( + array_of_ragged.shape.Offsets().RowArange(0, 1).Row(0)); + K2_CHECK(Equal(offsets_1st_row, expected_offsets_1st_row)); } } diff --git a/k2/csrc/benchmark/CMakeLists.txt b/k2/csrc/benchmark/CMakeLists.txt index cabb612cb..57570c926 100644 --- a/k2/csrc/benchmark/CMakeLists.txt +++ b/k2/csrc/benchmark/CMakeLists.txt @@ -1,6 +1,7 @@ function(k2_add_benchmark source) get_filename_component(name ${source} NAME_WE) add_executable(${name} ${source}) + set_target_properties(${name} PROPERTIES CUDA_SEPARABLE_COMPILATION ON) target_link_libraries(${name} PRIVATE benchmark) endfunction() diff --git a/k2/csrc/fsa.h b/k2/csrc/fsa.h index c72a31d13..66ae4b626 100644 --- a/k2/csrc/fsa.h +++ b/k2/csrc/fsa.h @@ -34,7 +34,7 @@ struct Arc { int32_t label; float score; - __host__ __device__ __forceinline__ Arc() = default; + Arc() = default; __host__ __device__ __forceinline__ Arc(int32_t src_state, int32_t dest_state, int32_t label, float score) : src_state(src_state), diff --git a/k2/csrc/host/CMakeLists.txt b/k2/csrc/host/CMakeLists.txt index 208f1651c..4d183de8e 100644 --- a/k2/csrc/host/CMakeLists.txt +++ b/k2/csrc/host/CMakeLists.txt @@ -26,11 +26,20 @@ target_link_libraries(fsa PUBLIC k2_log) target_link_libraries(fsa PUBLIC k2_nvtx) target_include_directories(fsa PUBLIC ${CUDA_TOOLKIT_INCLUDE}) if(K2_ENABLE_NVTX) - target_link_libraries(fsa - PUBLIC - -L${CUDA_TOOLKIT_ROOT_DIR}/lib64 
# for /usr/local/cuda - -L${CUDA_TOOLKIT_ROOT_DIR}/lib # for conda - nvToolsExt) + if(NOT WIN32) + target_link_libraries(fsa + PUBLIC + -L${CUDA_TOOLKIT_ROOT_DIR}/lib64 # for /usr/local/cuda + -L${CUDA_TOOLKIT_ROOT_DIR}/lib # for conda + nvToolsExt) + else() + target_link_directories(fsa PUBLIC + ${CUDA_TOOLKIT_ROOT_DIR}/lib64 # for /usr/local/cuda + ${CUDA_TOOLKIT_ROOT_DIR}/lib # for conda + "C:/Program Files/NVIDIA Corporation/NvToolsExt/lib/x64/" + ) + target_link_libraries(fsa PUBLIC NvToolsExt64_1) + endif() endif() #---------------------------- Test K2 host sources ---------------------------- diff --git a/k2/csrc/log.h b/k2/csrc/log.h index a046621c5..04624d66f 100644 --- a/k2/csrc/log.h +++ b/k2/csrc/log.h @@ -34,13 +34,11 @@ #include #include #include -#include #include #include // NOLINT #include #include #include -#include #include #include "k2/csrc/macros.h" @@ -62,9 +60,6 @@ std::ostream &operator<<(std::ostream &os, const std::vector &vec); namespace internal { -// Return a string like 2021-10-12 00:19:39.265 -std::string GetTimeStamp(); - #if defined(NDEBUG) constexpr bool kDisableDebug = true; #else @@ -112,19 +107,6 @@ std::string GetStackTrace(); */ K2_CUDA_HOSTDEV LogLevel GetCurrentLogLevel(); -/// convert /path/to/k2/k2/csrc to k2/csrc -inline const char *RemovePrefix(const char *filename) { - const char *first = strstr(filename, "/k2"); - const char *second = nullptr; - - do { - second = strstr(first + 2, "/k2"); - if (second != nullptr) first = second; - } while (second); - - return first + 1; -} - class Logger { public: K2_CUDA_HOSTDEV Logger(const char *filename, const char *func_name, @@ -134,12 +116,6 @@ class Logger { line_num_(line_num), level_(level) { cur_level_ = GetCurrentLogLevel(); -#if !defined(__CUDA_ARCH__) - filename_ = RemovePrefix(filename); - if (cur_level_ <= level_) { - printf("%s ", GetTimeStamp().c_str()); - } -#endif switch (level) { case TRACE: if (cur_level_ <= TRACE) printf("[T] "); @@ -162,7 +138,7 @@ class Logger 
{ } if (cur_level_ <= level_) { - printf("%s:%u:%s ", filename_, line_num, func_name); + printf("%s:%u:%s ", filename, line_num, func_name); #if defined(__CUDA_ARCH__) printf("block:[%u,%u,%u], thread: [%u,%u,%u] ", blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y, threadIdx.z); @@ -175,13 +151,7 @@ class Logger { Some bad things happened. Please read the above error messages and stack trace. If you are using Python, the following command may be helpful: - $ gdb --args python /path/to/your/code.py - (gdb) catch throw - (gdb) b k2::SomeFunctionName - # For instance - (gdb) b k2::ShortestPath - (gdb) run - (gdb) bt + gdb --args python /path/to/your/code.py (You can use `gdb` to debug the code. Please consider compiling a debug version of k2.). @@ -196,7 +166,13 @@ class Logger { // this is usually caused by one of the K2_CHECK macros and the detailed // error messages should have already been printed by the macro, so we // use an arbitrary string here. +#ifndef _MSC_VER __assert_fail(kErrMsg, filename_, line_num_, func_name_); +#else + (void)kErrMsg; + assert(0); +#endif // _MSC_VER + #else std::string stack_trace = GetStackTrace(); if (!stack_trace.empty()) { @@ -398,8 +374,7 @@ inline K2_CUDA_HOSTDEV LogLevel GetCurrentLogLevel() { #define K2_CHECK_CUDA_ERROR(x) \ K2_CHECK_EQ(x, cudaSuccess) << " Error: " << cudaGetErrorString(x) << ". " #else -#define K2_CHECK_CUDA_ERROR(...) \ - K2_LOG(FATAL) << "Don't call me (Not compiled with CUDA ?)" +#define K2_CHECK_CUDA_ERROR(...) K2_LOG(FATAL) << "K2 compiled without CUDA support" #endif // The parameter of `K2_CUDA_SAFE_CALL` should be cuda function call or kernel @@ -421,7 +396,7 @@ inline K2_CUDA_HOSTDEV LogLevel GetCurrentLogLevel() { // Use a separate K2_CUDA_SAFE_CALL() for CPU // because the kernel invocation syntax <<< >>> // is not valid C++ -#define K2_CUDA_SAFE_CALL(...) K2_LOG(FATAL) << "Don't call me!" +#define K2_CUDA_SAFE_CALL(...) 
K2_LOG(FATAL) << "K2 compiled without CUDA support" #endif // ------------------------------------------------------------ diff --git a/k2/csrc/log_test.cu b/k2/csrc/log_test.cu index bd168bc97..8b86dafc3 100644 --- a/k2/csrc/log_test.cu +++ b/k2/csrc/log_test.cu @@ -28,7 +28,11 @@ TEST(Log, Cpu) { K2_LOG(DEBUG) << "Debug message"; K2_LOG(INFO) << "Info message"; K2_LOG(WARNING) << "Warning message"; +#ifndef _MSC_VER + // It fails on Windows with the following error: + // k2/csrc/log_test.cu(31): error : expected a ")" K2_LOG(ERROR) << "Error message"; +#endif K2_DLOG(INFO) << "This is printed only in debug mode"; diff --git a/k2/csrc/macros_test.cu b/k2/csrc/macros_test.cu index 6963cc45e..ef68e574e 100644 --- a/k2/csrc/macros_test.cu +++ b/k2/csrc/macros_test.cu @@ -27,7 +27,7 @@ namespace k2 { -static void TestEval() { +/*static*/ void TestEval() { for (auto &c : {GetCpuContext(), GetCudaContext()}) { Array1 array = Range(c, 3, 0); int32_t *array_data = array.Data(); @@ -46,7 +46,7 @@ static void TestEval() { } } -static void TestEval2() { +/*static*/ void TestEval2() { for (auto &c : {GetCpuContext(), GetCudaContext()}) { Array1 array1 = Range(c, 6, 0); Array2 array(array1, 2, 3); diff --git a/k2/csrc/math.h b/k2/csrc/math.h index 65b6f8e91..3ebc8b406 100644 --- a/k2/csrc/math.h +++ b/k2/csrc/math.h @@ -27,6 +27,20 @@ namespace k2 { +// Currently, only used in k2/csrc/rnnt_decode.cu +// See https://github.com/k2-fsa/k2/pull/951#issuecomment-1096650842 +__host__ __device__ __forceinline__ int64_t Pow(int64_t base, + int64_t exponent) { + K2_CHECK_GE(exponent, 0); + int64_t exp = 0; + int64_t result = 1; + while (exp < exponent) { + result *= base; + exp++; + } + return result; +} + /* Returns index of highest bit set, in range -1..30. HighestBitSet(0) = -1, @@ -106,29 +120,29 @@ int32_t RandIntGeometric(int32_t min, int32_t max); type, but for types float and double it "fixes" the broken behavior of the C++ standard w.r.t. 
infinity allowing infinities to be parsed. */ -template struct InputFixer { +template +struct InputFixer { T t; // cast operator operator T() const { return t; } }; - namespace internal { template Real FixedRead(std::istream &is); } template -inline std::istream &operator >>(std::istream &is, InputFixer &f) { +inline std::istream &operator>>(std::istream &is, InputFixer &f) { return is >> f.t; } template <> -inline std::istream &operator >>(std::istream &is, InputFixer &f) { +inline std::istream &operator>>(std::istream &is, InputFixer &f) { f.t = internal::FixedRead(is); return is; } template <> -inline std::istream &operator >>(std::istream &is, InputFixer &f) { +inline std::istream &operator>>(std::istream &is, InputFixer &f) { f.t = internal::FixedRead(is); return is; } diff --git a/k2/csrc/ragged_ops.cu b/k2/csrc/ragged_ops.cu index 78bfed8ab..1a919a02a 100644 --- a/k2/csrc/ragged_ops.cu +++ b/k2/csrc/ragged_ops.cu @@ -421,8 +421,12 @@ inline void GetOldAndNewOffsets(RaggedShape &src, ExclusiveSum(*new_offsets, new_offsets); } -static RaggedShape IndexAxis0(RaggedShape &src, const Array1 &new2old, - Array1 *elem_indexes /*=nullptr*/) { +// Don't make it static to fix the following error on Windows. 
+// Error : On Windows, the enclosing parent function ("IndexAxis0") for an +// extended __host__ __device__ lambda cannot have internal or no linkage +/*static*/ RaggedShape IndexAxis0(RaggedShape &src, + const Array1 &new2old, + Array1 *elem_indexes /*=nullptr*/) { NVTX_RANGE(K2_FUNC); ContextPtr &c = src.Context(); K2_CHECK(IsCompatible(src, new2old)); @@ -679,8 +683,8 @@ void GetRowInfoMulti(int32_t num_srcs, RaggedShape **src, *row_ids = row_ids_ptrs.To(ctx); } -static RaggedShape StackAxis0(int32_t num_srcs, RaggedShape **src, - Array1 *merge_map /* == nullptr*/) { +/*static*/ RaggedShape StackAxis0(int32_t num_srcs, RaggedShape **src, + Array1 *merge_map /* == nullptr*/) { NVTX_RANGE(K2_FUNC); if (num_srcs == 1) { if (merge_map) @@ -1128,7 +1132,7 @@ RaggedShape Stack(int32_t axis, int32_t num_srcs, RaggedShape **src, RaggedShape, 1,2,4 to construct the second output RaggedShape, 6 and an empty list to construct the third output RaggedShape. */ -static void SelectAxis0(RaggedShape &src, const Ragged &indexes, +/*static*/ void SelectAxis0(RaggedShape &src, const Ragged &indexes, std::vector *out, std::vector> *split_map) { NVTX_RANGE(K2_FUNC); ContextPtr &c = src.Context(); @@ -1475,8 +1479,8 @@ Ragged GetCountsPartitioned(Ragged &src, return Ragged(ans_ragged_shape, counts); } -static Array1 GetTransposeReorderingCpu(Ragged &src, - int32_t num_cols) { +/*static*/ Array1 GetTransposeReorderingCpu(Ragged &src, - int32_t num_cols) { NVTX_RANGE(K2_FUNC); std::vector> column_indexes(num_cols); // [column][row] const int32_t *values_data = src.values.Data(); @@ -1496,8 +1500,9 @@ static Array1 GetTransposeReorderingCpu(Ragged &src, return ans; } -static Array1 GetTransposeReorderingThreeAxesCuda(Ragged &src, - int32_t num_cols) { +#ifndef _MSC_VER +/*static*/ Array1 GetTransposeReorderingThreeAxesCuda( + Ragged &src, int32_t num_cols) { NVTX_RANGE(K2_FUNC); K2_CHECK_EQ(src.NumAxes(), 3); ContextPtr &context = src.Context(); @@ -1541,6 +1546,7 @@ static Array1
GetTransposeReorderingThreeAxesCuda(Ragged &src, lambda_comp, *mgpu_context)); return ans; } +#endif /* @@ -1565,6 +1571,37 @@ Array1 GetTransposeReordering(Ragged &src, int32_t num_cols) { if (device_type == kCpu) return GetTransposeReorderingCpu(src, num_cols); K2_CHECK_EQ(device_type, kCuda); + +#ifdef _MSC_VER + // See https://github.com/k2-fsa/k2/pull/753 + // and + // https://github.com/k2-fsa/k2/pull/571 + int32_t num_buckets = num_cols; + int32_t num_elements = src.values.Dim(); + int32_t log_buckets = static_cast(ceilf(log2f(num_buckets))); + + Array1 ans = Range(context, num_elements, 0); + + cudaStream_t stream = context->GetCudaStream(); + + size_t temp_storage_bytes = 0; + K2_CUDA_SAFE_CALL(cub::DeviceRadixSort::SortPairs( + nullptr, temp_storage_bytes, src.values.Data(), + static_cast(nullptr), ans.Data(), ans.Data(), num_elements, 0, + log_buckets, stream)); + + Array1 d_temp_storage( + context, temp_storage_bytes + num_elements * sizeof(int32_t)); + + K2_CUDA_SAFE_CALL(cub::DeviceRadixSort::SortPairs( + d_temp_storage.Data() + sizeof(int32_t) * num_elements, + temp_storage_bytes, src.values.Data(), + reinterpret_cast(d_temp_storage.Data()), ans.Data(), + ans.Data(), num_elements, 0, log_buckets, stream)); + + return ans; + +#else (void)GetTransposeReorderingThreeAxesCuda; // remove compiler warnings #if __CUDACC_VER_MAJOR__ > 10 || \ @@ -1599,7 +1636,7 @@ Array1 GetTransposeReordering(Ragged &src, int32_t num_cols) { // CheckGetTransposeReordering(src, ans); return ans; -#else +#else // __CUDACC_VER_MAJOR__ if (src.NumAxes() == 3) { Array1 ans = GetTransposeReorderingThreeAxesCuda(src, num_cols); // CheckGetTransposeReordering(src, ans); @@ -1638,6 +1675,7 @@ Array1 GetTransposeReordering(Ragged &src, int32_t num_cols) { // CheckGetTransposeReordering(src, ans); return ans; #endif +#endif // _MSC_VER } RaggedShape ChangeSublistSize(const RaggedShape &src, int32_t size_delta) { diff --git a/k2/csrc/ragged_test.cu 
b/k2/csrc/ragged_test.cu index dde9d9b70..e2cae8907 100644 --- a/k2/csrc/ragged_test.cu +++ b/k2/csrc/ragged_test.cu @@ -3024,17 +3024,6 @@ TEST(RaggedTest, TestPadRagged) { TestPadRagged(); } -TEST(RaggedTest, ToVecVecInt) { - for (auto &c : {GetCpuContext(), GetCudaContext()}) { - Ragged src(c, "[[1 2 3] [] [4 0 5 6]]"); - std::vector> v = src.ToVecVec(); - ASSERT_EQ(v.size(), 3u); - EXPECT_EQ(v[0], (std::vector{1, 2, 3})); - EXPECT_TRUE(v[1].empty()); - EXPECT_EQ(v[2], (std::vector{4, 0, 5, 6})); - } -} - template static void TestPruneRagged() { for (auto &c : {GetCpuContext(), GetCudaContext()}) { diff --git a/k2/csrc/rand_test.cu b/k2/csrc/rand_test.cu index 1370c8cc6..49d456dff 100644 --- a/k2/csrc/rand_test.cu +++ b/k2/csrc/rand_test.cu @@ -107,7 +107,7 @@ TEST(RandInt, CUDA) { } template -static void TestBounds(T low, T high) { +/*static*/ void TestBounds(T low, T high) { int32_t dim = 100000; ContextPtr cpu = GetCpuContext(); ContextPtr cuda = GetCudaContext(); diff --git a/k2/csrc/rm_epsilon.cu b/k2/csrc/rm_epsilon.cu index a0fffbc76..8451806ed 100644 --- a/k2/csrc/rm_epsilon.cu +++ b/k2/csrc/rm_epsilon.cu @@ -69,7 +69,7 @@ namespace k2 { @param [out] epsilon_closure_mapped_arc_map The arc map from `epsilon_closure_mapped` to `src`. */ -static void GetEpsilonClosureMapped( +/*static*/ void GetEpsilonClosureMapped( FsaVec &epsilon_fsa_closure, const Array1 &epsilon_closure_state_map, Ragged &epsilon_closure_arc_map, FsaVec &non_epsilon_fsa, @@ -139,7 +139,7 @@ static void GetEpsilonClosureMapped( foll_shape.RowSplits(1)[i] is the number of following arcs it is combined with. 
*/ -static void DecideCombineWithFollowingOrPreceding( +/*static*/ void DecideCombineWithFollowingOrPreceding( FsaVec &epsilon_closure_mapped, FsaVec &non_epsilon_fsa, Renumbering *epsilon_prec_renumbering, RaggedShape *foll_shape) { NVTX_RANGE(K2_FUNC); @@ -237,7 +237,7 @@ static void DecideCombineWithFollowingOrPreceding( @param [out] combined_foll_arc_map The arc map of `combined_foll`, from arcs idx012 in `combined_foll` to the original Fsa. */ -static void CombineWithFollowingNonEpsilonArcs( +/*static*/ void CombineWithFollowingNonEpsilonArcs( FsaVec &epsilon_closure_mapped, Ragged &epsilon_closure_mapped_arc_map, FsaVec &non_epsilon_fsa, const Array1 &non_epsilon_arc_map, RaggedShape &foll_shape, @@ -341,7 +341,7 @@ static void CombineWithFollowingNonEpsilonArcs( `epsilon_closure_prec_arc_map`, user will get the complete arc map info for `combined_prec`. */ -static void CombineWithPrecedingNonEpsilonArcs( +/*static*/ void CombineWithPrecedingNonEpsilonArcs( FsaVec &epsilon_closure_prec, Ragged &epsilon_closure_prec_arc_map, FsaVec &non_epsilon_fsa, FsaVec *combined_prec, Ragged *epsilon_closure_prec_arc_map_prec, diff --git a/k2/csrc/rnnt_decode.cu b/k2/csrc/rnnt_decode.cu index db5e732dd..e86b2f7d5 100644 --- a/k2/csrc/rnnt_decode.cu +++ b/k2/csrc/rnnt_decode.cu @@ -159,8 +159,8 @@ void RnntDecodingStreams::GetContexts(RaggedShape *shape, int64_t state_value = states_values_data[state_idx01x], context_state = state_value / num_graph_states, exp = decoder_history_len - col, - state = context_state % (int64_t)pow(vocab_size, exp); - state = state / (int64_t)pow(vocab_size, exp - 1); + state = context_state % Pow(vocab_size, exp); + state = state / Pow(vocab_size, exp - 1); contexts_acc(row, col) = state; }); } @@ -540,7 +540,7 @@ void RnntDecodingStreams::Advance(const Array2 &logprobs) { // can be done with `358 % 10^2`, then we append 6 to 58, that can be // done with `58 * 10 + 6`. 
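The base-`vocab_size` digit trick the comment above describes (keep the most recent symbols with a modulo, then append with a multiply-add) can be sketched in isolation. `UpdateContext` below is a hypothetical helper written for illustration; it mirrors the `Pow`-based expressions in the decoding code but is not the k2 API.

```cpp
#include <cassert>
#include <cstdint>

// Integer power by repeated multiplication, like the Pow() helper this
// patch adds to k2/csrc/math.h.
int64_t Pow(int64_t base, int64_t exponent) {
  int64_t result = 1;
  for (int64_t e = 0; e < exponent; ++e) result *= base;
  return result;
}

// Append `label` to a context state that stores its last `history_len`
// symbols as base-`vocab_size` digits.
int64_t UpdateContext(int64_t context_state, int64_t label,
                      int64_t vocab_size, int64_t history_len) {
  // Drop the oldest symbol, e.g. with vocab_size=10, history_len=3:
  // 358 % 10^2 == 58 ...
  int64_t kept = context_state % Pow(vocab_size, history_len - 1);
  // ... then append the new one: 58 * 10 + 6 == 586,
  // i.e. context (3,5,8) followed by 6 becomes (5,8,6).
  return kept * vocab_size + label;
}
```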
context_state = this_context_state % - (int64_t)pow(vocab_size, decoder_history_len - 1); + Pow(vocab_size, decoder_history_len - 1); context_state = context_state * vocab_size + arc.label; } diff --git a/k2/csrc/tensor_ops.cu b/k2/csrc/tensor_ops.cu index 94ab6c1c5..481107566 100644 --- a/k2/csrc/tensor_ops.cu +++ b/k2/csrc/tensor_ops.cu @@ -20,10 +20,11 @@ namespace k2 { template -static void CopyTensorElements2d(ContextPtr c, int32_t dim0, int32_t dim1, - const T *src_data, int32_t src_stride0, - int32_t src_stride1, T *dest_data, - int32_t dest_stride0, int32_t dest_stride1) { +/*static*/ void CopyTensorElements2d(ContextPtr c, int32_t dim0, int32_t dim1, + const T *src_data, int32_t src_stride0, + int32_t src_stride1, T *dest_data, + int32_t dest_stride0, + int32_t dest_stride1) { NVTX_RANGE(K2_FUNC); DeviceType d = c->GetDeviceType(); if (d == kCpu) { @@ -132,10 +133,11 @@ Tensor Cast(Tensor src, Dtype new_dtype) { // See the documentation of `Index`. template -static void Index1DImpl(ContextPtr context, const T *src_data, - int32_t src_stride, int32_t src_dim, - const int32_t *indexes_data, bool allow_minus_one, - int32_t ans_dim, T *ans_data, double default_value) { +/*static*/ void Index1DImpl(ContextPtr context, const T *src_data, + int32_t src_stride, int32_t src_dim, + const int32_t *indexes_data, bool allow_minus_one, + int32_t ans_dim, T *ans_data, + double default_value) { if (std::is_integral::value) { K2_CHECK_EQ(static_cast(default_value), default_value); } @@ -166,10 +168,11 @@ static void Index1DImpl(ContextPtr context, const T *src_data, // See the documentation of `Index`. 
template -static void Index2DImpl(ContextPtr context, const T *src_data, - int32_t src_stride, int32_t src_dim0, int32_t src_dim1, - const int32_t *indexes_data, bool allow_minus_one, - int32_t ans_dim, int32_t ans_stride, T *ans_data) { +/*static*/ void Index2DImpl(ContextPtr context, const T *src_data, + int32_t src_stride, int32_t src_dim0, + int32_t src_dim1, const int32_t *indexes_data, + bool allow_minus_one, int32_t ans_dim, + int32_t ans_stride, T *ans_data) { NVTX_RANGE(K2_FUNC); if (allow_minus_one) { if (context->GetDeviceType() == kCpu) { @@ -299,11 +302,11 @@ Tensor Index(Tensor &src, Array1 &indexes, bool allow_minus_one, } template -static void IndexAdd1DImpl(ContextPtr context, const T *src_data, - int32_t src_dim, int32_t src_stride, - const int32_t *indexes_data, bool allow_minus_one, - int32_t dest_dim, int32_t dest_stride, - T *dest_data) { +/*static*/ void IndexAdd1DImpl(ContextPtr context, const T *src_data, + int32_t src_dim, int32_t src_stride, + const int32_t *indexes_data, + bool allow_minus_one, int32_t dest_dim, + int32_t dest_stride, T *dest_data) { NVTX_RANGE(K2_FUNC); if (allow_minus_one) { K2_EVAL( @@ -330,12 +333,13 @@ static void IndexAdd1DImpl(ContextPtr context, const T *src_data, } template -static void IndexAdd2DImpl(ContextPtr context, const T *src_data, - int32_t src_dim0, int32_t src_dim1, - int32_t src_stride0, int32_t src_stride1, - const int32_t *indexes_data, bool allow_minus_one, - int32_t dest_dim, int32_t dest_stride0, - int32_t dest_stride1, T *dest_data) { +/*static*/ void IndexAdd2DImpl(ContextPtr context, const T *src_data, + int32_t src_dim0, int32_t src_dim1, + int32_t src_stride0, int32_t src_stride1, + const int32_t *indexes_data, + bool allow_minus_one, int32_t dest_dim, + int32_t dest_stride0, int32_t dest_stride1, + T *dest_data) { NVTX_RANGE(K2_FUNC); if (allow_minus_one) { K2_EVAL2( @@ -437,10 +441,9 @@ void IndexAdd(Tensor &src, Array1 &indexes, bool allow_minus_one, } template -static void 
SimpleRaggedIndexSelect1DImpl(ContextPtr context, const T *src_data, - int32_t src_stride, int32_t src_dim, - Ragged &indexes, - int32_t ans_dim, T *ans_data) { +/*static*/ void SimpleRaggedIndexSelect1DImpl( + ContextPtr context, const T *src_data, int32_t src_stride, int32_t src_dim, + Ragged &indexes, int32_t ans_dim, T *ans_data) { NVTX_RANGE(K2_FUNC); K2_CHECK_EQ(indexes.NumAxes(), 2); int32_t indexes_dim0 = indexes.Dim0(), diff --git a/k2/csrc/tensor_ops_test.cu b/k2/csrc/tensor_ops_test.cu index f57636ba8..0aa7498c4 100644 --- a/k2/csrc/tensor_ops_test.cu +++ b/k2/csrc/tensor_ops_test.cu @@ -36,7 +36,7 @@ namespace k2 { @return Returns a 1-D tensor with the given `dim` and `stride`. */ template -static Tensor GenerateRandTensor1D(ContextPtr context, int32_t dim, +/*static*/ Tensor GenerateRandTensor1D(ContextPtr context, int32_t dim, int32_t stride) { K2_CHECK_GT(stride, 0); @@ -69,7 +69,7 @@ static Tensor GenerateRandTensor1D(ContextPtr context, int32_t dim, `stride`. */ template -static Tensor GenerateRandTensor2D(ContextPtr context, int32_t num_rows, +/*static*/ Tensor GenerateRandTensor2D(ContextPtr context, int32_t num_rows, int32_t num_cols, int32_t stride) { int32_t num_tensor_elements = num_rows * num_cols; K2_CHECK_GT(num_cols, 0); @@ -301,7 +301,7 @@ TEST(IndexAdd, IndexAdd2D) { } template -static void TestSimpleRaggedIndexSelect1D() { +/*static*/ void TestSimpleRaggedIndexSelect1D() { // test with simple case should be good enough for (auto &context : {GetCpuContext(), GetCudaContext()}) { // create src diff --git a/k2/csrc/test_utils.h b/k2/csrc/test_utils.h index 27774d5e8..05a816033 100644 --- a/k2/csrc/test_utils.h +++ b/k2/csrc/test_utils.h @@ -20,15 +20,16 @@ #ifndef K2_CSRC_TEST_UTILS_H_ #define K2_CSRC_TEST_UTILS_H_ -#include #include #include #include #include +#include "gtest/gtest.h" #include "k2/csrc/array.h" #include "k2/csrc/fsa.h" +#include "k2/csrc/log.h" namespace k2 { @@ -103,9 +104,9 @@ inline void ExpectEqual(const std::vector 
&expected, // check if `array` and `target` have the same values template void CheckArrayData(const Array1 &array, const Array1 &target, - T abs_error = (T)0.001) { + T abs_error = T(0.001)) { if (array.Dim() != target.Dim()) { - K2_LOG(ERROR) << "Dims mismatch " << array.Dim() << " vs. " << target.Dim(); + K2_LOG(FATAL) << "Dims mismatch " << array.Dim() << " vs. " << target.Dim(); } int32_t dim = array.Dim(); ContextPtr cpu = GetCpuContext(); diff --git a/k2/csrc/version.h.in b/k2/csrc/version.h.in index a894f979d..cfffaccf8 100644 --- a/k2/csrc/version.h.in +++ b/k2/csrc/version.h.in @@ -46,7 +46,7 @@ static constexpr const char *kPythonVersion = "@PYTHON_VERSION_MAJOR@.@PYTHON_VE static constexpr const char *kBuildType = "@CMAKE_BUILD_TYPE@"; // The operating system that is used to build k2, e.g., Ubuntu 16.04 LTS -static constexpr const char *kOS = "@K2_OS@"; +static constexpr const char *kOS = R"os(@K2_OS@)os"; // e.g., 3.18.0 static constexpr const char *kCMakeVersion = "@CMAKE_VERSION@"; @@ -55,10 +55,10 @@ static constexpr const char *kCMakeVersion = "@CMAKE_VERSION@"; static constexpr const char *kGCCVersion = "@CMAKE_CXX_COMPILER_VERSION@"; // CUDA flags used to compile k2 -static constexpr const char *kCMakeCudaFlags = "@CMAKE_CUDA_FLAGS@"; +static constexpr const char *kCMakeCudaFlags = R"cuda_flags(@CMAKE_CUDA_FLAGS@)cuda_flags"; // CXX flags used to compile k2 -static constexpr const char *kCMakeCxxFlags = "@CMAKE_CXX_FLAGS@"; +static constexpr const char *kCMakeCxxFlags = R"cxx_flags(@CMAKE_CXX_FLAGS@)cxx_flags"; // Which PyTorch version k2 is using, e.g., 1.6.0+cu101 static constexpr const char *kTorchVersion = "@TORCH_VERSION@"; diff --git a/k2/python/csrc/CMakeLists.txt b/k2/python/csrc/CMakeLists.txt index 520fe443c..6031cb87a 100644 --- a/k2/python/csrc/CMakeLists.txt +++ b/k2/python/csrc/CMakeLists.txt @@ -19,7 +19,17 @@ if(NOT K2_WITH_CUDA) transform(OUTPUT_VARIABLE k2_srcs SRCS ${k2_srcs}) endif() -pybind11_add_module(_k2 ${k2_srcs} SHARED) 
+if(WIN32) + # It throws the following error on Windows + # nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified + # because there is an option "/bigobj" in pybind11::windows_extras that cannot be recognized by NVCC. + # + # We clear it below. + set_property(TARGET pybind11::windows_extras PROPERTY INTERFACE_COMPILE_OPTIONS "") +endif() + + +pybind11_add_module(_k2 ${k2_srcs}) target_link_libraries(_k2 PRIVATE context) target_link_libraries(_k2 PRIVATE fsa) @@ -33,3 +43,4 @@ endif() target_include_directories(_k2 PRIVATE ${CMAKE_SOURCE_DIR}) target_include_directories(_k2 PRIVATE ${CMAKE_BINARY_DIR}) +set_target_properties(_k2 PROPERTIES CUDA_SEPARABLE_COMPILATION ON) diff --git a/k2/python/csrc/torch.h b/k2/python/csrc/torch.h index a0f742196..e11827598 100644 --- a/k2/python/csrc/torch.h +++ b/k2/python/csrc/torch.h @@ -29,38 +29,6 @@ namespace pybind11 { namespace detail { -#if K2_TORCH_VERSION_MAJOR < 1 || \ - (K2_TORCH_VERSION_MAJOR == 1 && K2_TORCH_VERSION_MINOR < 9) -// Only for torch version < 1.9.0 - -// See https://github.com/pytorch/pytorch/pull/57292 - -template <> -struct type_caster { - public: - PYBIND11_TYPE_CASTER(torch::Device, _("torch::Device")); - - // PYBIND11_TYPE_CASTER defines a member field called value. Since - // torch::Device cannot be default-initialized, we provide this constructor to - // explicitly initialize that field. The value doesn't matter as it will be - // overwritten after a successful call to load.
- type_caster() : value(torch::kCPU) {} - - bool load(handle src, bool) { - PyObject *obj = src.ptr(); - if (THPDevice_Check(obj)) { - value = reinterpret_cast(obj)->device; - return true; - } - return false; - } - - static handle cast(const torch::Device &src, return_value_policy /* policy */, - handle /* parent */) { - return handle(THPDevice_New(src)); - } -}; -#endif template <> struct type_caster { diff --git a/k2/python/csrc/torch/fsa.cu b/k2/python/csrc/torch/fsa.cu index 372b853f0..7571667c5 100644 --- a/k2/python/csrc/torch/fsa.cu +++ b/k2/python/csrc/torch/fsa.cu @@ -491,7 +491,7 @@ static void PybindBackpropGetArcPost(py::module &m, const char *name) { @return It returns the gradient of scores of all arcs. */ template -static torch::Tensor GetTotScoresTropicalBackward( +/*static*/ torch::Tensor GetTotScoresTropicalBackward( FsaVec &fsas, const RaggedAny &best_path_arc_indexes, torch::Tensor tot_scores_grad) { DeviceGuard guard(fsas.Context()); @@ -542,7 +542,7 @@ static torch::Tensor GetTotScoresTropicalBackward( @return It returns the gradient of scores of all arcs. 
*/ template -static torch::Tensor GetTotScoresLogBackward(FsaVec &fsas, +/*static*/ torch::Tensor GetTotScoresLogBackward(FsaVec &fsas, torch::Tensor arc_post, torch::Tensor tot_scores_grad) { DeviceGuard guard(fsas.Context()); diff --git a/k2/python/csrc/torch/fsa_algo.cu b/k2/python/csrc/torch/fsa_algo.cu index f4016695d..aa945c70d 100644 --- a/k2/python/csrc/torch/fsa_algo.cu +++ b/k2/python/csrc/torch/fsa_algo.cu @@ -59,7 +59,7 @@ static void PybindTopSort(py::module &m) { static void PybindLinearFsa(py::module &m) { m.def( "linear_fsa", - [](RaggedAny &labels, torch::optional = {}) -> FsaVec { + [](RaggedAny &labels, py::object = py::none()) -> FsaVec { DeviceGuard guard(labels.any.Context()); return LinearFsas(labels.any.Specialize()); }, @@ -68,48 +68,26 @@ static void PybindLinearFsa(py::module &m) { m.def( "linear_fsa", [](const std::vector &labels, - torch::optional device = {}) -> Fsa { - ContextPtr context = - GetContext(device.value_or(torch::Device(torch::kCPU))); + py::object device = py::str("cpu")) -> Fsa { + std::string device_str = device.is_none() ? "cpu" : py::str(device); + ContextPtr context = GetContext(torch::Device(device_str)); DeviceGuard guard(context); Array1 array(context, labels); return LinearFsa(array); // }, - py::arg("labels"), py::arg("device") = py::none()); - - m.def( - "linear_fsa", - [](const std::vector &labels, - torch::optional device = {}) -> Fsa { - ContextPtr context = GetContext(torch::Device(device.value_or("cpu"))); - DeviceGuard guard(context); - Array1 array(context, labels); - return LinearFsa(array); // - }, - py::arg("labels"), py::arg("device") = py::none()); + py::arg("labels"), py::arg("device") = py::str("cpu")); m.def( "linear_fsa", [](const std::vector> &labels, - torch::optional device = {}) -> FsaVec { - ContextPtr context = - GetContext(device.value_or(torch::Device(torch::kCPU))); + py::object device = py::str("cpu")) -> FsaVec { + std::string device_str = device.is_none() ? 
"cpu" : py::str(device); + ContextPtr context = GetContext(torch::Device(device_str)); DeviceGuard guard(context); Ragged ragged = CreateRagged2(labels).To(context); return LinearFsas(ragged); }, - py::arg("labels"), py::arg("device") = py::none()); - - m.def( - "linear_fsa", - [](const std::vector> &labels, - torch::optional device = {}) -> FsaVec { - ContextPtr context = GetContext(torch::Device(device.value_or("cpu"))); - DeviceGuard guard(context); - Ragged ragged = CreateRagged2(labels).To(context); - return LinearFsas(ragged); - }, - py::arg("labels"), py::arg("device") = py::none()); + py::arg("labels"), py::arg("device") = py::str("cpu")); } static void PybindIntersect(py::module &m) { @@ -481,7 +459,7 @@ static void PybindRemoveEpsilonSelfLoops(py::module &m) { py::arg("src"), py::arg("need_arc_map") = true); } -static void PybindExpandArcs(py::module &m) { +/*static*/ void PybindExpandArcs(py::module &m) { // See doc-string below. m.def( "expand_arcs", @@ -718,59 +696,34 @@ static void PybindCtcGraph(py::module &m) { static void PybindCtcTopo(py::module &m) { m.def( "ctc_topo", - [](int32_t max_token, torch::optional device = {}, + [](int32_t max_token, py::object device = py::str("cpu"), bool modified = false) -> std::pair { - ContextPtr context = GetContext(device.value_or(torch::Device("cpu"))); + std::string device_str = device.is_none() ? 
"cpu" : py::str(device); + ContextPtr context = GetContext(torch::Device(device_str)); DeviceGuard guard(context); Array1 aux_labels; Fsa fsa = CtcTopo(context, max_token, modified, &aux_labels); torch::Tensor tensor = ToTorch(aux_labels); return std::make_pair(fsa, tensor); }, - py::arg("max_token"), py::arg("device") = py::none(), - py::arg("modified") = false); - - m.def( - "ctc_topo", - [](int32_t max_token, torch::optional device = {}, - bool modified = false) -> std::pair { - ContextPtr context = GetContext(torch::Device(device.value_or("cpu"))); - DeviceGuard guard(context); - Array1 aux_labels; - Fsa fsa = CtcTopo(context, max_token, modified, &aux_labels); - torch::Tensor tensor = ToTorch(aux_labels); - return std::make_pair(fsa, tensor); - }, - py::arg("max_token"), py::arg("device") = py::none(), + py::arg("max_token"), py::arg("device") = py::str("cpu"), py::arg("modified") = false); } static void PybindTrivialGraph(py::module &m) { m.def( "trivial_graph", - [](int32_t max_token, torch::optional device = {}) - -> std::pair { - ContextPtr context = GetContext(device.value_or(torch::Device("cpu"))); - DeviceGuard guard(context); - Array1 aux_labels; - Fsa fsa = TrivialGraph(context, max_token, &aux_labels); - torch::Tensor tensor = ToTorch(aux_labels); - return std::make_pair(fsa, tensor); - }, - py::arg("max_token"), py::arg("device") = py::none()); - - m.def( - "trivial_graph", - [](int32_t max_token, torch::optional device = {}) - -> std::pair { - ContextPtr context = GetContext(torch::Device(device.value_or("cpu"))); + [](int32_t max_token, + py::object device = py::str("cpu")) -> std::pair { + std::string device_str = device.is_none() ? 
"cpu" : py::str(device); + ContextPtr context = GetContext(torch::Device(device_str)); DeviceGuard guard(context); Array1 aux_labels; Fsa fsa = TrivialGraph(context, max_token, &aux_labels); torch::Tensor tensor = ToTorch(aux_labels); return std::make_pair(fsa, tensor); }, - py::arg("max_token"), py::arg("device") = py::none()); + py::arg("max_token"), py::arg("device") = py::str("cpu")); } static void PybindLevenshteinGraph(py::module &m) { diff --git a/k2/python/csrc/torch/ragged_ops.cu b/k2/python/csrc/torch/ragged_ops.cu index ec6018e13..d2d431b17 100644 --- a/k2/python/csrc/torch/ragged_ops.cu +++ b/k2/python/csrc/torch/ragged_ops.cu @@ -152,8 +152,9 @@ static void PybindNormalizePerSublist(py::module &m, const char *name) { (out.NumElements(),). */ template -static torch::Tensor NormalizePerSublistBackward(Ragged &out, bool use_log, - torch::Tensor out_grad) { +/*static*/ torch::Tensor NormalizePerSublistBackward(Ragged &out, + bool use_log, + torch::Tensor out_grad) { NVTX_RANGE(K2_FUNC); DeviceGuard guard(out.Context()); K2_CHECK_EQ(out_grad.dim(), 1) @@ -397,7 +398,7 @@ void PybindRaggedOps(py::module &m) { PybindArgMaxPerSublist(m); PybindCat(m); PybindCat(m); - PybindCat(m); + PybindCat(m); PybindCreateRagged2(m); PybindCreateRagged2(m); PybindGetLayer(m); diff --git a/k2/python/csrc/torch/v2/any.cu b/k2/python/csrc/torch/v2/any.cu index 0c9f07b4a..74f163d23 100644 --- a/k2/python/csrc/torch/v2/any.cu +++ b/k2/python/csrc/torch/v2/any.cu @@ -40,24 +40,32 @@ void PybindRaggedAny(py::module &m) { // k2.ragged.Tensor methods //-------------------------------------------------- - any.def(py::init(), py::arg("data"), - py::arg("dtype") = py::none(), - py::arg("device") = torch::Device(torch::kCPU), - kRaggedAnyInitDataDeviceDoc); + any.def(py::init([](py::list data, py::object dtype = py::none(), + py::object device = + py::str("cpu")) -> std::unique_ptr { + std::string device_str = device.is_none() ? 
"cpu" : py::str(device); + return std::make_unique(data, dtype, + torch::Device(device_str)); + }), + py::arg("data"), py::arg("dtype") = py::none(), + py::arg("device") = py::str("cpu"), kRaggedAnyInitDataDeviceDoc); any.def(py::init(), py::arg("data"), py::arg("dtype") = py::none(), py::arg("device") = "cpu", kRaggedAnyInitDataDeviceDoc); - any.def(py::init(), + any.def(py::init([](const std::string &s, py::object dtype = py::none(), + py::object device = + py::str("cpu")) -> std::unique_ptr { + std::string device_str = device.is_none() ? "cpu" : py::str(device); + return std::make_unique(s, dtype, device_str); + }), py::arg("s"), py::arg("dtype") = py::none(), - py::arg("device") = torch::Device(torch::kCPU), - kRaggedAnyInitStrDeviceDoc); + py::arg("device") = py::str("cpu"), kRaggedAnyInitStrDeviceDoc); any.def(py::init(), py::arg("s"), py::arg("dtype") = py::none(), - py::arg("device") = torch::Device(torch::kCPU), - kRaggedAnyInitStrDeviceDoc); + py::arg("device") = "cpu", kRaggedAnyInitStrDeviceDoc); any.def(py::init(), py::arg("shape"), py::arg("value"), kRaggedInitFromShapeAndTensorDoc); @@ -110,7 +118,7 @@ void PybindRaggedAny(py::module &m) { any.def( "__getitem__", [](RaggedAny &self, const py::slice &slice) -> RaggedAny { - py::ssize_t start = 0, stop = 0, step = 0, slicelength = 0; + py::size_t start = 0, stop = 0, step = 0, slicelength = 0; if (!slice.compute(self.any.Dim0(), &start, &stop, &step, &slicelength)) throw py::error_already_set(); int32_t istart = static_cast(start); @@ -168,10 +176,13 @@ void PybindRaggedAny(py::module &m) { }, py::arg("src"), py::arg("indexes"), kRaggedAnyIndexAndSumDoc); - any.def("to", - static_cast( - &RaggedAny::To), - py::arg("device"), kRaggedAnyToDeviceDoc); + any.def( + "to", + [](RaggedAny &self, py::object device) -> RaggedAny { + std::string device_str = device.is_none() ? 
"cpu" : py::str(device); + return self.To(torch::Device(device_str)); + }, + py::arg("device"), kRaggedAnyToDeviceDoc); any.def("to", static_cast( @@ -243,7 +254,8 @@ void PybindRaggedAny(py::module &m) { [](const RaggedAny &self) -> py::tuple { DeviceGuard guard(self.any.Context()); K2_CHECK(self.any.NumAxes() == 2 || self.any.NumAxes() == 3) - << "Only support Ragged with NumAxes() == 2 or 3 for now, given " + << "Only support Ragged with NumAxes() == 2 or 3 for now, " + "given " << self.any.NumAxes(); Array1 row_splits1 = self.any.RowSplits(1); Dtype t = self.any.GetDtype(); @@ -380,10 +392,8 @@ void PybindRaggedAny(py::module &m) { torch::Device device(device_type, self.any.Context()->GetDeviceId()); - PyObject *ptr = THPDevice_New(device); - - // takes ownership - return py::reinterpret_steal(ptr); + auto torch_device = py::module::import("torch").attr("device"); + return torch_device(device.str()); }, kRaggedAnyDeviceDoc); @@ -443,12 +453,12 @@ void PybindRaggedAny(py::module &m) { m.def( "create_ragged_tensor", [](py::list data, py::object dtype = py::none(), - torch::Device device = torch::kCPU) -> RaggedAny { - return RaggedAny(data, dtype, device); + py::object device = py::str("cpu")) -> RaggedAny { + std::string device_str = device.is_none() ? "cpu" : py::str(device); + return RaggedAny(data, dtype, torch::Device(device_str)); }, py::arg("data"), py::arg("dtype") = py::none(), - py::arg("device") = torch::Device(torch::kCPU), - kCreateRaggedTensorDataDoc); + py::arg("device") = py::str("cpu"), kCreateRaggedTensorDataDoc); m.def( "create_ragged_tensor", @@ -462,12 +472,12 @@ void PybindRaggedAny(py::module &m) { m.def( "create_ragged_tensor", [](const std::string &s, py::object dtype = py::none(), - torch::Device device = torch::kCPU) -> RaggedAny { - return RaggedAny(s, dtype, device); + py::object device = py::str("cpu")) -> RaggedAny { + std::string device_str = device.is_none() ? 
"cpu" : py::str(device); + return RaggedAny(s, dtype, torch::Device(device_str)); }, py::arg("s"), py::arg("dtype") = py::none(), - py::arg("device") = torch::Device(torch::kCPU), - kCreateRaggedTensorStrDoc); + py::arg("device") = py::str("cpu"), kCreateRaggedTensorStrDoc); m.def( "create_ragged_tensor", diff --git a/k2/python/csrc/torch/v2/ragged_shape.cu b/k2/python/csrc/torch/v2/ragged_shape.cu index cb3bc8c13..f989800de 100644 --- a/k2/python/csrc/torch/v2/ragged_shape.cu +++ b/k2/python/csrc/torch/v2/ragged_shape.cu @@ -66,7 +66,9 @@ void PybindRaggedShape(py::module &m) { shape.def( "to", - [](const RaggedShape &self, torch::Device device) -> RaggedShape { + [](const RaggedShape &self, py::object _device) -> RaggedShape { + std::string device_str = _device.is_none() ? "cpu" : py::str(_device); + torch::Device device = torch::Device(device_str); DeviceGuard guard(self.Context()); if (device.type() == torch::kCPU) return self.To(GetCpuContext()); @@ -166,10 +168,8 @@ void PybindRaggedShape(py::module &m) { torch::Device device(device_type, self.Context()->GetDeviceId()); - PyObject *ptr = THPDevice_New(device); - - // takes ownership - return py::reinterpret_steal(ptr); + auto torch_device = py::module::import("torch").attr("device"); + return torch_device(device.str()); }, kRaggedShapeDeviceDoc); diff --git a/k2/python/host/k2host/fsa.py b/k2/python/host/k2host/fsa.py index 59196a422..ecb8a5b0e 100644 --- a/k2/python/host/k2host/fsa.py +++ b/k2/python/host/k2host/fsa.py @@ -30,9 +30,9 @@ def __init__(self, src_state: int, dest_state: int, label: int, super().__init__(src_state, dest_state, label, weight) def to_tensor(self): - # TODO(fangjun): weight will be truncted to an int. + # TODO(fangjun): weight will be truncated to an int. 
return torch.tensor( - [self.src_state, self.dest_state, self.label, self.weight], + [self.src_state, self.dest_state, self.label, int(self.weight)], dtype=torch.int32) @staticmethod diff --git a/k2/python/k2/__init__.py b/k2/python/k2/__init__.py index 930affb18..54102705b 100644 --- a/k2/python/k2/__init__.py +++ b/k2/python/k2/__init__.py @@ -1,9 +1,29 @@ import torch # noqa +from .torch_version import k2_torch_cuda_version +from .torch_version import k2_torch_version + +if torch.__version__.split("+")[0] != k2_torch_version.split("+")[0]: + raise ImportError( + f"k2 was built using PyTorch {k2_torch_version}\n" + f"But you are using PyTorch {torch.__version__} to run it" + ) + +if ( + k2_torch_cuda_version != "" + and torch.version.cuda is not None + and torch.version.cuda != k2_torch_cuda_version +): + raise ImportError( + f"k2 was built using CUDA {k2_torch_cuda_version}\n" + f"But you are using CUDA {torch.version.cuda} to run it." + ) + try: from _k2 import DeterminizeWeightPushingType from _k2 import simple_ragged_index_select except ImportError as e: import sys + major_v, minor_v = sys.version_info[:2] raise ImportError( str(e) + "\nNote: If you're using anaconda and importing k2 on MacOS," @@ -18,6 +38,7 @@ from . import dense_fsa_vec from . import fsa from . 
import utils + # from .autograd import intersect_dense from .autograd import intersect_dense_pruned diff --git a/k2/python/k2/rnnt_decode.py b/k2/python/k2/rnnt_decode.py index 85d56cd5d..7e43d9f82 100644 --- a/k2/python/k2/rnnt_decode.py +++ b/k2/python/k2/rnnt_decode.py @@ -179,7 +179,7 @@ def format_output(self, num_frames: List[int]) -> Fsa: src = self.src_streams[i].fsa for name, value in src.named_tensor_attr(include_scores=False): if name not in tensor_attr_info: - filler = 0.0 + filler = 0 if isinstance(value, Tensor): filler = float(src.get_filler(name)) dtype = value.dtype diff --git a/k2/python/k2/rnnt_loss.py b/k2/python/k2/rnnt_loss.py index 5918d7b9e..67ad28a57 100644 --- a/k2/python/k2/rnnt_loss.py +++ b/k2/python/k2/rnnt_loss.py @@ -471,7 +471,7 @@ def _adjust_pruning_lower_bound( # make the transformed tensor to be non-decreasing s_begin = k2.monotonic_lower_bound(s_begin) # make start symbol to be zero. - s_begin = torch.where(s_begin < 0, 0, s_begin) + s_begin = torch.clamp(s_begin, min=0) # do the magic transformation again to recover s_begin s_begin = -( s_begin - (s_range - 1) * torch.arange(0, T, device=s_begin.device) @@ -568,7 +568,7 @@ def get_rnnt_prune_ranges( s_begin_padding = boundary[:, 2].reshape(B, 1) - s_range + 1 # handle the cases when `len(symbols) < s_range` - s_begin_padding = torch.where(s_begin_padding >= 0, s_begin_padding, 0) + s_begin_padding = torch.clamp(s_begin_padding, min=0) s_begin = torch.where(mask, s_begin, s_begin_padding) @@ -592,9 +592,9 @@ def do_rnnt_pruning( Args: am: - The encoder output, with shape (B, T, C) + The encoder output, with shape (B, T, encoder_dim) lm: - The prediction network output, with shape (B, S + 1, C) + The prediction network output, with shape (B, S + 1, decoder_dim) ranges: A tensor containing the symbol indexes for each frame that we want to keep. 
Its shape is (B, T, s_range), see the docs in
@@ -603,26 +603,28 @@ def do_rnnt_pruning(
     Returns:
       Return the pruned am and lm with shape (B, T, s_range, C)
     """
-    # am (B, T, C)
-    # lm (B, S + 1, C)
+    # am (B, T, encoder_dim)
+    # lm (B, S + 1, decoder_dim)
     # ranges (B, T, s_range)
     assert ranges.shape[0] == am.shape[0]
     assert ranges.shape[0] == lm.shape[0]
     assert am.shape[1] == ranges.shape[1]
     (B, T, s_range) = ranges.shape
-    (B, S1, C) = lm.shape
+    (B, S1, decoder_dim) = lm.shape
+    encoder_dim = am.shape[-1]
+    assert am.shape == (B, T, encoder_dim)
     S = S1 - 1
-    # (B, T, s_range, C)
-    am_pruning = am.unsqueeze(2).expand((B, T, s_range, C))
+    # (B, T, s_range, encoder_dim)
+    am_pruned = am.unsqueeze(2).expand((B, T, s_range, encoder_dim))
-    # (B, T, s_range, C)
-    lm_pruning = torch.gather(
-        lm.unsqueeze(1).expand((B, T, S + 1, C)),
+    # (B, T, s_range, decoder_dim)
+    lm_pruned = torch.gather(
+        lm.unsqueeze(1).expand((B, T, S + 1, decoder_dim)),
         dim=2,
-        index=ranges.reshape((B, T, s_range, 1)).expand((B, T, s_range, C)),
+        index=ranges.reshape((B, T, s_range, 1)).expand((B, T, s_range, decoder_dim)),
     )
-    return am_pruning, lm_pruning
+    return am_pruned, lm_pruned


 def _roll_by_shifts(src: torch.Tensor, shifts: torch.LongTensor):
diff --git a/k2/python/k2/torch_version.py.in b/k2/python/k2/torch_version.py.in
new file mode 100644
index 000000000..30e83abc0
--- /dev/null
+++ b/k2/python/k2/torch_version.py.in
@@ -0,0 +1,17 @@
+# Auto generated by the toplevel CMakeLists.txt.
+#
+# DO NOT EDIT.
+
+# The torch version used to build k2. We will check it against the torch version
+# that is used to run k2. If they are not the same, `import k2` will throw.
+#
+# Some example values are:
+# - 1.10.0+cu102
+# - 1.5.0+cpu
+k2_torch_version = "@TORCH_VERSION@"
+
+# The CUDA version used to build k2.
+# Note: It is an empty string if you used a CPU version of PyTorch to build k2
+#
+# An example value is "10.2".
+k2_torch_cuda_version = "@TORCH_CUDA_VERSION@" diff --git a/k2/python/tests/linear_fsa_with_self_loops_test.py b/k2/python/tests/linear_fsa_with_self_loops_test.py index 1e331bbbc..ec3654cb1 100644 --- a/k2/python/tests/linear_fsa_with_self_loops_test.py +++ b/k2/python/tests/linear_fsa_with_self_loops_test.py @@ -55,7 +55,7 @@ def test_multiple_fsa(self): expected_labels0 = [0, 2, 0, 5, 0, 8, 0, -1] expected_labels1 = [0, 1, 0, 2, 0, -1] expected_labels2 = [0, 3, 0, 2, 0, -1] - expected_labels = expected_labels0 + expected_labels1 + expected_labels2 + expected_labels = expected_labels0 + expected_labels1 + expected_labels2 # noqa assert dst.labels.tolist() == expected_labels diff --git a/k2/python/tests/mutual_information_test.py b/k2/python/tests/mutual_information_test.py index 11917f18f..cddd817e9 100644 --- a/k2/python/tests/mutual_information_test.py +++ b/k2/python/tests/mutual_information_test.py @@ -286,12 +286,12 @@ def get_boundary_row(): observed_delta = (delta_m * m_grad).sum().to("cpu") predicted_delta = (delta_px * px.grad).sum().to("cpu") - atol = 1.0e-02 if dtype == torch.float32 else 1.0e-04 - rtol = 1.0e-02 if dtype == torch.float32 else 1.0e-04 + atol = 1.0e-01 + rtol = atol assert torch.allclose( observed_delta, predicted_delta, atol=atol, rtol=rtol - ) + ), (observed_delta, predicted_delta) delta_py = delta * torch.randn_like(py) m2 = k2.mutual_information_recursion( diff --git a/scripts/github_actions/generate_build_matrix.py b/scripts/github_actions/generate_build_matrix.py new file mode 100755 index 000000000..5899c19b0 --- /dev/null +++ b/scripts/github_actions/generate_build_matrix.py @@ -0,0 +1,111 @@ +#!/usr/bin/env python3 +# Copyright 2022 Xiaomi Corp. 
(authors: Fangjun Kuang)
+
+import argparse
+import json
+
+
+def get_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--enable-cuda",
+        action="store_true",
+        default=False,
+        help="True to enable CUDA",
+    )
+
+    parser.add_argument(
+        "--test-only-latest-torch",
+        action="store_true",
+        default=False,
+        help="""If True, we test only the latest PyTorch
+        to reduce CI running time.""",
+    )
+    return parser.parse_args()
+
+
+def generate_build_matrix(enable_cuda, test_only_latest_torch):
+    matrix = {
+        # there are issues in serializing ragged tensors in 1.5.0 and 1.5.1
+        # "1.5.0": {
+        #     "python-version": ["3.6", "3.7", "3.8"],
+        #     "cuda": ["10.1", "10.2"],
+        # },
+        # "1.5.1": {
+        #     "python-version": ["3.6", "3.7", "3.8"],
+        #     "cuda": ["10.1", "10.2"],
+        # },
+        "1.6.0": {
+            "python-version": ["3.6", "3.7", "3.8"],
+            "cuda": ["10.1", "10.2"],
+        },
+        "1.7.0": {
+            "python-version": ["3.6", "3.7", "3.8"],
+            "cuda": ["10.1", "10.2", "11.0"],
+        },
+        "1.7.1": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.1", "10.2", "11.0"],
+        },
+        "1.8.0": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.1", "10.2", "11.1"],
+        },
+        "1.8.1": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.1", "10.2", "11.1"],
+        },
+        "1.9.0": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.2", "11.1"],
+        },
+        "1.9.1": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.2", "11.1"],
+        },
+        "1.10.0": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.2", "11.1", "11.3"],
+        },
+        "1.10.1": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.2", "11.1", "11.3"],
+        },
+        "1.10.2": {
+            "python-version": ["3.6", "3.7", "3.8", "3.9"],
+            "cuda": ["10.2", "11.1", "11.3"],
+        },
+        "1.11.0": {
+            "python-version": ["3.7", "3.8", "3.9", "3.10"],
+            "cuda": ["10.2", "11.3", "11.5"],
+        },
+    }
+    if test_only_latest_torch:
+        latest = "1.11.0"
+        matrix = {latest: matrix[latest]}
+
+    ans =
[] + for torch, python_cuda in matrix.items(): + python_versions = python_cuda["python-version"] + cuda_versions = python_cuda["cuda"] + if enable_cuda: + for p in python_versions: + for c in cuda_versions: + ans.append({"torch": torch, "python-version": p, "cuda": c}) + else: + for p in python_versions: + ans.append({"torch": torch, "python-version": p}) + + print(json.dumps({"include": ans})) + + +def main(): + args = get_args() + generate_build_matrix( + enable_cuda=args.enable_cuda, + test_only_latest_torch=args.test_only_latest_torch, + ) + + +if __name__ == "__main__": + main() diff --git a/scripts/github_actions/run-nightly-build.py b/scripts/github_actions/run-nightly-build.py new file mode 100755 index 000000000..1e002fba3 --- /dev/null +++ b/scripts/github_actions/run-nightly-build.py @@ -0,0 +1,35 @@ +#!/usr/bin/env python3 +# Copyright 2022 Xiaomi Corp. (authors: Fangjun Kuang) + +import subprocess +from datetime import datetime, timedelta + + +def get_last_commit_date() -> datetime: + date = ( + subprocess.check_output( + [ + "git", + "log", + "-1", + "--format=%ad", + "--date=unix", + ] + ) + .decode("ascii") + .strip() + ) + return datetime.utcfromtimestamp(int(date)) + + +def main(): + last_commit_date_utc = get_last_commit_date() + now_utc = datetime.utcnow() + if last_commit_date_utc + timedelta(days=1) > now_utc: + print("true") + else: + print("false") + + +if __name__ == "__main__": + main()
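The nested loops at the end of `generate_build_matrix` expand the version table into the `include` list consumed by a GitHub Actions `strategy.matrix`. A trimmed, self-contained sketch of that expansion (the one-entry `matrix` here is illustrative, not the full table from the patch):

```python
import json


def expand(matrix, enable_cuda):
    # Same shape as the loop in generate_build_matrix.py: one entry per
    # (torch, python) pair, additionally crossed with the CUDA versions
    # when CUDA builds are enabled.
    ans = []
    for torch, python_cuda in matrix.items():
        for p in python_cuda["python-version"]:
            if enable_cuda:
                for c in python_cuda["cuda"]:
                    ans.append({"torch": torch, "python-version": p, "cuda": c})
            else:
                ans.append({"torch": torch, "python-version": p})
    return {"include": ans}


matrix = {"1.11.0": {"python-version": ["3.9", "3.10"], "cuda": ["11.3"]}}
# CPU-only expansion: one entry per Python version, no "cuda" key.
print(json.dumps(expand(matrix, enable_cuda=False)))
```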
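The import-time guard added to `k2/python/k2/__init__.py` compares only the release part of the PyTorch version string, ignoring the local build suffix (e.g. `+cu102`, `+cpu`). A minimal standalone sketch of that comparison (the function name here is ours, not part of the patch):

```python
def versions_match(built_with: str, running_with: str) -> bool:
    # Mirrors `torch.__version__.split("+")[0] != k2_torch_version.split("+")[0]`
    # from the patch: the part after "+" identifies the build variant
    # (CUDA toolkit or CPU), not the PyTorch release, so it is ignored.
    return built_with.split("+")[0] == running_with.split("+")[0]


# "1.10.0+cu102" and "1.10.0+cpu" are the same PyTorch release.
assert versions_match("1.10.0+cu102", "1.10.0+cpu")
# A release mismatch is what triggers the ImportError in the real check.
assert not versions_match("1.10.0+cu102", "1.11.0")
```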