Mat1 And Mat2 Must Have The Same Dtype (2024)

1. RuntimeError: mat1 and mat2 must have the same dtype

  • Nov 24, 2022 · I was trying to use torch.nn.functional.linear on my model, but I got an error message saying that “mat1 and mat2 must have the same dtype”. It is just a linear function, so I don’t get why the matrices have to be in the same dtype. Thank you for any reply; it will help me gain a better understanding.
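  • A minimal sketch of how this error typically arises with torch.nn.functional.linear and one way to resolve it, assuming the input is float64 while the weight is the default float32 (the tensors below are illustrative, not the poster’s model):

      import torch
      import torch.nn.functional as F

      x = torch.randn(4, 8, dtype=torch.float64)  # e.g. data that came from NumPy (float64 by default)
      w = torch.randn(16, 8)                      # weights are float32 by default
      b = torch.randn(16)

      # F.linear(x, w, b)  # RuntimeError: mat1 and mat2 must have the same dtype

      out = F.linear(x.to(w.dtype), w, b)         # cast the input to match the weight dtype
      print(out.dtype)                            # torch.float32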

2. RuntimeError: mat1 and mat2 must have the same dtype

  • May 8, 2023 · The RuntimeError: mat1 and mat2 must have the same dtype error indicates that there is a problem with the data types of the two matrices being used in an operation.
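  • Before changing anything, it helps to confirm which dtypes are actually meeting in the failing operation; a small sketch (the model and input here are assumptions) showing the usual check and the two common fixes:

      import torch
      import torch.nn as nn

      model = nn.Linear(8, 2)                     # parameters are float32 by default
      x = torch.randn(4, 8, dtype=torch.float64)  # input arrived as float64

      print(x.dtype, next(model.parameters()).dtype)  # torch.float64 torch.float32

      y = model(x.float())     # fix 1: cast the input down to float32
      y64 = model.double()(x)  # fix 2: cast the model up to float64 (usually slower)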

3. The Three Most Common Errors in PyTorch - Zero to Mastery Learn ...

  • Notice both tensors have the same shape. Let’s try to perform a matrix multiplication... mat1 and mat2 shapes cannot be multiplied (28x28 and 784x10).

  • Learn important machine learning concepts hands-on by writing PyTorch code.
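  • That lesson’s shape error (as opposed to the dtype error) comes from passing an un-flattened 28x28 image to a layer that expects 784 input features; a short sketch of the mismatch and the usual fix:

      import torch
      import torch.nn as nn

      layer = nn.Linear(784, 10)
      img = torch.randn(1, 28, 28)

      # layer(img)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied (28x28 and 784x10)

      out = layer(img.flatten(start_dim=1))  # (1, 28, 28) -> (1, 784)
      print(out.shape)                       # torch.Size([1, 10])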

4. Unable to read from file into neural network - Python discussion

  • Dec 31, 2022 · I have come up with the following code for reading some data from a file into a neural network. Input 1 is a CSV with 2 columns (the two input variables) and input 2 is a single column of y values for the function. Finally I need to do some prediction (that part of the code is not yet written because this itself is not working, but if someone can help it would be good): for that, a third file with the 2 input variables will be read, and a fourth file will be written out containing the computed values of y. However...
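  • A frequent cause in this kind of setup is that data loaded through NumPy or pandas arrives as float64 while the network’s weights are float32; a hedged sketch (the file names and column layout are assumptions, not the poster’s actual files) of reading the CSVs as float32 tensors:

      import numpy as np
      import torch
      import torch.nn as nn

      # Hypothetical files: inputs.csv holds the 2 feature columns, targets.csv the single y column
      X = np.loadtxt("inputs.csv", delimiter=",")  # NumPy parses floats as float64 by default
      y = np.loadtxt("targets.csv", delimiter=",")

      X_t = torch.from_numpy(X).float()            # cast to float32 to match nn.Linear weights
      y_t = torch.from_numpy(y).float().unsqueeze(1)

      model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
      pred = model(X_t)                            # no "same dtype" error now
      loss = nn.MSELoss()(pred, y_t)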

5. torch.Tensor - PyTorch 1.9.0 documentation

  • Returns a new Tensor with data as the tensor data. By default, the returned Tensor has the same torch.dtype and torch.device as this tensor.

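  • The snippet above matches the documentation of the Tensor.new_* methods (e.g. Tensor.new_tensor); a tiny illustration of how they inherit dtype and device from the source tensor unless overridden:

      import torch

      base = torch.zeros(3, dtype=torch.float64)

      a = base.new_tensor([1, 2, 3])                       # inherits torch.float64 from `base`
      b = base.new_tensor([1, 2, 3], dtype=torch.float32)  # explicit override

      print(a.dtype, b.dtype)  # torch.float64 torch.float32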

6. RuntimeError: mat1 and mat2 must have the same dtype — how to fix it

  • Jun 29, 2023 · The error means the types don’t match: debugging showed that one matrix was float32 and the other float64. Both are floating point, but they are still different! The fix: call .float() on both matrices, so that both become float32.
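  • The post’s fix, sketched briefly: calling .float() on both operands turns a float64/float32 pair into float32/float32 (assuming float32 precision is acceptable; the exact error message for a raw matmul varies by op and PyTorch version):

      import torch

      a = torch.randn(4, 8, dtype=torch.float64)
      b = torch.randn(8, 2, dtype=torch.float32)

      # torch.mm(a, b)  # raises a dtype-mismatch RuntimeError

      c = torch.mm(a.float(), b.float())  # both operands are float32 now
      print(c.dtype)                      # torch.float32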

7. Getting Started with PyTorch: Your First Hands-On Exercise - Medium

  • RuntimeError: mat1 and mat2 must have the same dtype. This could occur if the ... has the same size. The article’s training helper begins def train(epochs, model, lr): loss_func = nn.MSELoss ...

  • We’re going to start this series by building a simple regression model in PyTorch and gradually move on to more advanced models.
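  • In a training function along the lines of the one quoted above, the mismatch usually shows up on the first forward pass; a hedged sketch (the signature follows the article’s def train(epochs, model, lr), extended with data arguments, and everything else is an assumption) that casts the data to float32 up front:

      import torch
      import torch.nn as nn

      def train(epochs, model, lr, X, y):
          loss_func = nn.MSELoss()
          optimizer = torch.optim.SGD(model.parameters(), lr=lr)
          X, y = X.float(), y.float()  # match the model's default float32 parameters
          for _ in range(epochs):
              optimizer.zero_grad()
              loss = loss_func(model(X), y)
              loss.backward()
              optimizer.step()
          return model

      # Deliberately float64 inputs: the cast above is what prevents the dtype error
      X = torch.arange(10, dtype=torch.float64).unsqueeze(1)
      model = train(100, nn.Linear(1, 1), 0.01, X, 2 * X + 1)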

8. Handling the RuntimeError: mat1 and mat2 must have the same dtype error - Zhihu

  • Sep 5, 2023 · Today a user setting up facechain hit the following error during training: RuntimeError: mat1 and mat2 must have the same dtype. It is caused by a mismatch between the model being trained and the other models used during training; you only need to find the relevant piece of code in train_text_to_image_lora.py…
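  • The fix described there amounts to putting the frozen models onto one common weight dtype (and casting their inputs to match) before the LoRA training loop; a hedged, generic sketch in that spirit, with stand-in modules instead of facechain’s actual unet/vae/text_encoder objects:

      import torch
      import torch.nn as nn

      # Stand-ins for the frozen models a LoRA script keeps around (unet, vae, text_encoder);
      # the real objects would come from the pipeline being fine-tuned.
      unet, vae, text_encoder = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)

      device = "cuda" if torch.cuda.is_available() else "cpu"
      weight_dtype = torch.float16 if device == "cuda" else torch.float32

      # Move every frozen model to the one common weight dtype ...
      for frozen in (unet, vae, text_encoder):
          frozen.to(device=device, dtype=weight_dtype)

      # ... and cast whatever is fed into them to that same dtype.
      pixel_values = torch.randn(2, 4, device=device)  # float32 batch from the dataloader
      out = unet(pixel_values.to(dtype=weight_dtype))  # no mat1/mat2 dtype mismatch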

9. https://openi.pcl.ac.cn/Mymylove/8899653/commit/47...

  • The search hit is inside a large PyTorch commit diff; the matching lines are TORCH_CHECK messages requiring crow_indices and col_indices to have the same type in the sparse CSR tensor code, along with expect-file output such as dtype=torch.int32 and tensor([0, 1, 0, 1], device='cuda:0' ...

  • From 47d9bf586eb9ba12a1e3a5ff869199a8fe0792e7 Mon Sep 17 00:00:00 2001 — Subject: [PATCH] 2021-05-26 postnightly release (bbdc428db2fbe87662c863aa3593df32772f0cae); 200 files changed, 3649 insertions(+), 1296 deletions(-).
pin_memory=False) -> Tensor dispatch: SparseCPU, SparseCUDA: new_with_dims_sparse @@ -4876,7 +4905,7 @@ variants: method dispatch: SparseCPU, SparseCUDA: _nnz_sparse - SparseCsrCPU: _nnz_sparse_csr + SparseCsrCPU, SparseCsrCUDA: _nnz_sparse_csr device_check: NoCheck device_guard: False @@ -4935,21 +4964,21 @@ variants: method dispatch: SparseCPU, SparseCUDA: values_sparse - SparseCsrCPU: values_sparse_csr + SparseCsrCPU, SparseCsrCUDA: values_sparse_csr device_check: NoCheck device_guard: False - func: crow_indices(Tensor(a) self) -> Tensor(a) variants: method dispatch: - SparseCsrCPU: crow_indices_sparse_csr + SparseCsrCPU, SparseCsrCUDA: crow_indices_sparse_csr device_check: NoCheck device_guard: False - func: col_indices(Tensor(a) self) -> Tensor(a) variants: method dispatch: - SparseCsrCPU: col_indices_sparse_csr + SparseCsrCPU, SparseCsrCUDA: col_indices_sparse_csr device_check: NoCheck device_guard: False diff --git a/aten/src/ATen/native/sparse/SparseCsrTensor.cpp b/aten/src/ATen/native/sparse/SparseCsrTensor.cpp index dbcd1ef4d6f..8e7b8b9e2c0 100644 --- a/aten/src/ATen/native/sparse/SparseCsrTensor.cpp +++ b/aten/src/ATen/native/sparse/SparseCsrTensor.cpp @@ -8,12 +8,116 @@ #include #include #include +#include namespace at { namespace native { using namespace at::sparse_csr; +namespace { + + +} // end anonymous namespace + +void _validate_sparse_csr_tensor_args(const Tensor& crow_indices, const Tensor& col_indices, const Tensor& values, IntArrayRef size) { + // Layout Invariants + TORCH_CHECK( + col_indices.layout() == kStrided && col_indices.is_contiguous(), + "expected col_indices to be a strided and contiguous tensor"); + + TORCH_CHECK( + crow_indices.layout() == kStrided && crow_indices.is_contiguous(), + "expected crow_indices to be a strided and contiguous tensor"); + + TORCH_CHECK( + values.layout() == kStrided && values.is_contiguous(), + "expected values to be a strided and contiguous tensor"); + + // Shape and Strides invariants + TORCH_CHECK( + size.size() == 2, + "size of a CSR tensor must be of length 2, but got: ", + size.size()); + TORCH_CHECK( + crow_indices.dim() == 1, + "crow_indices must have dim=1 but got crow_indices.dim()=", + crow_indices.dim()); + TORCH_CHECK( + col_indices.dim() == 1, + "col_indices must have dim=1 but got col_indices.dim()=", + col_indices.dim()); + TORCH_CHECK( + values.dim() == 1, + "values must have dim=1 but got values.dim()=", + values.dim()); + // Note, this check also enforces `crow_indices.numel() >= 1` + TORCH_CHECK( + crow_indices.numel() == (size[0] + 1), + "crow_indices.numel() must be size(0) + 1, but got: ", + crow_indices.numel()); + TORCH_CHECK( + col_indices.numel() == values.numel(), + "col_indices and values must have equal sizes, but got col_indices.numel(): ", + col_indices.numel(), + ", values.numel(): ", + values.numel()); + + // Indices invariants + AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "csr_construct_check", [&] { + Tensor crow_indices_cpu = crow_indices.to(kCPU); + auto crow_indices_accessor = crow_indices_cpu.accessor(); + TORCH_CHECK( + crow_indices_accessor[0] == 0, "0th value of crow_indices must be 0."); + + TORCH_CHECK( + crow_indices_accessor[crow_indices.numel() - 1] == col_indices.numel(), + "last value of crow_indices should be equal to the length of col_indices."); + + for (int i = 1; i <= size[0]; i++) { + TORCH_CHECK( + crow_indices_accessor[i - 1] <= crow_indices_accessor[i], + "at position i = ", i, ", this condition crow_indices[i - 1] <= crow_indices[i] fails"); + } + if 
(col_indices.numel() > 0) { + TORCH_CHECK(0 <= col_indices.min().item(), "col_indices.min() should be greater or equal to zero"); + TORCH_CHECK(size[1] > col_indices.max().item(), "size(1) should be greater than col_indices.max()"); + } + }); + + // CSR Type Invariants + auto crow_indices_type = crow_indices.scalar_type(); + auto col_indices_type = col_indices.scalar_type(); + TORCH_CHECK( + crow_indices_type == col_indices_type, + "both crow_indices and col_indices should have the same type."); + TORCH_CHECK( + crow_indices_type == kInt || crow_indices_type == kLong, + "crow_indices and col_indices must be an int32 or int64 type, but got: ", + crow_indices_type); + + // CSR Device Invariants + TORCH_CHECK( + col_indices.get_device() == crow_indices.get_device(), + "crow_indices and col_indices devices (", + crow_indices.get_device(), + ", ", + col_indices.get_device(), + ") must match"); + TORCH_CHECK( + crow_indices.get_device() == values.get_device(), + "device of crow_indices (", + crow_indices.get_device(), + ") must match device of values (", + values.get_device(), + ")"); + TORCH_CHECK( + values.device().type() == kCPU || values.device().type() == kCUDA, + "device type of values (", + values.device().type(), + ") must be CPU or CUDA"); +} + // Construction of CSR tensors. SparseCsrTensor new_csr_tensor(const TensorOptions& options) { // TODO: remove this comment after enabling autograd support for CSR tensor @@ -22,10 +126,13 @@ SparseCsrTensor new_csr_tensor(const TensorOptions& options) { TORCH_INTERNAL_ASSERT(options.layout() == kSparseCsr); DispatchKey dispatch_key; + TORCH_CHECK_NOT_IMPLEMENTED( + options.device().type() == kCPU || options.device().type() == kCUDA, + "Could not run '", "sparse_csr_tensor", "' from the '", options.device(), "' device.)"); + if (options.device().is_cuda()) { dispatch_key = DispatchKey::SparseCsrCUDA; } else { - TORCH_INTERNAL_ASSERT(options.device().is_cpu()); dispatch_key = DispatchKey::SparseCsrCPU; } @@ -33,6 +140,21 @@ SparseCsrTensor new_csr_tensor(const TensorOptions& options) { DispatchKeySet(dispatch_key), options.dtype()); } +Tensor _sparse_csr_tensor_unsafe(const Tensor& crow_indices, const Tensor& col_indices, + const Tensor& values, + IntArrayRef size, + c10::optional dtype, + c10::optional layout, + c10::optional device, + c10::optional pin_memory) { + + TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); + + SparseCsrTensor self = new_csr_tensor(options); + get_sparse_csr_impl(self)->set_member_tensors(crow_indices, col_indices, values, size); + return self; +} + // TODO: This constructor should probably use an ATen abstract method in order // to make autograd dispatch available for the CSR constructor. See the relevant // note in native_functions.yaml. 
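The type and device invariants collected in _validate_sparse_csr_tensor_args above are what a Python caller eventually hits when the CSR index tensors disagree. As a rough illustration (not part of the patch), constructing a CSR tensor with mismatched index dtypes on a build that enforces this check looks roughly like the following; the exact error text varies by version:

import torch

crow = torch.tensor([0, 2, 4], dtype=torch.int64)    # row pointers
col = torch.tensor([0, 1, 0, 1], dtype=torch.int32)  # column indices in a different index dtype
val = torch.tensor([1.0, 2.0, 3.0, 4.0])

try:
    torch.sparse_csr_tensor(crow, col, val, size=(2, 2))
except RuntimeError as err:
    print(err)  # complains that crow_indices and col_indices should have the same type

# Converting both index tensors to one dtype (int32 or int64) satisfies the invariant.
ok = torch.sparse_csr_tensor(crow, col.to(torch.int64), val, size=(2, 2))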
@@ -47,43 +169,18 @@ Tensor sparse_csr_tensor( c10::optional pin_memory) { // See [Note: hacky wrapper removal for TensorOptions] TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); - TORCH_CHECK( - options.layout() == kSparseCsr, - "expected sparse CSR layout, but got layout ", - options.layout()); - - AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "csr_construct_check", [&] { - auto crow_indices_accessor = crow_indices.accessor(); - TORCH_CHECK( - crow_indices_accessor[crow_indices.numel() - 1] <= col_indices.numel(), - "last value of crow_indices should be less than length of col_indices."); - TORCH_CHECK( - crow_indices_accessor[0] == 0, "0th value of crow_indices must be 0."); - }); - - TORCH_CHECK( - crow_indices.dim() == 1, - "crow_indices must have dim=1 but got crow_indices.dim()=", - crow_indices.dim()); - TORCH_CHECK( - col_indices.dim() == 1, - "col_indices must have dim=1 but got col_indices.dim()=", - col_indices.dim()); - TORCH_CHECK( - values.dim() == 1, - "values must have dim=1 but got values.dim()=", - values.dim()); - TORCH_CHECK( - (crow_indices.numel() - 1) == size[0], - "crow_indices.numel() must be size(0) + 1, but got: ", - crow_indices.numel()); + at::native::_validate_sparse_csr_tensor_args(crow_indices, col_indices, values, size); - SparseCsrTensor self = new_csr_tensor(options); - get_sparse_csr_impl(self)->resize_and_clear_(values.numel(), size); - get_sparse_csr_impl(self)->set_member_tensors( - crow_indices, col_indices, values); - return self; + return at::native::_sparse_csr_tensor_unsafe( + crow_indices, + col_indices, + values, + size, + optTypeMetaToScalarType(options.dtype_opt()), + options.layout_opt(), + options.device_opt(), + options.pinned_memory_opt()); } Tensor sparse_csr_tensor( @@ -96,37 +193,28 @@ Tensor sparse_csr_tensor( c10::optional pin_memory) { // See [Note: hacky wrapper removal for TensorOptions] TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory); - - TORCH_CHECK( - options.layout() == kSparseCsr, - "expected sparse CSR layout, but got layout ", - options.layout()); - TORCH_CHECK(crow_indices.numel() >= 1, "expected crow_indices.numel() >= 1, but got ", - crow_indices.numel()); - // NOLINTNEXTLINE(cppcoreguidelines-pro-type-member-init) std::array size; - if (col_indices.numel() > 0) { - size[0] = crow_indices.numel() - 1; - Tensor max_col_indices = std::get<0>(col_indices.max(0, false)); - - AT_DISPATCH_INDEX_TYPES(crow_indices.scalar_type(), "csr_construct_check", [&] { - auto crow_indices_accessor = crow_indices.accessor(); - TORCH_CHECK( - crow_indices_accessor[crow_indices.numel() - 1] <= col_indices.numel(), - "last value of crow_indices should be less than length of col_indices."); - TORCH_CHECK( - crow_indices_accessor[0] == 0, "0th value of crow_indices must be 0."); - - size[1] = *max_col_indices.data_ptr() + 1; + AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "csr_construct_check", [&] { + size[0] = crow_indices.numel() - 1; + size[1] = col_indices.max().item() + 1; }); } else { size[0] = 0; size[1] = 0; } - return at::sparse_csr_tensor( - crow_indices, col_indices, values, size, options); + at::native::_validate_sparse_csr_tensor_args(crow_indices, col_indices, values, size); + + return at::native::_sparse_csr_tensor_unsafe( + crow_indices, + col_indices, + values, + size, + optTypeMetaToScalarType(options.dtype_opt()), + options.layout_opt(), + options.device_opt(), + options.pinned_memory_opt()); } 
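When no size argument is supplied, the overload above infers it from the indices: crow_indices.numel() - 1 rows and col_indices.max() + 1 columns, falling back to (0, 0) when col_indices is empty. A small Python rendering of that inference (an illustrative helper, not a PyTorch API):

import torch

def infer_csr_size(crow_indices, col_indices):
    # An empty col_indices means an empty matrix in this construction path.
    if col_indices.numel() == 0:
        return (0, 0)
    rows = crow_indices.numel() - 1    # one pointer per row plus a trailing entry
    cols = int(col_indices.max()) + 1  # column indices are 0-based
    return (rows, cols)

crow = torch.tensor([0, 2, 4])
col = torch.tensor([0, 1, 0, 1])
print(infer_csr_size(crow, col))  # (2, 2)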
// Access members of CSR tensors. diff --git a/aten/src/ATen/test/math_kernel_test.cpp b/aten/src/ATen/test/math_kernel_test.cpp index 005c11cb0ea..8c01688825c 100644 --- a/aten/src/ATen/test/math_kernel_test.cpp +++ b/aten/src/ATen/test/math_kernel_test.cpp @@ -110,6 +110,15 @@ TEST(MathKernelTest, SiluBackward) { ASSERT_ALLCLOSE_TOLERANCES(out, math_out, 1e-4, 1e-6); } +// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) +TEST(MathKernelTest, MishBackward) { + const auto input = rand({20, 10}); + const auto grad_output = rand({20, 10}); + auto out = at::native::mish_backward(grad_output, input); + auto math_out = at::native::math_mish_backward(grad_output, input); + ASSERT_ALLCLOSE_TOLERANCES(out, math_out, 1e-4, 1e-6); +} + // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) TEST(MathKernelTest, NarrowCopy) { auto x = rand({5, 8, 7}); diff --git a/aten/src/ATen/test/test_thread_pool_guard.cpp b/aten/src/ATen/test/test_thread_pool_guard.cpp index 24575fb381f..33e4144c141 100644 --- a/aten/src/ATen/test/test_thread_pool_guard.cpp +++ b/aten/src/ATen/test/test_thread_pool_guard.cpp @@ -3,7 +3,6 @@ #include #include - // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) TEST(TestThreadPoolGuard, TestThreadPoolGuard) { auto threadpool_ptr = caffe2::pthreadpool_(); @@ -30,3 +29,33 @@ TEST(TestThreadPoolGuard, TestThreadPoolGuard) { ASSERT_NE(threadpool_ptr4, nullptr); ASSERT_EQ(threadpool_ptr4, threadpool_ptr); } + +TEST(TestThreadPoolGuard, TestRunWithGuard) { + const std::vector array = {1, 2, 3}; + + // Run via pthreadpool_parallelize_1d + int64_t outer = 0; + auto fn1 = [&array, &outer](const size_t task_id) { + outer += array[task_id]; + }; + auto pool = caffe2::pthreadpool(); + pool->run(fn1, 3); + + int64_t inner = 0; + { + // Run on same thread + caffe2::_NoPThreadPoolGuard g1; + auto fn2 = [&array, &inner](const size_t task_id) { + inner += array[task_id]; + }; + pool->run(fn2, 3); + + // confirm the guard is on + auto threadpool_ptr1 = caffe2::pthreadpool_(); + ASSERT_EQ(threadpool_ptr1, nullptr); + } + ASSERT_NE(outer, 0); + ASSERT_NE(inner, 0); + ASSERT_EQ(outer, 6); + ASSERT_EQ(inner, 6); +} diff --git a/benchmarks/cpp/tensorexpr/bench_approx.cpp b/benchmarks/cpp/tensorexpr/bench_approx.cpp index 55e48601673..1f09b1dbac5 100644 --- a/benchmarks/cpp/tensorexpr/bench_approx.cpp +++ b/benchmarks/cpp/tensorexpr/bench_approx.cpp @@ -12,17 +12,18 @@ using namespace torch::jit::tensorexpr; void vectorize(tensorexpr::LoopNest* ln, tensorexpr::Tensor* target, int width) { auto loops = ln->getLoopStmtsFor(target); - For *outer, *inner, *tail; - ln->splitWithTail(loops[0], width, &outer, &inner, &tail); + For *inner, *tail; + ln->splitWithTail(loops[0], width, &inner, &tail); ln->vectorize(inner); } void optimizePointwise(tensorexpr::LoopNest* ln, tensorexpr::Tensor* target) { std::vector loops = ln->getLoopStmtsFor(target); - For *outer, *inner, *tail; - ln->splitWithTail(loops[0], 16 * 8, &outer, &inner, &tail); + For *inner, *tail; + ln->splitWithTail(loops[0], 16 * 8, &inner, &tail); + For* outer = loops[0]; ln->vectorize(inner); - ln->splitWithTail(outer, 8, &outer, &inner, &tail); + ln->splitWithTail(outer, 8, &inner, &tail); Stmt* unrolled; LoopNest::unroll(inner, &unrolled); } diff --git a/benchmarks/cpp/tensorexpr/bench_gemm.cpp b/benchmarks/cpp/tensorexpr/bench_gemm.cpp index 78855264a5b..792d457c2f2 100644 --- a/benchmarks/cpp/tensorexpr/bench_gemm.cpp +++ b/benchmarks/cpp/tensorexpr/bench_gemm.cpp @@ -81,16 +81,12 @@ 
BENCHMARK_DEFINE_F(Gemm, TensorExprTile32x32)(benchmark::State& state) { { auto const& loops = loop.getLoopStmtsFor(CT); te::For* m = loops[0]; - te::For* mo; - te::For* mi; - loop.splitWithMask(m, 32, &mo, &mi); + loop.splitWithMask(m, 32); } { auto const& loops = loop.getLoopStmtsFor(CT); te::For* n = loops[2]; - te::For* no; - te::For* ni; - loop.splitWithMask(n, 32, &no, &ni); + loop.splitWithMask(n, 32); } // mo, mi, no, ni, k -> // mo, no, mi, ni, k @@ -145,16 +141,12 @@ BENCHMARK_DEFINE_F(Gemm, TensorExprTile4x16)(benchmark::State& state) { { auto const& loops = loop.getLoopStmtsFor(CT); te::For* m = loops[0]; - te::For* mo; - te::For* mi; - loop.splitWithMask(m, 4, &mo, &mi); + loop.splitWithMask(m, 4); } { auto const& loops = loop.getLoopStmtsFor(CT); te::For* n = loops[2]; - te::For* no; - te::For* ni; - loop.splitWithMask(n, 16, &no, &ni); + loop.splitWithMask(n, 16); } // mo, mi, no, ni, k -> // mo, no, mi, ni, k @@ -209,16 +201,12 @@ BENCHMARK_DEFINE_F(Gemm, TensorExprTile4x16VecUnroll)(benchmark::State& state) { { auto const& loops = loop.getLoopStmtsFor(CT); te::For* m = loops[0]; - te::For* mo; - te::For* mi; - loop.splitWithMask(m, 4, &mo, &mi); + loop.splitWithMask(m, 4); } { auto const& loops = loop.getLoopStmtsFor(CT); te::For* n = loops[2]; - te::For* no; - te::For* ni; - loop.splitWithMask(n, 16, &no, &ni); + loop.splitWithMask(n, 16); } // mo, mi, no, ni, k -> // mo, no, mi, ni, k @@ -281,16 +269,12 @@ BENCHMARK_DEFINE_F(Gemm, TensorExprTile4x16Cache)(benchmark::State& state) { { auto const& loops = loop.getLoopStmtsFor(CT); te::For* m = loops[0]; - te::For* mo; - te::For* mi; - loop.splitWithMask(m, 4, &mo, &mi); + loop.splitWithMask(m, 4); } { auto const& loops = loop.getLoopStmtsFor(CT); te::For* n = loops[2]; - te::For* no; - te::For* ni; - loop.splitWithMask(n, 16, &no, &ni); + loop.splitWithMask(n, 16); } // mo, mi, no, ni, k -> // mo, no, mi, ni, k diff --git a/benchmarks/cpp/tensorexpr/bench_reduce.cpp b/benchmarks/cpp/tensorexpr/bench_reduce.cpp index 39462de17ff..d0468139176 100644 --- a/benchmarks/cpp/tensorexpr/bench_reduce.cpp +++ b/benchmarks/cpp/tensorexpr/bench_reduce.cpp @@ -266,10 +266,7 @@ BENCHMARK_DEFINE_F(Reduce1D, TeSplitTail)(benchmark::State& state) { { auto const& loops = loop.getLoopStmtsFor(BT); te::For* m = loops[1]; - te::For* mo; - te::For* mi; - te::For* tail; - loop.splitWithTail(m, kChunkSize, &mo, &mi, &tail); + loop.splitWithTail(m, kChunkSize); } loop.prepareForCodegen(); @@ -310,9 +307,7 @@ BENCHMARK_DEFINE_F(Reduce1D, TeSplitMask)(benchmark::State& state) { { auto const& loops = loop.getLoopStmtsFor(BT); te::For* m = loops[1]; - te::For* mo; - te::For* mi; - loop.splitWithMask(m, kChunkSize, &mo, &mi); + loop.splitWithMask(m, kChunkSize); } loop.prepareForCodegen(); @@ -354,9 +349,9 @@ BENCHMARK_DEFINE_F(Reduce1D, TeRfactorV1)(benchmark::State& state) { auto loops = loop.getLoopStmtsFor(BT); TORCH_CHECK(loops.size() == 1); - te::For* mo; te::For* mi; - loop.splitWithMask(loops.at(0), kChunkSize, &mo, &mi); + loop.splitWithMask(loops.at(0), kChunkSize, &mi); + te::For* mo = loops.at(0); loop.reorderAxis(mo, mi); loops = loop.getLoopStmtsFor(BT); diff --git a/c10/core/Scalar.h b/c10/core/Scalar.h index 802bf17e041..4c0baa431d5 100644 --- a/c10/core/Scalar.h +++ b/c10/core/Scalar.h @@ -63,8 +63,9 @@ class C10_API Scalar { AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF(DEFINE_ACCESSOR) // also support scalar.to(); + // Deleted for unsupported types, but specialized below for supported types template - T to() const; + T to() 
const = delete; #undef DEFINE_ACCESSOR bool isFloatingPoint() const { @@ -186,11 +187,6 @@ class C10_API Scalar { }; // define the scalar.to() specializations -template -inline T Scalar::to() const { - throw std::runtime_error("to() cast to unexpected type."); -} - #define DEFINE_TO(T, name) \ template <> \ inline T Scalar::to() const { \ diff --git a/c10/core/TensorImpl.h b/c10/core/TensorImpl.h index 5e973da15fc..e383ffb4c57 100644 --- a/c10/core/TensorImpl.h +++ b/c10/core/TensorImpl.h @@ -2023,6 +2023,22 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { return n; } + /** + * Compute the number of elements based on the sizes of a + * tensor. Catches integer overflow that may occur when a tensor + * using a sparse layout has multiple dimensions with large sizes. + */ + int64_t safe_compute_numel() const { + int64_t n = 1; + for (auto s : sizes()) { + TORCH_CHECK( + s == 0 || n <= std::numeric_limits::max() / s, + "numel: integer multiplication overflow"); + n *= s; + } + return n; + } + /** * Compute whether or not a tensor is contiguous based on the sizes and * strides of a tensor. @@ -2041,12 +2057,27 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target { protected: /** - * Recompute the cached numel of a tensor. Call this if you modify sizes. + * Recompute the cached numel of a tensor. Call this if you modify + * sizes. + * + * For tensors with sparse layouts, use safe_refresh_numel() instead + * because it will catch integer overflow that may occur for tensors + * with sparse layouts and large dimensions. */ void refresh_numel() { numel_ = compute_numel(); } + /** + * Recompute the cached numel of a tensor. Call this if you modify + * sizes. Use only for tensors with sparse layouts because only + * sparse tensor are likely to have sizes that may lead to integer + * overflow when computing numel. + */ + void safe_refresh_numel() { + numel_ = safe_compute_numel(); + } + /** * Recompute the cached contiguity of a tensor. Call this if you modify sizes * or strides. 
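safe_compute_numel() above guards the element-count product against signed 64-bit overflow, which mostly matters for sparse layouts whose nominal sizes can be enormous. The same guard written out in Python, purely as an illustration of the check:

def safe_numel(sizes, int64_max=2**63 - 1):
    n = 1
    for s in sizes:
        # Reject the multiplication before it would overflow an int64 count.
        if s != 0 and n > int64_max // s:
            raise RuntimeError("numel: integer multiplication overflow")
        n *= s
    return n

print(safe_numel([2**40, 2**20]))  # 2**60, still representable
# safe_numel([2**40, 2**40])       # would raise: product exceeds int64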
diff --git a/caffe2/python/operator_test/activation_ops_test.py b/caffe2/python/operator_test/activation_ops_test.py index 7e5c5f42360..47216d51500 100644 --- a/caffe2/python/operator_test/activation_ops_test.py +++ b/caffe2/python/operator_test/activation_ops_test.py @@ -243,7 +243,7 @@ class TestActivations(serial.SerializedTestCase): @given(X=hu.tensor(), fast_gelu=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_gelu(self, X, fast_gelu, gc, dc): op = core.CreateOperator( "Gelu", diff --git a/caffe2/python/operator_test/adadelta_test.py b/caffe2/python/operator_test/adadelta_test.py index 930f74ecd99..6c40c379697 100644 --- a/caffe2/python/operator_test/adadelta_test.py +++ b/caffe2/python/operator_test/adadelta_test.py @@ -53,7 +53,7 @@ class TestAdadelta(serial.SerializedTestCase): decay=hu.floats(min_value=0.01, max_value=0.99, allow_nan=False, allow_infinity=False), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_adadelta(self, inputs, lr, epsilon, decay, gc, dc): param, moment, moment_delta, grad = inputs moment = np.abs(moment) diff --git a/caffe2/python/operator_test/adagrad_test.py b/caffe2/python/operator_test/adagrad_test.py index 309c54a25cb..3172026df1b 100644 --- a/caffe2/python/operator_test/adagrad_test.py +++ b/caffe2/python/operator_test/adagrad_test.py @@ -26,7 +26,7 @@ class TestAdagrad(serial.SerializedTestCase): weight_decay=st.sampled_from([0.0, 0.1]), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=10000) def test_adagrad(self, inputs, lr, epsilon, weight_decay, gc, dc): param, momentum, grad = inputs momentum = np.abs(momentum) @@ -98,7 +98,7 @@ class TestAdagrad(serial.SerializedTestCase): ), **hu.gcs_cpu_only ) - @settings(deadline=1000) + @settings(deadline=10000) def test_adagrad_output_effective_lr_and_update(self, inputs, lr, epsilon, gc, dc): param, momentum, grad = inputs momentum = np.abs(momentum) @@ -158,7 +158,7 @@ class TestAdagrad(serial.SerializedTestCase): ), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=10000) def test_sparse_adagrad_empty(self, inputs, lr, epsilon, gc, dc): param, momentum = inputs grad = np.empty(shape=(0,) + param.shape[1:], dtype=np.float32) @@ -190,7 +190,7 @@ class TestAdagrad(serial.SerializedTestCase): # Suppress filter_too_much health check. # Likely caused by `assume` call falling through too often. 
- @settings(suppress_health_check=[HealthCheck.filter_too_much], deadline=1000) + @settings(suppress_health_check=[HealthCheck.filter_too_much], deadline=10000) @given( inputs=hu.tensors(n=3), lr=st.floats( diff --git a/caffe2/python/operator_test/assert_test.py b/caffe2/python/operator_test/assert_test.py index 2bbca5ab737..eef33bc22bc 100644 --- a/caffe2/python/operator_test/assert_test.py +++ b/caffe2/python/operator_test/assert_test.py @@ -14,7 +14,7 @@ class TestAssert(hu.HypothesisTestCase): dtype=st.sampled_from(['bool_', 'int32', 'int64']), shape=st.lists(elements=st.integers(1, 10), min_size=1, max_size=4), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_assert(self, dtype, shape, gc, dc): test_tensor = np.random.rand(*shape).astype(np.dtype(dtype)) diff --git a/caffe2/python/operator_test/batch_sparse_to_dense_op_test.py b/caffe2/python/operator_test/batch_sparse_to_dense_op_test.py index adfc735c66f..968da8da840 100644 --- a/caffe2/python/operator_test/batch_sparse_to_dense_op_test.py +++ b/caffe2/python/operator_test/batch_sparse_to_dense_op_test.py @@ -19,7 +19,7 @@ class TestBatchSparseToDense(serial.SerializedTestCase): default_value=st.floats(min_value=2.0, max_value=3.0), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=None) def test_batch_sparse_to_dense( self, batch_size, dense_last_dim, default_value, gc, dc ): @@ -75,7 +75,7 @@ class TestBatchSparseToDense(serial.SerializedTestCase): dense_last_dim=st.integers(5, 10), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=None) def test_batch_dense_to_sparse(self, batch_size, dense_last_dim, gc, dc): L = np.random.randint(1, dense_last_dim + 1, size=(batch_size)) # The following logic ensure that indices in each batch will not be duplicated diff --git a/caffe2/python/operator_test/bbox_transform_test.py b/caffe2/python/operator_test/bbox_transform_test.py index d2584f18af4..adcc2f8723d 100644 --- a/caffe2/python/operator_test/bbox_transform_test.py +++ b/caffe2/python/operator_test/bbox_transform_test.py @@ -214,7 +214,7 @@ class TestBBoxTransformOp(serial.SerializedTestCase): clip_angle_thresh=st.sampled_from([-1.0, 1.0]), **hu.gcs_cpu_only ) - @settings(deadline=1000) + @settings(deadline=10000) def test_bbox_transform( self, num_rois, @@ -282,7 +282,7 @@ class TestBBoxTransformOp(serial.SerializedTestCase): clip_angle_thresh=st.sampled_from([-1.0, 1.0]), **hu.gcs_cpu_only ) - @settings(deadline=1000) + @settings(deadline=10000) def test_bbox_transform_batch( self, roi_counts, diff --git a/caffe2/python/operator_test/boolean_mask_test.py b/caffe2/python/operator_test/boolean_mask_test.py index 38fe4389999..0ccdbd92851 100644 --- a/caffe2/python/operator_test/boolean_mask_test.py +++ b/caffe2/python/operator_test/boolean_mask_test.py @@ -15,7 +15,7 @@ class TestBooleanMaskOp(serial.SerializedTestCase): max_len=100, elements=hu.floats(min_value=0.5, max_value=1.0)), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_boolean_mask_gradient(self, x, gc, dc): op = core.CreateOperator("BooleanMask", ["data", "mask"], @@ -30,7 +30,7 @@ class TestBooleanMaskOp(serial.SerializedTestCase): max_len=5, elements=hu.floats(min_value=0.5, max_value=1.0)), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_boolean_mask(self, x, gc, dc): op = core.CreateOperator("BooleanMask", ["data", "mask"], diff --git a/caffe2/python/operator_test/box_with_nms_limit_op_test.py b/caffe2/python/operator_test/box_with_nms_limit_op_test.py index 
3131316feef..e459edb57de 100644 --- a/caffe2/python/operator_test/box_with_nms_limit_op_test.py +++ b/caffe2/python/operator_test/box_with_nms_limit_op_test.py @@ -83,7 +83,7 @@ class TestBoxWithNMSLimitOp(serial.SerializedTestCase): self.assertReferenceChecks(gc, op, [scores, boxes], ref) @given(**HU_CONFIG) - @settings(deadline=1000) + @settings(deadline=10000) def test_score_thresh(self, gc): in_centers = [(0, 0), (20, 20), (50, 50)] in_scores = [0.7, 0.85, 0.6] @@ -102,7 +102,7 @@ class TestBoxWithNMSLimitOp(serial.SerializedTestCase): self.assertReferenceChecks(gc, op, [scores, boxes], ref) @given(det_per_im=st.integers(1, 3), **HU_CONFIG) - @settings(deadline=1000) + @settings(deadline=10000) def test_detections_per_im(self, det_per_im, gc): in_centers = [(0, 0), (20, 20), (50, 50)] in_scores = [0.7, 0.85, 0.6] @@ -131,7 +131,7 @@ class TestBoxWithNMSLimitOp(serial.SerializedTestCase): output_classes_include_bg_cls=st.booleans(), **HU_CONFIG ) - @settings(deadline=1000) + @settings(deadline=10000) def test_multiclass( self, num_classes, diff --git a/caffe2/python/operator_test/clip_op_test.py b/caffe2/python/operator_test/clip_op_test.py index 3304121aab0..0e800dafe01 100644 --- a/caffe2/python/operator_test/clip_op_test.py +++ b/caffe2/python/operator_test/clip_op_test.py @@ -19,7 +19,7 @@ class TestClip(serial.SerializedTestCase): max_=st.floats(min_value=0, max_value=2), inplace=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_clip(self, X, min_, max_, inplace, gc, dc): # go away from the origin point to avoid kink problems if np.isscalar(X): diff --git a/caffe2/python/operator_test/clip_tensor_op_test.py b/caffe2/python/operator_test/clip_tensor_op_test.py index efc86815bc4..c90c38234c8 100644 --- a/caffe2/python/operator_test/clip_tensor_op_test.py +++ b/caffe2/python/operator_test/clip_tensor_op_test.py @@ -19,7 +19,7 @@ class TestClipTensorByScalingOp(serial.SerializedTestCase): use_additional_threshold=st.booleans(), inplace=st.booleans(), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_clip_tensor_by_scaling(self, n, d, threshold, additional_threshold, use_additional_threshold, inplace, gc, dc): diff --git a/caffe2/python/operator_test/conv_test.py b/caffe2/python/operator_test/conv_test.py index e600aa2c9ee..23217b15b82 100644 --- a/caffe2/python/operator_test/conv_test.py +++ b/caffe2/python/operator_test/conv_test.py @@ -164,7 +164,7 @@ class TestConvolution(serial.SerializedTestCase): use_bias=st.booleans(), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=None) def test_convolution_separate_stride_pad_layout( self, op_type, @@ -761,7 +761,7 @@ class TestConvolution(serial.SerializedTestCase): engine=st.sampled_from(["CUDNN", ""]), **hu.gcs_no_hip ) - @settings(deadline=1000) + @settings(deadline=None) def test_convolution_sync(self, net_type, num_workers, engine, gc, dc): m = ModelHelper(name="test_model") n = 1 diff --git a/caffe2/python/operator_test/crf_test.py b/caffe2/python/operator_test/crf_test.py index 4d7b90c431a..a4447fa3f36 100644 --- a/caffe2/python/operator_test/crf_test.py +++ b/caffe2/python/operator_test/crf_test.py @@ -15,7 +15,7 @@ class TestCRFOp(hu.HypothesisTestCase): @given(num_tags=st.integers(2, 4), num_words=st.integers(2, 15)) - @settings(deadline=1000) + @settings(deadline=10000) def test_crf_with_loss_op(self, num_tags, num_words): model = ModelHelper(name='external') embeddings_dim = 200 diff --git a/caffe2/python/operator_test/dropout_op_test.py 
b/caffe2/python/operator_test/dropout_op_test.py index 84c2f7e35f5..d3a5c831d87 100644 --- a/caffe2/python/operator_test/dropout_op_test.py +++ b/caffe2/python/operator_test/dropout_op_test.py @@ -48,7 +48,7 @@ class TestDropout(serial.SerializedTestCase): output_mask=st.booleans(), engine=st.sampled_from(["", "CUDNN"]), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_dropout_ratio0(self, X, in_place, output_mask, engine, gc, dc): """Test with ratio=0 for a deterministic reference impl.""" # TODO(lukeyeager): enable this path when the op is fixed diff --git a/caffe2/python/operator_test/elementwise_op_broadcast_test.py b/caffe2/python/operator_test/elementwise_op_broadcast_test.py index 605c1d74127..bd19ebc6ed9 100644 --- a/caffe2/python/operator_test/elementwise_op_broadcast_test.py +++ b/caffe2/python/operator_test/elementwise_op_broadcast_test.py @@ -75,22 +75,22 @@ class TestElementwiseBroadcast(serial.SerializedTestCase): self.assertGradientChecks(gc, op, [X, Y], 1, [0]) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_broadcast_Add(self, gc, dc): self.__test_binary_op(gc, dc, "Add", operator.add) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_broadcast_Mul(self, gc, dc): self.__test_binary_op(gc, dc, "Mul", operator.mul) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_broadcast_Sub(self, gc, dc): self.__test_binary_op(gc, dc, "Sub", operator.sub) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_broadcast_powt(self, gc, dc): np.random.seed(101) diff --git a/caffe2/python/operator_test/elementwise_ops_test.py b/caffe2/python/operator_test/elementwise_ops_test.py index 922e4554e9a..130ebade010 100644 --- a/caffe2/python/operator_test/elementwise_ops_test.py +++ b/caffe2/python/operator_test/elementwise_ops_test.py @@ -59,7 +59,7 @@ class TestElementwiseOps(hu.HypothesisTestCase): @given(n=st.integers(0, 6), m=st.integers(4, 6), seed=st.integers(0, 1000), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_log(self, n, m, gc, dc, seed): np.random.seed(seed) X = np.random.rand(n, m).astype(np.float32) + 1.0 @@ -326,7 +326,7 @@ class TestElementwiseOps(hu.HypothesisTestCase): @given(n=st.integers(0, 6), m=st.integers(4, 6), seed=st.integers(0, 1000), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_swish_gradient_inplace(self, n, m, gc, dc, seed): np.random.seed(seed) @@ -354,7 +354,7 @@ class TestElementwiseOps(hu.HypothesisTestCase): @given(X=hu.tensor(dtype=np.float32), inplace=st.booleans(), engine=st.sampled_from(["", "CUDNN"]), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_sigmoid(self, X, inplace, engine, gc, dc): op = core.CreateOperator( "Sigmoid", diff --git a/caffe2/python/operator_test/erf_op_test.py b/caffe2/python/operator_test/erf_op_test.py index 64714db4315..a4ed0d5fb23 100644 --- a/caffe2/python/operator_test/erf_op_test.py +++ b/caffe2/python/operator_test/erf_op_test.py @@ -18,7 +18,7 @@ class TestErfOp(serial.SerializedTestCase): @given( X=hu.tensor(elements=hu.floats(min_value=-0.7, max_value=0.7)), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_erf(self, X, gc, dc): op = core.CreateOperator('Erf', ["X"], ["Y"]) self.assertReferenceChecks(gc, op, [X], lambda x: (np.vectorize(math.erf)(X),)) diff --git a/caffe2/python/operator_test/expand_op_test.py b/caffe2/python/operator_test/expand_op_test.py index 
aba2c1106da..bd608f6fcc2 100644 --- a/caffe2/python/operator_test/expand_op_test.py +++ b/caffe2/python/operator_test/expand_op_test.py @@ -59,7 +59,7 @@ class TestExpandOp(serial.SerializedTestCase): np.ones([1, 4, 1, 2]), np.ones([4, 1, 2])]), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_expand_nonrand_shape2(self, X, gc, dc): self._run_expand_op_test(X, [4, 1, 2, 2], gc, dc) self._run_expand_op_test(X, [4, -1, 2, 2], gc, dc) diff --git a/caffe2/python/operator_test/fc_operator_test.py b/caffe2/python/operator_test/fc_operator_test.py index 1e8b5522053..bd203b7c84a 100644 --- a/caffe2/python/operator_test/fc_operator_test.py +++ b/caffe2/python/operator_test/fc_operator_test.py @@ -61,8 +61,8 @@ class TestFcOperator(serial.SerializedTestCase): op.arg.extend([a]) # Check against numpy reference - # ReferenceChecks is flaky on rocm with threshold of 1e-4 for fp16. Relaxing to 1e-3. - threshold = 1e-3 if (gc.device_type == caffe2_pb2.HIP and dtype == np.float16) else 1e-4 + # ReferenceChecks is flaky, Relaxing to 1e-3. + threshold = 1e-3 self.assertReferenceChecks( device_option=gc, op=op, diff --git a/caffe2/python/operator_test/filler_ops_test.py b/caffe2/python/operator_test/filler_ops_test.py index e080dde3eb5..442f5866cb0 100644 --- a/caffe2/python/operator_test/filler_ops_test.py +++ b/caffe2/python/operator_test/filler_ops_test.py @@ -22,7 +22,7 @@ def _fill_diagonal(shape, value): class TestFillerOperator(serial.SerializedTestCase): @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_shape_error(self, gc, dc): op = core.CreateOperator( 'GaussianFill', @@ -77,7 +77,7 @@ class TestFillerOperator(serial.SerializedTestCase): b=st.integers(min_value=0, max_value=100), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=10000) def test_uniform_int_fill_op_blob_input(self, shape, a, b, gc, dc): net = core.Net('test_net') diff --git a/caffe2/python/operator_test/flexible_top_k_test.py b/caffe2/python/operator_test/flexible_top_k_test.py index 3e0e5722b0c..0cccabb5f2e 100644 --- a/caffe2/python/operator_test/flexible_top_k_test.py +++ b/caffe2/python/operator_test/flexible_top_k_test.py @@ -40,7 +40,7 @@ class TestFlexibleTopK(serial.SerializedTestCase): return (values_ref, indices_ref) @given(X=hu.tensor(min_dim=2), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_flexible_top_k(self, X, gc, dc): X = X.astype(dtype=np.float32) k_shape = (int(X.size / X.shape[-1]), ) diff --git a/caffe2/python/operator_test/fused_nbit_rowwise_conversion_ops_test.py b/caffe2/python/operator_test/fused_nbit_rowwise_conversion_ops_test.py index b7cb5f68351..d2e794da065 100644 --- a/caffe2/python/operator_test/fused_nbit_rowwise_conversion_ops_test.py +++ b/caffe2/python/operator_test/fused_nbit_rowwise_conversion_ops_test.py @@ -205,7 +205,7 @@ def ErrorThresholdRow(X, bit_rate): class TestNBitFakeFused(hu.HypothesisTestCase): @given(bit_rate=st.sampled_from([2, 4])) - @settings(deadline=1000) + @settings(deadline=10000) def testNBit(self, bit_rate): # uncomment for debugging # np.random.seed(0) diff --git a/caffe2/python/operator_test/gather_ops_test.py b/caffe2/python/operator_test/gather_ops_test.py index fc23be13fda..b0d64506e4c 100644 --- a/caffe2/python/operator_test/gather_ops_test.py +++ b/caffe2/python/operator_test/gather_ops_test.py @@ -209,7 +209,7 @@ class TestGatherFused8BitRowwise(hu.HypothesisTestCase): cols_num=st.integers(1, 128), index_num=st.integers(0, 5000), **hu.gcs) - 
@settings(deadline=1000) + @settings(deadline=10000) def test_batch_gather_ops(self, rows_num, cols_num, index_num, gc, dc): data = np.random.random((rows_num, cols_num)).astype(np.float32) ind = np.random.randint(rows_num, size=(index_num, )).astype('int32') diff --git a/caffe2/python/operator_test/gather_ranges_op_test.py b/caffe2/python/operator_test/gather_ranges_op_test.py index c0d73af3360..b6ec8823f4d 100644 --- a/caffe2/python/operator_test/gather_ranges_op_test.py +++ b/caffe2/python/operator_test/gather_ranges_op_test.py @@ -166,7 +166,7 @@ def gather_ranges_to_dense_with_key(data, ranges, key, lengths): class TestGatherRanges(serial.SerializedTestCase): @given(boarders_and_data=batched_boarders_and_data(), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_gather_ranges(self, boarders_and_data, gc, dc): boarders, data = boarders_and_data @@ -187,7 +187,7 @@ class TestGatherRanges(serial.SerializedTestCase): ) @given(tensor_splits=_tensor_splits(), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_gather_ranges_split(self, tensor_splits, gc, dc): data, ranges, lengths, _ = tensor_splits diff --git a/caffe2/python/operator_test/instance_norm_test.py b/caffe2/python/operator_test/instance_norm_test.py index efce9d7001f..d97385cbe21 100644 --- a/caffe2/python/operator_test/instance_norm_test.py +++ b/caffe2/python/operator_test/instance_norm_test.py @@ -60,7 +60,7 @@ class TestInstanceNorm(serial.SerializedTestCase): store_mean=st.booleans(), seed=st.integers(0, 1000), store_inv_stdev=st.booleans()) - @settings(deadline=1000) + @settings(deadline=10000) def test_instance_norm_gradients( self, gc, dc, N, C, H, W, order, store_mean, store_inv_stdev, epsilon, seed): diff --git a/caffe2/python/operator_test/layer_norm_op_test.py b/caffe2/python/operator_test/layer_norm_op_test.py index 67d7f14bd33..32a2511e3e8 100644 --- a/caffe2/python/operator_test/layer_norm_op_test.py +++ b/caffe2/python/operator_test/layer_norm_op_test.py @@ -322,7 +322,7 @@ class TestLayerNormOp(serial.SerializedTestCase): eps=st.floats(1e-5, 1e-3), elementwise_affine=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_layer_norm_op_jit(self, X, eps, elementwise_affine, gc, dc): @torch.jit.script def jit_layer_norm( diff --git a/caffe2/python/operator_test/length_split_op_test.py b/caffe2/python/operator_test/length_split_op_test.py index 28d7134ac5e..3f20ff1f458 100644 --- a/caffe2/python/operator_test/length_split_op_test.py +++ b/caffe2/python/operator_test/length_split_op_test.py @@ -28,7 +28,7 @@ class TestLengthSplitOperator(serial.SerializedTestCase): return [np.array(output).astype(np.int32)] @given(**hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_length_split_edge(self, gc, dc): input_lengths = np.array([3, 4, 5]).astype(np.int32) n_split_ = np.array([5]).astype(np.int32) diff --git a/caffe2/python/operator_test/locally_connected_op_test.py b/caffe2/python/operator_test/locally_connected_op_test.py index 2adc253f4d8..445c3641573 100644 --- a/caffe2/python/operator_test/locally_connected_op_test.py +++ b/caffe2/python/operator_test/locally_connected_op_test.py @@ -103,7 +103,7 @@ class TestLocallyConnectedOp(serial.SerializedTestCase): op_name=st.sampled_from(["LC", "LC1D"]), use_bias=st.booleans(), **hu.gcs) - @settings(deadline=5000) + @settings(deadline=None) # Increased timeout from 1 second to 5 for ROCM def test_lc_1d(self, N, C, size, M, kernel, op_name, use_bias, 
gc, dc): if size < kernel: @@ -163,7 +163,7 @@ class TestLocallyConnectedOp(serial.SerializedTestCase): op_name=st.sampled_from(["LC", "LC3D"]), use_bias=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_lc_3d(self, N, C, T, H, W, M, kernel, op_name, use_bias, gc, dc): if T < kernel: kernel = T diff --git a/caffe2/python/operator_test/lpnorm_op_test.py b/caffe2/python/operator_test/lpnorm_op_test.py index 3a58cbe6d96..e7ab634d0e7 100644 --- a/caffe2/python/operator_test/lpnorm_op_test.py +++ b/caffe2/python/operator_test/lpnorm_op_test.py @@ -16,7 +16,7 @@ class LpnormTest(hu.HypothesisTestCase): max_dim=3, dtype=np.float32), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_Lp_Norm(self, inputs, gc, dc): X = inputs[0] # avoid kinks by moving away from 0 diff --git a/caffe2/python/operator_test/margin_ranking_criterion_op_test.py b/caffe2/python/operator_test/margin_ranking_criterion_op_test.py index e28dd1ce28f..a91de60a8c1 100644 --- a/caffe2/python/operator_test/margin_ranking_criterion_op_test.py +++ b/caffe2/python/operator_test/margin_ranking_criterion_op_test.py @@ -17,7 +17,7 @@ class TestMarginRankingCriterion(serial.SerializedTestCase): seed=st.integers(min_value=0, max_value=65535), margin=st.floats(min_value=-0.5, max_value=0.5), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_margin_ranking_criterion(self, N, seed, margin, gc, dc): np.random.seed(seed) X1 = np.random.randn(N).astype(np.float32) diff --git a/caffe2/python/operator_test/matmul_op_test.py b/caffe2/python/operator_test/matmul_op_test.py index 8b4001a574a..067eeabbe2d 100644 --- a/caffe2/python/operator_test/matmul_op_test.py +++ b/caffe2/python/operator_test/matmul_op_test.py @@ -60,7 +60,7 @@ class TestMatMul(serial.SerializedTestCase): trans_b=st.booleans(), **hu.gcs ) - @settings(deadline=1000) + @settings(deadline=10000) def test_matmul_axis( self, M, K, N, axis_a, axis_b, trans_a, trans_b, gc, dc ): diff --git a/caffe2/python/operator_test/one_hot_ops_test.py b/caffe2/python/operator_test/one_hot_ops_test.py index 593d5b5aa58..e23e04434ab 100644 --- a/caffe2/python/operator_test/one_hot_ops_test.py +++ b/caffe2/python/operator_test/one_hot_ops_test.py @@ -63,7 +63,7 @@ class TestOneHotOps(serial.SerializedTestCase): elements=st.integers(min_value=-5, max_value=5)), seed=st.integers(min_value=0, max_value=1000), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_batch_bucketized_one_hot(self, x, seed, gc, dc): np.random.seed(seed) d = x.shape[1] diff --git a/caffe2/python/operator_test/pooling_test.py b/caffe2/python/operator_test/pooling_test.py index 7ef98249bd7..2954face6b8 100644 --- a/caffe2/python/operator_test/pooling_test.py +++ b/caffe2/python/operator_test/pooling_test.py @@ -90,7 +90,7 @@ class TestPooling(hu.HypothesisTestCase): op_type=st.sampled_from(["MaxPool", "AveragePool", "MaxPool1D", "AveragePool1D"]), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_pooling_1d(self, stride, pad, kernel, size, input_channels, batch_size, order, op_type, gc, dc): assume(pad < kernel) diff --git a/caffe2/python/operator_test/python_op_test.py b/caffe2/python/operator_test/python_op_test.py index b071070151d..8f41815585d 100644 --- a/caffe2/python/operator_test/python_op_test.py +++ b/caffe2/python/operator_test/python_op_test.py @@ -14,7 +14,7 @@ class PythonOpTest(hu.HypothesisTestCase): @given(x=hu.tensor(), n=st.integers(min_value=1, max_value=20), 
w=st.integers(min_value=1, max_value=20)) - @settings(deadline=1000) + @settings(deadline=10000) def test_simple_python_op(self, x, n, w): def g(input_, output): output[...] = input_ diff --git a/caffe2/python/operator_test/reduce_ops_test.py b/caffe2/python/operator_test/reduce_ops_test.py index 7b79b3b81ae..299b373e509 100644 --- a/caffe2/python/operator_test/reduce_ops_test.py +++ b/caffe2/python/operator_test/reduce_ops_test.py @@ -96,7 +96,7 @@ class TestReduceOps(serial.SerializedTestCase): @given(n=st.integers(1, 3), m=st.integers(1, 3), k=st.integers(1, 3), keepdims=st.booleans(), num_axes=st.integers(1, 3), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_reduce_l1(self, n, m, k, keepdims, num_axes, gc, dc): X = np.arange(n * m * k, dtype=np.float32) - 0.5 np.random.shuffle(X) @@ -253,7 +253,7 @@ class TestReduceFrontReductions(serial.SerializedTestCase): np.testing.assert_allclose(output, ref_sum(X)[0], atol=1e-3) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_reduce_front_sum_with_length(self, dc, gc): num_reduce_dim = 1 X = np.random.rand(2, 3, 4, 5).astype(np.float32) @@ -286,7 +286,7 @@ class TestReduceFrontReductions(serial.SerializedTestCase): "ReduceFrontMeanGradient", X, ref_mean, num_reduce_dim) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_reduce_front_mean_with_length(self, dc, gc): num_reduce_dim = 1 X = np.random.rand(2, 3, 4, 5).astype(np.float32) @@ -411,7 +411,7 @@ class TestReduceFrontReductions(serial.SerializedTestCase): "ReduceBackMeanGradient", X, ref_mean, num_reduce_dim) @given(**hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_reduce_back_mean_with_length(self, dc, gc): num_reduce_dim = 1 X = np.random.rand(2, 3, 4, 5).astype(np.float32) diff --git a/caffe2/python/operator_test/selu_op_test.py b/caffe2/python/operator_test/selu_op_test.py index 4dd2fa1848b..73cb0736dce 100644 --- a/caffe2/python/operator_test/selu_op_test.py +++ b/caffe2/python/operator_test/selu_op_test.py @@ -33,7 +33,7 @@ class TestSelu(serial.SerializedTestCase): @given(X=hu.tensor(), engine=st.sampled_from(["", "CUDNN"]), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_selu_2(self, X, gc, dc, engine): alpha = 1.6732 scale = 1.0507 @@ -50,7 +50,7 @@ class TestSelu(serial.SerializedTestCase): @given(X=hu.tensor(), engine=st.sampled_from(["", "CUDNN"]), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_selu_3(self, X, gc, dc, engine): alpha = 1.3 scale = 1.1 diff --git a/caffe2/python/operator_test/sequence_ops_test.py b/caffe2/python/operator_test/sequence_ops_test.py index 65c0669abfb..524d3c8b414 100644 --- a/caffe2/python/operator_test/sequence_ops_test.py +++ b/caffe2/python/operator_test/sequence_ops_test.py @@ -106,7 +106,7 @@ class TestSequenceOps(serial.SerializedTestCase): args=_gen_test_add_padding(with_pad_data=True), ret_lengths=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_add_padding( self, start_pad_width, end_pad_width, args, ret_lengths, gc, dc ): @@ -278,7 +278,7 @@ class TestSequenceOps(serial.SerializedTestCase): min_size=0, max_size=10), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_find_duplicate_elements(self, elements, gc, dc): mapping = { 0: "a", diff --git a/caffe2/python/operator_test/sinusoid_position_encoding_op_test.py b/caffe2/python/operator_test/sinusoid_position_encoding_op_test.py index 
6e8cae62dbf..03b50bfc952 100644 --- a/caffe2/python/operator_test/sinusoid_position_encoding_op_test.py +++ b/caffe2/python/operator_test/sinusoid_position_encoding_op_test.py @@ -33,7 +33,7 @@ class TestSinusoidPositionEncodingOp(serial.SerializedTestCase): amplitude=st.floats(MIN_TEST_AMPLITUDE, MAX_TEST_AMPLITUDE), **hu.gcs_cpu_only ) - @settings(deadline=1000) + @settings(deadline=10000) def test_sinusoid_embedding( self, positions_vec, embedding_size, batch_size, alpha, amplitude, gc, dc ): diff --git a/caffe2/python/operator_test/softmax_ops_test.py b/caffe2/python/operator_test/softmax_ops_test.py index 533d575ee59..8ec92ae1af9 100644 --- a/caffe2/python/operator_test/softmax_ops_test.py +++ b/caffe2/python/operator_test/softmax_ops_test.py @@ -143,7 +143,7 @@ class TestSoftmaxOps(serial.SerializedTestCase): @given(n=st.integers(2, 10), D=st.integers(4, 16), only_loss=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_softmax_with_loss(self, n, D, gc, only_loss, dc): # n = number of examples, D = |labels| # Initialize X and add 1e-2 for numerical stability @@ -301,7 +301,7 @@ class TestSoftmaxOps(serial.SerializedTestCase): ) @given(n=st.integers(2, 10), D=st.integers(4, 16), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_softmax_with_loss_label_prob(self, n, D, gc, dc): # n = number of examples, D = |labels| # Initialize X and add 1e-2 for numerical stability @@ -358,7 +358,7 @@ class TestSoftmaxOps(serial.SerializedTestCase): D=st.integers(4, 16), only_loss=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=None) def test_softmax_with_loss_weighted(self, n, D, only_loss, gc, dc): # n = number of examples, D = |labels| # Initialize X and add 1e-2 for numerical stability diff --git a/caffe2/python/operator_test/softplus_op_test.py b/caffe2/python/operator_test/softplus_op_test.py index dd183b774f9..f8ca1817176 100644 --- a/caffe2/python/operator_test/softplus_op_test.py +++ b/caffe2/python/operator_test/softplus_op_test.py @@ -14,7 +14,7 @@ class TestSoftplus(hu.HypothesisTestCase): @given(X=hu.tensor(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_softplus(self, X, gc, dc): op = core.CreateOperator("Softplus", ["X"], ["Y"]) self.assertDeviceChecks(dc, op, [X], [0]) diff --git a/caffe2/python/operator_test/sparse_to_dense_mask_op_test.py b/caffe2/python/operator_test/sparse_to_dense_mask_op_test.py index 41ec8808bb6..267babf2145 100644 --- a/caffe2/python/operator_test/sparse_to_dense_mask_op_test.py +++ b/caffe2/python/operator_test/sparse_to_dense_mask_op_test.py @@ -14,7 +14,7 @@ class TestFcOperator(hu.HypothesisTestCase): @given(n=st.integers(1, 10), k=st.integers(1, 5), use_length=st.booleans(), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_sparse_to_dense_mask(self, n, k, use_length, gc, dc): lengths = np.random.randint(k, size=n).astype(np.int32) + 1 N = sum(lengths) @@ -47,7 +47,7 @@ class TestFcOperator(hu.HypothesisTestCase): @given(n=st.integers(1, 10), k=st.integers(1, 5), use_length=st.booleans(), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_sparse_to_dense_mask_with_int64(self, n, k, use_length, gc, dc): lengths = np.random.randint(k, size=n).astype(np.int32) + 1 N = sum(lengths) diff --git a/caffe2/python/operator_test/string_ops_test.py b/caffe2/python/operator_test/string_ops_test.py index a0c56a68666..aa706ad73d7 100644 --- a/caffe2/python/operator_test/string_ops_test.py +++ 
b/caffe2/python/operator_test/string_ops_test.py @@ -20,7 +20,7 @@ def _string_lists(alphabet=None): class TestStringOps(serial.SerializedTestCase): @given(strings=_string_lists()) - @settings(deadline=1000) + @settings(deadline=10000) def test_string_prefix(self, strings): length = 3 # although we are utf-8 encoding below to avoid python exceptions, @@ -48,7 +48,7 @@ class TestStringOps(serial.SerializedTestCase): string_prefix_ref) @given(strings=_string_lists()) - @settings(deadline=1000) + @settings(deadline=10000) def test_string_suffix(self, strings): length = 3 strings = np.array( @@ -72,7 +72,7 @@ class TestStringOps(serial.SerializedTestCase): string_suffix_ref) @given(strings=st.text(alphabet=['a', 'b'])) - @settings(deadline=1000) + @settings(deadline=10000) def test_string_starts_with(self, strings): prefix = 'a' strings = np.array( @@ -96,7 +96,7 @@ class TestStringOps(serial.SerializedTestCase): string_starts_with_ref) @given(strings=st.text(alphabet=['a', 'b'])) - @settings(deadline=1000) + @settings(deadline=10000) def test_string_ends_with(self, strings): suffix = 'a' strings = np.array( @@ -120,7 +120,7 @@ class TestStringOps(serial.SerializedTestCase): string_ends_with_ref) @given(strings=st.text(alphabet=['a', 'b'])) - @settings(deadline=1000) + @settings(deadline=10000) def test_string_equals(self, strings): text = "" if strings: diff --git a/caffe2/python/operator_test/top_k_test.py b/caffe2/python/operator_test/top_k_test.py index fa628456c3a..035b1fb3d09 100644 --- a/caffe2/python/operator_test/top_k_test.py +++ b/caffe2/python/operator_test/top_k_test.py @@ -140,7 +140,7 @@ class TestTopK(serial.SerializedTestCase): @given(bs=st.integers(1, 3), n=st.integers(100, 10000), flatten_indices=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_top_k_4(self, bs, n, flatten_indices, gc, dc): k = np.random.randint(n // 3, 3 * n // 4) X = np.random.rand(bs, n).astype(dtype=np.float32) @@ -177,7 +177,7 @@ class TestTopK(serial.SerializedTestCase): @given(bs=st.integers(1, 3), n=st.integers(1, 5000), flatten_indices=st.booleans(), **hu.gcs) - @settings(deadline=1000) + @settings(deadline=10000) def test_top_k_6(self, bs, n, flatten_indices, gc, dc): k = n X = np.random.rand(bs, n).astype(dtype=np.float32) diff --git a/caffe2/python/operator_test/torch_integration_test.py b/caffe2/python/operator_test/torch_integration_test.py index e568f8bdff7..f99a61688de 100644 --- a/caffe2/python/operator_test/torch_integration_test.py +++ b/caffe2/python/operator_test/torch_integration_test.py @@ -991,7 +991,7 @@ class TorchIntegration(hu.HypothesisTestCase): np.testing.assert_array_almost_equal(ref_outputs[i], outputs[i].numpy()) @given(lengths_0=st.integers(1, 10), lengths_1=st.integers(1, 10)) - @settings(deadline=1000) + @settings(deadline=10000) def test_merge_id_lists(self, lengths_0, lengths_1): def _merge_id_lists(lengths, values): ref_op = core.CreateOperator( diff --git a/caffe2/python/operator_test/utility_ops_test.py b/caffe2/python/operator_test/utility_ops_test.py index aeefbf596af..187328f9e48 100644 --- a/caffe2/python/operator_test/utility_ops_test.py +++ b/caffe2/python/operator_test/utility_ops_test.py @@ -332,7 +332,7 @@ class TestUtilityOps(serial.SerializedTestCase): ) ), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_lengths_gather(self, inputs, gc, dc): items = inputs[0] lengths = inputs[1] @@ -359,7 +359,7 @@ class TestUtilityOps(serial.SerializedTestCase): @given( inputs=hu.lengths_tensor(), 
**hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_lengths_to_ranges(self, inputs, gc, dc): _, lengths = inputs diff --git a/caffe2/python/operator_test/weighted_sum_test.py b/caffe2/python/operator_test/weighted_sum_test.py index 2c7dffe9267..fbbe2a6bf6d 100644 --- a/caffe2/python/operator_test/weighted_sum_test.py +++ b/caffe2/python/operator_test/weighted_sum_test.py @@ -61,7 +61,7 @@ class TestWeightedSumOp(serial.SerializedTestCase): @given(n=st.integers(1, 8), m=st.integers(1, 10), d=st.integers(1, 4), grad_on_w=st.booleans(), seed=st.integers(min_value=0, max_value=65535), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_weighted_sum_grad( self, n, m, d, grad_on_w, seed, gc, dc): input_names = [] diff --git a/caffe2/python/operator_test/wngrad_test.py b/caffe2/python/operator_test/wngrad_test.py index 48fe0f94731..0a1f0405e92 100644 --- a/caffe2/python/operator_test/wngrad_test.py +++ b/caffe2/python/operator_test/wngrad_test.py @@ -113,7 +113,7 @@ class TestWngrad(serial.SerializedTestCase): epsilon=st.floats(min_value=0.01, max_value=0.99, allow_nan=False, allow_infinity=False), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_wngrad_dense_output_effective_lr(self, inputs, seq_b, lr, epsilon, gc, dc): param, grad = inputs @@ -142,7 +142,7 @@ class TestWngrad(serial.SerializedTestCase): epsilon=st.floats(min_value=0.01, max_value=0.99, allow_nan=False, allow_infinity=False), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_wngrad_dense_output_effective_lr_and_update( self, inputs, seq_b, lr, epsilon, gc, dc): param, grad = inputs @@ -165,7 +165,7 @@ class TestWngrad(serial.SerializedTestCase): # Suppress filter_too_much health check. # Likely caused by `assume` call falling through too often. 
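Most of the Python test hunks above (and the remaining ones just below) only relax Hypothesis's per-example deadline from 1000 ms to 10000 ms so that slow property-based tests stop failing on timing alone. As a minimal, standalone sketch of how such a deadline is applied (the strategy and test body here are illustrative, not the actual caffe2 operator checks):

    # A property-based test with a relaxed per-example deadline (value in milliseconds).
    from hypothesis import given, settings, strategies as st

    @given(strings=st.lists(st.text(alphabet=["a", "b"]), max_size=10))
    @settings(deadline=10000)  # allow up to 10 s per generated example instead of 1 s
    def test_string_prefix_like(strings):
        # Toy stand-in for the operator tests above: take 3-character prefixes.
        assert all(len(s[:3]) <= 3 for s in strings)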
- @settings(suppress_health_check=[HealthCheck.filter_too_much], deadline=1000) + @settings(suppress_health_check=[HealthCheck.filter_too_much], deadline=10000) @given(inputs=hu.tensors(n=2), seq_b=st.floats(min_value=0.01, max_value=0.99, allow_nan=False, allow_infinity=False), @@ -186,7 +186,7 @@ class TestWngrad(serial.SerializedTestCase): epsilon=st.floats(min_value=0.01, max_value=0.99, allow_nan=False, allow_infinity=False), **hu.gcs_cpu_only) - @settings(deadline=1000) + @settings(deadline=10000) def test_sparse_wngrad_empty(self, inputs, seq_b, lr, epsilon, gc, dc): param = inputs[0] seq_b = np.array([seq_b, ], dtype=np.float32) diff --git a/caffe2/utils/threadpool/pthreadpool-cpp.cc b/caffe2/utils/threadpool/pthreadpool-cpp.cc index d18206c99ce..6737935d13a 100644 --- a/caffe2/utils/threadpool/pthreadpool-cpp.cc +++ b/caffe2/utils/threadpool/pthreadpool-cpp.cc @@ -45,8 +45,17 @@ void PThreadPool::set_thread_count(const size_t thread_count) { void PThreadPool::run( const std::function& fn, const size_t range) { + // Run on same thread if _NoPThreadPoolGuard guard is enabled + if (caffe2::_NoPThreadPoolGuard::is_enabled()) { + for (size_t i = 0; i < range; ++i) { + fn(i); + } + return; + } + std::lock_guard lock{mutex_}; + TORCH_INTERNAL_ASSERT(!caffe2::_NoPThreadPoolGuard::is_enabled(), "Inside a threadpool guard!"); TORCH_INTERNAL_ASSERT(threadpool_.get(), "Invalid threadpool!"); struct Context final { diff --git a/cmake/Dependencies.cmake b/cmake/Dependencies.cmake index c7fe9b7d4bd..6d9c3ac3ab9 100644 --- a/cmake/Dependencies.cmake +++ b/cmake/Dependencies.cmake @@ -999,24 +999,20 @@ if(BUILD_PYTHON) endif() # ---[ pybind11 -if(NOT pybind11_PREFER_third_party) +if(USE_SYSTEM_BIND11) find_package(pybind11 CONFIG) if(NOT pybind11_FOUND) find_package(pybind11) endif() -endif() - -if(pybind11_FOUND) - message(STATUS "System pybind11 found") + if(NOT pybind11_FOUND) + message(FATAL "Cannot find system pybind11") + endif() else() message(STATUS "Using third_party/pybind11.") set(pybind11_INCLUDE_DIRS ${CMAKE_CURRENT_LIST_DIR}/../third_party/pybind11/include) install(DIRECTORY ${pybind11_INCLUDE_DIRS} DESTINATION ${CMAKE_INSTALL_PREFIX} FILES_MATCHING PATTERN "*.h") - set(pybind11_PREFER_third_party ON CACHE BOOL - "Use the third_party/pybind11 submodule, instead of looking for system - installation of pybind11") endif() message(STATUS "pybind11 include dirs: " "${pybind11_INCLUDE_DIRS}") include_directories(SYSTEM ${pybind11_INCLUDE_DIRS}) diff --git a/docs/cpp/source/notes/inference_mode.rst b/docs/cpp/source/notes/inference_mode.rst index 2ceb2dcdb76..efb1b9de2d1 100644 --- a/docs/cpp/source/notes/inference_mode.rst +++ b/docs/cpp/source/notes/inference_mode.rst @@ -30,8 +30,6 @@ Inside an ``InferenceMode`` block, we make the following performance guarantees: - Inplace operations on inference tensors are guaranteed not to do a version bump. For more implementation details of ``InferenceMode`` please see the `RFC-0011-InferenceMode `_. -Currently this guard is only available in C++ frontend, adding python frontend support -is tracked in #56608. 
Migration guide from ``AutoNonVariableTypeMode`` ------------------------------------------------ diff --git a/docs/source/autograd.rst b/docs/source/autograd.rst index 5bc588b0fa8..56680803670 100644 --- a/docs/source/autograd.rst +++ b/docs/source/autograd.rst @@ -50,6 +50,10 @@ you can use it as ``functional.jacobian(lambda x: f(x, constant, flag=flag), inp Locally disabling gradient computation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +See :ref:`locally-disable-grad-doc` for more information on the differences +between no-grad and inference mode as well as other related mechanisms that +may be confused with the two. + .. autosummary:: :toctree: generated :nosignatures: diff --git a/docs/source/nn.functional.rst b/docs/source/nn.functional.rst index be2f5536e7a..0e8dcbef16c 100644 --- a/docs/source/nn.functional.rst +++ b/docs/source/nn.functional.rst @@ -89,6 +89,7 @@ Non-linear activation functions sigmoid hardsigmoid silu + mish batch_norm group_norm instance_norm diff --git a/docs/source/nn.rst b/docs/source/nn.rst index f2a1f95daac..1859ae5202f 100644 --- a/docs/source/nn.rst +++ b/docs/source/nn.rst @@ -145,6 +145,7 @@ Non-linear Activations (weighted sum, nonlinearity) nn.GELU nn.Sigmoid nn.SiLU + nn.Mish nn.Softplus nn.Softshrink nn.Softsign diff --git a/docs/source/notes/autograd.rst b/docs/source/notes/autograd.rst index c15a0d0340a..6d0e0e83d3d 100644 --- a/docs/source/notes/autograd.rst +++ b/docs/source/notes/autograd.rst @@ -8,56 +8,6 @@ operations. It's not strictly necessary to understand all this, but we recommend getting familiar with it, as it will help you write more efficient, cleaner programs, and can aid you in debugging. -.. _excluding-subgraphs: - -Excluding subgraphs from backward ---------------------------------- - -Every Tensor has a flag: :attr:`requires_grad` that allows for fine grained -exclusion of subgraphs from gradient computation and can increase efficiency. - -.. _excluding-requires_grad: - -``requires_grad`` -^^^^^^^^^^^^^^^^^ - -If there's a single input to an operation that requires gradient, its output -will also require gradient. Conversely, only if all inputs don't require -gradient, the output also won't require it. Backward computation is never -performed in the subgraphs, where all Tensors didn't require gradients. - -.. code:: - - >>> x = torch.randn(5, 5) # requires_grad=False by default - >>> y = torch.randn(5, 5) # requires_grad=False by default - >>> z = torch.randn((5, 5), requires_grad=True) - >>> a = x + y - >>> a.requires_grad - False - >>> b = a + z - >>> b.requires_grad - True - -This is especially useful when you want to freeze part of your model, or you -know in advance that you're not going to use gradients w.r.t. some parameters. -For example if you want to finetune a pretrained CNN, it's enough to switch the -:attr:`requires_grad` flags in the frozen base, and no intermediate buffers will -be saved, until the computation gets to the last layer, where the affine -transform will use weights that require gradient, and the output of the network -will also require them. - -.. code:: - - model = torchvision.models.resnet18(pretrained=True) - for param in model.parameters(): - param.requires_grad = False - # Replace the last fully-connected layer - # Parameters of newly constructed modules have requires_grad=True by default - model.fc = nn.Linear(512, 100) - - # Optimize only the classifier - optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9) - .. 
_how-autograd-encodes-history: How autograd encodes the history @@ -86,6 +36,157 @@ flow statements, that can change the overall shape and size of the graph at every iteration. You don't have to encode all possible paths before you launch the training - what you run is what you differentiate. +.. _locally-disable-grad-doc: + +Locally disabling gradient computation +-------------------------------------- + +There are several mechanisms available from Python to locally disable gradient +computation: + +To disable gradients across entire blocks of code, there are context managers +like no-grad mode and inference mode. +For more fine-grained exclusion of subgraphs from gradient computation, +there is setting the ``requires_grad`` field of a tensor. + +Below, in addition to discussing the mechanisms above, we also describe +evaluation mode (:meth:`nn.Module.eval()`), a method that is not actually used +to disable gradient computation but, because of its name, is often mixed up with the three. + +Setting ``requires_grad`` +^^^^^^^^^^^^^^^^^^^^^^^^^ + +:attr:`requires_grad` is a flag that allows for fine-grained exclusion of +subgraphs from gradient computation. It takes effect in both the forward +and backward passes: + +During the forward pass, an operation is only recorded in the backward graph if +at least one of its input tensors require grad. +During the backward pass (``.backward()``), only leaf tensors with +``requires_grad=True`` will have gradients accumulated into their ``.grad`` +fields. + +It is important to note that even though every tensor has this flag, +*setting* it only makes sense for leaf tensors (tensors that do not have a +``grad_fn``, e.g., a ``nn.Module``'s parameters). +Non-leaf tensors (tensors that do have ``grad_fn``) are tensors that have a +backward graph associated with them. Thus their gradients will be needed +as an intermediary result to compute the gradient for a leaf tensor that +requires grad. From this definition, it is clear that all non-leaf tensors +will automatically have ``require_grad=True``. + +Setting ``requires_grad`` should be the main way you control which parts +of the model are part of the gradient computation, for example, if you need to +freeze parts of your pretrained model during model fine-tuning. + +To freeze parts of your model, simply apply ``.requires_grad_(False)`` to +the parameters that you don't want updated. And as described above, +since computations that use these parameters as inputs would not be recorded in +the forward pass, they won't have their ``.grad`` fields updated in the backward +pass because they won't be part of the backward graph in the first place, as +desired. + +Because this is such a common pattern, ``requires_grad`` can also be set at +the module level with :meth:`nn.Module.requires_grad_()`. +When applied to a module, ``.requires_grad_()`` takes effect on all +of the module's parameters (which have ``requires_grad=True`` by default). + +Grad Modes +^^^^^^^^^^ + +Apart from setting ``requires_grad`` there are also three possible modes +enableable from Python that can affect how computations in PyTorch are +processed by autograd internally: default mode (grad mode), no-grad mode, +and inference mode, all of which can be togglable via context managers and +decorators. + +Default Mode (Grad Mode) +^^^^^^^^^^^^^^^^^^^^^^^^ + +The "default mode" is actually the mode we are implicitly in when no other modes like +no-grad and inference mode are enabled. 
To be contrasted with +"no-grad mode" the default mode is also sometimes called "grad mode". + +The most important thing to know about the default mode is that it is the only +mode in which ``requires_grad`` takes effect. ``requires_grad`` is always overridden +to be ``False`` in both the two other modes. + +No-grad Mode +^^^^^^^^^^^^ + +Computations in no-grad mode behave as if none of the inputs require grad. +In other words, computations in no-grad mode are never recorded in the backward graph +even if there are inputs that have ``require_grad=True``. + +Enable no-grad mode when you need to perform operations that should not be +recorded by autograd, but you’d still like to use the outputs of these +computations in grad mode later. This context manager makes it convenient to +disable gradients for a block of code or function without +having to temporarily set tensors to have ``requires_grad=False``, and then +back to ``True``. + +For example, no-grad mode might be useful when writing an optimizer: when +performing the training update you’d like to update parameters +in-place without the update being recorded by autograd. +You also intend to use the updated parameters for computations in +grad mode in the next forward pass. + +The implementations in :ref:`nn-init-doc` also +rely on no-grad mode when initializing the parameters as to avoid +autograd tracking when updating the intialized parameters in-place. + +Inference Mode +^^^^^^^^^^^^^^ + +Inference mode is the extreme version of no-grad mode. Just like in no-grad +mode, computations in inference mode are not recorded in the backward graph, but +enabling inference mode will allow PyTorch to speed up your model even more. +This better runtime comes with a drawback: tensors created in inference mode +will not be able to be used in computations to be recorded by autograd after +exiting inference mode. + +Enable inference mode when you are performing computations that don’t need +to be recorded in the backward graph, AND you don’t plan on using the tensors +created in inference mode in any computation that is to be recorded by autograd later. + +It is recommended that you try out inference mode in the parts of your code +that do not require autograd tracking (e.g., data processing and model evaluation). +If it works out of the box +for your use case it’s a free performance win. If you run into errors after +enabling inference mode, check that you are not using tensors created in +inference mode in computations that are recorded by autograd after exiting inference +mode. If you cannot avoid such use in your case, you can always switch back +to no-grad mode. + +For details on inference mode please see +`Inference Mode `_. + +For implementation details of inference mode see +`RFC-0011-InferenceMode `_. + +Evaluation Mode (``nn.Module.eval()``) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Evaluation mode is not actually a mechanism to locally disable gradient computation. +It is included here anyway because it is sometimes confused to be such a mechanism. + +Functionally, ``module.eval()`` (or equivalently ``module.train()``) are completely +orthogonal to no-grad mode and inference mode. How ``model.eval()`` affects +your model depends entirely on the specific modules used in your model and +whether they define any training-mode specific behavior. 
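The documentation added above distinguishes four mechanisms that are easy to conflate. A minimal sketch of how they look from Python, assuming a build recent enough to expose torch.inference_mode (the module layout and tensor sizes here are arbitrary):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(8, 2))
    x = torch.randn(16, 4)

    # 1) requires_grad: fine-grained freezing; only the last layer will accumulate .grad.
    for p in model.parameters():
        p.requires_grad_(False)
    model[-1].requires_grad_(True)

    # 2) no-grad mode: nothing inside is recorded in the backward graph, but the
    #    outputs can still be used in grad mode later.
    with torch.no_grad():
        frozen_features = model[0](x)

    # 3) inference mode: faster still, but tensors created here must not be used
    #    later in computations that autograd records.
    with torch.inference_mode():
        preds = model(x)

    # 4) evaluation mode: orthogonal to the above; it only toggles training-specific
    #    behavior such as Dropout (and BatchNorm running statistics in models that use it).
    model.eval()
    with torch.no_grad():
        eval_preds = model(x)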
+ +You are responsible for calling ``model.eval()`` and ``model.train()`` if your +model relies on modules such as :class:`torch.nn.Dropout` and +:class:`torch.nn.BatchNorm2d` that may behave +differently depending on training mode, for example, to avoid updating your +BatchNorm running statistics on validation data. + +It is recommended that you always use ``model.train()`` when +training and ``model.eval()`` when evaluating your model (validation/testing) even +if you aren’t sure your model has training-mode specific behavior, because a +module you are using might be updated to behave differently in training and +eval modes. + In-place operations with autograd --------------------------------- diff --git a/docs/source/scripts/build_activation_images.py b/docs/source/scripts/build_activation_images.py index 7274d5c06c5..3f4032ae107 100644 --- a/docs/source/scripts/build_activation_images.py +++ b/docs/source/scripts/build_activation_images.py @@ -37,6 +37,7 @@ functions = [ 'RReLU', 'SELU', 'SiLU', + 'Mish', 'CELU', 'GELU', 'Sigmoid', diff --git a/test/backward_compatibility/check_backward_compatibility.py b/test/backward_compatibility/check_backward_compatibility.py index 8e03ad39791..9de94c51125 100644 --- a/test/backward_compatibility/check_backward_compatibility.py +++ b/test/backward_compatibility/check_backward_compatibility.py @@ -89,6 +89,7 @@ allow_list = [ ("aten::_amp_update_scale", datetime.date(2021, 6, 1)), ("aten::randperm", datetime.date(9999, 1, 1)), ("aten::linalg_vector_norm", datetime.date(2021, 5, 15)), + ("aten::repeat_interleave", datetime.date(2021, 5, 26)), ] def allow_listed(schema, allow_list): diff --git a/test/cpp/api/functional.cpp b/test/cpp/api/functional.cpp index f920e49cfb1..adb1d557902 100644 --- a/test/cpp/api/functional.cpp +++ b/test/cpp/api/functional.cpp @@ -1760,6 +1760,15 @@ TEST_F(FunctionalTest, Softsign) { ASSERT_TRUE(torch::allclose(y, y_exp)); } +// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) +TEST_F(FunctionalTest, Mish) { + auto x = torch::randn(100) * 10; + auto y_exp = x * x.exp().log1p().tanh(); + auto y = F::mish(x); + + ASSERT_TRUE(torch::allclose(y, y_exp)); +} + // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) TEST_F(FunctionalTest, Tanhshrink) { auto x = torch::randn(100) * 10; diff --git a/test/cpp/api/modules.cpp b/test/cpp/api/modules.cpp index f2c945fa800..4b22a383437 100644 --- a/test/cpp/api/modules.cpp +++ b/test/cpp/api/modules.cpp @@ -2958,6 +2958,16 @@ TEST_F(ModulesTest, GELU) { ASSERT_TRUE(torch::allclose(y, y_exp)); } +// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) +TEST_F(ModulesTest, Mish) { + Mish model; + auto x = torch::randn(100) * 10; + auto y_exp = x * x.exp().log1p().tanh(); + auto y = model(x); + + ASSERT_TRUE(torch::allclose(y, y_exp)); +} + // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) TEST_F(ModulesTest, Sigmoid) { Sigmoid model; diff --git a/test/cpp/jit/test_backend.cpp b/test/cpp/jit/test_backend.cpp index b85994b3ee3..bf4b48d3e23 100644 --- a/test/cpp/jit/test_backend.cpp +++ b/test/cpp/jit/test_backend.cpp @@ -190,7 +190,7 @@ TEST(BackendTestDebugInfo, TestCompiler) { lm._save_for_mobile(ss, ExtraFilesMap(), true); auto mlm = _load_for_mobile(ss); std::string error_pattern = R"( - Module hierarchy:top(backend_with_compiler_demoLoweredModule) + Module hierarchy:top(backend_with_compiler_demoLoweredModule).aten::add Traceback of TorchScript (most recent call last): File "", line 5, in FunctionName_UNKNOWN 
typed_inputs: List[Any] = [x, h, ] @@ -244,7 +244,7 @@ TEST(BackendTestDebugInfo, TestExceptionStackForCompilerWithModuleHierarchy) { lm._save_for_mobile(ss, ExtraFilesMap(), true); auto mlm = _load_for_mobile(ss); std::string error_pattern = R"( - Module hierarchy:top(backend_with_compiler_demoLoweredModule).A0(A) + Module hierarchy:top(backend_with_compiler_demoLoweredModule).A0(A).aten::add Traceback of TorchScript (most recent call last): File "", line 5, in FunctionName_UNKNOWN typed_inputs: List[Any] = [x, y, ] @@ -259,7 +259,7 @@ Traceback of TorchScript (most recent call last): return self.A0.forward(x, y) + self.B0.forward(x) ~~~~~~~~~~~~~~~ <--- HERE - File "", line 3, in FunctionName_UNKNOWN + File "", line 3, in forward def forward(self, x, y): return x + y @@ -337,7 +337,7 @@ TEST( * */ std::string error_pattern = R"( - Module hierarchy:top(backend_with_compiler_demoLoweredModule).B0(B).A0(A) + Module hierarchy:top(backend_with_compiler_demoLoweredModule).B0(B).A0(A).aten::add Traceback of TorchScript (most recent call last): File "", line 5, in FunctionName_UNKNOWN typed_inputs: List[Any] = [x, y, ] @@ -352,13 +352,13 @@ Traceback of TorchScript (most recent call last): return self.B0.forward(x, y) + 3 ~~~~~~~~~~~~~~~ <--- HERE - File "", line 3, in FunctionName_UNKNOWN + File "", line 3, in forward def forward(self, x, y): return self.A0.forward(x, y) + 2 ~~~~~~~~~~~~~~~ <--- HERE - File "", line 3, in FunctionName_UNKNOWN + File "", line 3, in forward def forward(self, x, y): return x + y @@ -424,7 +424,7 @@ TEST(BackendTestDebugInfo, TestExceptionStackForCompilerWithLoweredSubModule) { c._save_for_mobile(ss, ExtraFilesMap(), true); auto c_loaded = _load_for_mobile(ss); std::string error_pattern = R"( - Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule) + Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).aten::add Traceback of TorchScript (most recent call last): File "", line 3, in FunctionName_UNKNOWN @@ -432,7 +432,7 @@ Traceback of TorchScript (most recent call last): return self.A0.forward(x, y) + self.B0.forward(x) ~~~~~~~~~~~~~~~ <--- HERE - File "", line 5, in FunctionName_UNKNOWN + File "", line 5, in forward typed_inputs: List[Any] = [x, y, ] if self.__backend.is_available() : _0, = self.__backend.execute(self.__handles["forward"], typed_inputs) @@ -545,7 +545,7 @@ TEST( * * */ std::string error_pattern = R"( - Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA) + Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA).aten::add Traceback of TorchScript (most recent call last): File "", line 3, in FunctionName_UNKNOWN @@ -553,7 +553,7 @@ Traceback of TorchScript (most recent call last): return self.A0.forward(x, y) + self.B0.forward(x) ~~~~~~~~~~~~~~~ <--- HERE - File "", line 5, in FunctionName_UNKNOWN + File "", line 5, in forward typed_inputs: List[Any] = [x, y, ] if self.__backend.is_available() : _0, = self.__backend.execute(self.__handles["forward"], typed_inputs) @@ -566,7 +566,7 @@ Traceback of TorchScript (most recent call last): return self.AA0.forward(x, y) + 3 ~~~~~~~~~~~~~~~~ <--- HERE - File "", line 3, in FunctionName_UNKNOWN + File "", line 3, in forward def forward(self, x, y): return x + y diff --git a/test/cpp/jit/test_cs_debug_info_serialization.cpp b/test/cpp/jit/test_cs_debug_info_serialization.cpp index f5b816cbacf..c34f0da1b63 100644 --- a/test/cpp/jit/test_cs_debug_info_serialization.cpp +++ b/test/cpp/jit/test_cs_debug_info_serialization.cpp @@ -25,38 +25,57 @@ 
namespace jit { namespace { bool validate_debug_info( - const DebugInfoPair& pre_serialize, - const DebugInfoPair& post_serialize) { - auto sr1 = pre_serialize.first; - auto sr2 = post_serialize.first; + const DebugInfoTuple& pre_serialize, + const DebugInfoTuple& post_serialize) { + auto sr1 = std::get(pre_serialize); + auto sr2 = std::get(post_serialize); if (sr1 != sr2) { return false; } - if (!pre_serialize.second.defined()) { - return !post_serialize.second.defined(); + auto csptr1 = std::get(pre_serialize); + auto csptr2 = std::get(post_serialize); + if (!csptr1.defined()) { + return !csptr2.defined(); } - if (!post_serialize.second.defined()) { + if (!csptr2.defined()) { return false; } - auto vec1 = pre_serialize.second->vec(); - auto vec2 = post_serialize.second->vec(); + auto vec1 = csptr1->vec(); + auto vec2 = csptr2->vec(); if (vec1.size() != vec2.size()) { return false; } - for (size_t i = 0; i < vec1.size(); i++) { - auto rhs_sr = std::get<1>(vec1[i]); - auto lhs_sr = std::get<1>(vec2[i]); - auto rhs_module = std::get<2>(vec1[i]); - auto lhs_module = std::get<2>(vec2[i]); + while (csptr1) { + auto rhs_sr = csptr1->source_range(); + auto lhs_sr = csptr2->source_range(); + auto rhs_module = csptr1->module_instance(); + auto lhs_module = csptr2->module_instance(); + std::string rhs_fn_name, lhs_fn_name; + if (csptr1->function()) { + rhs_fn_name = csptr1->function()->name(); + } else { + rhs_fn_name = csptr1->function_name(); + } + if (csptr2->function()) { + lhs_fn_name = csptr2->function()->name(); + } else { + lhs_fn_name = csptr2->function_name(); + } if (!((rhs_module.has_value() == lhs_module.has_value()) && (rhs_module.has_value() && (rhs_module.value().class_type()->name().value() == lhs_module.value().class_type()->name().value()) && (rhs_module.value().instance_name() == lhs_module.value().instance_name())) && - (rhs_sr == lhs_sr))) { + (rhs_fn_name == lhs_fn_name) && (rhs_sr == lhs_sr))) { return false; } + if (csptr1->callee()) { + csptr1 = csptr1->callee().value(); + csptr2 = csptr2->callee().value(); + } else { + csptr1 = c10::intrusive_ptr(); + } } return true; } diff --git a/test/cpp/jit/test_lite_interpreter.cpp b/test/cpp/jit/test_lite_interpreter.cpp index ece646f6ede..fe019a67512 100644 --- a/test/cpp/jit/test_lite_interpreter.cpp +++ b/test/cpp/jit/test_lite_interpreter.cpp @@ -496,8 +496,7 @@ TEST(LiteInterpreterTest, ModuleInfoBasic) { } } - std::unordered_set expected_result({"top(M)"}); - AT_ASSERT(module_debug_info_set == expected_result); + AT_ASSERT(module_debug_info_set.count("top(M).aten::mul")); } // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) @@ -559,8 +558,9 @@ TEST(LiteInterpreterTest, OneSubmoduleModuleInfo) { } } - std::set expected_result({"top(B)", "top(B).A0(A)"}); - AT_ASSERT(module_debug_info_set == expected_result); + AT_ASSERT(module_debug_info_set.count("top(B).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(B).A0(A).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(B).A0(A).aten::mul")); } // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) @@ -594,7 +594,6 @@ TEST(LiteInterpreterTest, TwoSubmodulesModuleInfo) { std::string module_info = bc.get_forward_method_debug_info(pc); if (!module_info.empty() && (module_info.find("debug_handle") == std::string::npos)) { - std::cout << "Module info:" << module_info << std::endl; module_debug_info_set.insert(module_info); } ++pc; @@ -603,9 +602,9 @@ TEST(LiteInterpreterTest, TwoSubmodulesModuleInfo) { } } - std::set expected_result( - 
{"top(C)", "top(C).A0(A)", "top(C).B0(B)"}); - AT_ASSERT(module_debug_info_set == expected_result); + AT_ASSERT(module_debug_info_set.count("top(C).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(C).A0(A).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(C).B0(B).aten::add")); } TEST(LiteInterpreterTest, GetRuntimeByteCodeVersion) { @@ -625,6 +624,34 @@ TEST(LiteInterpreterTest, GetByteCodeVersion) { } namespace { + +void compareModelOutput( + const std::vector& actual_result_list, + const std::vector& expect_result_list) { + AT_ASSERT(actual_result_list.size() == expect_result_list.size()); + AT_ASSERT(actual_result_list[0].toTensor().equal(expect_result_list[0])); + AT_ASSERT( + actual_result_list[1].toTensor().dim() == expect_result_list[1].dim()); + AT_ASSERT(actual_result_list[2].toTensor().equal(expect_result_list[2])); +} + +void runAndCheckTorchScriptModel( + std::stringstream& input_model_stream, + const std::vector& input_data, + const std::vector& expect_result_list, + const int64_t expect_version) { + auto actual_version = _get_model_bytecode_version(input_model_stream); + AT_ASSERT(actual_version == expect_version); + + // Load and run the backport model, then compare the result with expect + // result + Module m_mobile = load(input_model_stream); + + auto actual_result = m_mobile.forward(input_data); + std::vector actual_result_list = actual_result.toTuple()->elements(); + compareModelOutput(actual_result_list, expect_result_list); +} + void runAndCheckBytecodeModel( std::stringstream& input_model_stream, const std::vector& input_data, @@ -635,16 +662,12 @@ void runAndCheckBytecodeModel( // Load and run the backport model, then compare the result with expect // result - mobile::Module m_mobile = _load_for_mobile(input_model_stream); + Module m_mobile = load(input_model_stream); auto actual_result = m_mobile.forward(input_data); std::vector actual_result_list = actual_result.toTuple()->elements(); - AT_ASSERT(actual_result_list.size() == expect_result_list.size()); - AT_ASSERT(actual_result_list[0].toTensor().equal(expect_result_list[0])); - AT_ASSERT( - actual_result_list[1].toTensor().dim() == expect_result_list[1].dim()); - AT_ASSERT(actual_result_list[2].toTensor().equal(expect_result_list[2])); + compareModelOutput(actual_result_list, expect_result_list); } void backportAllVersionCheck( @@ -659,29 +682,33 @@ void backportAllVersionCheck( constexpr int64_t minimum_to_version = 4; int64_t current_to_version = from_version - 1; - std::ostringstream oss; // Verify all candidate to_version work as expected. All backport to version // larger than minimum_to_version should success. while (current_to_version >= minimum_to_version) { - oss.clear(); + // Do not declare std::stringstream oss outside of the while loop as + // oss.clear() doesn't reset the stream content, only clears out error state + // flag in stringstream causing a problematic stream. Instead, it's cleaner + // and safer to just declare a new std::stringstream one and swap them. 
+ std::stringstream oss; bool backPortSuccess = _backport_for_mobile(test_model_file_stream, oss, current_to_version); AT_ASSERT(backPortSuccess); // Check backport model version - std::stringstream iss(oss.str()); - auto backport_version = _get_model_bytecode_version(iss); + auto backport_version = _get_model_bytecode_version(oss); AT_ASSERT(backport_version == current_to_version); // Load and run the backport model, then compare the result with expect // result runAndCheckBytecodeModel( - iss, input_data, expect_result_list, current_to_version); + oss, input_data, expect_result_list, current_to_version); + runAndCheckTorchScriptModel( + oss, input_data, expect_result_list, current_to_version); current_to_version--; } // backport to minimum version - 1 should fail - oss.clear(); + std::stringstream oss; bool backPortSuccess = _backport_for_mobile(test_model_file_stream, oss, minimum_to_version - 1); AT_ASSERT(!backPortSuccess); @@ -790,9 +817,9 @@ TEST(LiteInterpreterTest, SequentialModuleInfo) { // def forward(self, x): // return self.A0.forward(self.B0.forward(x)) - std::set expected_result( - {"top(C)", "top(C).A0(A)", "top(C).B0(B)"}); - AT_ASSERT(module_debug_info_set == expected_result); + AT_ASSERT(module_debug_info_set.count("top(C).prim::Return")); + AT_ASSERT(module_debug_info_set.count("top(C).A0(A).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(C).B0(B).aten::add")); } // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) @@ -838,9 +865,9 @@ TEST(LiteInterpreterTest, HierarchyModuleInfo) { // "top(C).forward": for the add operator in top. // "top(C).B0(B).forward": for the add operator in B0. // "top(C).B0(B).forward.A0(A).forward": for the add operator in A0. - std::set expected_result( - {"top(C)", "top(C).B0(B)", "top(C).B0(B).A0(A)"}); - AT_ASSERT(module_debug_info_set == expected_result); + AT_ASSERT(module_debug_info_set.count("top(C).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(C).B0(B).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(C).B0(B).A0(A).aten::add")); } // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) @@ -898,9 +925,9 @@ TEST(LiteInterpreterTest, DuplicatedClassTypeModuleInfo) { // "top(B).A0(A).forward": for the add operator in A0. // "top(B).A1(A).forward": for the add operator in A1. 
- std::set expected_result( - {"top(B)", "top(B).A0(A)", "top(B).A1(A)"}); - AT_ASSERT(module_debug_info_set == expected_result); + AT_ASSERT(module_debug_info_set.count("top(B).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(B).A0(A).aten::add")); + AT_ASSERT(module_debug_info_set.count("top(B).A1(A).aten::add")); } // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) @@ -1286,6 +1313,57 @@ TEST(LiteInterpreterTest, DefaultArgsPinvSpecifyDefault) { testLiteModuleCompareResultTensors(m, inputs); } +// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) +TEST(LiteInterpreterTest, TestExceptionStackWithTwoLevelModuleHierarchy) { + Module a("A"); + a.define(R"( + def bar(self, x, y): + return x + y + )"); + Module b("B"); + b.register_module("A0", a); + b.define(R"( + def foo(self, x, y): + return self.A0.bar(x, y) + 2 + )"); + Module c("C"); + c.register_module("B0", b); + c.define(R"( + def forward(self, x, y): + return self.B0.foo(x, y) + 3 + )"); + + std::vector inputs; + inputs.emplace_back(torch::rand({2, 4})); + inputs.emplace_back(torch::rand({13, 9})); + + std::stringstream ss; + c._save_for_mobile(ss, ExtraFilesMap(), true); + auto lite_m = _load_for_mobile(ss); + std::string error_pattern = R"( + Module hierarchy:top(C).B0(B).A0(A).aten::add +Traceback of TorchScript (most recent call last): + File "", line 3, in FunctionName_UNKNOWN + + def forward(self, x, y): + return self.B0.foo(x, y) + 3 + ~~~~~~~~~~~ <--- HERE + + File "", line 3, in foo + + def foo(self, x, y): + return self.A0.bar(x, y) + 2 + ~~~~~~~~~~~ <--- HERE + + File "", line 3, in bar + + def bar(self, x, y): + return x + y + ~~~~~ <--- HERE + )"; + ASSERT_THROWS_WITH_MESSAGE(lite_m.forward(inputs), error_pattern); +} + namespace { // NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) static auto reg = diff --git a/test/cpp/lite_interpreter_runtime/delegated_submodule_with_debug_info.ptl b/test/cpp/lite_interpreter_runtime/delegated_submodule_with_debug_info.ptl index 06300d1136ff74942ed53719ccbdf24a2c6da353..901724d82225b87b78f97590d209235e207b45ee 100644 GIT binary patch delta 1444 zcmZ`(eK6Ds7~bEmwW!z)VXdg_vb(;&L#&UmUf;AtKDOQUxv||5DYqB-x_tKQxH?@( zC6U#tQ;{~Mqg*F(PMS*Z0UTiqI0;gNa`STY!kjP|bcF0B24`S_3%G(C zxWiWP08j7&Hh2So5BP!~_(K2$LJ$N)2y6omghCi>hj5619S{jo5DhWFg`L0yJ_sNd z;vgOpAQ6&42)iH|Qa}W||5eR52>^mZW6F$a<+5KZi`L=DI2OhMoz-?jY?lqQld>o* zM7-^juc-o-0y=&I`{hv}35Yl|6h%YHfJmD(p{o4R_!FcH!=kM{5~X=j>*w z6Y74$^R*}Q#sqgJ3S~d`16U8-Fa%^85@zi$V;RuqkNE;rDZc@);60wd#BSzN9Q}gpfvOzm=J3O zUiWvUUe>jZ;JWEM68kckPNYJGOXJl8#Ikhh`>z^AOKP;Rq4EqW+Ln-OzxP|?o~j2B zYV=n#ll;C^`1k#UUg50IVt#AqZn}8QlHkRKi;*|FQ=98l-vy2J&H36oW;OZS8)R#QLa4iLqfk2E&I~)vp*CZ?Wr4Q z8j?NN2h)oq$W!c>o?NwtUy=?wZ^dX7@zgr##bM;Sb58lQf!gAi)F{>1Rx`1C2{*gS z*ME2t4YvxBSlx=fx76R-(Neq3H0)T5RZ>Dp-&OSOorw=#^#O&n3A3d6y*8%7;gYPY zTM6yHqWtbBCR6D17j3uY=LgJ3=29DFRY$unGVY$Zft|AbvaR6l;)sd$pwSk|eA%8X z_nGhVrWLaj9tX?KWex3<=Qa`a%L!!f91b9^pI zND@lhtx5}W4CtxR-?+!1>B=SelF+k8bIGKOzH#Zz(~T*jGuJv~3p&sLXitxMr^XpR z$Xg;zTw}kMrs#0gcieoGE{S3%X7%vU7yEB)F%CH^yhuL3v~OyT|A8cP9+ zkIQp@CaIk4(lnvB6_qcXuBTN#lN>p%QsETT?;rBSQl^pJ{6NRAYp(b@FZjM=Eupb2 zbH6wKDT!X+o3fHNnv6Wm!eKCN@)%4R8OOvJEY}ki9n)1rHZA8(|FGW1&l-%WmxMLO zq6n2Hy{6ZL`_x#6^uv|@3oZIeKNB0Ly~07)Q7C8=l}Pzq!~yg9T%vp={!Mkkmg#+j dW4Z)1g=S9L@QD@x@`M$XVt}^LwDBLW{{W1`MhO4_ delta 1275 zcmZ9Mdr;H`5XbMnzY`FX&;UU|AciBrOh6ArB_64yegz--Mp8jU?GEvgCF4nsRyZ9a zzf1&01x4^dAQg8iAihxKAuovpyd+5R0fx^+Nzs`c_mAD3ozKkf%zk&Pf~%P0CzYm% z{!)htijm$!qhq4J@bL*$DOasts9f#nIVL<}(VEDxr3;neDcJB2_KsVq-6z}`rv 
znfLsN_<>c5R3@Ia3RQClm;~+wlfe{l7q}Zt1^0k^!F^yFm=5j-4}clqK`;}{0uONFdO_C%mH)3V_+Wm3-~Me8<-DjK^=G;JOQ2r3&29K2s{OP7lWt4GvHb99C#iq z0WW|T!BX%NSO%7Z6<{TJ84;_eZQinZ%WPk9qvZ&(da#9>RbVw(1J;6d;1#eQ{2gon z{{XLojbIbl3|<5E;Gf`ium$`JYz5oEzrh>eO|Tuj1>OelfOo+TuoJuo{sVS__rV9? zL+}y!7<>Z$3%WlAyTNDRbFc@L2$TwhKtXst>$k%~CK(!)XQkbeK_ZbsOf|9*{3IqX zHsmB`Sr37+B=4d1M5+3-nvzvDA*Ke=trukhL77&iejBZw6P~t*=^} zkhUO7bK+cRtaZzjSapH0VpeU&mU#IgQ~PT-K&*{VVDxEpW~@umA3_`5LbZ zam=h;jujS>-+P|RvWzrNGRW9uHO(_@zWp$>K`!k}<`3=n(YCGw*F9G1v;kA!ZJD@o zds(by{%B~vSsd@0)DpjF!vl5C34^AL?h%6r9zV4IYGQeCx=&1Sbc8y2tV`;^;Pfi1 zXp5%GxGp`kM#JJR>5a9<*>Sh@2HL*LxT|_i+T6D6hOuq+_f8504uQ5Rzs34d%EiOW z995ms!TD}C2e+Scs~vxL?DVqcSu>(y?R`JFG(qpa4ynn@mt{JBxztWqIeF0@w?iI< zreSH1&DZsB-k6_s-?nduY=O_Q?H_YS)`Cb@m=${|X72WMKF*Q_00#XX`$_ G)&BrG?Dk#& diff --git a/test/cpp/lite_interpreter_runtime/test_lite_interpreter_runtime.cpp b/test/cpp/lite_interpreter_runtime/test_lite_interpreter_runtime.cpp index e76e36b3ff9..2ccf6ee18d3 100644 --- a/test/cpp/lite_interpreter_runtime/test_lite_interpreter_runtime.cpp +++ b/test/cpp/lite_interpreter_runtime/test_lite_interpreter_runtime.cpp @@ -142,7 +142,7 @@ TEST(RunTimeTest, DelegateException) { inputs.emplace_back(torch::rand({13, 9})); std::string error_pattern = R"( - Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA) + Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA).aten::add Traceback of TorchScript (most recent call last): File "", line 3, in FunctionName_UNKNOWN @@ -150,7 +150,7 @@ Traceback of TorchScript (most recent call last): return self.A0.forward(x, y) + self.B0.forward(x) ~~~~~~~~~~~~~~~ <--- HERE - File "", line 5, in FunctionName_UNKNOWN + File "", line 5, in forward typed_inputs: List[Any] = [x, y, ] if self.__backend.is_available() : _0, = self.__backend.execute(self.__handles["forward"], typed_inputs) @@ -163,7 +163,7 @@ Traceback of TorchScript (most recent call last): return self.AA0.forward(x, y) + 3 ~~~~~~~~~~~~~~~~ <--- HERE - File "", line 3, in FunctionName_UNKNOWN + File "", line 3, in forward def forward(self, x, y): return x + y diff --git a/test/cpp/tensorexpr/test_approx.cpp b/test/cpp/tensorexpr/test_approx.cpp index 5a56771990f..6bd31e2ef04 100644 --- a/test/cpp/tensorexpr/test_approx.cpp +++ b/test/cpp/tensorexpr/test_approx.cpp @@ -13,8 +13,8 @@ namespace te = torch::jit::tensorexpr; static void vectorize(te::LoopNest* ln, te::Tensor* target, int width) { auto loops = ln->getLoopStmtsFor(target); - te::For *outer, *inner, *tail; - ln->splitWithTail(loops[0], width, &outer, &inner, &tail); + te::For *inner, *tail; + ln->splitWithTail(loops[0], width, &inner, &tail); ln->vectorize(inner); } diff --git a/test/cpp/tensorexpr/test_boundsinference.cpp b/test/cpp/tensorexpr/test_boundsinference.cpp index b3bc26b51da..87fb244e0cb 100644 --- a/test/cpp/tensorexpr/test_boundsinference.cpp +++ b/test/cpp/tensorexpr/test_boundsinference.cpp @@ -217,14 +217,13 @@ TEST(BoundsInference, _5) { Compute("b", {{n, "i"}}, [&](const VarHandle& i) { return a.load(i); }); LoopNest l({b}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* outer; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* inner; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* tail; std::vector loops = l.getLoopStmtsFor(b); - l.splitWithTail(loops[0], 16, &outer, &inner, &tail); + l.splitWithTail(loops[0], 16, &inner, &tail); + For* outer = loops[0]; { // Verify inferred bounds for the outer loop @@ -729,11 +728,13 @@ TEST(BoundsInference, 
GetPotentialHazardsLoopSplit) { LoopNest l({A}); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For *outer, *inner, *tail; + For *inner, *tail; // Splitting with tail by something offset creates a tail which also writes to // A. - l.splitWithTail(l.getLoopStmtsFor(A)[0], 5, &outer, &inner, &tail); + For* outer = l.getLoopStmtsFor(A)[0]; + // `outer` loop get transformed to the outer loop after splitting. + l.splitWithTail(outer, 5, &inner, &tail); using namespace analysis; diff --git a/test/cpp/tensorexpr/test_cuda.cpp b/test/cpp/tensorexpr/test_cuda.cpp index 71f6967da94..a12592939a0 100644 --- a/test/cpp/tensorexpr/test_cuda.cpp +++ b/test/cpp/tensorexpr/test_cuda.cpp @@ -6,14 +6,14 @@ #include -#include "test/cpp/tensorexpr/test_base.h" +#include +#include +#include +#include +#include +#include #include -#include "test/cpp/tensorexpr/padded_buffer.h" -#include "torch/csrc/jit/tensorexpr/cuda_codegen.h" -#include "torch/csrc/jit/tensorexpr/ir_simplifier.h" -#include "torch/csrc/jit/tensorexpr/loopnest.h" -#include "torch/csrc/jit/tensorexpr/tensor.h" #include @@ -172,11 +172,10 @@ static void testCudaTestVectorAdd02_impl(int N, int block_size) { }, [&](const VarHandle& n) { return a_buf.load(n) + b_buf.load(n); }); LoopNest l({c}); - For* n_outer; For* n_inner; std::vector loops = l.getLoopStmtsFor(c); - l.splitWithMask(loops[0], block_size, &n_outer, &n_inner); - l.setGPUBlockIndex(n_outer, 0); + l.splitWithMask(loops[0], block_size, &n_inner); + l.setGPUBlockIndex(loops[0], 0); l.setGPUThreadIndex(n_inner, 0); l.prepareForCodegen(); Stmt* stmt = l.root_stmt(); @@ -391,11 +390,10 @@ TEST(Cuda, DynamicShapeSplit_CUDA) { Tensor* b = Compute( "b", {{n, "n"}}, [&](const VarHandle& i) { return a.load(i) * 2.0f; }); LoopNest l({b}); - For* outer; For* inner; std::vector loops = l.getLoopStmtsFor(b); - l.splitWithMask(loops[0], 1024, &outer, &inner); - l.setGPUBlockIndex(outer, 0); + l.splitWithMask(loops[0], 1024, &inner); + l.setGPUBlockIndex(loops[0], 0); l.setGPUThreadIndex(inner, 0); Stmt* s = l.root_stmt(); CudaCodeGen cg(s, {a, b, n}); diff --git a/test/cpp/tensorexpr/test_llvm.cpp b/test/cpp/tensorexpr/test_llvm.cpp index 6c22b0310ef..06113640714 100644 --- a/test/cpp/tensorexpr/test_llvm.cpp +++ b/test/cpp/tensorexpr/test_llvm.cpp @@ -1721,16 +1721,12 @@ TEST(LLVM, VectorizedGEMM) { { auto const& loops = loop.getLoopStmtsFor(CT); For* m = loops[0]; - For* mo; - For* mi; - loop.splitWithMask(m, 16, &mo, &mi); + loop.splitWithMask(m, 16); } { auto const& loops = loop.getLoopStmtsFor(CT); For* n = loops[2]; - For* no; - For* ni; - loop.splitWithMask(n, 16, &no, &ni); + loop.splitWithMask(n, 16); } // mo, mi, no, ni, k -> // mo, no, mi, ni, k diff --git a/test/cpp/tensorexpr/test_loopnest.cpp b/test/cpp/tensorexpr/test_loopnest.cpp index 6522ed7d703..c0860bc0d47 100644 --- a/test/cpp/tensorexpr/test_loopnest.cpp +++ b/test/cpp/tensorexpr/test_loopnest.cpp @@ -36,16 +36,10 @@ TEST(LoopNest, ExprSimple01) { return ExprHandle(1.0f) + cast(x) * x + cast(y) * y; }); LoopNest l({tensor}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_outer; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_inner; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_tail; std::vector loops = l.getAllLoopNestsWritingToBuf(tensor->buf()).at(0); - l.splitWithTail(loops[0], 2, &x_outer, &x_inner, &x_tail); - l.splitWithTail(x_outer, 2); + l.splitWithTail(loops[0], 2); + l.splitWithTail(loops[0], 2); } // 
NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables) @@ -395,8 +389,6 @@ TEST(LoopNest, ExprSplitAndSlice) { Tensor* tensor = Compute("f", {{100, "x"}}, func); LoopNest l({tensor}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* outer; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* inner; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) @@ -405,9 +397,9 @@ TEST(LoopNest, ExprSplitAndSlice) { // outer: [0, 4) // inner: [0, 21) // tail: [84, 100) - l.splitWithTail(loops[0], 21, &outer, &inner, &tail); + l.splitWithTail(loops[0], 21, &inner, &tail); l.sliceTail(inner, 2); - l.sliceHead(outer, 2); + l.sliceHead(loops[0], 2); // for (int x_outer = 0; x_outer < 2; x_outer++) { // for (int x_inner = 0; x_inner < 19; x_inner++) { @@ -522,15 +514,11 @@ TEST(LoopNest, ExprSplitWithTail) { }; Tensor* tensor = Compute("f", {{199, "x"}}, func); LoopNest l({tensor}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_outer; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_inner; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_tail; std::vector loops = l.getAllLoopNestsWritingToBuf(tensor->buf()).at(0); - l.splitWithTail(loops[0], 17, &x_outer, &x_inner, &x_tail); - l.splitWithTail(x_outer, 7); + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + l.splitWithTail(loops[0], 17); + // NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers) + l.splitWithTail(loops[0], 7); Stmt* stmt = l.root_stmt(); Stmt* simplified = IRSimplifier::simplify(stmt); @@ -557,14 +545,8 @@ TEST(LoopNest, ExprSplitWithTailNone) { }; Tensor* tensor = Compute("f", {{24, "x"}, {5, "y"}}, func); LoopNest l({tensor}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_outer; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_inner; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_tail; std::vector loops = l.getAllLoopNestsWritingToBuf(tensor->buf()).at(0); - l.splitWithTail(loops[0], 4, &x_outer, &x_inner, &x_tail); + l.splitWithTail(loops[0], 4); Stmt* stmt = l.root_stmt(); std::ostringstream oss; @@ -663,10 +645,8 @@ TEST(LoopNest, ExprSplitWithMaskRepeatedNoMask) { LoopNest l({tensor}); std::vector loops = l.getAllLoopNestsWritingToBuf(tensor->buf()).at(0); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For *outer, *mid, *inner; - l.splitWithMask(loops[0], 4, &outer, &inner); - l.splitWithMask(outer, 4); + l.splitWithMask(loops[0], 4); + l.splitWithMask(loops[0], 4); Stmt* stmt1 = IRSimplifier::simplify(l.root_stmt()); @@ -691,16 +671,16 @@ TEST(LoopNest, SplitWithTailWithLoopOptions) { return a_buf.load(m) + b_buf.load(m) + 1.0f; }); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For *outer, *inner, *tail; + For *inner, *tail; LoopNest l({tensor}); auto loops = NodeFinder::find(l.root_stmt()); ASSERT_GT(loops.size(), 0); l.setGPUBlockIndex(loops[0], LoopOptions::IDX_Y); - l.splitWithTail(loops[0], 4, &outer, &inner, &tail); - ASSERT_NE(outer, nullptr); + l.splitWithTail(loops[0], 4, &inner, &tail); ASSERT_NE(inner, nullptr); ASSERT_NE(tail, nullptr); + For* outer = loops[0]; // Outer loop carries loop axis bindings. 
ASSERT_TRUE(outer->loop_options().is_gpu_block_index()); @@ -723,12 +703,13 @@ TEST(LoopNest, SplitWithMaskWithLoopOptions) { return a_buf.load(m) + b_buf.load(m) + 1.0f; }); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For *outer, *inner; + For* inner; LoopNest l({tensor}); auto loops = NodeFinder::find(l.root_stmt()); l.setGPUBlockIndex(loops[0], LoopOptions::IDX_Y); - l.splitWithMask(loops[0], 4, &outer, &inner); + l.splitWithMask(loops[0], 4, &inner); + For* outer = loops[0]; // Outer loop carries loop axis bindings. ASSERT_TRUE(outer->loop_options().is_gpu_block_index()); @@ -1305,13 +1286,11 @@ TEST(LoopNest, ScheduleSplitTwiceThenInline) { return a->load(j + ExprHandle(8)); }); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* i_outer; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* i_inner; LoopNest l({b}, {a, b}); std::vector loops = l.getAllLoopNestsWritingToBuf(a->buf()).at(0); - l.splitWithMask(loops[0], 4, &i_outer, &i_inner); + l.splitWithMask(loops[0], 4, &i_inner); l.splitWithMask(i_inner, 2); ASSERT_THROWS_WITH(l.computeInline(a->buf()), "compound indices"); } @@ -3165,15 +3144,13 @@ TEST(LoopNest, NormalizeAndSplitWithTail) { LoopNest::normalize(for_stmt); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_outer; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* x_inner; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* x_tail; - l.splitWithTail(for_stmt, 10, &x_outer, &x_inner, &x_tail); + l.splitWithTail(for_stmt, 10, &x_inner, &x_tail); - auto x_outer_result = IRSimplifier::simplify(x_outer); + auto x_outer_result = IRSimplifier::simplify(for_stmt); std::ostringstream oss_outer; oss_outer << *x_outer_result; const std::string& expected_outer_ir = diff --git a/test/cpp/tensorexpr/test_memdependency.cpp b/test/cpp/tensorexpr/test_memdependency.cpp index 296212ac2f8..93177795051 100644 --- a/test/cpp/tensorexpr/test_memdependency.cpp +++ b/test/cpp/tensorexpr/test_memdependency.cpp @@ -2995,20 +2995,12 @@ TEST(MemDependency, MemDependencyCheckerComputeGEMM) { { auto const& loops = loop.getLoopStmtsFor(CT); For* m = loops[0]; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* mo; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* mi; - loop.splitWithMask(m, 4, &mo, &mi); + loop.splitWithMask(m, 4); } { auto const& loops = loop.getLoopStmtsFor(CT); For* n = loops[2]; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* no; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* ni; - loop.splitWithMask(n, 16, &no, &ni); + loop.splitWithMask(n, 16); } // mo, mi, no, ni, k -> // mo, no, mi, ni, k diff --git a/test/cpp/tensorexpr/test_reductions.cpp b/test/cpp/tensorexpr/test_reductions.cpp index 5d2c0f2a8a0..de28871bd0a 100644 --- a/test/cpp/tensorexpr/test_reductions.cpp +++ b/test/cpp/tensorexpr/test_reductions.cpp @@ -624,22 +624,9 @@ TEST(Reductions, SplitNonReduceAxis) { std::vector out(16, -1.f); Tensor* tensor = Reduce("sum", {{16, "m"}}, Sum(), in, {{8, "n"}}); LoopNest l({tensor}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_outer; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_inner; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_tail; std::vector loops = l.getLoopStmtsFor(tensor); - l.splitWithTail(loops[0], 2, &x_outer, &x_inner, &x_tail); - - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_2; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* x_1; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - 
For* x_tail_2; - l.splitWithTail(x_outer, 2, &x_2, &x_1, &x_tail_2); + l.splitWithTail(loops[0], 2); + l.splitWithTail(loops[0], 2); l.prepareForCodegen(); @@ -1133,8 +1120,8 @@ TEST(Reductions, ReduceOverSplitRfactor) { LoopNest loop({c}); std::vector loops = loop.getLoopStmtsFor(c); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For *o, *i, *t; - loop.splitWithTail(loops[1], SPLIT_FACTOR, &o, &i, &t); + For *i, *t; + loop.splitWithTail(loops[1], SPLIT_FACTOR, &i, &t); loop.reorderAxis(loops[0], i); auto all_loops = loop.getAllLoopNestsWritingToBuf(c->buf()); @@ -1525,16 +1512,14 @@ TEST(Reductions, ReductionSplitCacheConsumerAccess) { LoopNest l({e}, {c, d, e}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* outer; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* inner; // Split outer reduction axis. - l.splitWithMask(l.getLoopStmtsFor(d)[0], 4, &outer, &inner); + l.splitWithMask(l.getLoopStmtsFor(d)[0], 4, &inner); // Split reduction consumer. - l.splitWithMask(l.getLoopStmtsFor(e)[0], 4, &outer, &inner); + l.splitWithMask(l.getLoopStmtsFor(e)[0], 4, &inner); l.cacheAccesses(d->buf(), "sum_local", inner); l.prepareForCodegen(); @@ -1576,8 +1561,6 @@ TEST(Reductions, ReductionReorderCacheConsumerAccess) { LoopNest l({e}, {c, d, e}); - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* outer; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* inner; @@ -1586,7 +1569,7 @@ TEST(Reductions, ReductionReorderCacheConsumerAccess) { l.reorderAxis(loops[0], loops[1]); // Split reduction consumer. - l.splitWithMask(l.getLoopStmtsFor(e)[0], 4, &outer, &inner); + l.splitWithMask(l.getLoopStmtsFor(e)[0], 4, &inner); l.cacheAccesses(d->buf(), "sum_local", inner); l.prepareForCodegen(); diff --git a/test/cpp/tensorexpr/tutorial.cpp b/test/cpp/tensorexpr/tutorial.cpp index a9d7b9a4f37..dcd9358b3f3 100644 --- a/test/cpp/tensorexpr/tutorial.cpp +++ b/test/cpp/tensorexpr/tutorial.cpp @@ -313,8 +313,6 @@ int main(int argc, char* argv[]) { // instance. std::vector loops = loopnest.getLoopStmtsFor(Y); // NOLINTNEXTLINE(cppcoreguidelines-init-variables) - For* j_outer; - // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* j_inner; // NOLINTNEXTLINE(cppcoreguidelines-init-variables) For* j_tail; @@ -322,9 +320,9 @@ int main(int argc, char* argv[]) { loopnest.splitWithTail( loops[1], // loops[0] is the outer loop, loops[1] is inner split_factor, - &j_outer, // These are handles that we would be using for &j_inner, // further transformations &j_tail); + // loops[1] will become the outer loop, j_outer, after splitWithTail. 
std::cout << *loopnest.root_stmt() << std::endl; // Prints: // { diff --git a/test/cpp_api_parity/parity-tracker.md b/test/cpp_api_parity/parity-tracker.md index 0a3f940a0f3..9252c7fa3ad 100644 --- a/test/cpp_api_parity/parity-tracker.md +++ b/test/cpp_api_parity/parity-tracker.md @@ -49,6 +49,7 @@ torch::nn::Hardshrink|Yes|No torch::nn::Hardtanh|Yes|No torch::nn::LeakyReLU|Yes|No torch::nn::LogSigmoid|Yes|No +torch::nn::Mish|Yes|No torch::nn::MultiheadAttention|No|No torch::nn::PReLU|Yes|No torch::nn::ReLU|Yes|No @@ -187,6 +188,7 @@ F::rrelu|Yes|No F::glu|Yes|No F::gelu|Yes|No F::silu|Yes|No +F::mish|Yes|No F::logsigmoid|Yes|No F::hardshrink|Yes|No F::tanhshrink|Yes|No diff --git a/test/distributed/test_c10d_gloo.py b/test/distributed/test_c10d_gloo.py index d1ed9838f69..d2f336f9352 100644 --- a/test/distributed/test_c10d_gloo.py +++ b/test/distributed/test_c10d_gloo.py @@ -29,6 +29,7 @@ from torch.testing._internal.common_distributed import ( simple_sparse_reduce_tests, skip_if_win32, create_device, + with_dist_debug_levels, ) from torch.testing._internal.common_utils import ( TestCase, @@ -217,42 +218,74 @@ class ProcessGroupGlooWrapperTest(AbstractProcessGroupWrapperTest): opts._threads = threads return opts - def _create_wrapper_pg(self, timeout=10.0): + def _create_wrapper_pg(self, with_new_group=False, timeout=10.0): store = c10d.FileStore(self.file_name, self.world_size) c10d.init_process_group( backend="gloo", rank=self.rank, world_size=self.world_size, store=store ) - _pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts(timeout=timeout)) - pg = c10d._create_process_group_wrapper( - _pg, - "unused", - store, - self.rank, - self.world_size, - timeout=timeout, - ) + if with_new_group: + pg = c10d.new_group(backend="gloo") + else: + _pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts(timeout=timeout)) + pg = c10d._create_process_group_wrapper( + _pg, + "unused", + store, + self.rank, + self.world_size, + timeout=timeout, + ) return pg def test_collective_hang(self): pg = self._create_wrapper_pg(timeout=2.0) self._test_collective_hang(pg) + # NOTE: these tests are separated by debug level instead of combined into + # one due to https://github.com/pytorch/pytorch/issues/55967, they can be + # combined after that is resolved. 
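The distributed test changes above (continued below for NCCL) exercise the collective-consistency wrapper in two ways: explicitly via the private _create_process_group_wrapper helper, or implicitly via a plain new_group call when the TORCH_DISTRIBUTED_DEBUG level is DETAIL. A rough single-process sketch of the latter path, assuming a build that honors the debug variable (the store path is arbitrary):

    import os
    import tempfile
    import torch.distributed as dist

    # Request the most verbose c10d consistency checking before initializing;
    # with DETAIL, new_group() is expected to hand back a wrapped process group
    # that validates collective op/shape agreement across ranks.
    os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

    store = dist.FileStore(os.path.join(tempfile.mkdtemp(), "store"), 1)
    dist.init_process_group(backend="gloo", rank=0, world_size=1, store=store)
    pg = dist.new_group(backend="gloo")

    # ... run collectives against `pg`; mismatches are reported eagerly ...
    dist.destroy_process_group()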
+ @with_dist_debug_levels(levels=["DETAIL"]) + def test_collectives_op_mismatch_debug_mode(self): + pg = self._create_wrapper_pg(with_new_group=True) + self._test_collectives_op_mismatch(pg) + + @with_dist_debug_levels(levels=["OFF"]) def test_collectives_op_mismatch(self): - pg = self._create_wrapper_pg() + pg = self._create_wrapper_pg(with_new_group=False) self._test_collectives_op_mismatch(pg) + @with_dist_debug_levels(levels=["DETAIL"]) + def test_collective_shape_mismatch_debug_mode(self): + pg = self._create_wrapper_pg(with_new_group=True) + self._test_collective_shape_mismatch(pg) + + @with_dist_debug_levels(levels=["OFF"]) def test_collective_shape_mismatch(self): - pg = self._create_wrapper_pg() + pg = self._create_wrapper_pg(with_new_group=False) self._test_collective_shape_mismatch(pg) @skip_if_lt_x_gpu(4) + @with_dist_debug_levels(levels=["DETAIL"]) + def test_collectives_op_mismatch_cuda_debug_mode(self): + pg = self._create_wrapper_pg(with_new_group=True) + self._test_collectives_op_mismatch(pg, use_cuda=True) + + @skip_if_lt_x_gpu(4) + @with_dist_debug_levels(levels=["OFF"]) def test_collectives_op_mismatch_cuda(self): - pg = self._create_wrapper_pg() + pg = self._create_wrapper_pg(with_new_group=False) self._test_collectives_op_mismatch(pg, use_cuda=True) @skip_if_lt_x_gpu(4) + @with_dist_debug_levels(levels=["DETAIL"]) + def test_collective_shape_mismatch_cuda_debug_mode(self): + pg = self._create_wrapper_pg(with_new_group=True) + self._test_collective_shape_mismatch(pg, use_cuda=True) + + @skip_if_lt_x_gpu(4) + @with_dist_debug_levels(levels=["OFF"]) def test_collective_shape_mismatch_cuda(self): - pg = self._create_wrapper_pg() + pg = self._create_wrapper_pg(with_new_group=False) self._test_collective_shape_mismatch(pg, use_cuda=True) @requires_gloo() diff --git a/test/distributed/test_c10d_nccl.py b/test/distributed/test_c10d_nccl.py index d1633276f06..94119634112 100644 --- a/test/distributed/test_c10d_nccl.py +++ b/test/distributed/test_c10d_nccl.py @@ -180,7 +180,7 @@ class ProcessGroupNCCLWrapperTest(AbstractProcessGroupWrapperTest): def world_size(self) -> int: return 2 - def _create_wrapper_pg(self, timeout=10.0): + def _create_wrapper_pg(self, with_new_group=False, timeout=10.0): store = c10d.FileStore(self.file_name, self.world_size) c10d.init_process_group( backend="nccl", @@ -189,15 +189,18 @@ class ProcessGroupNCCLWrapperTest(AbstractProcessGroupWrapperTest): store=store, timeout=timedelta(seconds=timeout) ) - _pg = c10d.ProcessGroupNCCL(store, self.rank, self.world_size, timeout=timedelta(seconds=timeout)) - pg = c10d._create_process_group_wrapper( - _pg, - "unused", - store, - self.rank, - self.world_size, - timeout=timeout, - ) + if with_new_group: + pg = c10d.new_group(backend="nccl", timeout=timedelta(seconds=timeout)) + else: + _pg = c10d.ProcessGroupNCCL(store, self.rank, self.world_size, timeout=timedelta(seconds=timeout)) + pg = c10d._create_process_group_wrapper( + _pg, + "unused", + store, + self.rank, + self.world_size, + timeout=timeout, + ) return pg @requires_nccl() @@ -206,17 +209,36 @@ class ProcessGroupNCCLWrapperTest(AbstractProcessGroupWrapperTest): pg = self._create_wrapper_pg(timeout=2.0) self._test_collective_hang(pg) + # NOTE: these tests are separated by debug level instead of combined into + # one due to https://github.com/pytorch/pytorch/issues/55967, they can be + # combined after that is resolved. 
+ @requires_nccl() + @skip_if_lt_x_gpu(2) + @with_dist_debug_levels(levels=["DETAIL"]) + def test_collectives_op_mismatch_debug_mode(self): + pg = self._create_wrapper_pg(with_new_group=True) + self._test_collectives_op_mismatch(pg, use_cuda=True) + @requires_nccl() @skip_if_lt_x_gpu(2) + @with_dist_debug_levels(levels=["OFF"]) def test_collectives_op_mismatch(self): - wrapper_pg = self._create_wrapper_pg() - self._test_collectives_op_mismatch(wrapper_pg, use_cuda=True) + pg = self._create_wrapper_pg(with_new_group=False) + self._test_collectives_op_mismatch(pg, use_cuda=True) + + @requires_nccl() + @skip_if_lt_x_gpu(2) + @with_dist_debug_levels(levels=["DETAIL"]) + def test_collective_shape_mismatch_debug_mode(self): + pg = self._create_wrapper_pg(with_new_group=True) + self._test_collective_shape_mismatch(pg, use_cuda=True) @requires_nccl() @skip_if_lt_x_gpu(2) + @with_dist_debug_levels(levels=["OFF"]) def test_collective_shape_mismatch(self): - wrapper_pg = self._create_wrapper_pg() - self._test_collective_shape_mismatch(wrapper_pg, use_cuda=True) + pg = self._create_wrapper_pg(with_new_group=False) + self._test_collective_shape_mismatch(pg, use_cuda=True) class ProcessGroupNCCLNoGPUTest(TestCase): @@ -1993,6 +2015,7 @@ class NcclErrorHandlingTest(MultiProcessTestCase): @requires_nccl_version(2400, "Need NCCL 2.4+ for error checking") @skip_if_lt_x_gpu(3) @skip_if_rocm + @unittest.skip("Frequently times out see https://github.com/pytorch/pytorch/issues/58920") def test_nccl_errors_blocking_abort(self): self._test_nccl_errors_blocking(lambda: os.abort()) diff --git a/test/expect/TestSparseCSRCPU.test_sparse_csr_print_cpu.expect b/test/expect/TestSparseCSRCPU.test_sparse_csr_print_cpu.expect index 3253e9e5616..a30958d09d9 100644 --- a/test/expect/TestSparseCSRCPU.test_sparse_csr_print_cpu.expect +++ b/test/expect/TestSparseCSRCPU.test_sparse_csr_print_cpu.expect @@ -5,64 +5,56 @@ # values_shape: torch.Size([10]) ########## torch.float32/torch.int32 ########## # sparse tensor -tensor(crow_indices=tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), - col_indices=tensor([5, 1, 6, 5, 6, 4, 2, 5, 5, 9]), - values=tensor([ 0.5674, 0.1261, 0.5497, 0.6416, -0.4414, 0.3634, - -0.4327, 0.3135, -0.5225, 0.4626]), size=(10, 10), - nnz=10) +tensor(crow_indices=tensor([0, 2, 4]), + col_indices=tensor([0, 1, 0, 1]), + values=tensor([1., 2., 3., 4.]), size=(2, 2), nnz=4, + layout=torch.sparse_csr) # _crow_indices -tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) +tensor([0, 2, 4], dtype=torch.int32) # _col_indices -tensor([5, 1, 6, 5, 6, 4, 2, 5, 5, 9]) +tensor([0, 1, 0, 1], dtype=torch.int32) # _values -tensor([ 0.5674, 0.1261, 0.5497, 0.6416, -0.4414, 0.3634, -0.4327, 0.3135, - -0.5225, 0.4626]) +tensor([1., 2., 3., 4.]) ########## torch.float64/torch.int32 ########## # sparse tensor -tensor(crow_indices=tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), - col_indices=tensor([8, 2, 0, 4, 9, 2, 1, 9, 2, 2]), - values=tensor([ 0.3324, -0.3314, 0.5786, -0.3567, 0.0494, 0.3377, - 0.6872, -0.1470, 0.9123, -0.8460]), size=(10, 10), - nnz=10) +tensor(crow_indices=tensor([0, 2, 4]), + col_indices=tensor([0, 1, 0, 1]), + values=tensor([1., 2., 3., 4.]), size=(2, 2), nnz=4, + dtype=torch.float64, layout=torch.sparse_csr) # _crow_indices -tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) +tensor([0, 2, 4], dtype=torch.int32) # _col_indices -tensor([8, 2, 0, 4, 9, 2, 1, 9, 2, 2]) +tensor([0, 1, 0, 1], dtype=torch.int32) # _values -tensor([ 0.3324, -0.3314, 0.5786, -0.3567, 0.0494, 0.3377, 0.6872, -0.1470, - 0.9123, -0.8460]) +tensor([1., 
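For reference, a minimal sketch of the with_new_group=True path exercised by these tests, assuming a two-rank job launched with torch.multiprocessing.spawn and a shared FileStore path; this mirrors the test helper rather than any official recipe, and the OFF/DETAIL split corresponds to the TORCH_DISTRIBUTED_DEBUG environment variable that the decorator sets.

```python
import tempfile

import torch
import torch.distributed as c10d
import torch.multiprocessing as mp


def run(rank, world_size, file_name):
    # Rendezvous through a file-based store, as the c10d tests do.
    store = c10d.FileStore(file_name, world_size)
    c10d.init_process_group(backend="gloo", rank=rank,
                            world_size=world_size, store=store)

    # with_new_group=True path: the group under test comes from new_group();
    # the alternative path wraps an explicitly built backend with the private
    # c10d._create_process_group_wrapper helper shown in the diff.
    pg = c10d.new_group(backend="gloo")

    t = torch.ones(2) * rank
    c10d.all_reduce(t, group=pg)   # sums the per-rank tensors
    print(rank, t)

    c10d.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    file_name = tempfile.NamedTemporaryFile(delete=False).name
    mp.spawn(run, args=(world_size, file_name), nprocs=world_size)
```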
- test/expect/TestSparseCSRCPU.test_sparse_csr_print_cpu.expect: the expected printed output is regenerated. Instead of random 10-nnz tensors of shape (10, 10), (100, 10) and (1000, 10), every float32/float64 x int32/int64 combination now prints one fixed 2x2 CSR tensor with crow_indices [0, 2, 4], col_indices [0, 1, 0, 1] and values [1., 2., 3., 4.] (nnz=4), and the repr now includes layout=torch.sparse_csr plus, where applicable, the index dtype (torch.int32) and value dtype (torch.float64).
- test/expect/TestSparseCSRCUDA.test_sparse_csr_print_cuda.expect: a new expect file containing the same fixed tensor printed with device='cuda:0'.
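As a quick illustration of what these expect files now contain, here is a hedged sketch that builds and prints the same fixed 2x2 CSR tensor; the indices and values are taken directly from the regenerated output.

```python
import torch

# The fixed tensor the regenerated expect files print: a 2x2 CSR matrix with
# crow_indices [0, 2, 4], col_indices [0, 1, 0, 1] and values [1., 2., 3., 4.].
crow_indices = torch.tensor([0, 2, 4], dtype=torch.int32)
col_indices = torch.tensor([0, 1, 0, 1], dtype=torch.int32)
values = torch.tensor([1., 2., 3., 4.], dtype=torch.float64)

csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 2))
print(csr)                  # size=(2, 2), nnz=4, dtype=torch.float64, layout=torch.sparse_csr
print(csr.crow_indices())   # tensor([0, 2, 4], dtype=torch.int32)
print(csr.col_indices())    # tensor([0, 1, 0, 1], dtype=torch.int32)
print(csr.values())         # tensor([1., 2., 3., 4.], dtype=torch.float64)

# The new CUDA expect file prints the same tensor with device='cuda:0'.
if torch.cuda.is_available():
    print(torch.sparse_csr_tensor(crow_indices, col_indices, values,
                                  size=(2, 2), device="cuda"))
```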
- test/jit/test_graph_rewrite_passes.py (new): TestGraphRewritePasses.test_fuse_linear traces a module that computes torch.matmul(x, weight.t()) followed by an optional in-place bias add, runs torch._C._jit_pass_fuse_linear on its graph, and checks that the pattern is rewritten to aten::linear (with aten::matmul, aten::addmm, aten::add_ and aten::t gone) while the source range of the original matmul node is preserved; a 3-d matmul is checked to stay unfused.
- test/jit/test_script_profile.py (new): defines an LSTM-based Sequence module and a TestScriptProfile suite exercising torch.jit._ScriptProfile: enabling and disabling the profiler around scripted calls, using it from inside TorchScript, keeping several profiles alive at once, and checking that dump_string() is empty only when nothing was profiled. A sketch of that interface follows.
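A minimal sketch of the profiler interface the new test exercises; torch.jit._ScriptProfile is an internal, underscore-prefixed API, so treat this as an illustration of the test rather than a supported workflow.

```python
import torch
from torch import nn

# Script a small module so the profiler has instrumented TorchScript to record.
model = torch.jit.script(nn.Sequential(nn.Linear(8, 8), nn.ReLU()))

prof = torch.jit._ScriptProfile()   # internal API, as used in the new test
prof.enable()
model(torch.rand(4, 8))
prof.disable()

stats = prof.dump_string()
assert stats != ""                  # the test asserts a non-empty dump
print(stats)
```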
- test/onnx/test_pytorch_onnx_onnxruntime.py: adds test_mish, exporting a small module wrapping torch.nn.Mish and checking it against ONNX Runtime.
- test/quantization/test_quantize_fx.py: adds test_mish_reference, covering torch.nn.Mish / torch.nn.functional.mish with float16_static_qconfig in reference mode and checking the expected sequence of to/dequantize nodes around the module and functional calls.
- test/quantization/test_quantize_jit.py: the fuse-linear test's FunctionalLinear module now spells out torch.matmul(x, self.weight.t()) plus an optional in-place bias add instead of calling F.linear, and the test additionally asserts that the source range of the original aten::matmul node is carried over to the fused aten::linear node, as in the sketch below.
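The pattern being tested can be reproduced in a few lines; this sketch condenses the new test_fuse_linear and relies on the same internal pass (torch._C._jit_pass_fuse_linear) and FileCheck helper that the tests use.

```python
import torch
from torch.testing import FileCheck


# An explicit matmul + in-place bias add, traced and then rewritten to
# aten::linear by the internal fuse-linear pass.
class FunctionalLinear(torch.nn.Module):
    def __init__(self, weight, bias):
        super().__init__()
        self.weight = weight
        self.bias = bias

    def forward(self, x):
        res = torch.matmul(x, self.weight.t())
        if self.bias is not None:
            res.add_(self.bias)
        return res


x = torch.rand(3)
model = torch.jit.trace(FunctionalLinear(torch.rand(5, 3), torch.rand(5)), [x])

torch._C._jit_pass_fuse_linear(model.graph)

FileCheck().check("aten::linear").run(model.graph)
for gone in ["aten::matmul", "aten::addmm", "aten::add_", "aten::t("]:
    FileCheck().check_not(gone).run(model.graph)

model(x)  # the rewritten graph still runs
```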
- test/test_cuda.py: passing gradient lists with mismatched dtypes to a raw torch._amp_foreach_non_finite_check_and_unscale_ call no longer has to raise "must have the same dtype"; the test now expects the call to fall back to the single-tensor TensorIterator kernel and still unscale both the float32 and the float16 gradients. Lists with mismatched devices are still required to raise "Expected all tensors to be on the same device".
- test/test_foreach.py: imports re and the skipMeta decorator, and test_min_max now also runs for bool tensors.
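On the topic of the "must have the same dtype" error, the relaxed behaviour can be sketched as below; _amp_foreach_non_finite_check_and_unscale_ is a private AMP helper and the mixed-dtype fallback is what the updated test expects after this patch, so the exact behaviour is version-dependent.

```python
import torch

if torch.cuda.is_available():
    inv_scale = torch.full((1,), 0.25, device="cuda")
    found_inf = torch.zeros((1,), device="cuda")

    g = torch.full((4,), 4.0, device="cuda")
    grads = [g.clone(), g.to(dtype=torch.float16)]   # float32 and float16

    # Expected (per the updated test) to fall back to the single-tensor
    # TensorIterator kernel instead of raising "must have the same dtype".
    torch._amp_foreach_non_finite_check_and_unscale_(grads, found_inf, inv_scale)

    print(grads[0], grads[1])   # both unscaled by 0.25 -> all ones
    print(found_inf)            # stays 0.0: no inf/nan was found
```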
- test/test_foreach.py (continued): test_bin_op_scalar_with_different_tensor_dtypes is marked @skipMeta (aten::_foreach_add.Scalar is not implemented for the Meta backend) and now asserts that torch._foreach_add on a [float, long] list does not raise. The checks that binary foreach ops reject lists with different dtypes ("All tensors in the tensor list must have the same dtype.") are removed, and test_add_list_slow_path gains zero-stride (expanded) and sliced-tensor inputs to exercise the slow path. A short sketch of the relaxed behaviour follows.
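A sketch of what the updated tests encode: mixed-dtype tensor lists are expected to take the slow per-tensor path, with results matching the equivalent torch.add calls. Since the old same-dtype check is removed by this very patch, the behaviour depends on the PyTorch version.

```python
import torch

# Scalar variant: a list mixing float and long tensors no longer raises.
mixed = [torch.tensor([1.1]), torch.tensor([1], dtype=torch.long)]
print(torch._foreach_add(mixed, 1))        # [tensor([2.1000]), tensor([2])]

# List variant: different dtypes between the two lists promote like torch.add;
# previously this raised "All tensors in the tensor list must have the same dtype."
floats = [torch.zeros(3), torch.zeros(3)]
ints = [torch.ones(3, dtype=torch.int32), torch.ones(3, dtype=torch.int32)]
print(torch._foreach_add(floats, ints))
```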
- test/test_foreach.py (continued): adds test_unary_op_tensors_on_different_devices, test_binary_op_tensors_on_different_devices and test_pointwise_op_tensors_on_different_devices, which build tensor lists mixing 'cuda' and 'cpu' entries (with the tensors at each index on the same device) and check that the foreach ops either match the corresponding per-tensor native ops or raise the same error.
- test/test_fx.py: checks that a node created with type_expr=List[float] shows up as typing.List[float] in the printed graph, adds test_annotations_empty_tuple (annotations such as Tuple[()] and Tuple[str, Tuple[()]] survive symbolic tracing and scripting), and registers "mish" as CONTROL_FLOW in the functional-tracing table.
- test/test_jit.py: imports the new TestGraphRewritePasses and TestScriptProfile suites and adds test_pattern_based_rewrite_with_source_range_preserved, which runs torch._C._jit_pass_custom_pattern_based_rewrite_graph with value_name_pairs over several patterns (fusing add+mul into my::add_mul, renaming add to my::add, fusing add+add into my::double_add, splitting mul into two my::add nodes) and asserts the original source ranges are carried over, plus one case without value mappings where they are not.
- test/test_module_init.py: adds a constructor entry for torch.nn.Mish.
- test/test_nn.py: adds test_mish_inplace_overlap, expecting F.mish(x, inplace=True) on an internally overlapping (expanded) tensor to raise 'unsupported operation'.
- test/test_ops.py: the in-place broadcast check now fails with a message naming the offending sample.
- test/test_sort_and_select.py: adds an in-place torch.sort(y, out=(y, y_inds)) check.
- test/test_sparse.py: adds test_ctor_large_sizes, checking that integer overflow in numel is detected when constructing a sparse COO tensor with very large dimensions (gh-57416).
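Since Mish appears throughout these test changes, here is a small sketch of the new activation and the reference formula the unary-ufunc test checks it against, mish(x) = x * tanh(softplus(x)).

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4)

# Same reference as test_unary_ufuncs.py: x * tanh(log1p(exp(x))) == x * tanh(softplus(x)).
reference = x * torch.tanh(F.softplus(x))

print(torch.allclose(torch.nn.Mish()(x), reference, atol=1e-6))
print(torch.allclose(F.mish(x), reference, atol=1e-6))

# test_mish_inplace_overlap expects in-place mish on an internally
# overlapping (expanded) tensor to be rejected.
overlapping = torch.randn(1, 6).expand(6, 6)
try:
    F.mish(overlapping, inplace=True)
except RuntimeError as err:
    print("in-place on overlapping tensor rejected:", err)
```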
- test/test_sparse.py (continued): the new test builds indices around N = 100000 with a four-dimensional size of (N + 1,) * 4 and expects torch.sparse_coo_tensor to raise rather than silently overflow.
- test/test_sparse_csr.py: the @onlyCPU restriction is dropped from the constructor and conversion tests; a new test_sparse_csr_constructor_from_lists covers construction from plain Python lists with and without an explicit size (including torch._sparse_csr_tensor_unsafe); and the old test_factory_size_check is split into dedicated invariant tests: type invariants (crow_indices and col_indices must share one index dtype, int16 indices are rejected with "csr_construct_check not implemented for 'Short'"), layout invariants (crow_indices, col_indices and values must be strided and contiguous), shape invariants (size must have length 2, the index and value tensors must be 1-D, crow_indices.numel() must equal size(0) + 1, col_indices and values must have equal sizes) and index invariants (crow_indices must start at 0, end at len(col_indices) and be non-decreasing; col_indices must be non-negative and smaller than size(1)). A CUDA-only test_factory_device_type_inference checks device inference over all cpu/cuda combinations of the three input tensors. Two of these invariants are sketched below.
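To make the new invariant checks concrete, here is a hedged sketch of two malformed constructions the tests expect to be rejected; the quoted error messages come from this patch, so older builds may behave differently, which is why the sketch wraps each call in try/except.

```python
import torch


def expect_runtime_error(build):
    try:
        build()
        print("no error raised")
    except RuntimeError as err:
        print("RuntimeError:", str(err).splitlines()[0])


# crow_indices and col_indices with different index dtypes.
expect_runtime_error(lambda: torch.sparse_csr_tensor(
    torch.tensor([0, 2, 4], dtype=torch.int64),
    torch.tensor([0, 1, 0, 1], dtype=torch.int32),
    torch.tensor([1., 2., 3., 4.])))

# crow_indices.numel() must be size(0) + 1, which (1, 1) violates.
expect_runtime_error(lambda: torch.sparse_csr_tensor(
    torch.tensor([0, 2, 4]),
    torch.tensor([0, 1, 0, 1]),
    torch.tensor([1., 2., 3., 4.]),
    (1, 1)))
```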
torch.testing.floating_types(): printed.append("########## {}/{} ##########".format(dtype, index_dtype)) - x = self.genSparseCSRTensor(shape, nnz, device=device, dtype=torch.float32, index_dtype=torch.int64) + x = torch.sparse_csr_tensor(torch.tensor([0, 2, 4], dtype=index_dtype), + torch.tensor([0, 1, 0, 1], dtype=index_dtype), + torch.tensor([1, 2, 3, 4]), dtype=dtype, device=device) printed.append("# sparse tensor") printed.append(str(x)) printed.append("# _crow_indices") @@ -120,7 +259,6 @@ class TestSparseCSR(TestCase): self.assertExpected('\n'.join(printed)) self.maxDiff = orig_maxDiff - @onlyCPU def test_sparse_csr_from_dense(self, device): dense = torch.tensor([[4, 5, 0], [0, 0, 0], [1, 0, 0]], device=device) sparse = dense.to_sparse_csr() @@ -140,7 +278,6 @@ class TestSparseCSR(TestCase): self.assertEqual(torch.tensor([0, 1, 2] * 3, dtype=torch.int64), sparse.col_indices()) self.assertEqual(torch.tensor([2] * 9), sparse.values()) - @onlyCPU @dtypes(torch.double) def test_dense_convert(self, device, dtype): size = (5, 5) @@ -262,7 +399,6 @@ class TestSparseCSR(TestCase): for k in range(2, 8): test_shape(i, j, k, i * j // 2) - @onlyCPU @dtypes(*torch.testing.floating_types()) def test_coo_csr_conversion(self, device, dtype): size = (5, 5) diff --git a/test/test_torch.py b/test/test_torch.py index 077c3665db1..a49fd36eda9 100644 --- a/test/test_torch.py +++ b/test/test_torch.py @@ -2398,8 +2398,11 @@ tensor([[[1.+1.j, 1.+1.j, 1.+1.j, ..., 1.+1.j, 1.+1.j, 1.+1.j], # Check for zero strided, size 1 axis, in non-contiguous storage (gh-33812) c = torch.randn(10).as_strided([2, 1, 5], [1, 0, 2]) + self.assertEqual(torch._debug_has_internal_overlap(c), OVERLAP_NO) + c = torch.randn(2, 1, 10)[::2].as_strided((2, 1, 5), (10, 0, 2)) self.assertEqual(torch._debug_has_internal_overlap(c), OVERLAP_TOO_HARD) + def test_allow_tensor_metadata_change(self): def do_test(t): with self.assertRaisesRegex( @@ -5702,12 +5705,31 @@ else: dest_ones.masked_scatter_(mask, src_ones) self.assertEqual(dest_ones, dest_ones_expected, atol=0, rtol=0) - # make src smaller. this should fail - src = torch.zeros(num_copy - 1, dtype=dt, device=device) - with self.assertRaises(RuntimeError): - dest.masked_scatter_(mask, src) + # Bound checking in CUDA is done inside a kernel + # in order to avoid synchronization, but this means + # we can not clear the failures. So there is no way + # to test it then recover. + if self.device_type != 'cuda': + # make src smaller. 
this should fail + src = torch.zeros(num_copy - 1, dtype=dt, device=device) + with self.assertRaises(RuntimeError): + dest.masked_scatter_(mask, src) + + # empty tensor + dest = torch.empty((5, 0, 5), dtype=dt, device=device) + mask = torch.ones_like(dest, dtype=maskType, device=device) + src = torch.empty((0,), dtype=dt, device=device) + dest.masked_scatter_(mask, src) - self.assertEqual(len(w), 3) + dest = torch.empty((5, 0, 5), dtype=dt, device=device) + mask = torch.ones((5, 1, 5), dtype=maskType, device=device) + src = torch.empty((0,), dtype=dt, device=device) + dest.masked_scatter_(mask, src) + + if self.device_type != 'cuda': + self.assertEqual(len(w), 5) + else: + self.assertEqual(len(w), 4) warn = 'masked_scatter_ received a mask with dtype torch.uint8,' for wi in w: @@ -5725,6 +5747,15 @@ else: dst = dst.masked_scatter(mask, src) self.assertEqual(dst, torch.tensor([True, True, True], device=device)) + @onlyCUDA + @largeTensorTest('30GB') + def test_masked_scatter_large_tensor(self, device): + t_cpu = torch.empty(2**31 + 1, dtype=torch.bool).random_() + t = t_cpu.to(device) + result_cpu = t_cpu.masked_scatter(t_cpu, t_cpu) + result = t.masked_scatter(t, t) + self.assertEqual(result, result_cpu) + @dtypes(*torch.testing.get_all_dtypes()) def test_masked_select(self, device, dtype): if device == 'cpu': diff --git a/test/test_unary_ufuncs.py b/test/test_unary_ufuncs.py index 02dc4919449..29f62c234e0 100644 --- a/test/test_unary_ufuncs.py +++ b/test/test_unary_ufuncs.py @@ -1018,6 +1018,35 @@ class TestUnaryUfuncs(TestCase): input_noncontig, inplace=True), expected_output_noncontig, atol=atol, rtol=rtol) + @skipIfNoSciPy + @dtypes(torch.float, torch.double) + def test_mish(self, device, dtype): + input_np = np.random.randn(5, 8) + special_input = [[-1000, -1, -0.1, 0, 0.5, 1, 2, 1000]] + input_np = np.concatenate((input_np, special_input), axis=0).astype( + torch_to_numpy_dtype_dict[dtype]) + expected_output_np = input_np * np.tanh(np.log1p(np.exp(input_np))) + + expected_output = torch.from_numpy(expected_output_np).to(device) + expected_output_noncontig = expected_output.transpose(0, 1) + + atol = 1e-6 + rtol = 1e-6 + + input = torch.from_numpy(input_np).clone().contiguous().to(device) + self.assertEqual(torch.nn.functional.mish(input), expected_output, + atol=atol, rtol=rtol) + self.assertEqual(torch.nn.functional.mish(input, inplace=True), + expected_output, atol=atol, rtol=rtol) + + input = torch.from_numpy(input_np).clone().to(device) + input_noncontig = input.transpose(0, 1) + self.assertEqual(torch.nn.functional.mish(input_noncontig), + expected_output_noncontig, atol=atol, rtol=rtol) + self.assertEqual(torch.nn.functional.mish( + input_noncontig, inplace=True), expected_output_noncontig, + atol=atol, rtol=rtol) + # do ops like threshold need a test_unary(_nonufunc) test suite? 
@onlyCPU @dtypes(*torch.testing.get_all_math_dtypes('cpu')) diff --git a/tools/autograd/derivatives.yaml b/tools/autograd/derivatives.yaml index 54315e29fd9..b0a056bf08d 100644 --- a/tools/autograd/derivatives.yaml +++ b/tools/autograd/derivatives.yaml @@ -182,6 +182,7 @@ - name: acos(Tensor self) -> Tensor self: grad * -((-self * self + 1).rsqrt()).conj() + result: auto_element_wise - name: add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor self: handle_r_to_c(self.scalar_type(), grad) @@ -190,26 +191,31 @@ - name: add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor self: handle_r_to_c(self.scalar_type(), grad) + result: self_t - name: addbmm(Tensor self, Tensor batch1, Tensor batch2, *, Scalar beta=1, Scalar alpha=1) -> Tensor self: maybe_multiply(grad, beta.conj()) batch1: grad.unsqueeze(0).expand({ batch1.size(0), batch1.size(1), batch2.size(2) }).bmm(batch2.transpose(1, 2).conj()) * alpha.conj() batch2: batch1.transpose(1, 2).conj().bmm(grad.unsqueeze(0).expand({ batch1.size(0), batch1.size(1), batch2.size(2) })) * alpha.conj() + result: maybe_multiply(self_t, beta) + maybe_multiply(batch1_t.bmm(batch2_p).sum(0), alpha) + maybe_multiply(batch1_p.bmm(batch2_t).sum(0), alpha) - name: addcdiv(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor self: handle_r_to_c(self.scalar_type(), grad) tensor1: handle_r_to_c(tensor1.scalar_type(), grad * (value / tensor2).conj()) tensor2: handle_r_to_c(tensor2.scalar_type(), -grad * (value * tensor1 / (tensor2 * tensor2)).conj()) + result: self_t + maybe_multiply(tensor1_t / tensor2_p, value) - maybe_multiply(tensor2_t * (tensor1_p / tensor2_p) / tensor2_p, value) - name: addcmul(Tensor self, Tensor tensor1, Tensor tensor2, *, Scalar value=1) -> Tensor self: handle_r_to_c(self.scalar_type(), grad) tensor1: handle_r_to_c(tensor1.scalar_type(), grad * (tensor2 * value).conj()) tensor2: handle_r_to_c(tensor2.scalar_type(), grad * (tensor1 * value).conj()) + result: self_t + maybe_multiply(tensor1_t * tensor2_p, value) + maybe_multiply(tensor2_t * tensor1_p, value) - name: addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> Tensor self: maybe_multiply(grad, beta.conj()) mat1: mm_mat1_backward(grad, mat2, mat1.sizes(), mat1.strides(), alpha) mat2: mm_mat2_backward(grad, mat1, mat2.sizes(), mat2.strides(), alpha) + result: maybe_multiply(self_t, beta) + maybe_multiply(mat1_t.mm(mat2_p), alpha) + maybe_multiply(mat1_p.mm(mat2_t), alpha) - name: _sparse_addmm(Tensor self, Tensor sparse, Tensor dense, *, Scalar beta=1, Scalar alpha=1) -> Tensor self: maybe_multiply(grad, beta) @@ -220,20 +226,24 @@ self: maybe_multiply(grad, beta.conj()) mat: grad.ger(vec.conj()) * alpha.conj() vec: mat.t().conj().mv(grad) * alpha.conj() + result: maybe_multiply(self_t, beta) + maybe_multiply(mat_t.mv(vec_p), alpha) + maybe_multiply(mat_p.mv(vec_t), alpha) - name: addr(Tensor self, Tensor vec1, Tensor vec2, *, Scalar beta=1, Scalar alpha=1) -> Tensor self: maybe_multiply(grad, beta.conj()) vec1: grad.mv(vec2.conj()) * alpha.conj() vec2: grad.t().mv(vec1.conj()) * alpha.conj() + result: maybe_multiply(self_t, beta) + maybe_multiply(vec1_t.outer(vec2_p), alpha) + maybe_multiply(vec1_p.outer(vec2_t), alpha) - name: affine_grid_generator(Tensor theta, int[] size, bool align_corners) -> Tensor theta: affine_grid_generator_backward(grad, size, align_corners) - name: alias(Tensor(a) self) -> Tensor(a) self: grad + result: self_t - name: angle(Tensor self) -> Tensor self: angle_backward(grad, self) + 
result: handle_r_to_c(result.scalar_type(), angle_backward(self_t, self_p)) # The four items below are necessary because TensorIterator doesn't work on # Variables (codegen does not unwrap the input Tensor for all() and any() ). @@ -251,18 +261,21 @@ - name: acosh(Tensor self) -> Tensor self: grad * (self.pow(2) - 1).rsqrt().conj() + result: auto_element_wise - name: acosh_(Tensor(a!) self) -> Tensor(a!) self: not_implemented("inplace version of acosh") - name: asinh(Tensor self) -> Tensor self: grad * (self.pow(2) + 1).rsqrt().conj() + result: auto_element_wise - name: asinh_(Tensor(a!) self) -> Tensor(a!) self: not_implemented("inplace version of asinh") - name: atanh(Tensor self) -> Tensor self: grad * 1 / (1 - self.pow(2)).conj() + result: auto_element_wise - name: atanh_(Tensor(a!) self) -> Tensor(a!) self: not_implemented("inplace version of atanh") @@ -272,9 +285,11 @@ - name: asin(Tensor self) -> Tensor self: grad * (-self * self + 1).rsqrt().conj() + result: auto_element_wise - name: atan(Tensor self) -> Tensor self: grad / (self * self + 1).conj() + result: auto_element_wise - name: atan2(Tensor self, Tensor other) -> Tensor self, other: atan2_backward(grad, self, other, grad_input_mask) @@ -362,6 +377,7 @@ - name: _conj(Tensor self) -> Tensor self: grad.conj() + result: self_t.conj() - name: copysign.Tensor(Tensor self, Tensor other) -> Tensor self: copysign_tensor_self_backward(grad, self, result) @@ -1400,6 +1416,9 @@ - name: silu(Tensor self) -> Tensor self: "GradMode::is_enabled() ? infinitely_differentiable_silu_backward(grad, self) : silu_backward(grad, self)" +- name: mish(Tensor self) -> Tensor + self: "GradMode::is_enabled() ? infinitely_differentiable_mish_backward(grad, self) : mish_backward(grad, self)" + - name: elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> Tensor self: elu_backward(grad, alpha, scale, input_scale, /* is_result */ false, self) diff --git a/tools/autograd/gen_variable_type.py b/tools/autograd/gen_variable_type.py index 197066e5cb3..3862e6f5961 100644 --- a/tools/autograd/gen_variable_type.py +++ b/tools/autograd/gen_variable_type.py @@ -752,10 +752,10 @@ def emit_body(fn: NativeFunctionWithDifferentiabilityInfo) -> List[str]: # Handle functions like stack # For these, we don't unpack anything and always call the user function if not (len(differentiable_inputs) == 1 and is_tensor_list_type(differentiable_inputs[0].type)): - raise RuntimeError(f'No differentiable input to "{name}" is a differentiable Tensor even though a ' - 'forward gradient formula has been defined for it. This case should only happen ' - 'for function that take a single TensorList as input. All other cases are not ' - 'supported right now.') + raise RuntimeError(f'No differentiable input to "{name}" is a differentiable Tensor (as the provided' + 'forward AD formula does not use any input tangent) even though a forward gradient ' + 'formula has been defined for it. This case should only happen for function that ' + 'take a single TensorList as input. 
All other cases are not supported right now.') requires_fw_grad = "true" unpacked_arguments = "" for inp in differentiable_inputs: diff --git a/tools/autograd/load_derivatives.py b/tools/autograd/load_derivatives.py index 8418b68b8c7..821ffbdf818 100644 --- a/tools/autograd/load_derivatives.py +++ b/tools/autograd/load_derivatives.py @@ -170,21 +170,23 @@ def postprocess_forward_derivatives( "forward definition of gradient as element_wise but it does not " "defines the gradient formula for its argument which is required.") # This transformation is based on the observation that for element-wise functions, the Jacobian - # matrix is diagonal and thus doing J * v or v * J gives the same result. + # matrix is diagonal and thus doing J * v is the same as (v^T J)^T (in practice, we ignore the transpositions) + # For the complex case, we use hermitian transpose and get (v.conj() J).conj() # So here we are going to re-use the backward formula and replace two things: - # 1) all occurrences of "grad" with "foo_t", where foo is the name of the unique differentiable input. + # 1) all occurrences of "grad" with "foo_t.conj()", where foo is the name of the unique differentiable input. # 2) all usage of an original input "foo" with its primal value "foo_p". + # 3) conjugate the final result # For example, for abs, the backward formula is: # grad * self.sgn() # And this function generates a forward formula that is: - # self_t * self_p.sgn() + # (self_t.conj() * self_p.sgn()).conj() backward_formula = derivatives[0].original_formula input_name = args_with_derivatives[0].name # Do replacement 1) of the grad def repl(m: Any) -> str: - return f"{m.group(1)}{input_name}_t{m.group(2)}" + return f"{m.group(1)}{input_name}_t.conj(){m.group(2)}" fw_formula = re.sub(IDENT_REGEX.format("grad"), repl, backward_formula) # Do replacement 2) of the input variables @@ -195,6 +197,9 @@ def postprocess_forward_derivatives( return f"{m.group(1)}{arg_name}_p{m.group(2)}" fw_formula = re.sub(IDENT_REGEX.format(arg_name), repl, fw_formula) + # Do the final conjugate 3) + fw_formula = f"({fw_formula}).conj()" + # Since there is a single differentiable inputs and we necessarily need its tangent we can # simply require all differentiable input's tangent. 
required_inputs_tangent = tuple(all_arg_names) diff --git a/tools/autograd/templates/python_torch_functions.cpp b/tools/autograd/templates/python_torch_functions.cpp index e8df8310b42..5c0fd354161 100644 --- a/tools/autograd/templates/python_torch_functions.cpp +++ b/tools/autograd/templates/python_torch_functions.cpp @@ -414,6 +414,14 @@ static PyObject * THPVariable_sparse_csr_tensor(PyObject* self, PyObject* args, END_HANDLE_TH_ERRORS } +static PyObject * THPVariable__sparse_csr_tensor_unsafe(PyObject* self, PyObject* args, PyObject* kwargs) +{ + HANDLE_TH_ERRORS + jit::tracer::warn("torch._sparse_csr_tensor_unsafe", jit::tracer::WARN_CONSTRUCTOR); + return THPVariable_Wrap(torch::utils::_sparse_csr_tensor_unsafe_ctor(torch::tensors::get_default_dispatch_key(), torch::tensors::get_default_scalar_type(), args, kwargs)); + END_HANDLE_TH_ERRORS +} + static PyObject * THPVariable_sparse_coo_tensor(PyObject* self, PyObject* args, PyObject* kwargs) { HANDLE_TH_ERRORS @@ -493,9 +501,11 @@ static PyMethodDef torch_functions[] = { {"range", castPyCFunctionWithKeywords(THPVariable_range), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"saddmm", castPyCFunctionWithKeywords(THPVariable_sspaddmm), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"sparse_coo_tensor", castPyCFunctionWithKeywords(THPVariable_sparse_coo_tensor), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, - {"sparse_csr_tensor", castPyCFunctionWithKeywords(THPVariable_sparse_csr_tensor), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"_sparse_coo_tensor_unsafe", castPyCFunctionWithKeywords(THPVariable__sparse_coo_tensor_unsafe), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"_validate_sparse_coo_tensor_args", castPyCFunctionWithKeywords(THPVariable__validate_sparse_coo_tensor_args), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, + {"sparse_csr_tensor", castPyCFunctionWithKeywords(THPVariable_sparse_csr_tensor), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, + {"_sparse_csr_tensor_unsafe", castPyCFunctionWithKeywords(THPVariable__sparse_csr_tensor_unsafe), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, + {"_validate_sparse_csr_tensor_args", castPyCFunctionWithKeywords(THPVariable__validate_sparse_csr_tensor_args), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"spmm", castPyCFunctionWithKeywords(THPVariable_mm), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"tensor", castPyCFunctionWithKeywords(THPVariable_tensor), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, {"get_device", castPyCFunctionWithKeywords(THPVariable_get_device), METH_VARARGS | METH_KEYWORDS | METH_STATIC, NULL}, diff --git a/tools/code_analyzer/default_op_deps.yaml b/tools/code_analyzer/default_op_deps.yaml index 3f100af1685..9b71fbf0e65 100644 --- a/tools/code_analyzer/default_op_deps.yaml +++ b/tools/code_analyzer/default_op_deps.yaml @@ -7433,6 +7433,45 @@ depends: - name: aten::eq - name: aten::is_nonzero +- name: aten::mish + depends: + - name: aten::as_strided_ + - name: aten::copy_ + - name: aten::empty + - name: aten::empty_like + - name: aten::empty_meta + - name: aten::empty_strided + - name: aten::eq + - name: aten::is_nonzero + - name: aten::resize_ + - name: aten::resize_as_ + - name: aten::mish + - name: aten::to +- name: aten::mish_ + depends: + - name: aten::eq + - name: aten::is_nonzero + - name: aten::mish +- name: aten::mish_backward + depends: + - name: aten::add + - name: aten::as_strided_ + - name: aten::copy_ + - name: aten::empty + - name: aten::empty_like + - name: aten::empty_meta + - name: 
aten::empty_strided + - name: aten::eq + - name: aten::fill_ + - name: aten::is_nonzero + - name: aten::mul + - name: aten::resize_ + - name: aten::resize_as_ + - name: aten::sigmoid + - name: aten::softplus + - name: aten::sub_ + - name: aten::to + - name: aten::tanh - name: aten::mkldnn_adaptive_avg_pool2d depends: - name: aten::eq diff --git a/tools/codegen/gen.py b/tools/codegen/gen.py index af2cab42d1f..9ce4ebcafcf 100644 --- a/tools/codegen/gen.py +++ b/tools/codegen/gen.py @@ -1026,7 +1026,7 @@ def main() -> None: 'native_function_declarations': list(concatMap( # Convert to a set first to remove duplicate kernel names. # Backends are allowed to repeat kernel names; only generate the declaration once! - lambda f: list(set(concatMap( + lambda f: list(OrderedDict.fromkeys(concatMap( lambda backend_idx: dest.compute_native_function_declaration(f, backend_idx), backend_indices.values()))), diff --git a/tools/pyi/gen_pyi.py b/tools/pyi/gen_pyi.py index 3a8db7db142..40ad0e4da9a 100644 --- a/tools/pyi/gen_pyi.py +++ b/tools/pyi/gen_pyi.py @@ -291,13 +291,19 @@ def gen_pyi(native_yaml_path: str, deprecated_yaml_path: str, fm: FileManager) - 'sparse_coo_tensor': ['def sparse_coo_tensor(indices: Tensor, values: Union[Tensor,List],' ' size: Optional[_size]=None, *, dtype: Optional[_dtype]=None,' ' device: Union[_device, str, None]=None, requires_grad:_bool=False) -> Tensor: ...'], - 'sparse_csr_tensor' : ['def sparse_csr_tensor(crow_indices: Tensor, col_indices: Tensor,' - ' values: Tensor, size: Optional[_size]=None,' + 'sparse_csr_tensor' : ['def sparse_csr_tensor(crow_indices: Union[Tensor, List],' + 'col_indices: Union[Tensor, List],' + ' values: Union[Tensor, List], size: Optional[_size]=None,' ' *, dtype: Optional[_dtype]=None,' ' device: Union[_device, str, None]=None, requires_grad:_bool=False) -> Tensor: ...'], '_sparse_coo_tensor_unsafe': ['def _sparse_coo_tensor_unsafe(indices: Tensor, values: Tensor, size: List[int],' ' dtype: Optional[_dtype] = None, device: Optional[_device] = None,' ' requires_grad: bool = False) -> Tensor: ...'], + '_sparse_csr_tensor_unsafe': ['def _sparse_csr_tensor_unsafe(crow_indices: Union[Tensor, List],' + 'col_indices: Union[Tensor, List],' + ' values: Union[Tensor, List], size: List[int],' + ' dtype: Optional[_dtype] = None, device: Optional[_device] = None,' + ' requires_grad: bool = False) -> Tensor: ...'], 'range': ['def range(start: Number, end: Number,' ' step: Number=1, *, out: Optional[Tensor]=None, {}) -> Tensor: ...' .format(FACTORY_PARAMS)], diff --git a/torch/_C/__init__.pyi.in b/torch/_C/__init__.pyi.in index 147af0558d0..3ce7f1ada9d 100644 --- a/torch/_C/__init__.pyi.in +++ b/torch/_C/__init__.pyi.in @@ -964,7 +964,7 @@ class DictType(JitType): def getValueType(self) -> JitType: ... class TupleType(JitType): - def __init__(self, a: List[JitType]) -> None: ... + def __init__(self, a: List[Optional[JitType]]) -> None: ... def elements(self) -> List[JitType]: ... class ClassType(JitType): diff --git a/torch/_jit_internal.py b/torch/_jit_internal.py index 748c4069341..94a72b5553b 100644 --- a/torch/_jit_internal.py +++ b/torch/_jit_internal.py @@ -17,6 +17,7 @@ import sys import builtins import io import pickle +import functools # This is needed. `torch._jit_internal` is imported before `torch.distributed.__init__`. # Explicitly ask to import `torch.distributed.__init__` first. # Otherwise, "AttributeError: module 'torch' has no attribute 'distributed'" is raised. 
@@ -60,6 +61,11 @@ def createResolutionCallbackFromEnv(lookup_base): while i < len(expr) and expr[i] not in (',', '[', ']'): i += 1 + # Special case logic for the empty Tuple as a subscript (used + # in the type annotation `Tuple[()]`) + if expr[:i] == '()': + return (), i + base = lookupInModule(expr[:i].strip(), module) assert base is not None, f"Unresolvable type {expr[:i]}" if i == len(expr) or expr[i] != '[': @@ -971,6 +977,9 @@ class SourceContext(torch._C._jit_tree_views.SourceRangeFactory): self.uses_true_division = uses_true_division self.filename = filename +@functools.lru_cache(maxsize=None) +def make_source_context(*args): + return SourceContext(*args) def fake_range(): return SourceContext('', None, 0, 0).make_raw_range(0, 1) diff --git a/torch/_tensor.py b/torch/_tensor.py index 4e93494aa82..8bc5d06d562 100644 --- a/torch/_tensor.py +++ b/torch/_tensor.py @@ -953,10 +953,14 @@ class Tensor(torch._C._TensorBase): while i < row_indices.size()[0] and row_indices[i] == irow: i += 1 ro.append(i) - - return torch.sparse_csr_tensor(torch.tensor(ro, dtype=row_indices.dtype), - coalesced_self.indices()[1], coalesced_self.values(), - size=coalesced_self.shape, dtype=coalesced_self.dtype) + device = coalesced_self.values().device + crow_indices = torch.tensor(ro, dtype=row_indices.dtype, device=device) + return torch.sparse_csr_tensor(crow_indices, + coalesced_self.indices()[1].contiguous(), + coalesced_self.values(), + size=coalesced_self.shape, + dtype=coalesced_self.dtype, + device=device) elif self.is_sparse_csr: return self else: diff --git a/torch/autograd/grad_mode.py b/torch/autograd/grad_mode.py index 7cbd5516e56..1cabb72b1e3 100644 --- a/torch/autograd/grad_mode.py +++ b/torch/autograd/grad_mode.py @@ -97,6 +97,10 @@ class no_grad(_DecoratorContextManager): Also functions as a decorator. (Make sure to instantiate with parenthesis.) + .. note:: + No-grad is one of several mechanisms that can enable or + disable gradients locally see :ref:`locally-disable-grad-doc` for + more information on how they compare. Example:: @@ -136,6 +140,10 @@ class enable_grad(_DecoratorContextManager): Also functions as a decorator. (Make sure to instantiate with parenthesis.) + .. note:: + enable_grad is one of several mechanisms that can enable or + disable gradients locally see :ref:`locally-disable-grad-doc` for + more information on how they compare. Example:: @@ -178,6 +186,10 @@ class set_grad_enabled(object): (``False``). This can be used to conditionally enable gradients. + .. note:: + set_grad_enabled is one of several mechanisms that can enable or + disable gradients locally see :ref:`locally-disable-grad-doc` for + more information on how they compare. Example:: @@ -222,6 +234,11 @@ class inference_mode(_DecoratorContextManager): Also functions as a decorator. (Make sure to instantiate with parenthesis.) + .. note:: + Inference mode is one of several mechanisms that can enable or + disable gradients locally see :ref:`locally-disable-grad-doc` for + more information on how they compare. 
+ Args: mode (bool): Flag whether to enable or disable inference mode diff --git a/torch/csrc/api/include/torch/enum.h b/torch/csrc/api/include/torch/enum.h index 7e662fc83b4..e19b0ebe34c 100644 --- a/torch/csrc/api/include/torch/enum.h +++ b/torch/csrc/api/include/torch/enum.h @@ -104,6 +104,7 @@ TORCH_ENUM_DECLARE(Tanh) TORCH_ENUM_DECLARE(ReLU) TORCH_ENUM_DECLARE(GELU) TORCH_ENUM_DECLARE(SiLU) +TORCH_ENUM_DECLARE(Mish) TORCH_ENUM_DECLARE(LeakyReLU) TORCH_ENUM_DECLARE(FanIn) TORCH_ENUM_DECLARE(FanOut) @@ -147,6 +148,7 @@ struct _compute_enum_name { TORCH_ENUM_PRETTY_PRINT(ReLU) TORCH_ENUM_PRETTY_PRINT(GELU) TORCH_ENUM_PRETTY_PRINT(SiLU) + TORCH_ENUM_PRETTY_PRINT(Mish) TORCH_ENUM_PRETTY_PRINT(LeakyReLU) TORCH_ENUM_PRETTY_PRINT(FanIn) TORCH_ENUM_PRETTY_PRINT(FanOut) diff --git a/torch/csrc/api/include/torch/nn/functional/activation.h b/torch/csrc/api/include/torch/nn/functional/activation.h index 3230cacf425..a0487c61835 100644 --- a/torch/csrc/api/include/torch/nn/functional/activation.h +++ b/torch/csrc/api/include/torch/nn/functional/activation.h @@ -348,6 +348,12 @@ inline Tensor silu(const Tensor& input) { // ============================================================================ +inline Tensor mish(const Tensor& input) { + return torch::mish(input); +} + +// ============================================================================ + inline Tensor prelu(const Tensor& input, const Tensor& weight) { return torch::prelu(input, weight); } diff --git a/torch/csrc/api/include/torch/nn/modules/activation.h b/torch/csrc/api/include/torch/nn/modules/activation.h index 64b96b2bed3..865914ec887 100644 --- a/torch/csrc/api/include/torch/nn/modules/activation.h +++ b/torch/csrc/api/include/torch/nn/modules/activation.h @@ -606,6 +606,28 @@ class TORCH_API SiLUImpl : public torch::nn::Cloneable { /// module storage semantics. TORCH_MODULE(SiLU); +// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Mish ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +/// Applies mish over a given input. +/// See https://pytorch.org/docs/master/nn.html#torch.nn.Mish to learn +/// about the exact behavior of this module. +// NOLINTNEXTLINE(bugprone-exception-escape) +class TORCH_API MishImpl : public torch::nn::Cloneable { + public: + Tensor forward(const Tensor& input); + + void reset() override; + + /// Pretty prints the `Mish` module into the given `stream`. + void pretty_print(std::ostream& stream) const override; +}; + +/// A `ModuleHolder` subclass for `MishImpl`. +/// See the documentation for `MishImpl` class to learn what methods it +/// provides, or the documentation for `ModuleHolder` to learn about PyTorch's +/// module storage semantics. +TORCH_MODULE(Mish); + // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sigmoid ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /// Applies sigmoid over a given input. 
diff --git a/torch/csrc/api/src/nn/modules/activation.cpp b/torch/csrc/api/src/nn/modules/activation.cpp index 74f3719ad4b..3c4d2b8c98f 100644 --- a/torch/csrc/api/src/nn/modules/activation.cpp +++ b/torch/csrc/api/src/nn/modules/activation.cpp @@ -308,6 +308,18 @@ void SiLUImpl::pretty_print(std::ostream& stream) const { // ============================================================================ +Tensor MishImpl::forward(const Tensor& input) { + return F::mish(input); +} + +void MishImpl::reset() {} + +void MishImpl::pretty_print(std::ostream& stream) const { + stream << "torch::nn::Mish()"; +} + +// ============================================================================ + Tensor SigmoidImpl::forward(const Tensor& input) { return torch::sigmoid(input); } diff --git a/torch/csrc/autograd/FunctionsManual.cpp b/torch/csrc/autograd/FunctionsManual.cpp index 656c313364e..110f81e34cd 100644 --- a/torch/csrc/autograd/FunctionsManual.cpp +++ b/torch/csrc/autograd/FunctionsManual.cpp @@ -1140,6 +1140,15 @@ Tensor infinitely_differentiable_silu_backward( return grad_output * sigmoid * (1.0 + input * (1.0 - sigmoid)); } +Tensor infinitely_differentiable_mish_backward( + const Tensor& grad_output, + const Tensor& input) { + const Tensor sigmoid = input.sigmoid(); + const Tensor softplus = input.exp().log1p(); + const Tensor tanh_softplus = softplus.tanh(); + return grad_output * (tanh_softplus + input * sigmoid * (1.0 - tanh_softplus * tanh_softplus)); +} + Tensor infinitely_differentiable_logit_backward( const Tensor& grad, const Tensor& self, diff --git a/torch/csrc/autograd/FunctionsManual.h b/torch/csrc/autograd/FunctionsManual.h index c11bb7ac263..44778f4bb7c 100644 --- a/torch/csrc/autograd/FunctionsManual.h +++ b/torch/csrc/autograd/FunctionsManual.h @@ -110,6 +110,7 @@ at::Tensor max_pool_double_backward(const at::Tensor & grad, const at::Tensor & at::Tensor glu_double_backward(const at::Tensor & grad, const at::Tensor & grad_output, const at::Tensor & input, int64_t dim); at::Tensor glu_double_backward_grad_output(const at::Tensor & grad, const at::Tensor & input, int64_t dim); at::Tensor infinitely_differentiable_silu_backward(const at::Tensor& grad_output, const at::Tensor& input); +at::Tensor infinitely_differentiable_mish_backward(const at::Tensor& grad_output, const at::Tensor& input); Tensor infinitely_differentiable_logit_backward(const Tensor& grad, const Tensor& self, c10::optional eps); at::Tensor kl_div_double_backward_grad_output(const at::Tensor & grad, const at::Tensor & input, const at::Tensor & target, int64_t reduction, bool log_target); at::Tensor binary_cross_entropy_with_logits_target_backward(const at::Tensor& grad_output, const at::Tensor& self, const at::Tensor& target, const c10::optional& weight, const c10::optional& pos_weight, int64_t reduction); diff --git a/torch/csrc/cuda/shared/cudart.cpp b/torch/csrc/cuda/shared/cudart.cpp index a8f80a35855..30a43bed053 100644 --- a/torch/csrc/cuda/shared/cudart.cpp +++ b/torch/csrc/cuda/shared/cudart.cpp @@ -6,6 +6,7 @@ #else #include #endif +#include namespace torch { namespace cuda { namespace shared { @@ -38,6 +39,13 @@ void initCudartBindings(PyObject* module) { #ifndef __HIP_PLATFORM_HCC__ cudart.def("cuda" "ProfilerInitialize", cudaProfilerInitialize); #endif + cudart.def("cuda" "MemGetInfo", [](int device) -> std::pair { + C10_CUDA_CHECK(cudaGetDevice(&device)); + size_t device_free; + size_t device_total; + cudaMemGetInfo(&device_free, &device_total); + return {device_free, device_total}; + }); } } // 
namespace shared diff --git a/torch/csrc/jit/backends/backend_debug_handler.cpp b/torch/csrc/jit/backends/backend_debug_handler.cpp index b0d4fd3daa3..d21e4efd568 100644 --- a/torch/csrc/jit/backends/backend_debug_handler.cpp +++ b/torch/csrc/jit/backends/backend_debug_handler.cpp @@ -20,7 +20,7 @@ int64_t BackendDebugInfoRecorder::getNextDebugHandle(const Node* node) { DebugHandleType debug_handle = unique_debug_handle_; const SourceRange& range = node->sourceRange(); handles_to_inlined_callstack_ptrs_[debug_handle] = - std::make_pair(range, cs_ptr); + std::make_tuple(range, node->kind().toQualString(), cs_ptr); // This increment is with seq memory order. // Not trying to perf optimizing this for now. unique_debug_handle_++; diff --git a/torch/csrc/jit/backends/backend_debug_handler.h b/torch/csrc/jit/backends/backend_debug_handler.h index 1e121f0ad04..60727bfcc24 100644 --- a/torch/csrc/jit/backends/backend_debug_handler.h +++ b/torch/csrc/jit/backends/backend_debug_handler.h @@ -13,7 +13,7 @@ namespace jit { * BackendDebugHandleManager is responsible for issuing debug handles to * backends. Debug handles are associated with nodes of a graph. * BackendDebugHandleManager also maintains a map - * [debug-handle, DebugInfoPair = {source range, inlined callstack ptr]} that + * [debug-handle, DebugInfoTuple = {source range, inlined callstack ptr]} that * will help generate a callstack for exception raised using debug handles. * Effectively debug handles are something that is given to backend and later * when an exception occurs in the backend, backend can tell, using debug @@ -21,14 +21,14 @@ namespace jit { * callstack correspoding to the exception. * There are two parts to BackendDebugHandleManager: * 1. static std::atomic debug_handle - * 2. Map of [debug-handle, DebugInfoPair] + * 2. Map of [debug-handle, DebugInfoTuple] * * About 1: * Why do they have to be unique. The reason is that by ensuring * uniqueness of debug handles, we remove the burden of another layer of * mapping where we need to say this set of debug handles were generated for * this lowered module or this bytecode function. This simplifies the API for - * serialization since debug handles can uniquely identify DebugInfoPair. + * serialization since debug handles can uniquely identify DebugInfoTuple. * Thus simplifies the runtime API for throwing exception. Exception throwing * only needs to know debug_handle and not which module or method threw it. * There are 2 issues to keep in mind, though,for static std::atomic @@ -40,8 +40,8 @@ namespace jit { * done. * * Now about 2: - * There are two usecases for [debug-handle, DebugInfoPair] - * A. During bytecode generation the DebugInfoPair corresponding to the nodes + * There are two usecases for [debug-handle, DebugInfoTuple] + * A. During bytecode generation the DebugInfoTuple corresponding to the nodes * of the inlined graph being serialized, are stored in this object and a * unique debug handle is returned. This unique debug handle is stored in * mobile_debug info for pytorch lite models. It will be used for raising @@ -52,13 +52,13 @@ namespace jit { * the debug handles provide a way to map nodes of the graph to the model level * debug info. * - * During byte-code model serialization, [debug-handle, DebugInfoPair] is + * During byte-code model serialization, [debug-handle, DebugInfoTuple] is * serialized. Now we know a. debug handles and b. how to map debug handles to * model source code. 
Thus we can either do eager symbolication by converting * debug handles to corresponding source code at runtime, or do lazy * symbolicattion offline. * - * Note that it is not necessary to serialize [debug-handle, DebugInfoPair] + * Note that it is not necessary to serialize [debug-handle, DebugInfoTuple] * corresponding to lowered backend if the lowering process, that is * preprocess/compile, and execution happens in the same session, then eager * symbolication can be employed. @@ -66,15 +66,15 @@ namespace jit { * Now how does BackendDebugHandleManager capture all of the above? * By providing two API. * 1. getNextDebugHandle which given a Node* returns a unique debug handle, - * that will uniquely identify DebugInfoPair. + * that will uniquely identify DebugInfoTuple. * and * 2. getCallStackPtrMap which returns the map - * [debug-handle, DebugInfoPair] + * [debug-handle, DebugInfoTuple] * * 1 provides debug handles to backends and 2 provides runtime a way to map * debug handles to source level debug info. * - * So why does debug handle map to DebugInfoPair = {source range and inlined + * So why does debug handle map to DebugInfoTuple = {source range and inlined * cs}? {debug_handle, source_range_tag, serialized_callstack} Take this * example: class L(nn.Module): def __init__(self): * ... @@ -112,7 +112,7 @@ namespace jit { using DebugHandleType = int64_t; using BackendDebugInfoMapType = - std::unordered_map; + std::unordered_map; /* * This class is used to generate debug info map. diff --git a/torch/csrc/jit/ir/scope.cpp b/torch/csrc/jit/ir/scope.cpp index 474dc47cc9f..b3fd559dcea 100644 --- a/torch/csrc/jit/ir/scope.cpp +++ b/torch/csrc/jit/ir/scope.cpp @@ -88,7 +88,11 @@ InlinedCallStackPtr InlinedCallStack::intrusive_from_this() { } InlinedCallStack::InlinedCallStack(Function* fn, SourceRange source_range) - : fn_(fn), source_range_(std::move(source_range)) {} + : fn_(fn), source_range_(std::move(source_range)) { + if (fn_) { + set_function_name(fn_->name()); + } +} InlinedCallStack::InlinedCallStack( Function* fn, @@ -96,7 +100,11 @@ InlinedCallStack::InlinedCallStack( c10::optional module_instance_info) : fn_(fn), source_range_(std::move(source_range)), - module_instance_info_(std::move(module_instance_info)) {} + module_instance_info_(std::move(module_instance_info)) { + if (fn_) { + set_function_name(fn_->name()); + } +} InlinedCallStack::InlinedCallStack( InlinedCallStackPtr callee, @@ -104,7 +112,11 @@ InlinedCallStack::InlinedCallStack( SourceRange source_range) : callee_(std::move(callee)), fn_(fn), - source_range_(std::move(source_range)) {} + source_range_(std::move(source_range)) { + if (fn_) { + set_function_name(fn_->name()); + } +} InlinedCallStack::InlinedCallStack( InlinedCallStackPtr callee, @@ -114,7 +126,11 @@ InlinedCallStack::InlinedCallStack( : callee_(std::move(callee)), fn_(fn), source_range_(std::move(source_range)), - module_instance_info_(std::move(module_instance_info)) {} + module_instance_info_(std::move(module_instance_info)) { + if (fn_) { + set_function_name(fn_->name()); + } +} c10::optional InlinedCallStack::callee() const { return callee_; @@ -132,6 +148,18 @@ SourceRange InlinedCallStack::source_range() const { return source_range_; } +Function* InlinedCallStack::function() const { + return fn_; +} + +void InlinedCallStack::set_function_name(std::string fn_name) { + fn_name_ = std::move(fn_name); +} + +std::string InlinedCallStack::function_name() const { + return fn_name_; +} + std::vector InlinedCallStack::vec() { std::vector r; c10::optional 
current = intrusive_from_this(); diff --git a/torch/csrc/jit/ir/scope.h b/torch/csrc/jit/ir/scope.h index c0155e5db94..83d4e8fdd13 100644 --- a/torch/csrc/jit/ir/scope.h +++ b/torch/csrc/jit/ir/scope.h @@ -120,6 +120,15 @@ struct TORCH_API InlinedCallStack : public c10::intrusive_ptr_target { private: c10::optional callee_; Function* fn_; + // Reason for fn_name_ even though we have fn_ + // Serialized callstack is used in circustmances where InlinedCallstack + // cannot be constructed during runtime, e.g. mobile runtime or + // delegated backends. + // Since in those cases we do not have Function* we store function name + // fn_name does not give you access to the same information that Function* + // does, however in mobile/delegated backend runtime we use InlindedCallStack + // for exception stack and for that purpose fn_name_ suffices. + std::string fn_name_; SourceRange source_range_; InlinedCallStackPtr intrusive_from_this(); c10::optional module_instance_info_; @@ -155,6 +164,12 @@ struct TORCH_API InlinedCallStack : public c10::intrusive_ptr_target { // Returns the source range of the node SourceRange source_range() const; + Function* function() const; + + void set_function_name(std::string fn_name); + + std::string function_name() const; + // Return callstack as a vector of [Function, SourceRange] pairs. std::vector vec(); @@ -175,6 +190,13 @@ struct TORCH_API InlinedCallStack : public c10::intrusive_ptr_target { } }; -using DebugInfoPair = std::pair; +// {source range, node name, InlinedCallStack} +// We store node name because same debug infor will be used for +// profiling as well, so we need to know op names as well. +using DebugInfoTuple = + std::tuple; +constexpr size_t kDebugInfoTupleSourceRangeIndex{0}; +constexpr size_t kDebugInfoTupleNodeNameIndex{1}; +constexpr size_t kDebugInfoTupleInlinedCSIndex{2}; } // namespace jit } // namespace torch diff --git a/torch/csrc/jit/mobile/backport.h b/torch/csrc/jit/mobile/backport.h index 845bb12298b..3e82a1e78af 100644 --- a/torch/csrc/jit/mobile/backport.h +++ b/torch/csrc/jit/mobile/backport.h @@ -1,5 +1,6 @@ #pragma once +#include #include #include diff --git a/torch/csrc/jit/mobile/backport_manager.cpp b/torch/csrc/jit/mobile/backport_manager.cpp index 25d12c9c566..37eb4b781c3 100644 --- a/torch/csrc/jit/mobile/backport_manager.cpp +++ b/torch/csrc/jit/mobile/backport_manager.cpp @@ -6,8 +6,11 @@ #include #include #include +#include +#include #include #include +#include namespace torch { namespace jit { @@ -86,31 +89,50 @@ void selective_copy( } } -bool check_bytecode_version( - const std::vector& bytecode_values, - const int64_t expect_bytecode_version) { - if (bytecode_values.empty()) { - TORCH_WARN("Empty bytecode archive."); - return false; - } else if (bytecode_values[0] != expect_bytecode_version) { - TORCH_WARN( - "Expect bytecode version ", - expect_bytecode_version, - ", but it gets ", - bytecode_values[0]); - return false; - } - return true; +// Copy all content from reader to stringstream +void get_model_stream(PyTorchStreamReader& reader, std::stringstream& out) { + auto writer_func = [&](const void* buf, size_t nbytes) -> size_t { + out.write(static_cast(buf), nbytes); + return !out ? 
0 : nbytes; + }; + PyTorchStreamWriter writer(writer_func); + selective_copy( + reader, + writer, + std::unordered_set({"version"}), + std::unordered_set()); } } // namespace -// To add next backport -// function, for example, backport_vn_to_vn-1, create an anonymous namespace -// with a backport_vn_to_vn-1 function + other necessary customized function. If -// a function can be reused by other backport functions, move it to the utility -// function group. It will be easier to split out backport_manager.cpp to -// smaller files when it grows too long. +/* + To add next backport function, for example, backport_vn_to_vn-1, create an + anonymous namespace with a backport_vn_to_vn-1 function + other necessary + customized function. If a function can be reused by other backport functions, + move it to the utility function group. It will be easier to split out + backport_manager.cpp to smaller files when it grows too long. + + How to add backport_v{i}_to_v{i-1} ? + There are two options: + 1) [Format change only, recommended] Constrcut a reader with the + input_model_stream, modify the file, and use PyTorchWriter to write it to + output_model_stream. See backport_v5_to_v4. + + 2) [Both format and content change] ]Use torch.jit.load() to load the stream, + and save it to output_model_stream. + + The first option is preferred, because it will be purely format change, and + the model doesn't need to go through inline again and model content will + remain the same. + + A note for manipulate stringstream, it's recommend to declare a new + stringstream, tmp_stream, and swap it with the argument output_model_stream + once it's ready, output_model_stream.swap(tmp_stream). Do not use + output_model_stream.clear(). It only clears out error state flag + (https://www.cplusplus.com/reference/ios/ios/clear/), while the content is the + same. It's cleaner to just declare a new one and swap. + +*/ // The functions needed for backport model from v5 to v4. namespace { @@ -145,15 +167,10 @@ void writeArchiveV4( writer.writeRecord(fname, data.data(), data.size()); } -bool backport_v5_to_v4( - PyTorchStreamReader& reader, - PyTorchStreamWriter& writer) { +std::stringstream backport_v5_to_v4(std::stringstream& input_model_stream) { // 1) read from archive `bytecode` archive + PyTorchStreamReader reader(&input_model_stream); std::vector

10. PyTorch Error: mat1 and mat2 shapes cannot be multiplied - Sling Academy

  • Jul 7, 2023 · This error occurs when you try to perform a matrix multiplication using torch.matmul() or torch.mm() with two tensors that have incompatible ...

  • Overview: when working with PyTorch, you might encounter the error RuntimeError: mat1 and mat2 shapes cannot be multiplied. It occurs when the inner dimensions of the two matrices do not agree, as shown in the sketch below.
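A toy reproduction of that shape error and one common fix, flattening the input so the inner dimensions agree (shapes chosen for illustration only):

    import torch

    a = torch.randn(32, 28, 28)   # e.g. a batch of 28x28 images
    w = torch.randn(784, 10)      # weight matrix expecting 784 input features

    # The inner dimensions must match; (896x28) @ (784x10) does not work.
    try:
        torch.mm(a.reshape(-1, 28), w)
    except RuntimeError as e:
        print(e)                  # mat1 and mat2 shapes cannot be multiplied ...

    # Fix: flatten each image to 784 features, giving (32x784) @ (784x10).
    out = torch.mm(a.reshape(32, -1), w)
    print(out.shape)              # torch.Size([32, 10])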

11. PyTorch Stable Baselines3 runtime error: mat1 and mat2 must have the same ...

  • When using PyTorch Stable Baselines3 for reinforcement learning tasks, you may run into the RuntimeError: mat1 and mat2 must have the same dtype error. This error is usually caused by the data type of the input data ( ...

  • PyTorch Stable Baselines3 runtime error: mat1 and mat2 must have the same dtype. In this article we cover a common runtime error encountered in PyTorch Stable Baselines3, RuntimeError: mat1 and mat2 must have the same dtype, and provide a fix with worked examples. Read more: PyTorch tutorials. What is PyTorch

12. Tensor objects - torch for R

  • addmm(mat1, mat2, *, beta=1, alpha=1) -> Tensor. See ?torch_addmm. addmm_. addmm ... self must have floating point dtype, and the result will have the same dtype ...

  • torch

13. torch.Tensor — PyTorch master documentation

  • ... mat1, mat2) → Tensor. See torch.addmm(). addmm_ (beta=1, mat, alpha=1, mat1 ... self must have floating point dtype, and the result will have the same dtype.

  • Shortcuts
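The signature quoted above is the operation behind the error this page is about: torch.addmm(input, mat1, mat2, beta=1, alpha=1) computes beta * input + alpha * (mat1 @ mat2), and all three tensors need a matching floating-point dtype. A minimal sketch (arbitrary values):

    import torch

    inp  = torch.zeros(2, 4)                       # float32
    mat1 = torch.randn(2, 3)                       # float32
    mat2 = torch.randn(3, 4, dtype=torch.float64)  # float64

    try:
        torch.addmm(inp, mat1, mat2)               # mixed float32/float64
    except RuntimeError as e:
        print(e)                                   # typically: mat1 and mat2 must have the same dtype

    out = torch.addmm(inp, mat1, mat2.float())     # cast mat2 down to float32
    print(out.dtype)                               # torch.float32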

14. [PDF] torch: Tensors and Neural Networks with 'GPU' Acceleration - CRAN

  • ... have gradients normalized clip_value. (float or int): maximum allowed value of the ... dtype) the desired data type of returned tensor. Has to be one of the.

15. RuntimeError: Expected all tensors to be on the same device, but found at ...

  • May 11, 2023 · I found the code great. Sorry maybe this is a stupid question, how can I use the generated “model(**tokenizer(“Hello World”, return_tensors=“pt”)) ...

  • The code is below. It runs on 1 GPU. But fails on 2 or more GPU. from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments, Trainer, AutoModelForCausalLM from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType from torch.utils.data import TensorDataset, DataLoader,Dataset device = torch.device("cuda" if torch.cuda.is_available() else "cpu") tokenizer = AutoTokenizer.from_pretrained("dolly-v2-3b") model = AutoModelForC...
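The usual remedy, independent of the particular model, is to move the module and every input tensor to the same device before the forward call. A minimal sketch using a generic nn.Linear stand-in rather than the transformers/peft setup from the post:

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Linear(16, 4).to(device)   # move the parameters to the target device
    x = torch.randn(8, 16)                # created on the CPU by default

    # x.to(device) returns a copy on the right device; forgetting it triggers
    # "Expected all tensors to be on the same device" whenever CUDA is used.
    y = model(x.to(device))
    print(y.device)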

16. F.linear(input, self.weight, self.bias) RuntimeError: mat1 and mat2 must ...

  • F.linear(input, self.weight, self.bias) RuntimeError: mat1 and mat2 must have the same dtype. 25 Jan 2022 · RuntimeError: mat1 and mat2 shapes cannot be ... (a minimal reproduction of the dtype case is sketched below).
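A minimal reproduction of that F.linear failure and the two standard fixes, casting either the input or the layer (toy sizes; the scenario assumed here is float32 parameters meeting a float64 input):

    import torch
    import torch.nn as nn

    layer = nn.Linear(3, 2)                     # parameters are float32 by default
    x = torch.randn(5, 3, dtype=torch.float64)

    try:
        layer(x)                                # calls F.linear(input, weight, bias)
    except RuntimeError as e:
        print(e)                                # mat1 and mat2 must have the same dtype

    y1 = layer(x.float())                       # fix 1: cast the input to float32
    y2 = layer.double()(x)                      # fix 2: cast the layer to float64
    print(y1.dtype, y2.dtype)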

17. Stable Baselines3 RuntimeError: mat1 and mat2 must have the same dtype - Tencent Cloud ...

  • bias) RuntimeError: mat1 and mat2 must have the same dtype. Action and observation spaces: self.action_space = Box(low=-1., high=1., shape=(2,), dtype=np.float) self ...

  • I am trying to implement SAC with a custom environment in Stable Baselines3, and I keep getting the error in the title. The error occurs with any off-policy algorithm, not just SAC. Traceback: File "\src\main.py", line 70, in main()File "\src\main.py", line 66, in main mod
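In the snippet quoted above, np.float is just Python's 64-bit float, so the environment hands the policy float64 observations while its weights are float32. A sketch of the usual fix is to declare the spaces (and the observations the environment returns) as float32; the space shapes below simply mirror the quoted code and are otherwise placeholders:

    import numpy as np
    from gymnasium.spaces import Box   # gym.spaces.Box behaves the same in older setups

    # Use an explicit 32-bit dtype so observations match the float32 policy weights.
    action_space = Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
    observation_space = Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    # reset()/step() in the custom environment should return observations of the same dtype:
    obs = np.zeros(4, dtype=np.float32)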

18. samba.functional — SambaFlow documentation

  • dtype – the desired data type of returned tensor. If specified, the input tensor is cast to dtype before the operation is performed. This is useful for ...

  • SambaFlow

19. torch - PyTorch 1.0 Chinese documentation & tutorials

  • The returned out tensor only has values 0 or 1 and is of the same shape as input. out can have integral dtype, but input must have floating point dtype ...

  • torch, PyTorch 1.0 Chinese documentation & tutorials

20. mindspore.common.tensor — TinyMS alpha documentation

  • [docs] def addmm(self, mat1, mat2, *, beta=1, alpha=1): r""" For details ... dtype and shape must have values at the same time.") if input_data is not ...

  • [docs] class Tensor(Tensor_, metaclass=_TensorMeta): "Tensor is a data structure that stores an n-dimensional array." [Remainder of the scraped mindspore.common.tensor source listing truncated.]

21. PyTorch - Supported Linear Algebra operations - Runebook.dev

  • self and mask tensors must have the same shape. Note. The returned sparse ... mat1 need to have sparse_dim = 2. This function also supports backward for both ...

22. Ascend/pytorch - Gitee

  • input tensors of copy_memory_ should have same dtype. input tensors of ... npu_bmmV2(mat1, mat2, []) >>> res.shape torch.Size([10, 3, 5]). fast_gelu(self) ...

  • Ascend PyTorch adapter

23. Introduction-to-PyTorch-reading-notes

  • Mar 27, 2023 · RuntimeError: mat1 and mat2 must have the same dtype. The problem I hit was inconsistent dtypes; the fix is, for example, to call the .float() method or to specify dtype when creating the tensor, as sketched below.

  • Introduction to PyTorch reading notes. Datetime: 2023-03-24T20:45+08:00. Categories: Python | MachineLearning. After much going back and forth I have finally arrived at the gates of DL. I had written half of this, but for some reason it all disappeared, even though vscode autosa
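That note matches the usual root cause: NumPy arrays and Python floats default to 64-bit, so tensors built from them come out as float64 while model parameters are float32. A minimal sketch of both remedies mentioned there:

    import numpy as np
    import torch

    arr = np.random.rand(4, 3)                   # NumPy defaults to float64

    x1 = torch.from_numpy(arr).float()           # remedy 1: cast after creation
    x2 = torch.tensor(arr, dtype=torch.float32)  # remedy 2: specify dtype up front

    print(x1.dtype, x2.dtype)                    # torch.float32 torch.float32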

24. torch.sparse — PyTorch master documentation

  • self and mask tensors must have the same shape. Note. The returned sparse ... mat1 need to have sparse_dim = 2. Note that the gradients of mat1 is a ...

  • Shortcuts
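The same dtype rule carries over to the sparse variants of these operations. A small sketch with a 2-D sparse COO mat1 multiplied by a dense matrix of the same dtype (values arbitrary):

    import torch

    indices = torch.tensor([[0, 1, 1],
                            [2, 0, 2]])
    values = torch.tensor([3.0, 4.0, 5.0])                     # float32
    mat1 = torch.sparse_coo_tensor(indices, values, (2, 3))    # sparse_dim = 2

    mat2 = torch.randn(3, 4)                                   # dense, also float32
    out = torch.sparse.mm(mat1, mat2)                          # dtypes match, so this works
    print(out.shape)                                           # torch.Size([2, 4])

    # A float64 mat2, e.g. torch.randn(3, 4, dtype=torch.float64), would instead
    # raise the familiar dtype-mismatch RuntimeError.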
