[MXNET-1185] Support large array in several operators (part 1) #13418
Conversation
The PR looks good to me.
Thanks for the great work. Your PR modifies a lot of code and the tests don't cover all of it.
import mxnet as mx
import numpy as np
long_dim = 50000000
arr = mx.nd.ones(shape=(long_dim, 200))
deg = mx.nd.ones(shape=(long_dim,1)) * 2
# Test broadcast div
res = arr/deg
assert np.sum(res[-1].asnumpy() == 0.5) == arr.shape[1]
# Test element-wise
arr2 = mx.nd.ones(shape=(long_dim, 200))
res = arr + arr2
assert np.sum(res[-1].asnumpy() == 2) == arr.shape[1]
res = arr + 1
assert np.sum(res[-1].asnumpy() == 2) == arr.shape[1]
res = mx.nd.sqrt(arr + 3)
assert np.sum(res[-1].asnumpy() == 2) == arr.shape[1]
# Test reduce
assert mx.nd.sum(arr).asnumpy() == arr.shape[0] * arr.shape[1]
# Test dot
weight = mx.nd.ones(shape=(200, 100))
res = mx.nd.dot(arr, weight)
assert np.sum(res[-1].asnumpy() == 200) == weight.shape[1]
# Test FullyConnected
res = mx.nd.FullyConnected(arr, weight, num_hidden=weight.shape[1], no_bias=True)
assert np.sum(res[-1].asnumpy() == 200) == weight.shape[1]
# Test broadcast
row_ids = mx.nd.arange(0, long_dim).reshape(long_dim, 1)  # avoid shadowing the built-in range
res = mx.nd.broadcast_to(row_ids, shape=(row_ids.shape[0], 200))
assert np.sum(res[-1].asnumpy() == long_dim) == res.shape[1]
res = mx.nd.broadcast_like(row_ids, arr)
assert np.sum(res[-1].asnumpy() == long_dim) == arr.shape[1]
# Test clip
data = res
res = mx.nd.clip(data, a_min=100, a_max=1000)
assert np.sum(res[-1].asnumpy() == 1000) == arr.shape[1]
# Test take
idx = mx.nd.arange(long_dim-1000, long_dim)
res = mx.nd.take(arr, idx)
assert np.sum(res[-1].asnumpy() == 1) == res.shape[1]
# Test slice
res = mx.nd.slice(arr, begin=(long_dim-1000, 1), end=(long_dim, 100))
assert np.sum(res[-1].asnumpy() == 1) == res.shape[1]
# Test slice assign
res = arr.copy()
res[long_dim-1:long_dim] = 1000
assert np.sum(res[-1].asnumpy() == 1000) == arr.shape[1]
# Test expand_dims
res = mx.nd.expand_dims(arr, axis=1)
assert res.shape == (arr.shape[0], 1, arr.shape[1])
# Test squeeze
data = res
res = mx.nd.squeeze(data)
assert res.shape == arr.shape
@mxnet-label-bot add [pr-awaiting-review, NDArray]
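A note on the broadcast test above: `arange(0, long_dim)` ends at `long_dim - 1`, yet the assertion compares the last row against `long_dim`. Assuming the array uses the default float32 dtype, this is consistent with float32 rounding, since 49,999,999 is not exactly representable. The sketch below uses only the Python standard library (not MXNet) to show the rounding; `to_float32` is a hypothetical helper written for this illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


long_dim = 50_000_000
last_index = long_dim - 1  # last value produced by arange(0, long_dim)

# Between 2**25 and 2**26, adjacent float32 values are 4 apart,
# so 49,999,999 rounds to the nearest representable value.
rounded = to_float32(float(last_index))
print(rounded)  # 50000000.0
```

If exact index values matter in such a test, an integer or float64 dtype would avoid the rounding.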
This is pretty useful. However, going forward, to ensure consistency, could you guide us as to when to use size_t, index_t and auto (instead of the usual int)?
Thanks!
LGTM. How are the tests coming? If you need help creating tests, can you point out a group of things that need testing and I'll help out.
@zheng-da I have fixed the bug in broadcast_div and added the tests you provided. Thanks for your review and testing!
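Could you fix this bug as well? Thanks.
import mxnet as mx
from mxnet import gluon

def run_linear():
    # 50M x 100 input: 5 billion elements, more than a 32-bit index can address
    data = mx.nd.ones(shape=(50*1000*1000, 100))
    linear = gluon.nn.Dense(100)
    linear.initialize(ctx=mx.cpu(0))
    res = linear(data)
    res.wait_to_read()  # block until the asynchronous computation finishes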
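For a sense of scale, the input in this test can be sized with plain arithmetic. The sketch below assumes 4-byte float32 elements (the default dtype of `mx.nd.ones`); the helper names are hypothetical and nothing here calls MXNet.

```python
from math import prod

INT32_MAX = 2**31 - 1


def num_elements(shape):
    """Total number of elements in an array of the given shape."""
    return prod(shape)


def nbytes(shape, itemsize=4):
    """Memory needed for the array, assuming `itemsize` bytes per element."""
    return num_elements(shape) * itemsize


data_shape = (50 * 1000 * 1000, 100)
n = num_elements(data_shape)

print(n)                          # 5000000000
print(n > INT32_MAX)              # True
print(nbytes(data_shape) / 1e9)   # 20.0 (GB, input only)
```

The output of `Dense(100)` has the same shape as the input here, so under the same assumption the result needs roughly another 20 GB.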
@zheng-da I don't see a problem with this test. I have added it to the nightly test script. Please review.
@ChaiBapchya Yes, I will document this in cwiki after this change is merged. In general, I found the Google C++ style guide to be a very good one to follow in programming: https://google.github.io/styleguide/cppguide.html#Integer_Types
@apeforest I can't run the test for the dense layer on my C5.18x.
Do you build your code with mkldnn or without mkldnn?
I can verify that the test code for dense works when mxnet isn't built with mkldnn.
@pengzhao-intel raised some good concerns on #13036 about performance. Also, from @wkcn's tests I see div op performance is affected between int32 and int64. Have we done any micro-benchmarks for the affected operators?
👍
@anirudh2290 This PR only changes the indexing variable type from int32_t to int64_t, not the data type of each element, so the performance impact is different from what @wkcn tested. I also ran a test on the broadcast_div operator in MXNet and found no obvious runtime change (the int64_t version is actually slightly faster).
Master branch: 15.357867193222045 seconds
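A timing comparison of this kind can be reproduced with a small harness. The sketch below is not the script used for the numbers in this thread; `time_op` and `workload` are hypothetical names, and the workload is a pure-Python placeholder standing in for the operator call.

```python
import time


def time_op(fn, repeat=5):
    """Return the best wall-clock time in seconds of fn() over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def workload():
    # Placeholder. For an MXNet operator, the callable should also block on
    # the result (e.g. res.wait_to_read()), because execution is asynchronous.
    return sum(i // 2 for i in range(100_000))


print(f"{time_op(workload):.6f} seconds")
```

Running the same harness against builds of both branches, on the same machine, gives numbers that can be compared directly.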
LGTM. You have some good information in the conversation which I suggest adding to the comments section of the PR. It will make things clearer and help reviewers catch up quickly. I think you should mention: (1) This PR only changes the indexing variable type from int32_t to int64_t, not the data type of each element. No performance impact observed. (2) Provide a cwiki link where you will add guidance like: "In general, index_t should be used for indexing elements in the tensor. size_t is used for returning the size of an object or the total number of elements. auto is a keyword in C++11 which derives the data type based on the rhs."
@yuxihu Thanks for your nice suggestion. I have updated the PR comment section.
Thanks for the clarification. Looks good overall. Can we also add a unit test for scatter_nd?
@anirudh2290 Given the overhead of repeating CI, can we get this merged and add a unit test for scatter_nd in a second PR? This PR is only part 1 of a larger PR anyway. Thanks.
@anirudh2290 Agree with you.
@pengzhao-intel I think the performance impact is minimal. Only the index type is changed. The operations on the index are basically additions and comparisons. All other computation is much more expensive. I doubt there is a measurable performance difference.
@zheng-da @apeforest I am fine with the current PR since it's only the first part of the changes. I have revisited the proposal and I suggest adding more details and data, such as how CUDA and MKL-DNN support the INT64 type and any potential impacts.
Please add a TODO (JIRA) for the remaining test. It also makes sense to bring this proposal up for discussion on the dev list.
Thank god the CI tests finally passed!
…e#13418)
* fix a few operators with large arrays (# of elements)
* fix bug in broadcast_div and add tests
* address reviewer comment
* add unit test
* add empty line
* retrigger CI
Description
This PR fixed the large array issue (#13036, #13070) in the following operators:
ndarray.ones
ndarray.zeros
ndarray.sum
ndarray.slice
ndarray.random.uniform
ndarray.empty
This PR is only a first effort that addresses limitations in some basic operators. More tests and fixes are coming as we identify more operators. This is part of a larger project tracked in JIRA at https://issues.apache.org/jira/browse/MXNET-1184
Comments
This PR only allows the total number of elements in an array to go beyond 2 billion. The maximum size of any single dimension is still limited to 2^31 because this PR does not update the C API. Due to the large change required in all language bindings, we decided to break the original PR (#13191) into several smaller parts, and this PR is part 1.
This PR only changes the indexing variable type from int32_t to int64_t, not the data type of each element. No performance impact observed. For guidance on which data types to choose in the future, please refer to https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
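The distinction between the per-dimension limit and the total element count can be illustrated with plain integer arithmetic. The sketch below uses `ctypes` to simulate what a 32-bit signed index would hold after truncation; it does not call MXNet, `as_int32` is a hypothetical helper, and the shape is taken from the tests in this thread.

```python
import ctypes
from math import prod

INT32_MAX = 2**31 - 1


def as_int32(i):
    """Truncate a Python int to 32 bits, as a 32-bit signed index would store it."""
    return ctypes.c_int32(i).value


shape = (50_000_000, 200)
total = prod(shape)           # 10_000_000_000 elements
last_flat_index = total - 1   # flat index of the final element

# Every dimension fits in 32 bits, but the flat index does not.
print(all(d <= INT32_MAX for d in shape))   # True
print(total > INT32_MAX)                    # True

# A 32-bit index silently points at the wrong element; a 64-bit one does not.
print(as_int32(last_flat_index) == last_flat_index)                # False
print(ctypes.c_int64(last_flat_index).value == last_flat_index)    # True
```

This matches the scope described above: widening the index variable fixes addressing for arrays like this one, while shapes with a single dimension beyond the 32-bit range remain out of scope until the C API is updated.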