This repository was archived by the owner on Nov 17, 2023. It is now read-only.

MXNet silently produces bad results (all zeroes) when allocating NDArray larger than 2^32 in size via random_normal #14939

Closed
jschmitz28 opened this issue May 13, 2019 · 2 comments

Comments

@jschmitz28

Description

MXNet silently produces incorrect results (all zeros, with no error raised) when random_normal() allocates an NDArray larger than 2^32 bytes (the example below is a float32 array of roughly 8 GiB).
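The phrase "2^32 in size" could mean either an element count or a byte count. The plain-Python sketch below (no MXNet needed) computes both for the reproducer's shape (42949672, 50). It assumes the default float32 dtype (4 bytes per element). The result is that the array holds just under 2^31 elements but occupies more than 2^32 bytes. This report alone does not establish which of these limits, if either, MXNet 1.4.0 actually overflows.

```python
# Size arithmetic for the shape used in the minimal reproducible example.
# Assumption: float32, i.e. 4 bytes per element (random_normal's default dtype).
shape = (42949672, 50)
bytes_per_element = 4

num_elements = 1
for dim in shape:
    num_elements *= dim
num_bytes = num_elements * bytes_per_element

INT32_MAX = 2**31 - 1   # largest signed 32-bit integer
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit integer

print(f"elements: {num_elements:,}")                        # 2,147,483,600
print(f"bytes:    {num_bytes:,}")                           # 8,589,934,400
print("elements fit in int32:", num_elements <= INT32_MAX)  # True
print("bytes exceed 2**32:   ", num_bytes > UINT32_MAX)     # True
```

Because the element count is only 48 below INT32_MAX, a guard that checks element count against a signed 32-bit limit would not flag this shape.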

Environment info

Base deep learning AMI on AWS: ami-01ac4e28da63bac3c
[ec2-user@ip-10-2-68-132 ~]$ source activate mxnet_p36
(mxnet_p36) [ec2-user@ip-10-2-68-132 ~]$ python diagnose.py
----------Python Info----------
Version : 3.6.5
Compiler : GCC 7.2.0
Build : ('default', 'Apr 29 2018 16:14:56')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 10.0.1
Directory : /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.4.0
Directory : /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash : a03d59e
----------System Info----------
Platform : Linux-4.14.104-78.84.amzn1.x86_64-x86_64-with-glibc2.9
system : Linux
node : ip-10-2-68-132
release : 4.14.104-78.84.amzn1.x86_64
version : #1 SMP Mon Mar 4 19:19:37 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2702.223
BogoMIPS: 4600.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-7
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0023 sec, LOAD: 0.8111 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0007 sec, LOAD: 0.0224 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0006 sec, LOAD: 0.3334 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0007 sec, LOAD: 0.1205 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0020 sec, LOAD: 0.0719 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0007 sec, LOAD: 0.0281 sec.

Error Message:

None. (I would prefer an error to silently incorrect results.)

Minimal reproducible example

source activate mxnet_p36 && python -c 'import mxnet; print(mxnet.nd.random_normal(shape=(42949672,50)))'

[[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]
<NDArray 42949672x50 @cpu(0)>
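Because MXNet raises no error here, a caller could guard against this failure by checking part of the result. The helper `looks_all_zero` below is a hypothetical, stdlib-only sketch and is not part of MXNet. For values drawn from a normal distribution, an all-zero sample is vanishingly unlikely, so a True result indicates a bad fill. The MXNet lines appear only as a comment and are not executed. They assume that NDArray slicing and `asnumpy()` behave as in MXNet 1.x.

```python
import random


def looks_all_zero(values, tol=0.0):
    """Return True if every value has magnitude <= tol.

    For draws from a normal distribution, an all-zero sample is
    vanishingly unlikely, so True signals a bad fill.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value to check")
    return all(abs(v) <= tol for v in values)


# Hypothetical MXNet usage (not executed here):
#   import mxnet as mx
#   arr = mx.nd.random_normal(shape=(42949672, 50))
#   sample = arr[:4].asnumpy().ravel()  # first 4 rows, 200 values
#   if looks_all_zero(sample):
#       raise RuntimeError("random_normal returned all zeros")

# Stand-in data: a healthy normal sample and the all-zero output seen above.
rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(200)]
broken = [0.0] * 200
print(looks_all_zero(healthy))  # False
print(looks_all_zero(broken))   # True
```

Checking only a prefix catches the fully zeroed output shown above, but it would miss corruption confined to later rows. Sampling a few rows from several offsets is still cheap if that matters.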

Steps to reproduce

  1. Launch a p3.2xlarge instance with the base deep learning AMI.
  2. Run: source activate mxnet_p36 && python -c 'import mxnet; print(mxnet.nd.random_normal(shape=(42949672,50)))'
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@jschmitz28
Author

Looks like a duplicate of #13036.
