libc getenv is not threadsafe #13438

samskalicky · 2018-11-28T16:26:33Z

Description

getenv() calls in libc are not threadsafe according to:

https://rachelbythebay.com/w/2017/01/30/env/
and
OpenMathLib/OpenBLAS#716

There are indirect calls to dmlc::GetEnv() all across the mxnet codebase, here are a few:

https://github.com/apache/incubator-mxnet/blob/266de6bef4da5769431557288d41fab2a02e52ca/src/engine/threaded_engine_perdevice.cc#L79
or
https://github.com/apache/incubator-mxnet/blob/5a83b6b563211f430688e41eab4752c6de4ecf22/src/executor/graph_executor.cc#L1194

Error Message:

/lib64/libc.so.6(+0x35250) [0x7fdc5b99b250]
/lib64/libc.so.6(getenv+0xad) [0x7fdc5b99e0cd]
/opt/amazon/lib/libmxnet.so(ZN4dmlc6GetEnvIbEET_PKcS1+0x1b) [0x7fdc4a1bedab]
/opt/amazon/lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor10InitOpSegsEv+0x1cb) [0x7fdc4a1b6e0b]
/opt/amazon/lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor15FinishInitGraphEN4nnvm6SymbolENS2_5GraphEPNS_8ExecutorERKSt13unordered_mapINS2_9NodeEntryENS_7NDArrayENS2_13NodeEntryHashENS2_14NodeEntryEqualESaISt4pairIKS8_S9_EEE+0x71b) [0x7fdc4a1b760b]
/opt/amazon/lib/libmxnet.so(_ZN5mxnet4exec13GraphExecutor4InitEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS4_St4lessISsESaISt4pairIKSsS4_EEERKSt6vectorIS4_SaIS4_EESL_SL_RKSt13unordered_mapISsNS2_6TShapeESt4hashISsESt8equal_toISsESaISA_ISB_SN_EEERKSM_ISsiSP_SR_SaISA_ISB_iEEES11_RKSH_INS_9OpReqType
ESaIS12_EERKSt13unordered_setISsSP_SR_SaISsEEPSH_INS_7NDArrayESaIS1C_EES1F_S1F_PSM_ISsS1C_SP_SR_SaISA_ISB_S1C_EEEPNS_8ExecutorERKSM_INS2_9NodeEntryES1C_NS2_13NodeEntryHashENS2_14NodeEntryEqualESaISA_IKS1M_S1C_EEE+0x75d) [0x7fdc4a1b958d]
/opt/amazon/lib/libmxnet.so(ZN5mxnet8Executor10SimpleBindEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS3_St4lessISsESaISt4pairIKSsS3_EEERKSt6vectorIS3_SaIS3_EESK_SK_RKSt13unordered_mapISsNS1_6TShapeESt4hashISsESt8equal_toISsESaIS9_ISA_SM_EEERKSL_ISsiSO_SQ_SaIS9_ISA_iEEES10_RKSG_INS_9OpReqTypeESaI
S11_EERKSt13unordered_setISsSO_SQ_SaISsEEPSG_INS_7NDArrayESaIS1B_EES1E_S1E_PSL_ISsS1B_SO_SQ_SaIS9_ISA_S1B_EEEPS0+0x1a6) [0x7fdc4a1b9f46]
/opt/amazon/lib/libmxnet.so(MXExecutorSimpleBind+0x1e38) [0x7fdc4a1031a8]
/opt/amazon/python2.7/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fdc5af9b858]

Minimum reproducible example

TODO

Steps to reproduce

TODO

vrakesh · 2018-11-28T20:40:19Z

@samskalicky Thanks for identifying the thread safety issue

vrakesh · 2018-11-28T20:52:19Z

@mxnet-label-bot add [Backend, Bug]

samskalicky · 2018-11-28T21:34:00Z

I found a set/get in MXNet for MXNET_CPU_WORKER_NTHREADS:

So if we're running multiple programs using MXNet on the same machine, its possible that one will be setting MXNET_CPU_WORKER_NTHREADS when the other is getting MXNET_CPU_WORKER_NTHREADS and since they both use the same libc, we're going to hit this issue.

samskalicky · 2018-11-28T21:35:49Z

FYI @KellenSunderland and @larroy

larroy · 2018-11-28T22:10:17Z

@samskalicky I don't think the problem is running different processes, one process shouldn't affect the other in terms of environment and per separation of the process space. The problem is using getenv / setenv from the same process in different threads, as is the case from the code you linked.

larroy · 2018-11-28T22:10:46Z

Could you decode the stack trace via c++filt?

larroy · 2018-11-28T22:33:03Z

Looks like the evidence points to where we suspected, initialize.cc We should rework that code.

larroy · 2018-11-28T22:35:00Z

/lib64/libc.so.6(+0x35250) [0x7fdc5b99b250]
/lib64/libc.so.6(getenv+0xad) [0x7fdc5b99e0cd]
/opt/amazon/lib/libmxnet.so(ZN4dmlc6GetEnvIbEET_PKcS1+0x1b) [0x7fdc4a1bedab]
/opt/amazon/lib/libmxnet.so(mxnet::exec::GraphExecutor::InitOpSegs()+0x1cb) [0x7fdc4a1b6e0b]
/opt/amazon/lib/libmxnet.so(mxnet::exec::GraphExecutor::FinishInitGraph(nnvm::Symbol, nnvm::Graph, mxnet::Executor*, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&)+0x71b) [0x7fdc4a1b760b]
/opt/amazon/lib/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::Context, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::vector<mxnet::Context, std::allocator<mxnet::Context> > const&, std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, nnvm::TShape, std::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, nnvm::TShape> > > const&, std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::unordered_set<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, mxnet::NDArray, std::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, mxnet::NDArray> > >*, mxnet::Executor*, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&)+0x75d)
[0x7fdc4a1b958d] /opt/amazon/lib/libmxnet.so(ZN5mxnet8Executor10SimpleBindEN4nnvm6SymbolERKNS_7ContextERKSt3mapISsS3_St4lessISsESaISt4pairIKSsS3_EEERKSt6vectorIS3_SaIS3_EESK_SK_RKSt13unordered_mapISsNS1_6TShapeESt4hashISsESt8equal_toISsESaIS9_ISA_SM_EEERKSL_ISsiSO_SQ_SaIS9_ISA_iEEES10_RKSG_INS_9OpReqTypeESaIS11_EERKSt13unordered_setISsSO_SQ_SaISsEEPSG_INS_7NDArrayESaIS1B_EES1E_S1E_PSL_ISsS1B_SO_SQ_SaIS9_ISA_S1B_EEEPS0+0x1a6) [0x7fdc4a1b9f46]
/opt/amazon/lib/libmxnet.so(MXExecutorSimpleBind+0x1e38) [0x7fdc4a1031a8]
/opt/amazon/python2.7/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7fdc5af9b858]

KellenSunderland · 2018-11-29T16:26:37Z

What if we set a process wide mutex here: https://github.com/dmlc/dmlc-core/blob/600bc28bc476a1b7866cfe9f444c91b6d49760b7/include/dmlc/parameter.h#L1062

and here: https://github.com/dmlc/dmlc-core/blob/600bc28bc476a1b7866cfe9f444c91b6d49760b7/include/dmlc/parameter.h#L1076

Should fix the issue right? No deadlocks? We'd have to make sure the mutex wasn't threadpool scoped as it's getting set in a new threadpool

larroy · 2018-11-29T16:36:15Z

as you said, what about libraries like openmp which might access the environment variable without a lock?. We could try monkey patching getenv with our own wrapper at link time or runtime. Best is not to use setenv after we have multiple threads running.

larroy · 2018-11-30T11:56:00Z

I don't think we need any mutex. We can just pass any config to the thread as data if we need. Definitely we shouldn't be using the environment to config workers.

samskalicky · 2018-12-03T18:11:15Z

In an attempt to reproduce the issue (assuming its coming from simultaneous processes initializing at the same time) I created a cpp engine unit test by modifying tests/cpp/engine/threaded_engine_test.cc and adding the following two functions:

void handler(int sig) {
  void *array[10];
  size_t size;

  // get void*'s for all entries on the stack                                                                                                                        
  size = backtrace(array, 10);

  // print out all the frames to stderr                                                                                                                              
  fprintf(stderr, "Error: signal %d:\n", sig);
  backtrace_symbols_fd(array, size, STDERR_FILENO);
  exit(1);
}

TEST(Engine, envsegfault) {
  signal(SIGSEGV, handler);
 mxnet::Engine* engine = mxnet::engine::CreateThreadedEnginePerDevice();

 for (int i = 0; i < 10000; ++i) {
   engine->Stop();
   engine->Start();
 }
}

The handler attempts to capture the segfault and print the stack trace, and the envsegfault test creates a threaded engine object and repeatedly stops and starts it to trigger the SetEnv call in initialize.cc and GetEnv call in threaded_engine_perdevice.cc.

Then I wrote this python script to run multiple independent processes doing this:

import multiprocessing
import time
import os

def mxnet_worker():
    for i in range(100):
        ret = os.system('./mxnet_unit_tests  --gtest_filter=Engine.envsegfault')
        if ret != 0:
            print('Failed@@@@@@@@@@@@@@@@')
            exit(2)


read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(10)]
for p in read_process:
    p.daemon = True
    p.start()

for p in read_process:
    p.join()

I was able to trigger a segfault, but I hadnt added the handler yet to print the stack trace. I need to run again and try to reproduce again.

larroy · 2018-12-03T18:24:37Z

Do you think the multiple process have any impact? Env is not shared across processes.

larroy · 2018-12-05T18:25:14Z

process != thread. In this case I think you mean thread.

Other than that I agree with your analysis.

I think we need to understand if not using this variable as in #13472 makes unwanted threads to be spawn inside the engine workers, because it might actually not be the case.

Depending on that, we need to see if we need to use a different mechanism in the engine to control OMP threads, then run performance regression tests.

As you see from my PR, not changing the environment is functionally correct, but I don't know if it has performance impact because of the reason explained above.

larroy · 2018-12-05T18:26:43Z

I would suggest using an atomic counter in pthread_atfork that is incremented whenever we create a thread. In this way see how many threads we are creating and printing it out and comparing two runs with, and without the variables that I removed to see if there's any difference, that would be a quick check.

samskalicky · 2018-12-05T18:31:18Z

Heres the relevant code from the version of libc (2.17) that was being used in the failure:
getenv.c: https://pastebin.com/gj4XZgdE
setenv.c: https://pastebin.com/NuTfDdDp

It looks like the issue is the __environ variable being iterated over in lines 15, 32, 55. This variable is defined in setenv.c:

#if !_LIBC
# define __environ      environ
# ifndef HAVE_ENVIRON_DECL
extern char **environ;
# endif
#endif

So there is this nasty global pointer to the environment. While setenv has a lock around changing the environment, theres no lock for getenv so (as mentioned in the block post linked in the description) :

"this is a char**, so it's supposed to have a bunch of pointers to char* buffers, one for each var=val pairing. What if that region has been reused and now contains a bogus pointer into the weeds? Yep, segfault."

So I think @KellenSunderland's suggestion of adding a lock/mutex in dmls parameter.h's GetEnv and SetEnv should prevent any failure due to MXNet's various threads accessing the environment simultaneously. Although without a reproducible test case we're unable to validate this.

Anybody have any ideas for reproducing something similar to this code from the blog post with mxnet's engine tests so that we have a reproducible test case?

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
 
static void* worker(void* arg) {
  for (;;) {
    int i;
    char var[256], *p = var;
 
    for (i = 0; i < 8; ++i) {
      *p++ = 65 + (random() % 26);
    }
 
    *p++ = '\0';
 
    setenv(var, "test", 1);
  }
 
  return NULL;
}
 
int main() {
  pthread_t t;
 
  printf("start\n");
  setenv("foo", "bar", 0);
  pthread_create(&t, NULL, worker, 0);
 
  for (;;) {
    getenv("foo");
  }
 
  return 0;
}

I tried simplifying the start_stop test https://github.com/apache/incubator-mxnet/blob/master/tests/cpp/engine/threaded_engine_test.cc#L124 for just the threaded_engine_perdevice and stop/started it a few hundred times and it didnt cause a segfault.

larroy · 2018-12-05T18:45:45Z

We should not use locking in the handlers in pthread_atfork, it will likely have negative impacts in terms of lock contention and performance and architecturally is not a good solution. We should find a solution using good architecture to pass the number of threads in the engine. threads get a data pointer where you can put configuration or pass any other data at creation time.

samskalicky · 2018-12-05T19:08:54Z

@larroy We were already using locks inside the handlers. setenv uses a lock in libc around modifying the environment. This doesnt change anything.

Are you suggesting that we dont use an EnvVar to pass the OMP_NUM_THREADS=1 to subsequent processes? What do you think the impact will be the other processes that are using MXNet using more threads?

Was there a conscious decision to use an EnvVar to limit OMP_NUM_THREADS in subsequent processes?

anirudh2290 · 2018-12-06T04:08:55Z

The problem as provided in the article linked in this issue and related article here: https://rachelbythebay.com/w/2017/01/30/env/ is that if the main thread spawns another thread, which calls setenv and while we call setenv the process is forked, the mutex is currently in locked state in the child process and it will never be unlocked since there is no thread to release the lock which causes it to hang.
This can be replicated in MXNet in the following way. Pull the code from https://github.com/anirudh2290/mxnet/tree/setenv_issue and build it similar to the following:

cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_MKLDNN=ON -DUSE_OPENMP=ON -DUSE_OPENCV=OFF -DCMAKE_BUILD_TYPE=Debug -GNinja ..

Run the following script:

  import multiprocessing
  import os
  import sys
  import mxnet as mx

  def mxnet_worker():
       print 'inside mxnet_worker'

  mx.base._LIB.MXStartBackgroundThread(mx.base.c_str("dummy"))
  read_process = [multiprocessing.Process(target=mxnet_worker) for i in range(8)]
  for p in read_process:
      p.daemon = True
      p.start()
      p.join()

Now run the script, you will be able to see the process hangs.
When I attach gdb to the process I see the following:

#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007fc0fabab99c in __add_to_environ (name=0x7fc093a935fc "MXNET_CPU_WORKER_NTHREADS", value=0x7fffec2eff10 "1", combined=0x0,
    replace=1) at setenv.c:133

which means it is stuck trying to acquire the lock: https://github.com/lattera/glibc/blob/master/stdlib/setenv.c#L133

I checked the mxnet codebase to see if we are calling SetEnv anywhere else and we dont seem to be calling it anywhere except here. Also, pthread_at_fork statement calls Engine::Get()->Stop() which would mean that all engine threads are suspended. It is still possible that it could be called from other multithreaded code in MXNet iterators for example, but I couldnt find it and it is unlikely that we are not using dmlc::SetEnv but something else to set env vars for mxnet or dmlc-core code. I think it is more likely that the customer application spawned a thread, which called SetEnv at the same time pthread_at_fork was called which let to this behavior.

larroy · 2018-12-06T12:27:59Z

Good analysis @anirudh2290 agree with it and makes sense. I would suggest we don't use environment for this and have a propper configuration object that we can pass when needed, or in the worst case, can be a singleton, what do you think?

anirudh2290 · 2018-12-06T22:19:23Z

@larroy Can you elaborate more on the singleton solution . For example even if its a singleton its still possible that we may fork during setenv call inside a thread and run across the same issue. Removing env variables and changing it to config would be a breaking change. I dont think we can do it before 2.0.
Maybe we should understand more on the customer application and customer issue and come up with a set of guidelines when using multiprocessing with mxnet and document it for now.

Vikas-kum · 2018-12-11T23:03:01Z

@anirudh2290 good one!
I think this is problem in general. For this particular case we can try to use fork handlers:
pthread_at_fork(prepare, parent, child)

in prepare method, we should try to set bogus env variable, which will unlock the lock and then child will get that lock in unlocked state.
Something like dmlc::setEnv("Bogus", "Bogus") here: https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L54

anirudh2290 · 2018-12-12T08:38:30Z

Thanks for the suggestion @Vikas89 . I think with your suggested change, it may be less likely to be encountered, but it is possible that mutex is in locked state in child process, if the thread calls setenv after the prepare step and before fork.

Vikas-kum · 2018-12-12T18:58:00Z

If there is only 2 places that too after fork where we setEnv(initialize.cc), why are we setting env ? Can we just keep these values in memory or in some file tied to this mxnet processID ?

samskalicky · 2018-12-13T19:34:33Z

I was able to run the user's test script and reproduce in GDB. Looks like its failing when calling getenv("MXNET_EXEC_BULK_EXEC_INFERENCE"). This only seems to happen here:

https://github.com/apache/incubator-mxnet/blob/5a83b6b563211f430688e41eab4752c6de4ecf22/src/executor/graph_executor.cc#L1186-L1209

This matches the original stack trace where we found it failing in GraphExecutor::InitOpSegs

samskalicky · 2018-12-17T21:33:25Z

So far, we've been able to verify that compiling mxnet without openmp does not trigger the segfault. This validates that the problem is between one thread in mxnet running openmp and another thread in mxnet calling getenv.

Thus the current workaround (to avoid the segfault) is to recompile mxnet without openmp. This will cause performance degredation for any operators that used to run using openmp (ie. those that are not using BLAS/LAPACK routines or MKLDNN).

samskalicky · 2018-12-17T21:40:08Z

@Vikas89 and @anirudh2290 have discussed a couple of options for implementing a permanent solution:

Moving all get/setenv calls to the beginning of mxnet initialization, before any openmp or any other dependent library is initialized. This would clearly prevent the problem by moving the getenv to before openmp runs. But then this may cause a breaking change with how developers are using mxnet to change its behavior using environment variables at runtime. If we only sample env vars at init time this will no longer work.
We could write our own thread-safe getenv/setenv functions and implement them across mxnet and openmp (since we're compiling it from source when we build mxnet). This would eliminate the problem, but would be harder to maintain as we would have to constantly patch openmp
We could create APIs to enable the behavior that users currently have when setting env vars, and with the change described in Add some ops #1 have the same functionality albeit with a different use-model.

None of these seem like the best options, any other comments or thoughts?

anirudh2290 · 2018-12-18T08:54:34Z

I think to arrive at the right solution need to understand more on the openmp behavior and why it calls the library initialize within an operator execution when an omp for loop is called. Need to also confirm if this happens only during forking but also outside forking and what specific openmp versions the customer has and if this can be reproduced on other openmp versions.

KellenSunderland · 2018-12-31T23:13:44Z

Good analysis. Has anyone checked to see this problem is reproducible with all versions of OMP? I'm wondering if another workaround that may not have a perf impact would be to use a different implementation than what is used by default.

KellenSunderland · 2018-12-31T23:17:53Z

Regarding option 1. I actually think this is a good design, but several tests are currently relying on the fact that we can change behaviour via env var changes during runtime. However, we could probably fix this be reinit'ing the library whenever we set an env var in a test. I think we should avoid reading env vars at runtime in any case for performance reasons, so it makes sense to read these once at startup and not again after.

larroy · 2019-01-03T13:46:53Z

I would go for option 1.

samskalicky · 2019-01-03T15:44:44Z

Thanks @larroy @KellenSunderland @anirudh2290, just found out that the problem was due to using Intel OpenMP rather than the LLVM OpenMP submodule. Upon switching back to using LLVM OpenMP the problem did not occur. I think the problem is effectively avoided since we're not advocating for using Intel OpenMP.

@TaoLv @pengzhao-intel any interest in seeing MXNet switch to Intel OpenMP and investigating this issue? Are there any perceived benefits?

pengzhao-intel · 2019-01-04T04:28:28Z

thanks @samskalicky It's a long thread, we need to warm up and understand the background :)

@TaoLv could you help take a look for the questions?

TaoLv · 2019-01-04T08:04:47Z

@samskalicky Can you explain more about how to use Intel OMP in MXNet and how to switch back to LLVM OMP? I think LLVM OMP is using the same runtime as Intel OMP.

samskalicky · 2019-01-04T17:39:09Z

@TaoLv according to this page: https://software.intel.com/en-us/forums/intel-c-compiler/topic/793552 they should be the same but may differ slightly as they have different release schedules. So this is a possibility that the functionality differs between a particular versions of LLVM/Intel OMP.

As for how to use it, I didnt do the building myself but I imagine that those who did simply modified the makefile to link against the Intel OMP binary rather than building the LLVM OMP submodule. Unfortunately I dont have step-by-step instructions for you.

I was just hoping to see what your thoughts were on the current way MXNet is architected in regards to OMP and if what you thought could be improved (possibly by using Intel OMP). And if you had any interest in seeing MXNet switch to Intel OMP and what any benefits might be. Of course we'd still need to maintain a build without Intel OMP for other architectures.

TaoLv · 2019-01-06T02:22:15Z

@samskalicky Yes. I would expect ICC + Intel OMP should have better performance and I know there is an effort ongoing for enabling them for MXNet. Currently MXNet cannot be successfully built with ICC.

Fixes apache#13438 Fixes apache#14979

…cal concurrency crashes. (#15762) * Refactor LibraryInitializer so it's thread safe. Fixes #13438 Fixes #14979 * Refactor around lib loading * Fix lint * CR * Add option to choose between OMP implementations * Fix bug * Fix from CR

…cal concurrency crashes. (apache#15762) * Refactor LibraryInitializer so it's thread safe. Fixes apache#13438 Fixes apache#14979 * Refactor around lib loading * Fix lint * CR * Add option to choose between OMP implementations * Fix bug * Fix from CR

marcoabreu added BLAS and removed BLAS labels Nov 28, 2018

marcoabreu added Backend Issues related to the backend of MXNet Bug labels Nov 28, 2018

larroy pushed a commit to larroy/mxnet that referenced this issue Nov 30, 2018

Comment out dmlc::SetEnv in pthread_atfork apache#13438

fbb3c15

larroy mentioned this issue Nov 30, 2018

[WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438 #13472

Closed

5 tasks

larroy pushed a commit to larroy/mxnet that referenced this issue Nov 30, 2018

Comment out dmlc::SetEnv in pthread_atfork apache#13438

e10e845

larroy pushed a commit to larroy/mxnet that referenced this issue Dec 4, 2018

Comment out dmlc::SetEnv in pthread_atfork apache#13438

33fc658

larroy pushed a commit to larroy/mxnet that referenced this issue Feb 13, 2019

Comment out dmlc::SetEnv in pthread_atfork apache#13438

ca467e7

larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe.

74b2282

Fixes apache#13438 Fixes apache#14979

larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe.

34be4c7

Fixes apache#13438 Fixes apache#14979

larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe.

1430a94

Fixes apache#13438 Fixes apache#14979

larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe.

ff4efd9

Fixes apache#13438 Fixes apache#14979

larroy mentioned this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe. Fixes random sporadical concurrency crashes. #15762

Merged

5 tasks

larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe.

27fb2ae

Fixes apache#13438 Fixes apache#14979

larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019

Refactor LibraryInitializer so it's thread safe.

d630731

Fixes apache#13438 Fixes apache#14979

larroy added a commit to larroy/mxnet that referenced this issue Aug 8, 2019

Refactor LibraryInitializer so it's thread safe.

51c4091

Fixes apache#13438 Fixes apache#14979

larroy added a commit to larroy/mxnet that referenced this issue Aug 8, 2019

Refactor LibraryInitializer so it's thread safe.

b2ae2d2

Fixes apache#13438 Fixes apache#14979

zachgk closed this as completed in #15762 Aug 9, 2019

luotao1 mentioned this issue Aug 30, 2019

mkl random failed on CI machine PaddlePaddle/Paddle#15791

Closed

libc getenv is not threadsafe #13438

libc getenv is not threadsafe #13438

Comments

samskalicky commented Nov 28, 2018

Description

Error Message:

Minimum reproducible example

Steps to reproduce

vrakesh commented Nov 28, 2018

vrakesh commented Nov 28, 2018

samskalicky commented Nov 28, 2018 • edited Loading

samskalicky commented Nov 28, 2018

larroy commented Nov 28, 2018

larroy commented Nov 28, 2018

larroy commented Nov 28, 2018

larroy commented Nov 28, 2018

KellenSunderland commented Nov 29, 2018

larroy commented Nov 29, 2018 • edited Loading

larroy commented Nov 30, 2018

samskalicky commented Dec 3, 2018

larroy commented Dec 3, 2018

larroy commented Dec 5, 2018

larroy commented Dec 5, 2018

samskalicky commented Dec 5, 2018

larroy commented Dec 5, 2018 • edited Loading

samskalicky commented Dec 5, 2018 • edited Loading

anirudh2290 commented Dec 6, 2018

larroy commented Dec 6, 2018

anirudh2290 commented Dec 6, 2018

Vikas-kum commented Dec 11, 2018

anirudh2290 commented Dec 12, 2018

Vikas-kum commented Dec 12, 2018

samskalicky commented Dec 13, 2018 • edited Loading

samskalicky commented Dec 17, 2018

samskalicky commented Dec 17, 2018 • edited Loading

anirudh2290 commented Dec 18, 2018

KellenSunderland commented Dec 31, 2018

KellenSunderland commented Dec 31, 2018 • edited Loading

larroy commented Jan 3, 2019

samskalicky commented Jan 3, 2019

pengzhao-intel commented Jan 4, 2019

TaoLv commented Jan 4, 2019

samskalicky commented Jan 4, 2019

TaoLv commented Jan 6, 2019

samskalicky commented Nov 28, 2018 •

edited

Loading

larroy commented Nov 29, 2018 •

edited

Loading

larroy commented Dec 5, 2018 •

edited

Loading

samskalicky commented Dec 5, 2018 •

edited

Loading

samskalicky commented Dec 13, 2018 •

edited

Loading

samskalicky commented Dec 17, 2018 •

edited

Loading

KellenSunderland commented Dec 31, 2018 •

edited

Loading