Parallelize convolutional neural network #2341
Conversation
Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the meantime, please be sure that you've handled the following things, to make the review process quicker and easier:
Thank you again for your contributions! 👍 |
Hey @adiwajshing, do you mind running these tests with BLAS or OpenBLAS? |
I already did that. The convolution functions don't really involve many Armadillo operations, so using BLAS or OpenBLAS didn't help. The pooling functions, on the other hand, involve a combination of for loops & Armadillo operations, which is why just parallelizing the for loops in those operations made a huge difference. |
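(For context, the kind of change being described looks roughly like the sketch below: the per-slice pooling work is spread across OpenMP threads. The function name, signature, and the no-padding window arithmetic are illustrative assumptions, not the actual mlpack pooling code.)

```cpp
#include <armadillo>
#include <cstddef>

// Sketch: pool each slice of the input cube on its own thread.  The slices
// are independent, so the loop carries no dependencies and can run in
// parallel safely.
void MaxPoolingForwardSketch(const arma::cube& input, arma::cube& output,
                             const size_t kW, const size_t kH,
                             const size_t dW, const size_t dH)
{
  output.set_size((input.n_rows - kH) / dH + 1,
                  (input.n_cols - kW) / dW + 1,
                  input.n_slices);

  // A signed loop index keeps MSVC's OpenMP 2.0 implementation happy.
  #pragma omp parallel for
  for (long long s = 0; s < (long long) input.n_slices; ++s)
  {
    for (size_t j = 0; j < output.n_cols; ++j)
      for (size_t i = 0; i < output.n_rows; ++i)
        // Maximum over the kW x kH window (assumes no padding).
        output(i, j, s) = input.slice(s).submat(
            i * dH, j * dW, i * dH + kH - 1, j * dW + kW - 1).max();
  }
}
```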
Hi, so I haven't gone through the file yet; I think I need to pay a bit more attention to why some of the changes were made. I'll go over them tomorrow.
Until then, I have a couple of questions:
- How was this tested?
- Can you share the results with BLAS and OpenBLAS?
- Do you mind sharing the testing script?
Sorry to bother you with so many questions. I would love to see convolution improve; however, something I've learned is that benchmarking, especially correct benchmarking, is not so easy.
Some other compiler flags that you might find useful:
-O3 -fopenmp -DNDEBUG -DARMA_NO_DEBUG
arma::cube inputTemp = arma::cube(const_cast<arma::Mat<eT>&>(input).memptr(),
    inputWidth, inputHeight, inSize * batchSize, false, false);
Any particular reason why we need to make this change? I might be missing something. Thanks.
Sorry about that, it's a merge issue. I'll revert that part.
outputTemp.zeros();

for (size_t outMap = 0, outMapIdx = 0, batchCount = 0; outMap <
    outSize * batchSize; outMap++)
arma::Cube<eT> inp = (padWLeft | padWRight | padHTop | padHBottom) != 0 ?
    inputPaddedTemp : inputTemp;
We generally prefer names that are much more intuitive. Let me know what you think.
I needed that variable earlier, but don't anymore. Will remove it.
Also, kindly refer to the style guidelines here. Hmm, looks like the convolution causes an out-of-bounds index error. Let me know what you think. |
That's odd. Tests were passing on my machine. I'll implement the style guide, check the tests & get back. |
Hey @adiwajshing, something I forgot to mention yesterday: size_t causes problems with OpenMP on some devices, so you should use omp_size_t instead. |
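(The usual reason behind this: MSVC only implements OpenMP 2.0, which requires a signed loop index, so `size_t` cannot be used directly as the index of a parallel loop on Windows. A sketch of the kind of alias involved is below; the exact typedef mlpack uses may differ.)

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the workaround: pick a signed type on Windows (OpenMP 2.0 rejects
// unsigned loop indices) and plain size_t everywhere else.
#ifdef _WIN32
  typedef std::intmax_t omp_size_t;  // signed, wide enough for element counts
#else
  typedef std::size_t omp_size_t;
#endif
```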
- Using omp_size_t instead of size_t when using OpenMP
- Convolution::Backward() bug fix
- Parallelized Convolution::Gradient()
@kartikdutt18 thanks a lot! I was wondering what problem OpenMP had on Windows. Anyway, I benchmarked everything with the -O3 optimization (the rest I had already done), and the results were quite interesting. I have been using this mlpack script for my testing. When the iterations per epoch are < 1000, the performance is almost exactly the same, with or without my changes. However, when the iterations per epoch are >= 10000, I get at least 2x performance with these changes. Here are some results with -O3 & 39000 iterations per epoch: Without parallelization: With parallelization |
@@ -55,28 +55,32 @@ class NaiveConvolution
const size_t dW = 1,
const size_t dH = 1,
const size_t dilationW = 1,
const size_t dilationH = 1)
const size_t dilationH = 1, const size_t appending = false)
Added an appending option; when appending=true, the convolution just adds to the output instead of allocating a new matrix and then adding to it. This spares an allocation & some CPU time.
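(A minimal sketch of what the flag changes, assuming a plain valid convolution without stride or dilation; the function name and signature below are illustrative, not the real NaiveConvolution member.)

```cpp
#include <armadillo>

// Sketch: with appending == false the output is sized and zeroed first; with
// appending == true the result is accumulated into the caller's existing
// output, saving one allocation and zero-fill per call.
template<typename eT>
void NaiveValidConvolutionSketch(const arma::Mat<eT>& input,
                                 const arma::Mat<eT>& filter,
                                 arma::Mat<eT>& output,
                                 const bool appending = false)
{
  if (!appending)
    output.zeros(input.n_rows - filter.n_rows + 1,
                 input.n_cols - filter.n_cols + 1);

  for (size_t j = 0; j < output.n_cols; ++j)
    for (size_t i = 0; i < output.n_rows; ++i)
      for (size_t kj = 0; kj < filter.n_cols; ++kj)
        for (size_t ki = 0; ki < filter.n_rows; ++ki)
          output(i, j) += filter(ki, kj) * input(i + ki, j + kj);
}
```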
I think appending should be a bool value then.
sure, will change it to that then.
Would you have any idea why the build is now failing on 'mlpack.mlpack' while the rest is passing? The build section on Azure is just empty. Thank you |
Hi, It's unrelated to your PR. Thanks. |
Hello, thanks for the contribution. I didn't look in detail yet. I just added some minor comments regarding the style and a couple of questions.
CMakeLists.txt
Outdated
@@ -481,14 +481,17 @@ if (OPENMP_FOUND)
add_definitions(-DHAS_OPENMP)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} /usr/local/Cellar/llvm/9.0.1/lib/libomp.dylib")
Probably this was a wrong git add.
So, this line is required when building with Xcode: libomp.dylib has to be linked. In hindsight, it's too specific to the version of LLVM; maybe you could suggest a better way to do this?
Unfortunately, I don't know since I've never used OSX. I think the cmake configuration file shouldn't depend on a particular system.
I think you can solve the issue by means of cmake variables. Try to modify the compiler flags or the environment:
# I am not quite sure which variable you need.
cmake -D CMAKE_SHARED_LINKER_FLAGS=/usr/local/Cellar/llvm/9.0.1/lib/libomp.dylib path/to/sources
# Probably cmake scans the standard variables.
LDFLAGS=/usr/local/Cellar/llvm/9.0.1/lib/libomp.dylib cmake path/to/sources
else ()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4068")
endif ()
set(OpenMP_CXX_FLAGS "")
set(OpenMP_CXX_FLAGS "")
Looks like the previous indentation was correct.
const eT *kernelPtr = filter.memptr(), *inputPtr;
size_t j, i, kj, ki;
const size_t o_cols = output.n_cols, o_rows = output.n_rows;
const size_t f_cols = filter.n_cols, f_rows = filter.n_rows;

for (size_t j = 0; j < output.n_cols; ++j)
for (j = 0; j < o_cols; ++j)
{
for (size_t i = 0; i < output.n_rows; ++i, outputPtr++)
for (i = 0; i < o_rows; ++i, outputPtr++)
{
const eT* kernelPtr = filter.memptr();
for (size_t kj = 0; kj < filter.n_cols; ++kj)
for (kj = 0; kj < f_cols; ++kj)
{
const eT* inputPtr = input.colptr(kj * dilationW + j * dW) + i * dH;
for (size_t ki = 0; ki < filter.n_rows; ++ki, ++kernelPtr,
inputPtr += dilationH)
inputPtr = input.colptr(kj * dilationW + j * dW) + i * dH;
for (ki = 0; ki < f_rows; ++ki, ++kernelPtr, inputPtr += dilationH)
*outputPtr += *kernelPtr * (*inputPtr);
}
kernelPtr -= f_rows * f_cols;
I didn't get the point. What's the purpose of these changes? Could you elaborate a bit?
I was working with no optimization, so these changes made an incremental difference. However, these are just small optimizations the compiler probably would have done anyway. I can remove them if required.
So, I ran a test with this change. There does not seem to be a difference when optimization is enabled.
However, I then parallelized the function when the image is large enough. Here are the results:
INPUT SIZE: 16
FILTER SIZE 3: old: 0.038158s new: 0.036094s
FILTER SIZE 5: old: 0.044838s new: 0.057577s
FILTER SIZE 7: old: 0.066811s new: 0.063331s
INPUT SIZE: 32
FILTER SIZE 3: old: 0.127948s new: 0.140982s
FILTER SIZE 5: old: 0.279254s new: 0.405592s
FILTER SIZE 7: old: 0.625482s new: 0.604379s
INPUT SIZE: 48
FILTER SIZE 3: old: 0.528458s new: 0.640729s
FILTER SIZE 5: old: 0.718363s new: 0.572617s
FILTER SIZE 7: old: 1.00586s new: 0.969761s
INPUT SIZE: 64
FILTER SIZE 3: old: 0.503006s new: 0.729201s
FILTER SIZE 5: old: 1.10458s new: 1.09253s
FILTER SIZE 7: old: 1.80471s new: 0.791141s
INPUT SIZE: 80
FILTER SIZE 3: old: 0.830632s new: 0.466671s
FILTER SIZE 5: old: 1.58484s new: 0.782858s
FILTER SIZE 7: old: 2.84117s new: 1.71891s
INPUT SIZE: 96
FILTER SIZE 3: old: 1.21493s new: 0.706584s
FILTER SIZE 5: old: 2.2874s new: 1.13919s
FILTER SIZE 7: old: 4.15312s new: 1.8547s
INPUT SIZE: 112
FILTER SIZE 3: old: 1.53625s new: 0.901717s
FILTER SIZE 5: old: 3.10605s new: 1.57691s
FILTER SIZE 7: old: 6.06146s new: 4.87558s
INPUT SIZE: 128
FILTER SIZE 3: old: 2.34454s new: 1.24114s
FILTER SIZE 5: old: 4.60348s new: 2.96314s
FILTER SIZE 7: old: 9.58979s new: 7.50299s
INPUT SIZE: 144
FILTER SIZE 3: old: 3.30705s new: 2.80564s
FILTER SIZE 5: old: 6.0074s new: 2.97862s
FILTER SIZE 7: old: 9.66638s new: 4.33965s
INPUT SIZE: 160
FILTER SIZE 3: old: 3.08419s new: 1.82183s
FILTER SIZE 5: old: 6.39535s new: 3.26939s
FILTER SIZE 7: old: 12.2366s new: 6.49565s
INPUT SIZE: 176
FILTER SIZE 3: old: 4.12684s new: 3.04944s
FILTER SIZE 5: old: 9.23612s new: 4.31397s
FILTER SIZE 7: old: 16.8792s new: 6.71569s
INPUT SIZE: 192
FILTER SIZE 3: old: 4.55307s new: 3.37878s
FILTER SIZE 5: old: 9.91482s new: 5.1516s
FILTER SIZE 7: old: 17.925s new: 7.8205s
INPUT SIZE: 208
FILTER SIZE 3: old: 6.43632s new: 4.4773s
FILTER SIZE 5: old: 13.3028s new: 7.28228s
FILTER SIZE 7: old: 22.9444s new: 10.3726s
INPUT SIZE: 224
FILTER SIZE 3: old: 5.9436s new: 3.52242s
FILTER SIZE 5: old: 12.6614s new: 6.46533s
FILTER SIZE 7: old: 24.3096s new: 11.6745s
INPUT SIZE: 240
FILTER SIZE 3: old: 6.78909s new: 3.9888s
FILTER SIZE 5: old: 14.444s new: 7.399s
FILTER SIZE 7: old: 27.606s new: 12.273s
INPUT SIZE: 256
FILTER SIZE 3: old: 8.25704s new: 5.41013s
FILTER SIZE 5: old: 17.2394s new: 9.49333s
FILTER SIZE 7: old: 32.0725s new: 14.911s
Here is a link to the testing script
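(The "parallelize only when the image is large enough" idea above can be expressed with OpenMP's `if` clause. The sketch below is illustrative only: the 64 * 64 threshold and the loop shape are assumptions, not the exact code from this PR.)

```cpp
#include <armadillo>

// Sketch: threads are only spawned when the output is big enough to amortise
// the startup cost; otherwise the loop runs sequentially.
template<typename eT>
void ConvolveIfLargeSketch(const arma::Mat<eT>& input,
                           const arma::Mat<eT>& filter,
                           arma::Mat<eT>& output)
{
  output.zeros(input.n_rows - filter.n_rows + 1,
               input.n_cols - filter.n_cols + 1);

  // Each thread writes to its own output columns, so there is no data race.
  #pragma omp parallel for if (output.n_elem > 64 * 64)
  for (long long j = 0; j < (long long) output.n_cols; ++j)
    for (size_t i = 0; i < output.n_rows; ++i)
      for (size_t kj = 0; kj < filter.n_cols; ++kj)
        for (size_t ki = 0; ki < filter.n_rows; ++ki)
          output(i, j) += filter(ki, kj) * input(i + ki, j + kj);
}
```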
That's interesting. I'll look into it. I thought the compiler was able to optimize the old code.
Probably, but it won't be able to automatically parallelize it.
#pragma omp parallel for
for (omp_size_t i = 0; i < sample.n_elem; i++)
This loop contains a pretty simple operation. I think in this case the performance depends on the memory clock speed rather than on the CPU. What do you think?
I think it would depend on the number of samples, right? If there are maybe only 10 samples, there may not be a difference in performance. However, as the number increases & the overhead of creating the threads becomes less & less relevant, we would see a greater increase in performance.
I deleted the last comment due to wrong values.
Added another comment. Now the test seems correct.
You didn't take into account that RAM frequency is much lower than CPU frequency and the loop requires 2 new values each iteration. Actually, there are a great number of factors such as memory frequency, memory bandwidth, CPU frequency, CPU cache size and so on.
I wrote a simple test which measures the duration of a similar loop.
https://gist.github.com/lozhnikov/3486432717ea04f25a722c97fbd79edd
(Weird, I couldn't upload the file, so I created a gist)
Here are the results:
I used a system with core-i7 2600K and 16GB DDR3 memory.
Look at the "Parallel (s)" and "Sequential (s)" columns.
g++ 9.2.1
g++ -O3 -fopenmp speedup_test.cpp -lgomp -o speedup_test_g++
./speedup_test_g++
Size Count Par. Parallel (s) Count Seq. Sequential (s)
10 6 1.880000e-07 4 5.900000e-08
100 49 1.600000e-07 49 1.070000e-07
1000 502 1.101000e-06 500 8.470000e-07
10000 4964 1.166800e-05 4968 8.230000e-06
100000 49841 2.085290e-04 49958 8.180200e-05
1000000 500518 1.397976e-03 499577 1.487266e-03
10000000 4997717 1.245767e-02 4998914 1.237367e-02
100000000 49997606 1.240536e-01 49997601 1.196321e-01
clang 9.0.1
clang++ -O3 -fopenmp speedup_test.cpp -lgomp -o speedup_test_clang++
./speedup_test_clang++
Size Count Par. Parallel (s) Count Seq. Sequential (s)
10 4 4.780000e-07 5 1.270000e-07
100 47 2.260000e-07 51 7.800000e-08
1000 520 1.019000e-06 489 2.660000e-07
10000 4965 1.011800e-05 4993 3.656000e-06
100000 49904 6.398400e-05 49987 4.442300e-05
1000000 499302 1.313890e-03 498852 1.307853e-03
10000000 4999504 1.020465e-02 4998569 1.076408e-02
100000000 50005009 9.804719e-02 49992123 9.778105e-02
There's hardly any difference.
You're right. I just ran a similar test on my machine:
1.6GHz dual-core Core i5, 4GB DDR3 mem
I underestimated the amount of data you would need to see a significant difference. You only see a real difference when the number of items is 10M+
If the comparison is really quick, then I can remove the parallel part, and reduce overhead. What do you say?
Yeah, I think you can remove the parallel part here.
outMapIdx = 0;
}
size_t outMapIdx = (outMap % outSize) * inSize, batchCount = outMap/outSize;
arma::Mat<eT> &curSlice = outputTemp.slice(outMap);
Just a tiny issue. According to the style guide, the reference should be written attached to the data type.
arma::Mat<eT> &curSlice = outputTemp.slice(outMap);
arma::Mat<eT>& curSlice = outputTemp.slice(outMap);
Sorry about the style issues, I'll put in all these changes in the next commit
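(For readers following the diff above: the substantive change is that outMapIdx and batchCount are now derived from outMap on each iteration instead of being carried across iterations, which is what makes the loop safe to parallelize. A hedged sketch of the pattern, with a placeholder body, follows.)

```cpp
#include <cstddef>

// Sketch of why the index computation matters for parallelisation.  The old
// sequential bookkeeping
//
//   if (outMap != 0 && outMap % outSize == 0) { batchCount++; outMapIdx = 0; }
//
// carries state from one iteration to the next, so the loop cannot simply be
// marked "#pragma omp parallel for".  Deriving both values from outMap makes
// every iteration independent.
void ProcessOutputMapsSketch(const std::size_t outSize,
                             const std::size_t inSize,
                             const std::size_t batchSize)
{
  #pragma omp parallel for
  for (long long outMap = 0;
       outMap < (long long) (outSize * batchSize); ++outMap)
  {
    const std::size_t outMapIdx = (outMap % outSize) * inSize;
    const std::size_t batchCount = outMap / outSize;

    // ... convolve the input maps starting at outMapIdx for batch element
    // batchCount, writing only to slice outMap of the output ...
    (void) outMapIdx;
    (void) batchCount;
  }
}
```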
inputHeight, inSize * batchSize, false, false);
arma::cube inputTemp;
if (padWLeft != 0 || padWRight != 0 || padHTop != 0 || padHBottom != 0) {
inputTemp = inputPaddedTemp;
Probably I missed something. Where was inputPaddedTemp defined?
Oh, it's a property of the Convolution class
I have just shifted the checking for padding outside the for loop.
{
batchCount++;
outMapIdx = 0;
arma::Mat<eT> &curGradTemp = gradientTemp.slice(outMapIdx+inMap);
Just a tiny style issue. There are the same issues below.
arma::Mat<eT> &curGradTemp = gradientTemp.slice(outMapIdx+inMap);
arma::Mat<eT>& curGradTemp = gradientTemp.slice(outMapIdx+inMap);
arma::Mat<eT> output;
GradientConvolutionRule::Convolution(inputSlice, deltaSlice,
output, strideWidth, strideHeight);
output, strideWidth, strideHeight);
Actually, according to the style guide the indentation was correct:)
arma::Mat<eT> rotatedFilter;
Rotate180(weight.slice(outMapIdx+inMap), rotatedFilter);
#pragma omp for
for (omp_size_t batchCount = 0; batchCount < batchSize; batchCount++) {
Sorry, I didn't look in detail yet. Why did you change the nesting of the loops? I was wondering if it provides any cache optimizations.
Yes, we avoid extra computations. Earlier, we were retrieving and rotating the same weight slice for every batch, but now we only do it once for all batches.
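(A sketch of the restructuring being described: each weight slice is rotated once and then reused for every batch element inside a parallel loop, instead of being re-rotated per batch. Rotate180Sketch, the slice indexing, and the conv2 call are stand-ins for the real mlpack helpers, not the exact code.)

```cpp
#include <armadillo>

// Stand-in for the layer's Rotate180 helper: rotate a matrix by 180 degrees.
template<typename eT>
void Rotate180Sketch(const arma::Mat<eT>& input, arma::Mat<eT>& output)
{
  output = arma::fliplr(arma::flipud(input));
}

// Sketch of the restructured Backward() loops.
template<typename eT>
void BackwardSketch(const arma::Cube<eT>& weight,
                    const arma::Cube<eT>& delta,
                    arma::Cube<eT>& gTemp,
                    const size_t outSize, const size_t inSize,
                    const size_t batchSize)
{
  for (size_t outMap = 0; outMap < outSize; ++outMap)
  {
    for (size_t inMap = 0; inMap < inSize; ++inMap)
    {
      // Rotate the filter once and reuse it for every batch element.
      arma::Mat<eT> rotatedFilter;
      Rotate180Sketch(weight.slice(outMap * inSize + inMap), rotatedFilter);

      // Each batch element accumulates into its own slice, so the parallel
      // iterations never write to the same memory.
      #pragma omp parallel for
      for (long long batchCount = 0; batchCount < (long long) batchSize;
           ++batchCount)
      {
        gTemp.slice(inMap + batchCount * inSize) +=
            arma::conv2(delta.slice(outMap + batchCount * outSize),
                        rotatedFilter, "full");
      }
    }
  }
}
```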
#pragma omp parallel for
for (omp_size_t i = 0; i < input.n_elem; i++)
Again, I think this loop contains a pretty simple operation. I guess in this case the performance depends on the memory clock speed rather than on the CPU.
Again, I think it would depend on the size of the input, right?
- Removed parallel loop from bernoulli_distribution & leaky_relu
- Fixed data race in convolution_impl
- Conform to style guide
- Parallelized naive convolution
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍 |
Hi, is there any progress on this? |
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍 |
Hi Guys, was this ever merged? |
Hey @adiwajshing, sorry, it looks like this one kind of fell off the list a little bit. Are you still interested in it? We can reopen the PR and I can try and review it and get it merged. @FabioMBB said he had some success with it. |
@rcurtin sure -- can review |
I tried the PR locally and saw some really nice speedup (3x). I merged master into this branch---hopefully I didn't mess anything up during the merge process. 👍 I think that we should try to figure out what's wrong here and then incorporate it, because based on every benchmark we've seen here it's a nice speedup. |
@rcurtin good to hear. I'll see where the issue is over the next few days and try and resolve it. Can you give me a specific test I can try out and what you expect out of it? |
@adiwajshing yeah, so what I did was build and run the tests on both branches. Here's master:
and here's this branch:
So you can see that something's different, and my assumption at this point is that there's some small difference in the convolution code somewhere. However, it doesn't seem like there is a failing test case. If you don't get a chance, I'll try and dig in, but it may be a handful of days before I have the chance. 👍 |
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍 |
Hello all
I was playing around with CNNs in mlpack on the MNIST data set, and for some reason only 1 core of my CPU was being used while training. When I looked into the code, the Convolutional and Pooling layers were not parallelized, so I did exactly that using OpenMP. I also made a few incremental improvements to the NaiveConvolution class & the Convolution class. I have also successfully run the test suite on my changes.
With the changes, on my dual-core MacBook Air, I could train in about 55% of the time it was taking earlier. Here are some results:
Without the parallelization:
Reading data ...
Training ...
Epoch 0: Training Accuracy = 9.44709%, Validation Accuracy = 9.40476%, time taken: 80s
Epoch 1: Training Accuracy = 15.0106%, Validation Accuracy = 14.9286%, time taken: 85s
Epoch 2: Training Accuracy = 19.3148%, Validation Accuracy = 19.119%, time taken: 107s
Epoch 3: Training Accuracy = 24.0661%, Validation Accuracy = 23.5714%, time taken: 94s
Epoch 4: Training Accuracy = 28.6349%, Validation Accuracy = 28.4048%, time taken: 88s
Predicting ...
total time taken: 454; avg. epoch duration: 90.8s
With the parallelization:
Reading data ...
Training ...
Epoch 0: Training Accuracy = 9.7672%, Validation Accuracy = 9.83333%, time taken: 49s
Epoch 1: Training Accuracy = 14.4127%, Validation Accuracy = 14.619%, time taken: 47s
Epoch 2: Training Accuracy = 19.6376%, Validation Accuracy = 19.0952%, time taken: 47s
Epoch 3: Training Accuracy = 25.0317%, Validation Accuracy = 24.5%, time taken: 47s
Epoch 4: Training Accuracy = 29.8677%, Validation Accuracy = 29.9286%, time taken: 47s
Predicting ...
total time taken: 237; avg. epoch duration: 47.4s
Without the parallelization:
Training ...
Epoch 0: Training Accuracy = 82.8704%, Validation Accuracy = 82.1429%, time taken: 361s
(didn't bother to run more because it was taking so long :/)
With the parallelization:
Reading data ...
Training ...
Epoch 0: Training Accuracy = 82.4815%, Validation Accuracy = 81.9286%, time taken: 225s
Epoch 1: Training Accuracy = 88.8439%, Validation Accuracy = 88.5952%, time taken: 229s
Epoch 2: Training Accuracy = 91.6323%, Validation Accuracy = 91.7857%, time taken: 220s
Epoch 3: Training Accuracy = 93.2116%, Validation Accuracy = 93.1905%, time taken: 221s
Epoch 4: Training Accuracy = 94.3386%, Validation Accuracy = 93.8333%, time taken: 220s
Predicting ...
total time taken: 1115; avg. epoch duration: 223s
If this looks good, I could clean up my changes and extend this to the AtrousConvolution and other classes that need changes.