Building TensorFlow

The instructions provided below specify the steps to build TensorFlow version 2.2.0 on Linux on IBM Z for the following distributions:

Ubuntu (18.04, 20.04)

General Notes:

When following the steps below, please use a standard-permission user unless otherwise specified.
A directory /<source_root>/ is referred to throughout these instructions. It is a temporary writable directory that you can place anywhere you like.

Step 1: Build and Install TensorFlow v2.2.0

1.1) Build using script

If you want to build TensorFlow using manual steps, go to STEP 1.2.

Use the following commands to build TensorFlow using the build script. Please make sure you have wget installed.

wget -q https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/Tensorflow/2.2.0/build_tensorflow.sh

# Build TensorFlow (add the -t option to run the tests as part of the build)
bash build_tensorflow.sh

If the build completes successfully, go to STEP 2. In case of error, check logs for more details or go to STEP 1.2 to follow manual build steps.
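To make it easier to check what went wrong after a failed run, the script call can be wrapped so that its console output is kept in a file. This is only a sketch: `run_build` and the file name `build_tensorflow.log` are hypothetical, and the file is separate from any logs the build script writes itself.

```shell
# Run build_tensorflow.sh from the current directory, forwarding any options
# (for example -t), and keep its console output in a log file.
run_build() {
    log_file="build_tensorflow.log"   # hypothetical name, chosen for this sketch
    if bash build_tensorflow.sh "$@" >"$log_file" 2>&1; then
        echo "Build finished; output saved in $log_file"
    else
        rc=$?
        echo "Build failed with exit code $rc; last lines of $log_file:" >&2
        tail -n 20 "$log_file" >&2
        return "$rc"
    fi
}

# Example call (not executed here): run_build -t
```

Because the output is redirected, nothing is printed while the build runs; `tail -f build_tensorflow.log` in a second terminal shows progress.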

1.2) Install the dependencies

export SOURCE_ROOT=/<source_root>/
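As an illustration of replacing the placeholder above, the following sketch picks a concrete directory and confirms that the current user can write to it. The location `$HOME/tensorflow_build` is an assumption made for this example; any writable directory works.

```shell
# Hypothetical location; keep an existing SOURCE_ROOT if one is already set.
SOURCE_ROOT="${SOURCE_ROOT:-$HOME/tensorflow_build}"
mkdir -p "$SOURCE_ROOT"

# Report early if the directory cannot be written by the current user.
if [ -w "$SOURCE_ROOT" ]; then
    export SOURCE_ROOT
    echo "SOURCE_ROOT is $SOURCE_ROOT"
else
    echo "Cannot write to $SOURCE_ROOT" >&2
fi
```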

Ubuntu (18.04, 20.04)

 sudo apt-get update
 sudo apt-get install sudo wget libhdf5-dev python3-dev python3-pip pkg-config unzip openjdk-11-jdk zip libssl-dev git python3-numpy libblas-dev  liblapack-dev python3-scipy gfortran swig cython3 -y
 sudo ldconfig
 sudo pip3 install --no-cache-dir numpy==1.16.2 future wheel backports.weakref portpicker futures enum34 keras_preprocessing keras_applications h5py tensorflow_estimator setuptools pybind11

Ensure /usr/bin/python points to Python3 to build TensorFlow in a Python3 environment

  sudo ln -sf /usr/bin/python3 /usr/bin/python
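A quick way to confirm that the symlink took effect is to ask the interpreter for its major version. This is a minimal sketch; `python_major` is a hypothetical helper name.

```shell
# Print the major version reported by the interpreter passed as $1.
python_major() {
    "$1" -c 'import sys; print(sys.version_info[0])'
}

if [ "$(python_major /usr/bin/python 2>/dev/null)" = "3" ]; then
    echo "/usr/bin/python is Python 3"
else
    echo "/usr/bin/python is missing or is not Python 3" >&2
fi
```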

Install grpcio

 export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True
 sudo -E pip3 install grpcio

Build Bazel v2.0.0 -- Instructions for building Bazel can be found here.

1.3) Build TensorFlow

Download source code

cd $SOURCE_ROOT
git clone https://github.com/linux-on-ibm-z/tensorflow.git
cd tensorflow
git checkout v2.2.0-s390x

Configure

./configure
You have bazel 2.0.0- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]:


Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.6/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: N
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: N
Clang will not be downloaded.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=ngraph         # Build with Intel nGraph support.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished

Build TensorFlow
```
bazel --host_jvm_args="-Xms1024m" --host_jvm_args="-Xmx2048m" build //tensorflow/tools/pip_package:build_pip_package
```
Note: Building TensorFlow is a resource-intensive operation. If the build keeps failing, try increasing the swap space and reducing the number of concurrent jobs by adding --jobs=n to the build command above, where n is the number of concurrent jobs.
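One way to choose a starting value for `--jobs` is to derive it from the memory and CPU count of the machine. The sketch below uses a hypothetical rule of thumb (one job per 2 GB of RAM, capped at the number of CPUs). That ratio is an assumption for illustration, not a documented TensorFlow or Bazel recommendation. The command is printed rather than executed.

```shell
# Hypothetical heuristic: one Bazel job per 2 GB of RAM,
# never fewer than 1 and never more than the number of CPUs.
# $1: memory in GB, $2: number of CPUs.
suggest_jobs() {
    n=$(( ${1:-0} / 2 ))
    max=${2:-1}
    if [ "$n" -lt 1 ]; then n=1; fi
    if [ "$n" -gt "$max" ]; then n=$max; fi
    echo "$n"
}

# Read total memory (in GB, rounded down) and the CPU count from the system.
mem_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
cpus=$(nproc)

# Print the build command with the suggested value instead of running it.
echo "bazel --host_jvm_args=\"-Xms1024m\" --host_jvm_args=\"-Xmx2048m\" build --jobs=$(suggest_jobs "$mem_gb" "$cpus") //tensorflow/tools/pip_package:build_pip_package"
```

If the build still runs out of memory with the suggested value, lower it further by hand.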

1.4) Build and Install TensorFlow wheel

cd $SOURCE_ROOT/tensorflow
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_wheel
sudo pip3 install /tmp/tensorflow_wheel/tensorflow-2.2.0-cp*-linux_s390x.whl

Step 2: Verify TensorFlow (Optional)

Run TensorFlow from the command line

 $ cd $SOURCE_ROOT
 $ /usr/bin/python3
  >>> import tensorflow as tf
  >>> tf.add(1, 2).numpy()
  3
  >>> hello = tf.constant('Hello, TensorFlow!')
  >>> hello.numpy()
  b'Hello, TensorFlow!'
  >>>
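The same check can also be run non-interactively, which is convenient in scripts. This sketch reuses the `tf.add` call from the session above; `check_tensorflow` is a hypothetical helper name.

```shell
# Import TensorFlow with the given interpreter (default: python3) and run a
# trivial computation. The function returns non-zero if anything fails.
check_tensorflow() {
    "${1:-python3}" -c '
import tensorflow as tf
assert tf.add(1, 2).numpy() == 3
print("TensorFlow", tf.__version__, "is working")
'
}

# Example call (not executed here): check_tensorflow /usr/bin/python3
```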

Step 3: Execute Test Suite (Optional)

Run the complete test suite

JTOOLS=$SOURCE_ROOT/remote_java_tools_linux
cd $SOURCE_ROOT/tensorflow
bazel --host_jvm_args="-Xms1024m" --host_jvm_args="-Xmx2048m" test --define=tensorflow_mkldnn_contraction_kernel=0 --host_javabase="@local_jdk//:jdk" --override_repository=remote_java_tools_linux=$JTOOLS --test_tag_filters=-gpu,-benchmark-test,-v1only -k --test_timeout 300,450,1200,3600 --build_tests_only --test_output=errors -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/lite/... -//tensorflow/core/platform/cloud/...

Note: If tests fail with the error '_NamespacePath' object has no attribute 'sort', upgrade setuptools with the command below and rerun the tests:

sudo pip3 install --upgrade setuptools

Note: //tensorflow/lite and //tensorflow/core/platform/cloud are skipped because of BoringSSL; refer to #14039 for details.

Run an individual test
```
JTOOLS=$SOURCE_ROOT/remote_java_tools_linux
bazel --host_jvm_args="-Xms1024m" --host_jvm_args="-Xmx2048m" test --define=tensorflow_mkldnn_contraction_kernel=0 --host_javabase="@local_jdk//:jdk" --override_repository=remote_java_tools_linux=$JTOOLS //tensorflow/<module_name>:<testcase_name>
```
For example,
```
JTOOLS=$SOURCE_ROOT/remote_java_tools_linux
bazel --host_jvm_args="-Xms1024m" --host_jvm_args="-Xmx2048m" test --define=tensorflow_mkldnn_contraction_kernel=0 --host_javabase="@local_jdk//:jdk" --override_repository=remote_java_tools_linux=$JTOOLS //tensorflow/python/kernel_tests:topk_op_test
```
Note:
1. The following tests fail on both s390x and x86:
//tensorflow/python/autograph/pyct:inspect_utils_test_par
//tensorflow/python/debug:dist_session_debug_grpc_test
//tensorflow/python/distribute:checkpointing_test_tpu
//tensorflow/python/distribute:ctl_correctness_test_tpu
//tensorflow/python/distribute:custom_training_loop_gradient_test_tpu
//tensorflow/python/distribute:custom_training_loop_input_test_tpu
//tensorflow/python/distribute:custom_training_loop_metrics_test_tpu
//tensorflow/python/distribute:custom_training_loop_models_test_tpu
//tensorflow/python/distribute:custom_training_loop_optimizer_test_tpu
//tensorflow/python/distribute:input_lib_test_tpu
//tensorflow/python/distribute:input_lib_type_spec_test_tpu
//tensorflow/python/distribute:keras_metrics_test_tpu
//tensorflow/python/distribute:keras_save_load_test_tpu
//tensorflow/python/distribute:metrics_v1_test_tpu
//tensorflow/python/distribute:minimize_loss_test_tpu
//tensorflow/python/distribute:moving_averages_test_tpu
//tensorflow/python/distribute:saved_model_mixed_api_test_tpu
//tensorflow/python/distribute:saved_model_save_load_test_tpu
//tensorflow/python/distribute:step_fn_test_tpu
//tensorflow/python/distribute:tpu_strategy_test
//tensorflow/python/distribute:values_test_tpu
//tensorflow/python/eager:def_function_test_cpu_only
//tensorflow/python/eager:remote_cloud_tpu_pod_test
//tensorflow/python/eager:remote_cloud_tpu_test
//tensorflow/python/keras/distribute:keras_dnn_correctness_test_tpu
//tensorflow/python/keras/distribute:keras_embedding_model_correctness_test_tpu
//tensorflow/python/keras/distribute:keras_image_model_correctness_test_tpu
//tensorflow/python/keras/distribute:keras_rnn_model_correctness_test_tpu
//tensorflow/python/keras/distribute:keras_stateful_lstm_model_correctness_test_tpu
//tensorflow/python/keras/distribute:keras_utils_test_tpu
//tensorflow/python/keras/distribute:multi_worker_fault_tolerance_test
//tensorflow/python/tpu:async_checkpoint_test
//tensorflow/python/tpu:datasets_test
//tensorflow/python/tpu:tpu_test
//tensorflow/python:framework_memory_checker_test
//tensorflow/python:session_clusterspec_prop_test
//tensorflow/python/debug:source_utils_test (For Ubuntu 20.04 only)
//tensorflow/python/keras/engine:network_test (Intermittently observed on Ubuntu 20.04 only)
//tensorflow/tools/api/tests:api_compatibility_test (For Ubuntu 20.04 only)
//tensorflow/tools/docs:tf_doctest (Intermittently observed on Ubuntu 20.04 only)

2. The Keras applications_load_weight_test test cases rely on h5py conversion routines, which have a known bug that prevents opaque types from working correctly. This causes the failures observed on s390x.
//tensorflow/python/keras/applications:applications_load_weight_test_inception_resnet_v2
//tensorflow/python/keras/applications:applications_load_weight_test_inception_v3
//tensorflow/python/keras/applications:applications_load_weight_test_mobilenet
//tensorflow/python/keras/applications:applications_load_weight_test_mobilenet_v2
//tensorflow/python/keras/applications:applications_load_weight_test_nasnet_large
//tensorflow/python/keras/applications:applications_load_weight_test_nasnet_mobile
//tensorflow/python/keras/applications:applications_load_weight_test_resnet
//tensorflow/python/keras/applications:applications_load_weight_test_resnet_v2

3. The test cases below expect ICU encoding data to be present in big-endian format. If needed, this data can be generated manually on s390x so that the test cases pass.
//tensorflow/python/kernel_tests:unicode_decode_op_test
//tensorflow/python/kernel_tests:unicode_transcode_op_test

4. The test cases below fail because the Abseil library code lacks clock frequency support for s390x.
//tensorflow/python:cluster_test
//tensorflow/python:cost_analyzer_test

5. Test case //tensorflow/python/kernel_tests/linalg:linear_operator_circulant_test may fail on Ubuntu 20.04 due to a tolerance threshold issue. It passes after applying the following patch:
```
diff --git a/tensorflow/python/kernel_tests/linalg/linear_operator_circulant_test.py b/tensorflow/python/kernel_tests/linalg/linear_operator_circulant_test.py
index 810c47ba1e..6f5d1090ed 100644
--- a/tensorflow/python/kernel_tests/linalg/linear_operator_circulant_test.py
+++ b/tensorflow/python/kernel_tests/linalg/linear_operator_circulant_test.py
@@ -585,7 +585,7 @@ class LinearOperatorCirculant2DTestNonHermitianSpectrum(
           [matrix_tensor, matrix_t, imag_matrix])
 
       np.testing.assert_allclose(0, imag_matrix, atol=1e-6)
-      self.assertAllClose(matrix, matrix_transpose, atol=0)
+      self.assertAllClose(matrix, matrix_transpose, atol=1e-6)
 
   def test_real_spectrum_gives_self_adjoint_operator(self):
     with self.cached_session():
```
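To keep the known failures from notes 1 to 4 out of a full test run, their labels can be passed to Bazel as negative target patterns, in the same way the test command above already excludes `-//tensorflow/compiler/...`. The sketch below converts a file of labels into such patterns. `exclude_args` and the temporary file are hypothetical, and only two labels are written to the file to keep the example short.

```shell
# Turn a file of Bazel labels (one per line) into negative target patterns.
# Only the first field of lines starting with // is used, so trailing notes
# such as "(For Ubuntu 20.04 only)" are dropped.
exclude_args() {
    awk '/^\/\// { printf "%s-%s", sep, $1; sep = " " } END { print "" }' "$1"
}

# Hypothetical input file holding two labels taken from the notes above.
failures_file=$(mktemp)
cat > "$failures_file" <<'EOF'
//tensorflow/python:cluster_test
//tensorflow/python/debug:source_utils_test (For Ubuntu 20.04 only)
EOF

# Print the patterns that could be appended after "-- //tensorflow/...".
exclude_args "$failures_file"
rm -f "$failures_file"
```

The printed patterns can be pasted at the end of the full test command, after the existing exclusions.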

References:

https://www.tensorflow.org/
https://github.com/tensorflow/tensorflow
http://bazel.io/

The information provided in this article is accurate at the time of writing, but ongoing development in the open-source projects involved may make the information incorrect or obsolete. Please open an issue or contact us on IBM Z Community if you have any questions or feedback.
