Introduction to TensorFlow OpenCL
TensorFlow is an open-source framework for building and executing machine learning models. We’re working on adding support for OpenCL™ devices to the TensorFlow framework using SYCL™, giving developers access to a wider range of processors. SYCL is a royalty-free, cross-platform C++ abstraction layer, while OpenCL (Open Computing Language) is a framework for writing applications that execute across heterogeneous platforms. OpenCL is an open standard for parallel computing that supports both task and data parallelism.
Overview of TensorFlow OpenCL
In TensorFlow Lite, OpenCL provides roughly a 2x inference speedup over OpenGL ES acceleration. TensorFlow Lite falls back to OpenGL ES when OpenCL isn’t available, although most mobile GPU vendors ship OpenCL drivers even when they aren’t exposed directly to Android app development. Higher-level frameworks and compilers are increasingly targeting OpenCL for acceleration. Because OpenCL lets the CPU and GPU run the same programs and share a workload, programmers can take advantage of both by dividing work across the devices. Since the relative speeds of the devices vary from task to task, choosing how to partition the work is a hard problem, and machine learning has been proposed as a way to solve it.
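As a rough sketch of that idea (illustrative Python using the TF 1.x API that the examples later in this article use; not code from the original), work can be pinned to different devices with tf.device, and the split between the two partitions becomes the tuning knob:
import tensorflow as tf  # TF 1.x API, as used throughout this article

# Hypothetical partition: one half of the work on the CPU, the other on the GPU.
with tf.device("/cpu:0"):
    a = tf.random_uniform((500, 500))
    cpu_part = tf.reduce_sum(tf.matmul(a, a))

with tf.device("/gpu:0"):
    b = tf.random_uniform((500, 500))
    gpu_part = tf.reduce_sum(tf.matmul(b, b))

total = cpu_part + gpu_part  # partial results are combined at the end

# allow_soft_placement falls back to the CPU if no GPU device is available
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(total))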
CUDA vs OpenCL
Comparison | CUDA | OpenCL
Developed by | NVIDIA Corporation | Khronos Group
Definition | Compute Unified Device Architecture (CUDA) is a parallel computing platform for applications that demand heavy parallel processing. | OpenCL is an open standard that can be used on a wide range of hardware, including desktop and laptop GPUs.
Multiple OS support | Runs on Windows, Linux, and macOS (e.g., Windows XP and later), but requires NVIDIA hardware. | Runs on practically any operating system and a wide range of hardware (e.g., Android, FreeBSD, Windows, Linux, macOS).
GPU support | 2 GPUs | Utilizes 1 GPU
Language support | C, C++, Fortran | C, C++
Templates | CUDA is a C++ API and supports templates in device code. | OpenCL kernels are written in C99-based OpenCL C; C++ bindings are available for the host API.
Kernel compilation | Kernels are compiled at build time. | Kernels are typically compiled at run time.
Libraries | Has a large number of high-performance libraries. | Has many libraries that run on any OpenCL-compliant hardware, but the ecosystem is not as comprehensive as CUDA’s.
Performance | No inherent advantage; performance depends on code quality, hardware type, and other factors. | No inherent advantage; performance depends on code quality, hardware type, and other factors.
TensorFlow OpenCL examples
TensorFlow-OpenCL has no known vulnerabilities, and neither do its dependent libraries. TensorFlow-OpenCL is released under the Apache-2.0 License, a permissive license. Permissive licenses carry the fewest restrictions and can be used in almost any project.
Blender’s most recent versions support OpenCL rendering. Using the container provided in the Sylabs library, you can run Blender as a graphical program that uses a local Radeon GPU for OpenCL compute:
$ singularity exec --rocm --bind /etc/OpenCL library://sylabs/demo/blend blender
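Here --rocm makes the host’s Radeon (ROCm) GPU stack available inside the container, and --bind /etc/OpenCL mounts the host’s OpenCL ICD configuration so the containerized Blender can discover the device.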
Set Up and Run TensorFlow OpenCL
To add OpenCL support to TensorFlow, we need ComputeCpp to create an OpenCL-enabled build of TensorFlow. Thanks to Codeplay, TensorFlow can now be built with OpenCL support implemented via SYCL. TensorFlow is based on the Eigen linear algebra C++ library.
OpenCL installation
sudo apt update
sudo apt install clinfo
clinfo
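If the OpenCL runtime is installed correctly, clinfo lists the available OpenCL platforms and devices; an empty listing usually means no vendor driver (ICD) is installed.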
Install Packages
sudo apt update
sudo apt install git cmake gcc build-essential libpython3-all-dev ocl-icd-opencl-dev opencl-headers openjdk-8-jdk python3 python3-dev python3-pip zlib1g-dev
pip install -U --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6
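These pinned Python packages are dependencies of TensorFlow’s Bazel build; keeping the listed versions helps avoid build-time incompatibilities with this source tree.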
Configure Set-up
git clone https://github.com/codeplaysoftware/tensorflow
cd tensorflow
Environment variables Set-up
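These exports pre-answer the questions that TensorFlow’s interactive ./configure script would otherwise ask: every optional back-end (CUDA, MKL, cloud filesystems, and so on) is switched off, and only the OpenCL/SYCL back-end via ComputeCpp is enabled: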
export CC_OPT_FLAGS="-march=native"
export PYTHON_BIN_PATH="/usr/bin/python"
export USE_DEFAULT_PYTHON_LIB_PATH=1
export TF_NEED_JEMALLOC=1
export TF_NEED_MKL=0
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_ENABLE_XLA=0
export TF_NEED_CUDA=0
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_NEED_AWS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0
export TF_DOWNLOAD_CLANG=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_NEED_OPENCL_SYCL=1
export TF_NEED_COMPUTECPP=1
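With all of the variables exported, run the configure script from the repository root; it picks these values up instead of prompting interactively (it may still ask for paths it cannot detect, such as the ComputeCpp installation directory, so treat this as a sketch of the usual flow):
./configure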
It’s a good idea to run the tests to ensure TensorFlow was built successfully. The following command runs a large suite of roughly 1500 tests:
bazel test --test_lang_filters=cc,py --test_timeout 1500 --verbose_failures --jobs=1 --config=sycl --config=opt -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/lite/... -//tensorflow/contrib/session_bundle/... -//tensorflow/contrib/slim/... -//tensorflow/contrib/verbs/... -//tensorflow/core/distributed_runtime/... -//tensorflow/core/kernels/hexagon/... -//tensorflow/go/... -//tensorflow/java/... -//tensorflow/python/debug/... -//tensorflow/stream_executor/...
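In this command, --config=sycl selects the SYCL/OpenCL build, --jobs=1 runs one test at a time, and the trailing -//tensorflow/... entries exclude subtrees (the compiler, contrib/lite, the Go and Java bindings, and so on) from the run.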
Build TensorFlow
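Assuming the repository has been cloned and configured as above, a typical Bazel build and pip-package step for the SYCL configuration looks like the following (based on Codeplay’s published flow; exact targets and flags can vary between releases):
bazel build --config=sycl --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install --user /tmp/tensorflow_pkg/tensorflow-*.whl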
Set-Up operations
To pin operations to a device, nest a tf.device scope inside a session:
with tf.Session() as se1:
    with tf.device("/gpu:0"):
        # operations created here are placed on the first GPU
The tf.device("/gpu:0") line creates a context manager that instructs TensorFlow to execute the operations built inside it on the GPU.
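Putting the two context managers together, a minimal runnable sketch (illustrative, not from the original article) multiplies two small matrices on the GPU:
import tensorflow as tf  # TF 1.x API

with tf.Session() as se1:
    with tf.device("/gpu:0"):
        m1 = tf.constant([[3., 3.]])
        m2 = tf.constant([[2.], [2.]])
        product = tf.matmul(m1, m2)
    print(se1.run(product))  # [[12.]]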
TensorFlow program
Program #1
>>> import tensorflow as tf
>>> he1 = tf.constant('Hi, TensorFlow world!')
>>> se1 = tf.Session()
>>> se1.run(he1)
b'Hi, TensorFlow world!'
>>> x = tf.constant(12)
>>> y = tf.constant(22)
>>> se1.run(x + y)
34
>>> se1.close()
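Note: under Python 3, the session returns the string constant as a bytes object, hence the b'...' prefix in the output.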
Program #2
import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

# Read the device name and matrix size from the command line
d_name = sys.argv[1]
shape = (int(sys.argv[2]), int(sys.argv[2]))
if d_name == "gpu":
    d_name = "/gpu:0"
else:
    d_name = "/cpu:0"

# Build the graph on the chosen device: a random matrix times its
# transpose, reduced to a single scalar
with tf.device(d_name):
    ran_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    d_operation = tf.matmul(ran_matrix, tf.transpose(ran_matrix))
    sum_op = tf.reduce_sum(d_operation)

start = datetime.now()
# log_device_placement=True prints which device each op runs on
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    res = session.run(sum_op)
    print(res)

print("\n" * 6)
print("Shape:", shape, "Device:", d_name)
print("Time taken:", datetime.now() - start)
print("\n" * 6)
Explanation
To execute the script on the GPU with a 1500×1500 matrix:
python name1.py gpu 1500
Output: the script prints the reduced sum, followed by the shape, the device used, and the elapsed time.
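For comparison, the same computation can be timed on the CPU by passing cpu as the first argument:
python name1.py cpu 1500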
OpenCL Acceleration for TensorFlow
OpenCL allows a wide range of accelerators to be used, including multi-core CPUs, GPUs, DSPs, FPGAs, and specialized hardware such as inferencing engines. An OpenCL system is divided into host and device components: the host software is written in a general-purpose language such as C or C++ and compiled to run on the host CPU with an ordinary compiler. Porting TensorFlow directly to OpenCL would require writing the kernels in OpenCL C and maintaining two distinct codebases, both of which would be difficult to maintain. With SYCL, everything is single-source C++, so the SYCL back-end can be integrated into TensorFlow in a non-intrusive way.
Here is sample registration code showing how the Sqrt kernel is registered for the CPU, CUDA, and SYCL devices:
namespace tensorflow {
REG5(UnaryOp, CPU, "Sqrt", functor::sqrt, float, Eigen::half, double,
complex64, complex128);
#if GOOGLE_CUDA
REG3(UnaryOp, GPU, "Sqrt", functor::sqrt, float, Eigen::half, double);
#endif
#ifdef TENSORFLOW_USE_SYCL
REG2(UnaryOp, SYCL, "Sqrt", functor::sqrt, float, double);
#endif
}
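Each REGn macro registers the Sqrt functor for n element types on the named device. The SYCL registrations cover float and double and are guarded by TENSORFLOW_USE_SYCL, mirroring how the CUDA registrations are guarded by GOOGLE_CUDA.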
Conclusion
In general, OpenCL is successful. As a standard, it contains all of the necessary parts, namely run-time code generation and adequate support for heterogeneous computing. In this article, we have seen how TensorFlow runs on top of OpenCL.