http://www.diva-portal.org

Postprint

This is the accepted version of a paper presented at International Conference on Exascale Applications and Software.

Citation for the original published paper:

Chien, S W., Sishtla, C P., Markidis, S., Zhang, J., Peng, I B. et al. (2018)

An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems

In: Proceedings of the 5th International Conference on Exascale Applications and Software (pp. 34-). The University of Edinburgh

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232985


EASC 2018

An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems

Steven Wei Der Chien, Chaitanya Prasad Sishtla, Stefano Markidis, Jun Zhang, Ivy Bo Peng and Erwin Laure

KTH Royal Institute of Technology, Sweden

Computationally intensive applications, such as pattern recognition and natural language processing, are increasingly popular on HPC systems. Many of these applications use deep learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of artificial neural networks. Recently, the NVIDIA Volta GPU [1] and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for the convenient expression of tensorial operations and deep-learning computational paradigms. One example of these new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015.

Figure 1. Performance of distributed matrix multiplication: Gflops/s versus the number of workers plus the number of data servers (2+1, 4+1, 8+1, 16+1) for matrix sizes 8192x8192, 32768x32768 and 65536x65536.

TensorFlow expresses algorithms as a computational graph, where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data, such as vectors and matrices, that flow between operations are called tensors. For this reason, a computational problem first needs to be expressed as a computational graph.
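
As a minimal illustration of this model, the following sketch, written against the TensorFlow 1.x API that was current when the paper appeared (the matrix size is arbitrary), declares a matrix product as a graph node and computes it only when a session evaluates that node:

```python
import numpy as np
import tensorflow as tf

# Graph construction: C = A x B is only declared here, not computed.
A = tf.placeholder(tf.float32, shape=(1024, 1024), name="A")
B = tf.placeholder(tf.float32, shape=(1024, 1024), name="B")
C = tf.matmul(A, B, name="C")

# Graph execution: the matmul kernel runs when the session evaluates C
# and the placeholders are fed with concrete data.
with tf.Session() as sess:
    a = np.random.rand(1024, 1024).astype(np.float32)
    b = np.random.rand(1024, 1024).astype(np.float32)
    c = sess.run(C, feed_dict={A: a, B: b})
```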

In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices, such as GPUs and CPUs, on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen and cuBLAS. Inter-node communication can take place over TCP or RDMA.
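
The sketch below illustrates this placement mechanism with the TensorFlow 1.x API; the host names, ports, job layout and matrix size are illustrative assumptions, not taken from the paper:

```python
import tensorflow as tf

# Illustrative cluster: one parameter/data server and two workers.
cluster = tf.train.ClusterSpec({
    "ps":     ["node0:2222"],
    "worker": ["node1:2222", "node2:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# The variable is pinned to the parameter server's CPU ...
with tf.device("/job:ps/task:0/cpu:0"):
    W = tf.get_variable("W", shape=(4096, 4096), dtype=tf.float32)

# ... while the compute-heavy operation is placed on a worker GPU.
with tf.device("/job:worker/task:0/gpu:0"):
    x = tf.random_normal((4096, 4096))
    y = tf.matmul(W, x)

with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y)
```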

This work evaluates the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices that cannot be co-located on a single device, as well as a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms as computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary results with distributed matrix multiplication show that distributed computation in TensorFlow is highly scalable. This study provides an initial investigation of new emerging programming models for HPC.
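
To give a flavour of how such an algorithm maps onto a computational graph, the following is a minimal single-device CG sketch in the TensorFlow 1.x API; it illustrates the programming model and is not the distributed prototype evaluated in this work (matrix size, tolerance and iteration limit are arbitrary):

```python
import numpy as np
import tensorflow as tf

n = 1024
# Symmetric positive-definite test matrix and right-hand side.
A_np = np.random.rand(n, n).astype(np.float32)
A_np = A_np @ A_np.T + n * np.eye(n, dtype=np.float32)
b_np = np.random.rand(n, 1).astype(np.float32)

A = tf.constant(A_np)
b = tf.constant(b_np)
x0 = tf.zeros_like(b)
r0 = b - tf.matmul(A, x0)
p0 = r0

def cond(i, x, r, p):
    # Iterate until the residual is small or an iteration limit is reached.
    return tf.logical_and(i < 1000, tf.norm(r) > 1e-4)

def body(i, x, r, p):
    # One CG iteration expressed entirely with graph operations.
    Ap = tf.matmul(A, p)
    rr = tf.reduce_sum(r * r)
    alpha = rr / tf.reduce_sum(p * Ap)
    x_new = x + alpha * p
    r_new = r - alpha * Ap
    beta = tf.reduce_sum(r_new * r_new) / rr
    p_new = r_new + beta * p
    return i + 1, x_new, r_new, p_new

i, x, r, p = tf.while_loop(cond, body, [tf.constant(0), x0, r0, p0])

with tf.Session() as sess:
    iters, residual = sess.run([i, tf.norm(r)])
    print("CG finished after", iters, "iterations, residual", residual)
```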

The work is funded by the European Commission through the SAGE project (Grant agreement no. 671500).

References

[1] S. Markidis, S. W. D. Chien, E. Laure, I. B. Peng, and J. S. Vetter. NVIDIA Tensor Core Programmability, Performance & Precision. arXiv preprint arXiv:1803.04014, 2018.
