http://www.diva-portal.org

Postprint

This is the accepted version of a paper presented at International Conference on Exascale Applications and Software.

Citation for the original published paper:

Chien, S W., Sishtla, C P., Markidis, S., Zhang, J., Peng, I B. et al. (2018)

An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems

In: Proceedings of the 5th International Conference on Exascale Applications and Software (pp. 34-). The University of Edinburgh

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232985


EASC 2018

An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems

Steven Wei Der Chien, Chaitanya Prasad Sishtla, Stefano Markidis, Jun Zhang, Ivy Bo Peng and Erwin Laure

KTH Royal Institute of Technology, Sweden

Computationally intensive applications, such as pattern recognition and natural language processing, are increasingly popular on HPC systems. Many of these applications use deep learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of artificial neural networks. Recently, the NVIDIA Volta GPU [1] and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for the convenient expression of tensorial operations and deep-learning computational paradigms. One example of these new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015.

Figure 1. Performance of distributed matrix multiplication: Gflops/s versus the number of workers plus the number of data servers (2+1, 4+1, 8+1, 16+1) for matrix sizes 8192x8192, 32768x32768 and 65536x65536.

TensorFlow expresses algorithms as a computational graph, where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data, such as vectors and matrices, that flow between operations are called tensors. For this reason, a computational problem first needs to be expressed as a computational graph.
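
As a minimal illustration of this model, the following sketch, written against the TensorFlow 1.x API that was current when the paper appeared (the matrix size is arbitrary), declares a matrix product as a graph node and computes it only when a session evaluates that node:

```python
import numpy as np
import tensorflow as tf

# Graph construction: C = A x B is only declared here, not computed.
A = tf.placeholder(tf.float32, shape=(1024, 1024), name="A")
B = tf.placeholder(tf.float32, shape=(1024, 1024), name="B")
C = tf.matmul(A, B, name="C")

# Graph execution: the matmul kernel runs when the session evaluates C
# and the placeholders are fed with concrete data.
with tf.Session() as sess:
    a = np.random.rand(1024, 1024).astype(np.float32)
    b = np.random.rand(1024, 1024).astype(np.float32)
    c = sess.run(C, feed_dict={A: a, B: b})
```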

In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices, such as GPUs and CPUs, on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen and cuBLAS. Inter-node communication can take place over TCP or RDMA.
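
The sketch below illustrates this placement mechanism with the TensorFlow 1.x API; the host names, ports, job layout and matrix size are illustrative assumptions, not taken from the paper:

```python
import tensorflow as tf

# Illustrative cluster: one parameter/data server and two workers.
cluster = tf.train.ClusterSpec({
    "ps":     ["node0:2222"],
    "worker": ["node1:2222", "node2:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# The variable is pinned to the parameter server's CPU ...
with tf.device("/job:ps/task:0/cpu:0"):
    W = tf.get_variable("W", shape=(4096, 4096), dtype=tf.float32)

# ... while the compute-heavy operation is placed on a worker GPU.
with tf.device("/job:worker/task:0/gpu:0"):
    x = tf.random_normal((4096, 4096))
    y = tf.matmul(W, x)

with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y)
```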

This work evaluates the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices that cannot be co-located on a single device, as well as a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms as computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary results with distributed matrix multiplication show that distributed computation in TensorFlow is highly scalable. This study provides an initial investigation of new emerging programming models for HPC.
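
To give a flavour of how such an algorithm maps onto a computational graph, the following is a minimal single-device CG sketch in the TensorFlow 1.x API; it illustrates the programming model and is not the distributed prototype evaluated in this work (matrix size, tolerance and iteration limit are arbitrary):

```python
import numpy as np
import tensorflow as tf

n = 1024
# Symmetric positive-definite test matrix and right-hand side.
A_np = np.random.rand(n, n).astype(np.float32)
A_np = A_np @ A_np.T + n * np.eye(n, dtype=np.float32)
b_np = np.random.rand(n, 1).astype(np.float32)

A = tf.constant(A_np)
b = tf.constant(b_np)
x0 = tf.zeros_like(b)
r0 = b - tf.matmul(A, x0)
p0 = r0

def cond(i, x, r, p):
    # Iterate until the residual is small or an iteration limit is reached.
    return tf.logical_and(i < 1000, tf.norm(r) > 1e-4)

def body(i, x, r, p):
    # One CG iteration expressed entirely with graph operations.
    Ap = tf.matmul(A, p)
    rr = tf.reduce_sum(r * r)
    alpha = rr / tf.reduce_sum(p * Ap)
    x_new = x + alpha * p
    r_new = r - alpha * Ap
    beta = tf.reduce_sum(r_new * r_new) / rr
    p_new = r_new + beta * p
    return i + 1, x_new, r_new, p_new

i, x, r, p = tf.while_loop(cond, body, [tf.constant(0), x0, r0, p0])

with tf.Session() as sess:
    iters, residual = sess.run([i, tf.norm(r)])
    print("CG finished after", iters, "iterations, residual", residual)
```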

The work is funded by the European Commission through the SAGE project (Grant agreement no. 671500).

References

[1] S. Markidis, S. W. D. Chien, E. Laure, I. B. Peng, and J. S. Vetter. NVIDIA Tensor Core Programmability, Performance & Precision. arXiv preprint arXiv:1803.04014, 2018.
