Big Data Analytics with R and Hadoop
Set up an integrated infrastructure of R and Hadoop to turn your data analytics into Big Data analytics
Vignesh Prajapati
BIRMINGHAM - MUMBAI
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2013
Production Reference: 1181113
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78216-328-2
www.packtpub.com
Cover Image by Duraid Fatouhi (duraidfatouhi@yahoo.com)
Credits
Author
Vignesh Prajapati
Reviewers
Krishnanand Khambadkone, Muthusamy Manigandan, Vidyasagar N V, Siddharth Tiwari
Acquisition Editor
James Jones
Lead Technical Editor
Mandar Ghate
Technical Editors
Shashank Desai, Jinesh Kampani, Chandni Maishery
Project Coordinator
Wendell Palmar
Copy Editors
Roshni Banerjee, Mradula Hegde, Insiya Morbiwala, Aditya Nair, Kirti Pai, Shambhavi Pai, Laxmi Subramanian
Proofreaders
Maria Gould, Lesley Harrison, Elinor Perry-Smith
Indexer
Mariammal Chettiyar
Graphics
Ronak Dhruv, Abhinash Sahu
Production Coordinator
Pooja Chiplunkar
Cover Work
Pooja Chiplunkar
About the Author
Vignesh Prajapati, from India, is a Big Data enthusiast, a Pingax (www.pingax.com) consultant, and a software professional at Enjay. He is an experienced ML data engineer who works with machine learning and Big Data technologies such as R, Hadoop, Mahout, Pig, Hive, and related Hadoop components to analyze datasets and derive informative insights through data analytics cycles.
He earned his B.E. from Gujarat Technological University in 2012 and started his career as a Data Engineer at Tatvic. His professional experience includes developing various data analytics algorithms for Google Analytics data sources to add economic value to products. To put ML into action, he implemented several analytical apps in collaboration with the Google Analytics and Google Prediction API services. He also contributes to the R community by developing the RGoogleAnalytics R library as an open source code Google project, and he writes articles on data-driven technologies.
Vignesh is not limited to a single domain; he has also worked on developing interactive apps using various Google APIs, such as the Google Analytics API, Realtime API, Google Prediction API, Google Chart API, and Translate API, on the Java and PHP platforms. He is highly interested in the development of open source technologies.
Vignesh has also reviewed the Apache Mahout Cookbook for Packt Publishing. This book provides a fresh, scope-oriented approach to the Mahout world for beginners as well as advanced users. The Mahout Cookbook is specially designed to make users aware of the different possible machine learning applications, strategies, and algorithms for building intelligent Big Data applications.
Acknowledgment
First and foremost, I would like to thank my loving parents and younger brother Vaibhav for standing beside me throughout my career as well as while writing this book. Without their support it would have been totally impossible to achieve this knowledge sharing. As I started writing this book, I was continuously motivated by my father (Prahlad Prajapati) and regularly followed up by my mother (Dharmistha Prajapati). Also, thanks to my friends for encouraging me to initiate writing for big technologies such as Hadoop and R.
During this writing period I went through some critical phases of my life, which were challenging for me at all times. I am grateful to Ravi Pathak, CEO and founder at Tatvic, who introduced me to this vast field of Machine learning and Big Data and helped me realize my potential. And yes, I can't forget James, Wendell, and Mandar from Packt Publishing for their valuable support, motivation, and guidance to achieve these heights. Special thanks to them for bridging the communication gap on the technical and graphical sections of this book.
Thanks to Big Data and Machine learning. Finally a big thanks to God, you have given me the power to believe in myself and pursue my dreams. I could never have done this without the faith I have in you, the Almighty.
Let us go forward together into the future of Big Data analytics.
About the Reviewers
Krishnanand Khambadkone has over 20 years of overall experience. He is currently working as a senior solutions architect in the Big Data and Hadoop Practice of TCS America and is architecting and implementing Hadoop solutions for Fortune 500 clients, mainly large banking organizations. Prior to this he worked on delivering middleware and SOA solutions using the Oracle middleware stack and built and delivered software using the J2EE product stack.
He is an avid evangelist and enthusiast of Big Data and Hadoop. He has written several articles and white papers on this subject, and has also presented these at conferences.
Muthusamy Manigandan is the Head of Engineering and Architecture with Ozone Media. Mani has more than 15 years of experience in designing large-scale software systems in the areas of virtualization, Distributed Version Control systems, ERP, supply chain management, Machine Learning and Recommendation Engine, behavior-based retargeting, and behavior targeting creative. Prior to joining Ozone Media, Mani handled various responsibilities at VMware, Oracle, AOL, and Manhattan Associates. At Ozone Media he is responsible for products, technology, and research initiatives. Mani can be reached at mmaniga@yahoo.co.uk and http://in.linkedin.com/in/mmanigandan/.
Vidyasagar N V has had an interest in computer science since an early age. Some of his serious work in computers and computer networks began during his high school days. Later he went to the prestigious Institute of Technology, Banaras Hindu University for his B.Tech. He is working as a software developer and data expert, developing and building scalable systems. He has worked with a variety of second, third, and fourth generation languages. He has also worked with flat files, indexed files, hierarchical databases, network databases, and relational databases, as well as NoSQL databases, Hadoop, and related technologies. Currently, he is working as a senior developer at Collective Inc., developing Big-Data-based structured data extraction techniques using the web and local information. He enjoys developing high-quality software and web-based solutions, and designing secure and scalable data systems.
I would like to thank my parents, Mr. N Srinivasa Rao and Mrs. Latha Rao, and my family who supported and backed me throughout my life, and friends for being friends. I would also like to thank all those people who willingly donate their time, effort, and expertise by participating in open source software projects. Thanks to Packt Publishing for selecting me as one of the technical reviewers on this wonderful book. It is my honor to be a part of this book. You can contact me at vidyasagar1729@gmail.com.
Siddharth Tiwari has been in the industry for the past three years, working on Machine learning, Text Analytics, Big Data Management, and information search and management. Currently he is employed by EMC Corporation's Big Data management and analytics initiative and product engineering wing for their Hadoop distribution.
He is a part of the TeraSort and MinuteSort world records, achieved while working with a large financial services firm.
He earned his Bachelor of Technology from Uttar Pradesh Technical University with an equivalent CGPA of 8.
www.PacktPub.com
Support files, eBooks, discount offers and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy.
Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.