Web Crawling and Data
Mining with Apache Nutch
Perform web crawling and apply data mining in
your application
Dr. Zakir Laliwala
Abdulbasit Shaikh
BIRMINGHAM - MUMBAI
Web Crawling and Data Mining with Apache Nutch
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the authors, nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2013
Production Reference: 1171213
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-685-0
www.packtpub.com
Cover Image by Jarek Blaminsky (milak6@wp.pl)
Credits
Authors
Dr. Zakir Laliwala
Abdulbasit Shaikh

Reviewers
Mark Kerzner
Shriram Sridharan

Acquisition Editors
Neha Nagwekar
Vinay V. Argekar

Commissioning Editor
Deepika Singh

Technical Editors
Vrinda Nitesh Bhosale
Anita Nayak
Harshad Vairat

Copy Editors
Roshni Banerjee
Mradula Hegde
Sayanee Mukherjee
Deepa Nambiar

Project Coordinator
Ankita Goenka

Proofreaders
Ameesha Green
Bernadette Watkins

Indexer
Mariammal Chettiyar

Graphics
Disha Haria

Production Coordinator
Conidon Miranda

Cover Work
Conidon Miranda
About the Authors
Dr. Zakir Laliwala is an entrepreneur, an open source specialist, and a hands-on
CTO at Attune Infocom. Attune Infocom provides enterprise open source solutions
and services for SOA, BPM, ESB, Portal, cloud computing, and ECM. At Attune
Infocom, he is responsible for product development and the delivery of solutions
and services. He explores new enterprise open source technologies and defines
architecture, roadmaps, and best practices. He has provided consultations and
training to corporations around the world on various open source technologies such
as Mule ESB, Activiti BPM, JBoss jBPM and Drools, Liferay Portal, Alfresco ECM,
JBoss SOA, and cloud computing.
He received a Ph.D. in Information and Communication Technology from Dhirubhai
Ambani Institute of Information and Communication Technology (DA-IICT). He was
an adjunct faculty member at DA-IICT, and he taught Master's degree students
at CEPT.
He has published many research papers on web services, SOA, grid computing, and
the semantic web in IEEE, and has participated in ACM International Conferences.
He serves as a reviewer at various international conferences and journals. He has
also published book chapters and written books on open source technologies.
He was a co-author of the books Mule ESB Cookbook and Activiti5 Business Process
Management Beginner's Guide, Packt Publishing.