Hadoop with Python
Zachary Radtka & Donald Miner
Hadoop with Python
by Zachary Radtka and Donald Miner
Copyright © 2016 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://safaribooksonline.com). For
more information, contact our corporate/institutional sales department:
800-998-9938 or corporate@oreilly.com.
Editor: Meghan Blanchette
Production Editor: Kristen Brown
Copyeditor: Sonia Saruba
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
October 2015: First Edition
Revision History for the First Edition
2015-10-19: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491942277 for release details.
While the publisher and the authors have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the authors disclaim all responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of or reliance on this
work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject
to open source licenses or the intellectual property rights of others, it is your
responsibility to ensure that your use thereof complies with such licenses and/or
rights.
978-1-491-94227-7
[LSI]
Table of Contents
Source Code
1. Hadoop Distributed File System (HDFS)
    Overview of HDFS
    Interacting with HDFS
    Snakebite
    Chapter Summary
2. MapReduce with Python
    Data Flow
    Hadoop Streaming
    mrjob
    Chapter Summary
3. Pig and Python
    WordCount in Pig
    Running Pig
    Pig Latin
    Extending Pig with Python
    Chapter Summary
4. Spark with Python
    WordCount in PySpark
    PySpark
    Resilient Distributed Datasets (RDDs)
    Text Search with PySpark