Big Data: Principles and Best Practices of Scalable Real-Time Data Systems

Nathan Marz, James Warren

ISBN: 9789351198062

328 pages

INR 649


This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm and NoSQL databases.


Part 1 Batch layer


2 Data model for Big Data

    2.1 The properties of data

    2.2 The fact-based model for representing data

    2.3 Graph schemas

    2.4 A complete data model for

    2.5 Summary

3 Data model for Big Data: Illustration

    3.1 Why a serialization framework?

    3.2 Apache Thrift

    3.3 Limitations of serialization frameworks

    3.4 Summary

4 Data storage on the batch layer

    4.1 Storage requirements for the master dataset

    4.2 Choosing a storage solution for the batch layer

    4.3 How distributed filesystems work

    4.4 Storing a master dataset with a distributed filesystem

    4.5 Vertical partitioning

    4.6 Low-level nature of distributed filesystems

    4.7 Storing the master dataset on a distributed filesystem

    4.8 Summary

5 Data storage on the batch layer: Illustration

    5.1 Using the Hadoop Distributed File System

    5.2 Data storage in the batch layer with Pail

    5.3 Storing the master dataset for

    5.4 Summary

6 Batch layer

    6.1 Motivating examples

    6.2 Computing on the batch layer

    6.3 Recomputation algorithms vs. incremental algorithms

    6.4 Scalability in the batch layer

    6.5 MapReduce: a paradigm for Big Data computing

    6.6 Low-level nature of MapReduce

    6.7 Pipe diagrams: a higher-level way of thinking about batch computation

    6.8 Summary

7 Batch layer: Illustration

    7.1 An illustrative example

    7.2 Common pitfalls of data-processing tools

    7.3 An introduction to JCascalog

    7.4 Composition

    7.5 Summary

8 An example batch layer: Architecture and algorithms

    8.1 Design of the batch layer

    8.2 Workflow overview

    8.3 Ingesting new data

    8.4 URL normalization

    8.5 User-identifier normalization

    8.6 Deduplicate pageviews

    8.7 Computing batch views

    8.8 Summary

9 An example batch layer: Implementation

    9.1 Starting point

    9.2 Preparing the workflow

    9.3 Ingesting new data

    9.4 URL normalization

    9.5 User-identifier normalization

    9.6 Deduplicate pageviews

    9.7 Computing batch views

    9.8 Summary


Part 2 Serving layer


10 Serving layer

    10.1 Performance metrics for the serving layer

    10.2 The serving layer solution to the normalization/denormalization problem

    10.3 Requirements for a serving layer database

    10.4 Designing a serving layer for

    10.5 Contrasting with a fully incremental solution

    10.6 Summary



11 Serving layer: Illustration

    11.1 Basics of ElephantDB

    11.2 Building the serving layer for

    11.3 Summary


Part 3 Speed layer


12 Realtime views

    12.1 Computing realtime views

    12.2 Storing realtime views

    12.3 Challenges of incremental computation

    12.4 Asynchronous versus synchronous updates

    12.5 Expiring realtime views

    12.6 Summary

13 Realtime views: Illustration

    13.1 Cassandra’s data model

    13.2 Using Cassandra

    13.3 Summary

14 Queuing and stream processing

    14.1 Queuing

    14.2 Stream processing

    14.3 Higher-level, one-at-a-time stream processing

    14.4 Speed layer

    14.5 Summary

15 Queuing and stream processing: Illustration

    15.1 Defining topologies with Apache Storm

    15.2 Apache Storm clusters and deployment

    15.3 Guaranteeing message processing

    15.4 Implementing the uniques-over-time speed layer

    15.5 Summary

16 Micro-batch stream processing

    16.1 Achieving exactly-once semantics

    16.2 Core concepts of micro-batch stream processing

    16.3 Extending pipe diagrams for micro-batch processing

    16.4 Finishing the speed layer for

    16.5 Pageviews over time and bounce-rate analysis

    16.6 Another look at the bounce-rate-analysis example

    16.7 Summary

17 Micro-batch stream processing: Illustration

    17.1 Using Trident

    17.2 Finishing the speed layer

    17.3 Fully fault-tolerant, in-memory, micro-batch processing

    17.4 Summary

18 Lambda Architecture in depth

    18.1 Defining data systems

    18.2 Batch and serving layers

    18.3 Speed layer

    18.4 Query layer

    18.5 Summary

