Data Analytics
ISBN: 9789354641824
For more information, write to us at: acadmktg@wiley.com
Description
The goal of this book is to provide a smooth transition from traditional data analytics to recent algorithms for massive data analysis, including real-time analytics. It focuses on concepts, principles, and techniques applicable to any technology environment and industry, and it establishes a baseline that can be enhanced further by additional real-world experience. This book aims to be a ready reckoner for either a novice or a professional working in the field. A whole section is devoted to classical supervised methods of analysis such as regression, time series, and Bayesian analysis. Recent topics in clustering and data stream analysis are covered later. Emphasis is on newer tools such as MapReduce and NoSQL. A comprehensive discussion of real-time analytics is included.
Preface
About the Authors
Syllabus
Contents
Chapter 1 Introduction to Big Data
1.1 Introduction
1.2 Big Data Characteristics
1.3 Types of Big Data
1.4 Challenges of Traditional Systems
1.5 Web Data
1.6 Evolution of Analytic Scalability
1.7 When to use OLTP, MPP and Hadoop?
1.8 Grid Computing
1.9 Cloud Computing
1.10 MapReduce
1.11 Fault Tolerance
1.12 Analytic Processes and Tools
1.13 Analysis Versus Reporting
1.14 Statistical Concepts
Chapter 2 Data Analysis
2.1 Introduction
2.2 Data Analysis
2.3 Importance of Data Analysis
2.4 Data Analytics Applications
2.5 Regression Modelling Techniques
2.6 Bayesian Modelling, Inference and Bayesian Networks
2.7 Support Vector Machines and Kernel Methods
2.8 Time Series Analysis
2.9 Rule Induction
2.10 Sequential Cover Algorithm
Chapter 3 Neural Networks
3.1 Biological Neuron
3.2 Learning and Generalization
3.3 Competitive Learning
3.4 Principal Component Analysis and Neural Networks
3.5 Fuzzy Logic
Chapter 4 Mining Data Streams
4.1 Introduction
4.2 Data Stream Management Systems
4.3 Data Stream Mining
4.4 Examples of Data Stream Applications
4.5 Stream Queries
4.6 Issues in Data Stream Query Processing
4.7 Sampling in Data Streams
4.8 Filtering Streams
4.9 Counting Distinct Elements in a Stream
4.10 Estimating Moments
4.11 Querying on Windows - Counting Ones in a Window
4.12 Decaying Windows
4.13 Real-Time Analytics Platform (RTAP)
Chapter 5 Frequent Itemsets and Clustering
5.1 Introduction to Frequent Itemsets
5.2 Market-Basket Model
5.3 Algorithm for Finding Frequent Itemsets
5.4 Handling Larger Datasets in Main Memory
5.5 Limited Pass Algorithms
5.6 Counting Frequent Items in a Stream
5.7 Introduction to Clustering
5.8 Overview of Clustering Techniques
5.9 Hierarchical Clustering
5.10 Partitioning Methods
5.11 The CURE Algorithm
5.12 Clustering High-Dimensional Data
5.13 CLIQUE
5.14 Frequent Pattern-Based Clustering Methods
5.15 Clustering Streams
Chapter 6 Frameworks and Visualization
6.1 Introduction
6.2 Introduction to Hadoop
6.3 What is Hadoop?
6.4 Core Components of Hadoop
6.5 Hadoop Ecosystem
6.6 Physical Architecture
6.7 Hadoop Limitations
6.8 Hive
6.9 MapReduce and the New Software Stack
6.10 MapReduce
6.11 Algorithms Using MapReduce
6.12 What is NoSQL?
6.13 NoSQL Business Drivers
6.14 NoSQL Case Studies
6.15 NoSQL Data Architectural Patterns
6.16 Variations of NoSQL Architectural Patterns
6.17 Using NoSQL to Manage Big Data
6.18 Visualizations
Summary
Review Questions