Data Science and Big Data Computing: Frameworks and Methodologies, 1st Edition, by Zaigham Mahmood
Product details:
ISBN 10: 3319318616
ISBN 13: 9783319318615
Author: Zaigham Mahmood
This text/reference surveys the state of the art in data science and provides practical guidance on big data analytics. Researchers and practitioners from around the world discuss research developments and emerging trends, present case studies on frameworks and methodologies, and suggest best practices for efficient and effective data analytics.
Features: reviews a framework for fast data applications, a technique for complex event processing, and agglomerative approaches for the partitioning of networks; introduces a unified approach to data modeling and management, and a distributed computing perspective on interfacing physical and cyber worlds; presents techniques for machine learning for big data and for identifying duplicate records in data repositories; examines enabling technologies and tools for data mining; proposes frameworks for data extraction, adaptive decision making, and social media analysis.
Table of contents:
Part I: Data Science Applications and Scenarios
Chapter 1: An Interoperability Framework and Distributed Platform for Fast Data Applications
1.1 Introduction
1.2 Background
1.3 Introducing Fast Data
1.3.1 Motivating Scenarios
1.3.2 Issues Relating to Interoperability
1.4 An Interoperability Framework for Fast Data
1.4.1 Understanding Interoperability
1.4.2 The Variety Dimension
1.4.3 The Velocity Dimension
1.4.4 Modelling with Resources and Services
1.4.5 Handling the Variety Dimension
1.4.6 A Service Interoperability Language with Support for Variety
1.5 A Distributed Interoperability Platform
1.5.1 Handling the Velocity Dimension
1.5.2 Architecture of the Platform
1.6 Usefulness of the Approach
1.7 Future Research
1.8 Conclusions
References
Chapter 2: Complex Event Processing Framework for Big Data Applications
2.1 Introduction
2.2 Complex Event Processing
2.2.1 CEP Architectural Layers
2.2.2 Event Modeling
2.3 Semantic Intrusion Detection Using CEP
2.3.1 Implementation and Validation
2.4 CEP-Enabled Geriatric Health Monitoring
2.4.1 Implementation and Validation
2.5 Conclusion
References
Chapter 3: Agglomerative Approaches for Partitioning of Networks in Big Data Scenarios
3.1 Introduction
3.2 Big Data Scenarios and Issues
3.3 Parallel Processing
3.3.1 Bulk-Synchronous Parallel (BSP)
3.3.2 Overview of Pregel
3.3.3 Overview of Seraph
3.4 External Memory Operations
3.4.1 External Memory BFS
3.5 Agglomeration in Big Data Scenarios
3.6 Agglomerative Approaches
3.6.1 Generic Model
3.6.2 Fast Unfolding
3.6.3 SCAN
3.6.4 Leader-Follower
3.6.5 HC-PIN
3.6.6 Other Approaches
3.7 Agglomerative Strategic Changes for Big Data Scenarios
3.8 Parameter Tuning for Big Data Scenarios
3.9 Discussion
3.10 Conclusion
References
Chapter 4: Identifying Minimum-Sized Influential Vertices on Large-Scale Weighted Graphs: A Big Data
4.1 Introduction
4.2 Related Works
4.2.1 Influence Maximization Problem in Social Network
4.2.2 GPU Framework
4.2.3 Remarks
4.3 Graph Model and Problem Definition
4.3.1 Graph Model
4.3.2 Problem Definition
4.4 MapReduce Algorithm for Identifying Individual Zones
4.4.1 Algorithm 1: Mapper Part
4.4.2 Algorithm 1: Reducer Part
4.5 MapReduce Algorithm for Solving MIV
4.5.1 Algorithm 2: Mapper Part
4.5.2 Algorithm 2: Reducer Part
4.6 Conclusion and Future Work
References
Part II: Big Data Modelling and Frameworks
Chapter 5: A Unified Approach to Data Modeling and Management in Big Data Era
5.1 Introduction
5.2 Big Data: Heterogeneous Data
5.2.1 Characteristics, Promise, and Benefits
5.2.2 Data Models
5.2.3 Data Gathering
5.2.4 Open Issues and Challenges
5.3 Unified Approach to Big Data Modeling
5.3.1 Unified Data Representation and Aggregation
5.3.2 Data Access and Real-Time Processing
5.4 Uniform Data Management
5.5 CyberWater Case Study
5.6 Conclusion
References
Chapter 6: Interfacing Physical and Cyber Worlds: A Big Data Perspective
6.1 Introduction
6.2 Data Generation by Physical Systems: Big Data Sources
6.2.1 Wireless Sensor Networks
6.2.2 Social Networks
6.2.3 Vehicular Ad Hoc Networks
6.2.4 Wireless Body Area Networks
6.3 Data in Cyber Systems: Big Data Management
6.3.1 Cloud Computing Paradigms
6.3.2 Service-Oriented Decision Support Systems
6.3.2.1 Data as a Service
6.3.2.2 Information as a Service
6.3.2.3 Analytics as a Service
6.4 Interfacing Cyber World with Physical World
6.4.1 Data Acquisition
6.4.2 Data Preprocessing
6.4.2.1 Data Cleaning
6.4.2.2 Data Fusion
6.4.2.3 Data Compression
6.4.3 Data Storage
6.4.4 Query Processing
6.4.5 Data Analysis
6.4.6 Actuation
6.5 Future Challenges and Opportunities
6.6 Conclusion
References
Chapter 7: Distributed Platforms and Cloud Services: Enabling Machine Learning for Big Data
7.1 Introduction
7.2 Machine Learning for Data Science
7.3 Distributed and Cloud-Based Execution Support in Popular Machine Learning Tools
7.4 Distributed Machine Learning Platforms
7.5 Machine Learning as a Service (MLaaS)
7.6 Related Studies
7.7 Conclusion and Guidelines
References
Chapter 8: An Analytics-Driven Approach to Identify Duplicate Bug Records in Large Data Repositories
8.1 Introduction
8.2 Literature Survey
8.3 The Proposed System to Identify Duplicate Records
8.4 Detecting Apple-to-Apple Pairs
8.4.1 Vector Space Model
8.4.1.1 Weightage Techniques
8.4.1.2 Example
8.4.2 Clustering Approaches
8.4.2.1 K-Means Clustering
8.4.2.2 Algorithm: Cluster Algorithm to Group Similar Bug Data Records
8.4.2.3 Example
8.4.2.4 Nearest Neighbor Classifier
8.5 An Approach to Detect Apple-to-Orange Pairs
8.5.1 Training Phase
8.5.1.1 Step 1
8.5.1.2 Step 2
8.5.2 Online Phase
8.5.3 Okapi BM25
8.5.4 Language Modeling with Smoothing
8.6 Implementation and Case Study
8.6.1 Datasets
8.7 Recurring Bug Prevention Framework
8.7.1 Knowledge-Enriched Bug Repository
8.7.2 Identify Groups of Recurring Duplicate Bugs
8.7.3 Root Cause Analysis and Bug Fix
8.7.4 Preventability Analytics
8.8 Conclusions
References
Part III: Big Data Tools and Analytics
Chapter 9: Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase
9.1 Introduction
9.2 Apache Hive
9.2.1 Hive Compilation and Execution Stages
9.2.2 Hive Commands
9.2.2.1 Databases
9.2.2.2 Tables
9.2.2.3 Loading Data into Table
9.2.2.4 Retrieving Data from Table
9.2.2.5 Drop Command for Database and Tables
9.2.3 Partitioning and Bucketing
9.2.3.1 Dynamic Partitioning
9.2.3.2 Static Partitioning
9.2.3.3 Hybrid Partitioning
9.2.3.4 Bucketing
9.2.4 External Table
9.2.5 Hive Performance
9.3 Apache Pig
9.3.1 Modes of User Interaction with Pig
9.3.2 Pig Compilation and Execution Stages
9.3.2.1 Parsing
9.3.2.2 Compile and Optimize
9.3.2.3 Plan
9.3.3 Pig Latin Commands
9.3.3.1 LOAD Command
9.3.3.2 DUMP Command
9.3.3.3 STORE Command
9.3.3.4 DESCRIBE Command
9.3.3.5 ILLUSTRATE Command
9.3.3.6 Expressions
9.3.3.7 UNION Command
9.3.3.8 SPLIT Command
9.3.3.9 FILTER Command
9.3.3.10 GROUP Command
9.3.3.11 FOREACH Command
9.3.4 Pig Scripts
9.3.4.1 ct.pig
9.3.4.2 comp.pig
9.3.5 User-Defined Functions (UDFs) in Pig
9.3.5.1 Predefined UDF
9.3.5.2 Customized Java UDFs
9.4 Apache HBase
9.4.1 HBase Architecture
9.4.1.1 Region Server
9.4.1.2 Master Server
9.4.1.3 Zookeeper
9.4.2 HBase Commands
9.4.2.1 Table Creation
9.4.2.2 List
9.4.2.3 Put
9.4.2.4 Get
9.4.2.5 Scan
9.4.2.6 Disable/Enable a Table
9.4.2.7 Drop
9.4.2.8 Exit/Quit
9.4.2.9 Stop
9.5 Conclusion
References
Chapter 10: Big Data Analytics: Enabling Technologies and Tools
10.1 Introduction
10.2 Characterizing Big Data
10.3 The Inherent Challenges
10.4 Big Data Infrastructures, Platforms, and Analytics
10.4.1 Unified Platforms for Big Data Analytics
10.4.2 Newer and Nimbler Applications
10.4.3 Tending Toward a Unified Architecture
10.4.4 Big Data Appliances and Converged Solutions
10.4.5 Big Data Frameworks
10.4.6 The Hadoop Software Family
10.5 Databases for Big Data Management and Analytics
10.5.1 NoSQL Databases
10.5.2 Why NoSQL Databases?
10.5.3 Classification of NoSQL Databases
10.5.4 Cloud-Based Databases for Big Data
10.6 Data Science
10.6.1 Basic Concepts
10.6.2 The Role of Data Scientist
10.7 Conclusion
References
Chapter 11: A Framework for Data Mining and Knowledge Discovery in Cloud Computing
11.1 Introduction
11.2 Related Work
11.3 Data Mining
11.3.1 Classification
11.3.1.1 Naive Bayes Classifier
11.3.1.2 Decision Tree (C4.5)
11.3.1.3 Random Forest
11.3.1.4 AdaBoost
11.3.2 Clustering
11.3.2.1 K-Means
11.3.3 Association Rule Mining
11.4 Cloud Computing
11.4.1 Deployment Models
11.4.1.1 Public Cloud
11.4.1.2 Private Cloud
11.4.1.3 Hybrid Cloud
11.4.2 Service Models
11.4.2.1 Infrastructure as a Service (IaaS)
11.4.2.2 Platform as a Service (PaaS)
11.4.2.3 Software as a Service (SaaS)
11.5 The Proposed DMCC Framework
11.5.1 Overview
11.5.2 DMCC Framework Architecture
11.5.3 DMCC Framework Features
11.6 Experimental Results
11.6.1 Dataset Description
11.6.1.1 EEG Eye State Dataset
11.6.1.2 Skin Segmentation Dataset
11.6.1.3 KDD Cup 1999 Dataset
11.6.1.4 Census Income Dataset
11.6.2 Classification Results
11.6.3 Clustering Results
11.6.4 A Study of Association Rule Mining
11.7 Conclusion and Suggestions for Future Work
References
Chapter 12: Feature Selection for Adaptive Decision Making in Big Data Analytics
12.1 Introduction
12.2 Dimension Reductions
12.2.1 Hybrid Genetic Search Model (HGSM)
12.2.2 Fuzzy-Rough-Set Approach
12.2.2.1 Data Preparation
12.2.2.2 DIM-RED-GA Algorithm
12.3 Case Study
12.3.1 Rice Diseases
12.3.1.1 Leaf Brown Spot
12.3.1.2 Rice Blast
12.3.1.3 Sheath Rot
12.3.1.4 Bacterial Blight
12.3.1.5 Rice Hispa
12.3.2 Methodology
12.3.2.1 Image Filtering
12.3.2.2 Feature Extraction
12.3.2.3 Colour Features
12.3.2.4 Shape-Based Features
12.3.2.5 Texture Features
12.3.3 Position Detection
12.4 Conclusions
References
Chapter 13: Social Impact and Social Media Analysis Relating to Big Data
13.1 Introduction
13.2 Using Big Data and Social Media Innovatively
13.2.1 Social Media as Advertising and Marketing Medium
13.2.2 Adding Value for Social Well-Being
13.2.3 Gauging Business Performance
13.2.4 Adding Value in Financial Markets
13.3 Discriminate Use of Social Media Analysis
13.3.1 Unstructured Data
13.3.2 Gaps in Governance Standards
13.3.3 Serving Self-Interest
13.3.4 Moving Away from Comfort Metrics
13.3.5 Using Power to Leverage Outcomes
13.3.6 Risks Relating to Social Media Platforms
13.3.7 Research Methodology Challenges
13.3.8 Securing Big Data
13.3.9 Limitations of Addressing Social Problems
13.4 Imperatives for Big Data Use from Social Media
13.4.1 Responsibility of Analytic Role Players
13.4.2 Evidence-Based Decision-Making
13.4.3 Protection of Rights
13.4.4 Knowing the Context
13.5 Conclusion