Learn the algorithms and tools you need to build MapReduce applications with Hadoop and Spark for processing gigabyte-, terabyte-, or petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of …
Fast Data Processing with Spark. Chapter 1: Installing Spark and Setting Up Your Cluster; Chapter 2: Using the Spark Shell; Chapter 3: Building and Running a Spark Application; Chapter 4: Creating a SparkContext; Chapter 5: Loading and Saving Data in Spark; Chapter 6: Ma…
About This Book: Explore the integration of Apache Spark with third-party applications such as H2O, Databricks, and Titan; evaluate how Cassandra and HBase can be used for storage. An advanced guide with a combination of instructions and practical examp…
Key Features: Perform data analysis and build predictive models on huge datasets that leverage Apache Spark. Learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges…
1: Getting Started with Apache Spark; 2: Developing Applications with Spark; 3: Spark SQL; 4: Working with External Data Sources; 5: Spark Streaming; 6: Getting Started with Machine Learning; 7: Supervised Learning with MLlib — Regression; 8: Supervised Lear…