-
In this tutorial, we will provide an overview of Apache Spark, it’s relationship with Scala, Zeppelin notebooks, Interpreters, Datasets and DataFrames. Finally, we will showcase Apache Zeppelin notebook for our development environment to keep things simple and elegant. Zeppelin will allow us to run in a pre-configured environment and execute code written for Spark […]
Start
-
This tutorial will get you started with Apache Spark and will cover: How to use the Spark DataFrame & Dataset API How to use the SparkSQL interface via Shell-in-a-Box Prerequisites Downloaded and Installed latest Hortonworks Data Platform (HDP) Sandbox Learning the Ropes of the HDP Sandbox Basic Scala syntax Getting Started with Apache Zeppelin […]
Start
-
Apache Zeppelin is a web-based notebook that enables interactive data analytics. With Zeppelin, you can make beautiful data-driven, interactive and collaborative documents with a rich set of pre-built language back-ends (or interpreters) such as Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Angular, and Shell. With a focus on Enterprise, Zeppelin […]
Start
-
In this two-part lab-based tutorial, we will first introduce you to Apache Spark SQL. Spark SQL is a higher-level Spark module that allows you to operate on DataFrames and Datasets, which we will cover in more detail later. At the end of the tutorial we will provide you a Zeppelin Notebook to import into […]
Start
-
In this tutorial, we will explore how you can access and analyze data on Hive from Spark. In particular, you will learn: How to interact with Apache Spark through an interactive Spark shell How to read a text file from HDFS and create a RDD How to interactively analyze a data set through a […]
Start
-
This tutorial will teach you how to set up a full development environment for developing Spark applications. For this tutorial we’ll be using Python, but Spark also supports development with Java, Scala and R. We’ll be using PyCharm Community Edition as our IDE. PyCharm Professional edition can also be used. By the end of […]
Start
-
This tutorial will teach you how to set up a full development environment for developing and debugging Spark applications. For this tutorial we’ll be using Scala, but Spark also supports development with Java, and Python. We will be using be using IntelliJ Version: 2018.2 as our IDE running on Mac OSx High Sierra, and […]
Start
-
This tutorial will teach you how to set up a full development environment for developing and debugging Spark applications. For this tutorial we’ll be using Java, but Spark also supports development with Scala, Python and R. We’ll be using IntelliJ as our IDE, and since we’re using Java we’ll use Maven as our build […]
Start
-
In this tutorial, we will introduce you to Machine Learning with Apache Spark. The hands-on portion for this tutorial is an Apache Zeppelin notebook that has all the steps necessary to ingest and explore data, train, test, visualize, and save a model. We will cover a basic Linear Regression model that will allow us […]
Start
-
R is a popular tool for statistics and data analysis. It has rich visualization capabilities and a large collection of libraries that have been developed and maintained by the R developer community. One drawback to R is that it’s designed to run on in-memory data, which makes it unsuitable for large datasets. Spark is […]
Start
-
In this tutorial, we will introduce core concepts of Apache Spark Streaming and run a Word Count demo that computes an incoming list of words every two seconds. Prerequisites This tutorial is a part of series of hands-on tutorials to get you started with HDP using Hortonworks Sandbox. Please ensure you complete the prerequisites […]
Start