Building a Sentiment Analysis Application

Introduction

For this project, you will play the part of a Big Data Application Developer who applies the skills of a Data Engineer and a Data Scientist, using multiple Big Data technologies provided by Hortonworks DataFlow (HDF) and Hortonworks Data Platform (HDP) to build a real-time Sentiment Analysis Application. First, you will learn to acquire tweet data from Twitter’s Decahose API and send the tweets to the Kafka topic “tweets” using NiFi. Next, you will learn to build a Spark machine learning model that classifies the data as happy or sad, and export the model to HDFS. Before you can build the model, however, Spark requires the training data to be in the form of a feature array, so you will first do some data cleansing with SparkSQL. Once the model is built, you will use Spark Structured Streaming to load the model from HDFS, pull in tweets from the Kafka topic “tweets”, add a sentiment score to each tweet, and stream the data to the Kafka topic “tweetsSentiment”. Alongside the first NiFi flow, you will build a second NiFi flow that ingests data from the Kafka topic “tweetsSentiment” and stores it in HBase. Finally, with the Hive and HBase integration, you will run queries to verify that the data was stored successfully and to show the sentiment score for each tweet.

Big Data Technologies used to develop the Application:

Goals and Objectives

  • Learn to create a Twitter Application using Twitter’s Developer Portal to get KEYS and TOKENS for connecting to Twitter’s APIs
  • Learn to create a NiFi Dataflow Application that integrates Twitter’s Decahose API to ingest tweets, perform some preprocessing, and store the data in the Kafka topic “tweets”
  • Learn to create a NiFi Dataflow Application that ingests the Kafka Topic “tweetsSentiment” to stream sentiment tweet data to HBase
  • Learn to build a SparkSQL Application to clean the data and get it into a suitable format for building the sentiment classification model
  • Learn to build a SparkML Application to train and validate a sentiment classification model using Gradient Boosting
  • Learn to build a Spark Structured Streaming Application to stream the tweet data from the Kafka topic “tweets” on HDP to the Kafka topic “tweetsSentiment” on HDF, attaching a sentiment score to each tweet based on the output of the classification model
  • Learn to visualize the tweet sentiment score by using Zeppelin’s Hive interpreter mapping to the HBase table
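The data-preparation goal above (getting tweets into a format suitable for training a classifier) can be illustrated with a minimal, framework-free sketch. It is not the tutorial's SparkSQL code: the labeling rule (the words "happy" and "sad"), the vector size, and all function names are assumptions chosen for illustration, and the hashing trick stands in for whatever feature extraction the actual notebook uses.

```python
import re
import zlib

NUM_FEATURES = 16  # hypothetical vector size, kept small for readability


def tokenize(text):
    """Lowercase the tweet and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_feature_vector(text, num_features=NUM_FEATURES):
    """Map a tweet to a fixed-length term-frequency vector (hashing trick)."""
    vector = [0.0] * num_features
    for token in tokenize(text):
        # crc32 is stable across runs, unlike Python's built-in hash()
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1.0
    return vector


def label_tweet(text):
    """Hypothetical labeling rule: 1.0 for happy, 0.0 for sad, None if unclear."""
    tokens = set(tokenize(text))
    if "happy" in tokens and "sad" not in tokens:
        return 1.0
    if "sad" in tokens and "happy" not in tokens:
        return 0.0
    return None
```

Tweets for which `label_tweet` returns `None` would simply be dropped from the training set in this sketch; each remaining tweet yields a (label, feature vector) pair of the kind a classifier expects.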

Prerequisites

Overview

The tutorial series consists of the following tutorial modules:

1. Application Development Concepts: You will be introduced to sentiment fundamentals, including sentiment analysis, ways to perform the data analysis, and the various use cases.

2. Setting up the Development Environment: You will create a Twitter Application in Twitter’s Developer Portal for access to KEYS and TOKENS. You will then write shell code and perform Ambari REST API calls to set up a development environment.

3. Acquiring Twitter Data: You will build a NiFi Dataflow to ingest Twitter data, preprocess it, and store it in the Kafka topic “tweets”. The second NiFi Dataflow you build ingests the enriched sentiment tweet data from the Kafka topic “tweetsSentiment” and streams the content of each flowfile to HBase.

4. Cleaning the Raw Twitter Data: You will create a Zeppelin notebook and use Zeppelin’s Spark Interpreter to clean the raw Twitter data in preparation for creating the sentiment classification model.

5. Building a Sentiment Classification Model: You will create a Zeppelin notebook and use Zeppelin’s Spark Interpreter to build a sentiment classification model that classifies tweets as Happy or Sad and exports the model to HDFS.

6. Deploying a Sentiment Classification Model: You will create a Scala IntelliJ project in which you develop a Spark Structured Streaming application that streams the data from the Kafka topic “tweets” on HDP, processes the tweet JSON data by adding a sentiment score, and streams the result into the Kafka topic “tweetsSentiment” on HDF.

7. Visualizing Sentiment Scores: You will use Zeppelin’s JDBC Hive Interpreter to run SQL queries against the NoSQL HBase table “tweets_sentiment” for visual insight into the tweet sentiment scores.
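The per-tweet enrichment described in module 6 can be sketched in plain Python, without Kafka or Spark, to show the shape of the transformation. Everything here is an assumption made for illustration: the JSON field names `text` and `sentiment`, the function names, and the keyword-based `score_sentiment` stand-in, which takes the place of the trained model loaded from HDFS. The tutorial itself implements this step in Scala with Spark Structured Streaming.

```python
import json


def score_sentiment(text):
    """Hypothetical stand-in for the trained model: 1.0 = happy, 0.0 = sad."""
    return 1.0 if "happy" in text.lower() else 0.0


def enrich_tweet(raw_json, scorer=score_sentiment):
    """Parse one tweet message, attach a sentiment score, and re-serialize it."""
    tweet = json.loads(raw_json)
    # "text" and "sentiment" are assumed field names, not taken from the tutorial
    tweet["sentiment"] = scorer(tweet.get("text", ""))
    return json.dumps(tweet)


def process_batch(messages, scorer=score_sentiment):
    """Enrich every well-formed message; skip records that are not JSON objects."""
    enriched = []
    for message in messages:
        try:
            enriched.append(enrich_tweet(message, scorer))
        except (json.JSONDecodeError, AttributeError, TypeError):
            # Malformed input is dropped so one bad record does not stop the batch
            continue
    return enriched
```

In this sketch, `process_batch` plays the role of one micro-batch: messages read from the “tweets” topic go in, and the returned strings are what would be written to “tweetsSentiment”.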
