Learning the Ropes of the Hortonworks Sandbox

Introduction

This tutorial is aimed at users who do not have much experience with the Sandbox.
We will install and explore the Sandbox in virtual machine and cloud environments, and navigate the Ambari user interface.
Let’s begin our Hadoop journey.

Prerequisites

  • Downloaded and installed the Hortonworks Sandbox
  • Allow yourself around one hour to complete this tutorial
  • If on Mac or Linux, added sandbox.hortonworks.com to your /private/etc/hosts file
  • If on Windows 7, added sandbox.hortonworks.com to your /c/Windows/System32/Drivers/etc/hosts file

If on Mac or Linux, to add sandbox.hortonworks.com to your list of hosts, open the terminal and enter the following command, replacing {Host-Name} with the appropriate host for your sandbox:

echo '{Host-Name} sandbox.hortonworks.com' | sudo tee -a /private/etc/hosts

NOTE: On a single machine, just replace {Host-Name} with 127.0.0.1

If on Windows 7, to add sandbox.hortonworks.com to your list of hosts, open Git Bash and enter the following command, replacing {Host-Name} with the appropriate host for your sandbox:

echo '{Host-Name} sandbox.hortonworks.com' | tee -a /c/Windows/System32/Drivers/etc/hosts
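
On either platform, you can verify that the new hosts entry resolves correctly. A quick check, assuming ping is available (on Windows, use ping -n 1 instead of -c 1):

# Should resolve sandbox.hortonworks.com to the host you added
      ping -c 1 sandbox.hortonworks.com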


What is the Sandbox?

The Sandbox is a straightforward, pre-configured learning environment that contains the latest developments from Apache Hadoop, specifically the Hortonworks Data Platform (HDP) distribution. The Sandbox comes packaged in a virtual environment that can run in the cloud or on your personal machine. The Sandbox allows you to learn and explore HDP on your own.

If you want to explore Hortonworks Sandbox in Microsoft Azure, please skip to Section 2.

Section 1: Sandbox in VM

Step 1: Explore the Sandbox in a VM

1.1 Install the Sandbox

Start the Hortonworks Sandbox by following the installation steps for your VM.

[Image: Sandbox download]

Note: The Sandbox system requirements include a 64-bit OS with at least 8 GB of RAM and virtualization enabled in the BIOS. Find out about the newest features, known and resolved issues, and other updates on HDP 2.4 in the release notes. The Sandbox on Azure is under construction and will be updated to HDP 2.5 soon.

1.2 Learn the Host Address of Your Environment

Once you have installed the Sandbox VM, it resolves to a host in your environment; the address varies depending on the virtual machine software you are using (VMware, VirtualBox, etc.). As a general rule of thumb, wait for the installation to complete; the confirmation screen will tell you the host your sandbox resolves to. For example:

In the case of VirtualBox, the host would be 127.0.0.1

[Image: Host address of the Sandbox environment]

Note: On Azure, your host can be found under Public IP Address on the dashboard. For further clarification, check out our guide for Deploying Hortonworks Sandbox on Azure.

If you are using a private cluster or a cloud to run the sandbox, find the host your sandbox resolves to.

1.3 Connect to the Welcome Screen

Append the port number :8888 to your host address, open your browser, and access the Sandbox Welcome page at http://_host_:8888/.

[Image: Sandbox welcome splash screen]

Click Launch Dashboard to go to Ambari along with a Hello HDP tutorial, or Quick Links to view some services of the HDP environment.

Launch Dashboard opens the Ambari user interface and an additional tutorial window. Log in to Ambari using the username and password specified by the tutorial you are following in the tutorial window. Most tutorials log in to Ambari as raj_ops/raj_ops or maria_dev/maria_dev.

1.4 Multiple Ways to Execute Terminal Commands

Note: For all methods below, the login credentials for accessing the Sandbox through the terminal are the same.

  • Log in using username root and password hadoop.
  • After the first login, you will be prompted to retype your current password and then change it.
  • If you are using PuTTY on Windows, go to the terminal of your sandbox in Oracle VirtualBox, press Alt+F5, enter username root and password hadoop, and set a new password when prompted.

Secure Shell (SSH) Method:

Open your terminal (Mac and Linux) or PuTTY (Windows). Type the following command to access the Sandbox through SSH:

# Usage:
      ssh <username>@<hostname> -p <port>
# Example:
      ssh root@127.0.0.1 -p 2222

[Image: Mac terminal SSH]

Mac OS terminal. When you type the password, the entry doesn’t echo on the screen; the input is hidden. Type the password carefully.
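
If you plan to SSH into the sandbox often, you can optionally install your public key so future logins skip the password prompt. A minimal sketch, assuming you already have a key pair in ~/.ssh and that ssh-copy-id is available (it ships with most OpenSSH installations):

# Copy your default public key into the sandbox's authorized_keys
      ssh-copy-id -p 2222 root@127.0.0.1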

Shell Web Client Method:

Open your web browser. Type the following address into your browser to access the Sandbox through the shell:

# Usage:
      _host_:4200
# Example:
      127.0.0.1:4200

[Image: Shell in the browser]

Appearance of the web shell. When you type the password, the entry doesn’t echo on the screen; the input is hidden. Type the password carefully.

VM Terminal Method:

Open the Sandbox through VirtualBox or VMware. The Sandbox VM welcome screen will appear. For Linux/Windows users, press Alt+F5; for Mac, press Fn+Alt+F5 to log in to the Sandbox VM terminal.

[Image: Sandbox VM terminal]

VirtualBox VM terminal. When you type the password, the entry doesn’t echo on the screen; the input is hidden. Type the password carefully.

1.5 Learn Your Sandbox Version

To find information about your sandbox, execute the command:

sandbox-version
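
You can also run the command non-interactively over SSH; a sketch assuming the VirtualBox host and port from earlier:

# Print the sandbox version without opening an interactive session
      ssh root@127.0.0.1 -p 2222 sandbox-version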

1.6 Send Data Between Sandbox & Local Machine

Open your terminal (Linux or Mac) or Git Bash (Windows). To send data from your local machine to the sandbox, in our example an HDF .tar.gz file, you would input the following command. If you want to try this command, replace the HDF filename with another filename from your Downloads folder. Modify the command and execute:

scp -P 2222 ~/Downloads/HDF-1.2.0.1-1.tar.gz root@localhost:/root

This command sends the HDF file from your local machine’s Downloads folder to the Sandbox’s root directory. We can send any file or directory we want; we just need to specify the path. We can also choose any sandbox directory or path that we want the data to land in.

Here is the definition of the command that we used above:

scp -P <port> <local-file-path> <username>@<hostname>:<sandbox-dir-path>

We can also send data from the sandbox to our local machine; refer to the modified command definition below:

scp -P <port> <username>@<hostname>:<sandbox-file-path> <local-dir-path>

What is the difference between the two command definitions above?
To send data from the local machine to the sandbox, the local machine’s path comes before the sandbox path. To transfer data from the sandbox to the local machine, the arguments are reversed.
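
As an illustrative sketch of the reverse direction (assuming the same VirtualBox port forwarding as above, and that the file exists in the sandbox's /root directory):

# Pull the file from the sandbox back to the local Downloads folder
      scp -P 2222 root@localhost:/root/HDF-1.2.0.1-1.tar.gz ~/Downloads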

Step 2: Explore Ambari

Navigate to the Ambari welcome page using the URL given on the Sandbox welcome page.

Note: Both the username and password to log in are maria_dev.

2.1 Use Terminal to Find the Host IP Sandbox Runs On

If you want to find the host address your sandbox is running on, SSH into the sandbox terminal upon successful installation and follow these steps:

  1. Log in using username root and password hadoop.
  2. Type ifconfig and look for inet addr: under eth0 (see the one-liner below the screenshot).
  3. Use the inet addr, append :8080, and open it in a browser. It will take you to the Ambari login page.
  4. This inet address is randomly generated for every session and therefore differs from session to session.

[Image: Host address the sandbox runs on]
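
The lookup in step 2 can be done in one line; a sketch assuming the interface is named eth0 as in the sandbox VM:

# Print only the line containing the inet address of eth0
      ifconfig eth0 | grep 'inet addr'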

Services Provided By the Sandbox

Service | URL
Sandbox Welcome Page | http://host:8888
Ambari Dashboard | http://host:8080
Manage Ambari | http://host:8080/views/ADMIN_VIEW/2.4.0.0/INSTANCE/#/
Hive User View | http://host:8080/#/main/views/HIVE/1.5.0/AUTO_HIVE_INSTANCE
Pig User View | http://host:8080/#/main/views/PIG/1.0.0/Pig_INSTANCE
File User View | http://host:8080/#/main/views/FILES/1.0.0/AUTO_FILES_INSTANCE
SSH Web Client | http://host:4200
Hadoop Configuration | http://host:50070/dfshealth.html, http://host:50070/explorer.html

The following table contains login credentials:

Service | User | Password
Ambari, OS | admin | refer to step 2.2
Ambari, OS | maria_dev | maria_dev
Ambari, OS | raj_ops | raj_ops
Ambari, OS | holger_gov | holger_gov
Ambari, OS | amy_ds | amy_ds

Please go to Section 3 to learn more about these users.
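
If you want to verify these credentials from the command line, here is a sketch using the Ambari REST API (assuming the VirtualBox host 127.0.0.1 and the default Ambari port 8080):

# Returns cluster information if the credentials are valid
      curl -s -u maria_dev:maria_dev http://127.0.0.1:8080/api/v1/clusters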

2.2 Setup Ambari admin Password Manually

  1. Start your sandbox and open a terminal (Mac or Linux) or PuTTY (Windows)
  2. SSH into the sandbox as root using ssh root@127.0.0.1 -p 2222. For Azure and VMware users, your _host_ and _port_ will be different.
  3. Type the following commands:
# Updates password
ambari-admin-password-reset
# If Ambari doesn't restart automatically, restart ambari service
ambari-agent restart

Note: Now you can log in to Ambari as an admin user to perform operations such as starting and stopping services.

[Image: Terminal, updating the Ambari admin password]
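
To confirm the Ambari server is back up after the reset, you can check its status from the sandbox terminal; ambari-server is the standard service script on the sandbox:

# Reports whether the Ambari server process is running
      ambari-server status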

2.3 Explore the Ambari Welcome Screen’s 5 Key Capabilities

Enter the Ambari welcome URL and you should see a similar screen:

[Image: Ambari welcome screen]

  1. “Operate Your Cluster” will take you to the Ambari Dashboard, which is the primary UI for Hadoop operators
  2. “Manage Users + Groups” allows you to add and remove Ambari users and groups
  3. “Clusters” allows you to grant permissions to Ambari users and groups
  4. “Ambari User Views” lists the set of Ambari User Views that are part of the cluster
  5. “Deploy Views” provides administration for adding and removing Ambari User Views

Enter the Ambari Dashboard URL and you should see a similar screen:

[Image: Ambari dashboard]

Click on (1) Metrics, Heatmap and Configuration, and then (2) the Dashboard, Services, Hosts, Alerts, Admin and User Views icon (represented by a 3×3 matrix) to become familiar with the Ambari resources available to you.

Section 2: Sandbox in Microsoft Azure

NOTE: The HDP 2.5 Sandbox on Azure is under construction and will be updated soon. This section addresses the HDP 2.4 Sandbox on Azure.

Step 1: Explore the Sandbox in Azure

1.1 Deploy the Sandbox in Azure

Follow the tutorial here to deploy the latest HDP Sandbox on Azure.

1.2 Connect to the Welcome Screen

Append the port number :8888 to your host address, open your browser, and access the Sandbox Welcome page at http://_host_:8888/.
Here, host is the public IP address generated when you deployed the HDP Sandbox on Azure. Take note of the IP address. In this example it is 23.99.9.232; your machine will have a different IP.

[Image: Sandbox welcome screen on Azure]

1.3 Multiple Ways to Execute Terminal Commands

Secure Shell (SSH) Method:

Open your terminal (Mac and Linux) or PuTTY (Windows). Here again, host is the public IP address provided by Azure. Use the username and password that you provided while deploying the sandbox on Azure. Use the following command to access the Sandbox through SSH:

# Usage:
      ssh <username>@<host> -p 22

[Image: Mac terminal SSH to Azure]

Mac OS terminal. When you type the password, the entry doesn’t echo on the screen; the input is hidden. Type the password carefully.

Shell Web Client Method:

Open your web browser. Replace the following text with your host in your browser to access the Sandbox through the shell. Provide the same username and password that you gave while deploying the sandbox on Azure.

# Usage:
      _host_:4200

[Image: Shell in the browser on Azure]

Appearance of the web shell. When you type the password, the entry doesn’t echo on the screen; the input is hidden. Type the password carefully.

1.4 Send Data Between Azure Sandbox & Local Machine

Open your terminal (Linux or Mac) or Git Bash (Windows). To send data from your local machine to the Azure sandbox, in our example an HDF .tar.gz file, you would input the following command. If you want to try this command, replace the HDF filename with another filename from your Downloads folder. Also replace the first james94 with the Azure username you provided while deploying the sandbox, and the second james94, the folder name, with that same username, which is the folder of your sandbox. Modify the command and execute:

scp -P 22 ~/Downloads/HDF-1.2.0.1-1.tar.gz james94@_host_:/james94

This command sends the HDF file from your local machine’s Downloads folder to your user folder on the sandbox. We can send any file or directory we want; we just need to specify the path. We can also choose any sandbox directory or path that we want the data to land in.

Here is the definition of the command that we used above:

scp -P <port> <local-file-path> <username>@<hostname>:<sandbox-dir-path>

We can also send data from the sandbox to our local machine; refer to the modified command definition below:

scp -P <port> <username>@<hostname>:<sandbox-file-path> <local-dir-path>

What is the difference between the two command definitions above?
To send data from the local machine to the sandbox, the local machine’s path comes before the sandbox path. To transfer data from the sandbox to the local machine, the arguments are reversed.

Step 2: Explore Ambari in Azure

Navigate to the Ambari welcome page using the URL given on the Sandbox welcome page.

Note: Both the username and password to log in are maria_dev.

Services Provided By the Sandbox in Azure

Service | URL
Sandbox Welcome Page | http://host:8888
Ambari Dashboard | http://host:8080
Manage Ambari | http://host:8080/views/ADMIN_VIEW/2.2.1.0/INSTANCE/#/
Hive User View | http://host:8080/#/main/views/HIVE/1.0.0/AUTO_HIVE_INSTANCE
Pig User View | http://host:8080/#/main/views/PIG/1.0.0/Pig
File User View | http://host:8080/#/main/views/FILES/1.0.0/Files
SSH Web Client | http://host:4200
Hadoop Configuration | http://host:50070/dfshealth.html, http://host:50070/explorer.html

The following table contains login credentials:

Service | User | Password
Ambari, OS | admin | refer to step 2.1
Ambari, OS | maria_dev | maria_dev

2.1 Setup Ambari admin Password Manually

1. Open a terminal (Mac or Linux) or PuTTY (Windows)

2. SSH into the sandbox using the username and password that you provided at the time of creating the sandbox on Azure. Your host is the public IP address given by Azure, and the sudo password is your sandbox password.

# Usage:
      ssh <username>@<host> -p 22

3. Type the following commands:

# Updates password
sudo ambari-admin-password-reset
# If Ambari doesn't restart automatically, restart ambari service
ambari-agent restart

Note: Now you can log in to Ambari as an admin user to perform operations such as starting and stopping services.

[Image: Terminal, updating the Ambari admin password on Azure]

2.2 Explore the Ambari Welcome Screen’s 5 Key Capabilities

Enter the Manage Ambari page using the link above, or click on the Ambari ID pull-down menu and select Manage Ambari:

[Image: Manage Ambari menu]

and then you should see a similar screen:

[Image: Ambari welcome screen]

NOTE: Only the Ambari admin ID has access to this page.

  1. “Operate Your Cluster” will take you to the Ambari Dashboard, which is the primary UI for Hadoop operators
  2. “Manage Users + Groups” allows you to add and remove Ambari users and groups
  3. “Clusters” allows you to grant permissions to Ambari users and groups
  4. “Ambari User Views” lists the set of Ambari User Views that are part of the cluster
  5. “Deploy Views” provides administration for adding and removing Ambari User Views

Enter the Ambari Dashboard URL and you should see a similar screen:

[Image: Ambari dashboard]

Click on (1) Metrics, Heatmap and Configuration, and then (2) the Dashboard, Services, Hosts, Alerts, Admin and User Views icon (represented by a 3×3 matrix) to become familiar with the Ambari resources available to you.

Section 3: New Users in Sandbox

Ambari 2.4 introduced the notion of Role-Based Access Control (RBAC) for the Ambari web interface. Ambari now includes additional cluster operation roles, providing more granular division of control over the Ambari Dashboard and the various Ambari Views. The image below illustrates the various Ambari roles. Only the admin ID has access to view or change these roles. Please refer to the HDP Ambari roles documentation for more information.

[Image: Ambari RBAC roles]

There are 4 user personas present in the Sandbox:

1. maria_dev – maria_dev is responsible for preparing data and getting insight from it. She loves to explore different HDP components like Hive, Pig, HBase, Phoenix, etc.

2. raj_ops – raj_ops is responsible for infrastructure builds and R&D activities such as design, installation, configuration, and administration. He serves as a technical expert in the area of system administration for complex operating systems.

3. holger_gov – holger_gov is primarily responsible for the management of data elements, both content and metadata. He has a specialist role that incorporates processes, policies, guidelines, and responsibilities for administering an organization’s entire data in compliance with policy and/or regulatory obligations.

4. amy_ds – A data scientist who uses Hive, Spark and Zeppelin to do exploratory data analysis, data cleanup and transformation as preparation for analysis.

Some notable differences between these users in the Sandbox are mentioned below:

Name (id) | Role | Services
Sam (admin) | Ambari Admin | Ambari
Raj (raj_ops) | Hadoop Warehouse Operator | Hive/Tez, Ranger, Falcon, Knox, Sqoop, Oozie, Flume, Zookeeper
Maria (maria_dev) | Spark and SQL Developer | Hive, Zeppelin, MapReduce/Tez/Spark, Pig, Solr, HBase/Phoenix, Sqoop, NiFi, Storm, Kafka, Flume
Amy (amy_ds) | Data Scientist | Spark, Hive, R, Python, Scala
Holger (holger_gov) | Data Steward | Atlas

OS Level Authorization

Name (id) | HDFS Authorization | Ambari Authorization | Ranger Authorization
Sam (admin) | Max Ops | Ambari Admin | Admin access
Raj (raj_ops) | Access to Hive, HBase, Atlas, Falcon, Ranger, Knox, Sqoop, Oozie, Flume, Operations | Cluster Administrator | Admin access
Maria (maria_dev) | Access to Hive, HBase, Falcon, Oozie and Spark | Service Operator | Normal user access
Amy (amy_ds) | Access to Hive, Spark and Zeppelin | Service Operator | Normal user access
Holger (holger_gov) | Access to Atlas | Service Administrator | Normal user access
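
To see these OS-level differences in practice, you can switch to one of the users and list their HDFS home directory; a minimal sketch, assuming the HDFS client is on the sandbox PATH and the user home directories exist as provisioned:

# From the sandbox terminal as root: become maria_dev and list her HDFS home
      su - maria_dev
      hdfs dfs -ls /user/maria_dev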

Other Differences

Name (id) | Sandbox Role | View Configurations | Start/Stop/Restart Service | Modify Configurations | Add/Delete Services | Install Components | Manage Users/Groups | Manage Ambari Views | Atlas UI Access | Sample Ranger Policy Access
Sam (admin) | Ambari Admin | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | NA
Raj (raj_ops) | Cluster Administrator | Yes | Yes | Yes | Yes | Yes | No | No | No | ALL
Maria (maria_dev) | Service Operator | Yes | Yes | No | No | No | No | No | No | SELECT
Amy (amy_ds) | Service Operator | Yes | Yes | No | No | No | No | No | No | SELECT
Holger (holger_gov) | Service Administrator | Yes | Yes | Yes | No | No | No | No | Yes | SELECT, CREATE, DROP

Do not forget to check out the scripts with which these users and their operations are created.

Section 4: Troubleshoot

Step 1: Troubleshoot Problems

Check Hortonworks Community Connection (HCC) for answers to problems you may come across during your Hadoop journey.

[Image: Hortonworks Community Connection main page]

1.1 Technique for Finding Answers in HCC

  • Insert quotes around your tutorial-related problem
  • Be specific by including keywords (error, tutorial name, etc.)

Further Reading