September 10, 2018

Die Open Hybrid Architecture Initiative

The concept of a modern data architecture has evolved dramatically over the past 10-plus years. Turn the clock back and recall the days of legacy data architectures, which had many constraints. Storage was expensive and had associated hardware costs. Compute often involved appliances and more hardware investments. Networks were expensive, deployments were only on-premises and proprietary software and hardware were locking in enterprises everywhere you turned.

This was (and for many organizations still is) a world of transactional silos where the architecture only allowed for post-transactional analytics of highly structured data. The weaknesses in these legacy architectures were exposed with the advent of new data types such as mobile and sensors, and new analytics such as machine learning and data science. Couple that with the advent of cloud computing and you have a perfect storm.

A multitude of interconnected factors disrupted that legacy data architecture era. Storage became cheaper and new software such as Apache Hadoop took center stage. Compute also went the software route and we saw the start of edge computing. Networks became ubiquitous and provided the planet with 3G/4G/LTE connectivity, deployments started to become hybrid and enterprises embraced open source software. This led to a rush of innovation as customer requirements changed, influencing the direction that vendors had to take to modernize the data architecture.

The emergence of cloud created the need to evolve again to take advantage of its unique characteristics such as de-coupled storage and compute. As a result, this led to connected data architectures, with the Hadoop ecosystem evolving for IaaS and PaaS models and innovations such as Hortonworks DataPlane Service (DPS) for connecting deployments in the data center and the public cloud.

Given that data has “mass” and is responsible for the rapid rise of cloud adoption, the data architecture must evolve again to meet the needs of today’s enterprises and take advantage of the unique benefits of cloud. So much more is required in a data architecture today to achieve our dreams of digital transformation, real-time analytics and artificial intelligence – to name just a few. This paves the way for pre-transaction analysis and drives use cases such as a 360-degree view of the customer. Organizations need a unified hybrid architecture for on-premises, multi-cloud and edge environments. The time has come to once again reimagine the data architecture, with hybrid as a key requirement.

What does it take to be hybrid? We’ve been innovating to answer this question for some time. Hybrid requires:

  • Cloud-native Hadoop for public cloud – delivered with Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) on IaaS.
  • Data flow and management to and from the edge – delivered with HDF, and specifically with MiNiFi.
  • Consistent security and data governance across all tiers – delivered with DPS.
  • A consistent architecture in the cloud and on-premises. This is the last mile.

The last point on consistent architectures is critical – not just from a technology standpoint, but because the differences manifest themselves fundamentally in how users interact with the technology. As an example, in the Hadoop ecosystem today, users walk up to a shared, multi-tenant cluster and simply submit their SQL queries, Spark applications and so on. In the cloud, however, users must provision their workloads – query instances, Spark clusters, etc. – before they can run analytics.

The Open Hybrid Architecture Initiative

Today, we are excited to announce the Open Hybrid Architecture initiative – the last mile of our endeavor to deliver on the promise of hybrid. This initiative is a broad effort across the open-source communities, the partner ecosystem and Hortonworks platforms to enable a consistent experience by bringing the cloud architecture on-premises for the enterprise.

Another key benefit is helping customers settle on a consistent architecture and interaction model which allows them to seamlessly move data and workloads across on-premises and multiple clouds using platforms such as DPS.

Through the initiative, we deliver an architecture where it will not matter where your data resides – in any cloud, on-premises or at the edge – enterprises can leverage open-source analytics in a secure and governed manner. The benefits of a consistent interaction model cannot be overstated; it is the key to unlocking a seamless experience.

The Open Hybrid Architecture initiative will make this possible by:

  • De-coupling storage, with both file system interfaces and an object-store interface to data.
  • Containerizing compute resources for elasticity and software isolation.
  • Sharing services for metadata, governance and security across all tiers.
  • Providing DevOps/orchestration tools for managing services/workloads via the “infrastructure as code” paradigm to allow spin-up/down in a programmatic manner.
  • Designating workloads specific to use cases such as EDW, data science, etc., rather than sharing everything in a multi-tenant Hadoop cluster.
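To make the spin-up/down model above concrete, here is a minimal sketch in Python. The `WorkloadOrchestrator` class and all of its method names are invented for this illustration – DPS exposes its own interfaces – but the pattern shown is the one the initiative describes: ephemeral, workload-specific clusters, each pinning its own engine version, provisioned and released programmatically rather than shared in one long-lived multi-tenant cluster.

```python
# Hypothetical sketch of the "infrastructure as code" spin-up/down pattern.
# WorkloadOrchestrator is an invented, in-memory stand-in for a real
# orchestration service; none of these names are actual DPS APIs.
import uuid


class WorkloadOrchestrator:
    def __init__(self):
        self._clusters = {}  # cluster_id -> workload description

    def spin_up(self, use_case, engine, version):
        """Provision an ephemeral, workload-specific cluster."""
        cluster_id = str(uuid.uuid4())
        self._clusters[cluster_id] = {
            "use_case": use_case,  # e.g. "edw", "data-science"
            "engine": engine,      # e.g. "hive", "spark", "nifi"
            "version": version,    # each workload pins its own version
        }
        return cluster_id

    def spin_down(self, cluster_id):
        """Release the cluster's compute once the workload completes."""
        self._clusters.pop(cluster_id)

    def running(self):
        return len(self._clusters)


# Two workloads run different engine versions side by side, then one
# releases its compute when finished -- no shared cluster contention.
orch = WorkloadOrchestrator()
etl = orch.spin_up("edw", "hive", "3.1")
ml = orch.spin_up("data-science", "spark", "2.3")
print(orch.running())  # 2
orch.spin_down(etl)
print(orch.running())  # 1
```

The design point is that clusters become disposable artifacts described in code, so different versions of Hive, Spark or NiFi can coexist without stepping on one another.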

So, What Happens Next?

After careful consideration, we’ve determined the best path forward is a phased approach, similar to how Hortonworks delivered enterprise-grade SQL queries-on-Hadoop via the Stinger and Stinger.Next initiatives.

The Open Hybrid Architecture initiative will include the following development phases:

  • Phase 1: Containerization of HDP and HDF workloads with DPS driving the new interaction model for orchestrating workloads by programmatic spin-up/down of workload-specific clusters (different versions of Hive, Spark, NiFi, etc.) for users and workflows.
  • Phase 2: Separation of storage and compute by adopting scalable file-system and object-store interfaces via the Apache Hadoop HDFS Ozone project.
  • Phase 3: Containerization for portability of big data services, leveraging technologies such as Kubernetes for containerized HDP and HDF. Red Hat and IBM partner with us on this journey to accelerate containerized big data workloads for hybrid. As part of this phase, we will certify HDP, HDF and DPS as Red Hat Certified Containers on Red Hat OpenShift, an industry-leading enterprise container and Kubernetes application platform. This allows customers to more easily adopt a hybrid architecture for big data applications and analytics, all with the common and trusted security, data governance and operations that enterprises require.
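The dual interfaces in Phase 2 can be pictured with a small sketch. The `DualInterfaceStore` below is an invented, in-memory stand-in – Ozone’s real interfaces are an HDFS-compatible file system and an S3-compatible object API – but it demonstrates the key property: the same bytes are reachable either as a file path or as a bucket/key pair, so each compute engine can attach through whichever interface it speaks.

```python
# Hypothetical in-memory sketch of Phase 2's decoupled storage: one blob
# of data exposed through both a file-system path and an object-store key.
# DualInterfaceStore and its methods are invented for illustration only.
class DualInterfaceStore:
    def __init__(self):
        self._blobs = {}  # canonical "bucket/key" string -> bytes

    @staticmethod
    def _canonical(bucket, key):
        return f"{bucket}/{key}"

    # Object-store style interface: addressed by bucket and key.
    def put_object(self, bucket, key, data):
        self._blobs[self._canonical(bucket, key)] = data

    def get_object(self, bucket, key):
        return self._blobs[self._canonical(bucket, key)]

    # File-system style interface: same data, addressed by a slash path.
    def read_file(self, path):
        return self._blobs[path.lstrip("/")]


store = DualInterfaceStore()
store.put_object("warehouse", "sales/2018/q3.csv", b"region,total\nemea,42\n")

# The same bytes, retrieved through either interface:
as_object = store.get_object("warehouse", "sales/2018/q3.csv")
as_file = store.read_file("/warehouse/sales/2018/q3.csv")
print(as_object == as_file)  # True
```

Because storage is addressed independently of any one compute cluster, containerized workloads from Phase 1 can come and go while the data stays put.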

Just as we enabled the modern data architecture with HDP and YARN back in the day, we’re at it again – but this time it’s bringing the innovation we’ve done in the cloud down to our products in the data center.

Hortonworks has been on a multi-year journey toward cloud-first and cloud-native architectures. The Open Hybrid Architecture initiative is the final piece of the puzzle. Not only will this initiative bring cloud-native to the data center, but it will also help our customers embrace and master the unified hybrid architectural model that is required to get the full benefits of on-premises, cloud and edge computing. We, along with our partner ecosystem and the open-source community, are excited to tackle this next redesign of the modern data architecture.


Srikar Ananthula says:

Interesting!

Craig says:

BlueData already delivers the consistent hybrid architecture shown in the diagram above. We can spin up multiple HDP clusters for different use cases and connect to existing HDFS data lake or S3

Debabrata Ghosh says:

Hi Arun,
A very nice article indeed. With respect to Phase 3 above , would you have some sample deployment architectures where HDP would have been installed on top of Kubernetes ? Or any best practices which you may share around this please ?

