In this blog series you will find quotes, background information, suggested further reading, and other information related to my latest book, SAP HANA 2.0, An Introduction, published by SAP Press.
As the goal of the book is to provide an introduction, we could not always spend as much time and as many pages on each topic as we would have liked. Big data is one such topic, although the book does include a short section covering SAP Data Hub, SAP Vora, and SAP HANA Hadoop Integration. In this blog, I cover big data topics in a bit more detail and include references to where you can find more information.
Any good? Post a comment, share on social media, and/or give a like. That’s how the community works. Thanks!
When business discovered big data, it was welcomed as the new (black) gold, but after five years of drilling and prospecting, not everyone remained enthusiastic.
Looking at web searches with Google Trends, we can see that interest in big data took off in 2012. Interest has since waned somewhat, and by 2018 big data had been overtaken by data science and machine learning. Who’s to blame? Cloud computing.
According to The Origins of ‘Big Data’: An Etymological Detective Story, the term goes back to the 1990s, but from a technical perspective, big data took shape between 2004 and 2008, when the search giants of the day, Google and Yahoo, laid the groundwork: Google published its designs for MapReduce and the Google File System, and Yahoo drove the development of the open-source implementations, Hadoop MapReduce and the Hadoop Distributed File System (HDFS). Pig, Hive, ZooKeeper, and other Apache open source projects followed (49 is the current count).
Big data was initially characterized by three V’s: volume, velocity, and variety. IBM added a fourth, veracity (The Four V’s of Big Data), and the list kept growing: the 5 Vs Everyone Must Know, The evolution of big data – the ‘6 Vs’, the Seven V’s of Big Data, the 10 Vs of Big Data, and SAP even went up to eleven by adding the V of Vora (more on that V below).
Illustration from The 42 V’s of Big Data and Data Science
Should you want to learn more, the Big data entry on Wikipedia provides as good an introduction as any (including a shady picture of the SAP Big Data bus).
The Path Forward
Real-Time Data Platform
With the acquisition of Sybase in 2010, SAP gained several big data-related technologies, such as IQ and the Event Stream Processor (ESP) for IoT (Internet of Things) data ingestion.
In 2012, SAP bundled SAP HANA with several of these technologies as the Real-Time Data Platform (RTDP).
SAP Real-Time Data Platform (2012)
A year later, in 2013, SAP acquired KXEN, the Knowledge eXtraction Engine, which had just brought InfiniteInsight to market for self-service predictive analytics, bringing data mining to business professionals, no PhD required. SAP InfiniteInsight would later morph into SAP Predictive Analytics, with the Automated Predictive Library (APL) providing SAP HANA integration. Although we would now file this under analytics, at the time data mining was seen as the way to unlock the value of big data.
For more information about data mining and advanced analytics, see
Smart Data Services
The same year, with the release of SAP HANA SPS 06, smart data access added data virtualisation to the SAP HANA platform, enabling direct access from SAP HANA to Hadoop and other remote data sources.
Other “smart” technologies followed a year later with SPS 09 (2014): Smart Data Streaming (later renamed Streaming Analytics), based on ESP; Dynamic Tiering (the name smart data tiering was also considered), a native big data solution based on IQ; and Smart Data Integration (SDI) and Smart Data Quality (SDQ), both based on BusinessObjects Data Services technology, to address the veracity of big data.
On the Bus
Also in 2013, SAP partnered with Hortonworks (now part of Cloudera) to resell big data platforms and started the Big Data Tour to get the developer community on the bus.
The following year, 2014, SAP added Spark integration together with a certified Spark distribution, which raised some questions about the future direction of SAP (HANA).
Big data integration went a step further with the release of SAP HANA Vora, announced at SAP TechEd 2015.
The name was later shortened to SAP Vora to underline that this was an independent product that did not require the SAP HANA platform (see the FAQ for your questions).
Big Data-as-a-Service (BDaaS)
In 2016, SAP acquired Altiscale and its Big Data-as-a-Service (BDaaS) solution, which was integrated as SAP Cloud Platform Big Data Services. Vora was added to the service, which brought more good news.
Cloud-native, Multi-cloud, and Hybrid
SAP Vora 2.0
For version 2.0, SAP Vora was re-architected to run inside Docker containers with Kubernetes for cluster management, providing customers “the flexibility to choose among cloud, on-premise and hybrid deployment models, and they can migrate between these options easily and with minimal disruption”.
SAP Data Hub
SAP Vora was now also included in another new containerised application, SAP Data Hub.
SAP Data Intelligence
In 2019, SAP Data Hub became available as a managed service under the name SAP Data Intelligence and, more recently (March 2020), the on-premise product and the cloud-based service were merged.
SAP HANA Cloud
SAP HANA Cloud, Data Lake
Also just released (March 2020) is SAP HANA Cloud: a single data gateway to all your data. This service includes SAP HANA data lake, where we find our old friend IQ at work.
SAP HANA Cloud uses the same container and Kubernetes orchestration technologies as Data Intelligence (and Vora).
Smart data access (virtualisation) plays an important role in the design of SAP HANA Cloud, and this includes, of course, access to the usual big data suspects, Hadoop and Spark, but also to Google BigQuery and Amazon Athena.
For more information, see
Learned Something New?
Post a comment, share on social media, and/or give a like. That’s how the community works. Thanks.
If you would like to receive updates, connect with me on
Denys van Kempen
SAP HANA 2.0 – An Introduction
Just getting started with SAP HANA? Or do you have a migration to SAP HANA 2.0 coming up? Need a quick update covering the business benefits and a technology overview? Want to understand the role of the system administrator, developer, data integrator, security officer, data scientist, data modeler, project manager, and other SAP HANA stakeholders? My latest book about SAP HANA 2.0 covers everything you need to know.
Get it from SAP Press or Amazon: