SAP HANA stands for High Performance Analytical Appliance. This flexible appliance (combination of hardware and software) runs independently of the data source and can be used to analyze large data volumes in real time within the main memory (in-memory technology).
Software Components of SAP HANA
- SAP HANA database
- SAP HANA Studio
- SAP HANA Client
- SAP HANA Application Function Libraries (AFL, an optional component)
Software for data replication
- SAP LT Replication Add-on and Server
- SAP HANA Direct Extractor Connection (DXC)
- SAP BusinessObjects Data Services
Software for direct data preparation
- SAP HANA Client Package for Microsoft Excel
- SAP HANA User Interface for Information Access (INA)
- SAP HANA Information Composer
Lifecycle management components
- SAP Host Agent
- Software Update Manager for SAP HANA
- SAP Solution Manager Diagnostics Agent
Structure of the above-mentioned SAP HANA components
SAP HANA Database
As a full relational database, SAP HANA provides functions similar to other relational ("traditional") databases that are supported by SAP. Like these traditional databases, SAP HANA provides functions for data backup and recovery, supports the SQL standard (SQL-92 entry level and some SQL:1999 extensions), and guarantees data consistency by following the ACID principle (atomicity, consistency, isolation, durability) when executing transactions.
In contrast to other relational databases, SAP HANA can place all relevant business data in the main memory. It combines row-, column-, and object-based database technologies and is optimized for the parallel processing capabilities of modern hardware, so multi-core and multi-CPU architectures can be used to their fullest potential. The SAP HANA database also provides its own programming language, SQLScript.
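To make the row/column distinction concrete, the following is a minimal sketch (plain Python lists, not SAP HANA internals, with invented sample data): the same table laid out row-wise and column-wise, and an aggregation that only needs to touch one column.

```python
# Illustrative sketch: the same table in a row-based and a
# column-based layout. Sample data is invented.

rows = [
    ("Smith", 1200),   # (name, revenue)
    ("Jones", 800),
    ("Kumar", 1500),
]

# Row-based layout: all values of one record are adjacent.
row_store = [value for record in rows for value in record]

# Column-based layout: all values of one column are adjacent,
# so a scan or aggregation reads one contiguous run of memory.
col_name = [r[0] for r in rows]
col_revenue = [r[1] for r in rows]

total = sum(col_revenue)  # touches only the revenue column
print(total)  # 3500
```

In the row layout, summing the revenue column would have to skip over every name; in the column layout the values sit back to back, which is exactly what makes columnar scans cache-friendly.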
SAP HANA Studio
SAP HANA Studio comprises the administration and development environment.
Note: Eclipse and its Significance for SAP.
Eclipse is a platform for development tools and environments (e.g., for Java, C/C++, or PHP). It is maintained and further developed by the Eclipse Foundation (see http://eclipse.org).
In addition to SAP HANA Studio and the ABAP Development Tools for SAP NetWeaver, the following SAP development environments are based on Eclipse:
- SAP NetWeaver Developer Studio (Java)
- SAP Eclipse Tools for SAP HANA Cloud Platform
- SAP UI Development Tools for HTML5
- SAP NetWeaver Gateway Plug-in for Eclipse
One of the main advantages of the Eclipse platform is the ability to integrate different tools into one installation so that the user benefits from a homogeneous development environment.
SAP HANA Studio: usage areas
- Starting and stopping database services
- Monitoring the system
- Specifying system settings
- Maintaining users and authorizations
- Configuring the audit log
SAP HANA Client
Using the SAP HANA Client, you can connect to the SAP HANA database via a network protocol. The following standards are supported.
- ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) for SQL-based access
- ODBO (OLE DB for OLAP) for MDX-based access (multi-dimensional expressions).
As the Eclipse platform is Java-based, SAP HANA Studio uses the JDBC client to establish the connection. This variant is also used in Java-based application servers.
The SAP NetWeaver Application Server (AS) ABAP uses the so-called Database Specific Library (DBSL) (which is embedded in the SQLDBC client) to connect to the SAP HANA database.
Special BI clients (business intelligence), such as add-ins for Microsoft Excel, typically use MDX-based access for multi-dimensional queries that are executed via the ODBO client.
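Besides JDBC and ODBC, SAP also ships a Python driver (hdbcli) with the SAP HANA Client. The sketch below only assembles connection parameters; the host, port, user, and password are placeholders, not values from this text, and the actual connect call is left commented out because it requires a reachable SAP HANA system.

```python
# Hypothetical connection settings for the SAP HANA Client's Python
# driver. All values are placeholders.
conn_params = {
    "address": "hana.example.com",  # hypothetical host name
    "port": 30015,                  # conventional SQL port for instance 00
    "user": "DEMO_USER",
    "password": "secret",
}

# With the hdbcli package installed and a running system, the
# connection would be opened roughly like this:
# from hdbcli import dbapi
# conn = dbapi.connect(**conn_params)
# cursor = conn.cursor()
# cursor.execute("SELECT * FROM DUMMY")

print(sorted(conn_params))
```

The SQL port follows the pattern 3&lt;instance number&gt;15, so instance 00 listens on 30015.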
SAP HANA Function Libraries
The functional scope of SAP HANA can be extended using special function libraries (Application Function Libraries, AFL) written in C++. With the current release level SAP HANA SPS5, these libraries must be installed manually using the SAP HANA on-site configuration tool after installing the database.
SAP HANA currently provides two application function libraries: the Business Function Library (BFL) with its own standard business functions, and the Predictive Analysis Library (PAL) for data mining and predictions based on existing historical data.
Software for Data Replication
For many application scenarios, you must use data from existing systems in SAP HANA. The process of first replicating data structures and then an existing data set (initial load) is called data replication. If the data is subsequently changed in the original system (for example, after creating a new business partner), the mirrored data is updated as well (delta load). The existing systems can be systems of the SAP Business Suite, SAP NetWeaver BW, or any other data source.
Depending on the data source and usage scenario, different mechanisms and tools can be used for replication.
To benefit from these hardware trends, SAP has been working in close cooperation with hardware manufacturers during the development of SAP HANA. Consequently, the SAP HANA database currently only runs on hardware certified by SAP.
Current hard disks rotate at 15,000 rpm. Assuming that the disk needs half a rotation on average per access, these 0.5 rotations alone take two milliseconds. On top of this come the time for positioning the read/write head and the transfer time, resulting in a total of about six to eight milliseconds.
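The arithmetic behind these figures can be checked in a few lines. Only the 15,000 rpm value comes from the text; the seek-and-transfer figure below is an assumed, illustrative value chosen to land in the stated six-to-eight-millisecond range.

```python
# Rotational latency of a 15,000 rpm disk, as computed in the text.
rpm = 15_000
seconds_per_rotation = 60 / rpm                      # 0.004 s = 4 ms
avg_rotational_latency_ms = 0.5 * seconds_per_rotation * 1000
print(avg_rotational_latency_ms)  # 2.0

# Assumed head-positioning plus transfer time (illustrative value,
# not from a datasheet).
seek_and_transfer_ms = 4.5
total_ms = avg_rotational_latency_ms + seek_and_transfer_ms
print(total_ms)  # 6.5, within the stated 6-8 ms range
```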
When using flash memory, no mechanical parts need to be moved, which results in access times of about 200 microseconds. In SAP HANA, performance-critical data is placed in this type of memory and then loaded into the main memory.
Access to the main memory (or DRAM, dynamic random access memory) is even faster. Typical access times are 60 to 100 nanoseconds; the exact access time depends on the access location within memory. With the NUMA architecture (non-uniform memory access) used in SAP HANA, a processor can access its own local memory faster than memory that is within the same system but managed by other processors. With the currently certified systems, this memory area has a size of up to four TB.
Access times to caches in the CPU are usually indicated in clock ticks. For a CPU with a clock speed of 2.4 GHz, one cycle takes about 0.42 nanoseconds. The hardware certified for SAP HANA uses three caches, referred to as L1 to L3 cache. The L1 cache can be accessed in three to four clock ticks, the L2 cache in about ten clock ticks, and the L3 cache in about 40 clock ticks. The L1 cache has a size of 64 KB, the L2 cache 256 KB, and the L3 cache 30 MB. Each server comprises only one L3 cache, which is used by all CPUs, while each CPU has its own L1 and L2 caches. This is illustrated in the diagram below.
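These cache latencies can be derived directly from the clock speed. The sketch below uses only the figures given above (2.4 GHz, and roughly 4, 10, and 40 ticks for L1 to L3) and compares them against the 60 to 100 nanoseconds of a main memory access.

```python
# Cache access times derived from the figures in the text.
clock_hz = 2.4e9
cycle_ns = 1 / clock_hz * 1e9
print(round(cycle_ns, 2))  # ~0.42 ns per clock tick

# Approximate latency per cache level, in clock ticks (from the text).
ticks = {"L1": 4, "L2": 10, "L3": 40}
latency_ns = {level: t * cycle_ns for level, t in ticks.items()}
# L1 ~1.7 ns, L2 ~4.2 ns, L3 ~16.7 ns: all well below the
# 60-100 ns of a main memory access.
print({level: round(ns, 1) for level, ns in latency_ns.items()})
```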
Main memory as the new bottleneck
When sizing an SAP HANA system, enough capacity should be assigned to place all data in the main memory so that all reading accesses can usually be executed on this memory. When accessing the data for the first time (e.g., after starting the system), the data is loaded into the main memory. You can also manually or automatically unload the data from the main memory. This can be necessary if, for example, the system tries to use more memory than is available.
In the past, access to the hard disk was usually the performance bottleneck; with SAP HANA, this has shifted. Even though main memory accesses are up to 100,000 times faster than hard-disk accesses, they are still 4 to 60 times slower than accesses to the CPU caches, which is why the main memory is the new bottleneck for SAP HANA.
The algorithms in SAP HANA are implemented in such a way that they can work directly with the L1 cache in the CPU wherever possible. Data transport from the main memory to the CPU caches must therefore be kept to a minimum—which has major effects on the software innovations described in the next section.
The software innovations in SAP HANA make optimal use of the previously described hardware. This is done in two ways: by keeping the data transport between the main memory and the CPU caches to a minimum (e.g., by means of compression), and by fully leveraging the CPUs using parallel threads for data processing.
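A minimal sketch of the compression idea (dictionary encoding, with invented sample data, not SAP HANA's actual implementation): a column with repeated values is replaced by a small dictionary plus integer value IDs, so far less data has to travel between the main memory and the CPU caches.

```python
# Sketch of dictionary compression for one column. Sample data is
# invented for illustration.
column = ["red", "blue", "red", "red", "green", "blue", "red"]

# The dictionary holds each distinct value once, sorted.
dictionary = sorted(set(column))           # ["blue", "green", "red"]

# The column itself is stored as small integer IDs into the dictionary.
value_ids = [dictionary.index(v) for v in column]

print(dictionary)   # ["blue", "green", "red"]
print(value_ids)    # [2, 0, 2, 2, 1, 0, 2]
```

Instead of seven strings, only three strings plus seven small integers are stored and moved, and the integer IDs can additionally be bit-packed.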
SAP HANA provides software optimizations in the following areas:
- Data layout in the main memory
Data Layout in the Main Memory
In every relational database, the entries of a database table must be stored in a certain data layout.
Let’s now take a look at the third area of software innovation: partitioning. Partitioning is used whenever very large quantities of data must be maintained and managed.
Advantages of partitioning
This technique greatly facilitates data management for database administrators. A typical task is the deletion of data (such as after an archiving operation was completed successfully). There is no need to search large amounts of information for the data to be deleted; instead, database administrators can simply remove an entire partition. Moreover, partitioning can increase application performance.
There are basically two technical variants of partitioning:
- With vertical partitioning, tables are divided into smaller sections on a column basis. For a table with seven columns, columns 1 to 5 could perhaps be stored in one partition, while columns 6 and 7 are stored in a different partition.
- With horizontal partitioning, tables are divided into smaller sections on a row basis. Rows 1 to 1,000,000 are then perhaps stored in one partition, while rows 1,000,001 to 2,000,000 are placed in another partition.
SAP HANA supports only horizontal partitioning. The data in a table is distributed across different partitions on a row basis, while the records within the partitions are stored on a column basis.
The diagram below shows how horizontal partitioning is used for a table with the two columns Name and Gender in case of column-based data storage. On the left side, the table is shown with a dictionary vector (DV) and an attribute vector (AV) for both the column Name and the column Gender. On the right side, the data was partitioned using the round-robin technique, which will be explained in more detail next. The consecutive rows were distributed alternately across two partitions (the first row was stored in the first partition, the second row in the second partition, the third row again in the first partition, and so on).
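The diagram's scheme can be sketched in a few lines of Python: rows of a two-column table are distributed round-robin across two partitions, and each partition then dictionary-encodes its columns into a dictionary vector (DV) and an attribute vector (AV). The row values are invented for illustration; only the Name/Gender schema and the round-robin rule come from the text.

```python
# Sketch of round-robin horizontal partitioning with per-partition
# column encoding. Sample rows are invented.
rows = [("Miller", "m"), ("Scott", "f"), ("Adams", "m"), ("Baker", "f")]

def encode(values):
    """Dictionary-encode one column: sorted dictionary vector (DV)
    plus an attribute vector (AV) of value IDs."""
    dv = sorted(set(values))
    av = [dv.index(v) for v in values]
    return dv, av

# Round-robin: row i goes to partition i % 2.
partitions = [rows[0::2], rows[1::2]]

for part in partitions:
    names, genders = zip(*part)
    name_dv, name_av = encode(names)
    gender_dv, gender_av = encode(genders)
    print(name_dv, name_av, gender_dv, gender_av)
```

Each partition is self-contained (its own DV and AV per column), which is what allows partitions to be scanned, moved, or dropped independently.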