Dropping multiple HDI containers asynchronously with SAP HANA Client for Python

This blog demonstrates a simple use of SAP Client for Python with SAP HANA HDI SQL APIs (stored procedures) in an on-premise SAP HANA Database.

TL;DR: A simple example of a Python wrapper on top of the SAP HANA Client for Python (https://pypi.org/project/hdbcli/) is provided in the Git repository https://github.com/nklinked/hana-task-runner with one predefined example for containers deletion. Besides the given example, you can extend the functionality to cover your own use cases. Please refer to the great blog Secure connection from Python to SAP HANA by Witalij Rudnicki to address the security concerns.

The scenario led to such a requirement is quite unusual and most likely will not be met every time when developing HANA Native applications with XS Advanced. The main characteristics of the addressed problem are as following:

  • You develop database artifacts in SAP Web IDE for SAP HANA XS Advanced using HDI containers (SAP HANA Database Modules). Actively building application, you noticed that the number of HANA service instances, instantiated by the Web IDE for development, has significantly grown.
  • You learned that you must manually deallocate resources consumed by development versions of applications based on SAP Web IDE for SAP HANA – Known Issues. Based on the same document, you learned that you must keep the number of HDI containers, created in a single space and bound to di-builder, less than 120.
  • Attempting to delete development HANA service instances with XS CLI, you noticed that the most of them have beed deleted normally and only a few of them failed with a timeout error. Further attempts to delete those service instances remained unsuccessful with the same error.
  • Attempting to directly drop respective HDI containers you observed that the respective procedure call has taken several hours in the database. In my case this process has taken from 2 to 20 hours. Depending on the number of such containers, sequential deletion may take significant time.
  • After dropping such an HDI container in the database, you successfully removed the respective HANA service instance and its bindings from the space in XS Advanced.
  • Given the circumstances, you decided to find a way to delete problematic HDI containers in parallel without opening multiple SQL Consoles in SAP HANA Studio.

The HDI SQL APIs require the respective permissions to be granted to the invoking user. The provided example requires a database user with assigned HDI Container Group Administrator privileges for all needed Container Groups, e.g., for the default XSA-managed Container Group _SYS_DI#SYS_XS_HANA_BROKER.

Please review following steps provided as a reference to fulfill the requirements (please note that the list of granted permissions can be further narrowed down). The complete and comprehensive guidelines to manage SAP HANA HDI Containers are provided in SAP HANA Deployment Infrastructure (HDI) Reference for SAP HANA Platform.

1. User SYSTEM creates HDI Administrator and HDI Container Group Administrator users:

CREATE USER HDI_ADMIN PASSWORD YOUR_PASSWORD NO FORCE_FIRST_PASSWORD_CHANGE;
CREATE USER CONTAINER_GROUP_ADMIN PASSWORD YOUR_PASSWORD NO FORCE_FIRST_PASSWORD_CHANGE;

2. User SYSTEM grants HDI administration permissions to HDI Administrator:

CREATE LOCAL TEMPORARY TABLE #PRIVILEGES LIKE _SYS_DI.TT_API_PRIVILEGES;
INSERT INTO #PRIVILEGES (PRINCIPAL_NAME, PRIVILEGE_NAME, OBJECT_NAME) SELECT 'HDI_ADMIN', PRIVILEGE_NAME, OBJECT_NAME FROM _SYS_DI.T_DEFAULT_DI_ADMIN_PRIVILEGES;
CALL _SYS_DI.GRANT_CONTAINER_GROUP_API_PRIVILEGES('_SYS_DI', #PRIVILEGES, _SYS_DI.T_NO_PARAMETERS, ?, ?, ?);
DROP TABLE #PRIVILEGES;

3. User HDI_ADMIN (HDI Administrator) grants administration permissions for container group SYS_XS_HANA_BROKER to user CONTAINER_GROUP_ADMIN (HDI Container Group Administrator):

CREATE LOCAL TEMPORARY COLUMN TABLE #PRIVILEGES LIKE _SYS_DI.TT_API_PRIVILEGES;
INSERT INTO #PRIVILEGES (PRINCIPAL_NAME, PRIVILEGE_NAME, OBJECT_NAME) SELECT 'CONTAINER_GROUP_ADMIN', PRIVILEGE_NAME, OBJECT_NAME FROM _SYS_DI.T_DEFAULT_CONTAINER_GROUP_ADMIN_PRIVILEGES;
CALL _SYS_DI.GRANT_CONTAINER_GROUP_API_PRIVILEGES('SYS_XS_HANA_BROKER', #PRIVILEGES, _SYS_DI.T_NO_PARAMETERS, ?, ?, ?);
DROP TABLE #PRIVILEGES;

4. User CONTAINER_GROUP_ADMIN (HDI Container Group Administrator) can be used to run operations on container group SYS_XS_HANA_BROKER as an administrator.

Background

Implementing the deletion we simply need to reflect the following process using the hdbcli:

CREATE LOCAL TEMPORARY COLUMN TABLE #DROP_CONTAINER_PARAMETERS LIKE _SYS_DI.TT_PARAMETERS;
INSERT INTO #DROP_CONTAINER_PARAMETERS ( KEY, VALUE ) VALUES ( 'ignore_work', 'true' );
INSERT INTO #DROP_CONTAINER_PARAMETERS ( KEY, VALUE ) VALUES ( 'ignore_deployed', 'true' );
CALL _SYS_DI#G.DROP_CONTAINER('C', #DROP_CONTAINER_PARAMETERS, ?, ?, ?);
DROP TABLE #DROP_CONTAINER_PARAMETERS; 

Considering the requirement to run parallel operations on the given list of containers, we may design this high-level process flow:

The%20principal%20process%20schema

Project Structure

The sample project, being described, is structured as following:

hana-task-runner
├── components
│ ├── client.py
│ └── runner.py
├── config.yaml
├── requirements.txt
└── task_runner.py

Where:

  • ./components/ stores the implementation of the HANA Client wrapper and the asynchronous tasks runner
  • ./config.yaml stores client and database configurations and the list of containers to be processed
  • ./requirements.txt stores the reference to dependencies
  • ./task_runner.py is the main file of the application

Main Methods

The configuration file config.yaml describes the connection details, logging and processing settings and the list of the containers to be processed:

connection: indexserver_hostname: hostname # The indexserver hostname indexserver_port: 30015 # The indexserver SQL port container_group_admin: USER # The user with Container Group Administrator password: PLAIN_PASSWORD # The password encrypt: False # Keep False by default for not SSL-enforced connections sslValidateCertificate: False # Keep False by default client_config: output_dir: ./output logging_level: INFO # CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET processing: max_concurrency: 3 # The allowed number of parallel operations operations: _SYS_DI#SYS_XS_HANA_BROKER: # The container group name delete: # The operation to be performed, supported options: delete - NAMED_CONTAINER # The container name

The main methods to interact with the database are implemented in class Client (./components/client.py). We will call get_database_connection() every time to instantiate a new connection to the database in every parallel thread:

 def get_database_connection(self): try: configuration = self.config['connection'] connection = dbapi.connect( address = configuration['indexserver_hostname'], port = configuration['indexserver_port'], user = configuration['container_group_admin'], password = configuration['password'], encrypt = configuration['encrypt'], sslValidateCertificate = configuration['sslValidateCertificate'], communicationTimeout = 0) except Exception as e: # pylint: disable=invalid-name address = configuration['indexserver_hostname'] port = configuration['indexserver_port'] user = configuration['container_group_admin'] logging.error( (f'Failed to establish connection to indexserver on {address}:{port} ' f'with user {user}'), exc_info=e) return None else: return connection

We will call drop_container() providing the container group name and the container name to invoke the required HDI SQL procedures:

def drop_container(self, container_group_name, container_name): try: connection = self.get_database_connection() cursor = connection.cursor() except Exception as e: # pylint: disable=invalid-name logging.error( ('Failed to establish connection to database when deleting ' f'container {container_name} from group {container_group_name}'), exc_info=e) else: try: logging.info((f'Started dropping container {container_name} ' f'from group {container_group_name}')) cursor.execute(("create local temporary column table #drop_container_parameters " "like _sys_di.tt_parameters;")) cursor.execute(("insert into #drop_container_parameters ( key, value ) " "values ( 'ignore_errors', 'true' );")) cursor.execute(("insert into #drop_container_parameters ( key, value ) " "values ( 'ignore_work', 'true' );")) cursor.execute(("insert into #drop_container_parameters ( key, value ) " "values ( 'ignore_deployed', 'true' );")) cursor.execute((f"call {container_group_name}.drop_container('{container_name}', " "#drop_container_parameters, ?, ?, ?);")) keys = [ x[0] for x in cursor.description] rows = cursor.fetchall() result_set = pd.DataFrame(rows, columns=keys) pretty_printed_results = tabulate(result_set, headers='keys', tablefmt='psql') logging.info( ( 'Results of executing SQL: ' f'call {container_group_name}.drop_container(\'{container_name}\', ' f'#drop_container_parameters, ?, ?, ?))\n' f'{pretty_printed_results}' ) ) cursor.execute('drop table #drop_container_parameters;') logging.info((f'Dropped container {container_name} ' f'from group {container_group_name}')) except Exception as e: # pylint: disable=invalid-name logging.error(( (f'Failed to drop container {container_name} ' f'from group {container_group_name}') ),exc_info=e) finally: cursor.close() connection.close()

Finally, we define class Runner (./components/runner.py) to implement methods to handle asynchronous tasks wrapped as partials, prepared function calls with given arguments:

import asyncio
import logging
import time class Runner: def __init__(self, sem_value): self.sem_value = sem_value async def execute_task(self, semaphore, partial): loop = asyncio.get_running_loop() async with semaphore: await loop.run_in_executor(None, partial) async def execute_worklist(self, loop, partials): semaphore = asyncio.Semaphore(value=self.sem_value) worklist = [] for partial in partials: worklist.append(self.execute_task(semaphore, partial)) logging.info(f'Limited to {self.sem_value} parallel tasks') await asyncio.wait(worklist) def run(self, partials): start = time.time() loop = asyncio.get_event_loop() logging.info('Started parallel processing') loop.run_until_complete(self.execute_worklist(loop, partials)) loop.close() end = time.time() logging.info(f'Finished parallel processing in {end-start}s')

Running the main program the configuration file (./config.yaml) will be read and required settings will be applied. Execution logs as well as the tabular output of invoked procedures will be stored in the log file in the specified directory.

After running the deletion on the database side, please consider the following required steps:

  1. Clean-up the respective HANA service instances from your development spaces in XS Advanced.
  2. Run XSA check-services on the XS Advanced host to identify any inconsistent service instance or service binding.
  3. Running the deletion be cautious, the deletion actually deletes the database objects.

Generally, the blog provided a brief overview of a Python-based application based on the SAP HANA Client and utilizing HDI SQL APIs. Please refer to the complete example and more comprehensive documentation in the repository https://github.com/nklinked/hana-task-runner.