Python hana_ml: Store and load trained models (ModelStorage)

This post shows basic model management with the Python package hana_ml.  With the ModelStorage class, you can save and load trained models.  I also show State Enabled Real-Time Scoring Functions, which make prediction faster.

The environment is as follows.

  • Python: 3.7.13 (Google Colaboratory)
  • HANA: Cloud Edition 2022.16

Python packages and their versions.

  • hana_ml: 2.13.22072200
  • pandas: 1.3.5
  • scikit-learn: 1.0.2

On HANA Cloud, I activated the script server and created my users.  I am not aware of any other special configuration, but I may be missing something because our HANA Cloud instance was created a long time ago.

I didn’t use HDI here, to keep the environment simple.

Pre-requisites

Please see the article “Python hana_ml: PAL Classification Training(UnifiedClassification)” for the training process.  The code from step 1 “Install Python packages” through step 8 “Training” is exactly the same.  Steps 9, 10 and 11 of that article are not needed here; the steps below replace them.

9. Import modules

Import the additional Python modules needed for this article.

import pprint
from hana_ml.model_storage import ModelStorage

10. Save model

Save the model with the “save_model” method of class “ModelStorage”.

ms = ModelStorage(conn)
uc_rdt.name = 'Random Forest'
ms.save_model(model=uc_rdt, if_exists='replace')

Model metadata is stored in the table “HANAML_MODEL_STORAGE”, so both statements below return the same result.

display(ms.list_models())
display(conn.table('HANAML_MODEL_STORAGE').collect())

Let’s take a closer look at the contents.

pprint.pprint(ms.list_models().to_dict())

Although model metadata is stored in the table “HANAML_MODEL_STORAGE”, the model contents and other data are saved in the tables listed under “JSON -> artifacts”, which depend on the algorithm.  The help documentation says the following.

The back-end model. It consists in the model returned by SAP HANA APL or SAP HANA PAL. For SAP HANA APL, it is always saved into the table HANAMl_APL_MODELS_DEFAULT, while for SAP HANA PAL, a model can be saved into different tables depending on the nature of the specified algorithm.

{'CLASS': {0: 'hana_ml.algorithms.pal.unified_classification.UnifiedClassification'},
 'JSON': {0: '{"model_attributes": {"func": "RandomDecisionTree", '
             '"multi_class": null, "massive": false, "group_params": null, '
             '"kwargs": {"n_estimators": 10, "max_depth": 10}}, "fit_params": '
             '{"key": "ID", "features": null, "label": null, "group_key": '
             'null, "group_params": null, "purpose": null, "partition_method": '
             '"stratified", "stratified_column": "CLASS", '
             '"partition_random_state": null, "training_percent": 0.8, '
             '"training_size": null, "ntiles": 2, "categorical_variable": '
             'null, "output_partition_result": null, "background_size": null, '
             '"background_random_state": null, "build_report": true, "impute": '
             'false, "strategy": null, "strategy_by_col": null, "als_factors": '
             'null, "als_lambda": null, "als_maxit": null, "als_randomstate": '
             'null, "als_exit_threshold": null, "als_exit_interval": null, '
             '"als_linsolver": null, "als_cg_maxit": null, "als_centering": '
             'null, "als_scaling": null, "kwargs": {}}, "artifacts": '
             '{"schema": "I348221", "model_tables": '
             '["HANAML_RANDOM_FOREST_2_MODELS_0", '
             '"HANAML_RANDOM_FOREST_2_MODELS_1", '
             '"HANAML_RANDOM_FOREST_2_MODELS_2", '
             '"HANAML_RANDOM_FOREST_2_MODELS_3", '
             '"HANAML_RANDOM_FOREST_2_MODELS_4", '
             '"HANAML_RANDOM_FOREST_2_MODELS_5"], "library": "PAL"}, '
             '"pal_meta": {"_fit_param": [["FUNCTION", "RDT", "string"], '
             '["KEY", 1, "integer"], ["N_ESTIMATORS", 10, "integer"], '
             '["MAX_DEPTH", 10, "integer"], ["PARTITION_METHOD", 2, '
             '"integer"], ["PARTITION_STRATIFIED_VARIABLE", "CLASS", '
             '"string"], ["PARTITION_TRAINING_PERCENT", 0.8, "float"], '
             '["NTILES", 2, "integer"], ["HANDLE_MISSING_VALUE", 0, '
             '"integer"], ["CATEGORICAL_VARIABLE", "CLASS", "string"]], '
             '"fit_data_struct": {"ID": "INT", "X1": "DOUBLE", "X2": "DOUBLE", '
             '"X3": "DOUBLE", "CLASS": "INT"}, "label": "CLASS"}}'},
 'LIBRARY': {0: 'PAL'},
 'MODEL_STORAGE_VER': {0: 1},
 'NAME': {0: 'Random Forest'},
 'SCHEDULE': {0: '{"schedule": {"status": "inactive", "schedule_time": "every '
                 '1 hours", "pid": null, "client": null, "connection": '
                 '{"userkey": "your_userkey", "encrypt": "false", '
                 '"sslValidateCertificate": "true"}, "hana_ml_obj": '
                 '"hana_ml.algorithms.pal.xxx", "init_params": {}, '
                 '"fit_params": {}, "training_dataset_select_statement": '
                 '"SELECT * FROM YOUR_TABLE"}}'},
 'STORAGE_TYPE': {0: 'default'},
 'TIMESTAMP': {0: Timestamp('2022-09-07 06:54:10')},
 'VERSION': {0: 2}}

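To make the “JSON -> artifacts” part more concrete, here is a minimal sketch that pulls the schema and the model table names out of the JSON column.  The string is an abridged copy of the metadata printed above (most fit parameters and four of the six tables are left out), and the helper name artifact_tables is my own, not part of hana_ml.

```python
import json

# Abridged copy of the 'JSON' value from the metadata output;
# only the keys needed for this sketch are kept.
metadata_json = (
    '{"model_attributes": {"func": "RandomDecisionTree", '
    '"kwargs": {"n_estimators": 10, "max_depth": 10}}, '
    '"artifacts": {"schema": "I348221", "model_tables": ['
    '"HANAML_RANDOM_FOREST_2_MODELS_0", '
    '"HANAML_RANDOM_FOREST_2_MODELS_1"], "library": "PAL"}}'
)


def artifact_tables(json_text):
    """Return schema-qualified names of the tables that hold the model content.

    Hypothetical helper for illustration; it only parses the metadata string.
    """
    artifacts = json.loads(json_text)["artifacts"]
    schema = artifacts["schema"]
    return ['"{}"."{}"'.format(schema, table) for table in artifacts["model_tables"]]


tables = artifact_tables(metadata_json)
print(tables)
```

With a live connection, the same string should be available from the “JSON” column of ms.list_models(), which is how the dictionary above was produced.
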
11. Load model

Now load the model with the “load_model” method.  The call to create_model_state creates a model state for State Enabled Real-Time Scoring Functions, so that repeated predictions can reuse it.

saved_model = ms.load_model(name='Random Forest')
saved_model.create_model_state()

12. Predict with loaded model

Call the “predict” method to run prediction with the loaded model.

df_pred = saved_model.predict(test, key='ID')
print(df_pred.collect())
        ID  SCORE  CONFIDENCE  \
0        9      0         1.0
1       13      1         1.0
2       14      0         1.0
3       16      1         0.8
4       20      0         1.0
...    ...    ...         ...
1995  9988      1         1.0
1996  9990      0         1.0
1997  9996      1         1.0
1998  9998      0         0.8
1999  9999      1         1.0

                                            REASON_CODE
0     [{"attr":"X2","pct":81.0,"val":-0.350732473499...
1     [{"attr":"X2","pct":89.0,"val":-0.546387864002...
2     [{"attr":"X2","pct":82.0,"val":-0.367046185280...
3     [{"attr":"X2","pct":76.0,"val":-0.221394522848...
4     [{"attr":"X2","pct":88.0,"val":-0.470017154574...
...                                                 ...
1995  [{"attr":"X2","pct":90.0,"val":-0.490175736690...
1996  [{"attr":"X2","pct":71.0,"val":-0.333635163456...
1997  [{"attr":"X2","pct":94.0,"val":-0.510854084253...
1998  [{"attr":"X2","pct":48.0,"val":-0.140319048941...
1999  [{"attr":"X2","pct":97.0,"val":-0.498180631259...

[2000 rows x 4 columns]
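The REASON_CODE column holds a JSON array per row, so it can be parsed with the standard json module.  The sketch below picks the attribute with the largest “pct” value.  The field names (attr, pct, val) come from the output above, but the sample numbers are made up because the real values are truncated in the printout, and top_reason is a hypothetical helper name.

```python
import json

# Hypothetical REASON_CODE value: field names follow the prediction output,
# the numbers are invented for illustration only.
sample_reason_code = (
    '[{"attr": "X2", "pct": 81.0, "val": -0.35}, '
    '{"attr": "X1", "pct": 12.0, "val": 0.05}, '
    '{"attr": "X3", "pct": 7.0, "val": -0.03}]'
)


def top_reason(reason_code):
    """Return (attribute, pct) of the entry with the largest 'pct'."""
    entries = json.loads(reason_code)
    best = max(entries, key=lambda entry: entry["pct"])
    return best["attr"], best["pct"]


print(top_reason(sample_reason_code))
```

On the collected pandas DataFrame, a function like this could be applied to the REASON_CODE column row by row.
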

13. Delete model state and close connection

Delete the model state and close the HANA connection.  If you are only testing and no longer need any of the models, the clean_up function deletes all of them.

saved_model.delete_model_state()
#ms.clean_up()
conn.close()
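If prediction raises an error, the delete_model_state call above is never reached.  One way to guard against that is a small context manager, sketched below.  It assumes only that the model object has the create_model_state and delete_model_state methods used in this article; model_state and FakeModel are my own names, and FakeModel is just a stand-in so the sketch runs without a HANA connection.

```python
from contextlib import contextmanager


@contextmanager
def model_state(model):
    """Create a model state on entry and always delete it on exit."""
    model.create_model_state()
    try:
        yield model
    finally:
        # Runs even if the body of the with-block raises.
        model.delete_model_state()


class FakeModel:
    """Stand-in for a loaded hana_ml model; records which methods were called."""

    def __init__(self):
        self.calls = []

    def create_model_state(self):
        self.calls.append("create")

    def predict(self, data, key):
        self.calls.append("predict")
        return data

    def delete_model_state(self):
        self.calls.append("delete")


fake = FakeModel()
with model_state(fake) as m:
    m.predict([1, 2, 3], key="ID")
print(fake.calls)
```

With the real objects from this article, the body of the with-block would be the saved_model.predict(test, key='ID') call from step 12.
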