SAP HANA, graph option

Today we will learn about the fundamentals of Graph data structure and how with the latest SAP HANA Advanced, graph option we can solve network related problems.

What is a Graph?

A Graph is a non-linear data structure. A Graph has nodes containing information and properties of object we are trying to abstract and edges which are essentially connection between two nodes.

For example, a family hierarchy can be visualised as a Graph with persons and the family as nodes and relationships with other family members as Edges between different nodes.

Graphs are also used in social networks like LinkedIn, Facebook.

A graph contains vertices/ nodes and connection between nodes called Edges.

Below we will see a sample of how we create Edges, vertices and Graph workspace in SAP HANA.

Visualising SAP HANA, graph

Once we have defined Edges, vertices and Graph, we can visualise the data as a Graph in SAP HANA.

Navigate to your database container.

Expand your database container.

Select graph workspace.

On selection you can see the above created graph workspace in the object list below.

Right click on graph workspace and select “View Graph”.

Once you open the Graph you can see all the vertices and edges between them.

The SAP HANA, graph options supports graph script – a programming model for developing custom graph algorithms.

Graph script provides some built-in functions and algorithms.

  • Neighbors
  • Strongly Connected Components
  • Breadth First Search
  • Shortest Path One-to-One
  • Shortest Path One-to-All

Below is a sample implementation for Neighbors function.

*************************************/ -- GraphScript functions and algorithms - Neighbors function -- running in version SAP HANA 2.0 SPS04 /*************************************/ /*************************************/ -- tables and workspace DROP SCHEMA "GRAPHSCRIPT" CASCADE; -- Temporary schema CREATE SCHEMA "GRAPHSCRIPT"; --Vertex/Node table definition containing list of all nodes in the Graph CREATE COLUMN TABLE "GRAPHSCRIPT"."NODES" ( "ID"BIGINTPRIMARYKEY ); --Edges definition containing connections between vertex/ nodes. CREATE COLUMN TABLE "GRAPHSCRIPT"."EDGES" ( "ID"BIGINTGENERATEDALWAYSASIDENTITYPRIMARYKEY, "SOURCE"BIGINTREFERENCES"GRAPHSCRIPT"."NODES"("ID") ON DELETE CASCADENOTNULL, "TARGET"BIGINTREFERENCES"GRAPHSCRIPT"."NODES"("ID") ON DELETE CASCADENOTNULL ); --Sample data for Nodes and Edges Table INSERT INTO "GRAPHSCRIPT"."NODES" VALUES (1); INSERT INTO "GRAPHSCRIPT"."NODES" VALUES (2); INSERT INTO "GRAPHSCRIPT"."NODES" VALUES (3); INSERT INTO "GRAPHSCRIPT"."NODES" VALUES (4); INSERT INTO "GRAPHSCRIPT"."NODES" VALUES (5); INSERT INTO "GRAPHSCRIPT"."EDGES"("SOURCE", "TARGET") VALUES (1, 2); INSERT INTO "GRAPHSCRIPT"."EDGES"("SOURCE", "TARGET") VALUES (1, 3); INSERT INTO "GRAPHSCRIPT"."EDGES"("SOURCE", "TARGET") VALUES (2, 3); INSERT INTO "GRAPHSCRIPT"."EDGES"("SOURCE", "TARGET") VALUES (2, 4); INSERT INTO "GRAPHSCRIPT"."EDGES"("SOURCE", "TARGET") VALUES (3, 4); INSERT INTO "GRAPHSCRIPT"."EDGES"("SOURCE", "TARGET") VALUES (5, 4); --Graph data structure definition specifying the vertex table and the edges table with Source and Target for every edge CREATE GRAPH WORKSPACE "GRAPHSCRIPT"."GRAPHWS" EDGE TABLE "GRAPHSCRIPT"."EDGES" SOURCE COLUMN "SOURCE" TARGET COLUMN "TARGET" KEY COLUMN "ID" VERTEX TABLE"GRAPHSCRIPT"."NODES" KEY COLUMN "ID"; /*************************************/ -- NEIGHBORS procedure --Type definition to be used in our Neighbors Procedure CREATE TYPE "GRAPHSCRIPT"."TT_NODES_NEI" AS TABLE ("ID" BIGINT); CREATE TYPE "GRAPHSCRIPT"."TT_EDGES_NEI" AS TABLE ("ID" BIGINT, "SOURCE" BIGINT, "TARGET" BIGINT); CREATE OR REPLACE PROCEDURE "GRAPHSCRIPT"."GS_NEIGHBORS"( IN i_startNode BIGINT, -- the ID of the start node IN i_min BIGINT, -- the minimum hop distance IN i_max BIGINT, -- the maximum hop distance OUT o_nodes "GRAPHSCRIPT"."TT_NODES_NEI", OUT o_nodesCount BIGINT, OUT o_edges "GRAPHSCRIPT"."TT_EDGES_NEI" ) LANGUAGE GRAPH READS SQL DATA AS BEGIN -- create an instance of the graph, refering to the graph workspace object GRAPH g = Graph("GRAPHSCRIPT", "GRAPHWS"); -- create an instance of the start node VERTEX v_start = Vertex(:g, :i_startNode); -- create a multiset of all neighbor nodes of the start node MULTISET<Vertex> m_neighbors = Neighbors(:g, :v_start, :i_min, :i_max); -- project the result from the multiset o_nodes = SELECT :v."ID" FOREACH v IN :m_neighbors; o_nodesCount = COUNT(:m_neighbors); -- create a vertex induced subgraph to get all edges between the nodes in the neighbors multiset GRAPH g_sub = SubGraph(:g, :m_neighbors); o_edges = SELECT :e."ID", :e."SOURCE", :e."TARGET" FOREACH e IN Edges(:g_sub); END; CALL "GRAPHSCRIPT"."GS_NEIGHBORS"(i_startNode => 1, i_min => 0, i_max => 1000, o_nodes => ?, o_nodesCount => ?, o_edges => ?);​

Using the script above we have learned how to define the nodes and Edges table, define Graph using SAP HANA, graph options and create procedures to perform graph specific operations.

Do note, that while defining a graph edges and vertex table, the vertex Id has to be only primary key of the Vertex table and only the vertex Id can be referenced in edges table for source and target columns and also while defining graphs.

Hope this blog helped you learn how SAP HANA enables us to use one of the most complex and useful data structure and come up with cutting Edge analysis and breakdown of problems exhibiting network like behaviour.

Thanks for reading! For more information on SAP HANA, graph options, I will be publishing further blogs containing details about more complex operations that we can perform in SAP HANA.