
S/4HANA Downtime-Optimized Conversion Scenario to Cloud - No System Move

Scenario:

With the growing availability of high-bandwidth connections to the various clouds – GCP (Interconnect), Azure (ExpressRoute), and AWS (Direct Connect) – the S/4HANA downtime-optimized conversion/migration option, which is not available with the "System Move" option, can be executed between on-premise and cloud (which is effectively a system move) with little effort, provided you test your network first to confirm feasibility.

Caveat: SAP has to bless this approach during your planning sessions; otherwise, proceed at your own risk.

How To:

  •  SUM is started on the source system. During option selection you do not select "System Move", but the target database is the one you install in the cloud, reachable from on-premise.
  •  You install SAP on the target cloud host and connect it to the target DB, but this SAP system stays down until the SUM post-processing steps complete on the source. You then start the target SAP system and finish the remaining steps, such as post transports, embedded Fiori, etc., from the target SAP server.
  • As an alternative, you can build a source application server in the cloud that connects to the source DB and start SUM from there, but this creates a memory pipe between on-premise and cloud over a long distance, which can be somewhat risky. However, you get some downtime improvement with this approach compared to starting SUM on the source host. Risk and downtime are the trade-offs between the two options. (A minimal reachability check for the cloud target is sketched after this list.)
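Before committing to either option, it is worth confirming that the target HANA database in the cloud is reachable from the on-premise host on its SQL port. Below is a minimal sketch of such a check; the hostname and the instance number 00 (tenant SQL port 30015) are placeholder assumptions, so substitute your own values.

import socket

# Hypothetical values - replace with your cloud HANA host and instance number
target_host = "hana-target.cloud.example.com"
instance_no = "00"
sql_port = int("3" + instance_no + "15")   # tenant SQL port 3<nn>15

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)
try:
    sock.connect((target_host, sql_port))
    print("HANA SQL port {} is reachable from on-premise".format(sql_port))
except OSError as exc:
    print("Connection failed: {}".format(exc))
finally:
    sock.close()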

(Picture: SUM running on the on-premise source system, connected over the cloud interconnect to the target HANA database in the cloud.)

In the picture above, the ping time or network latency of 11 ms (as an example) comes from a simple ping test with 1450-byte packets between on-premise and the cloud. This value changes with block size and bandwidth usage; a sketch of the measurement is shown below.
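For reference, a minimal way to reproduce that latency measurement is to wrap the Linux ping command with a 1450-byte payload; the target hostname below is a placeholder assumption.

import subprocess

# Hypothetical target - replace with your cloud DB host or a jump host in the cloud VPC
target_host = "hana-target.cloud.example.com"

# 20 ICMP echo requests with a 1450-byte payload, matching the test described above
result = subprocess.run(["ping", "-c", "20", "-s", "1450", target_host],
                        capture_output=True, universal_newlines=True)
print(result.stdout)   # the summary line reports min/avg/max round-trip times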

Pros:

  • R3load's memory pipes stay on the same host, which ensures data is consistently exchanged between their respective memory blocks.
  • Parallel SQL requests execute from on-premise to the cloud (data inserts/updates, etc.). This is nothing new given today's fast-changing architectures, where databases run in the cloud and clients in various locations perform operations against the cloud DB. This conversion/migration is the same case: we send SQL queries/ABAP requests from the SHD (shadow) instance on the source to the HANA DB in the cloud.
  • Conversion commands run through the SHD instance's target kernel on-premise, but the actual conversion happens in the cloud HDB.
  • If you currently execute the conversion on-premise and then use HSR to replicate (migrate) to the cloud, this downtime-optimized method can instead be used to build performance-testing or QP systems directly in the cloud, which saves the cost of additional non-production on-premise servers; only HSR remains as an additional step for production.

Cons:

  •  It might be best suited for systems in the 5 TB-7 TB range, depending on throughput and downtime targets, but verify with your own analysis; a back-of-the-envelope example follows below.
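To make that sizing guidance concrete, here is a rough, illustrative calculation; the 6 TB database size and the 500 GB/hour effective throughput (the figure measured in the NIPING test later in this post) are example assumptions only.

# Rough data-transfer time estimate - all numbers are illustrative assumptions
db_size_gb = 6 * 1024            # ~6 TB database
throughput_gb_per_hour = 500     # effective throughput measured in the NIPING test below

transfer_hours = db_size_gb / throughput_gb_per_hour
print("Estimated data-transfer time: {:.1f} hours".format(transfer_hours))
# Roughly 12 hours for the data movement alone; table conversion and post-processing
# come on top, which is why much larger databases may not fit a typical downtime window.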

Pre-Test:

  •  Bandwidth vs. ping time or RTT (on-premise to cloud) – with multiple hops in the picture, you might see a spiked RTT yet low bandwidth utilization. With parallel SQL processes that travel over the on-premise-to-cloud network, increasing the number of processes to utilize the bandwidth fully reduces the role ping time plays in the S/4 conversion.

SAP standard values:

  • Good value: roundtrip time <= 0.3 ms
  • Moderate value: 0.3 ms < roundtrip time <= 0.7 ms
  • Below-average value: roundtrip time > 0.7 ms

*** In one NIPING test we got around 11 ms ping time, so the RTT might be slightly higher once processing time on the DB host is included. Compared with the SAP standard values these numbers are far higher, but as pointed out, we are trading off cost vs. downtime and checking whether the high-bandwidth cloud network tools still let us benefit; a rough illustration of how parallelism compensates for a high RTT follows below.
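One way to reason about this: a single TCP stream can move at most one window of data per round trip, so a high RTT caps per-stream throughput, and parallel streams make up the difference. The sketch below illustrates this with example assumptions (a 10 Gbit/s link, the 11 ms RTT from above, and a 256 KB TCP window); substitute your own measurements.

# Per-stream throughput is capped at window_size / RTT; parallelism fills the remaining capacity.
# All values below are example assumptions.
link_gbits = 10                    # e.g. a 10 Gbit/s dedicated interconnect
rtt_ms = 11                        # measured ping time from the example above
window_bytes = 256 * 1024          # untuned TCP window per connection

per_stream_mb_s = window_bytes / (rtt_ms / 1000.0) / 1e6
link_mb_s = link_gbits * 1000 / 8.0
streams_needed = link_mb_s / per_stream_mb_s
print("Per-stream limit             : {:.1f} MB/s".format(per_stream_mb_s))
print("Link capacity                : {:.1f} MB/s".format(link_mb_s))
print("Streams to saturate the link : {:.1f}".format(streams_needed))
# With a larger window (see the Bandwidth-Delay Product below) each stream carries more,
# so fewer parallel processes are needed to fill the pipe.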

  •  Calculate the Bandwidth-Delay Product (BDP) = bandwidth * latency (ping time can be used here). Increase the TCP send and receive buffer parameters net.ipv4.tcp_wmem and net.ipv4.tcp_rmem accordingly on both source and target; refer to SAP Note 2382421 for more information. Because our source acts as the client and has to send a large amount of data, this should be done on the client (source) side as well.

You can also use https://www.switch.ch/network/tools/tcp_throughput/ for the calculation. Sizing the buffers to the BDP reduces the time spent waiting for acknowledgements when the send/receive buffers are full and ensures maximum bandwidth usage; a small sketch of the calculation follows below.
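As an illustration, here is a minimal sketch of the BDP calculation; the 10 Gbit/s bandwidth and 11 ms latency are the example figures used in this post, not tuning recommendations.

# Bandwidth-Delay Product sketch - example numbers only
bandwidth_gbits = 10       # e.g. a 10 Gbit/s interconnect
rtt_ms = 11                # measured ping time / latency

bdp_bytes = (bandwidth_gbits * 1e9 / 8) * (rtt_ms / 1000.0)
print("BDP: {:.1f} MB".format(bdp_bytes / 1e6))
print("Suggested minimum for the max value of tcp_wmem/tcp_rmem: {} bytes".format(int(bdp_bytes)))
# Per SAP Note 2382421, net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be raised on both
# sides so that a single stream can keep this much data in flight.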

  •   Perform a network stability test between on-premise and cloud as per SAP Note 500235 (Network Diagnosis with NIPING) for 24 hours and check for any data loss; a minimal way to script this is sketched after this list.
  •   Perform network throughput testing as per the same note with various block sizes. Attached is a Python program that creates parallel NIPING client processes which talk to a NIPING server started on the target. I am not a Python expert, and it gave me errors with a different Python version, so correct the data types as needed. It served my purpose, as I only needed to collect statistics from the cloud network metrics.
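For the 24-hour stability test from the first bullet, the NIPING client can simply be left running against a NIPING server on the target host. The sketch below wraps this in Python; the server start syntax and the loop/delay values (one small packet per second for 86400 iterations, roughly 24 hours) are my reading of SAP Note 500235, so verify the exact parameters in the note.

import subprocess

# On the target (cloud) host, start the NIPING server first, e.g.:
#   niping -s -I 0        (idle timeout disabled; verify the syntax in SAP Note 500235)

# Hypothetical hostname - replace with your cloud target
target_host = "hana-target.cloud.example.com"

# One packet per second (-D 1000 ms) for 86400 loops is roughly a 24-hour run;
# these values are an assumption for a day-long test, adjust them as needed.
stability_cmd = "niping -c -H {} -B 1000 -L 86400 -D 1000".format(target_host)

with open("niping_stability.log", "w") as log:
    subprocess.Popen(stability_cmd, stdout=log, stderr=log, shell=True).wait()
# Afterwards, check the log for errors, retransmissions, or gaps that indicate data loss.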

Tested with GCP Dedicated Interconnect: a 100-process parallel NIPING throughput test yielded around 140-150 MBps (use the cloud network metrics to obtain this data), which approximates to 500-540 GB/hour. The number of parallel processes can be increased or decreased further according to target HANA DB resources and bandwidth availability. This test should be done after changing the Bandwidth-Delay Product parameters. You can do similar testing with iperf3, but the results were a little different for me.

Sample testing data capture:

import subprocess

# Collect test parameters interactively
TargetServer = input("Enter server hostname: ")
TotalParallel = int(input("Enter total parallel processes to be tested: "))
PacketSize = int(input("Enter buffer size to be tested: "))
DelaySet = input("Do you want to run with a delay (Y or N): ")
DelayValue = 0
LoopSet = input("Do you want to run this as a loop (Y or N): ")
LoopValue = 3
if LoopSet == 'Y':
    LoopValue = int(input("Enter number of times to loop: "))
if DelaySet == 'Y':
    DelayValue = int(input("Enter delay value: "))

# NIPING client command; one copy is started per parallel process
ParallelCommand = "niping -c -H {} -B {} -L {} -D {}".format(TargetServer, PacketSize, LoopValue, DelayValue)
# tr2 lines hold the throughput figures, av2 lines the average times
FileCommandTR = "cat FileOutPut.txt | grep tr2"
FileCommandAVG = "cat FileOutPut.txt | grep av2"
print(ParallelCommand)

# Launch all NIPING clients in parallel and wait for them to finish
Pprocesses = []
f = open("FileOutPut.txt", "w")
for IteLoop in range(TotalParallel):
    Pprocesses.append(subprocess.Popen(ParallelCommand, stdout=f, shell=True))
for p in Pprocesses:
    if p.poll() is None:
        p.wait()
f.close()

# Extract the throughput (tr2) and average time (av2) lines into separate files
g = open("tr2file.txt", "w")
subprocess.Popen(FileCommandTR, stdout=g, shell=True).wait()
g.close()
h = open("avgfile.txt", "w")
subprocess.Popen(FileCommandAVG, stdout=h, shell=True).wait()
h.close()

# Average round-trip time across all parallel processes
commandAVGP = "cat avgfile.txt | awk '{sum+=$2} END {print sum}'"
commandAVG1 = subprocess.Popen(commandAVGP, stdout=subprocess.PIPE, universal_newlines=True, shell=True)
AVG = float(commandAVG1.communicate()[0])
TotalAVG = AVG / TotalParallel
print("Total avg time for {} parallel processes: {}".format(TotalParallel, TotalAVG))

# Average throughput across all parallel processes
commandTR2P = "cat tr2file.txt | awk '{sum+=$2} END {print sum}'"
commandTR21 = subprocess.Popen(commandTR2P, stdout=subprocess.PIPE, universal_newlines=True, shell=True)
TR2 = float(commandTR21.communicate()[0])
TotalTR2 = TR2 / TotalParallel
print("Total avg throughput for {} parallel processes: {}".format(TotalParallel, TotalTR2))

 

Python Program Input:

   

Python Program Output: 

The increase in average times is what tells us the delay or RTT. However, if your throughput is not impacted by the increase in parallel processes, there is still scope to add more parallel processes from a network perspective. You also need to check, and if necessary increase, DB resources for the short go-live window to maximize the benefit of parallelism and reduce downtime.

Perform this method with SAP's blessing or at your own risk. The steps above are for reference only; explore and validate them for your own case. Comments and feedback are appreciated.