This blog post gives an overview of the dependency data profiling technique in the SAP Information Steward data quality tool.
It walks through the procedure step by step, so that you get a complete picture of how dependency profiling is used.
Let's begin with dependency profiling in detail.
Consider the data set below as an example:
Here are some key points to remember when performing dependency profiling in SAP Information Steward:
- Dependency profiling helps you identify relationships between the columns of the same table/view by defining primary and dependent columns.
- The results take the form of values for both the primary and the dependent columns. Values of the primary column are shown as headers, which can be drilled down to see the corresponding values of the dependent columns.
How to perform dependency profiling?
To perform dependency profiling on a table/view:
- Select the view/table and click dependency profiling in the profiling options in the workspace section of SAP Information Steward. The window shown in the screenshot pops up. (See the screenshot.)
- Define the primary and dependent columns: add the column you want to treat as the master column to the primary section, and add the columns whose dependency/relationship with the primary column you want to check to the dependent section.
- Click the Save and Run Now button to execute the dependency profiling.
Note: Dependency profiling can be performed on a single table/view only. Also, only one column can be added to the primary column section, whereas multiple columns can be added to the dependent section. This means that with dependency profiling you can check 1:1 or 1:n relationships.
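The one-primary, many-dependents setup described in the note above can be mimicked outside the tool to get a feel for what is being checked. The following is only a minimal Python sketch, not how Information Steward works internally: the sample rows, the column names (`city`, `employee`, `state`), and the `profile_dependency` helper are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; not the data set used in this post.
rows = [
    {"city": "Mumbai", "employee": "Asha", "state": "Maharashtra"},
    {"city": "Mumbai", "employee": "Ravi", "state": "Maharashtra"},
    {"city": "Pune", "employee": "Neha", "state": "Maharashtra"},
]

def profile_dependency(rows, primary, dependents):
    """Map each primary value to the distinct values seen in each dependent column."""
    result = defaultdict(lambda: {col: set() for col in dependents})
    for row in rows:
        for col in dependents:
            result[row[primary]][col].add(row[col])
    return {key: dict(cols) for key, cols in result.items()}

# One primary column ("city"), two dependent columns.
profile = profile_dependency(rows, "city", ["employee", "state"])
for city, cols in profile.items():
    print(city, {col: sorted(vals) for col, vals in cols.items()})
```

Passing a single column name as `primary` and a list as `dependents` mirrors the restriction in the note: exactly one primary column, one or more dependent columns.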
Important values to keep in mind:
Input Sampling Rate: How the records are chosen. For example, if you choose a Max input size of 1,000 records and enter a rate of 1, the first 1,000 records are profiled. If you enter a rate of 2, every second record in the table is profiled, up to 1,000 records, and so on.
Max distinct values: How many values are displayed as a sample in the output. The default, which is also the maximum, is 50.
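To make the interaction between Input Sampling Rate and Max input size concrete, here is a small Python sketch of the selection rule as described above. It is an illustration under assumptions, not the tool's actual logic: the `sample_records` function and its parameter names are hypothetical, and the sketch assumes that sampling starts at the first record.

```python
from itertools import islice

def sample_records(records, sampling_rate=1, max_input_size=1000):
    """Take every `sampling_rate`-th record, stopping after `max_input_size` records."""
    if sampling_rate < 1 or max_input_size < 0:
        raise ValueError("sampling_rate must be >= 1 and max_input_size >= 0")
    # Assumption: sampling starts with the first record in the table.
    every_nth = islice(records, 0, None, sampling_rate)
    return list(islice(every_nth, max_input_size))

records = list(range(1, 5001))  # stand-in for a table with 5,000 rows

rate_1 = sample_records(records, sampling_rate=1, max_input_size=1000)
rate_2 = sample_records(records, sampling_rate=2, max_input_size=1000)

print(len(rate_1), rate_1[:3])  # 1000 [1, 2, 3]
print(len(rate_2), rate_2[:3])  # 1000 [1, 3, 5]
```

With a rate of 1 the sample is simply the first 1,000 rows. With a rate of 2 it still contains 1,000 rows, but they are spread over the first 2,000 rows of the table.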
You can check the task status in the Task section of SAP Information Steward. Once the task is complete, the results can be viewed in the workspace section.
Reading the results generated by dependency profiling:
To check the results of dependency profiling, follow the steps below:
- Go to the workspace and select the view/table on which you performed dependency profiling.
- In the top right-hand corner, click the Advanced tab.
- Under the Advanced tab you will see several columns, one for each profiling technique.
- In the dependency profiling column, you will see a green tick beside your table/view. Click it and you should see the results as shown below.
Interpreting the results:
- The results are shown in a header-item structure, where the header represents a primary column value and the items represent the dependent column values (multiple values are separated by commas).
- The number next to the header value gives the count of dependent records for that primary value.
- E.g., in the screenshot, Mumbai(2) means that the city Mumbai has 2 dependent employee records.
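The header-item structure described above can be reproduced with a few lines of Python, which may help when reading the result screen. This is only a sketch: apart from the "Mumbai has 2 employee records" detail, the rows are hypothetical, and `header_item_view` is an illustrative helper, not part of Information Steward.

```python
from collections import defaultdict

# Hypothetical (city, employee) rows; only "Mumbai has 2 records" comes from the post.
rows = [
    ("Mumbai", "Asha"),
    ("Mumbai", "Ravi"),
    ("Pune", "Neha"),
]

def header_item_view(rows):
    """Group dependent values under each primary value; the header carries the record count."""
    groups = defaultdict(list)
    for primary_value, dependent_value in rows:
        groups[primary_value].append(dependent_value)
    return [
        f"{primary}({len(items)}): {', '.join(items)}"
        for primary, items in groups.items()
    ]

for line in header_item_view(rows):
    print(line)
# Mumbai(2): Asha, Ravi
# Pune(1): Neha
```

The number in parentheses is the count of rows that share the primary value, and the comma-separated list after it corresponds to what you see when you drill down into the header.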
That completes the detailed explanation of the dependency data profiling technique in SAP Information Steward. I will cover other profiling types in my next posts, so stay tuned.
Please share your feedback on this post in the comments section. It will help me improve my content and share more knowledge with this community.
Thanks and happy learning!