In this blog post I will cover how the Document Information Extraction service can be used to extract information from an identity document. I will also discuss how this can be used to automate manual identity check processes in your company. As an example, I will be using sample identity documents issued by the Spanish government 🇪🇸.
What is the Document Information Extraction service and how does it work?
Document Information Extraction is part of the SAP AI Business Services portfolio. You can use it to extract information from business documents such as invoices, purchase orders, and payment advices. The service can automate data extraction from digital documents instead of having a person extract the data manually.
The service uses machine learning to automate your document information extraction processes. After uploading a document, the service returns the extraction results from the fields detected in the document.
The documentation focuses on the processing of business documents and does not mention personal documents, e.g. identity cards. In practice, though, the service appears able to process almost any type of document whose contents have some form of “structure”. It is therefore possible to extract the “structured information” printed on identity cards.
What type of process can be automated within your company?
Some HR processes include a proof-of-identity step. These steps are normally manual and can increase the time it takes to complete the process. Instead of having a person dedicated to just checking identity documents, the check can be automated and only involve a person when absolutely necessary, e.g. when confidence in the results (extracted information) returned by the AI service is low.
Below are a couple of scenarios that can be automated within your company:
- Employee onboarding: It is common to request, as part of the employee onboarding process, a scan/photo of the future employee’s identity document.
- Identity check: Government agencies request some form of proof of identity to validate the requester during an application process.
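The "only involve a person if absolutely necessary" idea can be sketched as a confidence check over the extracted fields. This is a minimal sketch, not the service's own logic. It assumes each extracted field is a dictionary with a `name`, a `value`, and a `confidence` between 0 and 1. The field names, the sample values, and the 0.8 threshold are hypothetical and would need tuning for your own documents.

```python
from typing import Dict, Iterable, List

# Hypothetical threshold: fields below this confidence go to a human reviewer.
CONFIDENCE_THRESHOLD = 0.8


def fields_needing_review(
    fields: Iterable[Dict],
    required: Iterable[str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[str]:
    """Return the names of required fields that are missing or low-confidence."""
    by_name = {field["name"]: field for field in fields}
    flagged = []
    for name in required:
        field = by_name.get(name)
        if field is None or not field.get("value"):
            # A field that was not extracted at all always needs a human.
            flagged.append(name)
        elif field.get("confidence", 0.0) < threshold:
            flagged.append(name)
    return flagged


# Made-up extraction results for an identity document (assumed field shape).
sample = [
    {"name": "firstName", "value": "CARMEN", "confidence": 0.97},
    {"name": "documentNumber", "value": "99999999R", "confidence": 0.62},
]
to_review = fields_needing_review(
    sample, ["firstName", "documentNumber", "dateOfBirth"]
)
print(to_review)  # ['documentNumber', 'dateOfBirth']
```

An empty list means the document can pass through without manual intervention. Anything else can be routed to a person together with the flagged field names.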
Show me the goods: how can I use Document Information Extraction to process identity documents?
Glad you asked. In the example below, I will be using sample identity documents issued by the Spanish government. The Spanish government issues identity documents to citizens, called DNI, and also to foreign residents, called NIE.
To enable the Document Information Extraction service, follow the “Set up account for Document Information Extraction and Go to Application” tutorial available at Developer Tutorials.
Before extracting information from a personal identity document we first need to define a template in the service. The video below shows you how to create this template.
Below is the schema used by the template documents. To create it, go to Settings > Schema Configuration in the Document Information Extraction UI and create a new custom schema.
Once the template is created, let's see how you can extract the information from a document.
Once the job has been processed, we can see the extraction results.
OK, but all of this is manual and done through a UI. How can I automate checking identity documents?
Creating the templates is manual, but once that is done, retrieving the information from a document can be automated using the Document Information Extraction service API: https://aiservices-dox.cfapps.eu10.hana.ondemand.com/document-information-extraction/v1/swagger.json. An application that you would need to develop can submit a document (a job, in API terms) by attaching an image and specifying the fields that it wants to retrieve from the document. After some time the job will finish and the extracted information will be available via the API.
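To make the "submit a document" step more concrete, here is a minimal sketch of how an application might build that request using only the Python standard library. Several details are assumptions on my part and should be checked against the swagger definition: the `/document/jobs` path for POST, the `file` and `options` form fields, and the option names `clientId`, `documentType`, `schemaId` and `templateId`. The identifiers shown are placeholders.

```python
import json
import uuid
import urllib.request

# Base URL taken from the swagger link in this post.
BASE_URL = (
    "https://aiservices-dox.cfapps.eu10.hana.ondemand.com"
    "/document-information-extraction/v1"
)


def build_job_request(token, image_bytes, filename, options):
    """Build (but do not send) a multipart POST that submits a document job."""
    boundary = uuid.uuid4().hex

    def part(disposition, body, content_type=None):
        # One multipart section: headers, blank line, body, trailing CRLF.
        head = f"--{boundary}\r\nContent-Disposition: form-data; {disposition}\r\n"
        if content_type:
            head += f"Content-Type: {content_type}\r\n"
        return head.encode("utf-8") + b"\r\n" + body + b"\r\n"

    body = (
        # Assumed form field: job options as a JSON string.
        part('name="options"', json.dumps(options).encode("utf-8"))
        # Assumed form field: the scanned identity document.
        + part(f'name="file"; filename="{filename}"', image_bytes,
               "application/octet-stream")
        + f"--{boundary}--\r\n".encode("utf-8")
    )
    return urllib.request.Request(
        f"{BASE_URL}/document/jobs",  # assumed path, mirrors the GET example
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder options; every key and value here is an assumption.
options = {
    "clientId": "default",
    "documentType": "custom",
    "schemaId": "<your-schema-id>",
    "templateId": "<your-template-id>",
}
request = build_job_request("<access-token>", b"<image bytes>", "dni.png", options)
# To actually submit the job:
# with urllib.request.urlopen(request) as response:
#     job = json.load(response)
```

Building the request separately from sending it keeps the sketch easy to inspect and test without calling the service.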
To find out how to get an OAuth Access Token for the Document Information Extraction service, follow this tutorial – https://developers.sap.com/tutorials/cp-aibus-dox-web-oauth-token.html.
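For an application that runs unattended, the token request can be scripted as well. The tutorial remains the authoritative guide. The sketch below assumes the standard OAuth 2.0 client credentials flow, an `/oauth/token` endpoint on the authentication URL from your service key, and hypothetical placeholder values for the URL, client ID and client secret.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(uaa_url, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"})
    return urllib.request.Request(
        f"{uaa_url.rstrip('/')}/oauth/token",  # assumed token endpoint
        data=body.encode("ascii"),
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Placeholder values; the real ones come from your service key.
token_request = build_token_request(
    "https://example.authentication.eu10.hana.ondemand.com",
    "my-client-id",
    "my-client-secret",
)
# To actually fetch the token:
# with urllib.request.urlopen(token_request) as response:
#     access_token = json.load(response)["access_token"]
```

The resulting access token is what goes into the `Authorization: Bearer ...` header of the job requests.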
Example API request to retrieve a job:
curl --location --request GET 'https://aiservices-dox.cfapps.eu10.hana.ondemand.com/document-information-extraction/v1/document/jobs/ac7b1234-1234-1234-1234-12346f28b08f' \
  --header 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImprdSI6Imh0dHBzOi8vc'
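Because the job takes some time to finish, an application would typically repeat this GET request until the job reaches a final state. Below is a minimal polling sketch. The `status` key and the `DONE` and `FAILED` values are assumptions to verify against the API definition, and the interval and attempt limit are arbitrary.

```python
import json
import time
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = (
    "https://aiservices-dox.cfapps.eu10.hana.ondemand.com"
    "/document-information-extraction/v1"
)


def fetch_job(token, job_id):
    """GET a single job, mirroring the curl request above."""
    request = urllib.request.Request(
        f"{BASE_URL}/document/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(fetch, job_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll with `fetch` until the job reaches a final status or attempts run out."""
    for _ in range(max_attempts):
        job = fetch(job_id)
        # Assumed final status values; anything else means "keep waiting".
        if job.get("status") in ("DONE", "FAILED"):
            return job
        sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} attempts")


# Usage, assuming `token` holds a valid access token:
# job = wait_for_job(lambda job_id: fetch_job(token, job_id),
#                    "ac7b1234-1234-1234-1234-12346f28b08f")
```

Passing the fetch function in as a parameter keeps the polling loop independent of the HTTP call, so it can be tried out with a stub before pointing it at the real service.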
To summarise, we’ve seen how it is possible to process identity documents using the Document Information Extraction service, part of SAP AI Business Services. I hope this blog post helps you create your own custom templates, and I look forward to hearing about the cool scenarios you are automating using SAP AI Business Services.