Using Azure Functions to Enable OCR Processing of Images

A couple of weeks ago I was given the opportunity of working with a partner to build a solution that would hopefully help them automate their expense (receipts) processing.

The scenario was simple:

  1. Upload Image (.png, .jpg).
  2. Extract Data from Image.
  3. Process Expense using extracted Data.

Whilst the scenario sounded simple, it needed a reliable infrastructure to support the processing, i.e. queue the image, hand off the image for processing, track status, handle errors, and return a success or failed state.

[Note, for the purposes of this blog post I am not going to touch on how the data was extracted. This is covered in another blog post by one of my colleagues on the team that can be found here. In my example I am going to mock the data extraction using a single callout to the Microsoft Vision API OCR (Optical Character Recognition) method.]

After several iterations the following architecture was agreed:

Receipt Processing Diagram

Walking through the steps:

  1. User takes a photo of the receipt and, using a Xamarin app, chooses to upload the image for processing (Azure Blob Storage). [Note, for the purpose of this blog, rather than a Mobile Test App (as per diagram) I have included a simple .NET Console application, which is not production ready and is for demo / PoC purposes only.]
  2. Once the image is uploaded the Xamarin app adds a message to the Expenses Queue (Azure Queue Storage) to trigger the next step in the process.
  3. The Expense Processing Function is an Azure Function that is triggered when a message is placed onto a specific Azure Queue Storage (QueueTrigger). When triggered the Azure Function creates a record in a table (Azure Table Storage) to track status and to store the success or failed state of the process.
  4. The Expense Processing Function hands off the actual processing of the image to another function. Like step 2, this is managed by placing a message on a queue which then triggers the Receipt Processing Function (Azure Function). At this point you may be wondering why there is a dotted line around the 3 boxes and it is named Smart Services. This is to suggest that these services are isolated, that they have no dependencies on any other service. This was a key ask by the partner because over time other apps, not just the Expense Processor, may need to call the receipt processing service.
  5. Once the processing has completed (success or fail) the Receipt Processing Function hands the result back to the calling application. To ensure isolation (as per step 4) the Receipt Processing Function simply hands off using a callback URL as defined in the original message item. This callback URL endpoint is another Azure Function, denoted in the diagram as Processing Callback Function, whose trigger this time is an HttpTrigger.
  6. The purpose of the Processing Callback Function is to record the state reported by the Receipt Processing Function. It does so by updating the table record that was created in step 3.
  7. The Processing Callback Function also adds another message to the Expenses Queue, which in turn will again trigger the Expense Processing Function. This step is optional, but allows the Expense Processing Function to do any post-processing such as notifying a user that their expense has been processed.
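
To make the message flow in the steps above concrete, here is a minimal in-memory sketch. It is not the code from the repository (which is C#): the queues and table are plain Python containers standing in for Azure Queue Storage and Azure Table Storage, the callback is a Python callable standing in for the callback URL, and the status strings and the "Notified" field are assumptions made for illustration.

```python
from collections import deque

# In-memory stand-ins for Azure Queue Storage and Table Storage (illustration only).
expenses_queue = deque()
ocr_queue = deque()
receipts_table = {}


def expense_processing_function(message):
    """Steps 3 and 4: record status, then hand off via the smart service queue."""
    item_id = message["ItemId"]
    if message.get("Status") is None:
        # First pass: create the tracking record and queue the OCR request.
        receipts_table[item_id] = {"Status": "Processing", "Text": None}
        ocr_queue.append({"ItemId": item_id, "Callback": processing_callback_function})
    else:
        # Step 7 (optional): post-processing, e.g. notifying the user.
        receipts_table[item_id]["Notified"] = True


def receipt_processing_function(message):
    """Step 5: process the image and report back through the callback only."""
    result = {"ItemId": message["ItemId"], "Status": "Complete", "Text": "mock OCR text"}
    message["Callback"](result)


def processing_callback_function(result):
    """Steps 6 and 7: update the table record, then re-trigger the expense processor."""
    record = receipts_table[result["ItemId"]]
    record["Status"] = result["Status"]
    record["Text"] = result["Text"]
    expenses_queue.append({"ItemId": result["ItemId"], "Status": result["Status"]})


def run_pipeline(item_id):
    """Step 2: the client enqueues a message; the loop plays the role of the triggers."""
    expenses_queue.append({"ItemId": item_id})
    while expenses_queue or ocr_queue:
        if expenses_queue:
            expense_processing_function(expenses_queue.popleft())
        if ocr_queue:
            receipt_processing_function(ocr_queue.popleft())
    return receipts_table[item_id]
```

Note that the receipt processing function only ever sees its own queue message and the callback it was given, which is the isolation the partner asked for.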

[Note, for the purposes of this blog I have not included any code relating to the Web App that was built to view and manage the outputs of the OCR processing.]

So why Azure Functions? Well I’m not going to paraphrase the contents of the Azure Function documentation page but in our scenario Azure Functions fitted perfectly:

  • small discrete code classes;
  • simple bindings and triggers;
  • no complicated server infrastructure;
  • cost effective – pay as you go;
  • simple to use, simple to integrate;
  • continuous deployment;
  • scale;
  • analytics and monitoring.

Okay, that’s the summary complete. Next I am going to walk through some of the key pieces of the solution and then finally point to instructions on how to set up continuous deployment to Azure using Visual Studio Team Services.

The Solution

All the code I describe in this blog post can be found on GitHub.

There are 2 folders: a simple image uploader console application, and the Azure Functions to process the image. Note, there are 2 Azure Function solutions: ExpenseOCRCapture, which contains the Expense Processing Functions (as per diagram) that handle the processing workflow; and SmartOCRService, which contains the Receipt Processing Function (as per diagram) to manage the callout to the Microsoft Vision API and parse the result.

Please feel free to download the solutions and try out the code yourself. For instructions on how to deploy and run, please refer to the pre-requisite and setup instructions outlined in the readme documents in each folder.

[Note, at the time of writing to build and deploy the Azure Functions you must use Visual Studio 2017 Preview (2). For details on where to download please refer here.]

The Azure Functions

Let’s have a look at the Azure Function solutions that are in GitHub:

ExpenseOCRCapture

The expense-capture folder contains a single Visual Studio 2017 Preview (2) solution with two Azure Functions called ExpenseProcessor and OCRCallback.

Looking at the contents of ExpenseProcessor.cs:

image

The ExpenseProcessor function’s primary purpose is to handle the image processing workflow. The function itself is triggered by a message being added to the Azure Storage Queue, receiptQueueItem.

As well as receiptQueueItem there are several other important parameters of this function, namely:

  • receiptsTable – this is an Azure Storage Table which provides the tracking status and ultimately the output of the OCR request.
  • ocrQueue – this is an Azure Storage Queue and provides the binding to allow this function to callout to the SmartOCRService that we will discuss later. Note, its connection property is set as SmartServicesStorage – this is an Application Setting key/value pair and should be the Azure Storage Connection string associated with the SmartOCRService storage account.
  • incontainer/receipts – this is an Azure Storage Blob that is used to store the image files for processing. Note, rather than sharing this blob with the SmartOCRService, this function generates a Shared Access Signature (SAS) which the OCR service then uses. This removes a dependency on the SmartOCRService thus allowing multiple blob stores and therefore multiple requestors.

Step 0 in the case statement is responsible for the primary activity of this function and is the one that provides the SmartOCRService with the necessary information so that it can process the image:

image 

The method StartOCR(…) is responsible for creating a new message queue item of type OCRQueueMessage. The message has 4 properties:

  • ItemId – unique identifier for the image being processed
  • ItemType – the type of the image, in this case ‘receipt’ but in other solutions this may be ‘invoice’ or ‘order’
  • ImageUrl – the SAS which provides the SmartOCRService the location and permissions required to access the image in blob storage
  • Callback – so that the SmartOCRService knows where to respond to once the OCR processing is complete, a callback URL is provided. This is the HTTP address of the OCRCallback function which we will describe next plus the function key which provides the caller (SmartOCRService) the necessary authentication to call the function. This property requires 2 application settings key/value pairs to be added:
    • OCRCallbackKey – when creating an HttpTrigger the creator needs to provide the AuthLevel required to call the function. This can be one of 3 levels: function (default), anonymous, and admin. For the purpose of the OCRCallback function the auth level has been set to function (function is useful when the function is only called from another service and there is usually no user-interaction). Setting the auth level to function means that on each request the requestor must provide a key. This key can be found in the OCRCallback function manage tab as shown below. Copy the value and create a new Application Setting key/value pair with the name of the key being OCRCallbackKey.

image

    • BaseCallbackAddress – this is the URL where the OCRCallback function is hosted, i.e. where you have published your Azure Function, which will usually be something like https://myfunctions.azurewebsites.net/api. You should create a new Application Setting key/value pair with the name of the key being BaseCallbackAddress.
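
As a rough illustration of how the four properties and the two application settings fit together, the sketch below builds the queue message as JSON. It is not the repository's StartOCR(…) method: the function name, parameter names and the /OCRCallback route segment are assumptions, while the property names come from the list above. Azure Functions accepts a function key in the `code` query-string parameter, which is what the sketch relies on.

```python
import json
from urllib.parse import urlencode


def build_ocr_queue_message(item_id, item_type, image_sas_url,
                            base_callback_address, callback_key):
    """Return the JSON payload for the OCR queue (hypothetical helper)."""
    # Assumption: the callback function is exposed at <base>/OCRCallback.
    # The function key is URL-encoded because keys can contain '/' and '='.
    query = urlencode({"code": callback_key})
    callback = f"{base_callback_address.rstrip('/')}/OCRCallback?{query}"
    return json.dumps({
        "ItemId": item_id,          # unique identifier for the image
        "ItemType": item_type,      # e.g. 'receipt', 'invoice', 'order'
        "ImageUrl": image_sas_url,  # SAS URL granting access to the blob
        "Callback": callback,       # where the OCR service reports back
    })
```

In the real solution the two settings would be read from the Function App's Application Settings rather than passed in as arguments.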

To trigger the SmartOCRService the message is simply added to the OCRQueue. An added advantage of using a queue rather than a direct request to the SmartOCRService is that the role of the ExpenseProcessor is now temporarily complete until the OCRCallback function triggers the state change or continuation of the workflow.

Step 1 of the process provides a placeholder to communicate when the processing of the image is complete (success or failure). In this simplified case, the code simply updates the receiptsTable to highlight the final status of the process. To identify which image has been processed, the ItemId is a property of the message payload.

Step 99 of the process allows the function to handle any necessary ‘retries’. As you will see as part of the SmartOCRService, there are several non-catastrophic scenarios which we may want to handle by retrying the process. This step simply restarts the process by following the steps executed as part of Step 0.
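
The three steps can be pictured as a small dispatch on a step number carried in the queue message. The sketch below is only an approximation of the case statement described above: the "Step" field name, the status strings and the helper names are assumptions, and a dict and a list stand in for receiptsTable and ocrQueue.

```python
STEP_START = 0      # hand the image to the OCR service
STEP_FINISHED = 1   # OCR finished (success or failure)
STEP_RETRY = 99     # non-catastrophic failure, start again


def start_ocr(item_id, ocr_queue):
    """Queue a request for the OCR service (hypothetical stand-in for StartOCR)."""
    ocr_queue.append({"ItemId": item_id, "ItemType": "receipt"})


def process_expense_message(message, receipts_table, ocr_queue):
    """Dispatch one workflow message according to its step."""
    step = message["Step"]
    item_id = message["ItemId"]
    if step == STEP_START:
        receipts_table[item_id] = {"Status": "Processing"}
        start_ocr(item_id, ocr_queue)
    elif step == STEP_FINISHED:
        # Record the final status and any text or error detail.
        receipts_table[item_id]["Status"] = message["Status"]
        receipts_table[item_id]["Text"] = message.get("Text")
    elif step == STEP_RETRY:
        # A retry repeats the work done for step 0.
        receipts_table[item_id] = {"Status": "Retrying"}
        start_ocr(item_id, ocr_queue)
    else:
        raise ValueError(f"Unknown workflow step: {step}")
```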

Now looking at the contents of the function OCRCallback (found in OCRCallback.cs):

image

You’ll see that its primary role is to act as a conduit between the SmartOCRService and the ExpenseProcessor. It simply takes the result of the SmartOCRService and translates it into a new workflow state. This new workflow state will be Complete, Error or Retry. In the case of Complete and Error there will be additional state captured: for Complete, the Text returned from the SmartOCRService; for Error, the reason why the SmartOCRService failed, which in this case will be the Exception message.
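
A minimal sketch of that translation is shown below. The field names ("Status", "Text", "Exception", "Step") and the mapping of Complete and Error to step 1 and Retry to step 99 are assumptions based on the step descriptions earlier in this post, not a copy of OCRCallback.cs.

```python
def translate_ocr_result(result):
    """Turn an OCR service result into the next workflow message (illustrative)."""
    item_id = result["ItemId"]
    status = result["Status"]
    if status == "Complete":
        # Success: carry the recognised text forward.
        return {"ItemId": item_id, "Step": 1, "Status": "Complete",
                "Text": result.get("Text", "")}
    if status == "Error":
        # Failure: carry the exception message as the reason.
        return {"ItemId": item_id, "Step": 1, "Status": "Error",
                "Text": result.get("Exception", "")}
    if status == "Retry":
        # Non-catastrophic failure: ask the workflow to start again.
        return {"ItemId": item_id, "Step": 99, "Status": "Retry"}
    raise ValueError(f"Unexpected OCR status: {status}")
```

The returned message is what would be placed back on the queue to continue the workflow.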

It’s important to note that the partner wanted this separation of concerns between the ExpenseProcessor and the SmartOCRService – it is imagined that over time more processors will be put in place (e.g. InvoiceProcessor, OrderProcessor, etc.) and therefore the SmartOCRService should have no dependency on the requestor.

SmartOCRService

The smart-services folder contains a Visual Studio 2017 Preview (2) solution called SmartOCRService with one Azure Function, also called SmartOCRService.

As with the previous ExpenseProcessor function, this function is triggered when a message is added to the Azure Storage Queue (QueueTrigger) described by the parameter ocrQueue.

The function’s key responsibility is to call out to the OCR Service (in this case the Microsoft Cognitive Vision API) and then return the result to the requestor via the callback URL provided as a property of the message payload.

image

The method MakeOCRRequest is responsible for calling out to the OCR service and then determining how to handle the response:

image

Key thing to note within this function: As the function is dependent on the OCR Service being available it needs to handle the exception when the service is unavailable. If the service is unavailable the requestor may want to inform the user that they need to try later, or in our case automate that process by having the requestor retry the whole process automagically.

By default, if the function fails the message will be attempted a maximum of 5 times. If the last attempt fails then the original queue message is added to a poison-message queue. Adding the message to a poison-message queue means the message will not be acted upon again, but it gives a user some notification that the message has failed.

In our case we wanted to override this behaviour by preventing the message being added to the poison-message queue. We did this by monitoring the number of retries so that on the last retry we threw a MaxRetryException (custom Exception), which we then in turn handled by returning a result with a new status of ‘Retry’. If we go back to the previous OCRCallback function above, we handled the Retry status by adding the original message back onto the SmartOCRService queue.

Note, to monitor the number of times the function has been tried, add the parameter dequeueCount, which is of type int, to the signature of the Run(…) method.
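
The retry handling described above can be sketched as follows. This is a language-neutral approximation rather than the function in the repository: MaxRetryException is named in this post, but ServiceUnavailableError, the run(…) signature, the injected call_ocr callable and the limit constant are assumptions. Re-raising the original error stands in for letting the Functions runtime retry the message.

```python
MAX_DEQUEUE_COUNT = 5  # assumed to match the default attempt limit


class ServiceUnavailableError(Exception):
    """Hypothetical error raised when the OCR service cannot be reached."""


class MaxRetryException(Exception):
    """Raised on the final attempt so the message never reaches the poison queue."""


def run(message, dequeue_count, call_ocr):
    """Process one OCR queue message; call_ocr is injected for illustration."""
    try:
        try:
            text = call_ocr(message["ImageUrl"])
        except ServiceUnavailableError as exc:
            if dequeue_count >= MAX_DEQUEUE_COUNT:
                # Last attempt: convert into a handled 'Retry' outcome.
                raise MaxRetryException(str(exc)) from exc
            raise  # earlier attempts: fail so the runtime retries the message
    except MaxRetryException:
        return {"ItemId": message["ItemId"], "Status": "Retry"}
    return {"ItemId": message["ItemId"], "Status": "Complete", "Text": text}
```

Because the final attempt returns normally instead of failing, the message counts as processed and the requestor decides whether to resubmit it.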

Continuous Integration / Deployment

At the time of writing there were certain issues setting up Continuous Integration / Deployment from within Visual Studio 2017 Preview (2). It was a requirement of the Partner that they needed this continuous infrastructure in place.

The original plan was to work out the setup ourselves within Visual Studio Team Services, but after some research on the internet we found a great blog which set out the steps perfectly.

If you are interested in using Continuous Integration / Deployment I would suggest following the steps found here.

Application Monitoring

The final requirement the partner had was being able to monitor their Azure Functions. After several iterations it was decided that the best way to get insights into how the application was performing was to integrate Microsoft Application Insights.

If you are interested in using Application Insights inside your Azure Functions then I would suggest you read the following blog post found here.

If you find Application Insights is overkill for your projects then I would suggest having a look at the following documentation.
