This solution accelerator empowers companies to automate the processing of PDF forms to modernize their operations, save time, and reduce cost.
The solution leverages Azure AI Document Intelligence to extract data from PDF forms and stores the data in Azure Cosmos DB.
The architecture diagram below illustrates the main components and information flow of this solution accelerator:
- PDF forms are uploaded to a container in Azure Data Lake Storage Gen2 (ADLS Gen2).
- When PDF forms are uploaded to the container, an Azure Logic App is triggered to start processing the PDF form(s).
- The Logic App sends the PDF file location to an Azure Functions app for processing.
- The Azure Functions app receives the file location and performs the following:
  - If the file has multiple pages, splits it into single-page PDF files, each containing one independent form, and saves them to an ADLS Gen2 container.
  - Sends the location of each single-page PDF file to Azure AI Document Intelligence for processing via its REST API (HTTPS POST) and receives the response.
  - Transforms the response into the desired data structure.
  - Saves the structured data as a JSON file to another ADLS Gen2 container.
- The Logic App receives the processed data from the Azure Functions app and sends it to Azure Cosmos DB.
- Azure Cosmos DB saves the data into the specified database and collections.
- Power BI connects to Azure Cosmos DB to read the data and provide insights.
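The Functions app step that turns the Document Intelligence response into structured data could look roughly like the Python sketch below. It is a hypothetical illustration, not the accelerator's actual code. It assumes the `analyzeResult` JSON shape of the Document Intelligence v3.x REST API (`documents`, each with `fields`, where each field carries a typed value such as `valueString` or `valueNumber`). The helper names `page_blob_name`, `field_value`, and `to_records`, the `_page<N>` naming scheme for split pages, and the output record layout are all illustrative assumptions. The record includes a string `id` because Azure Cosmos DB for NoSQL requires one on every item.

```python
import json
from pathlib import PurePosixPath
from typing import Any

# Scalar value properties a Document Intelligence field may carry (assumed subset).
SCALAR_VALUE_KEYS = (
    "valueString", "valueNumber", "valueInteger", "valueDate", "valueTime",
    "valuePhoneNumber", "valueSelectionMark", "valueCountryRegion",
)


def page_blob_name(source_path: str, page_number: int) -> str:
    """Hypothetical name for one page split out of a multi-page PDF."""
    path = PurePosixPath(source_path)
    return str(path.with_name(f"{path.stem}_page{page_number}{path.suffix}"))


def field_value(field: dict[str, Any]) -> Any:
    """Return the typed value of a field, recursing into arrays and objects."""
    if "valueArray" in field:
        return [field_value(item) for item in field["valueArray"]]
    if "valueObject" in field:
        return {name: field_value(sub) for name, sub in field["valueObject"].items()}
    for key in SCALAR_VALUE_KEYS:
        if key in field:
            return field[key]
    # Fall back to the raw extracted text when no typed value is present.
    return field.get("content")


def to_records(response: dict[str, Any], source_path: str) -> list[dict[str, Any]]:
    """Flatten each analyzed document into one JSON-ready record."""
    documents = response.get("analyzeResult", {}).get("documents", [])
    stem = PurePosixPath(source_path).stem
    records = []
    for index, document in enumerate(documents):
        records.append({
            # Cosmos DB items need a string "id"; deriving it from the file name is an assumption.
            "id": f"{stem}-{index}",
            "sourceFile": source_path,
            "docType": document.get("docType"),
            "confidence": document.get("confidence"),
            # Nest extracted fields so a form field named "id" cannot clash with the keys above.
            "fields": {name: field_value(f) for name, f in document.get("fields", {}).items()},
        })
    return records


if __name__ == "__main__":
    # Made-up response for illustration only.
    sample = {
        "analyzeResult": {
            "documents": [{
                "docType": "my-form-model",
                "confidence": 0.98,
                "fields": {
                    "CustomerName": {"type": "string", "valueString": "Contoso", "content": "Contoso"},
                    "Total": {"type": "number", "valueNumber": 42.5, "content": "42.50"},
                },
            }]
        }
    }
    source = page_blob_name("forms/batch.pdf", 1)
    print(json.dumps(to_records(sample, source), indent=2))
```

Serializing the records with `json.dumps` yields the kind of JSON file the Functions app saves to the output container. Adapt `to_records` to whatever structure your downstream store expects.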
- An Azure subscription.
- Install the latest version of the Azure CLI.
- Install the latest version of Bicep.
- Install the latest version of Azure Functions Core Tools.
- If you want to view a Power BI report over the data processed by Azure AI Document Intelligence, install Power BI Desktop.
- Clone this repo.
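One way to confirm that the command-line prerequisites are reachable on your `PATH` is the small Python sketch below. It only checks that the executables can be found; it does not verify that they are the latest versions. The executable names (`az`, `bicep`, `func`) are the usual ones but are assumptions here. If you installed Bicep through the Azure CLI (`az bicep install`), it runs as `az bicep`, and a standalone `bicep` may legitimately be missing.

```python
import shutil
from typing import Optional

# Assumed executable names for the command-line prerequisites.
REQUIRED_TOOLS = {
    "Azure CLI": "az",
    "Bicep": "bicep",  # may be absent if Bicep is only used via `az bicep`
    "Azure Functions Core Tools": "func",
}


def find_tools(tools: dict[str, str] = REQUIRED_TOOLS) -> dict[str, Optional[str]]:
    """Map each tool label to its resolved executable path, or None if not on PATH."""
    return {label: shutil.which(executable) for label, executable in tools.items()}


if __name__ == "__main__":
    for label, path in find_tools().items():
        print(f"{label:28} {path or 'NOT FOUND'}")
```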
Follow the steps below to set up your Azure resources, create the Document Intelligence model, test the solution, and visualize the results in Power BI.
Follow the instructions in the 1_infra folder: Deployment Scripts Guide.
Follow the instructions in the 2_machine_learning_model folder: Machine Learning Model Guide.
Follow the instructions in the 4_solution_testing folder: Solution Testing Guide.
Follow the instructions in the 5_power_bi folder: PowerBI Model Guide.
This solution deploys the infrastructure needed to process form data and load the data into a database. After deploying the solution in your Azure subscription and testing it as described, you can extend the solution to work with your own files and data.
This solution includes a labeled dataset generated from PDF files. To create your own labeled datasets, follow this guide on generating labeled datasets.
When testing the solution, you manually upload PDF files to blob storage. In your own environment, you may want to automate uploading the files to blob storage. There are many options for automating the upload, including Azure Data Factory, Azure Logic Apps, Azure Functions, and Power Automate.
The Azure Logic App in this solution loads data into Azure Cosmos DB. You can change the Logic App connection to load the data into your preferred Azure data platform service, such as Azure SQL Database or Azure Data Lake Storage Gen2.

