How To Grab Data from PDFs with Power Automate AI Builder

I’ve been curious about AI Builder for a long time but didn’t have a good use case until our Land department needed to get information out of a PDF email attachment. I was able to easily build my first AI Builder model and incorporate it into a Power Automate flow. This post will walk through all the steps for working with Power Automate AI Builder, and you can apply it to your own use case. Read on to learn more.

First, I’m going to describe my use case so you have a general idea of what I am trying to accomplish. Then, I’ll explain the high-level steps for working with AI Builder. Also, I’ll include a broad explanation of AI Builder so you have an idea of where you might use it. Next, I’ll show how I incorporated the AI Builder model into a Power Automate cloud flow. Finally, I’ll wrap up with a problem I ran into and how I fixed it.

My Use Case

My use case came from our Land department. They receive an email with one or more PDF attachments called the final completion report (FCR) for newly completed wells. There is one PDF for each well. When the email arrives, they must confirm that each well has been added to the SharePoint new well tracker with specific information.

We need the Well Name and Ekey to look up the well in the new well tracker. I will use an AI Builder model for this job because the Well Name and Ekey exist only in the PDF. AI Builder has a model precisely for this task called Extract custom information from documents. Fortunately, the FCR PDF looks exactly the same every time. The data will always be in the same place in the PDF, which means a model can easily be trained to find this information.

Sample FCR

So, now that you know what I’m trying to do, here are the high-level steps needed to make it happen.

High-Level Steps to Use AI Builder

  1. First, select and configure a pre-built model or build a custom model from the AI Builder menu in Power Automate.
  2. Then, train and publish the model.
  3. Next, build a cloud flow.
  4. Finally, include an AI Builder action that runs the model within the Power Automate cloud flow.

Next, let’s talk about what AI Builder is.

What is AI Builder?

Microsoft AI Builder is a Microsoft Power Platform capability that provides AI models for use within the Power Platform. This means AI Builder can be used in Power Apps or Power Automate flows. This blog post focuses on Power Automate.

In AI Builder, the user can choose from several model types suited to different business scenarios like text recognition, object recognition, classification, prediction, and more. My use case employs a custom document processing model, which I trained on my own PDFs. Pre-built models, which need no training, include a business card reader, an ID reader, text recognition, and models for invoice and receipt processing. They even have a sentiment analysis model. For example, I built a cloud flow that monitored my Outlook inbox for emails from my manager. Then, it read my manager’s email. If the model detected a negative sentiment, Power Automate sent me a phone notification. Not terribly practical, but it was fun to play with.

See Microsoft’s AI Builder documentation for a full list of the AI models available.

Where is AI Builder?

To begin, go to the AI Builder menu on the Power Automate main page. This is where steps 1 and 2 above happen. Further below, I’ll show you how to build a model.

Once the model has been built, trained, and published, add its AI Builder action to the cloud flow (step 4 above). My model’s action is called Document Processing.

How to Build an AI Builder Model

Select a model

To create a new model, go to the Explore submenu under AI Builder. Once built, models live under the Models menu.

From the Explore menu, search for a prebuilt model or customize a model. My use case works with the Document processing model named Extract custom information from documents.

Build the model

Next, tell AI Builder whether you are working with structured or unstructured documents. Click on the icons for a description of structured and unstructured. With a fixed layout, my PDF qualifies as a structured document. Click Next to proceed.

Now, you need to tell AI Builder what information to extract. Click the Add button for each piece of information. I asked AI Builder to look for my two key pieces of information, Well Name and Ekey, as well as one additional piece of information, which I will explain in the Troubleshooting section.

After clicking the Add button, AI Builder will ask if the data you are looking for lives inside a field, checkbox, or table. All of my information lives in Fields. Click Next to continue.

Add Sample Documents

Now, add one or more collections of documents to train the model. Because my data will always be in the same place, I only needed one collection. But, if your document comes in more than one format, you should add a collection for each form or format. Whenever the model runs, it matches the document being processed against the collections to best extract the data. Click New Collection to add a collection. Then, you’ll need to add at least five documents. Click Next to move on.

In the next step, you’ll “tag” each piece of information in each of the collection documents. You may need to zoom in to be precise enough. As I move the mouse and hover over different pieces of information in the document, AI Builder highlights the fields it has detected in the PDF.

Hover over the data you want the model to extract. Left-click on it. Then, select the data point name from the pop-up menu. This tags or connects the field in the PDF with the data you want to extract. If you have already tagged that piece of data, you will see a green check mark. Repeat this process for all documents in the collection. Click Next to move on.

Train the model

Congratulations! You now have a model. But, it needs to be trained. In the next screen, click the Train button and wait for the training to complete.

After clicking Train, a pop-up appears and stays until training finishes. The more documents in the collection, the longer training will take.

Publish the model

After training, AI Builder takes you to the model summary screen. Here you can view model accuracy. Now, it’s easy to miss the last step in the process. You must click the Publish button for the model to be discoverable in Power Automate.

Now, the model can be hooked into a Power Automate flow.

My Cloud Flow

Before we wrap up, I want to show some of my cloud flow. I won’t go through the entire flow, but I will explain all the steps up to AI Builder.

Trigger

First, I have an Office 365 Outlook trigger monitoring my inbox. It’s looking for an email with the subject “FCR” and an attachment. I create a variable to hold the received date and put the Received Time dynamic content from the trigger into it. Then, I convert the time to my desired format, and I grab the day of the week.
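
For readers who think in code, the date handling above can be sketched in Python. This is only an illustration, not the flow itself: the sample timestamp, the `YYYY-MM-DD` output format, and the function name `describe_received_time` are assumptions, since the post does not show the exact format used.

```python
from datetime import datetime
from typing import Tuple

# Fixed English names so the result does not depend on the machine's locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def describe_received_time(received_iso: str) -> Tuple[str, str]:
    """Return (formatted date, day-of-week name) for an ISO 8601 timestamp."""
    received = datetime.fromisoformat(received_iso)
    formatted = received.strftime("%Y-%m-%d")  # assumed target format
    day_name = DAY_NAMES[received.weekday()]   # weekday(): Monday is 0
    return formatted, day_name


# Hypothetical received time, written as an ISO 8601 timestamp.
print(describe_received_time("2023-03-14T15:09:26+00:00"))
```

In the flow, the same two values come from the Received Time dynamic content rather than from a hard-coded string.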

Process Attachments

Next, I use a Get Attachment action to grab the attachment. In theory, the attachment should already be part of my trigger. However, I’ve had errors when I try to get attachments out of the trigger and zero problems when using this action. So, if I am going to work with an attachment in a cloud flow, I always use the Get Attachment action.

If the attachment’s file name contains “pdf”, the flow runs the model.
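
That condition amounts to a simple substring test. Here is a minimal Python sketch of it, assuming a case-insensitive match (the post does not say whether the flow's check is case sensitive) and using made-up file names:

```python
def is_pdf(file_name: str) -> bool:
    """Mirror the flow's condition: 'pdf' appears somewhere in the file name."""
    # Lower-casing first is an assumption so that "REPORT.PDF" also matches.
    return "pdf" in file_name.lower()


# Hypothetical attachment names for illustration only.
attachments = ["Well_A_FCR.pdf", "signature.png", "WELL_B_FCR.PDF"]

# Only these attachments would be passed on to the AI Builder model.
pdfs = [name for name in attachments if is_pdf(name)]
print(pdfs)
```

A stricter variant would test that the name ends with `.pdf`, which avoids matching a file such as `pdf_notes.docx`.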

When adding the AI Builder action, look for an action with the model name you selected previously.

Model we chose from AI Builder menu
Action name in Power Automate

Grab Model Outputs

After the model runs, I want to grab the outputs – Well Name and Ekey – and put them into a new row in an Excel table. Just to be safe, I also use Compose actions with a trim function to get rid of any whitespace in what was pulled out of the PDF.
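
The trim step is small, but it is worth seeing what it does. The Python sketch below mimics it on a dictionary of extracted values. The dictionary shape, the function name `clean_outputs`, and the sample well name are assumptions for illustration, not the actual structure AI Builder returns.

```python
from typing import Dict


def clean_outputs(raw: Dict[str, str]) -> Dict[str, str]:
    """Strip leading and trailing whitespace from every extracted value."""
    return {field: value.strip() for field, value in raw.items()}


# Hypothetical model output with stray spaces and a trailing newline.
raw_outputs = {"Well Name": "  Example Well 1H ", "Ekey": "14203191-000\n"}

cleaned = clean_outputs(raw_outputs)
print(cleaned)
```

Trimming before writing the row matters because a stray space would make a later lookup by Well Name or Ekey fail to match.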

My cloud flow has other steps, but for the purpose of explaining how to use AI Builder, we are done. Lastly, I just want to point out that you’ll get a ton of dynamic content from the AI Builder action, including dynamic content for the pieces of data I wanted to extract.

Before I wrap up, I want to show you how I solved a problem I ran into on my first model build. I expect this will be a common problem.

Troubleshooting

The first time I ran the model, instead of getting the Ekey that I expected, which looks like “14203191-000”, what came back was “Number 14203191-000”. The word “number” was always in front of the Ekey, which is weird because the only place I see “number” is in “AFE Number” on the left-hand side of the PDF.

To troubleshoot, I opened up the PDF and did a search/find for the word “number”. I was shocked to find the word number sitting invisibly behind my Ekey. In the screenshot, you can see the highlighting from the find.

Fortunately, the fix is super simple. I defined another field for the model to look for called “Number”. I don’t do anything with it, but by adding it as an additional piece of information, the model separates the word “number” from the Ekey. I get a clean Ekey, and even if “number” isn’t in all of my PDFs, the model will still perform. You could also strip the word out afterward with a Compose action in Power Automate, but I wanted to see if I could get the desired outcome from the model, and I was successful.
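
For comparison, here is roughly what the post-processing route could look like, sketched in Python rather than as a Compose action. It assumes every Ekey follows the pattern of the one example above (eight digits, a hyphen, three digits); if your keys vary, the pattern would need to change. The name `extract_ekey` is hypothetical.

```python
import re
from typing import Optional

# Assumed Ekey shape, based only on the single example "14203191-000".
EKEY_PATTERN = re.compile(r"\d{8}-\d{3}")


def extract_ekey(raw: str) -> Optional[str]:
    """Return the first Ekey-shaped substring, or None if there is none."""
    match = EKEY_PATTERN.search(raw)
    return match.group(0) if match else None


print(extract_ekey("Number 14203191-000"))  # value polluted by hidden text
print(extract_ekey("14203191-000"))         # value that is already clean
```

The trade-off is that this kind of cleanup lives in the flow and depends on the key format, whereas the extra “Number” field keeps the fix inside the model.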

Thanks For Reading

Wow, that was a long post. But, now you know what you can do with Power Automate AI Builder and have a solid use case to play with. If you found this useful, please do me a favor and share it on LinkedIn. I’d love to help other people learn how to use Power Automate AI Builder. Thanks for reading and have a great week!

6 thoughts on “How To Grab Data from PDFs with Power Automate AI Builder”

  4. Great article. I’ve built a very similar model and flow, but I’m having an issue where my PDF is more than one page. I’ve tried adding the page range to the extract information action in the flow, but no joy. Any idea how to do this?

    1. In my use case, I have a specific keyword that I’m looking for and I use that. The cloud flow loops through pages until it finds that keyword and then does work with the page.

  5. If anyone wants to extract data from a PDF or image without training a model for select documents, try this new GPT data extraction method: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345

    It doesn’t require specifying certain document areas, wordings, styles, etc. It just OCRs the file, converts it to a replica text (txt), and passes it to a GPT prompt where you can ask GPT to do whatever you want with the document data.
