This week, I am working on a Power Automate Desktop (PAD) flow that extracts data from a PDF. I need to extract numbers from a string. Rather than use regex, I used the Recognize Entities in Text action, which made this task a breeze. But because it wasn’t without its quirks and challenges, I’m doing a write-up on how to use it. Read on to learn more.
My Use Case
I’ve scraped data out of a PDF and put it into a list called OcrList. I want all of the digits in the item at index 1 and none of the text. The format will always be as shown, with the text “Land Number”, followed by some whitespace and then 9 digits. I want all 9 digits, but I don’t have to specify how many digits there are, just that I want a number.
The screenshot below shows the 4 PAD steps required to extract the numbers. I’ll go through each one in detail, but here are the high-level steps.
- Set variable looks at my list, grabs the item at index position 1, and puts it into a variable.
- Recognize entities in text pulls out the number and puts it into a data table.
- Set variable pulls one value from the data table and puts it into a variable.
- Convert number to text converts the number to text so I can use it in a string later on in the flow.
The first thing I have to do is get from a list with many rows or values down to just the one row that I want to parse. I use Set Variable and the syntax shown (square brackets inside the variable reference, as in %OcrList[1]%) to grab index position 1. Note that list indexes start at 0, so index 1 is the second item. I put that into a variable called ItemLandNumber.
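For readers who think in code, the same lookup can be sketched in Python. This is only an analogy to the Set Variable step, not PAD syntax, and the list contents are made-up sample data.

```python
# Hypothetical sample data standing in for the OCR output list (OcrList).
ocr_list = [
    "Property Report",          # index 0
    "Land Number   123456789",  # index 1 (the row we want)
    "Owner: Example Person",    # index 2
]

# Indexes start at 0, so [1] selects the second item,
# much like %OcrList[1]% in the Set Variable action.
item_land_number = ocr_list[1]

print(item_land_number)
```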
Recognize Entities in Text
The Recognize Entities in Text action lives in the Text menu as shown below.
I select Number from the Entity type drop down, and it will grab only the number. This action creates a new variable called RecognizedEntities.
Explore the Entity type drop down, and you’ll see there are a number of things PAD can recognize, such as percentages, number ranges, currency, phone numbers, emails, or even temperatures.
But, this is where it gets a little tricky because the output variable, RecognizedEntities, is a data table. It outputs to a table because PAD provides the original text and an extracted value. I can tell it’s a data table because it appears like this under flow variables.
And if I drill into the view, I see this. That means, I need to extract what I want from the table, which I do in the next step.
Set Variable Again
I use another Set Variable action and syntax similar to last time to reference the first cell in the table. Because all indexes start at 0, I use row index 0 and column index 0, as in %RecognizedEntities[0][0]%, to reference the first row and first column.
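To make the table indexing concrete, here is a small Python sketch. It models the data table as a list of rows. The two-column layout (extracted value first, original text second) is an assumption based on the description above, and the sample values are made up. It is an analogy, not PAD syntax.

```python
# Hypothetical stand-in for the RecognizedEntities data table:
# one row per recognized entity, columns assumed to be [value, original text].
recognized_entities = [
    [123456789, "123456789"],
]

row_index = 0     # first row
column_index = 0  # first column (assumed to hold the extracted value)

# Row first, then column, both counted from 0.
land_number = recognized_entities[row_index][column_index]

print(land_number)
print(type(land_number).__name__)
```

The value that comes out of the first cell is a number rather than text, which is why one more conversion step follows.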
Convert Number to Text
Finally, I need to convert my number to text in one last step because it will become part of another string.
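A rough Python analogy for this last step is below, assuming the extracted value is a number and the target string is a made-up file name. In Python, joining an integer to a string with + raises an error until the number has been converted to text.

```python
land_number = 123456789  # hypothetical value taken from the table's first cell

# Convert the number to text so it can be joined with other strings.
land_number_text = str(land_number)

# Hypothetical use later in the flow: build a file name from the text.
file_name = "Land_" + land_number_text + ".pdf"

print(file_name)
```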
And that is how I use the Recognize entities in text action to extract numbers from a string. Maybe regex is more efficient, but I’m not great at regex. This was a better solution for me.
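For comparison, here is roughly what the regex route could look like in Python. This is a sketch under the format described earlier (the text “Land Number”, some whitespace, then digits). The sample string is made up, and the pattern is an illustrative assumption rather than something taken from the flow.

```python
import re

# Hypothetical sample row matching the format described in the use case.
text = "Land Number   123456789"

# "Land Number", one or more whitespace characters, then a run of digits.
match = re.search(r"Land Number\s+(\d+)", text)

# group(1) holds the captured digits; it is already text, so no conversion step.
land_number_text = match.group(1) if match else None

print(land_number_text)
```

The pattern is short, but it has to be written and maintained by hand, which is the trade-off against picking Number from a drop down.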