This sample bot accepts form images from the user, extracts the required information into a card, and replies to the user with it. You can extract details from both structured and semi-structured forms.
As a test dataset, I converted some e-mails into a form structure and then used these structured e-mails to extract key-value pairs.
This bot was created using Microsoft Bot Framework.
Search data is defined in the following format:
{
"id": 2,
"text": "PRIORITY", // Your Reference Text Value
"marginX": -30, // Margin to the left of your value field
"marginY": 30, // Margin to the top of your value field
"width": 100, // Width of your text area
"height": 100 // Height of your text area
}
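To make the fields above concrete, here is a minimal Python sketch of how one region definition could be turned into a search rectangle. It is an illustration only: the `Box` type, the `search_rect` function, and the sample label coordinates are hypothetical, and it assumes that `marginX` and `marginY` are offsets from the top-left corner of the reference text's bounding box. The bot itself may anchor the margins differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixels, origin at the top-left."""
    x: int
    y: int
    width: int
    height: int


def search_rect(reference: Box, region: dict) -> Box:
    """Return the area to read the value from for one region definition.

    Assumption: marginX/marginY are offsets from the top-left corner of
    the reference text's bounding box. This is a guess for illustration,
    not the bot's confirmed behaviour.
    """
    return Box(
        x=reference.x + region["marginX"],
        y=reference.y + region["marginY"],
        width=region["width"],
        height=region["height"],
    )


priority = {"id": 2, "text": "PRIORITY", "marginX": -30, "marginY": 30,
            "width": 100, "height": 100}

# Hypothetical OCR result: the word PRIORITY was found at (400, 120).
label_box = Box(x=400, y=120, width=90, height=18)
value_area = search_rect(label_box, priority)
print(value_area)
```

Under this assumption, a negative `marginX` shifts the value area to the left of the label and a positive `marginY` moves it below the label.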
The output for the sample definition above will look like the following.
Let's use one of the files from the JFK Files. We want to extract the CLASSIFIED MESSAGE, DEFERRED, PRIORITY, DTG, INCOMING NUMBER and DATE values.
The JSON region definitions look like this:
[
{
"id": 0,
"text": "CLASSIFIED MESSAGE",
"marginX": 5,
"marginY": 30,
"width": 200,
"height": 120
},
{
"id": 1,
"text": "DEFERRED",
"marginX": -30,
"marginY": 30,
"width": 100,
"height": 100
},
{
"id": 2,
"text": "PRIORITY",
"marginX": -30,
"marginY": 30,
"width": 100,
"height": 100
},
{
"id": 3,
"text": "DTG",
"marginX": 5,
"marginY": 30,
"width": 200,
"height": 200
},
{
"id": 4,
"text": "INCOMING NUMBER",
"marginX": 0,
"marginY": 30,
"width": 200,
"height": 100
},
{
"id": 5,
"text": "DATE",
"marginX": 50,
"marginY": -20,
"width": 300,
"height": 50
}
]
With these definitions in place, a search region is set for each field.
After that, the values are extracted successfully.
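Because every region is just a JSON object with the same six fields, the definitions can be checked before they are used. The following is a minimal Python sketch of such a check; `parse_regions` and `REQUIRED_KEYS` are hypothetical names and are not part of the bot. Note that a strict JSON parser does not accept `//` comments, so the annotated example earlier would need its comments removed first.

```python
import json

# The six fields used by every region definition in this README.
REQUIRED_KEYS = ("id", "text", "marginX", "marginY", "width", "height")


def parse_regions(raw: str) -> list:
    """Parse a JSON array of region definitions and run basic sanity checks."""
    regions = json.loads(raw)
    if not isinstance(regions, list):
        raise ValueError("expected a JSON array of region definitions")
    seen_ids = set()
    for region in regions:
        missing = [key for key in REQUIRED_KEYS if key not in region]
        if missing:
            raise ValueError(f"region {region.get('id')!r} is missing {missing}")
        if region["width"] <= 0 or region["height"] <= 0:
            raise ValueError(f"region {region['id']!r} has an empty search area")
        if region["id"] in seen_ids:
            raise ValueError(f"duplicate region id {region['id']!r}")
        seen_ids.add(region["id"])
    return regions


sample = """
[
  {"id": 0, "text": "CLASSIFIED MESSAGE", "marginX": 5, "marginY": 30,
   "width": 200, "height": 120},
  {"id": 5, "text": "DATE", "marginX": 50, "marginY": -20,
   "width": 300, "height": 50}
]
"""
regions = parse_regions(sample)
print([region["text"] for region in regions])
```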
Let's use another of the files from the JFK Files. This time we want to extract the FROM, TITLE, AGENCY ORIGINATOR, RECORD NUMBER, RECORD SERIES and AGENCY FILE NUMBER values.
The JSON region definitions look like this:
[
{
"id": 0,
"text": "RECORD NUMBER",
"marginX": 220,
"marginY": -5,
"width": 300,
"height": 25
},
{
"id": 1,
"text": "RECORD SERIES",
"marginX": 220,
"marginY": -5,
"width": 300,
"height": 20
},
{
"id": 2,
"text": "AGENCY FILE NUMBER",
"marginX": 300,
"marginY": -5,
"width": 300,
"height": 25
},
{
"id": 3,
"text": "AGENCY ORIGINATOR",
"marginX": 200,
"marginY": -5,
"width": 150,
"height": 25
},
{
"id": 4,
"text": "FROM",
"marginX": 50,
"marginY": -5,
"width": 150,
"height": 25
},
{
"id": 5,
"text": "TITLE",
"marginX": 50,
"marginY": -5,
"width": 600,
"height": 25
}
]
With these definitions in place, a search region is set for each field.
After that, the values are extracted successfully.
Using the same structure, let's assume our sample is semi-structured, like an e-mail.
In this image, we first detect the location of the From, To, Sent and Subject fields, then find the values next to them. In this type of form the value width is usually dynamic, so we use a different width, height and margin for each key-value pair.
[
{
"id": 0,
"text": "From",
"marginX": 5,
"marginY": -5,
"width": 800,
"height": 25
},
{
"id": 1,
"text": "Sent",
"marginX": 5,
"marginY": -5,
"width": 400,
"height": 25
},
{
"id": 2,
"text": "To",
"marginX": 5,
"marginY": -5,
"width": 300,
"height": 25
},
{
"id": 3,
"text": "Subject",
"marginX": 5,
"marginY": -5,
"width": 900,
"height": 25
}
]
With these definitions in place, a search region is set for each field.
With these settings, the values are extracted into the output card.
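The reason a generous width works for variable-length values can be sketched in a few lines of Python: only the words that fall inside the search region are collected, so unused space in the region costs nothing. The `Word` type, the `words_in_region` function, and the coordinates are hypothetical, and the sketch assumes an OCR step that returns one bounding box per word and a region that covers a single line of text. It is not the bot's actual implementation.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One OCR word with its bounding box in image pixels (hypothetical shape)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def words_in_region(words, rx, ry, rwidth, rheight):
    """Join the words whose centre point lies inside the region, left to right."""
    inside = []
    for word in words:
        cx = word.x + word.width / 2
        cy = word.y + word.height / 2
        if rx <= cx <= rx + rwidth and ry <= cy <= ry + rheight:
            inside.append(word)
    # Single-line assumption: ordering by x is enough to restore reading order.
    inside.sort(key=lambda word: word.x)
    return " ".join(word.text for word in inside)


# Hypothetical OCR output for two header lines of an e-mail.
ocr_words = [
    Word("To", 20, 70, 20, 18),
    Word("report", 175, 101, 55, 18),
    Word("Subject", 20, 100, 60, 18),
    Word("Quarterly", 90, 100, 80, 18),
]

# A wide, single-line region placed to the right of the Subject label.
subject_value = words_in_region(ocr_words, rx=85, ry=95, rwidth=900, rheight=25)
print(subject_value)
```

Testing the centre point rather than the full box keeps a word that slightly overlaps the region border, which is one possible design choice among several.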
Now let's assume we have key-value pairs in a table structure. Only a few small changes are needed to extract these fields.
I changed the SampleData.json file under Resources as follows.
[
{
"id": 0,
"text": "Date",
"marginX": 50,
"marginY": -5,
"width": 400,
"height": 25
},
{
"id": 1,
"text": "Company",
"marginX": 50,
"marginY": -5,
"width": 400,
"height": 25
},
{
"id": 2,
"text": "Total",
"marginX": 50,
"marginY": -5,
"width": 400,
"height": 25
},
{
"id": 3,
"text": "Card",
"marginX": 50,
"marginY": -5,
"width": 400,
"height": 25
},
{
"id": 4,
"text": "Method",
"marginX": 50,
"marginY": -5,
"width": 400,
"height": 25
}
]
After these changes, the table values are extracted into the output card.
I hope this is helpful.
- Visual Studio 2017 15.7 or newer installed.
- .NET Core 2.1 or higher installed.
- Bot Framework Emulator 4.1 or newer installed.
- Open BotOCRExtract.csproj in Visual Studio.
- Run the project (press the F5 key).
Microsoft Bot Framework Emulator is a desktop application that allows bot developers to test and debug their bots, either running on localhost or running remotely through a tunnel.
- Install the Bot Framework Emulator.
- Launch the Bot Framework Emulator.
- Select File -> Open Bot and open BotOCRExtract.bot.
See Deploy your C# bot to Azure for instructions.
The deployment process assumes you have an account on Microsoft Azure and are able to log into the Microsoft Azure Portal.
If you are new to Microsoft Azure, please refer to Getting started with Azure for guidance on how to get started on Azure.