[form recognizer] Remove US receipt #11764

Merged 12 commits on Jun 3, 2020
3 changes: 3 additions & 0 deletions sdk/formrecognizer/azure-ai-formrecognizer/CHANGELOG.md
@@ -22,6 +22,9 @@
`CustomFormModel` and `CustomFormModelInfo` models.
- `models` property of `CustomFormModel` is renamed to `submodels`
- `CustomFormSubModel` is renamed to `CustomFormSubmodel`
- Removed `USReceipt`. To see how to deal with the return value of `begin_recognize_receipts`, see the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
- Removed `USReceiptItem`. To see how to access the individual items on a receipt, see the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
- Removed `ReceiptType` and the `receipt_type` property from `RecognizedReceipt`. See the recognize receipt samples in the [samples directory](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/formrecognizer/azure-ai-formrecognizer/samples) for details.
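
For readers migrating off the removed types, here is a minimal sketch of how the former `USReceipt` attributes could be looked up in `fields` instead. The field keys are the ones the old `prepare_us_receipt` handler read from the service response (visible later in this diff); `LEGACY_ATTR_TO_FIELD`, `legacy_field`, and the `SimpleNamespace` stand-ins are hypothetical names used only for illustration and are not part of the SDK.

```python
from types import SimpleNamespace

# Removed USReceipt attribute -> key in RecognizedReceipt.fields.
# Keys are taken from the removed prepare_us_receipt handler in this PR.
LEGACY_ATTR_TO_FIELD = {
    "merchant_address": "MerchantAddress",
    "merchant_name": "MerchantName",
    "merchant_phone_number": "MerchantPhoneNumber",
    "receipt_type": "ReceiptType",
    "receipt_items": "Items",
    "subtotal": "Subtotal",
    "tax": "Tax",
    "tip": "Tip",
    "total": "Total",
    "transaction_date": "TransactionDate",
    "transaction_time": "TransactionTime",
}


def legacy_field(receipt, attr):
    """Hypothetical helper: find a former USReceipt attribute in receipt.fields.

    Returns None when the receipt has no fields or the field is absent.
    """
    fields = receipt.fields or {}
    return fields.get(LEGACY_ATTR_TO_FIELD[attr])


# Stand-in objects so the sketch runs without calling the service; a real
# receipt would come from begin_recognize_receipts(...).result().
fake_receipt = SimpleNamespace(fields={
    "MerchantName": SimpleNamespace(value="Contoso", confidence=0.9),
    "ReceiptType": SimpleNamespace(value="Itemized", confidence=0.99),
})

merchant = legacy_field(fake_receipt, "merchant_name")
print(merchant.value, merchant.confidence)
print(legacy_field(fake_receipt, "tip"))  # this field is not on the fake receipt
```

Because a field may be missing from any given receipt, the lookup uses `dict.get` and callers are expected to check for `None` before reading `.value` or `.confidence`.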

**New features**

28 changes: 12 additions & 16 deletions sdk/formrecognizer/azure-ai-formrecognizer/README.md
@@ -210,7 +210,7 @@ for cell in table.cells:
```

### Recognize Receipts
Recognize data from USA sales receipts using a prebuilt model.
Recognize data from USA sales receipts using a prebuilt model. [Here][service_recognize_receipt] are the fields the service returns for a recognized receipt.

```python
from azure.ai.formrecognizer import FormRecognizerClient
@@ -227,21 +227,16 @@ with open("<path to your receipt>", "rb") as fd:
poller = form_recognizer_client.begin_recognize_receipts(receipt)
result = poller.result()

r = result[0]
print("Receipt contained the following values with confidences: ")
print("Receipt Type: {} has confidence: {}".format(r.receipt_type.type, r.receipt_type.confidence))
print("Merchant Name: {} has confidence: {}".format(r.merchant_name.value, r.merchant_name.confidence))
print("Transaction Date: {} has confidence: {}".format(r.transaction_date.value, r.transaction_date.confidence))
print("Receipt items:")
for item in r.receipt_items:
    print("...Item Name: {} has confidence: {}".format(item.name.value, item.name.confidence))
    print("...Item Quantity: {} has confidence: {}".format(item.quantity.value, item.quantity.confidence))
    print("...Individual Item Price: {} has confidence: {}".format(item.price.value, item.price.confidence))
    print("...Total Item Price: {} has confidence: {}".format(item.total_price.value, item.total_price.confidence))
print("Subtotal: {} has confidence: {}".format(r.subtotal.value, r.subtotal.confidence))
print("Tax: {} has confidence: {}".format(r.tax.value, r.tax.confidence))
print("Tip: {} has confidence: {}".format(r.tip.value, r.tip.confidence))
print("Total: {} has confidence: {}".format(r.total.value, r.total.confidence))
for receipt in result:
    for name, field in receipt.fields.items():
        if name == "Items":
            print("Receipt Items:")
            for idx, items in enumerate(field.value):

Review comment (Member): If it's at all complicated to retrieve these values, it's probably worth illustrating for customers how to get them out of the receipt fields property.

Reply (Contributor Author): We are still accessing the field properties (including items) one by one in the samples. For the README, based on @kristapratico, I wanted to show a simpler way of traversing everything.

                print("...Item #{}".format(idx))
                for item_name, item in items.value.items():
                    print("......{}: {} has confidence {}".format(item_name, item.value, item.confidence))
        else:
            print("{}: {} has confidence {}".format(name, field.value, field.confidence))
```
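
The review thread above asks how individual values can be pulled out of `fields` rather than traversed generically. The following is a rough sketch of one way to do that for line items, assuming (as the new README loop does) that `fields["Items"].value` is a list of fields whose own `.value` is a dict of sub-fields such as `"Name"`, `"Quantity"`, `"Price"`, and `"TotalPrice"`. `receipt_items` and `_field` are hypothetical helpers, and the `SimpleNamespace` objects stand in for real SDK results.

```python
from types import SimpleNamespace


def receipt_items(receipt):
    """Return each line item as a dict of {sub-field name: (value, confidence)}."""
    fields = receipt.fields or {}
    items_field = fields.get("Items")
    if items_field is None or not items_field.value:
        return []
    rows = []
    for item in items_field.value:  # each item is itself a field
        rows.append({
            name: (sub.value, sub.confidence)
            for name, sub in item.value.items()
        })
    return rows


def _field(value, confidence):
    # Stand-in for FormField; only .value and .confidence are modelled here.
    return SimpleNamespace(value=value, confidence=confidence)


fake_receipt = SimpleNamespace(fields={
    "Total": _field(14.5, 0.98),
    "Items": _field([
        _field({
            "Name": _field("Coffee", 0.95),
            "Quantity": _field(2, 0.9),
            "TotalPrice": _field(7.0, 0.93),
        }, None),
    ], None),
})

for row in receipt_items(fake_receipt):
    name, _ = row.get("Name", (None, None))
    total_price, conf = row.get("TotalPrice", (None, None))
    print("{}: {} (confidence {})".format(name, total_price, conf))

total = fake_receipt.fields.get("Total")
if total is not None:
    print("Total: {} has confidence {}".format(total.value, total.confidence))
```

Sub-fields are read with `.get` because a receipt line may not carry every sub-field; the fake item above, for instance, has no `"Price"`.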

### Train a model
@@ -439,6 +434,7 @@ This project has adopted the [Microsoft Open Source Code of Conduct][code_of_con
[cognitive_authentication_aad]: https://docs.microsoft.com/azure/cognitive-services/authentication#authenticate-with-azure-active-directory
[azure_identity_credentials]: ../../identity/azure-identity#credentials
[default_azure_credential]: ../../identity/azure-identity#defaultazurecredential
[service_recognize_receipt]: https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/GetAnalyzeReceiptResult

[cla]: https://cla.microsoft.com
[code_of_conduct]: https://opensource.microsoft.com/codeofconduct/
@@ -14,9 +14,6 @@
TrainingStatus,
CustomFormModelStatus,
FormContentType,
USReceipt,
ReceiptType,
USReceiptItem,
FormTable,
FormTableCell,
TrainingDocumentInfo,
@@ -45,9 +42,6 @@
'CustomFormModelStatus',
'FormContentType',
'FormContent',
'USReceipt',
'ReceiptType',
'USReceiptItem',
'FormTable',
'FormTableCell',
'TrainingDocumentInfo',
@@ -17,7 +17,7 @@
from azure.core.polling.base_polling import LROBasePolling
from ._generated._form_recognizer_client import FormRecognizerClient as FormRecognizer
from ._response_handlers import (
prepare_us_receipt,
prepare_receipt,
prepare_content_result,
prepare_form_result
)
@@ -74,7 +74,7 @@ def __init__(self, endpoint, credential, **kwargs):

def _receipt_callback(self, raw_response, _, headers): # pylint: disable=unused-argument
analyze_result = self._client._deserialize(AnalyzeOperationResult, raw_response)
return prepare_us_receipt(analyze_result)
return prepare_receipt(analyze_result)

@distributed_trace
def begin_recognize_receipts(self, receipt, **kwargs):
@@ -195,82 +195,13 @@ class RecognizedReceipt(RecognizedForm):
:ivar list[~azure.ai.formrecognizer.FormPage] pages:
A list of pages recognized from the input document. Contains lines,
words, tables and page metadata.
:ivar ~azure.ai.formrecognizer.ReceiptType receipt_type:
The reciept type and confidence.
:ivar str receipt_locale: Defaults to "en-US".
"""
def __init__(self, **kwargs):
super(RecognizedReceipt, self).__init__(**kwargs)
self.receipt_type = kwargs.get("receipt_type", None)
self.receipt_locale = kwargs.get("receipt_locale", "en-US")

def __repr__(self):
return "RecognizedReceipt(form_type={}, fields={}, page_range={}, pages={}, " \
"receipt_type={}, receipt_locale={})".format(
self.form_type, repr(self.fields), repr(self.page_range), repr(self.pages),
repr(self.receipt_type), self.receipt_locale
return "RecognizedReceipt(form_type={}, fields={}, page_range={}, pages={})".format(
self.form_type, repr(self.fields), repr(self.page_range), repr(self.pages)
)[:1024]

class USReceipt(RecognizedReceipt): # pylint: disable=too-many-instance-attributes
"""Extracted fields found on the US sales receipt. Provides
attributes for accessing common fields present in US sales receipts.

:ivar ~azure.ai.formrecognizer.FormField merchant_address:
The address of the merchant.
:ivar ~azure.ai.formrecognizer.FormField merchant_name:
The name of the merchant.
:ivar ~azure.ai.formrecognizer.FormField merchant_phone_number:
The phone number associated with the merchant.
:ivar list[~azure.ai.formrecognizer.USReceiptItem] receipt_items:
The purchased items found on the receipt.
:ivar ~azure.ai.formrecognizer.FormField subtotal:
The subtotal found on the receipt
:ivar ~azure.ai.formrecognizer.FormField tax:
The tax value found on the receipt.
:ivar ~azure.ai.formrecognizer.FormField tip:
The tip value found on the receipt.
:ivar ~azure.ai.formrecognizer.FormField total:
The total amount found on the receipt.
:ivar ~azure.ai.formrecognizer.FormField transaction_date:
The transaction date of the sale.
:ivar ~azure.ai.formrecognizer.FormField transaction_time:
The transaction time of the sale.
:ivar fields:
A dictionary of the fields found on the receipt.
:vartype fields: dict[str, ~azure.ai.formrecognizer.FormField]
:ivar ~azure.ai.formrecognizer.FormPageRange page_range:
The first and last page number of the input receipt.
:ivar list[~azure.ai.formrecognizer.FormPage] pages:
Contains page metadata such as page width, length, text angle, unit.
If `include_text_content=True` is passed, contains a list
of extracted text lines for each page in the input document.
:ivar str form_type: The type of form.
"""

def __init__(self, **kwargs):
super(USReceipt, self).__init__(**kwargs)
self.merchant_address = kwargs.get("merchant_address", None)
self.merchant_name = kwargs.get("merchant_name", None)
self.merchant_phone_number = kwargs.get("merchant_phone_number", None)
self.receipt_items = kwargs.get("receipt_items", None)
self.subtotal = kwargs.get("subtotal", None)
self.tax = kwargs.get("tax", None)
self.tip = kwargs.get("tip", None)
self.total = kwargs.get("total", None)
self.transaction_date = kwargs.get("transaction_date", None)
self.transaction_time = kwargs.get("transaction_time", None)

def __repr__(self):
return "USReceipt(merchant_address={}, merchant_name={}, merchant_phone_number={}, " \
"receipt_type={}, receipt_items={}, subtotal={}, tax={}, tip={}, total={}, "\
"transaction_date={}, transaction_time={}, fields={}, page_range={}, pages={}, " \
"form_type={}, receipt_locale={})".format(
repr(self.merchant_address), repr(self.merchant_name), repr(self.merchant_phone_number),
repr(self.receipt_type), repr(self.receipt_items), repr(self.subtotal), repr(self.tax),
repr(self.tip), repr(self.total), repr(self.transaction_date), repr(self.transaction_time),
repr(self.fields), repr(self.page_range), repr(self.pages), self.form_type, self.receipt_locale
)[:1024]


class FormField(object):
"""Represents a field recognized in an input form.
@@ -513,69 +444,6 @@ def __repr__(self):
)[:1024]


class ReceiptType(object):
"""The type of the analyzed US receipt and the confidence
value of that type.

:ivar str type: The type of the receipt. For example, "Itemized",
"CreditCard", "Gas", "Parking", "Gas", "Other".
:ivar float confidence:
Measures the degree of certainty of the recognition result. Value is between [0.0, 1.0].
"""

def __init__(self, **kwargs):
self.type = kwargs.get("type", None)
self.confidence = kwargs.get("confidence", None)

@classmethod
def _from_generated(cls, item):
return cls(
type=item.value_string,
confidence=adjust_confidence(item.confidence)) if item else None

def __repr__(self):
return "ReceiptType(type={}, confidence={})".format(self.type, self.confidence)[:1024]


class USReceiptItem(object):
"""A receipt item on a US sales receipt.
Contains the item name, quantity, price, and total price.

:ivar ~azure.ai.formrecognizer.FormField name:
The name of the item.
:ivar ~azure.ai.formrecognizer.FormField quantity:
The quantity associated with this item.
:ivar ~azure.ai.formrecognizer.FormField price:
The price of a single unit of this item.
:ivar ~azure.ai.formrecognizer.FormField total_price:
The total price of this item, taking the quantity into account.
"""

def __init__(self, **kwargs):
self.name = kwargs.get("name", None)
self.quantity = kwargs.get("quantity", None)
self.price = kwargs.get("price", None)
self.total_price = kwargs.get("total_price", None)

@classmethod
def _from_generated(cls, items, read_result):
try:
receipt_item = items.value_array
return [cls(
name=FormField._from_generated("Name", item.value_object.get("Name"), read_result),
quantity=FormField._from_generated("Quantity", item.value_object.get("Quantity"), read_result),
price=FormField._from_generated("Price", item.value_object.get("Price"), read_result),
total_price=FormField._from_generated("TotalPrice", item.value_object.get("TotalPrice"), read_result),
) for item in receipt_item]
except AttributeError:
return []

def __repr__(self):
return "USReceiptItem(name={}, quantity={}, price={}, total_price={})".format(
repr(self.name), repr(self.quantity), repr(self.price), repr(self.total_price)
)[:1024]


class FormTable(object):
"""Information about the extracted table contained on a page.

@@ -7,64 +7,33 @@
# pylint: disable=protected-access

from ._models import (
USReceipt,
ReceiptType,
FormField,
USReceiptItem,
FormPage,
FormLine,
FormTable,
FormTableCell,
FormPageRange,
RecognizedForm
RecognizedForm,
RecognizedReceipt
)


def prepare_us_receipt(response):
def prepare_receipt(response):
receipts = []
read_result = response.analyze_result.read_results
document_result = response.analyze_result.document_results
form_page = FormPage._from_generated(read_result)

for page in document_result:
if page.fields is None:
receipt = USReceipt(
receipt = RecognizedReceipt(
page_range=FormPageRange(first_page_number=page.page_range[0], last_page_number=page.page_range[1]),
pages=form_page[page.page_range[0]-1:page.page_range[1]],
form_type=page.doc_type,
)
receipts.append(receipt)
continue
receipt = USReceipt(
merchant_address=FormField._from_generated(
"MerchantAddress", page.fields.get("MerchantAddress"), read_result
),
merchant_name=FormField._from_generated(
"MerchantName", page.fields.get("MerchantName"), read_result
),
merchant_phone_number=FormField._from_generated(
"MerchantPhoneNumber",
page.fields.get("MerchantPhoneNumber"),
read_result,
),
receipt_type=ReceiptType._from_generated(page.fields.get("ReceiptType")),
receipt_items=USReceiptItem._from_generated(
page.fields.get("Items"), read_result
),
subtotal=FormField._from_generated(
"Subtotal", page.fields.get("Subtotal"), read_result
),
tax=FormField._from_generated("Tax", page.fields.get("Tax"), read_result),
tip=FormField._from_generated("Tip", page.fields.get("Tip"), read_result),
total=FormField._from_generated(
"Total", page.fields.get("Total"), read_result
),
transaction_date=FormField._from_generated(
"TransactionDate", page.fields.get("TransactionDate"), read_result
),
transaction_time=FormField._from_generated(
"TransactionTime", page.fields.get("TransactionTime"), read_result
),
receipt = RecognizedReceipt(
page_range=FormPageRange(
first_page_number=page.page_range[0], last_page_number=page.page_range[1]
),
@@ -73,7 +42,7 @@ def prepare_us_receipt(response):
fields={
key: FormField._from_generated(key, value, read_result)
for key, value in page.fields.items()
},
}
)

receipts.append(receipt)
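
To make the effect of this refactor easier to see, here is a simplified, self-contained model of the new handler's logic: one generic receipt per document result, with every service field converted through a single dict comprehension instead of per-field keyword arguments. It is only a sketch. `build_receipts` and `make_field` are hypothetical names, `SimpleNamespace` stands in for `RecognizedReceipt` and the generated models, the `doc_type` string is an illustrative value, and leaving `fields` as `None` when the service returns no fields is an assumption about the base class default.

```python
from types import SimpleNamespace


def build_receipts(document_results, pages, make_field):
    """Model of the simplified handler.

    document_results: objects with .page_range ([first, last], 1-based),
                      .doc_type and .fields (dict or None).
    pages:            already-converted pages, where index 0 is page 1.
    make_field:       callable(name, raw_value), standing in for
                      FormField._from_generated.
    """
    receipts = []
    for doc in document_results:
        first, last = doc.page_range
        raw_fields = doc.fields
        receipts.append(SimpleNamespace(
            page_range=(first, last),
            pages=pages[first - 1:last],  # 1-based inclusive range -> slice
            form_type=doc.doc_type,
            # Assumption: fields stays None when the service returned none.
            fields=None if raw_fields is None else {
                key: make_field(key, value) for key, value in raw_fields.items()
            },
        ))
    return receipts


docs = [
    SimpleNamespace(page_range=[1, 1], doc_type="prebuilt:receipt",
                    fields={"Total": 14.5, "Tip": 2.0}),
    SimpleNamespace(page_range=[2, 3], doc_type="prebuilt:receipt", fields=None),
]
result = build_receipts(docs, ["page1", "page2", "page3"],
                        lambda name, raw: (name, raw))
print(result[0].fields)
print(result[1].pages)
```

The design consequence is that any field the service adds later shows up in `fields` without an SDK change, at the cost of callers looking fields up by string key.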
@@ -17,7 +17,7 @@
from azure.core.polling.async_base_polling import AsyncLROBasePolling
from .._generated.aio._form_recognizer_client_async import FormRecognizerClient as FormRecognizer
from .._response_handlers import (
prepare_us_receipt,
prepare_receipt,
prepare_content_result,
prepare_form_result
)
@@ -84,7 +84,7 @@ def __init__(

def _receipt_callback(self, raw_response, _, headers): # pylint: disable=unused-argument
analyze_result = self._client._deserialize(AnalyzeOperationResult, raw_response)
return prepare_us_receipt(analyze_result)
return prepare_receipt(analyze_result)

@distributed_trace_async
async def recognize_receipts(