
DocAI Expense Parser Demo

Objective

Learn how to use Google Cloud Platform to build a pipeline for processing expenses (i.e. receipts). This repository provides sample code for building your own demo; it is not tested for production use.

Visualizing the workflow

(Diagram: GCP workflow)

GCP Services used in the Demo

Steps to re-create this demo in your own GCP environment

  1. Create a Google Cloud Platform Project

  2. Enable the Cloud Document AI API, Cloud Functions API, and Cloud Build API in the project you created in step #1
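
    If you prefer the command line, the same APIs can be enabled from Cloud Shell (a minimal sketch, assuming gcloud is already pointed at the project from step #1):

      gcloud services enable \
          documentai.googleapis.com \
          cloudfunctions.googleapis.com \
          cloudbuild.googleapis.com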

  3. If you do not have access to the parser, request access via this link. Here is a link to the official Expense Parser documentation.

  4. Create a service account that will later be used by Cloud Functions

    1. Navigate to IAM & Admin -> Service Accounts
    2. Click on Create a service account
    3. In the Service account name section, type in process-receipt-example or a name of your choice
    4. Click Create and continue
    5. Grant this service account the following roles:
      • Storage Admin
      • BigQuery Admin
      • Document AI API User
    6. Click Done; you should see this service account listed on the IAM main page
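
    Alternatively, the same service account and role bindings can be created from Cloud Shell. This is a rough sketch, assuming the name process-receipt-example and a placeholder project ID:

      PROJECT_ID="your-project-id"          # placeholder: replace with your project ID
      SA_NAME="process-receipt-example"
      gcloud iam service-accounts create "$SA_NAME" \
          --project="$PROJECT_ID" \
          --display-name="process-receipt-example"
      # Grant the three roles listed above
      for ROLE in roles/storage.admin roles/bigquery.admin roles/documentai.apiUser; do
        gcloud projects add-iam-policy-binding "$PROJECT_ID" \
            --member="serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
            --role="$ROLE"
      done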
  5. Create your Doc AI processor

    • At this point, you should have your request from Step 3 approved and have access to the expense parser
    • Navigate to console -> Document AI -> Processors
    • Click Create processor and choose Expense Parser
    • Name your processor and click Create
    • Take note of your processor's region (e.g. us) and processor ID
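
    If you want to double-check the region and processor ID later, one option is to list processors through the Document AI v1 REST API (the sketch below assumes the us region and a placeholder project ID):

      # Each processor's "name" field ends with its processor ID
      curl -s \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://us-documentai.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/us/processors"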
  6. Activate your Cloud Shell and clone this GitHub repo using the command:

      gh repo clone jiya-zhang/docai-expense-parser-demo
  7. Execute the Bash shell scripts in your Cloud Shell terminal to create the cloud resources (i.e. Google Cloud Storage buckets, Pub/Sub topics, Cloud Functions, and the BigQuery dataset and table)

    1. Change directory into the cloned repository and switch to the v1api branch

      cd docai-expense-parser-demo
      git checkout -b v1api
      
    2. Update the following values in .env.local:

      • PROJECT_ID should match your current project's ID
      • BUCKET_LOCATION is where you want the raw receipts to be stored
      • CLOUD_FUNCTION_LOCATION is where your code executes
      • CLOUD_FUNCTION_SERVICE_ACCOUNT should be the service account name you created in Step 4
      vim .env.local
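
    For reference, a filled-in .env.local might look like the following (variable names come from the list above; the values are only illustrative):

      PROJECT_ID=my-demo-project
      BUCKET_LOCATION=US
      CLOUD_FUNCTION_LOCATION=us-central1
      CLOUD_FUNCTION_SERVICE_ACCOUNT=process-receipt-example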
      
    3. Make your .sh files executable

      chmod +x set-up-pipeline.sh
      
    4. Change directory to the cloud functions folder

      cd cloud-functions
      
    5. Update the following values in .env.yaml (from your note in Step 5):

      • PARSER_LOCATION
      • PROCESSOR_ID
      vim .env.yaml
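
    As an illustration, .env.yaml might end up looking like this (the processor ID shown is a placeholder; use the one you noted in Step 5):

      PARSER_LOCATION: us
      PROCESSOR_ID: 1234567890abcdef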
      
    6. Go back to the repository root folder and execute the setup script to create the cloud resources

      cd ..
      ./set-up-pipeline.sh
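
    Once the script finishes, you can sanity-check that the resources were created, for example:

      gsutil ls -p <project_id>        # Cloud Storage buckets, incl. <project_id>-input-receipts
      gcloud pubsub topics list        # Pub/Sub topics
      gcloud functions list            # deployed Cloud Functions
      bq ls --project_id=<project_id>  # BigQuery datasets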
      
  8. Testing and validating the demo

    1. Upload a sample receipt in the input bucket (<project_id>-input-receipts)
    2. At the end of the processing, you should expect your BigQuery table to be populated with the extracted entities (e.g. total_amount, supplier_name)
    3. With the structured data in BigQuery, you can build downstream analytical tools to gain actionable insights as well as to detect errors and fraud. A sample upload and query are sketched below.
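
    For example, from Cloud Shell (the dataset and table names below are placeholders, and the column layout may differ depending on the schema the setup script defines):

      # Upload a sample receipt to trigger the pipeline
      gsutil cp sample-receipt.pdf gs://<project_id>-input-receipts/
      # After the Cloud Function has run, inspect the extracted entities
      bq query --use_legacy_sql=false \
          'SELECT supplier_name, total_amount FROM `<project_id>.<dataset>.<table>` LIMIT 10'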
