Skip to content

aws-samples/amazon-textract-idp-cdk-samples-java

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Welcome to your CDK Java project!

This is a blank project for CDK development with Java.

The cdk.json file tells the CDK Toolkit how to execute your app.

It is a Maven based project, so you can open this project with any Maven compatible Java IDE to build and run tests.

Useful commands

  • mvn package compile and run tests
  • cdk ls list all stacks in the app
  • cdk synth emits the synthesized CloudFormation template
  • cdk deploy deploy this stack to your default AWS account/region
  • cdk diff compare deployed stack with current state
  • cdk docs open CDK documentation

Enjoy!

Workflow

Deploy using

cdk deploy DocumentSplitterWorkflow

This samples includes a new component called DocumentSpliter, which takes and input document of type TIFF or PDF and outputs each individual page to an S3 location and adds the list of filenames to an array.

That array is then used in a Step Functions Map state and processed in parallel. Each iteration classifies the page and then in case of a W2 or paystub routes to an extraction process or not. At the end all the W2s and Paystubs are extracted and the map returns and array with the page numbers and their classification result.

DocumentSplitter

Test with a sample document using:

 aws s3 cp s3://amazon-textract-public-content/idp-cdk-samples/moby-dick-hidden-paystub-and-w2.pdf $(aws cloudformation list-exports --query 'Exports[?Name==`DocumentSplitterWorkflow-DocumentUploadLocation`].Value' --output text) 

About

No description, website, or topics provided.

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published