tags: transformers, ml

LayoutLM

LayoutLM is a family of Transformer-based models that combine text, layout and (from v2 onward) image information to understand structured documents such as scanned receipts or forms. There are three versions of the model:

  • v1 -- combines text and layout information
  • v2 -- combines text, layout and image information
  • v3 -- simplifies processing of v2 into a single transformer
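
To make "text and layout information" concrete, here is a minimal sketch of the kind of input v1 consumes: each word is paired with its bounding box on the page. The `Word` class, the `normalize_box` helper and the sample receipt values are hypothetical, introduced only for illustration; the 0-1000 coordinate grid is the convention commonly used with LayoutLM, and real pipelines typically get the boxes from an OCR engine.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class Word:
    text: str
    box: Box  # position on the page, in pixels


def normalize_box(box: Box, page_width: int, page_height: int) -> Box:
    """Scale a pixel box to a 0-1000 grid so layout is page-size independent."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )


# Hypothetical OCR output for a 600x800 pixel receipt scan (made-up values).
words: List[Word] = [
    Word("Total", (60, 700, 150, 720)),
    Word("12.50", (480, 700, 540, 720)),
]

# Text plus layout: what a v1-style model sees for each word.
layout_inputs = [(w.text, normalize_box(w.box, 600, 800)) for w in words]
print(layout_inputs)
```

In this sketch the two words share the same vertical range, which is the kind of spatial cue (a label and its value on one line) that plain text order alone does not preserve.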

The first version of the model uses only text and layout information, while the second version also uses image features.

v1 and v2 differ substantially, so this note describes v1 only briefly, as an introduction to processing text and images. The rest of the note is dedicated to v2.

TODO: there is also a v3 ...