Commit
Boudams now has a mode system, that will allow much more flexibility in the future

- Legacy mode is "simple-space"
- A new mode, "advanced-space" allows for training a model on string that have spaces in them

commit 1d06489
Author: Thibault Clérice <[email protected]>
Date: Tue Apr 12 11:14:54 2022 +0200

    Why not

commit b2541e9
Author: Thibault Clérice <[email protected]>
Date: Tue Apr 12 11:14:20 2022 +0200

    Removed out-commented data

commit 824325a
Author: Thibault Clérice <[email protected]>
Date: Tue Apr 12 11:10:15 2022 +0200

    Probably working model tagging

commit 72e8cd9
Author: Thibault Clérice <[email protected]>
Date: Tue Apr 12 09:41:12 2022 +0200

    Mode AdvancedSpace is working, need to see at training time now

commit ec9904c
Author: Thibault Clérice <[email protected]>
Date: Mon Apr 11 16:54:40 2022 +0200

    Working SimpleSpaceMode

commit c110f52
Author: Thibault Clérice <[email protected]>
Date: Mon Apr 11 16:21:10 2022 +0200

    Add gitignore

commit 2580abe
Merge: 42662eb 68a4ee6
Author: Thibault Clérice <[email protected]>
Date: Mon Apr 11 16:19:15 2022 +0200

    Merge branch '1.0.0/new-data-formats' of github.com:PonteIneptique/boudams into 1.0.0/new-data-formats

commit 42662eb
Author: Thibault Clérice <[email protected]>
Date: Mon Apr 11 16:17:31 2022 +0200

    [WIP] Cli should be working

commit a1ba7b3
Author: Thibault Clérice <[email protected]>
Date: Mon Apr 11 15:29:13 2022 +0200

    [WIP] Working on splitter for data generation

commit 240158c
Author: Thibault Clérice <[email protected]>
Date: Tue Apr 5 17:01:25 2022 +0200

    [WIP] Moving the mask mechanism to a new Mode class

commit 68a4ee6
Author: Thibault Clérice <[email protected]>
Date: Tue Apr 5 17:01:25 2022 +0200

    [WIP] Moving the mask mechanism to a new Mode class
1 parent 149c9a1, commit d8ed1d5
Showing 26 changed files with 123,383 additions and 511 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Custom Architecture Building String | ||
|
||
*This follows the example of the [VGSL specs](https://tesseract-ocr.github.io/tessdoc/tess4/VGSLSpecs.html).* | ||
|
||
The new spec system is built around custom architecture strings. | ||
|
||
Available modules: | ||
|
||
- `C[A]<x>,<d>` uses a convolutional layer where `x` is the n-gram window and `d` the output dimension. | ||
- `CP[A]<x>,<d>` uses a convolutional layer with positional embeddings where `x` is the n-gram window and `d` the output dimension. | ||
- `L[A]<h>,<l>` uses a Bi-LSTM layer where `h` is the hidden size and `l` the number of layers. | ||
- `G[A]<h>,<l>` uses a Bi-GRU layer where `h` is the hidden size and `l` the number of layers. | ||
- `D<r>` uses a Dropout layer with a rate of `r`. | ||
- `L<d>` uses a Linear layer of dimension `d`. | ||
|
||
`[A]` can be replaced with an activation layer, such as: | ||
|
||
- `s` = sigmoid | ||
- `t` = tanh | ||
- `r` = relu | ||
- `l` = linear (i.e., no non-linearity) | ||
- `m` = softmax | ||
- `n` = n/a | ||
|
||
The VGSL module must start with an embedding size: `E<dim>`. | ||
|
||
Example: `[E200 L120 L200 Cr3,10 D3]` will use an embedding of size 200, two Linear layers (of dimension 120 and 200), and a convolutional layer (3-gram window, output dimension 10) with a relu activation, | ||
over which 30% dropout is applied before classification. |
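
To make the grammar above concrete, here is a minimal sketch of how such an architecture string could be tokenised. It is not Boudams' actual parser: the names `parse_architecture`, `LayerSpec`, `TWO_ARG` and `ONE_ARG` are hypothetical. It also rests on two assumptions: that `L` with two arguments means Bi-LSTM while `L` with one argument means Linear, and that all arguments are integers as in the example. The dropout value is kept as written (`3`) rather than converted to a rate.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Activation codes as listed in the spec above.
ACTIVATIONS = {"s": "sigmoid", "t": "tanh", "r": "relu",
               "l": "linear", "m": "softmax", "n": "n/a"}

# Assumption: the number of arguments disambiguates `L<h>,<l>` (Bi-LSTM)
# from `L<d>` (Linear). The layer names on the right are illustrative only.
TWO_ARG = {"C": "conv", "CP": "conv_positional", "G": "bigru", "L": "bilstm"}
ONE_ARG = {"E": "embedding", "D": "dropout", "L": "linear"}

# Module code, optional activation letter, one or two integer arguments.
TOKEN = re.compile(r"(CP|C|L|G|D|E)([strlmn]?)(\d+)(?:,(\d+))?")


class LayerSpec(NamedTuple):
    kind: str
    activation: Optional[str]
    args: Tuple[int, ...]


def parse_architecture(spec: str) -> List[LayerSpec]:
    """Split a string such as "[E200 L120 Cr3,10 D3]" into layer specs."""
    body = spec.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]

    layers = []
    for token in body.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"Unrecognised module: {token!r}")
        code, act, first, second = match.groups()
        if second is not None:
            if code not in TWO_ARG:
                raise ValueError(f"{token!r} takes a single argument")
            kind, args = TWO_ARG[code], (int(first), int(second))
        else:
            if code not in ONE_ARG:
                raise ValueError(f"{token!r} needs two arguments")
            if act:
                raise ValueError(f"{token!r} does not take an activation")
            kind, args = ONE_ARG[code], (int(first),)
        # An empty activation code maps to None.
        layers.append(LayerSpec(kind, ACTIVATIONS.get(act), args))

    if not layers or layers[0].kind != "embedding":
        raise ValueError("The string must start with an embedding: E<dim>")
    return layers


if __name__ == "__main__":
    for layer in parse_architecture("[E200 L120 L200 Cr3,10 D3]"):
        print(layer)
```

Returning plain tuples keeps the parsing step separate from layer construction, so the same output could feed whichever framework builds the actual modules.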