From d6bd77ef99023ffa16c7dedc9f58e535ad752183 Mon Sep 17 00:00:00 2001
From: Kabilan Mohanraj
Date: Sun, 1 Oct 2023 18:42:43 -0400
Subject: [PATCH 01/23] Add research.md

---
 .../research.md | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 trocr/evaluation-dataset/handwritten-typed-text-classification/research.md

diff --git a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md
new file mode 100644
index 0000000..5e45a7c
--- /dev/null
+++ b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md
@@ -0,0 +1,26 @@
+# [Research] TrOCR Encoder + FFN Decoder
+
+#### Overview
+To create a robust classification model for our task, multiple Convolutional Neural Network (CNN) models were explored and assessed. Details of each attempted model, along with their respective implementations, can be accessed in the [Introduction section](https://github.com/BU-Spark/ml-herbarium/blob/dev/trocr/evaluation-dataset/handwritten-typed-text-classification/notebooks/Classifier_NN.ipynb) of the project's Jupyter Notebook.
+
+#### Issues Encountered with CNNs
+During experimentation, I identified fundamental limitations with how CNNs process images containing text, affecting our ability to accurately classify text in images into either handwritten or machine-printed categories.
+
+Specifically, it was observed that the text in images, particularly handwritten text, constitutes a minimal portion of the image in terms of pixel count, thereby reducing our Region of Interest (ROI). This small ROI posed challenges in information retention and propagation when image filters were applied, leading to the loss of textual details. To mitigate this, I employed the morphological operation of **erosion** on binarized images to emphasize the text, effectively enlarging the ROI. This process proved useful in counteracting some of the undesirable effects of CNN filters and preserving the integrity of the text in the images.
+
+#### Methodology
+Given the encountered limitations with CNNs, I approached the classification task in two primary steps to circumvent the challenges:
+
+1. **Feature Extraction with TrOCR Encoder:**
+   Leveraged the encoder part of the TrOCR model to obtain reliable feature representations from the images, focusing on capturing inherent characteristics of text. TrOCR encoder was used because, unlike CNNs the TrOCR feature representations contain textual details which would then be used to decode to characters. In essence, the encoder preserves textual information that CNNs might not.
+
+2. **Training a Custom FFN Decoder:**
+   Employed a custom Feed-Forward Neural Network (FFN) as the decoder to make predictions based on the feature representations extracted from the encoder. The model was trained specifically to discern the subtle differences in features between the two categories.
+
+This methodology enabled us to maintain a high level of accuracy and reliability in our classification task while overcoming the inherent shortcomings identified in CNN models for processing images with text.
+
+#### Readings
+
+The [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215) paper inspired me to use this encoder-decoder architecture.
In this paper, the authors use multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Additionally, BERT-like architectures act as an inspiration for the encoder-decoder paradigm.
+
+This approach of utilizing an FFN as a decoder, post feature extraction, is important in handling various classification tasks, especially when dealing with specialized forms of data like text in images, because it allows us to define a custom network specific to our task.

From 3c041a9d5707fa514d620b001a0b2b6ba27f505e Mon Sep 17 00:00:00 2001
From: Kabilan Mohanraj
Date: Sun, 1 Oct 2023 20:44:34 -0400
Subject: [PATCH 02/23] Add research.md

---
 trocr/label-extraction/research.md | 32 ++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 trocr/label-extraction/research.md

diff --git a/trocr/label-extraction/research.md b/trocr/label-extraction/research.md
new file mode 100644
index 0000000..2625d51
--- /dev/null
+++ b/trocr/label-extraction/research.md
@@ -0,0 +1,32 @@
+# [Research] DETR (DEtection TRansformer)
+
+#### Overview
+To choose an optimal model for detecting labels in plant sample images, a review of various models was undertaken. The task was to discern labels from plant specimen images, with potential models including LayoutLMv3 and DETR. A detailed comparison and a critical review of the models led to an optimal model selection, aligning with the project's specific goals and constraints.
+
+#### Analysis of Models
+During the model selection process, **BioBERT** and **LayoutLMv3** were meticulously analyzed, with the former already in use in our post-OCR step.
+
+- **BioBERT:**
+  The BioBERT paper emphasizes the significance of pre-training with biomedical text. The model, trained over 23 days utilizing 8 V100 GPUs, exhibited superior performance over pre-trained BERT in scientific Named Entity Recognition (NER) tasks.
+
+- **LayoutLMv3:**
+  LayoutLMv3 initialized its text modality with RoBERTa weights and underwent subsequent pre-training on the IIT-CDIP dataset. The multi-modal nature of the model could prove effective as well.
+
+An in-depth reading of these papers raised concerns over the loss of nuanced information learned from pre-training on medical text, which could potentially be a setback for the project. The risk was highlighted by our objective of extracting information solely from the labels on our specimen images, and implementing LayoutLMv3 could potentially deviate us from this goal.
+
+#### Rationale for Model Selection
+Given the potential limitations and the changes required to the existing pipeline, keeping BioBERT as an isolated post-processing step was preferred. This would offer flexibility in integrating later models like **SciBERT** and leveraging off-the-shelf models pre-trained on biomedical text.
+
+Within the constrained timeline, labeling adequate data, pre-training the text modality of the LayoutLMv3 model, and documenting the results appeared ambitious.
+
+Therefore, given the considerations and project alignment, DETR was opted for as the preferred model to detect labels in our specimen images.
DETR’s proficiency in detecting objects, in our case labels (which are in essence, rectangular shapes) made it a fitting choice, as it synchronized well with our usecase.Additionally, integrating LayoutLMv3 would have necessitated considerable modifications to the existing pipeline, risking the loss of advantages gained from the pre-trained BioBERT. + +The model's availability on Hugging Face is also a major factor in terms of codebase maintainability and has made it an optimal choice for our task. Please feel free to checkout other models for object detection on "Papers with code". + +#### About DETR + +DETR leverages the transformer architecture, predominantly used for NLP tasks, to process image data effectively, making it stand out from traditional CNN-based detection models. It fundamentally alters the conventional object detection paradigms, removing the need for anchor boxes and employing a bipartite matching loss to handle objects of different scales and aspect ratios, thereby mitigating issues prevalent in region proposal-based methods. The model enables processing both convolutional features and positional encodings concurrently, optimizing spatial understanding within images. + +On benchmark datasets like COCO, DETR exhibits better performance, demonstrating its ability to optimize the Intersection over Union (IoU) metric, while maintaining high recall rates. It uses a set-based global loss, which helps in overcoming issues related to occlusion and object density, establishing a higher benchmark for complex tasks. + +Its application has extended to medical image analysis, where precise detection is pivotal. It has been especially impactful in instances where identifying and localizing multiple objects within images is crucial, such as in surveillance and autonomous vehicle navigation. From d8eb1385277461c32c9c0db5fa0040f8a75fc2b8 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:02:46 -0400 Subject: [PATCH 03/23] Update deployment instructions --- trocr/README.md | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/trocr/README.md b/trocr/README.md index 371ad64..1f4de7f 100644 --- a/trocr/README.md +++ b/trocr/README.md @@ -15,7 +15,7 @@ This notebook is identical to the above notebook, except it does not include the # Model Deployment Files ## trocr_with_detr_transcription.py -This is the main script for running the pipeline. This script performs several operations, such as object detection, text extraction, and Named Entity Recognition (NER) on a set of images. +This is the **main (inference)** script for running the pipeline. This script performs several operations, such as object detection, text extraction, and Named Entity Recognition (NER) on a set of images. 1. First, it initializes and runs a model called DETR to identify labels in each image and save their bounding boxes to a pickle file. 2. Second, it runs a text detection model called CRAFT on the images to identify areas containing text, saving these bounding areas to another pickle file. @@ -43,12 +43,12 @@ python trocr_with_detr_transcription.py --input-dir /path/to/input --save-dir /p ## trocr.py Contains all the functions which relate to running the trocr portion of the pipeline ## utilities.py -Contains a number of functions which are primarily related to the invluded CVIT_Training.py file. +Contains a number of functions which are primarily related to the included CVIT_Training.py file. 
(Not in use currently) ## requirements.txt All required python installs for running the pipeline -## trocr_env.txt +## trocr_env.yml Conda environment configuration to run the pipeline. # Deployment instructions @@ -65,32 +65,40 @@ cd ml-herbarium git checkout dev cd trocr ``` -Create a new conda environment and activate it +Create a new conda environment with the environment YAML file and activate it ``` -conda create -n my-conda-env python=3.9 +conda env create -n my-conda-env --file=trocr_env.yml conda activate my-conda-env ``` -Install all required packages and Jupter +Install Jupter and required packages ``` conda install jupyter -pip install -r requirements.txt -pip install taxonerd +pip install transformers==4.27.0 --no-deps ``` -Currently, the setup uses `en_core_eco_biobert` model for entity recognition as part of the TaxoNERD pipeline. To download and add the model, run the folllowing command. +We install the `transformers` package separately because of the dependency requirement that is imposed by `spacy-transformers`. The dependency does not cause any issues, albeit throwing an installation error. + +Currently, the setup uses `en_core_eco_md` (and `en_core_eco_biobert`) models for entity recognition as part of the TaxoNERD pipeline. To download and add the models, run the following commands. ``` +pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_md-1.0.2.tar.gz pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_biobert-1.0.2.tar.gz ``` -> **NOTE [SCC ONLY]:** If the `spacy` module throws an import error, you might have to uninstall the cublass package that is already installed, using the command `pip uninstall nvidia-cublas-cu11`. This is to avoid conflicts between the cuda module loaded in SCC and the installed packages from the requirements file. - Other available models can be viewed [here](https://github.com/nleguillarme/taxonerd#models). Respective model installation instructions can be found [here](https://github.com/nleguillarme/taxonerd#models:~:text=To%20download%20the%20models%3A). +To use the spaCy models for extracting location and date information, please run the following commands. +``` +python -m spacy download en_core_web_sm +python -m spacy download en_core_web_md +python -m spacy download en_core_web_trf +``` + + To start Jupyter Notebooks in the current folder, use the command ``` jupyter notebook ``` -To run the pipeline, please execute the `cleaned_trocr_test.ipynb` notebook in the current (`trocr`) folder. +To run the pipeline, please execute the [`trocr_with_detr_label_extraction.ipynb`](https://github.com/BU-Spark/ml-herbarium/blob/research-doc-patch/trocr/trocr_with_detr_label_extraction.ipynb) notebook in the current (`trocr`) folder. > **NOTE:** It is HIGHLY recommended to run the pipeline on a GPU (V100(16 GB) on SCC is recommended so that multiple models in the pipeline can be hosted on the GPU; smaller GPUs have not been tested). Running on the CPU is significantly slower. 
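As a quick, optional sanity check before launching the notebook, the commands below (an illustrative sketch, not part of the pipeline itself) confirm that a CUDA device is visible to PyTorch and that the spaCy models installed above can be loaded; if either check fails, revisit the environment setup before running the pipeline.
```
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
python -c "import spacy; spacy.load('en_core_web_trf'); print('spaCy models loaded OK')"
```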
From 02105a6efab8080fa6cbe136b571fd15db3eba56 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:08:46 -0400 Subject: [PATCH 04/23] Delete trocr/trocr_env.txt --- trocr/trocr_env.txt | 234 -------------------------------------------- 1 file changed, 234 deletions(-) delete mode 100644 trocr/trocr_env.txt diff --git a/trocr/trocr_env.txt b/trocr/trocr_env.txt deleted file mode 100644 index b93af1b..0000000 --- a/trocr/trocr_env.txt +++ /dev/null @@ -1,234 +0,0 @@ -# This file may be used to create an environment using: -# $ conda create --name --file -# platform: linux-64 -@EXPLICIT -https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2023.7.22-hbcca054_0.conda -https://conda.anaconda.org/conda-forge/noarch/font-ttf-dejavu-sans-mono-2.37-hab24e00_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/font-ttf-inconsolata-3.000-h77eed37_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/font-ttf-source-code-pro-2.038-h77eed37_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/font-ttf-ubuntu-0.83-hab24e00_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.40-h41732ed_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-13.1.0-hfd8a6a1_0.conda -https://conda.anaconda.org/conda-forge/linux-64/python_abi-3.9-3_cp39.conda -https://conda.anaconda.org/conda-forge/noarch/tzdata-2023c-h71feb2d_0.conda -https://conda.anaconda.org/conda-forge/noarch/fonts-conda-forge-1-0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libgomp-13.1.0-he5830b7_0.conda -https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/fonts-conda-ecosystem-1-0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-13.1.0-he5830b7_0.conda -https://conda.anaconda.org/conda-forge/linux-64/alsa-lib-1.2.8-h166bdaf_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/attr-2.5.1-h166bdaf_1.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-h7f98852_4.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/gettext-0.21.1-h27087fc_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/gmp-6.2.1-h58526e2_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/graphite2-1.3.13-h58526e2_1001.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/icu-72.1-hcb278e6_0.conda -https://conda.anaconda.org/conda-forge/linux-64/keyutils-1.6.1-h166bdaf_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/lame-3.100-h166bdaf_1003.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.5.0-hcb278e6_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libffi-3.4.2-h7f98852_5.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libiconv-1.17-h166bdaf_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libjpeg-turbo-2.1.5.1-h0b41bf4_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.0-h7f98852_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libogg-1.3.4-h7f98852_1.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libopus-1.3.1-h7f98852_1.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libsodium-1.0.18-h36c2ea0_1.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.38.1-h0b41bf4_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.2.13-hd590300_5.conda -https://conda.anaconda.org/conda-forge/linux-64/lz4-c-1.9.4-hcb278e6_0.conda 
-https://conda.anaconda.org/conda-forge/linux-64/mpg123-1.31.3-hcb278e6_0.conda -https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.4-hcb278e6_0.conda -https://conda.anaconda.org/conda-forge/linux-64/nspr-4.35-h27087fc_0.conda -https://conda.anaconda.org/conda-forge/linux-64/openssl-3.1.2-hd590300_0.conda -https://conda.anaconda.org/conda-forge/linux-64/pixman-0.40.0-h36c2ea0_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/pthread-stubs-0.4-h36c2ea0_1001.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xorg-kbproto-1.0.7-h7f98852_1002.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xorg-libice-1.1.1-hd590300_0.conda -https://conda.anaconda.org/conda-forge/linux-64/xorg-libxau-1.0.11-hd590300_0.conda -https://conda.anaconda.org/conda-forge/linux-64/xorg-libxdmcp-1.1.3-h7f98852_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xorg-renderproto-0.11.1-h7f98852_1002.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xorg-xextproto-7.3.0-h0b41bf4_1003.conda -https://conda.anaconda.org/conda-forge/linux-64/xorg-xf86vidmodeproto-2.3.1-h7f98852_1002.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xorg-xproto-7.0.31-h7f98852_1007.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xz-5.2.6-h166bdaf_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/yaml-0.2.5-h7f98852_2.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/expat-2.5.0-hcb278e6_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libcap-2.69-h0f662aa_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libedit-3.1.20191231-he28a2e2_2.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libevent-2.1.12-hf998b51_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libflac-1.4.3-h59595ed_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libgpg-error-1.47-h71f35ed_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libpng-1.6.39-h753d276_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.42.0-h2797004_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libvorbis-1.3.7-h9c3ff4c_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libxcb-1.15-h0b41bf4_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libxml2-2.11.4-h0d562d8_0.conda -https://conda.anaconda.org/conda-forge/linux-64/mysql-common-8.0.33-hf1915f5_2.conda -https://conda.anaconda.org/conda-forge/linux-64/pcre2-10.40-hc3806b6_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8228510_1.conda -https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.12-h27826a3_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/xorg-libsm-1.2.4-h7391055_0.conda -https://conda.anaconda.org/conda-forge/linux-64/zeromq-4.3.4-h9c3ff4c_1.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/zlib-1.2.13-hd590300_5.conda -https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.2-hfc55251_7.conda -https://conda.anaconda.org/conda-forge/linux-64/freetype-2.12.1-hca18f0e_1.conda -https://conda.anaconda.org/conda-forge/linux-64/krb5-1.20.1-h81ceb04_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libgcrypt-1.10.1-h166bdaf_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libglib-2.76.4-hebfc3b9_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libllvm16-16.0.6-h5cf9203_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libsndfile-1.2.0-hb75c966_0.conda -https://conda.anaconda.org/conda-forge/linux-64/mysql-libs-8.0.33-hca2cd23_2.conda -https://conda.anaconda.org/conda-forge/linux-64/nss-3.89-he45b914_0.conda 
-https://conda.anaconda.org/conda-forge/linux-64/pandoc-3.1.3-h32600fe_0.conda -https://conda.anaconda.org/conda-forge/linux-64/python-3.9.16-h2782a2a_0_cpython.conda -https://conda.anaconda.org/conda-forge/linux-64/xcb-util-0.4.0-hd590300_1.conda -https://conda.anaconda.org/conda-forge/linux-64/xcb-util-keysyms-0.4.0-h8ee46fc_1.conda -https://conda.anaconda.org/conda-forge/linux-64/xcb-util-renderutil-0.3.9-hd590300_1.conda -https://conda.anaconda.org/conda-forge/linux-64/xcb-util-wm-0.4.1-h8ee46fc_1.conda -https://conda.anaconda.org/conda-forge/linux-64/xorg-libx11-1.8.6-h8ee46fc_0.conda -https://conda.anaconda.org/conda-forge/noarch/asttokens-2.2.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/attrs-23.1.0-pyh71513ae_1.conda -https://conda.anaconda.org/conda-forge/noarch/backcall-0.2.0-pyh9f0ad1d_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/backports-1.0-pyhd8ed1ab_3.conda -https://conda.anaconda.org/conda-forge/linux-64/brotli-python-1.0.9-py39h5a03fae_9.conda -https://conda.anaconda.org/conda-forge/noarch/cached_property-1.5.2-pyha770c72_1.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/certifi-2023.7.22-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/charset-normalizer-3.2.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/dbus-1.13.6-h5008d03_3.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/debugpy-1.6.8-py39h3d6467e_0.conda -https://conda.anaconda.org/conda-forge/noarch/decorator-5.1.1-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/defusedxml-0.7.1-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/entrypoints-0.4-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/exceptiongroup-1.1.2-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/executing-1.2.0-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/flit-core-3.9.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/fontconfig-2.14.2-h14ed4e7_0.conda -https://conda.anaconda.org/conda-forge/linux-64/glib-tools-2.76.4-hfc55251_0.conda -https://conda.anaconda.org/conda-forge/noarch/idna-3.4-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/ipython_genutils-0.2.0-py_1.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/json5-0.9.14-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jsonpointer-2.0-py_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/jupyterlab_widgets-3.0.8-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/libclang13-16.0.6-default_h4d60ac6_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libcups-2.3.3-h36d4200_3.conda -https://conda.anaconda.org/conda-forge/linux-64/libpq-15.3-hbcd7760_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libsystemd0-254-h3516f8a_0.conda -https://conda.anaconda.org/conda-forge/linux-64/markupsafe-2.1.3-py39hd1e30aa_0.conda -https://conda.anaconda.org/conda-forge/noarch/mistune-3.0.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/nest-asyncio-1.5.6-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/packaging-23.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/pandocfilters-1.5.0-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/parso-0.8.3-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/pickleshare-0.7.5-py_1003.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/pkgutil-resolve-name-1.3.10-pyhd8ed1ab_0.tar.bz2 
-https://conda.anaconda.org/conda-forge/noarch/ply-3.11-py_1.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/prometheus_client-0.17.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/psutil-5.9.5-py39h72bdee0_0.conda -https://conda.anaconda.org/conda-forge/noarch/ptyprocess-0.7.0-pyhd3deb0d_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/pure_eval-0.2.2-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/pycparser-2.21-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/pygments-2.16.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/pysocks-1.7.1-pyha2e5f31_6.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/python-dateutil-2.8.2-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/python-fastjsonschema-2.18.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/python-json-logger-2.0.7-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/pytz-2023.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/pyyaml-6.0-py39hb9d737c_5.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/pyzmq-25.1.0-py39hb257651_0.conda -https://conda.anaconda.org/conda-forge/noarch/rfc3339-validator-0.1.4-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/rfc3986-validator-0.1.1-pyh9f0ad1d_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/rpds-py-0.9.2-py39h9fdd4d6_0.conda -https://conda.anaconda.org/conda-forge/noarch/send2trash-1.8.2-pyh41d4057_0.conda -https://conda.anaconda.org/conda-forge/noarch/setuptools-68.0.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/sniffio-1.3.0-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/soupsieve-2.3.2.post1-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/toml-0.10.2-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/tomli-2.0.1-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/tornado-6.3.2-py39hd1e30aa_0.conda -https://conda.anaconda.org/conda-forge/noarch/traitlets-5.9.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/typing_extensions-4.7.1-pyha770c72_0.conda -https://conda.anaconda.org/conda-forge/noarch/typing_utils-0.1.0-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/uri-template-1.3.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/webcolors-1.13-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/webencodings-0.5.1-py_1.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/websocket-client-1.6.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/wheel-0.41.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/widgetsnbextension-4.0.8-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/xcb-util-image-0.4.0-h8ee46fc_1.conda -https://conda.anaconda.org/conda-forge/linux-64/xkeyboard-config-2.39-hd590300_0.conda -https://conda.anaconda.org/conda-forge/linux-64/xorg-libxext-1.3.4-h0b41bf4_2.conda -https://conda.anaconda.org/conda-forge/linux-64/xorg-libxrender-0.9.11-hd590300_0.conda -https://conda.anaconda.org/conda-forge/noarch/zipp-3.16.2-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/anyio-3.7.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/arrow-1.2.3-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/async-lru-2.0.4-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/babel-2.12.1-pyhd8ed1ab_1.conda 
-https://conda.anaconda.org/conda-forge/noarch/backports.functools_lru_cache-1.6.5-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/bleach-6.0.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/cached-property-1.5.2-hd8ed1ab_1.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/cairo-1.16.0-hbbf8b49_1016.conda -https://conda.anaconda.org/conda-forge/linux-64/cffi-1.15.1-py39he91dace_3.conda -https://conda.anaconda.org/conda-forge/noarch/comm-0.1.4-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/glib-2.76.4-hfc55251_0.conda -https://conda.anaconda.org/conda-forge/noarch/importlib-metadata-6.8.0-pyha770c72_0.conda -https://conda.anaconda.org/conda-forge/noarch/importlib_resources-6.0.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jedi-0.19.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jinja2-3.1.2-pyhd8ed1ab_1.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/jupyterlab_pygments-0.2.2-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/libclang-16.0.6-default_h1cdf331_1.conda -https://conda.anaconda.org/conda-forge/linux-64/libxkbcommon-1.5.0-h5d7e998_3.conda -https://conda.anaconda.org/conda-forge/noarch/matplotlib-inline-0.1.6-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/overrides-7.4.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/pexpect-4.8.0-pyh1a96a4e_2.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/pip-23.2.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/pulseaudio-client-16.1-hb77b528_4.conda -https://conda.anaconda.org/conda-forge/noarch/qtpy-2.3.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/referencing-0.30.2-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/sip-6.7.11-py39h3d6467e_0.conda -https://conda.anaconda.org/conda-forge/noarch/stack_data-0.6.2-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/terminado-0.17.1-pyh41d4057_0.conda -https://conda.anaconda.org/conda-forge/noarch/tinycss2-1.2.1-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/typing-extensions-4.7.1-hd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/urllib3-2.0.4-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/argon2-cffi-bindings-21.2.0-py39hb9d737c_3.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/fqdn-1.5.1-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/gstreamer-1.22.3-h977cf35_1.conda -https://conda.anaconda.org/conda-forge/linux-64/harfbuzz-7.3.0-hdb3a94d_0.conda -https://conda.anaconda.org/conda-forge/noarch/importlib_metadata-6.8.0-hd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/isoduration-20.11.0-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/noarch/jsonschema-specifications-2023.7.1-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyter_server_terminals-0.4.4-pyhd8ed1ab_1.conda -https://conda.anaconda.org/conda-forge/noarch/platformdirs-3.10.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/pyqt5-sip-12.12.2-py39h3d6467e_4.conda -https://conda.anaconda.org/conda-forge/noarch/requests-2.31.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/wcwidth-0.2.6-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/argon2-cffi-21.3.0-pyhd8ed1ab_0.tar.bz2 -https://conda.anaconda.org/conda-forge/linux-64/gst-plugins-base-1.22.3-h938bd60_1.conda 
-https://conda.anaconda.org/conda-forge/noarch/jsonschema-4.19.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/jupyter_core-5.3.1-py39hf3d152e_0.conda -https://conda.anaconda.org/conda-forge/noarch/prompt-toolkit-3.0.39-pyha770c72_0.conda -https://conda.anaconda.org/conda-forge/noarch/jsonschema-with-format-nongpl-4.19.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyter_client-8.3.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/nbformat-5.9.2-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/prompt_toolkit-3.0.39-hd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/qt-main-5.15.8-h01ceb2d_12.conda -https://conda.anaconda.org/conda-forge/noarch/ipython-8.14.0-pyh41d4057_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyter_events-0.7.0-pyhd8ed1ab_1.conda -https://conda.anaconda.org/conda-forge/noarch/nbclient-0.8.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/pyqt-5.15.9-py39h52134e7_4.conda -https://conda.anaconda.org/conda-forge/noarch/ipykernel-6.25.1-pyh71e2992_0.conda -https://conda.anaconda.org/conda-forge/noarch/ipywidgets-8.1.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/nbconvert-core-7.7.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyter_console-6.6.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyter_server-2.7.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/nbconvert-pandoc-7.7.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/qtconsole-base-5.4.3-pyha770c72_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyter-lsp-2.2.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyterlab_server-2.24.0-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/nbconvert-7.7.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/notebook-shim-0.2.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/qtconsole-5.4.3-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/jupyterlab-4.0.4-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/noarch/notebook-7.0.2-pyhd8ed1ab_0.conda -https://conda.anaconda.org/conda-forge/linux-64/jupyter-1.0.0-py39hf3d152e_8.conda From ea88f31a47c4336fa46811b4b15b667e3ebd821b Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:09:59 -0400 Subject: [PATCH 05/23] Add conda env YAML file --- trocr/trocr_env.yml | 371 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 371 insertions(+) create mode 100644 trocr/trocr_env.yml diff --git a/trocr/trocr_env.yml b/trocr/trocr_env.yml new file mode 100644 index 0000000..f9e21e9 --- /dev/null +++ b/trocr/trocr_env.yml @@ -0,0 +1,371 @@ +name: trocr_env +channels: + - conda-forge + - defaults +dependencies: + - _libgcc_mutex=0.1=conda_forge + - _openmp_mutex=4.5=2_gnu + - alsa-lib=1.2.8=h166bdaf_0 + - anyio=3.7.1=pyhd8ed1ab_0 + - argon2-cffi=21.3.0=pyhd8ed1ab_0 + - argon2-cffi-bindings=21.2.0=py39hb9d737c_3 + - arrow=1.2.3=pyhd8ed1ab_0 + - asttokens=2.2.1=pyhd8ed1ab_0 + - async-lru=2.0.4=pyhd8ed1ab_0 + - attr=2.5.1=h166bdaf_1 + - attrs=23.1.0=pyh71513ae_1 + - babel=2.12.1=pyhd8ed1ab_1 + - backcall=0.2.0=pyh9f0ad1d_0 + - backports=1.0=pyhd8ed1ab_3 + - backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0 + - bleach=6.0.0=pyhd8ed1ab_0 + - brotli-python=1.0.9=py39h5a03fae_9 + - bzip2=1.0.8=h7f98852_4 + - ca-certificates=2023.7.22=hbcca054_0 + - cached-property=1.5.2=hd8ed1ab_1 + - 
cached_property=1.5.2=pyha770c72_1 + - cairo=1.16.0=hbbf8b49_1016 + - cffi=1.15.1=py39he91dace_3 + - comm=0.1.4=pyhd8ed1ab_0 + - dbus=1.13.6=h5008d03_3 + - debugpy=1.6.8=py39h3d6467e_0 + - decorator=5.1.1=pyhd8ed1ab_0 + - defusedxml=0.7.1=pyhd8ed1ab_0 + - entrypoints=0.4=pyhd8ed1ab_0 + - exceptiongroup=1.1.2=pyhd8ed1ab_0 + - executing=1.2.0=pyhd8ed1ab_0 + - expat=2.5.0=hcb278e6_1 + - flit-core=3.9.0=pyhd8ed1ab_0 + - font-ttf-dejavu-sans-mono=2.37=hab24e00_0 + - font-ttf-inconsolata=3.000=h77eed37_0 + - font-ttf-source-code-pro=2.038=h77eed37_0 + - font-ttf-ubuntu=0.83=hab24e00_0 + - fontconfig=2.14.2=h14ed4e7_0 + - fonts-conda-ecosystem=1=0 + - fonts-conda-forge=1=0 + - fqdn=1.5.1=pyhd8ed1ab_0 + - freetype=2.12.1=hca18f0e_1 + - gettext=0.21.1=h27087fc_0 + - glib=2.76.4=hfc55251_0 + - glib-tools=2.76.4=hfc55251_0 + - gmp=6.2.1=h58526e2_0 + - graphite2=1.3.13=h58526e2_1001 + - gst-plugins-base=1.22.3=h938bd60_1 + - gstreamer=1.22.3=h977cf35_1 + - harfbuzz=7.3.0=hdb3a94d_0 + - icu=72.1=hcb278e6_0 + - idna=3.4=pyhd8ed1ab_0 + - importlib-metadata=6.8.0=pyha770c72_0 + - importlib_metadata=6.8.0=hd8ed1ab_0 + - importlib_resources=6.0.1=pyhd8ed1ab_0 + - ipykernel=6.25.1=pyh71e2992_0 + - ipython=8.14.0=pyh41d4057_0 + - ipython_genutils=0.2.0=py_1 + - ipywidgets=8.1.0=pyhd8ed1ab_0 + - isoduration=20.11.0=pyhd8ed1ab_0 + - jedi=0.19.0=pyhd8ed1ab_0 + - jinja2=3.1.2=pyhd8ed1ab_1 + - json5=0.9.14=pyhd8ed1ab_0 + - jsonpointer=2.0=py_0 + - jsonschema=4.19.0=pyhd8ed1ab_0 + - jsonschema-specifications=2023.7.1=pyhd8ed1ab_0 + - jsonschema-with-format-nongpl=4.19.0=pyhd8ed1ab_0 + - jupyter=1.0.0=py39hf3d152e_8 + - jupyter-lsp=2.2.0=pyhd8ed1ab_0 + - jupyter_client=8.3.0=pyhd8ed1ab_0 + - jupyter_console=6.6.3=pyhd8ed1ab_0 + - jupyter_core=5.3.1=py39hf3d152e_0 + - jupyter_events=0.7.0=pyhd8ed1ab_1 + - jupyter_server=2.7.0=pyhd8ed1ab_0 + - jupyter_server_terminals=0.4.4=pyhd8ed1ab_1 + - jupyterlab=4.0.4=pyhd8ed1ab_0 + - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0 + - jupyterlab_server=2.24.0=pyhd8ed1ab_0 + - jupyterlab_widgets=3.0.8=pyhd8ed1ab_0 + - keyutils=1.6.1=h166bdaf_0 + - krb5=1.20.1=h81ceb04_0 + - lame=3.100=h166bdaf_1003 + - ld_impl_linux-64=2.40=h41732ed_0 + - libcap=2.69=h0f662aa_0 + - libclang=16.0.6=default_h1cdf331_1 + - libclang13=16.0.6=default_h4d60ac6_1 + - libcups=2.3.3=h36d4200_3 + - libedit=3.1.20191231=he28a2e2_2 + - libevent=2.1.12=hf998b51_1 + - libexpat=2.5.0=hcb278e6_1 + - libffi=3.4.2=h7f98852_5 + - libflac=1.4.3=h59595ed_0 + - libgcc-ng=13.1.0=he5830b7_0 + - libgcrypt=1.10.1=h166bdaf_0 + - libglib=2.76.4=hebfc3b9_0 + - libgomp=13.1.0=he5830b7_0 + - libgpg-error=1.47=h71f35ed_0 + - libiconv=1.17=h166bdaf_0 + - libjpeg-turbo=2.1.5.1=h0b41bf4_0 + - libllvm16=16.0.6=h5cf9203_1 + - libnsl=2.0.0=h7f98852_0 + - libogg=1.3.4=h7f98852_1 + - libopus=1.3.1=h7f98852_1 + - libpng=1.6.39=h753d276_0 + - libpq=15.3=hbcd7760_1 + - libsndfile=1.2.0=hb75c966_0 + - libsodium=1.0.18=h36c2ea0_1 + - libsqlite=3.42.0=h2797004_0 + - libstdcxx-ng=13.1.0=hfd8a6a1_0 + - libsystemd0=254=h3516f8a_0 + - libuuid=2.38.1=h0b41bf4_0 + - libvorbis=1.3.7=h9c3ff4c_0 + - libxcb=1.15=h0b41bf4_0 + - libxkbcommon=1.5.0=h5d7e998_3 + - libxml2=2.11.4=h0d562d8_0 + - libzlib=1.2.13=hd590300_5 + - lz4-c=1.9.4=hcb278e6_0 + - markupsafe=2.1.3=py39hd1e30aa_0 + - matplotlib-inline=0.1.6=pyhd8ed1ab_0 + - mistune=3.0.0=pyhd8ed1ab_0 + - mpg123=1.31.3=hcb278e6_0 + - mysql-common=8.0.33=hf1915f5_2 + - mysql-libs=8.0.33=hca2cd23_2 + - nbclient=0.8.0=pyhd8ed1ab_0 + - nbconvert=7.7.3=pyhd8ed1ab_0 + - nbconvert-core=7.7.3=pyhd8ed1ab_0 + - 
nbconvert-pandoc=7.7.3=pyhd8ed1ab_0 + - nbformat=5.9.2=pyhd8ed1ab_0 + - ncurses=6.4=hcb278e6_0 + - nest-asyncio=1.5.6=pyhd8ed1ab_0 + - notebook=7.0.2=pyhd8ed1ab_0 + - notebook-shim=0.2.3=pyhd8ed1ab_0 + - nspr=4.35=h27087fc_0 + - nss=3.89=he45b914_0 + - openssl=3.1.2=hd590300_0 + - overrides=7.4.0=pyhd8ed1ab_0 + - pandoc=3.1.3=h32600fe_0 + - pandocfilters=1.5.0=pyhd8ed1ab_0 + - parso=0.8.3=pyhd8ed1ab_0 + - pcre2=10.40=hc3806b6_0 + - pexpect=4.8.0=pyh1a96a4e_2 + - pickleshare=0.7.5=py_1003 + - pip=23.2.1=pyhd8ed1ab_0 + - pixman=0.40.0=h36c2ea0_0 + - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_0 + - platformdirs=3.10.0=pyhd8ed1ab_0 + - ply=3.11=py_1 + - prometheus_client=0.17.1=pyhd8ed1ab_0 + - prompt-toolkit=3.0.39=pyha770c72_0 + - prompt_toolkit=3.0.39=hd8ed1ab_0 + - psutil=5.9.5=py39h72bdee0_0 + - pthread-stubs=0.4=h36c2ea0_1001 + - ptyprocess=0.7.0=pyhd3deb0d_0 + - pulseaudio-client=16.1=hb77b528_4 + - pure_eval=0.2.2=pyhd8ed1ab_0 + - pycparser=2.21=pyhd8ed1ab_0 + - pygments=2.16.1=pyhd8ed1ab_0 + - pyqt=5.15.9=py39h52134e7_4 + - pyqt5-sip=12.12.2=py39h3d6467e_4 + - pysocks=1.7.1=pyha2e5f31_6 + - python=3.9.16=h2782a2a_0_cpython + - python-dateutil=2.8.2=pyhd8ed1ab_0 + - python-fastjsonschema=2.18.0=pyhd8ed1ab_0 + - python-json-logger=2.0.7=pyhd8ed1ab_0 + - python_abi=3.9=3_cp39 + - pyyaml=6.0=py39hb9d737c_5 + - pyzmq=25.1.0=py39hb257651_0 + - qt-main=5.15.8=h01ceb2d_12 + - qtconsole=5.4.3=pyhd8ed1ab_0 + - qtconsole-base=5.4.3=pyha770c72_0 + - qtpy=2.3.1=pyhd8ed1ab_0 + - readline=8.2=h8228510_1 + - referencing=0.30.2=pyhd8ed1ab_0 + - rfc3339-validator=0.1.4=pyhd8ed1ab_0 + - rfc3986-validator=0.1.1=pyh9f0ad1d_0 + - rpds-py=0.9.2=py39h9fdd4d6_0 + - send2trash=1.8.2=pyh41d4057_0 + - setuptools=68.0.0=pyhd8ed1ab_0 + - sip=6.7.11=py39h3d6467e_0 + - sniffio=1.3.0=pyhd8ed1ab_0 + - soupsieve=2.3.2.post1=pyhd8ed1ab_0 + - stack_data=0.6.2=pyhd8ed1ab_0 + - terminado=0.17.1=pyh41d4057_0 + - tinycss2=1.2.1=pyhd8ed1ab_0 + - tk=8.6.12=h27826a3_0 + - toml=0.10.2=pyhd8ed1ab_0 + - tomli=2.0.1=pyhd8ed1ab_0 + - tornado=6.3.2=py39hd1e30aa_0 + - traitlets=5.9.0=pyhd8ed1ab_0 + - typing_extensions=4.7.1=pyha770c72_0 + - typing_utils=0.1.0=pyhd8ed1ab_0 + - tzdata=2023c=h71feb2d_0 + - uri-template=1.3.0=pyhd8ed1ab_0 + - wcwidth=0.2.6=pyhd8ed1ab_0 + - webcolors=1.13=pyhd8ed1ab_0 + - webencodings=0.5.1=py_1 + - websocket-client=1.6.1=pyhd8ed1ab_0 + - wheel=0.41.1=pyhd8ed1ab_0 + - widgetsnbextension=4.0.8=pyhd8ed1ab_0 + - xcb-util=0.4.0=hd590300_1 + - xcb-util-image=0.4.0=h8ee46fc_1 + - xcb-util-keysyms=0.4.0=h8ee46fc_1 + - xcb-util-renderutil=0.3.9=hd590300_1 + - xcb-util-wm=0.4.1=h8ee46fc_1 + - xkeyboard-config=2.39=hd590300_0 + - xorg-kbproto=1.0.7=h7f98852_1002 + - xorg-libice=1.1.1=hd590300_0 + - xorg-libsm=1.2.4=h7391055_0 + - xorg-libx11=1.8.6=h8ee46fc_0 + - xorg-libxau=1.0.11=hd590300_0 + - xorg-libxdmcp=1.1.3=h7f98852_0 + - xorg-libxext=1.3.4=h0b41bf4_2 + - xorg-libxrender=0.9.11=hd590300_0 + - xorg-renderproto=0.11.1=h7f98852_1002 + - xorg-xextproto=7.3.0=h0b41bf4_1003 + - xorg-xf86vidmodeproto=2.3.1=h7f98852_1002 + - xorg-xproto=7.0.31=h7f98852_1007 + - xz=5.2.6=h166bdaf_0 + - yaml=0.2.5=h7f98852_2 + - zeromq=4.3.4=h9c3ff4c_1 + - zipp=3.16.2=pyhd8ed1ab_0 + - zlib=1.2.13=hd590300_5 + - zstd=1.5.2=hfc55251_7 + - pip: + - absl-py==1.4.0 + - aiohttp==3.8.5 + - aiosignal==1.3.1 + - argcomplete==1.10.3 + - async-timeout==4.0.2 + - beautifulsoup4==4.8.2 + - blis==0.7.10 + - cachetools==5.3.1 + - catalogue==2.0.9 + - certifi==2022.9.24 + - chardet==3.0.4 + - charset-normalizer==2.1.1 + - click==7.1.2 + - cmake==3.27.1 + 
- coco-eval==0.0.4 + - compressed-rtf==1.0.6 + - confection==0.1.1 + - conllu==4.5.3 + - contourpy==1.0.6 + - craft-text-detector==0.4.3 + - cycler==0.11.0 + - cymem==2.0.7 + - cython==0.29.32 + - datasets==2.14.5 + - dill==0.3.7 + - docx2txt==0.8 + - ebcdic==1.1.1 + - extract-msg==0.28.7 + - fastrlock==0.8.1 + - filelock==3.8.0 + - fonttools==4.38.0 + - frozenlist==1.4.0 + - fsspec==2023.6.0 + - gdown==4.5.4 + - gmpy2==2.1.5 + - google-auth==2.22.0 + - google-auth-oauthlib==1.0.0 + - grpcio==1.56.2 + - huggingface-hub==0.16.4 + - imageio==2.31.2 + - imapclient==2.1.0 + - joblib==1.2.0 + - kiwisolver==1.4.4 + - langcodes==3.3.0 + - lazy-loader==0.3 + - lightning-utilities==0.9.0 + - lit==16.0.6 + - lxml==4.9.3 + - markdown==3.4.4 + - matplotlib==3.6.2 + - mpmath==1.3.0 + - multidict==6.0.4 + - multiprocess==0.70.15 + - murmurhash==1.0.9 + - networkx==3.1 + - nmslib==2.1.1 + - numpy==1.23.5 + - nvidia-cublas-cu11==11.10.3.66 + - nvidia-cuda-cupti-cu11==11.7.101 + - nvidia-cuda-nvrtc-cu11==11.7.99 + - nvidia-cuda-runtime-cu11==11.7.99 + - nvidia-cudnn-cu11==8.5.0.96 + - nvidia-cufft-cu11==10.9.0.58 + - nvidia-curand-cu11==10.2.10.91 + - nvidia-cusolver-cu11==11.4.0.1 + - nvidia-cusparse-cu11==11.7.4.91 + - nvidia-nccl-cu11==2.14.3 + - nvidia-nvtx-cu11==11.7.91 + - oauthlib==3.2.2 + - olefile==0.46 + - opencv-python==4.5.4.60 + - packaging==21.3 + - pandas==1.5.2 + - pathy==0.10.2 + - pdfminer-six==20191110 + - pillow==10.0.0 + - preshed==3.0.8 + - protobuf==4.24.0 + - pyarrow==13.0.0 + - pyasn1==0.5.0 + - pyasn1-modules==0.3.0 + - pybind11==2.6.1 + - pycocotools==2.0.6 + - pycountry==22.3.5 + - pycryptodome==3.18.0 + - pydantic==1.10.12 + - pyparsing==3.0.9 + - pysbd==0.3.4 + - python-pptx==0.6.21 + - pytorch-lightning==1.9.5 + - pytz==2022.6 + - pywavelets==1.4.1 + - regex==2022.10.31 + - requests==2.28.1 + - requests-oauthlib==1.3.1 + - rsa==4.9 + - safetensors==0.3.1 + - scikit-image==0.21.0 + - scikit-learn==1.1.3 + - scipy==1.9.3 + - scispacy==0.5.1 + - six==1.12.0 + - smart-open==6.3.0 + - sortedcontainers==2.4.0 + - spacy==3.4.4 + - spacy-alignments==0.9.0 + - spacy-legacy==3.0.12 + - spacy-loggers==1.0.4 + - spacy-transformers==1.1.9 + - sparse-dot-topn==0.3.5 + - sparse-dot-topn-for-blocks==0.3.1.post3 + - speechrecognition==3.8.1 + - srsly==2.4.7 + - string-grouper==0.6.1 + - sympy==1.12 + - taxonerd==1.5.1 + - tensorboard==2.14.0 + - tensorboard-data-server==0.7.1 + - textract==1.6.5 + - thinc==8.1.12 + - threadpoolctl==3.1.0 + - tifffile==2023.8.30 + - timm==0.9.2 + - tokenizers==0.13.2 + - topn==0.0.7 + - torch==1.13.0 + - torchaudio==0.13.0 + - torchmetrics==1.0.2 + - torchvision==0.14.0 + - tqdm==4.64.1 + - triton==2.0.0 + - typer==0.7.0 + - typing-extensions==4.4.0 + - tzlocal==5.0.1 + - urllib3==1.26.13 + - wasabi==0.10.1 + - werkzeug==2.3.6 + - xlrd==1.2.0 + - xlsxwriter==3.1.2 + - xxhash==3.3.0 + - yarl==1.9.2 +prefix: /projectnb/sparkgrp/kabilanm/.conda/envs/trocr_env From 13c8e539a8b437c0783bac8a09518abb80629d89 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:11:13 -0400 Subject: [PATCH 06/23] Update README.md --- trocr/README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/trocr/README.md b/trocr/README.md index 1f4de7f..4729f1d 100644 --- a/trocr/README.md +++ b/trocr/README.md @@ -42,10 +42,11 @@ python trocr_with_detr_transcription.py --input-dir /path/to/input --save-dir /p ## trocr.py Contains all the functions which relate to running the trocr portion of the pipeline -## utilities.py -Contains a number of 
functions which are primarily related to the included CVIT_Training.py file. (Not in use currently) -## requirements.txt +## [Not in use] utilities.py +Contains a number of functions which are primarily related to the included CVIT_Training.py file. + +## [Not in use] requirements.txt All required python installs for running the pipeline ## trocr_env.yml From 65902cf3f4da1452537ffa6be2878270bcf9e42a Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:31:57 -0400 Subject: [PATCH 07/23] Create Dockerfile for TrOCR env --- trocr/Dockerfile | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 trocr/Dockerfile diff --git a/trocr/Dockerfile b/trocr/Dockerfile new file mode 100644 index 0000000..a1b4084 --- /dev/null +++ b/trocr/Dockerfile @@ -0,0 +1,32 @@ +# Use an official Miniconda3 as a parent image +FROM continuumio/miniconda3:latest + +# Set the working directory in docker +WORKDIR /usr/src/app + +# Declare argument for conda environment name +ARG CONDA_ENV_NAME=trocr_env + +# Clone the repository +RUN git clone https://github.com/BU-Spark/ml-herbarium.git . && \ + git checkout dev && \ + cd trocr + +# Create a new conda environment from the YAML file and activate it +RUN conda env create -n $CONDA_ENV_NAME --file=trocr_env.yml && \ + echo "conda activate $CONDA_ENV_NAME" >> ~/.bashrc + +# Install Jupyter and other required packages +RUN conda install -n $CONDA_ENV_NAME jupyter -y && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install transformers==4.27.0 --no-deps && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_md-1.0.2.tar.gz && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_biobert-1.0.2.tar.gz && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_sm && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_md && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_trf + +# Make port 8888 available to the world outside this container +EXPOSE 8888 + +# Run Jupyter Notebook when the container launches +CMD [ "/opt/conda/envs/$CONDA_ENV_NAME/bin/jupyter", "notebook", "--ip='*'", "--port=8888", "--no-browser", "--allow-root" ] From 8fe324198df0c38b0c51433b495e0aec6112c646 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:32:57 -0400 Subject: [PATCH 08/23] Create Dockerfile for TrOCR env --- trocr/docker/Dockerfile | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 trocr/docker/Dockerfile diff --git a/trocr/docker/Dockerfile b/trocr/docker/Dockerfile new file mode 100644 index 0000000..a1b4084 --- /dev/null +++ b/trocr/docker/Dockerfile @@ -0,0 +1,32 @@ +# Use an official Miniconda3 as a parent image +FROM continuumio/miniconda3:latest + +# Set the working directory in docker +WORKDIR /usr/src/app + +# Declare argument for conda environment name +ARG CONDA_ENV_NAME=trocr_env + +# Clone the repository +RUN git clone https://github.com/BU-Spark/ml-herbarium.git . 
&& \ + git checkout dev && \ + cd trocr + +# Create a new conda environment from the YAML file and activate it +RUN conda env create -n $CONDA_ENV_NAME --file=trocr_env.yml && \ + echo "conda activate $CONDA_ENV_NAME" >> ~/.bashrc + +# Install Jupyter and other required packages +RUN conda install -n $CONDA_ENV_NAME jupyter -y && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install transformers==4.27.0 --no-deps && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_md-1.0.2.tar.gz && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_biobert-1.0.2.tar.gz && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_sm && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_md && \ + /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_trf + +# Make port 8888 available to the world outside this container +EXPOSE 8888 + +# Run Jupyter Notebook when the container launches +CMD [ "/opt/conda/envs/$CONDA_ENV_NAME/bin/jupyter", "notebook", "--ip='*'", "--port=8888", "--no-browser", "--allow-root" ] From 6baccef45d932f932781a696776ddee9e96af92e Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:33:33 -0400 Subject: [PATCH 09/23] Delete trocr/Dockerfile --- trocr/Dockerfile | 32 -------------------------------- 1 file changed, 32 deletions(-) delete mode 100644 trocr/Dockerfile diff --git a/trocr/Dockerfile b/trocr/Dockerfile deleted file mode 100644 index a1b4084..0000000 --- a/trocr/Dockerfile +++ /dev/null @@ -1,32 +0,0 @@ -# Use an official Miniconda3 as a parent image -FROM continuumio/miniconda3:latest - -# Set the working directory in docker -WORKDIR /usr/src/app - -# Declare argument for conda environment name -ARG CONDA_ENV_NAME=trocr_env - -# Clone the repository -RUN git clone https://github.com/BU-Spark/ml-herbarium.git . 
&& \ - git checkout dev && \ - cd trocr - -# Create a new conda environment from the YAML file and activate it -RUN conda env create -n $CONDA_ENV_NAME --file=trocr_env.yml && \ - echo "conda activate $CONDA_ENV_NAME" >> ~/.bashrc - -# Install Jupyter and other required packages -RUN conda install -n $CONDA_ENV_NAME jupyter -y && \ - /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install transformers==4.27.0 --no-deps && \ - /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_md-1.0.2.tar.gz && \ - /opt/conda/envs/$CONDA_ENV_NAME/bin/pip install https://github.com/nleguillarme/taxonerd/releases/download/v1.5.0/en_core_eco_biobert-1.0.2.tar.gz && \ - /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_sm && \ - /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_md && \ - /opt/conda/envs/$CONDA_ENV_NAME/bin/python -m spacy download en_core_web_trf - -# Make port 8888 available to the world outside this container -EXPOSE 8888 - -# Run Jupyter Notebook when the container launches -CMD [ "/opt/conda/envs/$CONDA_ENV_NAME/bin/jupyter", "notebook", "--ip='*'", "--port=8888", "--no-browser", "--allow-root" ] From bdab6945c2fd608c890059d7eec97244287c4b7f Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:35:44 -0400 Subject: [PATCH 10/23] Add ReadMe.md with docker instructions --- trocr/docker/ReadMe.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 trocr/docker/ReadMe.md diff --git a/trocr/docker/ReadMe.md b/trocr/docker/ReadMe.md new file mode 100644 index 0000000..9b26f81 --- /dev/null +++ b/trocr/docker/ReadMe.md @@ -0,0 +1,16 @@ +### Build and Run Instructions +1. **Build the Docker Image:** + Navigate to the directory containing the Dockerfile and run: + ```sh + docker build --build-arg CONDA_ENV_NAME= -t my-herbarium-app . + ``` + Replace `` with the desired conda environment name. + +2. **Run the Docker Container:** + ```sh + docker run -p 8888:8888 my-herbarium-app + ``` + +> ### Notes +> - If you don't provide the `--build-arg` while building, the default value `trocr_env` will be used as the conda environment name. +> - Remember to replace `` with the actual name you want to give to your conda environment when building the Docker image. From c8a0daee34558aa706506a4086e10e79c03169fb Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:38:13 -0400 Subject: [PATCH 11/23] Update README.md with docker instructions --- trocr/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/trocr/README.md b/trocr/README.md index 4729f1d..54bd9d2 100644 --- a/trocr/README.md +++ b/trocr/README.md @@ -93,7 +93,6 @@ python -m spacy download en_core_web_md python -m spacy download en_core_web_trf ``` - To start Jupyter Notebooks in the current folder, use the command ``` jupyter notebook @@ -101,6 +100,8 @@ jupyter notebook To run the pipeline, please execute the [`trocr_with_detr_label_extraction.ipynb`](https://github.com/BU-Spark/ml-herbarium/blob/research-doc-patch/trocr/trocr_with_detr_label_extraction.ipynb) notebook in the current (`trocr`) folder. +> **For docker deployment instructions, refer to the `docker` folder in the current (`trocr`) folder.** + > **NOTE:** It is HIGHLY recommended to run the pipeline on a GPU (V100(16 GB) on SCC is recommended so that multiple models in the pipeline can be hosted on the GPU; smaller GPUs have not been tested). 
Running on the CPU is significantly slower. ## Final Dataframe Column Descriptions @@ -109,7 +110,7 @@ This column describes the position that a given image was processed ### Transcription This column contains every transcription that was found in the image. They are ordered based on the relative position of the top left coordinate for each bounding box in an image. ## Transcription_Confidence -This contains the TrOCR model confidences in each transcription. This list of values is ordered based on the `Transcription` column (i.e. you can reference each individual transcription and its confidence using the same index number). +This contains the TrOCR model confidences in each transcription. This list of values is ordered based on the `Transcription` column (i.e., you can reference each individual transcription and its confidence using the same index number). ## Image_Path This is the absolute path of the location for a given image ## Bounding_Boxes From 482e189895baa58466e6d8d33547ac7d70506e46 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 00:47:49 -0400 Subject: [PATCH 12/23] Update docker instructions --- trocr/docker/ReadMe.md | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/trocr/docker/ReadMe.md b/trocr/docker/ReadMe.md index 9b26f81..eff3e14 100644 --- a/trocr/docker/ReadMe.md +++ b/trocr/docker/ReadMe.md @@ -1,16 +1,35 @@ -### Build and Run Instructions -1. **Build the Docker Image:** +# Build and Run Instructions +## **Build the Docker Image:** Navigate to the directory containing the Dockerfile and run: ```sh docker build --build-arg CONDA_ENV_NAME= -t my-herbarium-app . ``` Replace `` with the desired conda environment name. -2. **Run the Docker Container:** - ```sh - docker run -p 8888:8888 my-herbarium-app - ``` - > ### Notes > - If you don't provide the `--build-arg` while building, the default value `trocr_env` will be used as the conda environment name. > - Remember to replace `` with the actual name you want to give to your conda environment when building the Docker image. + +## **Run the Docker Container:** +### Using Docker Bind Mounts +When you run your Docker container, you can use the `-v` or `--mount` flag to bind-mount a directory or a file from your host into your container. + +#### Example +If you have the input images in a directory named `images` on your host, you can mount this directory to a directory inside your container like this: +```sh +docker run -v $(pwd)/images:/usr/src/app/images -p 8888:8888 my-herbarium-app +``` +or +```sh +docker run --mount type=bind,source=$(pwd)/images,target=/usr/src/app/images -p 8888:8888 my-herbarium-app +``` + +Here: +- `$(pwd)/images` is the absolute path to the `images` directory on your host machine. +- `/usr/src/app/images` is the path where the `images` directory will be accessible from within your container. + +> ### Note +> When using bind mounts, any changes made to the files in the mounted directory will be reflected in both the host and the container, since they are actually the same files on the host’s filesystem. + +> ### Modification in Script +> We would need to modify the script to read images from the mounted directory (`/usr/src/app/images` in this example) instead of the original host directory. 
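Putting the pieces together, a minimal sketch of such a run is shown below. It assumes the image's default `trocr_env` environment name, that the repository was cloned to `/usr/src/app` as in the Dockerfile, and that the `trocr_with_detr_transcription.py` entry point with its `--input-dir`/`--save-dir` flags (described in the main README) is used; adjust the paths and names to your setup.

```sh
# Illustrative sketch: mount host directories for inputs and outputs, then point
# the pipeline at the in-container paths instead of the original host paths.
docker run --gpus all \
  -v $(pwd)/images:/usr/src/app/images \
  -v $(pwd)/results:/usr/src/app/results \
  my-herbarium-app \
  /opt/conda/envs/trocr_env/bin/python trocr/trocr_with_detr_transcription.py \
    --input-dir /usr/src/app/images \
    --save-dir /usr/src/app/results
```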
From 1d75533b798c489d820b87df01cba55310a66595 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 01:04:42 -0400 Subject: [PATCH 13/23] Update trocr_with_detr_label_extraction.ipynb --- trocr/trocr_with_detr_label_extraction.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/trocr/trocr_with_detr_label_extraction.ipynb b/trocr/trocr_with_detr_label_extraction.ipynb index 0c4c490..566739a 100644 --- a/trocr/trocr_with_detr_label_extraction.ipynb +++ b/trocr/trocr_with_detr_label_extraction.ipynb @@ -640,7 +640,7 @@ ], "source": [ "# Use the DETR for inference (adopted from Freddie (https://github.com/freddiev4/comp-vision-scripts/blob/main/object-detection/detr.py))\n", - "detr_model = 'KabilanM/detr-label-extraction'\n", + "detr_model = 'spark-ds549/detr-label-detection'\n", "# The DETR model returns the bounding boxes of the lables indentified from the images\n", "# We will utilize the bounding boxes to rank lables in the downstream task\n", "label_bboxes = detr.run(inputdir, workdir, detr_model)\n", From a8cb1ba3f82b45e7113d14d07159ab6f8d2f41d7 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 18:07:06 -0400 Subject: [PATCH 14/23] clarify methodology --- .../handwritten-typed-text-classification/research.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md index 5e45a7c..d8b567c 100644 --- a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md +++ b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md @@ -12,7 +12,7 @@ In specific, it was observed that the text in images, particularly handwritten t Given the encountered limitations with CNNs, I approached the classification task in two primary steps to circumvent the challenges: 1. **Feature Extraction with TrOCR Encoder:** - Leveraged the encoder part of the TrOCR model to obtain reliable feature representations from the images, focusing on capturing inherent characteristics of text. TrOCR encoder was used because, unlike CNNs the TrOCR feature representations contain textual details which would then be used to decode to characters. In essence, the encoder preserves textual information that CNNs might not. + Leveraged the encoder part of the TrOCR model to obtain reliable feature representations from the images, focusing on capturing inherent characteristics of text. The encoder from TrOCR was employed due to its capability to retain textual details in its feature representations, which are pivotal for decoding to characters. This stands in contrast to Convolutional Neural Networks (CNNs), which might not preserve such detailed textual information. In essence, the encoder within TrOCR ensures the conservation of textual nuances that are potentially overlooked or lost when using CNNs. 2. **Training a Custom FFN Decoder:** Employed a custom Feed-Forward Neural Network (FFN) as the decoder to make predictions based on the feature representations extracted from the encoder. The model was trained specifically to discern the subtle differences in features between the two categories. 
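As a concrete illustration of the two steps above, the sketch below wires a frozen TrOCR encoder to a small FFN head. The `microsoft/trocr-base-handwritten` checkpoint, the mean pooling over the encoder output, and the layer sizes of the head are assumptions made for illustration; they are not the exact configuration used in this project.

```python
import torch.nn as nn
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

class TrOCREncoderClassifier(nn.Module):
    """Frozen TrOCR encoder followed by a small FFN head (handwritten vs. typed)."""

    def __init__(self, checkpoint="microsoft/trocr-base-handwritten", num_classes=2):
        super().__init__()
        # Keep only the vision encoder of the TrOCR encoder-decoder model.
        self.encoder = VisionEncoderDecoderModel.from_pretrained(checkpoint).encoder
        for p in self.encoder.parameters():  # train the FFN head only
            p.requires_grad = False
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, num_classes),
        )

    def forward(self, pixel_values):
        feats = self.encoder(pixel_values=pixel_values).last_hidden_state  # (B, seq, hidden)
        pooled = feats.mean(dim=1)  # mean-pool the patch sequence
        return self.head(pooled)

# Usage sketch:
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = TrOCREncoderClassifier()
# pixel_values = processor(images=pil_image, return_tensors="pt").pixel_values
# logits = model(pixel_values)
```

Freezing the encoder keeps training lightweight, since only the head has to learn the handwritten-versus-typed decision boundary on top of the textual features the encoder already preserves.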
From 3bbe1e6577843785df3c52c954ad551d7080eb22 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Mon, 2 Oct 2023 18:09:06 -0400 Subject: [PATCH 15/23] fix typo --- trocr/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/trocr/README.md b/trocr/README.md index 54bd9d2..fe183bd 100644 --- a/trocr/README.md +++ b/trocr/README.md @@ -72,7 +72,7 @@ conda env create -n my-conda-env --file=trocr_env.yml conda activate my-conda-env ``` -Install Jupter and required packages +Install Jupyter and required packages ``` conda install jupyter pip install transformers==4.27.0 --no-deps From f4f7c5adbeb174e38d5cf85bf3b7713d1ac8c9ab Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 08:52:40 -0400 Subject: [PATCH 16/23] Add results summary --- .../research.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md index d8b567c..1a9d404 100644 --- a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md +++ b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md @@ -24,3 +24,22 @@ This methodology enabled to maintain a high level of accuracy and reliability in The [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215) paper inspired me to use this encoder-decoder architecture. In this paper, the authors use multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Additionally, BERT-like architectures also act as an inspiration to the encoder-decoder paradigm. This approach of utilizing an FFN as a decoder, post feature extraction, is important in handling various classification tasks, especially when dealing with specialized forms of data like text in images because it allows us to define a custom network specific to our task. + +#### Results Summary + +In our handwritten vs. typed-text classification task, the model performed impressively with an overall accuracy of \(96\%\). The test samples were handpicked to be challenging for the model to classify (since some of these were misclassified by a human). + +- **Handwritten Text Class:** + - **Precision:** \(97.96\%\) + - **Recall:** \(96.00\%\) + - **F1-Score:** \(96.97\%\) + - **Support:** 50 samples + +- **Typed Text Class:** + - **Precision:** \(96.23\%\) + - **Recall:** \(98.08\%\) + - **F1-Score:** \(97.14\%\) + - **Support:** 52 samples + +The balanced performance across both classes, as shown in the nearly identical macro average and weighted average metrics, demonstrates the model's robustness in distinguishing between handwritten and typed texts. 
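For reference, per-class figures like the ones above are what scikit-learn's `classification_report` produces. The snippet below is a sketch with placeholder labels; the class ordering (0 = handwritten, 1 = typed) is an assumption for illustration, and the numbers it prints are not the project's results.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder labels purely for illustration; substitute the real evaluation split.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]

print(classification_report(y_true, y_pred,
                            target_names=["handwritten", "typed"], digits=4))
print(confusion_matrix(y_true, y_pred))
```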
+ From 10a675b0554fd0a2d66d99e3e4c60eb7fe33d399 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 08:53:46 -0400 Subject: [PATCH 17/23] Add results summary --- .../research.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md index 1a9d404..ea57bda 100644 --- a/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md +++ b/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md @@ -23,23 +23,23 @@ This methodology enabled to maintain a high level of accuracy and reliability in The [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215) paper inspired me to use this encoder-decoder architecture. In this paper, the authors use multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Additionally, BERT-like architectures also act as an inspiration to the encoder-decoder paradigm. -This approach of utilizing an FFN as a decoder, post feature extraction, is important in handling various classification tasks, especially when dealing with specialized forms of data like text in images because it allows us to define a custom network specific to our task. +This approach of utilizing an FFN as a decoder, post feature extraction, is important in handling various classification tasks, especially when dealing with specialized forms of data like text in images, because it allows us to define a custom network specific to our task. #### Results Summary In our handwritten vs. typed-text classification task, the model performed impressively with an overall accuracy of \(96\%\). The test samples were handpicked to be challenging for the model to classify (since some of these were misclassified by a human). -- **Handwritten Text Class:** - - **Precision:** \(97.96\%\) - - **Recall:** \(96.00\%\) - - **F1-Score:** \(96.97\%\) - - **Support:** 50 samples - -- **Typed Text Class:** - - **Precision:** \(96.23\%\) - - **Recall:** \(98.08\%\) - - **F1-Score:** \(97.14\%\) - - **Support:** 52 samples +- *Handwritten Text Class:* + - *Precision:* \(97.96\%\) + - *Recall:* \(96.00\%\) + - *F1-Score:* \(96.97\%\) + - *Support:* 50 samples + +- *Typed Text Class:* + - *Precision:* \(96.23\%\) + - *Recall:* \(98.08\%\) + - *F1-Score:* \(97.14\%\) + - *Support:* 52 samples The balanced performance across both classes, as shown in the nearly identical macro average and weighted average metrics, demonstrates the model's robustness in distinguishing between handwritten and typed texts. From 1b110c383910adc9615a80786329a9447b967794 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 09:04:47 -0400 Subject: [PATCH 18/23] Link research doc --- .../handwritten-typed-text-classification/ReadMe.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md b/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md index bd0cc00..006ba8e 100644 --- a/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md +++ b/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md @@ -5,7 +5,9 @@ > 1. You can further experiment with the `trocr-base-handwritten` model instead of the TrOCR large model. 
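Swapping checkpoints is a one-line change when loading the model; a minimal sketch, assuming the Hugging Face hub checkpoint names:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Hypothetical swap: the smaller handwritten checkpoint in place of the large one.
checkpoint = "microsoft/trocr-base-handwritten"  # e.g. instead of "microsoft/trocr-large-handwritten"
processor = TrOCRProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
```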
## Overview -Here, we aim to build a pipeline to classify handwritten text and typed/machine-printed text extracted from images. The ultimate goal of this pipeline is to classify the plant specimen images into typed/handwritten categories to create an evaluation set. The evaluation set will be used to test the main TrOCR pipeline. We utilize various machine learning models and techniques for this purpose. +Here, we aim to build a pipeline to classify handwritten text and typed/machine-printed text extracted from images. The ultimate goal of this pipeline is to classify the plant specimen images into typed/handwritten categories to create an evaluation set. The evaluation set will be used to test the main TrOCR pipeline. We utilize various machine learning models and techniques for this purpose. + +For detailed insights on the models explored and the results of the implementation, refer to the [research.md](https://github.com/BU-Spark/ml-herbarium/blob/research-doc-patch/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md) file. ## Getting Started From b265ee7ddb4643fb334e1f5c686d4e28548ad67f Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 09:08:02 -0400 Subject: [PATCH 19/23] fix: ReadMe vs research doc --- .../handwritten-typed-text-classification/ReadMe.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md b/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md index 006ba8e..5dab0d4 100644 --- a/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md +++ b/trocr/evaluation-dataset/handwritten-typed-text-classification/ReadMe.md @@ -7,7 +7,7 @@ ## Overview Here, we aim to build a pipeline to classify handwritten text and typed/machine-printed text extracted from images. The ultimate goal of this pipeline is to classify the plant specimen images into typed/handwritten categories to create an evaluation set. The evaluation set will be used to test the main TrOCR pipeline. We utilize various machine learning models and techniques for this purpose. -For detailed insights on the models explored and the results of the implementation, refer to the [research.md](https://github.com/BU-Spark/ml-herbarium/blob/research-doc-patch/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md) file. +The following sections of the ReadMe shed light on the files and folders in this directory and how to run the model scripts. For detailed insights on the models explored and the results of the implementation, refer to the [research.md](https://github.com/BU-Spark/ml-herbarium/blob/research-doc-patch/trocr/evaluation-dataset/handwritten-typed-text-classification/research.md) file. ## Getting Started From fa4bd5849487a7f5c25437f98742d7c05656cc47 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 09:29:35 -0400 Subject: [PATCH 20/23] Add paper and dataset citations --- trocr/label-extraction/research.md | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/trocr/label-extraction/research.md b/trocr/label-extraction/research.md index 2625d51..b9d836d 100644 --- a/trocr/label-extraction/research.md +++ b/trocr/label-extraction/research.md @@ -1,25 +1,25 @@ # [Research] DETR (DEtection TRansformer) #### Overview -To choose an optimal model for detecting labels in plant sample images, a review of various models was undertaken. 
The task was to discern labels from plant specimen images, with potential models including LayoutLMv3 and DETR. A detailed comparison and a critical review of the models led to an optimal model selection, aligning with the project's specific goals and constraints. +To choose an optimal model for detecting labels in plant sample images, a review of various models was undertaken. The task was to discern labels from plant specimen images, with potential models including LayoutLMv3[^1^] and DETR[^2^]. A detailed comparison and a critical review of the models led to an optimal model selection, aligning with the project's specific goals and constraints. #### Analysis of Models -During the model selection process, **BioBERT** and **LayoutLMv3** were meticulously analyzed, with the former already in use in our post-OCR step. +During the model selection process, **BioBERT**[^3^](as part of the TaxoNERD[^5^] module) and **LayoutLMv3** were meticulously analyzed, with the former already in use in our post-OCR step. - **BioBERT:** The BioBERT paper emphasizes the significance of pre-training with biomedical text. The model, trained over 23 days utilizing 8 V100 GPUs, exhibited superior performance over pre-trained BERT in scientific Named Entity Recognition (NER) tasks. - **LayoutLMv3:** - LayoutLMv3 initialized its text modality with RoBERTa weights and underwent subsequent pre-training on the IIT-CDIP dataset. The multi-model nature of the model could prove effective as well. + LayoutLMv3 initialized its text modality with RoBERTa weights and underwent subsequent pre-training on the IIT-CDIP dataset[^6^]. The multi-model nature of the model could prove effective as well. An in-depth reading of these papers raised concerns over the loss of nuanced information learned from pre-training on medical text, which could potentially be a setback for the project. The risk was highlighted by our objective to focus information extraction solely from the labels on our specimen images, and implementing LayoutLMv3 could potentially deviate us from this goal. #### Rationale for Model Selection -Given the potential limitations and changes required to the existing pipeline, to have BioBERT as an isolated post-processing step was preferred. This would offer flexibility in integrating later models like **SciBERT** and leveraging off-the-shelf models pre-trained on biomedical text. +Given the potential limitations and changes required to the existing pipeline, to have BioBERT as an isolated post-processing step was preferred. This would offer flexibility in integrating later models like **SciBERT**[^4^] and leveraging off-the-shelf models pre-trained on biomedical text. With the constrained timeline, aiming to label adequate data, pre-training the text modality of the LayoutLMv3 model, and documenting the results appeared ambitious. -Therefore, given the considerations and project alignment, DETR was opted for as the preferred model to detect labels in our specimen images. DETR’s proficiency in detecting objects, in our case labels (which are in essence, rectangular shapes) made it a fitting choice, as it synchronized well with our usecase.Additionally, integrating LayoutLMv3 would have necessitated considerable modifications to the existing pipeline, risking the loss of advantages gained from the pre-trained BioBERT. +Therefore, given the considerations and project alignment, DETR was opted for as the preferred model to detect labels in our specimen images. 
DETR’s proficiency in detecting objects, in our case labels (which are in essence, rectangular shapes) made it a fitting choice, as it synchronized well with our usecase. Additionally, integrating LayoutLMv3 would have necessitated considerable modifications to the existing pipeline, risking the loss of advantages gained from the pre-trained BioBERT. The model's availability on Hugging Face is also a major factor in terms of codebase maintainability and has made it an optimal choice for our task. Please feel free to checkout other models for object detection on "Papers with code". @@ -27,6 +27,18 @@ The model's availability on Hugging Face is also a major factor in terms of code DETR leverages the transformer architecture, predominantly used for NLP tasks, to process image data effectively, making it stand out from traditional CNN-based detection models. It fundamentally alters the conventional object detection paradigms, removing the need for anchor boxes and employing a bipartite matching loss to handle objects of different scales and aspect ratios, thereby mitigating issues prevalent in region proposal-based methods. The model enables processing both convolutional features and positional encodings concurrently, optimizing spatial understanding within images. -On benchmark datasets like COCO, DETR exhibits better performance, demonstrating its ability to optimize the Intersection over Union (IoU) metric, while maintaining high recall rates. It uses a set-based global loss, which helps in overcoming issues related to occlusion and object density, establishing a higher benchmark for complex tasks. +On benchmark datasets like COCO[^7^], DETR exhibits better performance, demonstrating its ability to optimize the Intersection over Union (IoU) metric, while maintaining high recall rates. It uses a set-based global loss, which helps in overcoming issues related to occlusion and object density, establishing a higher benchmark for complex tasks. Its application has extended to medical image analysis, where precise detection is pivotal. It has been especially impactful in instances where identifying and localizing multiple objects within images is crucial, such as in surveillance and autonomous vehicle navigation. + +--- + +### References: + +[^1^]: [LayoutLMv3](https://arxiv.org/abs/2204.08387). +[^2^]: [DETR (DEtection TRansformer)](https://arxiv.org/abs/2005.12872) +[^3^]: [BioBERT](https://arxiv.org/abs/1901.08746) +[^4^]: [SciBERT](https://arxiv.org/abs/1903.10676) +[^5^]: [TaxoNERD](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13778). +[^6^]: [IIT-CDIP Dataset](https://data.nist.gov/od/id/mds2-2531). +[^7^]: [COCO Dataset](http://cocodataset.org/). From e6341caf3822a03fcc10682bc88e58ea6407b9c1 Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 09:55:26 -0400 Subject: [PATCH 21/23] Add more citations --- trocr/label-extraction/research.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/trocr/label-extraction/research.md b/trocr/label-extraction/research.md index b9d836d..b1ddea9 100644 --- a/trocr/label-extraction/research.md +++ b/trocr/label-extraction/research.md @@ -4,7 +4,7 @@ To choose an optimal model for detecting labels in plant sample images, a review of various models was undertaken. The task was to discern labels from plant specimen images, with potential models including LayoutLMv3[^1^] and DETR[^2^]. 
A detailed comparison and a critical review of the models led to an optimal model selection, aligning with the project's specific goals and constraints. #### Analysis of Models -During the model selection process, **BioBERT**[^3^](as part of the TaxoNERD[^5^] module) and **LayoutLMv3** were meticulously analyzed, with the former already in use in our post-OCR step. +During the model selection process, **BioBERT**[^3^](as part of the TaxoNERD module - paper[^4^] and GitHub[^5^]) and **LayoutLMv3** were meticulously analyzed, with the former already in use in our post-OCR step. - **BioBERT:** The BioBERT paper emphasizes the significance of pre-training with biomedical text. The model, trained over 23 days utilizing 8 V100 GPUs, exhibited superior performance over pre-trained BERT in scientific Named Entity Recognition (NER) tasks. @@ -15,7 +15,7 @@ During the model selection process, **BioBERT**[^3^](as part of the TaxoNERD[^5^ An in-depth reading of these papers raised concerns over the loss of nuanced information learned from pre-training on medical text, which could potentially be a setback for the project. The risk was highlighted by our objective to focus information extraction solely from the labels on our specimen images, and implementing LayoutLMv3 could potentially deviate us from this goal. #### Rationale for Model Selection -Given the potential limitations and changes required to the existing pipeline, to have BioBERT as an isolated post-processing step was preferred. This would offer flexibility in integrating later models like **SciBERT**[^4^] and leveraging off-the-shelf models pre-trained on biomedical text. +Given the potential limitations and changes required to the existing pipeline, to have BioBERT as an isolated post-processing step was preferred. This would offer flexibility in integrating later models like **SciBERT**[^7^] and leveraging off-the-shelf models pre-trained on biomedical text. With the constrained timeline, aiming to label adequate data, pre-training the text modality of the LayoutLMv3 model, and documenting the results appeared ambitious. @@ -27,7 +27,7 @@ The model's availability on Hugging Face is also a major factor in terms of code DETR leverages the transformer architecture, predominantly used for NLP tasks, to process image data effectively, making it stand out from traditional CNN-based detection models. It fundamentally alters the conventional object detection paradigms, removing the need for anchor boxes and employing a bipartite matching loss to handle objects of different scales and aspect ratios, thereby mitigating issues prevalent in region proposal-based methods. The model enables processing both convolutional features and positional encodings concurrently, optimizing spatial understanding within images. -On benchmark datasets like COCO[^7^], DETR exhibits better performance, demonstrating its ability to optimize the Intersection over Union (IoU) metric, while maintaining high recall rates. It uses a set-based global loss, which helps in overcoming issues related to occlusion and object density, establishing a higher benchmark for complex tasks. +On benchmark datasets like COCO[^8^], DETR exhibits better performance, demonstrating its ability to optimize the Intersection over Union (IoU) metric, while maintaining high recall rates. It uses a set-based global loss, which helps in overcoming issues related to occlusion and object density, establishing a higher benchmark for complex tasks. 
Its application has extended to medical image analysis, where precise detection is pivotal. It has been especially impactful in instances where identifying and localizing multiple objects within images is crucial, such as in surveillance and autonomous vehicle navigation. @@ -38,7 +38,8 @@ Its application has extended to medical image analysis, where precise detection [^1^]: [LayoutLMv3](https://arxiv.org/abs/2204.08387). [^2^]: [DETR (DEtection TRansformer)](https://arxiv.org/abs/2005.12872) [^3^]: [BioBERT](https://arxiv.org/abs/1901.08746) -[^4^]: [SciBERT](https://arxiv.org/abs/1903.10676) -[^5^]: [TaxoNERD](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13778). +[^7^]: [SciBERT](https://arxiv.org/abs/1903.10676) +[^4^]: [TaxoNERD](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.13778). [^6^]: [IIT-CDIP Dataset](https://data.nist.gov/od/id/mds2-2531). -[^7^]: [COCO Dataset](http://cocodataset.org/). +[^8^]: [COCO Dataset](http://cocodataset.org/). +[^5^]: [TaxoNERD GitHub Repository](https://github.com/nleguillarme/taxonerd). From 7d3f3485ce9ae77fb175a35880d298dc9e00967b Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 10:56:59 -0400 Subject: [PATCH 22/23] Add results summary --- trocr/label-extraction/research.md | 40 +++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/trocr/label-extraction/research.md b/trocr/label-extraction/research.md index b1ddea9..9c86dc2 100644 --- a/trocr/label-extraction/research.md +++ b/trocr/label-extraction/research.md @@ -1,9 +1,9 @@ # [Research] DETR (DEtection TRansformer) -#### Overview +### Overview To choose an optimal model for detecting labels in plant sample images, a review of various models was undertaken. The task was to discern labels from plant specimen images, with potential models including LayoutLMv3[^1^] and DETR[^2^]. A detailed comparison and a critical review of the models led to an optimal model selection, aligning with the project's specific goals and constraints. -#### Analysis of Models +### Analysis of Models During the model selection process, **BioBERT**[^3^](as part of the TaxoNERD module - paper[^4^] and GitHub[^5^]) and **LayoutLMv3** were meticulously analyzed, with the former already in use in our post-OCR step. - **BioBERT:** @@ -14,7 +14,7 @@ During the model selection process, **BioBERT**[^3^](as part of the TaxoNERD mod An in-depth reading of these papers raised concerns over the loss of nuanced information learned from pre-training on medical text, which could potentially be a setback for the project. The risk was highlighted by our objective to focus information extraction solely from the labels on our specimen images, and implementing LayoutLMv3 could potentially deviate us from this goal. -#### Rationale for Model Selection +### Rationale for Model Selection Given the potential limitations and changes required to the existing pipeline, to have BioBERT as an isolated post-processing step was preferred. This would offer flexibility in integrating later models like **SciBERT**[^7^] and leveraging off-the-shelf models pre-trained on biomedical text. With the constrained timeline, aiming to label adequate data, pre-training the text modality of the LayoutLMv3 model, and documenting the results appeared ambitious. 
@@ -23,7 +23,7 @@ Therefore, given the considerations and project alignment, DETR was opted for as The model's availability on Hugging Face is also a major factor in terms of codebase maintainability and has made it an optimal choice for our task. Please feel free to checkout other models for object detection on "Papers with code". -#### About DETR +### About DETR DETR leverages the transformer architecture, predominantly used for NLP tasks, to process image data effectively, making it stand out from traditional CNN-based detection models. It fundamentally alters the conventional object detection paradigms, removing the need for anchor boxes and employing a bipartite matching loss to handle objects of different scales and aspect ratios, thereby mitigating issues prevalent in region proposal-based methods. The model enables processing both convolutional features and positional encodings concurrently, optimizing spatial understanding within images. @@ -33,6 +33,38 @@ Its application has extended to medical image analysis, where precise detection --- +### Evaluation Summary + +The model's performance was evaluated using the COCO evaluation metrics, a standard benchmark for object detection algorithms. The following results provide insights into its accuracy and precision: + +- **Intersection over Union (IoU)**: + - This metric quantifies the overlap between the predicted bounding box and the actual ground truth. Higher values indicate better alignment between predictions and ground truth. + +- **Average Precision (AP)**: + - `AP (IoU=0.50:0.95, all sizes)`: 0.229 + - A comprehensive metric measuring the model's precision over multiple IoU thresholds (0.50 to 0.95) and object sizes. + - `AP (IoU=0.50, all sizes)`: 0.401 + - At a lenient overlap requirement of 0.50, the model exhibits a precision of 0.401. + - `AP (IoU=0.75, all sizes)`: 0.262 + - At a stricter overlap of 0.75, precision drops slightly. + - For specific object sizes: + - `AP (IoU=0.50:0.95, large objects)`: 0.229 + - The model's precision for small and medium objects was not evaluated or not available in the dataset. + +- **Average Recall (AR)**: + - Reflecting the model's ability to identify all potential objects: + - `AR (maxDets=1)`: 0.161 + - `AR (maxDets=10)`: 0.316 + - `AR (maxDets=100)`: 0.316 + - For specific object sizes: + - `AR (IoU=0.50:0.95, large objects)`: 0.316 + - Recall for small and medium objects was not evaluated or not available in the dataset. + +**Interpretation**: +The model demonstrates respectable precision, especially at a lenient IoU threshold of 0.50. While precision tends to drop with stricter IoU thresholds, the average recall indicates the model's consistent ability to identify objects, especially when considering a larger number of detections per image. The model's proficiency in detecting larger objects is evident, while its performance on smaller or medium objects requires further assessment and improvement. + +--- + ### References: [^1^]: [LayoutLMv3](https://arxiv.org/abs/2204.08387). 
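Returning to the evaluation summary above: the AP/AR figures follow the standard COCO protocol, and can be reproduced with `pycocotools` given ground truth and DETR predictions exported in COCO JSON format. The file names below are placeholders, not files shipped with this repository.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: point these at the label ground truth and the exported
# DETR detections, both in COCO JSON format.
coco_gt = COCO("labels_ground_truth.json")
coco_dt = coco_gt.loadRes("detr_predictions.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR in the same format as the summary above
```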
From ae640582fb7272e30fce2be007888a05b993113d Mon Sep 17 00:00:00 2001 From: Kabilan Mohanraj Date: Wed, 1 Nov 2023 10:59:30 -0400 Subject: [PATCH 23/23] link research doc --- trocr/label-extraction/ReadMe.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/trocr/label-extraction/ReadMe.md b/trocr/label-extraction/ReadMe.md index 8353ae8..6761cd3 100644 --- a/trocr/label-extraction/ReadMe.md +++ b/trocr/label-extraction/ReadMe.md @@ -15,6 +15,8 @@ ## Overview Here, we aim to use DETR (DEtection TRansformer) to segment labels from our plant sample images, through object detection. +The following sections of the ReadMe shed light on the files and folders in this directory and how to run the model scripts. For detailed insights on the models explored and the results of the implementation, refer to the [research.md](https://github.com/BU-Spark/ml-herbarium/blob/research-doc-patch/trocr/label-extraction/research.md) file. + ## Getting Started ### Prerequisites and Installation