πŸ–‹οΈ 3 open source signature detection and extraction modules were analyzed. 3 more modules were created by modifying and combining the open source modules, creating a module which outperformed all existing modules. A custom CNN was built using Keras to detect signatures and pre existing VGG16 CNN was modified to create a signature extraction mode…

Purefekt/signature-detection-and-extraction

Signature Detection and Extraction

Summary

This repository explores signature detection and extraction using image processing methods and convolutional neural networks (CNNs).

  • For image processing methods, we have 6 modules (3 open source modules and 3 hybrids of those open source modules).
  • For CNNs, we have a custom CNN that detects whether a given document contains a signature, and a CNN with bounding box regression that predicts the bounding box coordinates of the signature in a given document.

Image Processing modules

  • Module 1 -> Adapted from https://github.com/ahmetozlu/signature_extractor. This project aims at removing the signatures from a given document. I modified it to extract the signature instead, and then used morphological techniques to find the bounding box of the signature.
  • Module 2 -> Adapted from https://github.com/EnzoSeason/signature_detection. This project extracts the signature from a given document, but it does not output the bounding box coordinates. I modified it to output the bounding box coordinates of the signature instead.
  • Module 3 -> Hybrid of modules 1 and 2.
  • Module 4 -> Adapted from https://github.com/saifkhichi96/signature-extraction. This project has a GUI that lets the user pick an image and then extracts the signature using a decision tree classifier trained on a dataset of 8000 signatures. I removed the GUI, since it was very slow when testing on 700+ images, and made the module output only the bounding box coordinates of the signature.
  • Module 5 -> Hybrid of modules 1 and 4.
  • Module 6 -> Hybrid of modules 1, 2 and 4.
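
All six modules end by reporting a bounding box for the extracted signature. As a minimal sketch of that last step, assume the extraction stage yields a binary mask in which non-zero pixels belong to the signature. The function name `bbox_from_mask` and the `(x_min, y_min, x_max, y_max)` convention are illustrative and are not taken from `all_modules.py`; the morphological processing used by the actual modules is not reproduced here.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) as inclusive pixel indices; a hypothetical convention.
BBox = Tuple[int, int, int, int]


def bbox_from_mask(mask: List[List[int]]) -> Optional[BBox]:
    """Return the tightest box around all non-zero pixels, or None for an empty mask."""
    # Rows that contain at least one signature pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Columns that contain at least one signature pixel.
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))


# Tiny 4x5 mask standing in for an extracted signature.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(bbox_from_mask(mask))  # (1, 1, 3, 2)
```

With a real image the mask would typically be a NumPy array, and the same result could be obtained from the indices of its non-zero entries.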

Convolutional Neural Network

  • Custom CNN classifier -> A custom CNN was built using Keras. This CNN was trained on the entire Tobacco800 dataset (1290 images). Images with a signature (776 images) were labelled as 1 and images without a signature (514 images) were labelled as 0.
  • CNN based on VGG16 with bounding box regression layer -> The pre-existing VGG16 CNN architecture was used. The final fully connected layer was removed and replaced with a bounding box regression layer. The dataset was the 776 images (a subset of Tobacco800) which had signatures. This model was used to predict the bounding box coordinates of signatures on documents.
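
A bounding box regression layer is commonly trained on coordinates scaled to the [0, 1] range, so that the targets do not depend on image resolution. The sketch below shows that scaling and its inverse. Whether this repository scales its targets this way is an assumption, and the names `normalize_bbox` and `denormalize_bbox` and the `(x_min, y_min, x_max, y_max)` ordering are hypothetical.

```python
from typing import Tuple

PixelBox = Tuple[int, int, int, int]         # (x_min, y_min, x_max, y_max) in pixels
UnitBox = Tuple[float, float, float, float]  # same order, scaled to [0, 1]


def normalize_bbox(box: PixelBox, width: int, height: int) -> UnitBox:
    """Scale pixel coordinates by the image size to get regression targets."""
    x_min, y_min, x_max, y_max = box
    return (x_min / width, y_min / height, x_max / width, y_max / height)


def denormalize_bbox(box: UnitBox, width: int, height: int) -> PixelBox:
    """Map a predicted box in [0, 1] back to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (round(x_min * width), round(y_min * height),
            round(x_max * width), round(y_max * height))


# Example with made-up numbers: a 1000x2000 pixel page.
target = normalize_bbox((100, 200, 300, 400), width=1000, height=2000)
restored = denormalize_bbox(target, width=1000, height=2000)
print(target, restored)
```

Scaled targets pair naturally with a 4-unit output layer that uses a sigmoid activation, but that design choice is likewise an assumption and not something the text above states.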

Evaluating Different Resources

The evaluation is in the Jupyter notebook evaluating_different_signature_extraction_modules.ipynb.

Install prerequisites

ImageMagick (for macOS use brew)

brew install imagemagick

Install python packages with conda

Note --> Install all packages one by one. requirements.txt has some issues because the packages come from different channels, and opencv-contrib needs to be built from source.

Create a conda environment named signature-detection-and-extraction and activate it

conda create --name signature-detection-and-extraction

conda activate signature-detection-and-extraction

Install packages one by one

conda install -c anaconda joblib

conda install -c anaconda numpy

conda install -c anaconda scikit-learn

conda install -c anaconda scipy

conda install -c conda-forge opencv

conda install -c conda-forge threadpoolctl

conda install -c conda-forge tqdm

The opencv-contrib package has to be built from source with the NONFREE flag set. If the prebuilt package is installed directly, a crucial function in the code will not be available. The build might take some time.

CMAKE_ARGS="-DOPENCV_ENABLE_NONFREE=ON" pip install --no-binary=opencv-contrib-python opencv-contrib-python
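
One way to confirm that the source build picked up the NONFREE flag is to inspect OpenCV's build report from Python. This is a sketch and not part of the repository: `cv2.getBuildInformation()` is an OpenCV call, but the helper name `nonfree_enabled` is hypothetical, and the exact `Non-free algorithms:` line it looks for is assumed from typical OpenCV build reports.

```python
import re


def nonfree_enabled(build_info: str) -> bool:
    """Return True if an OpenCV build report says non-free algorithms are enabled."""
    # Assumed report format: a line such as "Non-free algorithms:   YES".
    match = re.search(r"Non-free algorithms:\s*(\w+)", build_info)
    return bool(match) and match.group(1).upper() == "YES"


if __name__ == "__main__":
    try:
        import cv2  # only available after the install step above
    except ImportError:
        print("cv2 is not installed in this environment")
    else:
        print("non-free enabled:", nonfree_enabled(cv2.getBuildInformation()))
```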

conda install -c anaconda scikit-image

conda install -c conda-forge notebook

conda install -c anaconda pandas

conda install -c anaconda keras

pip install -U plaidml-keras

To use the conda environment in Jupyter Notebook, run the following command

python -m ipykernel install --user --name=signature-detection-and-extraction

Directory tree

Project root/
├── 6_modules_examples.ipynb
├── all_modules.py
├── assets/
│   ├── ...
├── cnn_bounding_box_regression/
│   ├── bbox_regression_cnn.h5 (download separately)
│   ├── cnn_bbox_regression.py
│   ├── create_ground_truth_csv.py
│   ├── eval_images_ground_truth.py
│   ├── eval_images_ground_truth_bbox.json
│   ├── eval_images_predicted_bbox.json
│   ├── evaluating_bbox_regression_cnn.ipynb
│   ├── iou_bbox_regression.json
│   ├── plot.png
│   ├── predictions_cnn_bbox_regression.py
│   └── test_images.txt
├── cnn_class_label_predictor/
│   ├── basic_cnn_model_gpu.h5 (download separately)
│   ├── basic_cnn_plaidML_gpu.py
│   ├── cleaning_data.py
│   ├── cpu_vs_gpu.png
│   ├── evaluating_basic_cnn.ipynb
│   └── test/
│       ├── ...
├── dataset_docs_with_signs/
│   └── cleaning_data.py
├── evaluating_different_signature_extraction_modules.ipynb
├── modules/
│   ├── model_4_5_6/
│   │   └── decision-tree.pkl
│   ├── module_1/
│   │   ├── ...
│   ├── module_1.pdf
│   ├── module_2/
│   │   ├── ...
│   └── module_4/
│       ├── ...
├── README.md
├── serialize_ground_truth_bbox.py
├── serialize_iou_data.py
└── serialize_predicted_bbox.py

Summary of all files

  • 6_modules_examples.ipynb -> jupyter notebook with examples on how to use all 6 modules to get bbox coordinates of signature in an image
  • all_modules.py -> class containing methods for implementing all 6 signature extraction modules
  • assets/ -> directory containing resources like test image, json file with bbox coord data, etc
  • cnn_bounding_box_regression/bbox_regression_cnn.h5 -> trained CNN model for bbox regression
  • cnn_bounding_box_regression/cnn_bbox_regression.py -> python script to train the CNN model for bbox regression
  • cnn_bounding_box_regression/create_ground_truth_csv.py -> python script to convert ground truth bbox values into a csv file for input for the CNN
  • cnn_bounding_box_regression/eval_images_ground_truth.py -> python script for getting the ground truth bbox values for the evaluation set
  • cnn_bounding_box_regression/eval_images_ground_truth_bbox.json -> json file containing the ground truth bbox values for images in evaluation set
  • cnn_bounding_box_regression/eval_images_predicted_bbox.json -> json file containing the predicted bbox values for the images in the evaluation dataset
  • cnn_bounding_box_regression/evaluating_bbox_regression_cnn.ipynb -> jupyter notebook showing results for the bbox regression CNN model
  • cnn_bounding_box_regression/iou_bbox_regression.json -> json file containing the iou values for the images in the evaluation dataset
  • cnn_bounding_box_regression/plot.png -> loss over 25 epochs while training the model
  • cnn_bounding_box_regression/predictions_cnn_bbox_regression.py -> python script which predicts the bboxes on the entire eval dataset using the trained model
  • cnn_bounding_box_regression/test_images.txt -> list of filenames of images in the test set
  • cnn_class_label_predictor/basic_cnn_model_gpu.h5 -> trained custom CNN for classification
  • cnn_class_label_predictor/basic_cnn_plaidML_gpu.py -> python script to train the custom CNN with the plaidML backend to use an AMD GPU
  • cnn_class_label_predictor/cleaning_data.py -> python script to create a custom dataset
  • cnn_class_label_predictor/cpu_vs_gpu.png -> image showing the difference in time while using cpu and gpu for training a model
  • cnn_class_label_predictor/evaluating_basic_cnn.ipynb -> jupyter notebook showing results for the basic custom CNN model
  • cnn_class_label_predictor/test/ -> directory containing the test files. These are used by the cnn_class_label_predictor/evaluating_basic_cnn.ipynb jupyter notebook
  • dataset_docs_with_signs/cleaning_data.py -> python script to create custom dataset
  • evaluating_different_signature_extraction_modules.ipynb -> results for the 6 modules used for signature extraction
  • modules/model_4_5_6/decision-tree.pkl -> trained decision tree model used by modules 4,5,6
  • modules/module_1/ -> all files used by module 1
  • modules/module_1.pdf -> pdf explaining how module 1 works
  • modules/module_2/ -> all files used by module 2
  • modules/module_4/ -> all files used by module 4
  • serialize_ground_truth_bbox.py -> python script to convert xml ground truth bbox data files into one json file
  • serialize_iou_data.py -> python script which uses the ground truth bbox json file and predicted bbox json file to calculate iou and serializes that information into a new json file
  • serialize_predicted_bbox.py -> python script which uses all modules to predict bboxes for all images and serializes that data into separate json files for each module
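
The IoU (intersection over union) step described for serialize_iou_data.py can be sketched as follows. This is an illustration under stated assumptions, not the repository's script: the `(x_min, y_min, x_max, y_max)` box layout, the filename-to-box dictionaries, and the example filenames are all hypothetical, since the actual JSON layout is not shown here.

```python
import json
from typing import Dict, Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up ground truth and predictions keyed by image filename.
ground_truth: Dict[str, Sequence[float]] = {
    "doc_a.tif": [0, 0, 10, 10],
    "doc_b.tif": [0, 0, 10, 10],
}
predicted: Dict[str, Sequence[float]] = {
    "doc_a.tif": [5, 5, 15, 15],    # partial overlap
    "doc_b.tif": [20, 20, 30, 30],  # no overlap
}

scores = {name: iou(ground_truth[name], predicted[name]) for name in ground_truth}
serialized = json.dumps(scores, indent=2)  # text that could be written to a json file
print(serialized)
```

An IoU of 1.0 means the predicted box matches the ground truth exactly, and 0.0 means the two boxes do not overlap at all.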
