- Python 3 (we used Python 3.8.8 in our experiments)
- Blender 2.82a
albumentations==1.3.0
einops==0.6.0
huggingface-hub==0.12.1
imageio==2.22.4
kornia==0.6.9
Levenshtein==0.20.9
numpy==1.22.3
opencv-python==4.5.5.64
pandas==1.4.2
scikit-image==0.19.3
scikit-learn==1.0.2
scipy==1.8.0
timm==0.4.12
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
tqdm==4.64.0
transformers==4.26.1
typing_extensions==4.2.0
wandb==0.13.5
To ensure no omissions, ./pip_freeze.txt shows the output of pip freeze in our environment.
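As a rough aid for reproducing the environment, the pinned versions can be compared with what is installed locally. The following is a minimal sketch, not part of this repository: the helper names (normalize, parse_requirements, find_mismatches, installed_versions) are hypothetical, and it assumes the file contains plain name==version lines, skipping anything else.

```python
import re
from importlib import metadata


def normalize(name):
    """Lowercase a package name and unify '-', '_' and '.' (e.g. typing_extensions)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(text):
    """Parse 'name==version' lines into a {normalized name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not simple pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} where found is None if the package is missing."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions():
    """Collect versions of all distributions visible to the current interpreter."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            versions[normalize(name)] = dist.version
    return versions
```

For example, find_mismatches(parse_requirements(open("pip_freeze.txt").read()), installed_versions()) lists every pinned package whose local version differs or is missing.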
- DDI-100: Download documents consisting of texts and figures. Then combine the texts under the orig_texts folder with the figures under the orig_backgrounds folder, and put the resulting document images under ./materials/docs/.
- Doc3D: Download paper meshes (.obj) and put them under ./materials/paper_meshes/.
- SVBRDFs dataset (Deschaintre+, 2018): Download the dataset (85 GB zipped) and crop a subset of sand and fabric normal map images. Then put these under ./materials/selected_normal_maps/.
- ShapeNet: Download ShapeNetCore.v2 and unzip it. Then put it under ./materials/objects/.
- Laval Indoor HDR dataset: Download panoramas and convert them to .png files. Then put these under ./materials/panoramas/indoor/.
- SUN360 dataset: Contact the author, download panoramas, and convert them to .png files. Then put the indoor and outdoor panoramas under ./materials/panoramas/indoor/ and ./materials/panoramas/outdoor/, respectively.
- Execute python3 utils/making_material_csv.py in the src directory.
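Before running the csv script, it can help to confirm that every material folder listed above exists and is not empty. The following is a minimal sketch, not part of this repository: missing_or_empty is a hypothetical helper, and it assumes the ./materials/ paths are relative to whatever root directory you pass in.

```python
from pathlib import Path

# Directories named in the list above, relative to the chosen root (an assumption).
EXPECTED_DIRS = [
    "materials/docs",
    "materials/paper_meshes",
    "materials/selected_normal_maps",
    "materials/objects",
    "materials/panoramas/indoor",
    "materials/panoramas/outdoor",
]


def missing_or_empty(root):
    """Return the expected directories that do not exist or contain no entries."""
    root = Path(root)
    problems = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems
```

Calling missing_or_empty(".") from the directory that contains materials/ returns an empty list when all folders are populated.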
We include a few sample materials so that file placement and reproducibility can be checked.
However, we do not redistribute the full set of materials; please download them from the original links.
We used Blender 2.82a. Some packages must be installed into Blender's bundled Python; an example installation script is below.
wget https://download.blender.org/release/Blender2.82/blender-2.82-linux64.tar.xz
tar xvf blender-2.82-linux64.tar.xz
export PATH=/home/ubuntu/local/blender-2.82-linux64/:$PATH
wget https://bootstrap.pypa.io/get-pip.py
/home/ubuntu/local/blender-2.82-linux64/2.82/python/bin/python3.7m get-pip.py
/home/ubuntu/local/blender-2.82-linux64/2.82/python/bin/python3.7m -m pip install easydict
/home/ubuntu/local/blender-2.82-linux64/2.82/python/bin/python3.7m -m pip install pillow
/home/ubuntu/local/blender-2.82-linux64/2.82/python/bin/python3.7m -m pip install opencv-python
/home/ubuntu/local/blender-2.82-linux64/2.82/python/bin/python3.7m -m pip install pandas
/home/ubuntu/local/blender-2.82-linux64/2.82/python/bin/python3.7m -m pip install tqdm
!!! Every command should be executed in the src directory !!!
# rendering synthetic document images
blender -b --python utils/SynDocRenderer.py -- --rendering --save_path output
python3 utils/postprocess_SynDocDS.py
The resulting dataset, SynDocDS, is then under ./datasets/SynDocDS, and the corresponding csv file is under ./datasets/csv/SynDocDS.
Brief descriptions are below.
- ./src/utils/SynDocRenderer.py: Renders synthetic images. This mainly corresponds to Section 3.1, Image Rendering, in our paper.
- ./src/utils/postprocess_SynDocDS.py: Provides post-processed shadow mattes, background colors, and the dataset csv. It also augments shadow images, corresponding to Section 3.2, Enriching Shadow Images, in our paper.
We will release this code and the SynDocDS dataset used for training once our paper is public.
- OSR dataset: We used the Control_* data, which have ground truth.
- Kligler's dataset
- Jung's dataset
Corresponding images are placed in the ./datasets/[DATASETNAME]/shadowfree_image and ./datasets/[DATASETNAME]/shadow_image folders under the same file name. In addition, the OSR dataset contains shadow masks, which are placed in the ./datasets/[DATASETNAME]/shadow_mask folder.
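Since the shadow and shadow-free images are matched by file name, a quick consistency check can catch misplaced files. The following is a minimal sketch, not part of this repository: unpaired_images is a hypothetical helper that only compares the file names in the two folders described above.

```python
from pathlib import Path


def unpaired_images(dataset_dir):
    """Return file names that appear in only one of the two image folders."""
    dataset_dir = Path(dataset_dir)
    shadow = {p.name for p in (dataset_dir / "shadow_image").iterdir() if p.is_file()}
    free = {p.name for p in (dataset_dir / "shadowfree_image").iterdir() if p.is_file()}
    # Symmetric difference: names missing from either side.
    return sorted(shadow ^ free)
```

An empty result for ./datasets/[DATASETNAME] means every shadow image has a shadow-free counterpart with the same name.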
To make shadow masks and background colors for real datasets, use postprocess_real_dataset.ipynb. Set the target dataset name in the notebook as NAME = DIR = [DATASETNAME].
Training/validation/test splits are specified in ./datasets/csv/[DATASETNAME]/
python3 train.py configs/DSFN_SynDocDS/config.yaml
If you want to use W&B to display output images, losses, and evaluation scores, add the --use_wandb option.
Please download here.
For finetuning, please see the example config file DSFN_SynDocDS_FT_[DATASETNAME]/config.yaml.
*example
python3 train.py configs/DSFN_SynDocDS_FT_Kligler/config.yaml
python3 evaluate.py configs/[CONFIG_NAME]/config.yaml --test_data [DATASET_NAME]
*example
python3 evaluate.py configs/DSFN_SynDocDS/config.yaml --test_data JungAll
python3 evaluate.py configs/DSFN_SynDocDS_FT_Jung/config.yaml --test_data Jung
For the test dataset names, please refer to ./src/libs/dataset_csv.py.
The result images are saved under ./configs/[CONFIG_NAME]/, and the evaluation scores are saved as ./configs/[CONFIG_NAME]/evaluation_score.txt.
We used Tesseract for OCR. It should be installed following these instructions.
We used the part of the OSR dataset that has detectable text.
Images whose file names start with these numbers have detectable text: ["003", "005", "006", "007", "008", "009", "010", "011", "012", "013", "014", "015", "016", "024", "025", "026"]
Images whose file names start with these numbers are rotated 90 degrees to the right: ["001", "002", "017", "019", "020", "021", "023"]
(NOTE: These are actually not used.)
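The prefix list above can be applied directly to file names to pick the OSR images used for OCR. The following is a minimal sketch, not the code in ocr.ipynb: select_ocr_images is a hypothetical helper, and the file names in the usage note are made up for illustration.

```python
# Prefixes of OSR images with detectable text, copied from the list above.
DETECTABLE_PREFIXES = (
    "003", "005", "006", "007", "008", "009", "010", "011",
    "012", "013", "014", "015", "016", "024", "025", "026",
)


def select_ocr_images(file_names):
    """Keep only file names that start with one of the detectable-text prefixes."""
    # str.startswith accepts a tuple and matches any of its entries.
    return sorted(name for name in file_names if name.startswith(DETECTABLE_PREFIXES))
```

For instance, given the hypothetical names "003_a.png" and "001_a.png", only "003_a.png" is kept.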
ocr.ipynb is a short notebook for calculating edit distances.
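For readers unfamiliar with the metric, the edit (Levenshtein) distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal pure-Python sketch for illustration; it is not the notebook's code, and edit_distance is a hypothetical name. The Levenshtein package pinned in the requirements provides an equivalent Levenshtein.distance function.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping two rows in memory."""
    # Iterate over the longer string so the rows have the shorter length.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

A lower distance between the OCR output of a shadow-removed image and that of the ground-truth image means the text was preserved more faithfully.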
We used the rendering code for SDSRD from this repository.
We carefully selected the WORKLOAD files by referring to commit logs, etc., to use a dataset as close as possible to those used in previous studies (Lin+, CVPR2020).