Official PyTorch repository for Ship in Sight: Diffusion Models for Ship-Image Super Resolution, WCCI 2024.
In recent years, remarkable advancements have been achieved in the field of image generation, primarily driven by the escalating demand for high-quality outcomes across various image generation subtasks, such as inpainting, denoising, and super resolution. A major effort is devoted to exploring the application of super-resolution techniques to enhance the quality of low-resolution images. In this context, our method explores in depth the problem of ship image super resolution, which is crucial for coastal and port surveillance.
We investigate the opportunity given by the growing interest in text-to-image diffusion models, taking advantage of the prior knowledge that such foundation models have already learned. In particular, we present a diffusion-model-based architecture that leverages text conditioning during training while being class-aware, to best preserve the crucial details of the ships during the generation of the super-resolved image. Given the specificity of this task and the scarcity of off-the-shelf data, we also introduce a large labeled ship dataset scraped from online ship images, mostly from the ShipSpotting website.
Our method achieves more robust results than other deep learning models previously employed for super resolution, as proven by the multiple experiments performed. Moreover, we investigate how this model can benefit downstream tasks, such as classification and object detection, thus emphasizing practical implementation in a real-world scenario. Experimental results show flexibility, reliability, and impressive performance of the proposed framework over state-of-the-art methods for different tasks.
Paper | IEEE Paper |
Luigi Sigillo, Riccardo Fosco Gramaccioni, Alessandro Nicolosi, Danilo Comminiello
ISPAMM Lab, Sapienza University of Rome
- 08.04.2024: Dataset is released.
- 08.04.2024: Checkpoints are released.
- 18.03.2024: Repo is released.
For further evaluation results, please refer to our paper.
- PyTorch == 1.12.1
- CUDA == 11.7
- pytorch-lightning==1.4.2
- xformers == 0.0.16 (Optional)
- Other required packages are listed in environment.yaml
# git clone this repository
git clone https://github.com/luigisigillo/ShipinSight.git
cd ShipinSight
# Create a conda environment and activate it
conda env create --file environment.yaml
conda activate ShipinSight
# Install xformers
conda install xformers -c xformers/label/dev
# Install taming & clip
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
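After installation, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_pins` are hypothetical, and the distribution names `torch`, `pytorch-lightning`, and `xformers` are assumed to be the ones the conda environment installs.

```python
from importlib import metadata

# Pins copied from the requirements list above; xformers is marked optional there.
PINS = {"torch": "1.12.1", "pytorch-lightning": "1.4.2", "xformers": "0.0.16"}
OPTIONAL = {"xformers"}


def installed_version(package):
    """Return the installed version without a local suffix such as '+cu117', or None."""
    try:
        return metadata.version(package).split("+")[0]
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins=PINS, lookup=installed_version):
    """Map each package to 'ok', 'missing', 'missing (optional)', or 'mismatch: <found>'."""
    report = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing (optional)" if package in OPTIONAL else "missing"
        elif found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch: {found}"
    return report


if __name__ == "__main__":
    for package, status in check_pins().items():
        print(f"{package:20s} {status}")
```

A mismatch is not necessarily fatal, but the versions above are the ones the repository states as requirements.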
Download the pretrained Stable Diffusion models from HuggingFace.
# Start a new fine-tuning run
python main.py --train --base configs/shipinsight/v2-finetune_text_T_512.yaml --gpus GPU_ID, --name NAME --scale_lr False
# Resume a previous run from RESUME_PATH
python main.py --train --base configs/shipinsight/v2-finetune_text_T_512.yaml --gpus GPU_ID, --resume RESUME_PATH --scale_lr False
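Two details of the training commands are easy to miss: the trailing comma after `GPU_ID` and the `--scale_lr False` flag. The sketch below illustrates both, assuming this repository's `main.py` inherits the learning-rate logic of the latent-diffusion codebase it builds on. There, the base learning rate is multiplied by the number of GPUs, the batch size, and the gradient accumulation steps only when `scale_lr` is enabled. The function names and the numeric values are hypothetical.

```python
def count_gpus(gpus_arg):
    """Count device ids in a --gpus value such as '0,' or '0,1'."""
    return len([g for g in gpus_arg.strip(",").split(",") if g.strip()])


def effective_lr(base_lr, batch_size, gpus_arg, accumulate_grad_batches=1, scale_lr=True):
    """Learning rate actually used by the optimizer (assumed latent-diffusion behavior)."""
    if not scale_lr:
        # With --scale_lr False the value from the config is used unchanged.
        return base_lr
    return accumulate_grad_batches * count_gpus(gpus_arg) * batch_size * base_lr


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the difference between the two modes.
    print(effective_lr(5e-5, batch_size=6, gpus_arg="0,1", scale_lr=False))
    print(effective_lr(5e-5, batch_size=6, gpus_arg="0,1", scale_lr=True))
```

On the trailing comma: in the pytorch-lightning versions this stack relies on, a `--gpus` value containing a comma is read as a list of device ids, whereas a bare integer is read as a number of GPUs, so `--gpus 0,` selects device 0 while `--gpus 0` would request no GPU at all.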
Request access to the pretrained diffusion and autoencoder models on Google Drive.
- Test on 128 --> 512: You need at least 10 GB of GPU memory to run this script (batch size 2 by default)
python scripts/sr_val_ddpm_text_T_vqganfin_old.py --config configs/shipinsight/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5 --colorfix_type adain
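When several input folders need to be processed, the command above can be assembled programmatically. This is a minimal sketch that only reuses the script path and flags shown in the command; the helper name `build_sr_command` and the example paths are hypothetical placeholders.

```python
import shlex


def build_sr_command(ckpt, vqgan_ckpt, init_img, outdir,
                     config="configs/shipinsight/v2-finetune_text_T_512.yaml",
                     ddpm_steps=200, dec_w=0.5, colorfix_type="adain"):
    """Return the argument list for one super-resolution run (flags as in the README)."""
    return [
        "python", "scripts/sr_val_ddpm_text_T_vqganfin_old.py",
        "--config", config,
        "--ckpt", ckpt,
        "--vqgan_ckpt", vqgan_ckpt,
        "--init-img", init_img,
        "--outdir", outdir,
        "--ddpm_steps", str(ddpm_steps),
        "--dec_w", str(dec_w),
        "--colorfix_type", colorfix_type,
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with real checkpoint and data locations.
    for folder in ["inputs/port_a", "inputs/port_b"]:
        cmd = build_sr_command("CKPT_PATH", "VQGANCKPT_PATH", folder, folder + "_sr")
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems with paths that contain spaces.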
To construct such a dataset, a straightforward approach was scraping images from the web. The main source for our dataset is ShipSpotting, which serves as a repository for user-uploaded images and hosts a vast collection of approximately 3 million ship images. Furthermore, for each image, valuable supplementary information is available, such as the type of the ship and its present and past names. We made sure that as many images as possible were collected in our dataset, since in deep learning the quantity of training data directly influences the quality of results.
A larger volume of data enables models to generalize more effectively. We therefore scrape all the images; as a result, the dataset comprises a total of 1,517,702 samples. We exclude many classes of ships from our final analysis and concentrate on those that are most common and most valuable for real-world use cases. The total number of different classes is 20 and the ship categories included are Bulkers, Containerships, Cruise ships, Dredgers, Fire Fighting Vessels, Floating Sheerlegs, General Cargo, Inland, Livestock Carriers, Passenger Vessels, Patrol Forces, Reefers, Ro-ro, Supply ships, Tankers, Training ships, Tugs, Vehicle Carriers, Wood Chip Carriers. The total number of samples after this class selection is 507,918.
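The class selection described above amounts to filtering the scraped metadata by ship category. The sketch below illustrates the idea under stated assumptions: the record layout (a dict with `path` and `ship_type` keys) and the function names are hypothetical, since the released dataset's actual metadata format is not described here. The category names are copied from the list above, and the two totals are the ones reported in the text.

```python
from collections import Counter

# Category names as listed in the dataset description above.
KEPT_CLASSES = frozenset({
    "Bulkers", "Containerships", "Cruise ships", "Dredgers",
    "Fire Fighting Vessels", "Floating Sheerlegs", "General Cargo", "Inland",
    "Livestock Carriers", "Passenger Vessels", "Patrol Forces", "Reefers",
    "Ro-ro", "Supply ships", "Tankers", "Training ships", "Tugs",
    "Vehicle Carriers", "Wood Chip Carriers",
})

# Totals reported in the text.
TOTAL_SCRAPED = 1_517_702
TOTAL_KEPT = 507_918


def select_classes(records, kept=KEPT_CLASSES):
    """Keep only records whose (hypothetical) 'ship_type' field is a retained class."""
    return [r for r in records if r["ship_type"] in kept]


def class_histogram(records):
    """Count samples per ship type, e.g. to inspect class imbalance."""
    return Counter(r["ship_type"] for r in records)


if __name__ == "__main__":
    # Toy records; 'Some Other Type' stands in for any excluded category.
    records = [
        {"path": "img_0001.jpg", "ship_type": "Tugs"},
        {"path": "img_0002.jpg", "ship_type": "Some Other Type"},
        {"path": "img_0003.jpg", "ship_type": "Tankers"},
    ]
    kept = select_classes(records)
    print(class_histogram(kept))
    print(f"Fraction of scraped samples retained: {TOTAL_KEPT / TOTAL_SCRAPED:.1%}")
```

With the reported totals, roughly one third of the scraped images remain after class selection.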
Request access to the dataset on Google Drive.
Please cite our work if you find it useful:
@INPROCEEDINGS{10650042,
author={Sigillo, Luigi and Gramaccioni, Riccardo Fosco and Nicolosi, Alessandro and Comminiello, Danilo},
booktitle={2024 International Joint Conference on Neural Networks (IJCNN)},
title={Ship in Sight: Diffusion Models for Ship-Image Super Resolution},
year={2024},
volume={},
number={},
pages={1-8},
keywords={Training;Analytical models;Image synthesis;Surveillance;Superresolution;Text to image;Diffusion models;Generative Deep Learning;Image Super resolution;Diffusion Models;Ship Classification},
doi={10.1109/IJCNN60899.2024.10650042}}
This project is based on stablediffusion, latent-diffusion, SPADE, mixture-of-diffusers and StableSR. Thanks for their awesome work.