implemented as parameterized Jupyter notebooks executed via the Papermill library:
A multi-style version of NST capable of transferring style from multiple source images (with equal weights by default). The styled image is found by training a deep neural network to match the style of the generated (styled) image to the style of the style source images as closely as possible, without deviating from the content image in terms of content. The implementation is based on the PyTorch Hyperlight micro ML framework [6] and uses the algorithm from the paper "A Neural Algorithm of Artistic Style" [1] with a few modifications:
- style can be transferred from an arbitrary number of style source images (giving equal weights to each style), not just one
- different layers of VGG19 are used for more realistic style transfer
- early stopping is used to choose the number of epochs automatically
Please refer to this Multi-style NST Jupyter notebook for more details.
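For illustration, the equal-weight combination of style losses over several style source images could look roughly like the sketch below (the function and variable names are hypothetical; the actual implementation lives in the notebook):

```python
import torch
import torch.nn.functional as F

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, channels, height, width) feature maps from VGG19
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def multi_style_loss(generated_feats, style_feats_per_image):
    # generated_feats: feature maps of the generated image, one per VGG19 layer
    # style_feats_per_image: one list of feature maps (same layers) per style image
    loss = torch.zeros(())
    for style_feats in style_feats_per_image:
        for gen_f, sty_f in zip(generated_feats, style_feats):
            loss = loss + F.mse_loss(gram_matrix(gen_f), gram_matrix(sty_f))
    return loss / len(style_feats_per_image)  # equal weight for every style image
```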
Pretrained CycleGAN-based NST for the following styles:
- Summer to winter
- Winter to summer
- Monet
- Cezanne
- Ukiyo-e
- Van Gogh
The corresponding image processing pipeline is implemented in the CycleGAN Style transfer notebook and is based on pytorch-CycleGAN-and-pix2pix and [2], [3].
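As an illustration of such a pipeline, applying a pretrained CycleGAN generator could look roughly like this (the TorchScript export, the file names, and the 256-pixel resize are assumptions, not the repo's actual code; pytorch-CycleGAN-and-pix2pix normalizes images to [-1, 1]):

```python
import torch
from PIL import Image
from torchvision import transforms

# Hypothetical: a pretrained CycleGAN generator exported as TorchScript.
generator = torch.jit.load("style_monet_generator.pt").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1]
])

image = preprocess(Image.open("content.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    styled = generator(image)[0]
styled = (styled * 0.5 + 0.5).clamp(0, 1)  # back to [0, 1]
transforms.ToPILImage()(styled).save("styled.jpg")
```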
The bot uses a Finite State Machine (FSM) with in-memory storage to keep the running costs low. Some cloud providers, like Heroku, can put the container running the bot to sleep after a period of inactivity (30 minutes for Heroku). This causes the bot to lose its in-memory state after sleep. This is not critical, however, as the bot can always wake up in the initial state without making the user experience much worse. When the bot loses its state, it notifies the user about it and asks for a content image regardless of what happened before the period of inactivity.
The high-resolution images are scaled down before they are processed by the NST models. As a result, the styled (output) image usually has a lower resolution compared to the input image. This is a necessary evil which allows for faster NST and lower GPU memory requirements.
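A minimal sketch of such downscaling (the 512-pixel limit is an assumption; the actual limit depends on the model and available GPU memory):

```python
from PIL import Image

def downscale(image: Image.Image, max_side: int = 512) -> Image.Image:
    # Keep the aspect ratio; only shrink, never enlarge.
    scale = max_side / max(image.size)
    if scale >= 1.0:
        return image
    new_size = (round(image.width * scale), round(image.height * scale))
    return image.resize(new_size, Image.LANCZOS)
```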
The high-level architecture is shown below.
The bot implementation is based on AIOGram [5], a fully asynchronous framework for the Telegram Bot API. The following features of AIOGram 2.x and Telegram Bot API 2.0 are used (a minimal usage sketch follows the list):
- AIOGram Finite State Machine (FSM) with in-memory storage is used to improve the source code structure while keeping the running costs minimal. The in-memory storage can easily be replaced with Redis, RethinkDB, or MongoDB (see https://docs.aiogram.dev/en/latest/dispatcher/fsm.html) if needed.
- Inline Keyboards
- Callbacks
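A minimal sketch of how these pieces fit together in AIOGram 2.x (the state names, button labels, and callback data are hypothetical):

```python
import os
from aiogram import Bot, Dispatcher, types
from aiogram.contrib.fsm_storage.memory import MemoryStorage
from aiogram.dispatcher import FSMContext
from aiogram.dispatcher.filters.state import State, StatesGroup

bot = Bot(token=os.environ["API_TOKEN"])
dp = Dispatcher(bot, storage=MemoryStorage())  # in-memory FSM storage

class StyleBot(StatesGroup):  # hypothetical FSM states
    waiting_for_content = State()
    waiting_for_style_choice = State()

@dp.message_handler(content_types=types.ContentType.PHOTO,
                    state=StyleBot.waiting_for_content)
async def content_received(message: types.Message, state: FSMContext):
    # Offer the styles via an inline keyboard.
    keyboard = types.InlineKeyboardMarkup().add(
        types.InlineKeyboardButton("Monet", callback_data="style_monet"),
        types.InlineKeyboardButton("Van Gogh", callback_data="style_vangogh"),
    )
    await message.answer("Choose a style:", reply_markup=keyboard)
    await StyleBot.waiting_for_style_choice.set()

@dp.callback_query_handler(state=StyleBot.waiting_for_style_choice)
async def style_chosen(query: types.CallbackQuery, state: FSMContext):
    await query.answer()  # acknowledge the callback
    ...  # forward the request to the ML server
```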
The ML models run in a separate ML serving backend based on Ray Serve, a scalable model serving library built on Ray. The ML server receives image processing requests from the bot via a REST API and forwards them to Jupyter notebooks via the Papermill library. Each notebook contains an ML model-specific image processing pipeline; the notebooks can be found here. The notebooks have direct access to S3: they download input images from S3 and upload the processed images back to S3. The ML server tracks the execution of the notebooks via Papermill and uploads the executed notebooks to an S3 bucket upon completion. Failed notebooks are kept in S3 for easier debugging (until S3's built-in retention removes them).
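Triggering a pipeline could look roughly like this (the notebook path and parameter names are hypothetical; each real notebook defines its own parameters cell):

```python
import papermill as pm

pm.execute_notebook(
    "multi_style_nst.ipynb",           # input notebook
    "executed/multi_style_nst.ipynb",  # executed copy, uploaded to S3
    parameters={
        "content_image_key": "in/photo.jpg",    # S3 key of the content image
        "style_image_keys": ["in/style1.jpg"],  # S3 keys of the style images
        "fast_dev_run": False,
    },
)
```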
Amazon S3 is used for transferring the photos from the bot to the ML server and back. This approach makes debugging easier compared to sending images directly to the ML server as part of REST API requests.
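With boto3 the round trip reduces to a pair of calls, using the bucket and prefix environment variables described below:

```python
import os
import boto3

bucket = os.environ["S3_BUCKET_WITH_RESULTS_NAME"]
prefix = os.environ["S3_RESULTS_PREFIX"]
s3 = boto3.client("s3")

# Bot side: upload the user's photo for the ML server to pick up.
s3.upload_file("photo.jpg", bucket, f"{prefix}/photo.jpg")

# ML server side: download the photo, process it, upload the result back.
s3.download_file(bucket, f"{prefix}/photo.jpg", "/tmp/photo.jpg")
```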
The bot expects the following groups of environment variables to be defined:

- API_TOKEN - the bot token generated by Telegram's @BotFather
- WEBHOOK_HOST_ADDR - the bot endpoint URL exposed to Telegram
- PORT - the port for the bot endpoint URL exposed to Telegram

- S3_BUCKET_WITH_RESULTS_NAME - S3 bucket name
- S3_RESULTS_PREFIX - S3 bucket folder name
- AWS_ACCESS_KEY_ID - AWS access key id
- AWS_SECRET_ACCESS_KEY - AWS secret access key
- REGION_NAME - AWS region name

- ML_SERVER_HOST_ADDR - URL of the ML server endpoint exposed to the bot
- DEFAULT_FAST_DEV_RUN - can take "True"/"False" values; enables a faster mode for ML models (with a limited number of training epochs and fewer batches per epoch) for debugging purposes
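In the bot's Python code these would typically be read via os.environ, for example (the "False" default for DEFAULT_FAST_DEV_RUN is an assumption):

```python
import os

API_TOKEN = os.environ["API_TOKEN"]  # fail fast if the token is missing
PORT = int(os.environ["PORT"])
FAST_DEV_RUN = os.getenv("DEFAULT_FAST_DEV_RUN", "False") == "True"
```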
The ML server doesn't have any configuration parameters and always uses port 8000 for its endpoint. The AWS credentials are expected to be defined either in the ~/.aws/config and ~/.aws/credentials files or in the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION environment variables (see Environment variables to configure the AWS CLI for more details). By default the ML server uses a single backend replica but can easily be scaled out (see [4]).
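With the Ray Serve deployment API (Ray 1.x; exact details vary across Ray Serve versions), scaling out amounts to raising the replica count, roughly:

```python
from ray import serve

serve.start(http_options={"port": 8000})  # the ML server's fixed port

# num_replicas > 1 spreads incoming requests across several replicas.
@serve.deployment(num_replicas=2, route_prefix="/process")
class MLServer:
    async def __call__(self, request):
        ...  # parse the request and run the matching notebook via Papermill

MLServer.deploy()
```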
The bot can be deployed in both cloud and physical environments. The following deployment scenarios were tested:
- Cloud deployment on the Heroku cloud application platform. The telegram_bot folder of this repo contains all the files necessary for Heroku deployment. When pushing the source code to the Heroku repo for deployment, make sure to push only the telegram_bot subfolder by running
git push heroku `git subtree split --prefix telegram_bot main`:master
(otherwise you would push the whole repo and Heroku won't find the files necessary for deployment). Please refer to Heroku: getting started with Python for details.
- On-prem deployment, exposing the bot's webhook URL to Telegram via NGINX.
The following deployment scenarios were tested for the ML server:
- On-prem deployment on a physical Linux (Manjaro Linux) server with a GeForce GTX 970 4Gb GPU. The ml_server folder of this repo contains bash scripts for running the server via the GNU Screen utility from under a specified conda environment. The conda environment is expected to contain all necessary dependencies, which can be installed either via
pip install -r ./requirements.txt
or by following the instructions from the PyTorch HyperLight ML Development Environment project. The latter assumes you run Arch Linux or one of its derivatives (like Manjaro Linux).
- Cloud deployment via the Amazon Sagemaker Python SDK with a GPU-powered ml.p2.xlarge instance and PyTorch 1.5. In this scenario only limited ML server functionality (CycleGANs only) is available, as the multi-style NST implementation used by the ML server requires at least PyTorch 1.7. At the time of writing, the most recent version of PyTorch supported by the pre-built Amazon Sagemaker containers is 1.6. Amazon has recently released a deep learning container for PyTorch 1.7.1, but at the time of writing the container is not yet available in the Sagemaker Python SDK. The PyTorch 1.5 container was used for testing instead of the PyTorch 1.6 container because the latter behaved less stably.
[1] "A Neural Algorithm of Artistic Style", Gatys, Leon A.; Ecker, Alexander S.; Bethge, Matthias, 2015, arXiv:1508.06576
[2] "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networkss", Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A Computer Vision (ICCV), 2017 IEEE International Conference on, 2017
[3] "Image-to-Image Translation with Conditional Adversarial Networks", Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A, Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, 2017
[4] Ray Serve: Advanced Topics and Configurations
[5] AIOGram: the modern and fully asynchronous framework for Telegram Bot API written in Python
[6] PyTorch Hyperlight: The ML micro-framework built as a thin wrapper around PyTorch-Lightning and Ray Tune frameworks to push the boundaries of simplicity even further