👋 Hello! We at Bayer's Machine Learning Research are very happy about your interest in our work.
In this challenge you will help us predict a couple of important antibody properties: 🩹 binding to SARS-CoV2 and 💊 developability as therapeutics.
We will provide you with some data (antibody sequences 🧬, ground truth and more) plus code to load them. We also provide features that should be used to build machine learning models.
Note: there is no need to review any of the provided links to successfully complete the task.
We recommend using conda to manage code dependencies, and we provide a ready-to-use environment.
- If needed, install conda.
- Clone the repository.
git clone https://github.com/bayer-science-for-a-better-life/mlr-challenges-frankenbody.git
- Create the environment (and add your favorite libraries).
cd mlr-challenges-frankenbody
conda env create -f environment.yml
# add your favorite software
# e.g., conda install -c conda-forge jupyter seaborn pytorch biotite -n frankenbody
# or edit environment.yml
- Run the sanity checks.
conda activate frankenbody
frankenbody smoke
# if all this worked, you should see printed "SMOKE TESTS HAVE PASSED"
# otherwise, please drop us an email
As an alternative to cloning, you can download the challenge code and data as a zip file.
These data are small enough for the challenge to be solved on a commodity laptop.
At the agreed time, you will receive an email with extra information.
- To allow for last-minute changes, please pull or re-download the repository.
cd mlr-challenges-frankenbody
git pull
- We will provide you with a key to access the secret parts of the challenge (essentially the data).
You will need to add it to frankenbody/private_key.py as follows:
# Hardcode here the key sent to you, should look like:
FRANKENBODY_PRIVATE_KEY = b'FLrMTzp5j-tGSC6T01X-bMW6B1DEitatc6JmUP3Xs6M='
- To verify that everything has worked correctly, please run:
conda activate frankenbody
frankenbody smoke-challenge
# if all this worked, you should see printed "CHALLENGE SMOKE TESTS HAVE PASSED"
# also a new file "frankenbody/challenge.py" should appear
# if not, please drop us an email
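If the challenge smoke test fails, one thing worth ruling out is a mangled key (a truncated copy or a stray character from the email). The example key above has the shape of a Fernet key: 44 URL-safe base64 characters that decode to exactly 32 bytes. The sketch below assumes the real key follows that same format, which is inferred from the example and not confirmed by the challenge code. The helper name `looks_like_fernet_key` is hypothetical and uses only the standard library.

```python
import base64
import binascii


def looks_like_fernet_key(key: bytes) -> bool:
    """Heuristic check that `key` is 32 bytes encoded as URL-safe base64.

    ASSUMPTION: the challenge key follows the Fernet key format
    (44 characters, URL-safe base64, decoding to exactly 32 bytes).
    This is inferred from the example key, not from the challenge code.
    """
    if len(key) != 44:
        return False
    try:
        # Map the URL-safe characters '-' and '_' to '+' and '/', then decode,
        # raising binascii.Error on any character outside the base64 alphabet.
        raw = base64.b64decode(key, altchars=b"-_", validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


# Example key from the instructions above; replace it with the one sent to you.
print(looks_like_fernet_key(b'FLrMTzp5j-tGSC6T01X-bMW6B1DEitatc6JmUP3Xs6M='))  # True
```

Inside the repository, the hardcoded value is presumably importable as `FRANKENBODY_PRIVATE_KEY` from `frankenbody.private_key`, so the same check can be run on the key you actually pasted. A `False` result only means the key does not match this assumed format.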
Feel free to use anything, from Python files to notebooks, to shape your solution.
We just want to get a sense of your thought process and skills.
Please do not spend more than 4 hours solving the challenge.
We respect and appreciate your time. The challenge is scoped in a way that allows for many paths to completion. We are happy to receive solutions within shorter time frames.
If there are features you wish you had time to implement, feel free to use pseudocode and/or prose to describe them.
Please email us when the solution is ready. To share the solution you can:
- Use a private GitHub/GitLab repository.
This is our preference. You could set it up before the challenge date.
Please give us access to the repository and share the link with us in the email.
- Add an attachment to the email.
Alternatively, you can send the results as a compressed file.
Please be mindful of attachment size. For example, if you send us notebooks, it is a good idea to first clear variables holding heavy state (e.g., data and models).
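A saved .ipynb file stores rendered cell outputs (plots, printed tables) alongside the code, so clearing outputs is one way to keep a notebook attachment small. Below is a minimal standard-library sketch, assuming notebooks use the nbformat 4 JSON layout, where code cells carry `outputs` and `execution_count` fields. The helper names `clear_outputs` and `strip_notebook` are hypothetical and not part of the challenge code.

```python
import json
from pathlib import Path


def clear_outputs(nb: dict) -> dict:
    """Remove outputs and execution counts from every code cell (in place).

    ASSUMPTION: `nb` is a parsed nbformat 4 notebook, i.e. a dict with a
    "cells" list whose code cells have "outputs" and "execution_count".
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_notebook(src: Path, dst: Path) -> None:
    """Read a notebook from `src`, clear its outputs and write it to `dst`."""
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    clear_outputs(nb)
    # indent=1 mirrors the layout Jupyter itself uses when saving notebooks.
    Path(dst).write_text(
        json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8"
    )
```

For example, `strip_notebook(Path("solution.ipynb"), Path("solution_clean.ipynb"))` writes a lighter copy and leaves the original untouched. Writing to a separate file is deliberate: the outputs remain available locally if you need them again.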
Your submission will be reviewed by at least two of our colleagues, and we will discuss it together at a later stage of the interview process.
Do not hesitate to email us, we are happy to help!
We highly appreciate feedback. Please let us know any thoughts you would like to share with us.
These data have been derived from: