Official PyTorch implementation and pre-trained models for the paper AstroCLIP: A Cross-Modal Foundation Model for Galaxies.
AstroCLIP is a novel, cross-modal, self-supervised foundation model that creates a shared embedding space for multi-band imaging and optical spectra of galaxies. These embeddings encode meaningful physical information shared between both modalities, and can be used as the basis for competitive zero- and few-shot learning on a variety of downstream tasks, including similarity search, redshift estimation, galaxy property prediction, and morphology classification.
Check out our interactive similarity search app, enabling both in-modal and cross-modal search for galaxies: https://astroclip.streamlit.app/
The training and evaluation code requires PyTorch 2.0. Additionally, an up-to-date eventlet is required for wandb. The code has only been tested with the specified versions and expects a Linux environment. To install the AstroCLIP package and its dependencies, run the commands below.
pip install --upgrade pip
pip install --upgrade eventlet torch lightning[extra]
pip install -e .
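The requirements stated above (Linux, PyTorch 2.0) can be checked with a short preflight script. This is a minimal sketch rather than part of the package: `check_environment` is a hypothetical helper, and treating any 2.x release of PyTorch as acceptable is an assumption, since the code has only been tested with the specified versions.

```python
import platform
from importlib import metadata


def check_environment(system=None, torch_version=None):
    """Return a list of human-readable problems; an empty list means none found.

    Both arguments default to values detected on the current machine and can
    be overridden, which keeps the function easy to test.
    """
    problems = []
    system = system or platform.system()
    if system != "Linux":
        problems.append(f"expected a Linux environment, found {system}")

    if torch_version is None:
        try:
            torch_version = metadata.version("torch")
        except metadata.PackageNotFoundError:
            problems.append("torch is not installed")
            return problems

    # Assumption: any 2.x release is acceptable.
    major = int(torch_version.split(".")[0])
    if major != 2:
        problems.append(f"expected PyTorch 2.x, found {torch_version}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```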
NOTE: Once installed, the package provides three shortcuts: astroclip_trainer and spectrum_trainer, which link to astroclip/trainer.py, and image_trainer, which links to astroclip/astrodino/trainer.py. The shortcuts are defined in the project.scripts section of the pyproject.toml file.
By default, the package expects to load models and data from {ASTROCLIP_ROOT}. You can configure ASTROCLIP_ROOT, as well as the Weights & Biases group in which runs are saved, by creating a .env file in the root of astroclip with the following content:
ASTROCLIP_ROOT="/mnt/ceph/users/polymathic/astroclip"
WANDB_ENTITY_NAME="flatiron-scipt"
If ASTROCLIP_ROOT is not specified, the default path at Flatiron is assumed.
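To make the lookup concrete, the sketch below resolves the root directory from a .env file, then from the process environment, then from the Flatiron default. It is illustrative only: `parse_env_file` and `resolve_root` are hypothetical helpers, and both the precedence order and the hand-written parser are assumptions, not a description of how the package reads its configuration.

```python
import os
from pathlib import Path

# Default path quoted in this README; used when nothing else is configured.
DEFAULT_ROOT = "/mnt/ceph/users/polymathic/astroclip"


def parse_env_file(path):
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and matching quote characters.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_root(env_file=".env"):
    """Assumed precedence: .env file, then environment variable, then default."""
    file_values = parse_env_file(env_file)
    return file_values.get("ASTROCLIP_ROOT") or os.environ.get(
        "ASTROCLIP_ROOT", DEFAULT_ROOT
    )


if __name__ == "__main__":
    print(resolve_root())
```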
We provide the pretrained AstroCLIP model on the Hugging Face model hub for easy access. We also provide the pretrained single-modal models for galaxy images and spectra. Model details, checkpoints, configs and logs are below.
Model Name | Pretraining | # Params. | Download | | |
---|---|---|---|---|---|
AstroCLIP | CLIP | 370M | ckpt | config | logs |
Image Encoder | DINOv2 | 302M | ckpt | config | logs |
Spectrum Encoder | Masked Modeling | 43M | ckpt | config | logs |
The pretrained AstroCLIP model can be loaded using the following:
from astroclip.models import AstroClipModel
model = AstroClipModel.load_from_checkpoint(
    checkpoint_path="path_to_model.ckpt",
)
Below, we include a high-level performance overview of our models on a variety of downstream tasks. This is non-exhaustive, and we refer the reader to the paper for the full details.
Source | Model | Type | Redshift | Properties | Morphology |
---|---|---|---|---|---|
Image | AstroCLIP* | Zero-Shot | 0.79 | 0.47 | 0.76 |
Image | Image Encoder* | Zero-Shot | 0.63 | 0.37 | 0.78 |
Image | Stein, et al. | Zero-Shot | 0.36 | 0.26 | 0.76 |
Image | ResNet18 | Supervised | 0.77 | 0.43 | - |
Image | ZooBot [1] | Supervised | - | - | 0.88 |
Spectrum | AstroCLIP* | Zero-Shot | 0.99 | 0.63 | - |
Spectrum | Spectrum Encoder* | Zero-Shot | 0.99 | 0.64 | - |
Spectrum | Conv+Att [2] | Supervised | 0.99 | 0.60 | - |
Photometry | MLP | Supervised | 0.68 | 0.42 | - |
We report R-squared metrics on redshift and galaxy property estimation (averaged across all properties) and accuracy on galaxy morphology classification (averaged across all labels). Our models are marked with an asterisk (*). [1] We use the results reported from Walmsley, et al. (2021). [2] We use the encoder from Melchior, et al. (2022).
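For readers unfamiliar with the two metrics, the sketch below shows the standard definitions of R-squared and classification accuracy in plain Python. It is not the evaluation code used for the table: the function names and toy values are made up for illustration, and the averaging across properties and labels described above is not reproduced.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def accuracy(labels, predictions):
    """Fraction of predictions that match their labels exactly."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


if __name__ == "__main__":
    # Toy redshifts and labels, invented for illustration only.
    z_true = [0.10, 0.20, 0.30, 0.40]
    z_pred = [0.12, 0.18, 0.33, 0.37]
    print("R^2:", r_squared(z_true, z_pred))
    print("accuracy:", accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
```

A perfect predictor scores 1, and a predictor that always outputs the mean of the targets scores 0, which gives a reference scale for the values in the table.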
The AstroCLIP model is trained on the cross-matched sample containing optical spectra from the Dark Energy Spectroscopic Instrument (DESI) Early Data Release (EDR) and multi-band images (g, r, z) from the DESI Legacy Survey prepared by Stein, et al. (2022). We provide the dataset as a Hugging Face dataset, which can be accessed directly using:
from datasets import load_dataset
# This downloads about 60 GB of data
dset = load_dataset('astroclip/data/dataset.py')
For reproducibility, we include the scripts and a brief description of how to generate the cross-matched dataset in astroclip/data/crossmatch.
While the AstroCLIP and Spectrum Encoder models are trained on the image-spectrum dataset, we pretrain the galaxy image model separately on the full Stein, et al. (2022) image dataset, which consists of 76M galaxy images. This dataset can be accessed using this Globus endpoint:
https://app.globus.org/file-manager?origin_id=9fb0fc0e-e760-11ec-9bd2-2d2219dcc1fa&origin_path=%2F
The directory is organized into south and north surveys, where each survey is split into chunks of 1,000,000 galaxies (sorted by decreasing z-band flux) and saved in hdf5 format. For more details, see here.
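Given this layout, locating a galaxy reduces to integer division. The sketch below assumes that a galaxy is identified by its 0-based rank in the flux-sorted ordering of a single survey (south or north) and that chunks are filled in that order. `locate_galaxy` is a hypothetical helper, and no file names are assumed.

```python
CHUNK_SIZE = 1_000_000  # galaxies per chunk, as stated above


def locate_galaxy(rank, chunk_size=CHUNK_SIZE):
    """Map a 0-based rank within one survey to (chunk index, row within chunk)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return divmod(rank, chunk_size)


if __name__ == "__main__":
    chunk, row = locate_galaxy(2_500_000)
    print(f"chunk {chunk}, row {row}")
```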
AstroCLIP is trained using a two-step process:
- We pre-train a single-modal galaxy image encoder and a single-modal galaxy spectrum encoder separately.
- We CLIP-align these two encoders on a paired image-spectrum dataset.
AstroCLIP uses a Vision Transformer (ViT) to encode galaxy images. Pretraining is performed using the DINOv2 package, which combines self-distillation, masked-modeling, and contrastive objectives. We largely keep the same training regime, but modify some of the contrastive augmentations to suit an astrophysics context. Model training can be launched with the following command:
image_trainer -c astroclip/astrodino/config.yaml
We train the model using 20 A100 GPUs (on 5 nodes) for 250k steps, which takes roughly 46 hours.
AstroCLIP uses a 1D Transformer to encode galaxy spectra. Pretraining is performed using a masked-modeling objective, whereby the 1D spectrum is split into contiguous, overlapping patches. Model training can be launched with the following command:
spectrum_trainer fit -c config/specformer.yaml
We train the model using 4 A100 GPUs (on 1 node) for 30k steps, which takes roughly 12 hours.
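The patching step of the masked-modeling objective described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's implementation: the patch size, stride, mask fraction, and the choice to zero out independently sampled patches are all hypothetical, and the transformer that reconstructs the masked patches is omitted.

```python
import random


def split_into_patches(spectrum, patch_size, stride):
    """Slice a 1D sequence into contiguous patches; stride < patch_size gives overlap."""
    if not 0 < stride <= patch_size:
        raise ValueError("stride must satisfy 0 < stride <= patch_size")
    last_start = len(spectrum) - patch_size
    return [spectrum[i:i + patch_size] for i in range(0, last_start + 1, stride)]


def mask_patches(patches, mask_fraction, rng):
    """Zero out a random subset of patches; return (masked patches, masked indices)."""
    if not patches:
        return [], []
    n_masked = max(1, int(len(patches) * mask_fraction))
    masked_idx = sorted(rng.sample(range(len(patches)), n_masked))
    masked = [
        [0.0] * len(patch) if i in masked_idx else list(patch)
        for i, patch in enumerate(patches)
    ]
    return masked, masked_idx


# Stand-in for flux values; a real spectrum would come from the dataset.
spectrum = [float(i) for i in range(100)]
# Hypothetical settings: 20-sample patches that overlap by half.
patches = split_into_patches(spectrum, patch_size=20, stride=10)
masked, idx = mask_patches(patches, mask_fraction=0.25, rng=random.Random(0))
print(len(patches), "patches,", len(idx), "masked")
```

During pretraining, a model would receive `masked` as input and be scored on how well it reconstructs the original values at the indices in `idx`.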
Once both encoders are pretrained, we align them using cross-attention projection heads to maximize the similarity between cross-modal embeddings that correspond to the same galaxy while simultaneously minimizing the similarity between cross-modal embeddings that correspond to different galaxies. Model training can be launched with the following command:
astroclip_trainer fit -c config/astroclip.yaml
We train the model using 4 A100 GPUs (on 1 node) for 25k steps or until the validation loss has not decreased for a fixed number of steps. This takes roughly 12 hours.
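The alignment objective described above can be illustrated with a CLIP-style symmetric contrastive (InfoNCE) loss. The sketch below is written in plain Python for readability and is not the repository's training code: the function names and the temperature of 0.07 are assumptions, and the projection heads that produce the embeddings are omitted.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def clip_loss(image_emb, spectrum_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of each input is assumed to describe the same galaxy.
    """
    img = [normalize(v) for v in image_emb]
    spec = [normalize(v) for v in spectrum_emb]
    # logits[i][j]: cosine similarity of image i and spectrum j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(u, w)) / temperature for w in spec]
        for u in img
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-spectrum and spectrum-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print("matched pairs:   ", clip_loss(images, images))
    print("mismatched pairs:", clip_loss(images, images[::-1]))
```

Matched pairs yield a loss close to zero, while swapping the spectra so that each image is paired with the wrong galaxy yields a much larger loss. This is the behavior the alignment step optimizes for.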
We demonstrate that AstroCLIP embeddings can be used to easily perform a variety of downstream tasks. In particular, we demonstrate their ability to perform:
- In-modal and cross-modal similarity search
- Photometric redshift prediction
- Physical property estimation from images
- Physical property estimation from spectra
- Morphology classification from images
The details of these downstream tasks and the results in our paper can be found in astroclip/downstream_tasks.
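As an illustration of the first task, similarity search in a shared embedding space amounts to ranking stored embeddings by their cosine similarity to a query. The sketch below is a toy version in plain Python, not the code in astroclip/downstream_tasks: the function names and example vectors are made up. For cross-modal search, the query would be an image embedding and the database would hold spectrum embeddings, or vice versa.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query, database, k=3):
    """Return the indices of the k database embeddings most similar to the query."""
    scores = [(cosine_similarity(query, emb), i) for i, emb in enumerate(database)]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scores[:k]]


if __name__ == "__main__":
    # Toy 2D embeddings; real embeddings are much higher-dimensional.
    database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(top_k([1.0, 0.0], database, k=2))
```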
This repository uses datasets and contrastive augmentations from Stein, et al. (2022). The image pretraining is built on top of the DINOv2 framework; we also thank Piotr Bojanowski for valuable conversations around image pretraining.
AstroCLIP code and model weights are released under the MIT license. See LICENSE for additional details.
Thanks goes to these wonderful people (emoji key):
Liam Parker 💻 |
Francois Lanusse 💻 🔣 |
Siavash Golkar 💻 |
Leopoldo 💻 🔧 |
Shirley Ho 🤔 🔍 |
Miles Cranmer 🤔 🎨 |
This project follows the all-contributors specification. Contributions of any kind welcome!