diff --git a/.github/workflows/black-formatter.yml b/.github/workflows/black-formatter.yml
index d9cdd1cd..47fadb2e 100644
--- a/.github/workflows/black-formatter.yml
+++ b/.github/workflows/black-formatter.yml
@@ -16,4 +16,4 @@ jobs:
- uses: psf/black@stable
with:
options: "--check --verbose --diff"
- version: "~= 22.0"
+ version: "~= 22.0"
\ No newline at end of file
diff --git a/README.md b/README.md
index a4917577..01e50707 100644
--- a/README.md
+++ b/README.md
@@ -8,1000 +8,13 @@
# Pippin
-Pippin - a pipeline designed to streamline and remove as much hassle as we can
-when running end-to-end supernova cosmology analyses.
+Pippin - a pipeline designed to streamline and remove as much hassle as we can when running end-to-end supernova cosmology analyses.
-## Table of Contents
-
-
-
-- [Using Pippin](#using-pippin)
-- [Installing Pippin](#installing-it-fresh)
-- [Contributing to Pippin](#issues-and-contributing-to-pippin)
-- [Examples](#examples)
-- [FAQ](#faq)
-- [Tasks](#tasks)
- - [DataPrep](#data-preparation)
- - [Simulation](#simulation)
- - [Light Curve Fit](#light-curve-fit)
- - [Classification](#classification)
- - [Aggregation](#aggregation)
- - [Merging](#merging)
- - [Bias Corrections](#bias-corrections)
- - [Create Covariance](#create-covariance)
- - [CosmoFit](#cosmofit)
- - [Analyse](#analyse)
-- [Adding a new Task](#adding-a-new-task)
-- [Adding a new classifier](#adding-a-new-classifier)
-
-
-## Installing it fresh
-
-If you're using a pre-installed version of Pippin - like the one on Midway - ignore this.
-
-If you're not, installing Pippin is simple.
-
-1. Checkout Pippin
-2. Ensure you have the dependencies installed (`pip install -r requirements.txt`) and that your python version is 3.7+.
-3. Celebrate
-
-There is no need to attempt to install Pippin like a package (no `python setup.py install`), just run from the clone.
-
-Now, Pippin also interfaces with other tasks: SNANA and machine learning classifiers mostly. I'd highly recommend
-running on a high performance computer with SNANA already installed, but if you want to take a crack at installing it,
-[you can find the documentation here](https://github.com/RickKessler/SNANA).
-
-I won't cover installing SNANA here, hopefully you already have it. But to install the classifiers, we'll take
-[SuperNNova](https://github.com/supernnova/SuperNNova) as an example. To install that, find a good place for it and:
-
-1. Checkout `https://github.com/SuperNNova/SuperNNova`
-2. Create a GPU conda env for it: `conda create --name snn_gpu --file env/conda_env_gpu_linux64.txt`
-3. Activate environment and install natsort: `conda activate snn_gpu` and `conda install --yes natsort`
-
-Then, in the Pippin global configuration file `cfg.yml` in the top level directory, ensure that the SNN path in Pippin is
-pointing to where you just cloned SNN into. You will need to install the other external software packages
-if you want to use them, and you do not need to install any package you do not explicitly request in a config file.
-
-## Using Pippin
-
-Using Pippin is very simple. In the top level directory, there is a `pippin.sh`. If you're on midway and use SNANA, this
-script will be on your path already. To use Pippin, all you need is a config file ready to go. I've got a bunch of mine and
-some general ones in the `configs` directory, but you can put yours wherever you want. I recommend adding your initials to the
-front of the file to make it obvious in the shared output directory which folders are yours.
-
-If you have `example.yml` as your config file and want pippin to run it, easy:
-`pippin.sh example.yml`
-
-The file name that you pass in should contain a run configuration. Note that this is different to the global software
-configuration file `cfg.yml`, and remember to ensure that your `cfg.yml` file is set up properly and that you know
-where you want your output to be
-installed. By default, I assume that the `$PIPPIN_OUTPUT` environment variable is set as the output location,
-so please either set said variable or change the associated line in the `cfg.yml`. [For the morbidly curious, here
-is a very small demo video of using Pippin in the Midway environment](https://www.youtube.com/watch?v=pCaPvzFCZ-Y).
-
-![ConsoleOutput](docs/_static/images/console.gif)
-
-
-### Creating your own configuration file
-
-Each configuration file is represented by a yaml dictionary linking each stage (see stage declaration section below) to
-a dictionary of tasks, the key being the unique name for the task and the value being its specific task configuration.
-
-For example, to define a configuration with two simulations and one light curve fitting task (resulting in 2 output simulations and
-2 output light curve tasks - one for each simulation), a user would define:
-
-```yaml
-SIM:
- SIM_NAME_1:
- SIM_CONFIG: HERE
- SIM_NAME_2:
- SIM_CONFIG: HERE
-
-LCFIT:
- LCFIT_NAME_1:
- LCFIT_CONFIG: HERE
-```
-
-How to configure each task is also detailed below on a task-by-task basis, or you can see examples in the `examples`
- directory for each task.
-
-
-### What if I change my config file?
-
-Happens all the time, don't even worry about it. Just start Pippin again and run the file again. Pippin will detect
-any changes in your configuration by hashing all the input files to a specific task. So this means, even if your
-config file itself doesn't change, changes to an input file it references (for example, the default DES simulation
-input file) would result in Pippin rerunning that task. If it cannot detect anything has changed, and if the task
-finished successfully the last time it was run, the task is not re-executed. You can force re-execution of tasks using the `-r` flag.
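-
-For example, using the `example.yml` config from above, the difference looks like this:
-
-```bash
-pippin.sh example.yml      # only tasks whose input hashes have changed get rerun
-pippin.sh -r example.yml   # force tasks to rerun even if nothing has changed
-```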
-
-
-### Command Line Arguments
-
-On top of this, Pippin has a few command line arguments, which you can detail with `pippin.sh -h`, but I'll also detail here:
-
-```bash
- -h Show the help menu
- -v, --verbose Verbose. Shows debug output. I normally have this option enabled.
- -r, --refresh Refresh/redo - Rerun tasks that completed in a previous run even if the inputs haven't changed.
- -c, --check Check that the input config is valid but don't actually run any tasks.
- -s, --start Start at this task and refresh everything after it. Number or string accepted
- -f, --finish Finish at this stage. For example -f 3 or -f CLASSIFY to run up to and including classification.
- -p, --permission Fix permissions and groups on all output, don't rerun
- -i, --ignore Do NOT regenerate/run tasks up to and including this stage.
- -S, --syntax If no task is given, prints out the possible tasks. If a task name or number is given, prints the docs on that task. For instance 'pippin.sh -S 0' and 'pippin.sh -S DATAPREP' will print the documentation for the DATAPREP task.
-```
-
-For example, to run with verbose output and only do data preparation and simulation,
-you would run
-
-`pippin.sh -vf 1 configfile.yml`
-
-
-### Stages in Pippin
-
-You may have noticed above that each stage has a numeric ID for convenience and lexicographical sorting.
-
-The current stages are:
-
-* `0, DATAPREP`: Data preparation
-* `1, SIM`: Simulation
-* `2, LCFIT`: Light curve fitting
-* `3, CLASSIFY`: Classification (training and testing)
-* `4, AGG`: Aggregation (comparing classifiers)
-* `5, MERGE`: Merging (combining classifier and FITRES output)
-* `6, BIASCOR`: Bias corrections using BBC
-* `7, CREATE_COV`: Create input files needed for CosmoMC
-* `8, COSMOFIT`: Run CosmoMC and fit cosmology
-* `9, ANALYSE`: Create final output and plots. Includes output from CosmoMC, BBC and Light curve fitting.
-
-### Pippin on Midway
-
-On midway, sourcing the SNANA setup will add environment variables and Pippin to your path.
-
-Pippin itself can be found at `$PIPPIN`, output at `$PIPPIN_OUTPUT` (which goes to a scratch directory), and `pippin.sh` will automatically work from
-any location.
-
-Note that you only have 100 GB on scratch. If you fill that up and need to nuke some files, look both in `$SCRATCH_SIMDIR` to remove SNANA
-photometry and `$PIPPIN_OUTPUT` to remove Pippin's output. I'd recommend adding this to your `~/.bashrc` file to scan through directories you own and
-calculate directory size so you know what's taking the most space. After adding this and sourcing it, just put `dirusage` into the terminal
-in both of those locations and see what's eating your quota.
-
-```bash
-function dirusage {
- for file in $(ls -l | grep $USER | awk '{print $NF}')
- do
- du -sh "$file"
- done
-}
-```
-
-### Pippin on Perlmutter
-
-On perlmutter, add `source /global/cfs/cdirs/lsst/groups/TD/setup_td.sh` to your `~/.bashrc` to load all the relevant paths and environment variables.
-
-This will add the `$PIPPIN_DIR` path for Pippin source code, and `$PIPPIN_OUTPUT` for the output of Pippin jobs. Additionally `pippin.sh` can be run from any directory.
-
-To load the perlmutter specific `cfg.yml` you must add the following to the start of your Pippin job:
-```yaml
-GLOBAL:
- CFG_PATH: $SNANA_LSST_ROOT/starterKits/pippin/cfg_lsst_perlmutter.yml
-```
-
-## Issues and Contributing to Pippin
-
-Contributing to Pippin or raising issues is easy. Here are some ways you can do it, in order of preference:
-
-1. Submit an [issue on Github](https://github.com/samreay/Pippin), and then submit a pull request to fix that issue.
-2. Submit an [issue on Github](https://github.com/samreay/Pippin), and then wait until I have time to look at it. Hopefully thats quickly, but no guarantees.
-3. Email me with a feature request
-
-If you do want to contribute code, fantastic. [Please note that all code in Pippin is subject to the Black formatter](https://black.readthedocs.io/en/stable/).
-I would recommend installing this yourself because it's a great tool.
-
-
-## Examples
-
-If you want detailed examples of what you can do with Pippin tasks, have a look in the [examples directory](https://github.com/dessn/Pippin/tree/master/examples),
-pick the task you want to know more about, and have a look over all the options.
-
-Here is a very simple configuration file which runs a simulation, does light curve fitting, and then classifies it using the
-debug FITPROB classifier.
-
-```yaml
-SIM:
- DESSIM:
- IA_G10_DES3YR:
- BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-
-LCFIT:
- BASEDES:
- BASE: surveys/des/lcfit_nml/des_5yr.nml
-
-CLASSIFICATION:
- FITPROBTEST:
- CLASSIFIER: FitProbClassifier
- MODE: predict
-```
-
-You can see that unless you specify a `MASK` on each subsequent task, Pippin will generally try and run everything on everything. So if you have two
-simulations defined, you don't need two light curve fitting tasks, Pippin will make one light curve fit task for each simulation, and then two classification tasks,
-one for each light curve fit task.
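-
-As a quick sketch of what that means (the low-z task name and input path here are hypothetical, just to show the pattern), a `MASK` restricts a task to inputs whose names match it:
-
-```yaml
-SIM:
-  DESSIM:
-    IA_G10_DES3YR:
-      BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-  LOWZSIM:
-    IA_G10_LOWZ:
-      BASE: surveys/lowz/sim_ia/sn_ia_lowz.input  # hypothetical path
-
-LCFIT:
-  BASEDES:
-    BASE: surveys/des/lcfit_nml/des_5yr.nml
-    MASK: DESSIM  # only fit sims whose name contains DESSIM; drop MASK to fit every sim
-```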
-
-### Anchoring in YAML files
-
-If you are finding that your config files contain lots of duplicated sections (for example, many simulations configured
-almost the same way but with one difference), consider using YAML anchors. [See this blog post](https://blog.daemonl.com/2016/02/yaml.html)
-for more detail. You can define your anchors in the main config section, or add a new section (like SIM, LCFIT, CLASSIFICATION). So long as it doesn't
-match a Pippin keyword for each stage, you'll be fine. I recommend an `ANCHORS:` section at the top of the file, but any of those approaches will work.
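-
-A minimal sketch of what that can look like (the task names are illustrative, and the `<<:` merge syntax assumes your YAML loader supports merge keys, as PyYAML-style loaders do):
-
-```yaml
-ANCHORS:
-  DEFAULT_IA: &default_ia
-    BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-
-SIM:
-  SIM_NOMINAL:
-    IA_G10_DES3YR: *default_ia  # reuse the anchored block as-is
-  SIM_TWEAKED:
-    IA_G10_DES3YR:
-      <<: *default_ia           # merge the anchor, then override one key
-      DNDZ_ALLSCALE: 3.0
-```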
-
-
-## FAQ
-
-**Pippin is crashing on some task and the error message isn't useful**
-
-Feel free to send me the log and stack, and I'll see what I can do to turn the exception into something
-more human-readable.
-
-**I want Pippin to run after I log out**
-
-Rather than redirecting Pippin output to a file or running it in the background, I *highly recommend* you run
-Pippin in a `screen` session.
-
-For example, if you are doing machine-learning testing, you may create a new screen session called `ml`
-by running `screen -S ml`. It will then launch a new instance of bash for you to play around in. conda **will not work out of the box**. To make
-it work again, run `conda deactivate` and then `conda activate`, and you can check this works by running `which python` and
-verifying it's pointing to the miniconda install. You can then run Pippin as per normal: `pippin.sh -v your_job.yml` and get the coloured output.
-To leave the screen session, but *still keep Pippin running even after you log out*, press `Ctrl-A, Ctrl-D`. As in one, and then the other, not `Ctrl-A-D`.
-This will detach from your screen session but keep it running. Just pressing `Ctrl-D` will disconnect and shut it down. To get back into your screen session,
-simply run `screen -r ml` to reattach. You can see your screen
-sessions using `screen -ls`.
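-
-Putting those commands in one place (the session name `ml` and `your_job.yml` are just placeholders):
-
-```bash
-screen -S ml                        # start a new session called "ml"
-conda deactivate && conda activate  # reset conda inside the new shell
-which python                        # check it points at your miniconda install
-pippin.sh -v your_job.yml           # run Pippin as normal
-# Detach with Ctrl-A then Ctrl-D. Later on:
-screen -ls                          # list your sessions
-screen -r ml                        # reattach to the "ml" session
-```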
-
-You may notice if you log in and out of midway that your screen sessions might not show up. This is because midway has multiple head nodes, and
-your screen session exists only on one of them. This is why when I ssh to midway I specify a specific login node instead
-of being assigned one. To make it simpler, I'd recommend setting up
-an alias like so to either `login1` or `login2`:
-
-```bash
-alias sshmidway="ssh username@midway2-login1.rcc.uchicago.edu"
-```
-
-**I want to modify a ton of files but don't want huge yml files, please help**
-
-You can modify input files and put them in a directory you own, and then tell Pippin to look there
-(in addition to the default location) when it's constructing your tasks. To do this, see [this example here](https://github.com/dessn/Pippin/blob/master/examples/global.yml),
-or use this code snippet at the top of your YAML file (though it doesn't actually matter where in the file it goes):
-
-```yaml
-GLOBAL:
- DATA_DIRS:
- - /some/new/directory/with/your/files/in/it
-```
-
-**I want to use a different cfg.yml file!**
-
-```yaml
-GLOBAL:
- CFG_PATH: /your/path/here
-```
-**Stop rerunning my sims!**
-
-For big biascor sims it can be frustrating if you're trying to tweak biascor or later stages and sims kick off
-because of some trivial change. So use the `--ignore` or `-i` flag to ignore any undone tasks or tasks with
-hash disagreements in previous stages. To clarify, even tasks that do not have a hash, and have never been submitted, will
-not be run if that stage is set to be ignored.
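-
-For example (assuming, as with `-s` and `-f`, that `-i` accepts a stage number or name):
-
-```bash
-pippin.sh -i 1 your_biascor_job.yml  # leave stages 0 and 1 (DATAPREP, SIM) alone, even if their hashes disagree
-```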
-
-**I don't want to run this massive jobs again! Let me use external results!**
-
-Good news, everyone! Not only is there a dedicated config file for globally useful tasks, but it's easier than ever to slot them
-into your existing jobs. For useful precomputed work, such as biascor sims and trained machine learning classifiers, check out `$PIPPIN_OUTPUT/GLOBAL`.
-
-For an example on how to use these results, check out the reference 5YR analysis `ref_des_5yr.yml`. There are in essence two ways of
-including external tasks. Both operate the same way, one is just a bit more explicit than the other. The explicit way is when adding
-a task that is an *exact* replica of an external task, you can just add the `EXTERNAL` keyword. For example, in the reference 5YR analysis,
-all the biascor sims are precomputed, so we can define them as external tasks like this:
-
-```yaml
-SIM:
- DESSIMBIAS5YRIA_C11: # A SIM task we don't want to rerun
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_C11 # The path to a matching external SIM task, which is already finished
- DESSIMBIAS5YRIA_G10:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_G10
- DESSIMBIAS5YRCC:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRCC
-```
-
-In this case, we use the `EXTERNAL` keyword because each of the three defined tasks can only be associated with one, and only one, `EXTERNAL` task. Because `EXTERNAL` tasks are one-to-one with a defined task, the name of the defined task, and the `EXTERNAL` task do not need to match.
-
-Suppose we don't want to recompute the light curve fits. After all, most of the time we're not changing that step anyway! However, unlike `SIM`, `LCFIT` runs multiple sub-tasks - one for each `SIM` task you are performing lightcurve fitting on.
-
-```yaml
-LCFIT:
- D: # An LCFIT task we don't want to rerun
- BASE: surveys/des/lcfit_nml/des_5yr.nml
- MASK: DESSIM # Selects a subset of SIM tasks to run lightcurve fitting on
- # In this case, the SIM tasks are DESSIMBIAS5YRIA_C11, DESSIMBIAS5YRIA_G10, and DESSIMBIAS5YRCC
- EXTERNAL_DIRS:
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_C11 # Path to a previously run LCFIT sub-task
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_G10
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRCC
-```
-
-That is, we have one `LCFIT` task, but because we have three sims going into it and matching the mask, we can't point to a single `EXTERNAL` task. Instead, we provide an external path for each sub-task, as defined in `EXTERNAL_DIRS`. The name of each external sub-task must exactly match the `LCFIT` task name, and the `SIM` sub-task name. For example, the path to the `DESSIMBIAS5YRIA_C11` lightcurve fits, must be `D_DESSIMBIAS5YRIA_C11`.
-
-Note that you still need to point to the right base file, because Pippin still wants those details. It won't be submitted anywhere though, just loaded in.
-
-To use `EXTERNAL_DIRS` on pre-computed tasks that don't follow your current naming scheme (i.e the `LCFIT` task name, or the `SIM` sub-task names differ), you can make use of `EXTERNAL_MAP` to provide a mapping between the `EXTERNAL_DIR` paths, and each `LCFIT` sub-task.
-
-```yaml
-LCFIT:
- D: # An LCFIT task we don't want to rerun
- BASE: surveys/des/lcfit_nml/des_5yr.nml
- MASK: DESSIM # Selects a subset of SIM tasks to run lightcurve fitting on
- EXTERNAL_DIRS: # Paths to external LCFIT tasks, which do not have an exact match with this task
- - $PIPPIN_OUTPUT/EXAMPLE_C11/2_LCFIT/DESFIT_SIM
- - $PIPPIN_OUTPUT/EXAMPLE_G10/2_LCFIT/DESFIT_SIM
- - $PIPPIN_OUTPUT/EXAMPLE/2_LCFIT/DESFIT_CCSIM
- EXTERNAL_MAP:
- # LCFIT_SIM: EXTERNAL_MASK
- D_DESSIMBIAS5YRIA_C11: EXAMPLE_C11 # In this case we are matching to the pippin job name, as the LCFIT task name is shared between two EXTERNAL_DIRS
- D_DESSIMBIAS5YRIA_G10: EXAMPLE_G10 # Same as C11
- D_DESSIMBIAS5YRCC: DESFIT_CCSIM # In this case we match to the LCFIT task name, as the pippin job name (EXAMPLE) would match with the other EXTERNAL_DIRS
-```
-
-The flexibility of `EXTERNAL_DIRS` means you can mix both precomputed and non-precomputed tasks together. Take this classification task:
-
-```yaml
-CLASSIFICATION:
- SNNTEST:
- CLASSIFIER: SuperNNovaClassifier
- MODE: predict
- OPTS:
- MODEL: $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTRAIN_DESTRAIN/model.pt
- EXTERNAL_DIRS:
- - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRIA_C11_SNNTRAIN_DESTRAIN
- - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRIA_G10_SNNTRAIN_DESTRAIN
- - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRCC_SNNTRAIN_DESTRAIN
-```
-
-It will load in the precomputed classification results for the biascor sims, and then also run and generate classification results on any other
-simulation tasks (such as running on the data) using the pretrained model `model.pt`.
-
-Finally, the way this works under the hood is simple - it copies the directory over explicitly. And it will only copy once, so if you want the
-"latest version" just ask the task to refresh (or delete the folder). Once it copies it, there is no normal hash checking,
-it reads in the `config.yml` file created by the task in its initial run and powers onwards.
-
-If you have any issues using this new feature, check out the `ref_des_5yr.yml` file or flick me a message.
-
-## Tasks
-
-Pippin is essentially a wrapper around many different tasks. In this section,
-I'll try and explain how tasks are related to each other, and what each task is.
-
-As a general note, most tasks have an `OPTS` where most details go. This is partially historical, but essentially properties
-that Pippin uses to determine how to construct tasks (like `MASK`, classification mode, etc) are top level, and the Task itself gets passed everything
-inside `OPTS` to use however it wants.
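-
-Schematically (the stage, task, and key names below are just placeholders, not real options):
-
-```yaml
-SOME_STAGE:
-  SOME_TASK_NAME:
-    MASK: something          # top-level keys like this tell Pippin how to construct and link the task
-    OPTS:
-      ANYTHING_ELSE: value   # everything under OPTS is handed straight through to the task itself
-```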
-
-[//]: # (Start of Task specification)
-
-### Data Preparation
-
-The DataPrep task is simple - it is mostly a pointer for Pippin towards an external directory that contains
-some photometry, to say we're going to make use of it. Normally this means data files,
-though you can also use it to point to simulations that have already been run to save yourself
-the hassle of rerunning them. The other thing the DataPrep task will do is run the new
-method of determining a viable initial guess for the peak time, which will be used by the light curve fitting task down the road.
-The full options available for the DataPrep task are:
-
-```yaml
-DATAPREP:
- SOMENAME:
- OPTS:
-
- # Location of the photometry files
- RAW_DIR: $DES_ROOT/lcmerge/DESALL_forcePhoto_real_snana_fits
-
- # Specify which types are confirmed Ia's, confirmed CC or unconfirmed. Used by ML down the line
- TYPES:
- IA: [101, 1]
- NONIA: [20, 30, 120, 130]
-
- # Blind the data. Defaults to True if SIM:True not set
- BLIND: False
-
- # Defaults to False. Important to set this flag if analysing a sim in the same way as data, as there
- # are some subtle differences
- SIM: False
-
- # The method of estimating peak mjd values. Don't ask me what numbers mean what, ask Rick.
- OPT_SETPKMJD: 16
+![A Really Funny Meme](docs/_static/images/meme.jpg)
-```
-
-### Simulation
-
-The simulation task does exactly what you'd think it does. It invokes [SNANA](https://github.com/RickKessler/SNANA) to run some simulation as per your configuration.
-If something goes wrong, Pippin tries to dig through the log files to give you a useful error message, but sometimes this
-is difficult (i.e. the logs have been zipped up). With the current version of SNANA, each simulation can have at most one Ia component,
-and an arbitrary number of CC components. The specification for the simulation task config is as follows:
-
-```yaml
-SIM:
- SOMENAMEHERE:
-
- # We specify the Ia component, so it must have IA in its name
- IA_G10:
- BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des5yr.input # And then we specify the base input file which generates it.
-
- # Now we can specify as many CC sims to mix in as we want
- II_JONES:
- BASE: surveys/des/sims_cc/sn_collection_jones.input
-
- IAX:
- BASE: surveys/des/sims_cc/sn_iax.input
- DNDZ_ALLSCALE: 3.0 # Note you can add/overwrite keys like so for specific files
-
- # This section will apply to all components of the sim
- GLOBAL:
- NGEN_UNIT: 1
- RANSEED_REPEAT: 10 12345
-```
-
-### Light Curve Fit
-
-This task runs the SALT2 light curve fitting process on light curves from the simulation or DataPrep task. As above,
-if something goes wrong I try and give a good reason why; if you don't get a good reason, let me know. The task is
-specified like so:
-
-```yaml
-LCFIT:
- SOMENAMEHERE:
- # MASK means only apply this light curve fitting on sims/Dataprep which have DES in the name
- # You can also specify a list for this, and they will be applied as a logical or
- MASK: DES
-
- # The base nml file used
- BASE: surveys/des/lcfit_nml/des.nml
-
- # FITOPTS can be left out for nothing, pointed to a file, specified manually or a combination of the two
- # Normally this would be a single entry like global.yml shown below, but you can also pass a list
- # If you specify a FITOPT manually, make sure it has the / around the label
- # And finally, if you specify a file, make sure it's a yml dictionary that links a survey name to the correct
- # fitopts. See the file below for an example
- FITOPTS:
- - surveys/global/lcfit_fitopts/global.yml
- - "/custom_extra_fitopt/ REDSHIFT_FINAL_SHIFT 0.0001"
-
- # We can optionally customise keys in the FITINP section
- FITINP:
- FILTLIST_FIT: 'gri'
-
- # And do the same for the optional SNLCINP section
- SNLCINP:
- CUTWIN_SNRMAX: 3.0, 1.0E8
- CUTWIN_NFILT_SNRMAX: 3.0, 99.
-
- # Finally, options that go outside either of these sections just go in the generic OPTS
- OPTS:
- BATCH_INFO: sbatch $SBATCH_TEMPLATES/SBATCH_Midway2_1hr.TEMPLATE 10
-```
-
-### Classification
-
-Within Pippin, there are many different classifiers implemented. Most classifiers need to be trained, and
-can then run in predict mode. All classifiers that require training can either be trained in the same yml
-file, or you can point to an external serialised instance of the trained class and use that. The general syntax
-for a classifier is:
-
-```yaml
-CLASSIFICATION:
- SOMELABEL:
- CLASSIFIER: NameOfTheClass
- MODE: train # or predict
- MASK: mask # Masks both sim and lcfit together, logical and, optional
- MASK_SIM: sim_only_mask
- MASK_FIT: lcfit_only_mask
- COMBINE_MASK: [SIM_IA, SIM_CC] # optional mask to combine multiple sim runs into one classification job (e.g. separate CC and Ia sims). NOTE: currently not compatible with SuperNNova/SNIRF
- OPTS:
- MODEL: file_or_label # only needed in predict mode, how to find the trained classifier
- OPTIONAL_MASK: opt_mask # mask for optional dependencies. Not all classifiers make use of this
- OPTIONAL_MASK_SIM: opt_sim_only_mask # mask for optional sim dependencies. Not all classifiers make use of this
- OPTIONAL_MASK_FIT: opt_lcfit_only_mask # mask for optional lcfit dependencies. Not all classifiers make use of this
- WHATEVER_THE: CLASSIFIER_NEEDS
-```
-
-#### SCONE Classifier
-
-The [SCONE classifier](https://github.com/helenqu/scone) is a convolutional neural network-based classifier for supernova photometry. The model first creates "heatmaps" of flux values in wavelength-time space, then runs the neural network model on GPU (if available) to train or predict on these heatmaps. A successful run will produce `predictions.csv`, which shows the Ia probability of each SN. For debugging purposes, the model config (`model_config.yml`), Slurm job (`job.slurm`), log (`output.log`), and all the heatmaps (`heatmaps/`) can be found in the output directory. An example of how to define a SCONE classifier:
-
-```yaml
-CLASSIFICATION:
- SCONE_TRAIN: # Helen's CNN classifier
- CLASSIFIER: SconeClassifier
- MODE: train
- OPTS:
- GPU: True # OPTIONAL, default: False
- # HEATMAP CREATION OPTS
- CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
- NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
- NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
- REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
- # MODEL OPTS
- NUM_EPOCHS: 400 # REQUIRED, number of training epochs
- IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
-
- SCONE_PREDICT: # Helen's CNN classifier
- CLASSIFIER: SconeClassifier
- MODE: predict
- OPTS:
- GPU: True # OPTIONAL, default: False
- # HEATMAP CREATION OPTS
- CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
- NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
- NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
- REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
- # MODEL OPTS
- MODEL: "/path/to/trained/model" # REQUIRED, path to trained model that should be used for prediction
- IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
-```
-
-#### SuperNNova Classifier
-
-The [SuperNNova classifier](https://github.com/supernnova/SuperNNova) is a recurrent neural network that
-operates on simulation photometry. It has three in-built variants - its normal (vanilla) mode, a Bayesian mode
-and a Variational mode. After training, a `model.pt` can be found in the output directory,
-which you can point to from a different yaml file. You can define a classifier like so:
-
-```yaml
-CLASSIFICATION:
- SNN_TEST:
- CLASSIFIER: SuperNNovaClassifier
- MODE: predict
- GPU: True # Or False - determines which queue it gets sent into
- CLEAN: True # Or False - determines if Pippin removes the processed folder to save space
- OPTS:
- MODEL: SNN_TRAIN # Haven't shown this defined. Or /somepath/to/model.pt
- VARIANT: vanilla # or "variational" or "bayesian". Defaults to "vanilla"
- REDSHIFT: True # What redshift info to use when classifying. Defaults to 'zspe'. Options are [True, False, 'zpho', 'zspe', or 'none']. True and False are legacy options which map to 'zspe', and 'none' respectively.
- NORM: cosmo_quantile # How to normalise LCs. Other options are "perfilter", "cosmo", "global" or "cosmo_quantile".
- CYCLIC: True # Defaults to True for vanilla and variational model
- SEED: 0 # Sets random seed. Defaults to 0.
- LIST_FILTERS: ['G', 'R', 'I', 'Z'] # What filters are present in the data, defaults to ['g', 'r', 'i', 'z']
- SNTYPES: "/path/to/sntypes.txt" # Path to a file which lists the sn type mapping to be used. Example syntax for this can be found at https://github.com/LSSTDESC/plasticc_alerts/blob/main/Examples/plasticc_schema/elasticc_origmap.txt. Alternatively, yaml dictionaries can be used to specify each sn type individually.
-```
-
-Pippin also allows for supernnova input yaml files to be passed, instead of having to define all of the options in the Pippin input yaml. This is done via:
-
-```yaml
-OPTS:
- DATA_YML: path/to/data_input.yml
- CLASSIFICATION_YML: path/to/classification_input.yml
-```
-
-Example input yaml files can be found [here](https://github.com/supernnova/SuperNNova/tree/master/configs_yml), with the important variation that you must have:
-
-```yaml
-raw_dir: RAW_DIR
-dump_dir: DUMP_DIR
-done_file: DONE_FILE
-```
-
-So that Pippin can automatically replace these with the appropriate directories.
-
-#### SNIRF Classifier
-
-The [SNIRF classifier](https://github.com/evevkovacs/ML-SN-Classifier) is a random forest running off SALT2 summary
-statistics. You can specify which features it gets to train on, which has a large impact on performance. After training,
-there should be a `model.pkl` in the output directory. You can specify one like so:
-
-```yaml
-CLASSIFICATION:
- SNIRF_TEST:
- CLASSIFIER: SnirfClassifier
- MODE: predict
- OPTS:
- MODEL: SNIRF_TRAIN
- FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
- FEATURES: x1 c zHD x1ERR cERR PKMJDERR # Columns to use. Defaults are shown. Check FITRES for options.
- N_ESTIMATORS: 100 # Number of trees in forest
- MIN_SAMPLES_SPLIT: 5 # Min number of samples to split a node on
- MIN_SAMPLES_LEAF: 1 # Minimum number samples in leaf node
- MAX_DEPTH: 0 # Max depth of tree. 0 means auto, which means as deep as it wants.
-```
-
-#### Nearest Neighbour Classifier
-
-Similar to SNIRF, NN trains on SALT2 summary statistics using a basic Nearest Neighbour algorithm from sklearn.
-It will produce a `model.pkl` file in its output directory when trained. You can configure it as per SNIRF:
-
-
-```yaml
-CLASSIFICATION:
- NN_TEST:
- CLASSIFIER: NearestNeighborPyClassifier
- MODE: predict
- OPTS:
- MODEL: NN_TRAIN
- FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
- FEATURES: zHD x1 c cERR x1ERR COV_x1_c COV_x1_x0 COV_c_x0 PKMJDERR # Columns to use. Defaults are shown.
-```
-
-#### Perfect Classifier
-
-Sometimes you want to cheat, and if you have simulations, this is easy. The perfect classifier looks into the sims to
-get the actual type, and will then assign probabilities as per your configuration. This classifier has no training mode,
-only predict.
-
-```yaml
-CLASSIFICATION:
- PERFECT:
- CLASSIFIER: PerfectClassifier
- MODE: predict
- OPTS:
- PROB_IA: 1.0 # Probs to use for Ia events, default 1.0
- PROB_CC: 0.0 # Probs to use for CC events, default 0.0
-```
-
-#### Unity Classifier
-
-To emulate a spectroscopically confirmed sample, or just to save time, we can assign every event a probability of 1.0
-that it is a type Ia. As it just returns 1.0 for everything, it only has a predict mode.
-
-```yaml
-CLASSIFICATION:
- UNITY:
- CLASSIFIER: UnityClassifier
- MODE: predict
-```
-
-#### FitProb Classifier
-
-Another useful debug test is to just take the SALT2 fit probability calculated from the chi2 fitting and use that
-as our probability. You'd hope that classifiers all improve on this. Again, this classifier only has a predict mode.
-
-```yaml
-CLASSIFICATION:
- FITPROBTEST:
- CLASSIFIER: FitProbClassifier
- MODE: predict
-```
-
-### Aggregation
-
-The aggregation task takes results from one or more classification tasks (that have been run in predict mode
-on the same dataset) and generates comparisons between the classifiers (their correlations, PR curves, ROC curves
-and their calibration plots). Additionally, it merges the results of the classifiers into a single
-csv file, mapping SNID to one column per classifier.
-
-```yaml
-AGGREGATION:
- SOMELABEL:
- MASK: mask # Match sim AND classifier
- MASK_SIM: mask # Match only sim
- MASK_CLAS: mask # Match only classifier
- RECALIBRATION: SIMNAME # Optional, use this simulation to recalibrate probabilities. Default no recal.
- # Optional, changes the probability column name of each classification task listed into the given probability column name.
- # Note that this will crash if the same classification task is given multiple probability column names.
- # Mostly used when you have multiple photometrically classified samples
- MERGE_CLASSIFIERS:
- PROB_COLUMN_NAME: [CLASS_TASK_1, CLASS_TASK_2, ...]
- OPTS:
- PLOT: True # Default True, make plots
- PLOT_ALL: False # Default False. I.e. if RANSEED_CHANGE gives you 100 sims, make 100 sets of plots.
-```
-
-### Merging
-
-The merging task will take the outputs of the aggregation task, and put the probabilities from each classifier
-into the light curve fit results (FITRES files) using SNID.
-
-```yaml
-MERGE:
- label:
- MASK: mask # partial match on all sim, fit and agg
- MASK_SIM: mask # partial match on sim
- MASK_FIT: mask # partial match on lcfit
- MASK_AGG: mask # partial match on aggregation task
-```
-
-### Bias Corrections
-
-With all the probability goodness now in the FITRES files, we can move onto calculating bias corrections.
-For spec-confirmed surveys, you only need a Ia sample for bias corrections. For surveys with contamination,
-you will also need a CC only simulation/lcfit result. For each survey being used (as we would often combine lowz and highz
-surveys), you can specify inputs like below.
-
-Note that I expect this task to have the most teething issues, especially when we jump into the MUOPTS.
-
-```yaml
-BIASCOR:
- LABEL:
- # The base input file to utilise
- BASE: surveys/des/bbc/bbc.input
-
- # The names of the lcfits_data/simulations going in. List format please. Note LcfitLabel_SimLabel format
- DATA: [DESFIT_DESSIM, LOWZFIT_LOWZSIM]
-
- # Input Ia bias correction simulations to be concatenated
- SIMFILE_BIASCOR: [DESFIT_DESBIASCOR, LOWZFIT_LOWZBIASCOR]
-
- # Optional, specify FITOPT to use. Defaults to 0 for each SIMFILE_BIASCOR. If using this option, you must specify a FITOPT for each SIMFILE_BIASCOR
- SIMFILE_BIASCOR_FITOPTS: [0, 1] # FITOPT000 and FITOPT001
-
- # For surveys that have contamination, add in the cc only simulation under CCPRIOR
- SIMFILE_CCPRIOR: DESFIT_DESSIMBIAS5YRCC
-
- # Optional, specify FITOPT to use. Defaults to 0 for each SIMFILE_CCPRIOR. If using this option, you must specify a FITOPT for each SIMFILE_CCPRIOR
- SIMFILE_CCPRIOR_FITOPTS: [0, 1] # FITOPT000 and FITOPT001
-
-
- # Which classifier to use. Column name in FITRES will be determined from this property.
- # In the case of multiple classifiers this can either be
- # 1. A list of classifiers which map to the same probability column name (as defined by MERGE_CLASSIFIERS in the AGGREGATION stage)
- # 2. A probability column name (as defined by MERGE_CLASSIFIERS in the AGGREGATION stage)
- # Note that this will crash if the specified classifiers do not map to the same probability column.
- CLASSIFIER: UNITY
-
- # Default False. If multiple sims (RANSEED_CHANGE), make one or all Hubble plots.
- MAKE_ALL_HUBBLE: False
-
- # Defaults to False. Will load in the recalibrated probabilities, and crash and burn if they don't exist.
- USE_RECALIBRATED: True
-
- # Defaults to True. If set to True, will rerun biascor twice, removing any SNID that got dropped in any FITOPT/MUOPT
- CONSISTENT_SAMPLE: False
-
-
- # We can also specify muopts to add in systematics. They share the structure of the main biascor definition
- # You can have multiple, use a dict structure, with the muopt name being the key
- MUOPTS:
- C11:
- SIMFILE_BIASCOR: [D_DESBIASSYS_C11, L_LOWZBIASSYS_C11]
- SCALE: 0.5 # Defaults to 1.0 scale, used by CREATE_COV to determine covariance matrix contribution
-
- # Generic OPTS that can modify the base file and overwrite properties
- OPTS:
- BATCH_INFO: sbatch $SBATCH_TEMPLATES/SBATCH_Midway2_1hr.TEMPLATE 10
-```
-
-For those that generate large simulations and want to cut them up into little pieces, you want the `NSPLITRAN` syntax.
-The configuration below will take the inputs and divide them into 10 samples, which will then propagate to 10 CosmoMC runs
-if you have a CosmoMC task defined.
-
-```yaml
-BIASCOR:
- LABEL:
- BASE: surveys/des/bbc/bbc_3yr.input
- DATA: [D_DES_G10]
- SIMFILE_BIASCOR: [D_DESSIMBIAS3YRIA_G10]
- PROB_COLUMN_NAME: some_column_name # optional instead of CLASSIFIER
- OPTS:
- NSPLITRAN: 10
-```
-
-### Create Covariance
-
-Assuming the biascor task hasn't died, it's time to prep for CosmoMC. To do this, we invoke a script from Dan originally
-(I think) that essentially creates all the input files and structure needed by CosmoMC. It provides a way of scaling
-systematics, and determining which covariance options to run with.
-
-```yaml
-CREATE_COV:
- SOMELABEL:
- MASK: some_biascor_task
- OPTS:
- INI_DIR: /path/to/your/own/dir/of/cosmomc/templates # Defaults to cosmomc_templates, which you can exploit using DATA_DIRS
- SYS_SCALE: surveys/global/lcfit_fitopts/global.yml # Location of systematic scaling file, same as the FITOPTS file.
- SINGULAR_BLIND: False # Defaults to False, whether different contours will have different shifts applied
- BINNED: True # Whether to bin the SN or not for the covariance matrix. Defaults to True
- REBINNED_X1: 2 # Rebin x1 into 2 bins
- REBINNED_C: 4 # Rebin c into 4 bins
- SUBTRACT_VPEC: False # Subtract VPEC contribution to MUERR if True. Used when BINNED: False
- FITOPT_SCALES: # Optional
- FITOPT_LABEL: some_scale # Note this is a partial match, ie SALT2: 1.0 would apply to all SALT2 cal fitopts
- MUOPT_SCALES:
- MUOPT_LABEL: some_scale # This is NOT a partial match, must be exact
- COVOPTS: # Optional, and you'll always get an 'ALL' covopt. List format please
- - "[NOSYS] [=DEFAULT,=DEFAULT]" # This syntax is explained below
-```
-
-If you don't specify `SYS_SCALE`, Pippin will search the LCFIT tasks from the BIASCOR dependency and if all LCFIT tasks
-have the same fitopt file, it will use that.
-
-The `COVOPTS` section is a bit odd. In the square brackets first, we have the label that will be assigned and used
-in the plotting output later. The next set of square backets is a two-tuple, and it applies to `[fitopts,muopts]` in
-that order. For example, to get four contours out of CosmoMC corresponding to all uncertainty, statistics only,
-statistics + calibration uncertainty, and fitopts + C11 uncertainty, we could set:
-
-```yaml
-COVOPTS:
- - "[NOSYS] [=DEFAULT,=DEFAULT]"
- - "[CALIBRATION] [+cal,=DEFAULT]"
- - "[SCATTER] [=DEFAULT,=C11]"
-```
-
-### CosmoFit
-
-CosmoFit is a generic cosmological fitting task, which allows you to choose between different fitters.
-The syntax is very simple:
-```yaml
-COSMOFIT:
- COSMOMC:
- SOMELABEL:
- # CosmoMC options
- WFIT:
- SOMEOTHERLABEL:
- # WFit options
-```
-
-#### CosmoMC
-
-Launching CosmoMC is hopefully fairly simple. There is a list of provided configurations under the `cosmomc_templates`
-directory (inside `data_files`), and the main job of the user is to pick which one they want.
-
-```yaml
-COSMOFIT:
- COSMOMC:
- SOMELABEL:
- MASK_CREATE_COV: mask # partial match
- OPTS:
- INI: sn_cmb_omw # should match the filename of an ini file
- NUM_WALKERS: 8 # Optional, defaults to eight.
-
- # Optional, covopts from CREATE_COV step to run against. If blank, you get them all. Exact matching.
- COVOPTS: [ALL, NOSYS]
-```
-
-#### WFit
-
-Launching WFit simply requires providing the command line options you want to use for each fit.
-```yaml
-COSMOFIT:
- WFIT:
- SOMELABEL:
- MASK: mask # partial match
- OPTS:
- BATCH_INFO: sbatch path/to/SBATCH.TEMPLATE 10 # Last number is the number of cores
- WFITOPT_GLOBAL: "-hsteps 61 -wsteps 101 -omsteps 81" # Optional, will apply these options to all fits
- WFITOPTS:
- - /om_pri/ -ompri 0.31 -dompri 0.01 # At least one option is required. The name in the /'s is a human readable label
- - /cmb_pri/ -cmb_sim -sigma_Rcmb 0.007 # Optionally include as many other fitopts as you want.
-
-```
-
-### Analyse
-
-The final step in the Pippin pipeline is the Analyse task. It creates a final output directory, moves relevant files into it,
-and generates extra plots. It will save out compressed CosmoMC chains and the plotting scripts (so you can download
-the entire directory and customise it without worrying about pointing to external files), it will copy in Hubble diagrams,
-and - depending on whether you've told it to - will make histogram comparison plots between data and sim. Oh, and also
-redshift evolution plots. The scripts which copy/compress/rename external files into the analyse directory are generally
-named `parse_*.py`. So `parse_cosmomc.py` is the script which finds, reads and compresses the MCMC chains from CosmoMC into
-the output directory. Then `plot_cosmomc.py` reads those compressed files to make the plots.
-
-Cosmology contours will be blinded when made by looking at the BLIND flag set on the data. For data, this defaults to
-True.
-
-Note that all the plotting scripts work the same way - `Analyse` generates a small yaml file pointing to all the
-resources called `input.yml`, and each script uses the same file to make different plots. It is thus super easy to add your own
-plotting code scripts, and you can specify arbitrary code to execute using the `ADDITIONAL_SCRIPTS` keyword in opts.
-Just make sure your code takes `input.yml` as an argument. As an example, to rerun the CosmoMC plots, you'd simply have to
-run `python plot_cosmomc.py input.yml`.
-
-```yaml
-ANALYSE:
- SOMELABEL:
- MASK_COSMOFIT: mask # partial match
- MASK_BIASCOR: mask # partial match
- MASK_LCFIT: [D_DESSIM, D_DATADES] # Creates histograms and efficiency based off the input LCFIT_SIMNAME matches. Optional
- OPTS:
- COVOPTS: [ALL, NOSYS] # Optional. Covopts to match when making contours. Single or list. Exact match.
- SHIFT: False # Default False. Shift all the contours on top of each other
- PRIOR: 0.01 # Defaults to None. Optional normal prior around Om=0.3 to apply for sims if wanted.
- ADDITIONAL_SCRIPTS: /somepath/to/your/script.py # Should take the input.yml as an argument
-```
-
-[//]: # (End of Task specification)
-
-![Developer Documentation Below](docs/_static/images/developer.jpg)
-
-
-## Coding style
-
-Please, for the love of god, don't code this up in vim/emacs on a terminal connection. Use a proper IDE (I recommend
-PyCharm or VSCode), and **install the Black extension**! I have Black set up in PyCharm as a file watcher, and all
-python files, on save, are automatically formatted. Use 160 characters as the line width. Here is the Black file watcher config:
-
-![Black config](docs/_static/images/black.jpg)
-
-If everyone does this, then all files should remain consistent across different users.
-
-## Testing valid config in Pippin
-
-
-
-To ensure we don't break things when pushing out new code, the tests directory contains a set of
-tests progressively increasing in pipeline complexity, designed to ensure that existing config files
-act consistently regardless of code changes. Any failure in the tests means a break in backwards compatibility
-and should be discussed before being incorporated into a release.
-
-To run the tests, in the top level directory, simply run:
-
-`pytest -v .`
-
-
-
-## Adding a new task
-
-
-
-
-Alright there, you want to add a new task to Pippin? Great. Here's what you've got to do:
-
-1. Create an implementation of the `Task` class; you can keep it empty for now.
-2. Figure out where it goes - at the top of `manager.py` you can see the current stages in Pippin and where your task fits in the ordering.
-Once you have figured that out, import the task and slot it in.
-3. Back in your new class that extends Task, you'll notice you have a few methods to implement:
- 1. `_run()`: Kick the task off, report True or False for successful kicking off.
- To help with determining the hash and whether the task should run, there are a few handy functions:
- `_check_regenerate`, `get_hash_from_string`, `save_hash`, `get_hash_from_files`, `get_old_hash`. See, for example, the Analyse
- task for an example on how I use these.
- 2. `_check_completion(squeue)`: Check to see if the task (whether it's being rerun or not) is done.
- Normally I do this by checking for a done file, which contains either SUCCESS or FAILURE. For example, if submitting a script to a queuing system, I might have this after the primary command:
- ```batch
- if [ $? -eq 0 ]; then
- echo SUCCESS > {done_file}
- else
- echo FAILURE > {done_file}
- fi
- ```
- This allows me to easily see if a job failed or passed. On failure, I then generally recommend looking through the task logs and trying to figure out what went wrong, so you can present a useful message
- to your user.
- To then show that error, or **ANY MESSAGE TO THE USER**, use the provided logger:
- `self.logger.error("The task failed because of this reason")`.
-
- This method should return Task.FINISHED_FAILURE, Task.FINISHED_SUCCESS, or alternatively the number of jobs still in the queue, which you can figure out because I pass in all jobs the user has
- active in the variable squeue (which can sometimes be None). A schematic sketch of this pattern is shown just after this list.
- 3. `get_tasks(task_config, prior_tasks, output_dir, stage_num, prefix, global_config)`: From the given inputs, determine what tasks should be created, and create them, and then return them in a list. For context,
- here is the code I use to determine what simulation tasks to create:
- ```python
- @staticmethod
- def get_tasks(config, prior_tasks, base_output_dir, stage_number, prefix, global_config):
- tasks = []
- for sim_name in config.get("SIM", []):
- sim_output_dir = f"{base_output_dir}/{stage_number}_SIM/{sim_name}"
- s = SNANASimulation(sim_name, sim_output_dir, f"{prefix}_{sim_name}", config["SIM"][sim_name], global_config)
- Task.logger.debug(f"Creating simulation task {sim_name} with {s.num_jobs} jobs, output to {sim_output_dir}")
- tasks.append(s)
- return tasks
- ```
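-
-For reference, here is a schematic `_check_completion` following the done-file pattern described above. The attribute names (`done_file`, `num_jobs`) and the exact handling are assumptions for illustration only; check an existing task for the real details.
-
-```python
-import os
-
-def _check_completion(self, squeue):
-    # Schematic only: assumes this lives on a Task subclass and that the submitted
-    # job wrote SUCCESS or FAILURE into self.done_file (as in the batch snippet above)
-    if os.path.exists(self.done_file):
-        with open(self.done_file) as f:
-            if "SUCCESS" in f.read():
-                return Task.FINISHED_SUCCESS
-        self.logger.error("The task failed because of this reason")
-        return Task.FINISHED_FAILURE
-    return self.num_jobs  # jobs still queued or running
-```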
-
-
-
-## Adding a new classifier
-
-
-
-Alright, so what if we're not after a brand new task, but just adding another classifier. Well, it's easier to do, and I recommend looking at
-`nearest_neighbor_python.py` for something to copy from. You'll see we have the parent Classifier class, I write out the slurm script that
-would be used, and then define the `train` and `predict` methods (which both invoke a general `classify` function in different ways; you can do this
-however you want).
-
-You'll also notice a very simple `_check_completion` method, and a `get_requirements` method. The latter returns a two-tuple of booleans, indicating
-whether the classifier needs photometry and light curve fitting results respectively. For the NearestNeighbour code, it classifies based
-only on SALT2 features, so I return `(False, True)`.
-You can also define a `get_optional_requirements` method which, like `get_requirements`, returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results *for this particular run*. By default, this method returns:
-- `True, True` if `OPTIONAL_MASK` set in `OPTS`
-- `True, False` if `OPTIONAL_MASK_SIM` set in `OPTS`
-- `False, True` if `OPTIONAL_MASK_FIT` set in `OPTS`
-- `False, False` otherwise.
-
-If you define your own method based on classifier-specific requirements, then these `OPTIONAL_MASK*` keys can still be set to choose which tasks are optionally included. If these are not set, then the normal `MASK`, `MASK_SIM`, and `MASK_FIT` are used instead. Note that if *no* masks are set then *every* sim or lcfit task will be included.
-
-Finally, you'll need to add your classifier into the ClassifierFactory in `classifiers/factory.py`, so that I can link a class name
-in the YAML configuration to your actual class. Yeah yeah, I could use reflection or dynamic module scanning or similar, but I've had issues getting
-the behaviour consistent across systems and conda environments, so we're doing it the hard way.
+## Table of Contents
-
+* [Installation](https://pippin.readthedocs.io/en/latest/src/install.html)
+* [Using Pippin](https://pippin.readthedocs.io/en/latest/src/usage.html)
+* [Tasks](https://pippin.readthedocs.io/en/latest/src/tasks.html)
+* [Pippin Development](https://pippin.readthedocs.io/en/latest/src/dev.html)
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 00000000..d40f39c1
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,31 @@
+[![Documentation](https://readthedocs.org/projects/pippin/badge/?version=latest)](https://pippin.readthedocs.io/en/latest/?badge=latest)
+[![JOSS](https://joss.theoj.org/papers/10.21105/joss.02122/status.svg)](https://doi.org/10.21105/joss.02122)
+[![Zenodo](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.366608-blue)](https://zenodo.org/badge/latestdoi/162215291)
+[![GitHub license](https://img.shields.io/badge/License-MIT-green)](https://github.com/dessn/Pippin/blob/master/LICENSE)
+[![Github Issues](https://img.shields.io/github/issues/dessn/Pippin)](https://github.com/dessn/Pippin/issues)
+![Python Version](https://img.shields.io/badge/Python-3.7%2B-red)
+![Pippin Test](https://github.com/dessn/Pippin/actions/workflows/test-pippin.yml/badge.svg)
+
+# Pippin
+
+Pippin - a pipeline designed to streamline and remove as much hassle as we can when running end-to-end supernova cosmology analyses.
+
+![A Really Funny Meme](_static/images/meme.jpg)
+
+## Table of Contents
+
+:::{toctree}
+:maxdepth: 2
+:hidden:
+
+self
+:::
+
+:::{toctree}
+:maxdepth: 2
+
+src/install.md
+src/usage.md
+src/tasks.md
+src/dev.md
+:::
diff --git a/docs/conf.py b/docs/conf.py
index e1d6a6d5..96dadbbf 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -31,9 +31,20 @@
# ones.
extensions = [
'sphinx_rtd_theme',
- 'myst_parser'
+ 'sphinx_rtd_dark_mode',
+ 'myst_parser',
+ 'sphinxcontrib.youtube',
]
+myst_enable_extensions = [
+ "substitution",
+ "colon_fence",
+]
+
+myst_substitutions = {
+ "patrick": "[Patrick Armstrong](https://github.com/OmegaLambda1998)"
+}
+
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
diff --git a/docs/index.md b/docs/index.md
deleted file mode 100644
index a4917577..00000000
--- a/docs/index.md
+++ /dev/null
@@ -1,1007 +0,0 @@
-[![Documentation](https://readthedocs.org/projects/pippin/badge/?version=latest)](https://pippin.readthedocs.io/en/latest/?badge=latest)
-[![JOSS](https://joss.theoj.org/papers/10.21105/joss.02122/status.svg)](https://doi.org/10.21105/joss.02122)
-[![Zenodo](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.366608-blue)](https://zenodo.org/badge/latestdoi/162215291)
-[![GitHub license](https://img.shields.io/badge/License-MIT-green)](https://github.com/dessn/Pippin/blob/master/LICENSE)
-[![Github Issues](https://img.shields.io/github/issues/dessn/Pippin)](https://github.com/dessn/Pippin/issues)
-![Python Version](https://img.shields.io/badge/Python-3.7%2B-red)
-![Pippin Test](https://github.com/dessn/Pippin/actions/workflows/test-pippin.yml/badge.svg)
-
-# Pippin
-
-Pippin - a pipeline designed to streamline and remove as much hassle as we can
-when running end-to-end supernova cosmology analyses.
-
-## Table of Contents
-
-
-
-- [Using Pippin](#using-pippin)
-- [Installing Pippin](#installing-it-fresh)
-- [Contributing to Pippin](#issues-and-contributing-to-pippin)
-- [Examples](#examples)
-- [FAQ](#faq)
-- [Tasks](#tasks)
- - [DataPrep](#data-preparation)
- - [Simulation](#simulation)
- - [Light Curve Fit](#light-curve-fit)
- - [Classification](#classification)
- - [Aggregation](#aggregation)
- - [Merging](#merging)
- - [Bias Corrections](#bias-corrections)
- - [Create Covariance](#create-covariance)
- - [CosmoFit](#cosmofit)
- - [Analyse](#analyse)
-- [Adding a new Task](#adding-a-new-task)
-- [Adding a new classifier](#adding-a-new-classifier)
-
-
-## Installing it fresh
-
-If you're using a pre-installed version of Pippin - like the one on Midway - ignore this.
-
-If you're not, installing Pippin is simple.
-
-1. Checkout Pippin
-2. Ensure you have the dependencies installed (`pip install -r requirements.txt`) and that your python version is 3.7+.
-3. Celebrate
-
-There is no need to attempt to install Pippin like a package (no `python setup.py install`), just run from the clone.
-
-Now, Pippin also interfaces with other tasks: SNANA and machine learning classifiers mostly. I'd highly recommend
-running on a high performance computer with SNANA already installed, but if you want to take a crack at installing it,
-[you can find the documentation here](https://github.com/RickKessler/SNANA).
-
-I won't cover installing SNANA here, hopefully you already have it. But to install the classifiers, we'll take
-[SuperNNova](https://github.com/supernnova/SuperNNova) as an example. To install that, find a good place for it and:
-
-1. Checkout `https://github.com/SuperNNova/SuperNNova`
-2. Create a GPU conda env for it: `conda create --name snn_gpu --file env/conda_env_gpu_linux64.txt`
-3. Activate environment and install natsort: `conda activate snn_gpu` and `conda install --yes natsort`
-
-Then, in the Pippin global configuration file `cfg.yml` in the top level directory, ensure that the SNN path in Pippin is
-pointing to where you just cloned SNN into. You will need to install the other external software packages
-if you want to use them, and you do not need to install any package you do not explicitly request in a config file.
-
-## Using Pippin
-
-Using Pippin is very simple. In the top level directory, there is a `pippin.sh`. If you're on midway and use SNANA, this
-script will be on your path already. To use Pippin, all you need is a config file ready to go. I've got a bunch of mine and
-some general ones in the `configs` directory, but you can put yours wherever you want. I recommend adding your initials to the
-front of the file to make it obvious in the shared output directory which folders are yours.
-
-If you have `example.yml` as your config file and want pippin to run it, easy:
-`pippin.sh example.yml`
-
-The file name that you pass in should contain a run configuration. Note that this is different to the global software
-configuration file `cfg.yml`, and remember to ensure that your `cfg.yml` file is set up properly and that you know
-where you want your output to be
-installed. By default, I assume that the `$PIPPIN_OUTPUT` environment variable is set as the output location,
-so please either set said variable or change the associated line in the `cfg.yml`. [For the morbidly curious, here
-is a very small demo video of using Pippin in the Midway environment](https://www.youtube.com/watch?v=pCaPvzFCZ-Y).
-
-![ConsoleOutput](docs/_static/images/console.gif)
-
-
-### Creating your own configuration file
-
-Each configuration file is represented by a yaml dictionary linking each stage (see stage declaration section below) to
-a dictionary of tasks, the key being the unique name for the task and the value being its specific task configuration.
-
-For example, to define a configuration with two simulations and one light curve fitting task (resulting in 2 output simulations and
-2 output light curve tasks - one for each simulation), a user would define:
-
-```yaml
-SIM:
- SIM_NAME_1:
- SIM_CONFIG: HERE
- SIM_NAME_2:
- SIM_CONFIG: HERE
-
-LCFIT:
- LCFIT_NAME_1:
- LCFIT_CONFIG: HERE
-```
-
-How to configure each task is also detailed below on a task-by-task basis, or you can see examples in the `examples`
- directory for each task.
-
-
-### What if I change my config file?
-
-Happens all the time, don't even worry about it. Just start Pippin again and run the file again. Pippin will detect
-any changes in your configuration by hashing all the input files to a specific task. This means that even if your
-config file itself doesn't change, changes to an input file it references (for example, the default DES simulation
-input file) would result in Pippin rerunning that task. If it cannot detect anything has changed, and if the task
-finished successfully the last time it was run, the task is not re-executed. You can force re-execution of tasks using the `-r` flag.
-
-
-### Command Line Arguments
-
-On top of this, Pippin has a few command line arguments, which you can list with `pippin.sh -h`, but I'll also detail here:
-
-```bash
- -h Show the help menu
- -v, --verbose Verbose. Shows debug output. I normally have this option enabled.
- -r, --refresh Refresh/redo - Rerun tasks that completed in a previous run even if the inputs haven't changed.
- -c, --check Check that the input config is valid but don't actually run any tasks.
- -s, --start Start at this task and refresh everything after it. Number or string accepted
- -f, --finish Finish at this stage. For example -f 3 or -f CLASSIFY to run up to and including classification.
- -p, --permission Fix permissions and groups on all output, don't rerun
- -i, --ignore Do NOT regenerate/run tasks up to and including this stage.
- -S, --syntax If no task is given, prints out the possible tasks. If a task name or number is given, prints the docs on that task. For instance 'pippin.sh -S 0' and 'pippin.sh -S DATAPREP' will print the documentation for the DATAPREP task.
-```
-
-For an example, to have a verbose output configuration run and only do data preparation and simulation,
-you would run
-
-`pippin.sh -vf 1 configfile.yml`
-
-
-### Stages in Pippin
-
-You may have noticed above that each stage has a numeric ID for convenience and lexicographical sorting.
-
-The current stages are:
-
-* `0, DATAPREP`: Data preparation
-* `1, SIM`: Simulation
-* `2, LCFIT`: Light curve fitting
-* `3, CLASSIFY`: Classification (training and testing)
-* `4, AGG`: Aggregation (comparing classifiers)
-* `5, MERGE`: Merging (combining classifier and FITRES output)
-* `6, BIASCOR`: Bias corrections using BBC
-* `7, CREATE_COV`: Create input files needed for CosmoMC
-* `8, COSMOFIT`: Run CosmoMC and fit cosmology
-* `9, ANALYSE`: Create final output and plots. Includes output from CosmoMC, BBC and Light curve fitting.
-
-### Pippin on Midway
-
-On midway, sourcing the SNANA setup will add environment variables and Pippin to your path.
-
-Pippin itself can be found at `$PIPPIN`, output at `$PIPPIN_OUTPUT` (which goes to a scratch directory), and `pippin.sh` will automatically work from
-any location.
-
-Note that you only have 100 GB on scratch. If you fill that up and need to nuke some files, look both in `$SCRATCH_SIMDIR` to remove SNANA
-photometry and `$PIPPIN_OUTPUT` to remove Pippin's output. I'd recommend adding this to your `~/.bashrc` file to scan through directories you own and
-calculate directory size so you know what's taking the most space. After adding this and sourcing it, just put `dirusage` into the terminal
-in both of those locations and see what's eating your quota.
-
-```bash
-function dirusage {
- for file in $(ls -l | grep $USER | awk '{print $NF}')
- do
- du -sh "$file"
- done
-}
-```
-
-### Pippin on Perlmutter
-
-On perlmutter, add `source /global/cfs/cdirs/lsst/groups/TD/setup_td.sh` to your `~/.bashrc` to load all the relevant paths and environment variables.
-
-This will add the `$PIPPIN_DIR` path for Pippin source code, and `$PIPPIN_OUTPUT` for the output of Pippin jobs. Additionally `pippin.sh` can be run from any directory.
-
-To load the perlmutter specific `cfg.yml` you must add the following to the start of your Pippin job:
-```yaml
-GLOBAL:
- CFG_PATH: $SNANA_LSST_ROOT/starterKits/pippin/cfg_lsst_perlmutter.yml
-```
-
-## Issues and Contributing to Pippin
-
-Contributing to Pippin or raising issues is easy. Here are some ways you can do it, in order of preference:
-
-1. Submit an [issue on Github](https://github.com/samreay/Pippin), and then submit a pull request to fix that issue.
-2. Submit an [issue on Github](https://github.com/samreay/Pippin), and then wait until I have time to look at it. Hopefully that's quick, but no guarantees.
-3. Email me with a feature request
-
-If you do want to contribute code, fantastic. [Please note that all code in Pippin is subject to the Black formatter](https://black.readthedocs.io/en/stable/).
-I would recommend installing this yourself because it's a great tool.
-
-
-## Examples
-
-If you want detailed examples of what you can do with Pippin tasks, have a look in the [examples directory](https://github.com/dessn/Pippin/tree/master/examples),
-pick the task you want to know more about, and have a look over all the options.
-
-Here is a very simple configuration file which runs a simulation, does light curve fitting, and then classifies it using the
-debug FITPROB classifier.
-
-```yaml
-SIM:
- DESSIM:
- IA_G10_DES3YR:
- BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-
-LCFIT:
- BASEDES:
- BASE: surveys/des/lcfit_nml/des_5yr.nml
-
-CLASSIFICATION:
- FITPROBTEST:
- CLASSIFIER: FitProbClassifier
- MODE: predict
-```
-
-You can see that unless you specify a `MASK` on each subsequent task, Pippin will generally try and run everything on everything. So if you have two
-simulations defined, you don't need two light curve fitting tasks; Pippin will make one light curve fit task for each simulation, and then two classification tasks,
-one for each light curve fit task.
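-
-If you do want to restrict which tasks feed into a later stage, `MASK` is the tool for that. Here is a minimal sketch (the `LOWZSIM` task and its input path are illustrative placeholders, not files shipped with Pippin):
-
-```yaml
-SIM:
-  DESSIM:
-    IA_G10_DES3YR:
-      BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-  LOWZSIM:
-    IA_G10_LOWZ:
-      BASE: surveys/lowz/sim_ia/sn_ia_salt2_g10_lowz.input  # placeholder path for illustration
-
-LCFIT:
-  BASEDES:
-    BASE: surveys/des/lcfit_nml/des_5yr.nml
-    MASK: DESSIM  # partial match, so only DESSIM gets light curve fitting from this task
-```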
-
-### Anchoring in YAML files
-
-If you are finding that your config files contain lots of duplicated sections (for example, many simulations configured
-almost the same way but with one difference), consider using YAML anchors. [See this blog post](https://blog.daemonl.com/2016/02/yaml.html)
-for more detail. You can define your anchors in the main config section, or add a new section (like SIM, LCFIT, CLASSIFICATION). So long as it doesn't
-match a Pippin keyword for each stage, you'll be fine. I recommend an `ANCHORS:` section at the top of the file; any of those approaches will work.
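-
-As a minimal sketch (the anchor and task names are illustrative placeholders; this relies on standard YAML anchor/alias and merge-key support, which most Python loaders, including PyYAML, provide):
-
-```yaml
-ANCHORS:
-  BASE_IA: &base_ia
-    IA_G10:
-      BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-
-SIM:
-  SIM_NOMINAL:
-    <<: *base_ia
-  SIM_DOUBLE_RATE:
-    <<: *base_ia
-    GLOBAL:
-      NGEN_UNIT: 2  # the one key we override on top of the shared anchor
-```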
-
-
-## FAQ
-
-**Pippin is crashing on some task and the error message isn't useful**
-
-Feel free to send me the log and stack, and I'll see what I can do to turn the exception into something
-more human-readable.
-
-**I want Pippin to run after I log out**
-
-Rather than redirecting Pippin output to a file or running it in the background, I *highly recommend* you run
-Pippin in a `screen` session.
-
-For example, if you are doing machine-learning testing, you may create a new screen session called `ml`
-by running `screen -S ml`. It will then launch a new instance of bash for you to play around in. conda **will not work out of the box**. To make
-it work again, run `conda deactivate` and then `conda activate`, and you can check this works by running `which python` and
-verifying it's pointing to the miniconda install. You can then run Pippin as per normal: `pippin.sh -v your_job.yml` and get the coloured output.
-To leave the screen session, but *still keep Pippin running even after you log out*, press `Ctrl-A, Ctrl-D`. As in one, and then the other, not `Ctrl-A-D`.
-This will detach from your screen session but keep it running. Just pressing `Ctrl-D` will disconnect and shut it down. To get back into your screen session,
-simply run `screen -r ml` to reattach. You can see your screen
-sessions using `screen -ls`.
-
-You may notice if you log in and out of midway that your screen sessions might not show up. This is because midway has multiple head nodes, and
-your screen session exists only on one of them. This is why when I ssh to midway I specify a specific login node instead
-of being assigned one. To make it simpler, I'd recommend setting up
-an alias like so to either `login1` or `login2`:
-
-```bash
-alias sshmidway="ssh username@midway2-login1.rcc.uchicago.edu"
-```
-
-**I want to modify a ton of files but don't want huge yml files, please help**
-
-You can modify input files and put them in a directory you own, and then tell Pippin to look there
-(in addition to the default location) when it's constructing your tasks. To do this, see [this example here](https://github.com/dessn/Pippin/blob/master/examples/global.yml),
-or use this code snippet at the top of your YAML file (it doesn't actually matter where in the file it goes):
-
-```yaml
-GLOBAL:
- DATA_DIRS:
- - /some/new/directory/with/your/files/in/it
-```
-
-**I want to use a different cfg.yml file!**
-
-```yaml
-GLOBAL:
- CFG_PATH: /your/path/here
-```
-**Stop rerunning my sims!**
-
-For big biascor sims it can be frustrating if you're trying to tweak biascor or later stages and sims kick off
-because of some trivial change. So use the `--ignore` or `-i` flag to ignore any undone tasks or tasks with
-hash disagreements in previous stages. To clarify, even tasks that do not have a hash, and have never been submitted, will
-not be run if that stage is set to be ignored.
-
-**I don't want to run these massive jobs again! Let me use external results!**
-
-Good news, everyone! Not only is there a dedicated config file for globally useful tasks, but it's easier than ever to slot them
-into your existing jobs. For useful precomputed work, such as biascor sims and trained machine learning classifiers, check out `$PIPPIN_OUTPUT/GLOBAL`.
-
-For an example on how to use these results, check out the reference 5YR analysis `ref_des_5yr.yml`. There are, in essence, two ways of
-including external tasks. Both operate the same way; one is just a bit more explicit than the other. The explicit way: when adding
-a task that is an *exact* replica of an external task, you can just add the `EXTERNAL` keyword. For example, in the reference 5YR analysis,
-all the biascor sims are precomputed, so we can define them as external tasks like this:
-
-```yaml
-SIM:
- DESSIMBIAS5YRIA_C11: # A SIM task we don't want to rerun
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_C11 # The path to a matching external SIM task, which is already finished
- DESSIMBIAS5YRIA_G10:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_G10
- DESSIMBIAS5YRCC:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRCC
-```
-
-In this case, we use the `EXTERNAL` keyword because each of the three defined tasks can only be associated with one, and only one, `EXTERNAL` task. Because `EXTERNAL` tasks are one-to-one with a defined task, the name of the defined task and the name of the `EXTERNAL` task do not need to match.
-
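-For instance, a sketch that reuses one of the precomputed sims above under a different local name (the task name `MYBIASCORSIM` is an illustrative placeholder) works just as well:
-
-```yaml
-SIM:
-  MYBIASCORSIM:
-    EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_G10
-```
-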
-Suppose we don't want to recompute the light curve fits. After all, most of the time we're not changing that step anyway! However, unlike `SIM`, `LCFIT` runs multiple sub-tasks - one for each `SIM` task you are performing lightcurve fitting on.
-
-```yaml
-LCFIT:
- D: # An LCFIT task we don't want to rerun
- BASE: surveys/des/lcfit_nml/des_5yr.nml
- MASK: DESSIM # Selects a subset of SIM tasks to run lightcurve fitting on
- # In this case, the SIM tasks are DESSIMBIAS5YRIA_C11, DESSIMBIAS5YRIA_G10, and DESSIMBIAS5YRCC
- EXTERNAL_DIRS:
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_C11 # Path to a previously run LCFIT sub-task
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_G10
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRCC
-```
-
-That is, we have one `LCFIT` task, but because we have three sims going into it and matching the mask, we can't point to a single `EXTERNAL` task. Instead, we provide an external path for each sub-task, as defined in `EXTERNAL_DIRS`. The name of each external sub-task must exactly match both the `LCFIT` task name and the `SIM` sub-task name. For example, the directory for the `DESSIMBIAS5YRIA_C11` lightcurve fits must be named `D_DESSIMBIAS5YRIA_C11`.
-
-Note that you still need to point to the right base file, because Pippin still wants those details. It won't be submitted anywhere though, just loaded in.
-
-To use `EXTERNAL_DIRS` on pre-computed tasks that don't follow your current naming scheme (i.e the `LCFIT` task name, or the `SIM` sub-task names differ), you can make use of `EXTERNAL_MAP` to provide a mapping between the `EXTERNAL_DIR` paths, and each `LCFIT` sub-task.
-
-```yaml
-LCFIT:
- D: # An LCFIT task we don't want to rerun
- BASE: surveys/des/lcfit_nml/des_5yr.nml
- MASK: DESSIM # Selects a subset of SIM tasks to run lightcurve fitting on
- EXTERNAL_DIRS: # Paths to external LCFIT tasks, which do not have an exact match with this task
- - $PIPPIN_OUTPUT/EXAMPLE_C11/2_LCFIT/DESFIT_SIM
- - $PIPPIN_OUTPUT/EXAMPLE_G10/2_LCFIT/DESFIT_SIM
- - $PIPPIN_OUTPUT/EXAMPLE/2_LCFIT/DESFIT_CCSIM
- EXTERNAL_MAP:
- # LCFIT_SIM: EXTERNAL_MASK
- D_DESSIMBIAS5YRIA_C11: EXAMPLE_C11 # In this case we are matching to the pippin job name, as the LCFIT task name is shared between two EXTERNAL_DIRS
- D_DESSIMBIAS5YRIA_G10: EXAMPLE_G10 # Same as C11
- D_DESSIMBIAS5YRCC: DESFIT_CCSIM # In this case we match to the LCFIT task name, as the pippin job name (EXAMPLE) would match with the other EXTERNAL_DIRS
-```
-
-The flexibility of `EXTERNAL_DIRS` means you can mix both precomputed and non-precomputed tasks together. Take this classification task:
-
-```yaml
-CLASSIFICATION:
- SNNTEST:
- CLASSIFIER: SuperNNovaClassifier
- MODE: predict
- OPTS:
- MODEL: $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTRAIN_DESTRAIN/model.pt
- EXTERNAL_DIRS:
- - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRIA_C11_SNNTRAIN_DESTRAIN
- - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRIA_G10_SNNTRAIN_DESTRAIN
- - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRCC_SNNTRAIN_DESTRAIN
-```
-
-It will load in the precomputed classification results for the biascor sims, and then also run and generate classification results on any other
-simulation tasks (such as running on the data) using the pretrained model `model.pt`.
-
-Finally, the way this works under the hood is simple - it copies the directory over explicitly. And it will only copy once, so if you want the
-"latest version" just ask the task to refresh (or delete the folder). Once it copies it, there is no normal hash checking,
-it reads in the `config.yml` file created by the task in its initial run and powers onwards.
-
-If you have any issues using this new feature, check out the `ref_des_5yr.yml` file or flick me a message.
-
-## Tasks
-
-Pippin is essentially a wrapper around many different tasks. In this section,
-I'll try and explain how tasks are related to each other, and what each task is.
-
-As a general note, most tasks have an `OPTS` section where most details go. This is partially historical, but essentially properties
-that Pippin uses to determine how to construct tasks (like `MASK`, classification mode, etc) are top level, and the Task itself gets passed everything
-inside `OPTS` to use however it wants.
-
-[//]: # (Start of Task specification)
-
-### Data Preparation
-
-The DataPrep task is simple - it is mostly a pointer for Pippin towards an external directory that contains
-some photometry, to say we're going to make use of it. Normally this means data files,
-though you can also use it to point to simulations that have already been run to save yourself
-the hassle of rerunning them. The other thing the DataPrep task will do is run the new
-method of determining a viable initial guess for the peak time, which will be used by the light curve fitting task down the road.
-The full options available for the DataPrep task are:
-
-```yaml
-DATAPREP:
- SOMENAME:
- OPTS:
-
- # Location of the photometry files
- RAW_DIR: $DES_ROOT/lcmerge/DESALL_forcePhoto_real_snana_fits
-
- # Specify which types are confirmed Ia's, confirmed CC or unconfirmed. Used by ML down the line
- TYPES:
- IA: [101, 1]
- NONIA: [20, 30, 120, 130]
-
- # Blind the data. Defaults to True if SIM:True not set
- BLIND: False
-
- # Defaults to False. Important to set this flag if analysing a sim in the same way as data, as there
- # are some subtle differences
- SIM: False
-
- # The method of estimating peak mjd values. Don't ask me what numbers mean what, ask Rick.
- OPT_SETPKMJD: 16
-
-```
-
-### Simulation
-
-The simulation task does exactly what you'd think it does. It invokes [SNANA](https://github.com/RickKessler/SNANA) to run some simulation as per your configuration.
-If something goes wrong, Pippin tries to dig through the log files to give you a useful error message, but sometimes this
-is difficult (i.e. the logs have been zipped up). With the current version of SNANA, each simulation can have at most one Ia component,
-and an arbitrary number of CC components. The specification for the simulation task config is as follows:
-
-```yaml
-SIM:
- SOMENAMEHERE:
-
- # We specify the Ia component, so it must have IA in its name
- IA_G10:
- BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des5yr.input # And then we specify the base input file which generates it.
-
- # Now we can specify as many CC sims to mix in as we want
- II_JONES:
- BASE: surveys/des/sims_cc/sn_collection_jones.input
-
- IAX:
- BASE: surveys/des/sims_cc/sn_iax.input
- DNDZ_ALLSCALE: 3.0 # Note you can add/overwrite keys like so for specific files
-
- # This section will apply to all components of the sim
- GLOBAL:
- NGEN_UNIT: 1
- RANSEED_REPEAT: 10 12345
-```
-
-### Light Curve Fit
-
-This task runs the SALT2 light curve fitting process on light curves from the simulation or DataPrep task. As above,
-if something goes wrong I try and give a good reason why; if you don't get a good reason, let me know. The task is
-specified like so:
-
-```yaml
-LCFIT:
- SOMENAMEHERE:
- # MASK means only apply this light curve fitting on sims/Dataprep which have DES in the name
- # You can also specify a list for this, and they will be applied as a logical or
- MASK: DES
-
- # The base nml file used
- BASE: surveys/des/lcfit_nml/des.nml
-
- # FITOPTS can be left out for nothing, pointed to a file, specified manually or a combination of the two
- # Normally this would be a single entry like global.yml shown below, but you can also pass a list
- # If you specify a FITOPT manually, make sure it has the / around the label
- # And finally, if you specify a file, make sure it's a yml dictionary that links a survey name to the correct
- # fitopts. See the file below for an example
- FITOPTS:
- - surveys/global/lcfit_fitopts/global.yml
- - "/custom_extra_fitopt/ REDSHIFT_FINAL_SHIFT 0.0001"
-
- # We can optionally customise keys in the FITINP section
- FITINP:
- FILTLIST_FIT: 'gri'
-
- # And do the same for the optional SNLCINP section
- SNLCINP:
- CUTWIN_SNRMAX: 3.0, 1.0E8
- CUTWIN_NFILT_SNRMAX: 3.0, 99.
-
- # Finally, options that go outside either of these sections just go in the generic OPTS
- OPTS:
- BATCH_INFO: sbatch $SBATCH_TEMPLATES/SBATCH_Midway2_1hr.TEMPLATE 10
-```
-
-### Classification
-
-Within Pippin, there are many different classifiers implemented. Most classifiers need to be trained, and
-can then run in predict mode. All classifiers that require training can either be trained in the same yml
-file, or you can point to an external serialised instance of the trained class and use that. The general syntax
-for a classifier is:
-
-```yaml
-CLASSIFICATION:
- SOMELABEL:
- CLASSIFIER: NameOfTheClass
- MODE: train # or predict
- MASK: mask # Masks both sim and lcfit together, logical and, optional
- MASK_SIM: sim_only_mask
- MASK_FIT: lcfit_only_mask
- COMBINE_MASK: [SIM_IA, SIM_CC] # optional mask to combine multiple sim runs into one classification job (e.g. separate CC and Ia sims). NOTE: currently not compatible with SuperNNova/SNIRF
- OPTS:
- MODEL: file_or_label # only needed in predict mode, how to find the trained classifier
- OPTIONAL_MASK: opt_mask # mask for optional dependencies. Not all classifiers make use of this
- OPTIONAL_MASK_SIM: opt_sim_only_mask # mask for optional sim dependencies. Not all classifiers make use of this
- OPTIONAL_MASK_FIT: opt_lcfit_only_mask # mask for optional lcfit dependencies. Not all classifiers make use of this
- WHATREVER_THE: CLASSIFIER_NEEDS
-```
-
-#### SCONE Classifier
-
-The [SCONE classifier](https://github.com/helenqu/scone) is a convolutional neural network-based classifier for supernova photometry. The model first creates "heatmaps" of flux values in wavelength-time space, then runs the neural network model on GPU (if available) to train or predict on these heatmaps. A successful run will produce `predictions.csv`, which shows the Ia probability of each SN. For debugging purposes, the model config (`model_config.yml`), Slurm job (`job.slurm`), log (`output.log`), and all the heatmaps (`heatmaps/`) can be found in the output directory. An example of how to define a SCONE classifier:
-
-```yaml
-CLASSIFICATION:
- SCONE_TRAIN: # Helen's CNN classifier
- CLASSIFIER: SconeClassifier
- MODE: train
- OPTS:
- GPU: True # OPTIONAL, default: False
- # HEATMAP CREATION OPTS
- CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
- NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
- NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
- REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
- # MODEL OPTS
- NUM_EPOCHS: 400 # REQUIRED, number of training epochs
- IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
-
- SCONE_PREDICT: # Helen's CNN classifier
- CLASSIFIER: SconeClassifier
- MODE: predict
- OPTS:
- GPU: True # OPTIONAL, default: False
- # HEATMAP CREATION OPTS
- CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
- NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
- NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
- REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
- # MODEL OPTS
- MODEL: "/path/to/trained/model" # REQUIRED, path to trained model that should be used for prediction
- IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
-```
-
-#### SuperNNova Classifier
-
-The [SuperNNova classifier](https://github.com/supernnova/SuperNNova) is a recurrent neural network that
-operates on simulation photometry. It has three in-built variants - its normal (vanilla) mode, a Bayesian mode
-and a Variational mode. After training, a `model.pt` can be found in the output directory,
-which you can point to from a different yaml file. You can define a classifier like so:
-
-```yaml
-CLASSIFICATION:
- SNN_TEST:
- CLASSIFIER: SuperNNovaClassifier
- MODE: predict
- GPU: True # Or False - determines which queue it gets sent into
- CLEAN: True # Or False - determines if Pippin removes the processed folder to save space
- OPTS:
- MODEL: SNN_TRAIN # Haven't shown this defined. Or /somepath/to/model.pt
- VARIANT: vanilla # or "variational" or "bayesian". Defaults to "vanilla"
- REDSHIFT: True # What redshift info to use when classifying. Defaults to 'zspe'. Options are [True, False, 'zpho', 'zspe', or 'none']. True and False are legacy options which map to 'zspe', and 'none' respectively.
- NORM: cosmo_quantile # How to normalise LCs. Other options are "perfilter", "cosmo", "global" or "cosmo_quantile".
- CYCLIC: True # Defaults to True for vanilla and variational model
- SEED: 0 # Sets random seed. Defaults to 0.
- LIST_FILTERS: ['G', 'R', 'I', 'Z'] # What filters are present in the data, defaults to ['g', 'r', 'i', 'z']
- SNTYPES: "/path/to/sntypes.txt" # Path to a file which lists the sn type mapping to be used. Example syntax for this can be found at https://github.com/LSSTDESC/plasticc_alerts/blob/main/Examples/plasticc_schema/elasticc_origmap.txt. Alternatively, yaml dictionaries can be used to specify each sn type individually.
-```
-
-Pippin also allows for supernnova input yaml files to be passed, instead of having to define all of the options in the Pippin input yaml. This is done via:
-
-```yaml
-OPTS:
- DATA_YML: path/to/data_input.yml
- CLASSIFICATION_YML: path/to/classification_input.yml
-```
-
-Example input yaml files can be found [here](https://github.com/supernnova/SuperNNova/tree/master/configs_yml), with the important variation that you must have:
-
-```yaml
-raw_dir: RAW_DIR
-dump_dir: DUMP_DIR
-done_file: DONE_FILE
-```
-
-So that Pippin can automatically replace these with the appropriate directories.
-
-#### SNIRF Classifier
-
-The [SNIRF classifier](https://github.com/evevkovacs/ML-SN-Classifier) is a random forest running off SALT2 summary
-statistics. You can specify which features it gets to train on, which has a large impact on performance. After training,
-there should be a `model.pkl` in the output directory. You can specify one like so:
-
-```yaml
-CLASSIFICATION:
- SNIRF_TEST:
- CLASSIFIER: SnirfClassifier
- MODE: predict
- OPTS:
- MODEL: SNIRF_TRAIN
- FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
- FEATURES: x1 c zHD x1ERR cERR PKMJDERR # Columns to use. Defaults are shown. Check FITRES for options.
- N_ESTIMATORS: 100 # Number of trees in forest
- MIN_SAMPLES_SPLIT: 5 # Min number of samples to split a node on
- MIN_SAMPLES_LEAF: 1 # Minimum number samples in leaf node
- MAX_DEPTH: 0 # Max depth of tree. 0 means auto, which means as deep as it wants.
-```
-
-#### Nearest Neighbour Classifier
-
-Similar to SNIRF, NN trains on SALT2 summary statistics using a basic Nearest Neighbour algorithm from sklearn.
-It will produce a `model.pkl` file in its output directory when trained. You can configure it as per SNIRF:
-
-
-```yaml
-CLASSIFICATION:
- NN_TEST:
- CLASSIFIER: NearestNeighborPyClassifier
- MODE: predict
- OPTS:
- MODEL: NN_TRAIN
- FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
- FEATURES: zHD x1 c cERR x1ERR COV_x1_c COV_x1_x0 COV_c_x0 PKMJDERR # Columns to use. Defaults are shown.
-```
-
-#### Perfect Classifier
-
-Sometimes you want to cheat, and if you have simulations, this is easy. The perfect classifier looks into the sims to
-get the actual type, and will then assign probabilities as per your configuration. This classifier has no training mode,
-only predict.
-
-```yaml
-CLASSIFICATION:
- PERFECT:
- CLASSIFIER: PerfectClassifier
- MODE: predict
- OPTS:
- PROB_IA: 1.0 # Probs to use for Ia events, default 1.0
- PROB_CC: 0.0 # Probs to use for CC events, default 0.0
-```
-
-#### Unity Classifier
-
-To emulate a spectroscopically confirmed sample, or just to save time, we can assign every event a probability of 1.0
-that it is a type Ia. As it just returns 1.0 for everything, it only has a predict mode
-
-```yaml
-CLASSIFICATION:
- UNITY:
- CLASSIFIER: UnityClassifier
- MODE: predict
-```
-
-#### FitProb Classifier
-
-Another useful debug test is to just take the SALT2 fit probability calculated from the chi2 fitting and use that
-as our probability. You'd hope that classifiers all improve on this. Again, this classifier only has a predict mode.
-
-```yaml
-CLASSIFICATION:
- FITPROBTEST:
- CLASSIFIER: FitProbClassifier
- MODE: predict
-```
-
-### Aggregation
-
-The aggregation task takes results from one or more classification tasks (that have been run in predict mode
-on the same dataset) and generates comparisons between the classifiers (their correlations, PR curves, ROC curves
-and their calibration plots). Additionally, it merges the results of the classifiers into a single
-csv file, mapping SNID to one column per classifier.
-
-```yaml
-AGGREGATION:
- SOMELABEL:
- MASK: mask # Match sim AND classifier
- MASK_SIM: mask # Match only sim
- MASK_CLAS: mask # Match only classifier
- RECALIBRATION: SIMNAME # Optional, use this simulation to recalibrate probabilities. Default no recal.
- # Optional, changes the probability column name of each classification task listed into the given probability column name.
- # Note that this will crash if the same classification task is given multiple probability column names.
- # Mostly used when you have multiple photometrically classified samples
- MERGE_CLASSIFIERS:
- PROB_COLUMN_NAME: [CLASS_TASK_1, CLASS_TASK_2, ...]
- OPTS:
- PLOT: True # Default True, make plots
- PLOT_ALL: False # Default False. I.e. if RANSEED_CHANGE gives you 100 sims, make 100 sets of plots.
-```
-
-### Merging
-
-The merging task will take the outputs of the aggregation task, and put the probabilities from each classifier
-into the light curve fit results (FITRES files) using SNID.
-
-```yaml
-MERGE:
- label:
- MASK: mask # partial match on all sim, fit and agg
- MASK_SIM: mask # partial match on sim
- MASK_FIT: mask # partial match on lcfit
- MASK_AGG: mask # partial match on aggregation task
-```
-
-### Bias Corrections
-
-With all the probability goodness now in the FITRES files, we can move onto calculating bias corrections.
-For spec-confirmed surveys, you only need a Ia sample for bias corrections. For surveys with contamination,
-you will also need a CC only simulation/lcfit result. For each survey being used (as we would often combine lowz and highz
-surveys), you can specify inputs like below.
-
-Note that I expect this task to have the most teething issues, especially when we jump into the MUOPTS.
-
-```yaml
-BIASCOR:
- LABEL:
- # The base input file to utilise
- BASE: surveys/des/bbc/bbc.input
-
- # The names of the lcfits_data/simulations going in. List format please. Note LcfitLabel_SimLabel format
- DATA: [DESFIT_DESSIM, LOWZFIT_LOWZSIM]
-
- # Input Ia bias correction simulations to be concatenated
- SIMFILE_BIASCOR: [DESFIT_DESBIASCOR, LOWZFIT_LOWZBIASCOR]
-
- # Optional, specify FITOPT to use. Defaults to 0 for each SIMFILE_BIASCOR. If using this option, you must specify a FITOPT for each SIMFILE_BIASCOR
- SIMFILE_BIASCOR_FITOPTS: [0, 1] # FITOPT000 and FITOPT001
-
- # For surveys that have contamination, add in the cc only simulation under CCPRIOR
- SIMFILE_CCPRIOR: DESFIT_DESSIMBIAS5YRCC
-
- # Optional, specify FITOPT to use. Defaults to 0 for each SIMFILE_CCPRIOR. If using this option, you must specify a FITOPT for each SIMFILE_CCPRIOR
- SIMFILE_CCPRIOR_FITOPTS: [0, 1] # FITOPT000 and FITOPT001
-
-
- # Which classifier to use. Column name in FITRES will be determined from this property.
- # In the case of multiple classifiers this can either be
- # 1. A list of classifiers which map to the same probability column name (as defined by MERGE_CLASSIFIERS in the AGGREGATION stage)
- # 2. A probability column name (as defined by MERGE_CLASSIFIERS in the AGGREGATION stage)
- # Note that this will crash if the specified classifiers do not map to the same probability column.
- CLASSIFIER: UNITY
-
- # Default False. If multiple sims (RANSEED_CHANGE), make one or all Hubble plots.
- MAKE_ALL_HUBBLE: False
-
- # Defaults to False. Will load in the recalibrated probabilities, and crash and burn if they dont exist.
- USE_RECALIBRATED: True
-
- # Defaults to True. If set to True, will rerun biascor twice, removing any SNID that got dropped in any FITOPT/MUOPT
- CONSISTENT_SAMPLE: False
-
-
- # We can also specify muopts to add in systematics. They share the structure of the main biascor definition
- # You can have multiple, use a dict structure, with the muopt name being the key
- MUOPTS:
- C11:
- SIMFILE_BIASCOR: [D_DESBIASSYS_C11, L_LOWZBIASSYS_C11]
- SCALE: 0.5 # Defaults to 1.0 scale, used by CREATE_COV to determine covariance matrix contribution
-
- # Generic OPTS that can modify the base file and overwrite properties
- OPTS:
- BATCH_INFO: sbatch $SBATCH_TEMPLATES/SBATCH_Midway2_1hr.TEMPLATE 10
-```
-
-For those that generate large simulations and want to cut them up into little pieces, you want the `NSPLITRAN` syntax.
-The configuration below will take the inputs and divide them into 10 samples, which will then propagate to 10 CosmoMC runs
-if you have a CosmoMC task defined.
-
-```yaml
-BIASCOR:
- LABEL:
- BASE: surveys/des/bbc/bbc_3yr.input
- DATA: [D_DES_G10]
- SIMFILE_BIASCOR: [D_DESSIMBIAS3YRIA_G10]
- PROB_COLUMN_NAME: some_column_name # optional instead of CLASSIFIER
- OPTS:
- NSPLITRAN: 10
-```
-
-### Create Covariance
-
-Assuming the biascor task hasn't died, it's time to prep for CosmoMC. To do this, we invoke a script from Dan originally
-(I think) that essentially creates all the input files and structure needed by CosmoMC. It provides a way of scaling
-systematics, and determining which covariance options to run with.
-
-```yaml
-CREATE_COV:
- SOMELABEL:
- MASK: some_biascor_task
- OPTS:
- INI_DIR: /path/to/your/own/dir/of/cosmomc/templates # Defaults to cosmomc_templates, which you can exploit using DATA_DIRS
- SYS_SCALE: surveys/global/lcfit_fitopts/global.yml # Location of systematic scaling file, same as the FITOPTS file.
- SINGULAR_BLIND: False # Defaults to False, whether different contours will have different shifts applied
- BINNED: True # Whether to bin the SN or not for the covariance matrix. Defaults to True
- REBINNED_X1: 2 # Rebin x1 into 2 bins
- REBINNED_C: 4 # Rebin c into 4 bins
- SUBTRACT_VPEC: False # Subtract VPEC contribution to MUERR if True. Used when BINNED: False
- FITOPT_SCALES: # Optional
- FITOPT_LABEL: some_scale # Note this is a partial match, ie SALT2: 1.0 would apply to all SALT2 cal fitopts
- MUOPT_SCALES:
- MUOPT_LABEL: some_scale # This is NOT a partial match, must be exact
- COVOPTS: # Optional, and you'll always get an 'ALL' covopt. List format please
- - "[NOSYS] [=DEFAULT,=DEFAULT]" # This syntax is explained below
-```
-
-If you don't specify `SYS_SCALE`, Pippin will search the LCFIT tasks from the BIASCOR dependency and if all LCFIT tasks
-have the same fitopt file, it will use that.
-
-The `COVOPTS` section is a bit odd. In the square brackets first, we have the label that will be assigned and used
-in the plotting output later. The next set of square brackets is a two-tuple, and it applies to `[fitopts,muopts]` in
-that order. For example, to get four contours out of CosmoMC corresponding to all uncertainty, statistics only,
-statistics + calibration uncertainty, and fitopts + C11 uncertainty, we could set:
-
-```yaml
-COVOPTS:
- - "[NOSYS] [=DEFAULT,=DEFAULT]"
- - "[CALIBRATION] [+cal,=DEFAULT]"
- - "[SCATTER] [=DEFAULT,=C11]"
-```
-
-### CosmoFit
-
-CosmoFit is a generic cosmological fitting task, which allows you to choose between different fitters.
-The syntax is very simple:
-```yaml
-COSMOFIT:
- COSMOMC:
- SOMELABEL:
- # CosmoMC options
- WFIT:
- SOMEOTHERLABEL:
- # WFit options
-```
-
-#### CosmoMC
-
-Launching CosmoMC is hopefully fairly simple. There are a list of provided configurations under the `cosmomc_templates`
-directory (inside `data_files`), and the main job of the user is to pick which one they want.
-
-```yaml
-COSMOFIT:
- COSMOMC:
- SOMELABEL:
- MASK_CREATE_COV: mask # partial match
- OPTS:
- INI: sn_cmb_omw # should match the filename of an ini file
- NUM_WALKERS: 8 # Optional, defaults to eight.
-
- # Optional, covopts from CREATE_COV step to run against. If blank, you get them all. Exact matching.
- COVOPTS: [ALL, NOSYS]
-```
-
-#### WFit
-
-Launching WFit simply requires providing the command line options you want to use for each fit.
-```yaml
-COSMOFIT:
- WFIT:
- SOMELABEL:
- MASK: mask # partial match
- OPTS:
- BATCH_INFO: sbatch path/to/SBATCH.TEMPLATE 10 # Last number is the number of cores
- WFITOPT_GLOBAL: "-hsteps 61 -wsteps 101 -omsteps 81" # Optional, will apply these options to all fits
- WFITOPTS:
- - /om_pri/ -ompri 0.31 -dompri 0.01 # At least one option is required. The name in the /'s is a human readable label
- - /cmb_pri/ -cmb_sim -sigma_Rcmb 0.007 # Optionally include as many other fitopts as you want.
-
-```
-
-### Analyse
-
-The final step in the Pippin pipeline is the Analyse task. It creates a final output directory, moves relevant files into it,
-and generates extra plots. It will save out compressed CosmoMC chains and the plotting scripts (so you can download
-the entire directory and customise it without worrying about pointing to external files), it will copy in Hubble diagrams,
-and, if you've told it to, will make histogram comparison plots between data and sim. Oh, and also
-redshift evolution plots. The scripts which copy/compress/rename external files into the analyse directory are generally
-named `parse_*.py`. So `parse_cosmomc.py` is the script which finds, reads and compresses the MCMC chains from CosmoMC into
-the output directory. Then `plot_cosmomc.py` reads those compressed files to make the plots.
-
-Cosmology contours will be blinded when made by looking at the BLIND flag set on the data. For data, this defaults to
-True.
-
-Note that all the plotting scripts work the same way - `Analyse` generates a small yaml file pointing to all the
-resources called `input.yml`, and each script uses the same file to make different plots. It is thus super easy to add your own
-plotting code scripts, and you can specify arbitrary code to execute using the `ADDITIONAL_SCRIPTS` keyword in opts.
-Just make sure your code takes `input.yml` as an argument. As an example, to rerun the CosmoMC plots, you'd simply have to
-run `python plot_cosmomc.py input.yml`.
-
-```yaml
-ANALYSE:
- SOMELABEL:
- MASK_COSMOFIT: mask # partial match
- MASK_BIASCOR: mask # partial match
- MASK_LCFIT: [D_DESSIM, D_DATADES] # Creates histograms and efficiency based off the input LCFIT_SIMNAME matches. Optional
- OPTS:
- COVOPTS: [ALL, NOSYS] # Optional. Covopts to match when making contours. Single or list. Exact match.
- SHIFT: False # Default False. Shift all the contours on top of each other
- PRIOR: 0.01 # Default to None. Optional normal prior around Om=0.3 to apply for sims if wanted.
- ADDITIONAL_SCRIPTS: /somepath/to/your/script.py # Should take the input.yml as an argument
-```
-
-[//]: # (End of Task specification)
-
-![Developer Documentation Below](docs/_static/images/developer.jpg)
-
-
-## Coding style
-
-Please, for the love of god, don't code this up in vim/emacs on a terminal connection. Use a proper IDE (I recommend
-PyCharm or VSCode), and **install the Black extension**! I have Black set up in PyCharm as a file watcher, and all
-python files, on save, are automatically formatted. Use a line width of 160 characters. Here is the Black file watcher config:
-
-![Black config](docs/_static/images/black.jpg)
-
-If everyone does this, then all files should remain consistent across different users.
-
-## Testing valid config in Pippin
-
-
-
-To ensure we don't break things when pushing out new code, the tests directory contains a set of
-tests progressively increasing in pipeline complexity, designed to ensure that existing config files
-act consistently regardless of code changes. Any failure in the tests means a break in backwards compatibility
-and should be discussed before being incorporated into a release.
-
-To run the tests, in the top level directory, simply run:
-
-`pytest -v .`
-
-
-
-## Adding a new task
-
-
-
-
-Alright there, you want to add a new task to Pippin? Great. Here's what you've got to do:
-
-1. Create an implementation of the `Task` class, can keep it empty for now.
-2. Figure out where it goes - in `manager.py` at the top you can see the current stages in Pippin.
-Once you have figured it out, import the task and slot it in.
-3. Back in your new class that extends Task, you'll notice you have a few methods to implement:
- 1. `_run()`: Kick the task off, report True or False for successful kicking off.
- To help with determining the hash and whether the task should run, there are a few handy functions:
- `_check_regenerate`, `get_hash_from_string`, `save_hash`, `get_hash_from_files`, `get_old_hash`. See, for example, the Analyse
- task for an example on how I use these.
- 2. `_check_completion(squeue)`: Check to see if the task (whether it's being rerun or not) is done.
- Normally I do this by checking for a done file, which contains either SUCCESS or FAILURE. For example, if submitting a script to a queuing system, I might have this after the primary command:
- ```batch
- if [ $? -eq 0 ]; then
- echo SUCCESS > {done_file}
- else
- echo FAILURE > {done_file}
- fi
- ```
- This allows me to easily see if a job failed or passed. On failure, I then generally recommend looking through the task logs and trying to figure out what went wrong, so you can present a useful message
- to your user.
- To then show that error, or **ANY MESSAGE TO THE USER**, use the provided logger:
- `self.logger.error("The task failed because of this reason")`.
-
- This method should return Task.FINISHED_FAILURE, Task.FINISHED_SUCCESS, or alternatively the number of jobs still in the queue, which you could figure out because I pass in all jobs the user has
- active in the variable squeue (which can sometimes be None).
- 3. `get_tasks(task_config, prior_tasks, output_dir, stage_num, prefix, global_config)`: From the given inputs, determine what tasks should be created, and create them, and then return them in a list. For context,
- here is the code I use to determine what simulation tasks to create:
- ```python
- @staticmethod
- def get_tasks(config, prior_tasks, base_output_dir, stage_number, prefix, global_config):
- tasks = []
- for sim_name in config.get("SIM", []):
- sim_output_dir = f"{base_output_dir}/{stage_number}_SIM/{sim_name}"
- s = SNANASimulation(sim_name, sim_output_dir, f"{prefix}_{sim_name}", config["SIM"][sim_name], global_config)
- Task.logger.debug(f"Creating simulation task {sim_name} with {s.num_jobs} jobs, output to {sim_output_dir}")
- tasks.append(s)
- return tasks
- ```
-
-
-
-## Adding a new classifier
-
-
-
-Alright, so what if we're not after a brand new task, but just adding another classifier. Well, it's easier to do, and I recommend looking at
-`nearest_neighbor_python.py` for something to copy from. You'll see we have the parent Classifier class, I write out the slurm script that
-would be used, and then define the `train` and `predict` method (which both invoke a general `classify` function in different ways, you can do this
-however you want.)
-
-You'll also notice a very simple `_check_completion` method, and a `get_requirements` method. The latter returns a two-tuple of booleans, indicating
-whether the classifier needs photometry and light curve fitting results respectively. For the NearestNeighbour code, it classifies based
-only on SALT2 features, so I return `(False, True)`.
-You can also define a `get_optional_requirements` method which, like `get_requirements`, returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results *for this particular run*. By default, this method returns:
-- `True, True` if `OPTIONAL_MASK` set in `OPTS`
-- `True, False` if `OPTIONAL_MASK_SIM` set in `OPTS`
-- `False, True` if `OPTIONAL_MASK_FIT` set in `OPTS`
-- `False, False` otherwise.
-
-If you define your own method based on classifier specific requirements, then these `OPTIONAL_MASK*` keys can still be set to choose which tasks are optionally included. If these are not set, then the normal `MASK`, `MASK_SIM`, and `MASK_FIT` are used instead. Note that if *no* masks are set then *every* sim or lcfit task will be included.
-
-Finally, you'll need to add your classifier into the ClassifierFactory in `classifiers/factory.py`, so that I can link a class name
-in the YAML configuration to your actual class. Yeah yeah, I could use reflection or dynamic module scanning or similar, but I've had issues getting
-the behaviour consistent across systems and conda environments, so we're doing it the hard way.
-
-
diff --git a/docs/index.rst b/docs/index.rst
new file mode 100644
index 00000000..9cc800eb
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,2 @@
+.. include:: README.md
+ :parser: myst_parser.sphinx_
diff --git a/docs/install.rst b/docs/install.rst
deleted file mode 100644
index a437d5f5..00000000
--- a/docs/install.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-#############
-Installation
-#############
-
-If you're using a pre-installed version of Pippin - like the one on Midway, ignore this.
-
-If you're not, installing Pippin is simple.
-
-1. Checkout Pippin ``git clone git@github.com:dessn/Pippin.git``
-2. Ensure you have the dependencies installed ``pip install -r requirements.txt`` and that your python version is 3.7+
-3. Celebrate!
-
-There is no need to attempt to install Pippin like a package (no ``python setup.py install``), just run from the clone.
-
-Now, Pippin also interfaces with other tasks: SNANA and machine learning classifiers mostly. I'd highly recommend running on a high performance computer with SNANA already installed, but if you want to take a crack at installing it, you can find the documentation `here <https://github.com/RickKessler/SNANA>`__.
-
-I won't cover installing SNANA here, hopefully you already have it. But to install the classifiers, we'll take `SuperNNova <https://github.com/supernnova/SuperNNova>`__ as an example. To install that, find a good place for it and:
-
-1. Checkout SuperNNova: ``git clone git@github.com:supernnova/SuperNNova.git``
-2. Create a GPU conda env for it: ``conda create --name snn_gpu --file env/conda_env_gpu_linux64.txt``
-3. Activate environment and install natsort: ``conda activate snn_gpu`` and ``conda install --yes natsort``
-
-Then, in the Pippin global configuration file ``cfg.yml`` in the top level directory, ensure that the SuperNNova path in Pippin is pointing to where you just cloned SuperNNova into. You will need to install the other external software packages if you want to use them, and you do not need to install any package you do not explicitly request in a config file.
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 5434413d..a7b0fc00 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,3 +1,5 @@
sphinx<8
sphinx_rtd_theme
+sphinx-rtd-dark-mode
myst-parser
+sphinxcontrib-youtube
diff --git a/docs/src/dev.md b/docs/src/dev.md
new file mode 100644
index 00000000..e2fbdffe
--- /dev/null
+++ b/docs/src/dev.md
@@ -0,0 +1,89 @@
+# Pippin Development
+
+## Issues and Contributing to Pippin
+
+Contributing to Pippin or raising issues is easy. Here are some ways you can do it, in order of preference:
+
+1. Submit an [issue on Github](https://github.com/dessn/Pippin/issues), and then submit a pull request to fix that issue.
+2. Submit an [issue on Github](https://github.com/dessn/Pippin/issues), and then wait until I have time to look at it. Hopefully that's quick, but no guarantees.
+3. Email me with a feature request
+
+If you do want to contribute code, fantastic. [Please note that all code in Pippin is subject to the Black formatter](https://black.readthedocs.io/en/stable/). I would recommend installing this yourself because it's a great tool.
+
+![Developer Documentation Below](../_static/images/developer.jpg)
+
+## Coding style
+
+Please, for the love of god, don't code this up in vim/emacs on a terminal connection[^1]. Use a proper IDE (I recommend PyCharm or VSCode), and **install the Black extension**! I have Black set up in PyCharm as a file watcher, and all python files, on save, are automatically formatted. Use a line width of 160 characters. Here is the Black file watcher config:
+
+![Black config](../_static/images/black.jpg)
+
+If everyone does this, then all files should remain consistent across different users.
+
+[^1]: {{patrick}}: Since taking over as primary developer, I have done nothing but code this up in vim on a terminal connection. It's not the worst thing you could possibly do. There's a [Black Linter](https://github.com/dessn/Pippin/actions/workflows/black-formatter.yml) github action which will trigger on pull requests to main, allowing you to format your contributions before merging.
+
+## Testing valid config in Pippin
+
+To ensure we don't break things when pushing out new code, the tests directory contains a set of tests progressively increasing in pipeline complexity, designed to ensure that existing config files act consistently regardless of code changes. Any failure in the tests means a break in backwards compatibility and should be discussed before being incorporated into a release.
+
+To run the tests, in the top level directory, simply run:
+
+`pytest -v .`
+
+## Adding a new task
+
+Alright there, you want to add a new task to Pippin? Great. Here's what you've got to do:
+
+1. Create an implementation of the `Task` class, can keep it empty for now.
+2. Figure out where it goes - in `manager.py` at the top you can see the current stages in Pippin. Once you have figured it out, import the task and slot it in.
+3. Back in your new class that extends Task, you'll notice you have a few methods to implement:
+ 1. `_run()`: Kick the task off, report True or False for successful kicking off. To help with determining the hash and whether the task should run, there are a few handy functions: `_check_regenerate`, `get_hash_from_string`, `save_hash`, `get_hash_from_files`, `get_old_hash`. See, for example, the Analyse task for an example on how I use these.
+ 2. `_check_completion(squeue)`: Check to see if the task (whether it's being rerun or not) is done. Normally I do this by checking for a done file, which contains either SUCCESS or FAILURE. For example, if submitting a script to a queuing system, I might have this after the primary command:
+ ```sh
+ if [ $? -eq 0 ]; then
+ echo SUCCESS > {done_file}
+ else
+ echo FAILURE > {done_file}
+ fi
+ ```
+ This allows me to easily see if a job failed or passed. On failure, I then generally recommend looking through the task logs and trying to figure out what went wrong, so you can present a useful message to your user.
+ To then show that error, or **ANY MESSAGE TO THE USER**, use the provided logger:
+ `self.logger.error("The task failed because of this reason")`.
+
+ This method should return Task.FINISHED_FAILURE, Task.FINISHED_SUCCESS, or alternatively the number of jobs still in the queue, which you could figure out because I pass in all jobs the user has
+ active in the variable squeue (which can sometimes be None).
+ 3. `get_tasks(task_config, prior_tasks, output_dir, stage_num, prefix, global_config)`: From the given inputs, determine what tasks should be created, and create them, and then return them in a list. For context,
+ here is the code I use to determine what simulation tasks to create:
+ ```python
+ @staticmethod
+ def get_tasks(config, prior_tasks, base_output_dir, stage_number, prefix, global_config):
+ tasks = []
+ for sim_name in config.get("SIM", []):
+ sim_output_dir = f"{base_output_dir}/{stage_number}_SIM/{sim_name}"
+ s = SNANASimulation(sim_name, sim_output_dir, f"{prefix}_{sim_name}", config["SIM"][sim_name], global_config)
+ Task.logger.debug(f"Creating simulation task {sim_name} with {s.num_jobs} jobs, output to {sim_output_dir}")
+ tasks.append(s)
+ return tasks
+ ```
+
+## Adding a new classifier
+
+Alright, so what if we're not after a brand new task, but just adding another classifier. Well, it's easier to do, and I recommend looking at
+`nearest_neighbor_python.py` for something to copy from. You'll see we have the parent Classifier class, I write out the slurm script that
+would be used, and then define the `train` and `predict` method (which both invoke a general `classify` function in different ways, you can do this
+however you want.)
+
+You'll also notice a very simple `_check_completion` method, and a `get_requirements` method. The latter returns a two-tuple of booleans, indicating
+whether the classifier needs photometry and light curve fitting results respectively. For the NearestNeighbour code, it classifies based
+only on SALT2 features, so I return `(False, True)`.
+You can also define a `get_optional_requirements` method which, like `get_requirements`, returns a two-tuple of booleans, indicating whether the classifier needs photometry and light curve fitting results *for this particular run*. By default, this method returns:
+- `True, True` if `OPTIONAL_MASK` set in `OPTS`
+- `True, False` if `OPTIONAL_MASK_SIM` set in `OPTS`
+- `False, True` if `OPTIONAL_MASK_FIT` set in `OPTS`
+- `False, False` otherwise.
+
+If you define your own method based on classifier specific requirements, then these `OPTIONAL_MASK*` keys can still be set to choose which tasks are optionally included. If these are not set, then the normal `MASK`, `MASK_SIM`, and `MASK_FIT` are used instead. Note that if *no* masks are set then *every* sim or lcfit task will be included.
+
+Finally, you'll need to add your classifier into the ClassifierFactory in `classifiers/factory.py`, so that I can link a class name
+in the YAML configuration to your actual class. Yeah yeah, I could use reflection or dynamic module scanning or similar, but I've had issues getting
+the behaviour consistent across systems and conda environments, so we're doing it the hard way.
diff --git a/docs/src/install.md b/docs/src/install.md
new file mode 100644
index 00000000..6f839abf
--- /dev/null
+++ b/docs/src/install.md
@@ -0,0 +1,29 @@
+# Installation
+
+If you're using a pre-installed version of Pippin - like the one on Midway, ignore this.
+
+If you're not, installing Pippin is simple.
+
+1. Checkout Pippin
+2. Ensure you have the dependencies installed (`pip install -r requirements.txt`) and that your python version is 3.7+.
+3. Celebrate
+
+There is no need to attempt to install Pippin like a package (no `python setup.py install`), just run from the clone.
+
+Now, Pippin also interfaces with other software, including:
+- [SNANA](https://github.com/RickKessler/SNANA)
+- [SuperNNova](https://github.com/supernnova/SuperNNova)
+- [SNIRF](https://github.com/evevkovacs/ML-SN-Classifier)
+- [DataSkimmer](https://github.com/supernnova/DES_SNN)
+- [SCONE](https://github.com/helenqu/scone)
+
+When it comes to installing SNANA, the best method is to already have it installed on a high performance server you have access to[^1]. However, installing the other software used by Pippin should be far simpler. Taking [SuperNNova](https://github.com/supernnova/SuperNNova) as an example:
+
+1. In an appropriate directory `git clone https://github.com/SuperNNova/SuperNNova`
+2. Create a GPU conda env for it: `conda create --name snn_gpu --file env/conda_env_gpu_linux64.txt`
+3. Activate environment and install natsort: `conda activate snn_gpu` and `conda install --yes natsort`
+
+Then, in the Pippin global configuration file, [cfg.yml](https://github.com/dessn/Pippin/blob/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/cfg.yml) in the top level directory, ensure that the `SuperNNova: location` path is pointing to where you just cloned SNN into. You will need to install the other external software packages if you want to use them, and you do not need to install any package you do not explicitly request in a config file[^2].
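+
+For reference, here is a minimal sketch of the relevant snippet of `cfg.yml` (the exact surrounding structure may differ; treat the linked `cfg.yml` as authoritative):
+
+```yaml
+SuperNNova:
+  location: /path/to/where/you/cloned/SuperNNova
+```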
+
+[^1]: {{patrick}}: I am ***eventually*** going to attempt to create an SNANA docker image, but that's likely far down the line.
+[^2]: {{patrick}}: If Pippin is complaining about a missing software package which you aren't using, please file an issue.
diff --git a/docs/tasks.rst b/docs/src/tasks.md
similarity index 70%
rename from docs/tasks.rst
rename to docs/src/tasks.md
index 40e9cdf5..571c8765 100644
--- a/docs/tasks.rst
+++ b/docs/src/tasks.md
@@ -1,12 +1,20 @@
-#####
-Tasks
-#####
+# Tasks
Pippin is essentially a wrapper around many different tasks. In this section, I'll try and explain how tasks are related to each other, and what each task is.
As a general note, most tasks have an ``OPTS`` section where most details go. This is partially historical, but essentially properties that Pippin uses to determine how to construct tasks (like ``MASK``, classification mode, etc) are top level, and the Task itself gets passed everything inside OPTS to use however it wants.
-.. toctree::
- :maxdepth: 2
+:::{toctree}
+:maxdepth: 1
- tasks/dataprep
+tasks/dataprep.md
+tasks/sim.md
+tasks/lcfit.md
+tasks/classify.md
+tasks/agg.md
+tasks/merge.md
+tasks/biascor.md
+tasks/createcov.md
+tasks/cosmofit.md
+tasks/analyse.md
+:::
diff --git a/docs/src/tasks/agg.md b/docs/src/tasks/agg.md
new file mode 100644
index 00000000..f2af8611
--- /dev/null
+++ b/docs/src/tasks/agg.md
@@ -0,0 +1,20 @@
+# 4. AGGREGATION
+
+The aggregation task takes results from one or more classification tasks (that have been run in predict mode on the same dataset) and generates comparisons between the classifiers (their correlations, PR curves, ROC curves and their calibration plots). Additionally, it merges the results of the classifiers into a single csv file, mapping SNID to one column per classifier.
+
+```yaml
+AGGREGATION:
+ SOMELABEL:
+ MASK: mask # Match sim AND classifier
+ MASK_SIM: mask # Match only sim
+ MASK_CLAS: mask # Match only classifier
+ RECALIBRATION: SIMNAME # Optional, use this simulation to recalibrate probabilities. Default no recal.
+ # Optional, changes the probability column name of each classification task listed into the given probability column name.
+ # Note that this will crash if the same classification task is given multiple probability column names.
+ # Mostly used when you have multiple photometrically classified samples
+ MERGE_CLASSIFIERS:
+ PROB_COLUMN_NAME: [CLASS_TASK_1, CLASS_TASK_2, ...]
+ OPTS:
+ PLOT: True # Default True, make plots
+    PLOT_ALL: False # Default False. I.e. if RANSEED_CHANGE gives you 100 sims, make 100 sets of plots.
+```
diff --git a/docs/src/tasks/analyse.md b/docs/src/tasks/analyse.md
new file mode 100644
index 00000000..ab77c274
--- /dev/null
+++ b/docs/src/tasks/analyse.md
@@ -0,0 +1,20 @@
+# 9. ANALYSE
+
+The final step in the Pippin pipeline is the Analyse task. It creates a final output directory, moves relevant files into it, and generates extra plots. It will save out compressed CosmoMC chains and the plotting scripts (so you can download the entire directory and customise it without worrying about pointing to external files), it will copy in Hubble diagrams, and - if you've told it to - it will make histogram comparison plots between data and sim, as well as redshift evolution plots. The scripts which copy/compress/rename external files into the analyse directory are generally named `parse_*.py`. So `parse_cosmomc.py` is the script which finds, reads and compresses the MCMC chains from CosmoMC into the output directory. Then `plot_cosmomc.py` reads those compressed files to make the plots.
+
+Whether cosmology contours are blinded is determined by the `BLIND` flag set on the data; for real data, this defaults to True.
+
+Note that all the plotting scripts work the same way - `Analyse` generates a small yaml file called `input.yml` pointing to all the resources, and each script uses the same file to make different plots. It is thus super easy to add your own plotting scripts, and you can specify arbitrary code to execute using the `ADDITIONAL_SCRIPTS` keyword in opts. Just make sure your code takes `input.yml` as an argument. As an example, to rerun the CosmoMC plots, you'd simply have to run `python plot_cosmomc.py input.yml`.
+
+```yaml
+ANALYSE:
+ SOMELABEL:
+ MASK_COSMOFIT: mask # partial match
+ MASK_BIASCOR: mask # partial match
+ MASK_LCFIT: [D_DESSIM, D_DATADES] # Creates histograms and efficiency based off the input LCFIT_SIMNAME matches. Optional
+ OPTS:
+ COVOPTS: [ALL, NOSYS] # Optional. Covopts to match when making contours. Single or list. Exact match.
+    SHIFT: False # Default False. Shift all the contours on top of each other
+    PRIOR: 0.01 # Defaults to None. Optional normal prior around Om=0.3 to apply for sims if wanted.
+ ADDITIONAL_SCRIPTS: /somepath/to/your/script.py # Should take the input.yml as an argument
+```
diff --git a/docs/src/tasks/biascor.md b/docs/src/tasks/biascor.md
new file mode 100644
index 00000000..b1fe3a68
--- /dev/null
+++ b/docs/src/tasks/biascor.md
@@ -0,0 +1,67 @@
+# 6. BIASCOR
+
+With all the probability goodness now in the FITRES files, we can move onto calculating bias corrections. For spec-confirmed surveys, you only need a Ia sample for bias corrections. For surveys with contamination, you will also need a CC only simulation/lcfit result. For each survey being used (as we would often combine lowz and highz surveys), you can specify inputs like below.
+
+```yaml
+BIASCOR:
+ LABEL:
+ # The base input file to utilise
+ BASE: surveys/des/bbc/bbc.input
+
+ # The names of the lcfits_data/simulations going in. List format please. Note LcfitLabel_SimLabel format
+ DATA: [DESFIT_DESSIM, LOWZFIT_LOWZSIM]
+
+ # Input Ia bias correction simulations to be concatenated
+ SIMFILE_BIASCOR: [DESFIT_DESBIASCOR, LOWZFIT_LOWZBIASCOR]
+
+ # Optional, specify FITOPT to use. Defaults to 0 for each SIMFILE_BIASCOR. If using this option, you must specify a FITOPT for each SIMFILE_BIASCOR
+ SIMFILE_BIASCOR_FITOPTS: [0, 1] # FITOPT000 and FITOPT001
+
+ # For surveys that have contamination, add in the cc only simulation under CCPRIOR
+ SIMFILE_CCPRIOR: DESFIT_DESSIMBIAS5YRCC
+
+ # Optional, specify FITOPT to use. Defaults to 0 for each SIMFILE_CCPRIOR. If using this option, you must specify a FITOPT for each SIMFILE_CCPRIOR
+ SIMFILE_CCPRIOR_FITOPTS: [0, 1] # FITOPT000 and FITOPT001
+
+
+ # Which classifier to use. Column name in FITRES will be determined from this property.
+ # In the case of multiple classifiers this can either be
+ # 1. A list of classifiers which map to the same probability column name (as defined by MERGE_CLASSIFIERS in the AGGREGATION stage)
+ # 2. A probability column name (as defined by MERGE_CLASSIFIERS in the AGGREGATION stage)
+ # Note that this will crash if the specified classifiers do not map to the same probability column.
+ CLASSIFIER: UNITY
+
+ # Default False. If multiple sims (RANSEED_CHANGE), make one or all Hubble plots.
+ MAKE_ALL_HUBBLE: False
+
+  # Defaults to False. Will load in the recalibrated probabilities, and crash and burn if they don't exist.
+ USE_RECALIBRATED: True
+
+ # Defaults to True. If set to True, will rerun biascor twice, removing any SNID that got dropped in any FITOPT/MUOPT
+ CONSISTENT_SAMPLE: False
+
+
+ # We can also specify muopts to add in systematics. They share the structure of the main biascor definition
+ # You can have multiple, use a dict structure, with the muopt name being the key
+ MUOPTS:
+ C11:
+ SIMFILE_BIASCOR: [D_DESBIASSYS_C11, L_LOWZBIASSYS_C11]
+ SCALE: 0.5 # Defaults to 1.0 scale, used by CREATE_COV to determine covariance matrix contribution
+
+ # Generic OPTS that can modify the base file and overwrite properties
+  OPTS:
+ BATCH_INFO: sbatch $SBATCH_TEMPLATES/SBATCH_Midway2_1hr.TEMPLATE 10
+```
+
+For those that generate large simulations and want to cut them up into little pieces, you want the `NSPLITRAN` syntax. The configuration below will take the inputs and divide them into 10 samples, which will then propagate to 10 CosmoMC runs if you have a CosmoMC task defined.
+
+```yaml
+BIASCOR:
+ LABEL:
+ BASE: surveys/des/bbc/bbc_3yr.input
+ DATA: [D_DES_G10]
+ SIMFILE_BIASCOR: [D_DESSIMBIAS3YRIA_G10]
+ PROB_COLUMN_NAME: some_column_name # optional instead of CLASSIFIER
+ OPTS:
+ NSPLITRAN: 10
+```
diff --git a/docs/src/tasks/classify.md b/docs/src/tasks/classify.md
new file mode 100644
index 00000000..76fac603
--- /dev/null
+++ b/docs/src/tasks/classify.md
@@ -0,0 +1,165 @@
+# 3. CLASSIFICATION
+
+Within Pippin, there are many different classifiers implemented. Most classifiers need to be trained, and can then run in predict mode. All classifiers that require training can either be trained in the same yml file, or you can point to an external serialised instance of the trained class and use that. The general syntax for a classifier is:
+
+```yaml
+CLASSIFICATION:
+ SOMELABEL:
+ CLASSIFIER: NameOfTheClass
+ MODE: train # or predict
+ MASK: mask # Masks both sim and lcfit together, logical and, optional
+ MASK_SIM: sim_only_mask
+ MASK_FIT: lcfit_only_mask
+ COMBINE_MASK: [SIM_IA, SIM_CC] # optional mask to combine multiple sim runs into one classification job (e.g. separate CC and Ia sims). NOTE: currently not compatible with SuperNNova/SNIRF
+ OPTS:
+ MODEL: file_or_label # only needed in predict mode, how to find the trained classifier
+ OPTIONAL_MASK: opt_mask # mask for optional dependencies. Not all classifiers make use of this
+ OPTIONAL_MASK_SIM: opt_sim_only_mask # mask for optional sim dependencies. Not all classifiers make use of this
+ OPTIONAL_MASK_FIT: opt_lcfit_only_mask # mask for optional lcfit dependencies. Not all classifiers make use of this
+ WHATREVER_THE: CLASSIFIER_NEEDS
+```
+
+## SCONE Classifier
+
+The [SCONE classifier](https://github.com/helenqu/scone) is a convolutional neural network-based classifier for supernova photometry. The model first creates "heatmaps" of flux values in wavelength-time space, then runs the neural network model on GPU (if available) to train or predict on these heatmaps. A successful run will produce `predictions.csv`, which shows the Ia probability of each SN. For debugging purposes, the model config (`model_config.yml`), Slurm job (`job.slurm`), log (`output.log`), and all the heatmaps (`heatmaps/`) can be found in the output directory. An example of how to define a SCONE classifier:
+
+```yaml
+CLASSIFICATION:
+ SCONE_TRAIN: # Helen's CNN classifier
+ CLASSIFIER: SconeClassifier
+ MODE: train
+ OPTS:
+ GPU: True # OPTIONAL, default: False
+ # HEATMAP CREATION OPTS
+ CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
+ NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
+ NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
+ REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
+ # MODEL OPTS
+ NUM_EPOCHS: 400 # REQUIRED, number of training epochs
+ IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
+
+ SCONE_PREDICT: # Helen's CNN classifier
+ CLASSIFIER: SconeClassifier
+ MODE: predict
+ OPTS:
+ GPU: True # OPTIONAL, default: False
+ # HEATMAP CREATION OPTS
+ CATEGORICAL: True # OPTIONAL, binary or categorical classification, default: False
+ NUM_WAVELENGTH_BINS: 32 # OPTIONAL, heatmap height, default: 32
+ NUM_MJD_BINS: 180 # OPTIONAL, heatmap width, default: 180
+ REMAKE_HEATMAPS: False # OPTIONAL, SCONE does not remake heatmaps unless the 3_CLAS/heatmaps subdir doesn't exist or if this param is true, default: False
+ # MODEL OPTS
+ MODEL: "/path/to/trained/model" # REQUIRED, path to trained model that should be used for prediction
+ IA_FRACTION: 0.5 # OPTIONAL, desired Ia fraction in train/validation/test sets for binary classification, default: 0.5
+```
+
+## SuperNNova Classifier
+
+The [SuperNNova classifier](https://github.com/supernnova/SuperNNova) is a recurrent neural network that operates on simulation photometry. It has three in-built variants - its normal (vanilla) mode, a Bayesian mode and a Variational mode. After training, a `model.pt` can be found in the output directory, which you can point to from a different yaml file. You can define a classifier like so:
+
+```yaml
+CLASSIFICATION:
+ SNN_TEST:
+ CLASSIFIER: SuperNNovaClassifier
+ MODE: predict
+ GPU: True # Or False - determines which queue it gets sent into
+    CLEAN: True # Or False - determines whether Pippin removes the processed folder to save space
+ OPTS:
+      MODEL: SNN_TRAIN # Haven't shown this defined. Or /somepath/to/model.pt
+ VARIANT: vanilla # or "variational" or "bayesian". Defaults to "vanilla"
+ REDSHIFT: True # What redshift info to use when classifying. Defaults to 'zspe'. Options are [True, False, 'zpho', 'zspe', or 'none']. True and False are legacy options which map to 'zspe', and 'none' respectively.
+ NORM: cosmo_quantile # How to normalise LCs. Other options are "perfilter", "cosmo", "global" or "cosmo_quantile".
+ CYCLIC: True # Defaults to True for vanilla and variational model
+ SEED: 0 # Sets random seed. Defaults to 0.
+ LIST_FILTERS: ['G', 'R', 'I', 'Z'] # What filters are present in the data, defaults to ['g', 'r', 'i', 'z']
+ SNTYPES: "/path/to/sntypes.txt" # Path to a file which lists the sn type mapping to be used. Example syntax for this can be found at https://github.com/LSSTDESC/plasticc_alerts/blob/main/Examples/plasticc_schema/elasticc_origmap.txt. Alternatively, yaml dictionaries can be used to specify each sn type individually.
+```
+
+Pippin also allows for SuperNNova input yaml files to be passed, instead of having to define all of the options in the Pippin input yaml. This is done via:
+
+```yaml
+OPTS:
+ DATA_YML: path/to/data_input.yml
+ CLASSIFICATION_YML: path/to/classification_input.yml
+```
+
+Example input yaml files can be found [here](https://github.com/supernnova/SuperNNova/tree/master/configs_yml), with the important variation that you must have:
+
+```yaml
+raw_dir: RAW_DIR
+dump_dir: DUMP_DIR
+done_file: DONE_FILE
+```
+
+So that Pippin can automatically replace these with the appropriate directories.
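+
+Putting this together, a minimal sketch of a classification task driven by external SuperNNova yaml files might look like the following (the task label and paths are illustrative):
+
+```yaml
+CLASSIFICATION:
+  SNN_YML_TRAIN:  # illustrative label
+    CLASSIFIER: SuperNNovaClassifier
+    MODE: train
+    OPTS:
+      DATA_YML: path/to/data_input.yml                      # your SNN data config
+      CLASSIFICATION_YML: path/to/classification_input.yml  # your SNN classification config
+```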
+
+## SNIRF Classifier
+
+The [SNIRF classifier](https://github.com/evevkovacs/ML-SN-Classifier) is a random forest running off SALT2 summary statistics. You can specify which features it gets to train on, which has a large impact on performance. After training, there should be a `model.pkl` in the output directory. You can specify one like so:
+
+```yaml
+CLASSIFICATION:
+ SNIRF_TEST:
+ CLASSIFIER: SnirfClassifier
+ MODE: predict
+ OPTS:
+ MODEL: SNIRF_TRAIN
+ FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
+ FEATURES: x1 c zHD x1ERR cERR PKMJDERR # Columns to use. Defaults are shown. Check FITRES for options.
+ N_ESTIMATORS: 100 # Number of trees in forest
+ MIN_SAMPLES_SPLIT: 5 # Min number of samples to split a node on
+ MIN_SAMPLES_LEAF: 1 # Minimum number samples in leaf node
+ MAX_DEPTH: 0 # Max depth of tree. 0 means auto, which means as deep as it wants.
+```
+
+## Nearest Neighbour Classifier
+
+Similar to SNIRF, NN trains on SALT2 summary statistics using a basic Nearest Neighbour algorithm from sklearn. It will produce a `model.pkl` file in its output directory when trained. You can configure it as per SNIRF:
+
+```yaml
+CLASSIFICATION:
+ NN_TEST:
+ CLASSIFIER: NearestNeighborPyClassifier
+ MODE: predict
+ OPTS:
+ MODEL: NN_TRAIN
+ FITOPT: some_label # Optional FITOPT to use. Match the label. Defaults to no FITOPT
+ FEATURES: zHD x1 c cERR x1ERR COV_x1_c COV_x1_x0 COV_c_x0 PKMJDERR # Columns to use. Defaults are shown.
+```
+
+## Perfect Classifier
+
+Sometimes you want to cheat, and if you have simulations, this is easy. The perfect classifier looks into the sims to get the actual type, and will then assign probabilities as per your configuration. This classifier has no training mode, only predict.
+
+```yaml
+CLASSIFICATION:
+ PERFECT:
+ CLASSIFIER: PerfectClassifier
+ MODE: predict
+ OPTS:
+ PROB_IA: 1.0 # Probs to use for Ia events, default 1.0
+ PROB_CC: 0.0 # Probs to use for CC events, default 0.0
+```
+
+## Unity Classifier
+
+To emulate a spectroscopically confirmed sample, or just to save time, we can assign every event a probability of 1.0 that it is a type Ia. As it just returns 1.0 for everything, it only has a predict mode.
+
+```yaml
+CLASSIFICATION:
+ UNITY:
+ CLASSIFIER: UnityClassifier
+ MODE: predict
+```
+
+## FitProb Classifier
+
+Another useful debug test is to just take the SALT2 fit probability calculated from the chi2 fitting and use that as our probability. You'd hope that classifiers all improve on this. Again, this classifier only has a predict mode.
+
+```yaml
+CLASSIFICATION:
+ FITPROBTEST:
+ CLASSIFIER: FitProbClassifier
+ MODE: predict
+```
diff --git a/docs/src/tasks/cosmofit.md b/docs/src/tasks/cosmofit.md
new file mode 100644
index 00000000..a63ff88e
--- /dev/null
+++ b/docs/src/tasks/cosmofit.md
@@ -0,0 +1,48 @@
+# 8. COSMOFIT
+
+CosmoFit is a generic cosmological fitting task, which allows you to choose between different fitters.
+
+The syntax is very simple:
+```yaml
+COSMOFIT:
+ COSMOMC:
+ SOMELABEL:
+ # CosmoMC options
+ WFIT:
+ SOMEOTHERLABEL:
+ # WFit options
+```
+
+## CosmoMC
+
+Launching CosmoMC is hopefully fairly simple. There is a list of provided configurations under the `cosmomc_templates` directory (inside `data_files`), and the main job of the user is to pick which one they want.
+
+```yaml
+COSMOFIT:
+ COSMOMC:
+ SOMELABEL:
+ MASK_CREATE_COV: mask # partial match
+ OPTS:
+ INI: sn_cmb_omw # should match the filename of an ini file
+ NUM_WALKERS: 8 # Optional, defaults to eight.
+
+ # Optional, covopts from CREATE_COV step to run against. If blank, you get them all. Exact matching.
+ COVOPTS: [ALL, NOSYS]
+```
+
+## WFit
+
+Launching WFit simply requires providing the command line options you want to use for each fit.
+```yaml
+COSMOFIT:
+ WFIT:
+ SOMELABEL:
+ MASK: mask # partial match
+ OPTS:
+ BATCH_INFO: sbatch path/to/SBATCH.TEMPLATE 10 # Last number is the number of cores
+      WFITOPT_GLOBAL: "-hsteps 61 -wsteps 101 -omsteps 81" # Optional, will apply these options to all fits
+ WFITOPTS:
+ - /om_pri/ -ompri 0.31 -dompri 0.01 # At least one option is required. The name in the /'s is a human readable label
+ - /cmb_pri/ -cmb_sim -sigma_Rcmb 0.007 # Optionally include as many other fitopts as you want.
+
+```
diff --git a/docs/src/tasks/createcov.md b/docs/src/tasks/createcov.md
new file mode 100644
index 00000000..d92b97c0
--- /dev/null
+++ b/docs/src/tasks/createcov.md
@@ -0,0 +1,34 @@
+# 7. CREATE_COV
+
+Assuming the biascor task hasn't died, it's time to prep for CosmoMC. To do this, we invoke a script originally from Dan (I think) that essentially creates all the input files and structure needed by CosmoMC. It provides a way of scaling systematics, and determining which covariance options to run with.
+
+```yaml
+CREATE_COV:
+ SOMELABEL:
+ MASK: some_biascor_task
+ OPTS:
+ INI_DIR: /path/to/your/own/dir/of/cosmomc/templates # Defaults to cosmomc_templates, which you can exploit using DATA_DIRS
+ SYS_SCALE: surveys/global/lcfit_fitopts/global.yml # Location of systematic scaling file, same as the FITOPTS file.
+ SINGULAR_BLIND: False # Defaults to False, whether different contours will have different shifts applied
+    BINNED: True # Whether to bin the SN or not for the covariance matrix. Defaults to True
+ REBINNED_X1: 2 # Rebin x1 into 2 bins
+ REBINNED_C: 4 # Rebin c into 4 bins
+ SUBTRACT_VPEC: False # Subtract VPEC contribution to MUERR if True. Used when BINNED: False
+ FITOPT_SCALES: # Optional
+ FITOPT_LABEL: some_scale # Note this is a partial match, ie SALT2: 1.0 would apply to all SALT2 cal fitopts
+ MUOPT_SCALES:
+ MUOPT_LABEL: some_scale # This is NOT a partial match, must be exact
+ COVOPTS: # Optional, and you'll always get an 'ALL' covopt. List format please
+ - "[NOSYS] [=DEFAULT,=DEFAULT]" # This syntax is explained below
+```
+
+If you don't specify `SYS_SCALE`, Pippin will search the LCFIT tasks from the BIASCOR dependency and if all LCFIT tasks have the same fitopt file, it will use that.
+
+The `COVOPTS` section is a bit odd. In the square brackets first, we have the label that will be assigned and used in the plotting output later. The next set of square brackets is a two-tuple, and it applies to `[fitopts,muopts]` in that order. For example, to get four contours out of CosmoMC corresponding to all uncertainty, statistics only, statistics + calibration uncertainty, and fitopts + C11 uncertainty, we could set:
+
+```yaml
+COVOPTS:
+ - "[NOSYS] [=DEFAULT,=DEFAULT]"
+ - "[CALIBRATION] [+cal,=DEFAULT]"
+ - "[SCATTER] [=DEFAULT,=C11]"
+```
diff --git a/docs/src/tasks/dataprep.md b/docs/src/tasks/dataprep.md
new file mode 100644
index 00000000..3ebe00c6
--- /dev/null
+++ b/docs/src/tasks/dataprep.md
@@ -0,0 +1,189 @@
+# 0. DATAPREP
+
+The DataPrep task is simple - it is mostly a pointer for Pippin towards an external directory that contains some photometry, to say we're going to make use of it. Normally this means data files, though you can also use it to point to simulations that have already been run to save yourself the hassle of rerunning them. The other thing the DataPrep task will do is run the new method of determining a viable initial guess for the peak time, which will be used by the light curve fitting task down the road.
+
+## Example
+
+```yaml
+DATAPREP:
+ SOMENAME:
+ OPTS:
+
+ # Location of the photometry files
+ RAW_DIR: $DES_ROOT/lcmerge/DESALL_forcePhoto_real_snana_fits
+
+ # Specify which types are confirmed Ia's, confirmed CC or unconfirmed. Used by ML down the line
+ TYPES:
+ IA: [101, 1]
+ NONIA: [20, 30, 120, 130]
+
+ # Blind the data. Defaults to True if SIM:True not set
+ BLIND: False
+
+ # Defaults to False. Important to set this flag if analysing a sim in the same way as data, as there
+ # are some subtle differences
+ SIM: False
+
+ # The method of estimating peak mjd values. Don't ask me what numbers mean what, ask Rick.
+ OPT_SETPKMJD: 16
+
+```
+
+## Options
+
+Here is an exhaustive list of everything you can pass to `OPTS`.
+
+### RAW_DIR
+
+Syntax:
+
+```yaml
+OPTS:
+ RAW_DIR: path/to/photometry/files
+```
+
+Required: `True`
+
+Pippin simply stores the `RAW_DIR` and passes it to other tasks which need it.
+
+### OPT_SETPKMJD
+
+Syntax:
+
+```yaml
+OPTS:
+ OPT_SETPKMJD: 16
+```
+
+Default: `16`
+
+This option is used by `SNANA` to choose how peak MJD will be estimated. In general stick with the default unless you have a good reason not to.
+
+Options are chosen via a bitmask, meaning you add the associated number of each option you want to get your final option number. Details of the available options can be found in the [SNANA Manual](https://github.com/RickKessler/SNANA/blob/master/doc/snana_manual.pdf) in sections 4.34, 5.51, and Figure 11 (as of the time of writing). The sections describe in detail how `OPT_SETPKMJD` is used, whilst the figure shows all possible options.
+
+### PHOTFLAG_MSKREJ
+
+Syntax:
+
+```yaml
+OPTS:
+ PHOTFLAG_MSKREJ: 1016
+```
+
+Default: `1016`
+
+This specifies to SNANA which observations to reject based on `PHOTFLAG` bits. In general stick with the default unless you have a good reason not to.
+
+Details can be found in the [SNANA Manual](https://github.com/RickKessler/SNANA/blob/master/doc/snana_manual.pdf) in sections 12.2.6 and 12.4.9 (as of the time of writing).
+
+### SIM
+
+Syntax:
+
+```yaml
+OPTS:
+ SIM: False
+```
+
+Default: `False`
+
+Required: `True` (if working with simulated data)
+
+This simply passes a flag to later tasks about whether the data provided comes from real photometry or simulated photometry. It is important to specify this as the distinction matters down the line.
+
+### BLIND
+
+Syntax:
+
+```yaml
+OPTS:
+ BLIND: True
+```
+
+Default: `True`
+
+This passes a flag throughout all of Pippin that this data should be blinded. **If working with real data, only unblind when you are absolutely certain your analysis is ready!**
+
+### TYPES
+
+Syntax:
+
+```yaml
+OPTS:
+ TYPES:
+ IA: [101, 1]
+ NONIA: [20, 30, 120, 130]
+```
+
+Default:
+* `IA: [1]`
+* `NONIA: [2, 20, 21, 22, 29, 30, 31, 32, 33, 39, 40, 41, 42, 43, 80, 81]`
+
+This is the SNANA `SNTYPE` of your IA and NONIA supernovae. This is mostly used by the various classifiers available to Pippin.
+
+In general, if a spectroscopically classified supernova type is given the `SNTYPE` of `n`, then photometrically identified supernovae of the same (suspected) type are given the `SNTYPE` of `100 + n`. By default spectroscopically classified type Ia supernovae are given the `SNTYPE` of 1. The default `SNTYPE` of non-Ia supernovae is a bit more complicated, but details can be found in `$SNDATA_ROOT/models/NON1ASED/*/NONIA.LIST`. More detail can be found in the [SNANA Manual](https://github.com/RickKessler/SNANA/blob/master/doc/snana_manual.pdf) in sections 4.6 for type Ia, and 9.6 for non-Ia supernovae.
+
+### BATCH_FILE
+
+Syntax:
+
+```yaml
+OPTS:
+  BATCH_FILE: path/to/batch_template.TEMPLATE
+```
+
+Default: `cfg.yml` -> `SBATCH: cpu_location`
+
+Which SBATCH template to use. By default this will use the cpu template from the main `cfg.yml`. More details can be found at {ref}`Changing SBATCH options`.
+
+### BATCH_REPLACE
+
+Syntax:
+
+```yaml
+OPTS:
+ BATCH_REPLACE:
+ KEY1: value
+ KEY2: value
+```
+
+Default: `None`
+
+Overwrite certain SBATCH keys. More details can be found at {ref}`Changing SBATCH options`.
+
+### PHOTFLAG_DETECT
+
+Syntax:
+
+```yaml
+OPTS:
+ PHOTFLAG_DETECT: 4096
+```
+
+Default: `None`
+
+An optional SNANA flag to add a given bit to every detection. Adding this optional flag will result in the `NEPOCH_DETECT` (number of detections) and `TLIVE_DETECT` (time between first and last detection) columns being added to the SNANA and FITRES tables. More details can be found in the [SNANA Manual](https://github.com/RickKessler/SNANA/blob/master/doc/snana_manual.pdf) in sections 4.18.1, 4.18.6, 4.36.5, and Figure 6 (at the time of writing).
+
+### CUTWIN_SNR_NODETECT
+
+```yaml
+OPTS:
+  CUTWIN_SNR_NODETECT: -100,10
+```
+
+Default: `None`
+
+Flag to tell SNANA to reject non-detection events with a signal to noise ratio below the min or above the max.
+
+### Output
+
+Within the `$PIPPIN_OUTPUT/JOB_NAME/0_DATAPREP` directory you will find a directory for each dataprep task. Here is an example of some of the files you might find in each directory:
+
+* `clump.nml`: The clump fit input generated by Pippin and passed to `snana.exe`.
+* `config.yml`: A config file used to store all the options specified and generate the hash.
+* `{RAW_DIR}.SNANA.TEXT`: The SNANA data file containing information on each supernova.
+* `{RAW_DIR}.YAML`: The SNANA yaml file describing statistics and information about the dataset.
+* `done.txt`: A file which should contain `SUCCESS` if the job was successful and `FAILURE` if the job was not successful.
+* `hash.txt`: The Pippin-generated hash file which ensures tasks only get rerun if something changes.
+* `output.log`: The output produced from the SBATCH job; it should include SNANA output as well.
+* `slurm.job`: The slurm job file which Pippin ran.
diff --git a/docs/src/tasks/lcfit.md b/docs/src/tasks/lcfit.md
new file mode 100644
index 00000000..ff412b03
--- /dev/null
+++ b/docs/src/tasks/lcfit.md
@@ -0,0 +1,36 @@
+# 2. LCFIT
+
+This task runs the SALT2 light curve fitting process on light curves from the simulation or DataPrep task. As with the previous stage, if something goes wrong, Pippin will attempt to give a good reason why. The task is specified like so:
+
+```yaml
+LCFIT:
+ SOMENAMEHERE:
+ # MASK means only apply this light curve fitting on sims/Dataprep which have DES in the name
+ # You can also specify a list for this, and they will be applied as a logical or
+ MASK: DES
+
+ # The base nml file used
+ BASE: surveys/des/lcfit_nml/des.nml
+
+ # FITOPTS can be left out for nothing, pointed to a file, specified manually or a combination of the two
+ # Normally this would be a single entry like global.yml shown below, but you can also pass a list
+ # If you specify a FITOPT manually, make sure it has the / around the label
+    # And finally, if you specify a file, make sure it's a yml dictionary that links a survey name to the correct
+ # fitopts. See the file below for an example
+ FITOPTS:
+ - surveys/global/lcfit_fitopts/global.yml
+ - "/custom_extra_fitopt/ REDSHIFT_FINAL_SHIFT 0.0001"
+
+ # We can optionally customise keys in the FITINP section
+ FITINP:
+ FILTLIST_FIT: 'gri'
+
+ # And do the same for the optional SNLCINP section
+ SNLCINP:
+ CUTWIN_SNRMAX: 3.0, 1.0E8
+ CUTWIN_NFILT_SNRMAX: 3.0, 99.
+
+ # Finally, options that go outside either of these sections just go in the generic OPTS
+ OPTS:
+ BATCH_INFO: sbatch $SBATCH_TEMPLATES/SBATCH_Midway2_1hr.TEMPLATE 10
+```
diff --git a/docs/src/tasks/merge.md b/docs/src/tasks/merge.md
new file mode 100644
index 00000000..aef00aee
--- /dev/null
+++ b/docs/src/tasks/merge.md
@@ -0,0 +1,12 @@
+# 5. MERGE
+
+The merging task will take the outputs of the aggregation task, and put the probabilities from each classifier into the light curve fit results (FITRES files) using SNID.
+
+```yaml
+MERGE:
+ label:
+ MASK: mask # partial match on all sim, fit and agg
+ MASK_SIM: mask # partial match on sim
+ MASK_FIT: mask # partial match on lcfit
+ MASK_AGG: mask # partial match on aggregation task
+```
diff --git a/docs/src/tasks/sim.md b/docs/src/tasks/sim.md
new file mode 100644
index 00000000..69d35ae9
--- /dev/null
+++ b/docs/src/tasks/sim.md
@@ -0,0 +1,25 @@
+# 1. SIM
+
+The simulation task does exactly what you'd think it does. It invokes [SNANA](https://github.com/RickKessler/SNANA) to run some simulation as per your configuration. If something goes wrong, Pippin tries to dig through the log files to give you a useful error message, but sometimes this is difficult (i.e. the logs have been zipped up). With the current version of SNANA, each simulation can have at most one Ia component, and an arbitrary number of CC components. The specification for the simulation task config is as follows:
+
+```yaml
+SIM:
+ SOMENAMEHERE:
+
+ # We specify the Ia component, so it must have IA in its name
+ IA_G10:
+ BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des5yr.input # And then we specify the base input file which generates it.
+
+ # Now we can specify as many CC sims to mix in as we want
+ II_JONES:
+ BASE: surveys/des/sims_cc/sn_collection_jones.input
+
+ IAX:
+ BASE: surveys/des/sims_cc/sn_iax.input
+ DNDZ_ALLSCALE: 3.0 # Note you can add/overwrite keys like so for specific files
+
+ # This section will apply to all components of the sim
+ GLOBAL:
+ NGEN_UNIT: 1
+ RANSEED_REPEAT: 10 12345
+```
diff --git a/docs/src/usage.md b/docs/src/usage.md
new file mode 100644
index 00000000..7447d2cb
--- /dev/null
+++ b/docs/src/usage.md
@@ -0,0 +1,460 @@
+# Using Pippin
+
+```{figure} ../_static/images/console.gif
+:alt: Console Output
+
+The console output from a successful Pippin run. Follow these instructions and you too can witness a beautiful wall of green text!
+```
+
+Using Pippin is very simple. In the top level directory, there is a `pippin.sh`. If you're on Midway and use SNANA, this script will be in your path already. To use Pippin, all you need is a config file, examples of which can be found in the [configs directory](https://github.com/dessn/Pippin/tree/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/configs). Given the config file `example.yml`, simply run `pippin.sh example.yml` to invoke Pippin. This will create a new folder in the `OUTPUT: output_dir` path defined in the global [cfg.yml](https://github.com/dessn/Pippin/blob/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/cfg.yml) file. By default, this is set to the `$PIPPIN_OUTPUT` environment variable, so please either set said variable or change the associated line in the `cfg.yml`.
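+
+As a hedged sketch (check your own `cfg.yml` for the exact layout), the relevant output setting looks something like:
+
+```yaml
+OUTPUT:
+  output_dir: $PIPPIN_OUTPUT  # default; either set this environment variable or change the path here
+```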
+
+
+For the morbidly curious, here's a small demo video of using Pippin in the Midway environment.
+
+```{eval-rst}
+.. youtube:: pCaPvzFCZ-Y
+ :width: 100%
+ :align: center
+```
+
+
+
+## Creating your own configuration file
+
+Each configuration file is represented by a yaml dictionary linking each stage to a dictionary of tasks, the key being the unique name for the task and the value being its specific task configuration.
+
+For example, to define a configuration with two simulations and one light curve fitting task (resulting in 2 output simulations and 2 output light curve tasks - one for each simulation), a user would define:
+
+```yaml
+SIM:
+ SIM_NAME_1:
+ SIM_CONFIG: HERE
+ SIM_NAME_2:
+ SIM_CONFIG: HERE
+
+LCFIT:
+ LCFIT_NAME_1:
+ LCFIT_CONFIG: HERE
+```
+
+Configuration details for each task can be found in the Tasks section, with stage-specific example config files available in the [examples directory](https://github.com/dessn/Pippin/tree/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/examples).
+
+## What If I change my config file?
+
+Happens all the time, don't even worry about it. Just start Pippin again and run the file again. Pippin will detect
+any changes in your configuration by hashing all the input files to a specific task. This means that even if your
+config file itself doesn't change, changes to an input file it references (for example, the default DES simulation
+input file) will result in Pippin rerunning that task. If it cannot detect anything has changed, and if the task
+finished successfully the last time it was run, the task is not re-executed. You can force re-execution of tasks using the `-r` flag.
+
+## Command Line Arguments
+
+On top of this, Pippin has a few command line arguments, which you can detail with `pippin.sh -h`, but I'll also detail here:
+
+```
+ -h Show the help menu
+ -v, --verbose Verbose. Shows debug output. I normally have this option enabled.
+ -r, --refresh Refresh/redo - Rerun tasks that completed in a previous run even if the inputs haven't changed.
+ -c, --check Check that the input config is valid but don't actually run any tasks.
+    -s, --start        Start at this task and refresh everything after it. Number or string accepted
+ -f, --finish Finish at this stage. For example -f 3 or -f CLASSIFY to run up to and including classification.
+ -p, --permission Fix permissions and groups on all output, don't rerun
+ -i, --ignore Do NOT regenerate/run tasks up to and including this stage.
+ -S, --syntax If no task is given, prints out the possible tasks. If a task name or number is given, prints the docs on that task. For instance 'pippin.sh -S 0' and 'pippin.sh -S DATAPREP' will print the documentation for the DATAPREP task.
+```
+
+For example, to run with verbose output and only perform data preparation and simulation, you would run:
+
+`pippin.sh -v -f 1 configfile.yml`
+
+
+## Stages in Pippin
+
+You may have noticed above that each stage has a numeric ID for convenience and lexicographical sorting.
+
+The current stages are:
+
+- 0: Data preparation
+- 1: Simulation
+- 2: Light curve fitting
+- 3: Classification (training and testing)
+- 4: Aggregation (comparing classifiers)
+- 5: Merging (combining classifier and FITRES output)
+- 6: Bias corrections using BBC
+- 7: Determine the systematic covariance matrix
+- 8: Fit Hubble diagram and produce cosmological constraint
+- 9: Create final output and plots.
+
+## Pippin on Midway
+
+On midway, sourcing the SNANA setup will add environment variables and Pippin to your path.
+
+Pippin itself can be found at `$PIPPIN`, output at `$PIPPIN_OUTPUT` (which goes to a scratch directory), and `pippin.sh` will automatically work from
+any location.
+
+Note that you only have 100 GB on scratch. If you fill that up and need to nuke some files, look both in `$SCRATCH_SIMDIR` to remove SNANA
+photometry and `$PIPPIN_OUTPUT` to remove Pippin's output. I'd recommend adding this to your `~/.bashrc` file to scan through directories you own and
+calculate directory size so you know what's taking the most space. After adding this and sourcing it, just put `dirusage` into the terminal
+in both of those locations and see what's eating your quota.
+
+```sh
+function dirusage {
+ for file in $(ls -l | grep $USER | awk '{print $NF}')
+ do
+ du -sh "$file"
+ done
+}
+```
+
+## Pippin on Perlmutter
+
+On perlmutter, add `source /global/cfs/cdirs/lsst/groups/TD/setup_td.sh` to your `~/.bashrc` to load all the relevant paths and environment variables.
+
+This will add the `$PIPPIN_DIR` path for Pippin source code, and `$PIPPIN_OUTPUT` for the output of Pippin jobs. Additionally `pippin.sh` can be run from any directory.
+
+To load the perlmutter specific `cfg.yml` you must add the following to the start of your Pippin job:
+```yaml
+GLOBAL:
+ CFG_PATH: $SNANA_LSST_ROOT/starterKits/pippin/cfg_lsst_perlmutter.yml
+```
+
+## Examples
+
+If you want detailed examples of what you can do with Pippin tasks, have a look in the [examples directory](https://github.com/dessn/Pippin/tree/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/examples), pick the task you want to know more about, and have a look over all the options.
+
+Here is a very simple configuration file which runs a simulation, does light curve fitting, and then classifies it using the debug FITPROB classifier.
+
+```yaml
+SIM:
+ DESSIM:
+ IA_G10_DES3YR:
+ BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
+
+LCFIT:
+ BASEDES:
+ BASE: surveys/des/lcfit_nml/des_5yr.nml
+
+CLASSIFICATION:
+ FITPROBTEST:
+ CLASSIFIER: FitProbClassifier
+ MODE: predict
+```
+
+You can see that unless you specify a `MASK` on each subsequent task, Pippin will generally try and run everything on everything. So if you have two simulations defined, you don't need two light curve fitting tasks, Pippin will make one light curve fit task for each simulation, and then two classification tasks, one for each light curve fit task.
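+
+For instance, a minimal sketch (task names are illustrative) of using `MASK` to restrict which simulations a light curve fitting task picks up:
+
+```yaml
+LCFIT:
+  FITDES:
+    BASE: surveys/des/lcfit_nml/des_5yr.nml
+    MASK: DESSIM  # only run light curve fitting on sim tasks whose name contains DESSIM
+```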
+
+## Best Practice
+
+Here are a few best practices for improving your chance of success with Pippin.
+
+### Use `screen`
+
+Pippin jobs can take a long time, so to avoid having to keep a terminal open and an ssh session active for the length of the entire run, it is *highly recommended* you run Pippin in a `screen` session.
+
+For example, if you are doing machine-learning testing, you may create a new screen session called `ml` by running `screen -S ml`. It will then launch a new instance of bash for you to play around in. Conda will **not work out of the box**. To make it work again, run `conda deactivate` and then `conda activate`, and you can check this works by running `which python` and verifying it's pointing to the miniconda install. You can then run Pippin as per normal: `pippin.sh -v your_job.yml` and get the coloured output. To leave the screen session, but **still keep Pippin running even after you log out**, press `Ctrl-A`, `Ctrl-D`. As in one, and then the other, not `Ctrl-A-D`. This will detach from your screen session but keep it running. Just pressing `Ctrl-D` will disconnect and shut it down. To get back into your screen session, simply run `screen -r ml` to reattach. You can see your screen sessions using `screen -ls`.
+
+You may notice if you log in and out of midway that your screen sessions might not show up. This is because midway has multiple head nodes, and your screen session exists only on one of them. This is why when I ssh to midway I specify a specific login node instead of being assigned one. To make it simpler, I'd recommend setting your ssh host in your `.ssh/config` to something along the lines of:
+
+```sh
+Host midway2
+ HostName midway2-login1.rcc.uchicago.edu
+ User username
+```
+
+### Make the most of command line options
+
+There are a number of command line options that are particularly useful. Foremost amongst them is `-v, --verbose` which shows debug output when running Pippin. Including this flag in your run makes it significantly easier to diagnose if anything goes wrong.
+
+The next time saving flag is `-c, --check`, which will do an initial passthrough of your input yaml file, pointing out any obvious errors before anything runs. This is particularly useful if you have long jobs and want to catch bugs early.
+
+The final set of useful flags are the `-s, --start`, `-f, --finish`, and `-i, --ignore`. These allow you to customize exactly what parts of your full job Pippin runs. Pippin decides whether or not it should rerun a task based on a hash generated each time it's run. This hash is produced from the task's inputs, so these flags are particularly useful if you change your input but *don't want earlier stages to rerun*, such as when you are making small changes to a final stage, or debugging an early stage.
+
+## Advanced Usage
+
+The following are a number of advanced features which aren't required to use Pippin but can drastically improve your experience with Pippin.
+
+### Yaml Anchors
+
+If you are finding that your config files contain lots of duplicated sections (for example, many simulations configured almost the same way, but with one difference), consider using yaml anchors. A thorough explanation of how to use them is available [here](https://blog.daemonl.com/2016/02/yaml.html), however the basics are as follows. First you should add a new yaml section at the top of your input file. The name of this section doesn't matter as long as it doesn't clash with other Pippin stages, however I usually use `ALIAS`. Within this section, you include all of the yaml anchors you need. An example is shown below:
+
+```yaml
+ALIAS:
+ LOWZSIM_IA: &LOWZSIM_IA
+ BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
+
+SIM:
+ SIM_1:
+ IA_G10_LOWZ:
+ <<: *LOWZSIM_IA
+ # Other options here
+ SIM_2:
+ IA_G10_LOWZ:
+ <<: *LOWZSIM_IA
+ # Different options here
+```
+
+### Include external aliases
+
+**This is new and experimental, use with caution**.
+
+*Note that this is* **not** *yaml compliant*.
+
+When dealing with especially large jobs, or suites of jobs, you might find yourself having very large `ALIAS`/`ANCHOR` blocks which are repeated amongst a number of Pippin jobs. A cleaner alternative is to have a number of `.yml` files containing your anchors, and then `include` these in your input files which will run Pippin jobs. This way you can share anchors amongst multiple Pippin input files and update them all at the same time. In order to achieve this, Pippin can *preprocess* the input file to directly copy the anchor file into the job file. An example is provided below:
+
+`base_job_file.yml`
+```yaml
+# Values surrounded by % indicate preprocessing steps.
+# The preprocess below will copy the provided yml files into this one before this one is read in, allowing anchors to propagate into this file
+# They will be copied in, in the order you specify, with duplicate tasks merging.
+# Note that whitespace before or after the % is fine, as long as % is the first and last character.
+
+# % include: path/to/anchors_sim.yml %
+# %include: path/to/anchors_lcfit.yml%
+
+SIM:
+ DESSIM:
+ IA_G10_DES3YR:
+ BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
+ GLOBAL:
+ # Note that this anchor doesn't exist in this file
+ <<: *SIM_GLOBAL
+ LCSIM:
+ IA_G10_LOWZ:
+ BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
+ GLOBAL:
+ # Note that this anchor doesn't exist in this file
+ <<: *SIM_GLOBAL
+
+LCFIT:
+ LS:
+ BASE: surveys/lowz/lcfit_nml/lowz.nml
+ MASK: DATALOWZ
+ FITOPTS: surveys/lowz/lcfit_fitopts/lowz.yml
+ # Note that this anchor doesn't exist in this file
+ <<: *LCFIT_OPTS
+
+ DS:
+ BASE: surveys/des/lcfit_nml/des_3yr.nml
+ MASK: DATADES
+ FITOPTS: surveys/des/lcfit_fitopts/des.yml
+ # Note that this anchor doesn't exist in this file
+ <<: *LCFIT_OPTS
+```
+
+`anchors_sim.yml`
+```yaml
+ANCHORS_SIM:
+ SIM_GLOBAL: &SIM_GLOBAL
+ W0_LAMBDA: -1.0
+ OMEGA_MATTER: 0.3
+ NGEN_UNIT: 0.1
+```
+
+`anchors_lcfit.yml`
+```yaml
+ANCHORS_LCFIT:
+ LCFIT_OPTS: &LCFIT_OPTS
+ SNLCINP:
+ USE_MINOS: F
+```
+
+This will be preprocessed to produce the following yaml file, which pippin will then run on.
+
+`final_pippin_input.yml`
+```yaml
+# Original input file: path/to/base_job_file.yml
+# Values surrounded by % indicate preprocessing steps.
+# The preprocess below will copy the provided yml files into this one before this one is read in, allowing anchors to propagate into this file
+# They will be copied in, in the order you specify, with duplicate tasks merging.
+# Note that whitespace before or after the % is fine, as long as % is the first and last character.
+
+# Anchors included from path/to/anchors_sim.yml
+ANCHORS_SIM:
+ SIM_GLOBAL: &SIM_GLOBAL
+ W0_LAMBDA: -1.0
+ OMEGA_MATTER: 0.3
+ NGEN_UNIT: 0.1
+
+# Anchors included from path/to/anchors_lcfit.yml
+ANCHORS_LCFIT:
+ LCFIT_OPTS: &LCFIT_OPTS
+ SNLCINP:
+ USE_MINOS: F
+
+SIM:
+ DESSIM:
+ IA_G10_DES3YR:
+ BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
+ GLOBAL:
+ <<: *SIM_GLOBAL
+ LCSIM:
+ IA_G10_LOWZ:
+ BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
+ GLOBAL:
+ <<: *SIM_GLOBAL
+
+LCFIT:
+ LS:
+ BASE: surveys/lowz/lcfit_nml/lowz.nml
+ MASK: DATALOWZ
+ FITOPTS: surveys/lowz/lcfit_fitopts/lowz.yml
+ <<: *LCFIT_OPTS
+
+ DS:
+ BASE: surveys/des/lcfit_nml/des_3yr.nml
+ MASK: DATADES
+ FITOPTS: surveys/des/lcfit_fitopts/des.yml
+ <<: *LCFIT_OPTS
+```
+
+Now you can include the `anchors_sim.yml` and `anchors_lcfit.yml` anchors in any pippin job you want, and need only update those anchors once. There are a few caveats to this to be aware of. The preprocessing does no checking to ensure the given file is valid yaml; it simply copies the yaml directly in. As such you should always ensure that the name of your anchor block is unique; any duplicates will mean whichever block is lowest overwrites all other blocks of the same name. Additionally, whilst you could technically use this to store Pippin task blocks in external yml files, this is discouraged, as this feature was only intended for anchors and aliases.
+
+
+### Use external results
+
+Often you will want to reuse the results of one Pippin job in other Pippin jobs, for instance reusing a biascor sim so you don't need to resimulate every time. This can be accomplished via the `EXTERNAL`, `EXTERNAL_DIRS`, and `EXTERNAL_MAP` keywords.
+
+There are in essence two ways of including external tasks. Both operate the same way; one is just a bit more explicit than the other. The explicit way is that, when adding a task that is an *exact* replica of an external task, you can just add the `EXTERNAL` keyword. For example, in the reference 5YR analysis, all the biascor sims are precomputed, so we can define them as external tasks like this:
+
+```yaml
+SIM:
+ DESSIMBIAS5YRIA_C11: # A SIM task we don't want to rerun
+ EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_C11 # The path to a matching external SIM task, which is already finished
+ DESSIMBIAS5YRIA_G10:
+ EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_G10
+ DESSIMBIAS5YRCC:
+ EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRCC
+```
+
+In this case, we use the `EXTERNAL` keyword because each of the three defined tasks can only be associated with one, and only one, `EXTERNAL` task. Because `EXTERNAL` tasks are one-to-one with a defined task, the name of the defined task, and the `EXTERNAL` task do not need to match.
+
+Suppose we don't want to recompute the light curve fits. After all, most of the time we're not changing that step anyway! However, unlike `SIM`, `LCFIT` runs multiple sub-tasks - one for each `SIM` task you are performing lightcurve fitting on.
+
+```yaml
+LCFIT:
+ D: # An LCFIT task we don't want to rerun
+ BASE: surveys/des/lcfit_nml/des_5yr.nml
+ MASK: DESSIM # Selects a subset of SIM tasks to run lightcurve fitting on
+ # In this case, the SIM tasks are DESSIMBIAS5YRIA_C11, DESSIMBIAS5YRIA_G10, and DESSIMBIAS5YRCC
+ EXTERNAL_DIRS:
+ - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_C11 # Path to a previously run LCFIT sub-task
+ - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_G10
+ - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRCC
+```
+
+That is, we have one `LCFIT` task, but because we have three sims going into it and matching the mask, we can't point to a single `EXTERNAL` task. Instead, we provide an external path for each sub-task, as defined in `EXTERNAL_DIRS`. The name of each external sub-task must exactly match the `LCFIT` task name, and the `SIM` sub-task name. For example, the path to the `DESSIMBIAS5YRIA_C11` lightcurve fits, must be `D_DESSIMBIAS5YRIA_C11`.
+
+Note that you still need to point to the right base file, because Pippin still wants those details. It won't be submitted anywhere though, just loaded in.
+
+To use `EXTERNAL_DIRS` on pre-computed tasks that don't follow your current naming scheme (i.e the `LCFIT` task name, or the `SIM` sub-task names differ), you can make use of `EXTERNAL_MAP` to provide a mapping between the `EXTERNAL_DIR` paths, and each `LCFIT` sub-task.
+
+```yaml
+LCFIT:
+ D: # An LCFIT task we don't want to rerun
+    BASE: surveys/des/lcfit_nml/des_5yr.nml
+ MASK: DESSIM # Selects a subset of SIM tasks to run lightcurve fitting on
+ EXTERNAL_DIRS: # Paths to external LCFIT tasks, which do not have an exact match with this task
+ - $PIPPIN_OUTPUT/EXAMPLE_C11/2_LCFIT/DESFIT_SIM
+ - $PIPPIN_OUTPUT/EXAMPLE_G10/2_LCFIT/DESFIT_SIM
+ - $PIPPIN_OUTPUT/EXAMPLE/2_LCFIT/DESFIT_CCSIM
+ EXTERNAL_MAP:
+ # LCFIT_SIM: EXTERNAL_MASK
+ D_DESSIMBIAS5YRIA_C11: EXAMPLE_C11 # In this case we are matching to the pippin job name, as the LCFIT task name is shared between two EXTERNAL_DIRS
+ D_DESSIMBIAS5YRIA_G10: EXAMPLE_G10 # Same as C11
+ D_DESSIMBIAS5YRCC: DESFIT_CCSIM # In this case we match to the LCFIT task name, as the pippin job name (EXAMPLE) would match with the other EXTERNAL_DIRS
+```
+
+The flexibility of `EXTERNAL_DIRS` means you can mix both precomputed and non-precomputed tasks together. Take this classification task:
+
+```yaml
+CLASSIFICATION:
+ SNNTEST:
+ CLASSIFIER: SuperNNovaClassifier
+ MODE: predict
+ OPTS:
+ MODEL: $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTRAIN_DESTRAIN/model.pt
+ EXTERNAL_DIRS:
+ - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRIA_C11_SNNTRAIN_DESTRAIN
+ - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRIA_G10_SNNTRAIN_DESTRAIN
+ - $PIPPIN_OUTPUT/GLOBAL/3_CLAS/SNNTEST_DESSIMBIAS5YRCC_SNNTRAIN_DESTRAIN
+```
+
+It will load in the precomputed classification results for the biascor sims, and then also run and generate classification results on any other simulation tasks (such as running on the data) using the pretrained model `model.pt`.
+
+Finally, the way this works under the hood is simple - it copies the directory over explicitly. And it will only copy once, so if you want the "latest version" just ask the task to refresh (or delete the folder). Once it copies it, there is no normal hash checking; it reads in the `config.yml` file created by the task in its initial run and powers onwards.
+
+If you have any issues using this new feature, check out the `ref_des_5yr.yml` file or flick me a message.
+
+### Changing SBATCH options
+
+Pippin has sensible defaults for the sbatch options of each task, however it is possible you may sometimes want to overwrite some keys, or even replace the sbatch template entirely. You can do this via the `BATCH_REPLACE`, and `BATCH_FILE` options respectively.
+
+In order to overwrite the default batch keys, add the following to any task which runs a batch job:
+
+```yaml
+BATCH_REPLACE:
+ REPLACE_KEY1: value
+ REPLACE_KEY2: value
+```
+
+Possible options for `BATCH_REPLACE` are:
+
+* `REPLACE_NAME`: `--job-name`
+* `REPLACE_LOGFILE`: `--output`
+* `REPLACE_WALLTIME`: `--time`
+* `REPLACE_MEM`: `--mem-per-cpu`
+
+Note that changing these could have unforeseen consequences, so use at your own risk.
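+
+As a concrete sketch (the values here are purely illustrative, not recommendations), bumping the walltime and memory for a single task might look like:
+
+```yaml
+BATCH_REPLACE:
+  REPLACE_WALLTIME: "04:00:00"  # passed through to --time
+  REPLACE_MEM: 8G               # passed through to --mem-per-cpu
+```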
+
+If replacing these keys isn't enough, you are able to create your own sbatch templates and get Pippin to use them. This is useful if you want to change the partition, or add some additional code which runs before the Pippin job. Note that your template **must** contain the keys listed above in order to work properly. In addition you **must** have `REPLACE_JOB` at the bottom of your template file, otherwise Pippin will not be able to load its jobs into your template. An example template is as follows:
+
+```bash
+#!/bin/bash
+
+#SBATCH -p broadwl-lc
+#SBATCH --account=pi-rkessler
+#SBATCH --job-name=REPLACE_NAME
+#SBATCH --output=REPLACE_LOGFILE
+#SBATCH --time=REPLACE_WALLTIME
+#SBATCH --nodes=1
+#SBATCH --mem-per-cpu=REPLACE_MEM
+echo $SLURM_JOB_ID starting execution `date` on `hostname`
+
+REPLACE_JOB
+```
+
+To have Pippin use your template, simply add the following to your task:
+
+```yaml
+BATCH_FILE: path/to/your/batch.TEMPLATE
+```
+
+## FAQ
+
+### Pippin is crashing on some task and the error message isn't useful
+
+Feel free to send me the log and stack, and I'll see what I can do to turn the exception into something more human-readable.
+
+### I want to modify a ton of files but don't want huge yml files, please help
+
+You can modify input files and put them in a directory you own, and then tell Pippin to look there (in addition to the default location) when it's constructing your tasks. To do this, see [this example here](https://github.com/dessn/Pippin/blob/4fd0994bc445858bba83b2e9e5d3fcb3c4a83120/examples/global.yml), or use this code snippet at the top of your YAML file (not that it matters if it's at the top):
+
+```yaml
+GLOBAL:
+ DATA_DIRS:
+ - /some/new/directory/with/your/files/in/it
+```
+
+### I want to use a different cfg.yml file!
+
+```yaml
+GLOBAL:
+ CFG_PATH: /your/path/here
+```
+
+### Stop rerunning my sims!
+
+For big biascor sims it can be frustrating if you're trying to tweak biascor or later stages and sims kick off
+because of some trivial change. So use the `--ignore` or `-i` flag to ignore any undone tasks or tasks with
+hash disagreements in previous stages. To clarify, even tasks that do not have a hash, and have never been submitted, will
+not be run if that stage is set to be ignored.
diff --git a/docs/tasks/dataprep.rst b/docs/tasks/dataprep.rst
deleted file mode 100644
index 751f8a42..00000000
--- a/docs/tasks/dataprep.rst
+++ /dev/null
@@ -1,203 +0,0 @@
-###########
-0. DATAPREP
-###########
-
-The DataPrep task is simple - it is mostly a pointer for Pippin towards an external directory that contains some photometry, to say we're going to make use of it. Normally this means data files, though you can also use it to point to simulations that have already been run to save yourself the hassle of rerunning them. The other thing the DataPrep task will do is run the new method of determining a viable initial guess for the peak time, which will be used by the light curve fitting task down the road.
-
-It does this by generating a ``clump.nml`` file and running ``snana.exe clump.nml``.
-
-Example
-=======
-
-.. code-block:: yaml
-
- DATAPREP:
- SOMENAME:
- OPTS:
-
- # Location of the photometry files
- RAW_DIR: $DES_ROOT/lcmerge/DESALL_forcePhoto_real_snana_fits
-
- # Specify which types are confirmed Ia's, confirmed CC or unconfirmed. Used by ML down the line
- TYPES:
- IA: [101, 1]
- NONIA: [20, 30, 120, 130]
-
- # Blind the data. Defaults to True if SIM:True not set
- BLIND: False
-
- # Defaults to False. Important to set this flag if analysing a sim in the same way as data, as there
- # are some subtle differences
- SIM: False
-
-Options
-=======
-
-Here is an exhaustive list of everything you can pass to ``OPTS``
-
-RAW_DIR
------------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- RAW_DIR: path/to/photometry/files
-
-Required: ``True``
-
-Pippin simply stores the ``RAW_DIR`` and passes it to other tasks which need it.
-
-OPT_SETPKMJD
------------------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- OPT_SETPKMJD: 16
-
-Default: ``16``
-
-This option is used by ``SNANA`` to choose how peak MJD will be estimated. In general stick with the default unless you have a good reason not to.
-
-Options are chosen via a bitmask, meaning you add the associated number of each option you want to get your final option number. Details of the available options can be found in the `SNANA Manual `_ in sections 4.34, 5.51, and Figure 11 (as of the time of writing). The sections describe in detail how ``OPT_SETPKMJD`` is used, whilst the figure shows all possible options.
-
-PHOTFLAG_MSKREJ
--------------------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- PHOTFLAG_MSKREJ: 1016
-
-Default: ``1016``
-
-This specifies to SNANA which observations to reject based on ``PHOTFLAG`` bits. In general stick with the default unless you have a good reason not to.
-
-Details can be found in the `SNANA Manual `_ in sections 12.2.6 and 12.4.9 (as of the time of writing).
-
-SIM
---------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- SIM: False
-
-Default: ``False``
-
-Required: ``True`` (if working with simulated data)
-
-This simply passes a flag to later tasks about whether the data provided comes from real photometry or simulated photometry. It is important to specify this as the distincation matters down the line.
-
-BLIND
----------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- BLIND: True
-
-Default: ``True``
-
-This passes a flag throughout all of Pippin that this data should be blinded. **If working with real data, only unblind when you are absolutely certain your analysis is ready!**
-
-TYPES
----------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- TYPES:
- IA: [101, 1]
- NONIA: [20, 30, 120, 130]
-
-Default:
-
-* ``IA: [1]``
-* ``NONIA: [2, 20, 21, 22, 29, 30, 31, 32, 33, 39, 40, 41, 42, 43, 80, 81``
-
-This is the SNANA ``SNTYPE`` of your IA and NONIA supernovae. This is mostly used by the various classifiers available to Pippin.
-
-In general if a spectroscopicaly classified supernova type is given the ``SNTYPE`` of ``n`` then photometrically identified supernovae of the same (suspected) type is given the ``SNTYPE`` of ``100 + n``. By default spectroscopically classified type Ia supernovae are given the ``SNTYPE`` of 1. The default ``SNTYPE`` of non-ia supernova is a bit more complicated but details can be found ``$SNDATA_ROOT/models/NON1ASED/*/NONIA.LIST``. More detail can be found in the `SNANA Manual `_ in sections 4.6 for type Ia, and 9.6 for non-ia supernovae.
-
-BATCH_FILE
---------------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- BATCH_FILE: path/to/bath_template.TEMPLATE
-
-Default: ``cfg.yml`` -> ``SBATCH: cpu_location``
-
-Which SBATCH template to use. By default this will use the cpu template from the main ``cfg.yml``. More details can be found at :ref:`Changing SBATCH options`.
-
-BATCH_REPLACE
-------------------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- BATCH_REPLACE:
- KEY1: value
- KEY2: value
-
-Default: ``None``
-
-Overwrite certain SBATCH keys. More details can be found at :ref:`Changing SBATCH options`.
-
-PHOTFLAG_DETECT
----------------------
-
-Syntax:
-
-.. code-block:: yaml
-
- OPTS:
- PHOTFLAG_DETECT: 4096
-
-Default: ``None``
-
-An optional SNANA flag to add a given bit to every detection. Adding this optional flag will result in the ``NEPOCH_DETECT`` (number of detections) and ``TLIVE_DETECT`` (time between first and last detection) columns being added to the SNANA and FITRES tables. More details can be found in the `SNANA Manual `_ in sections 4.18.1, 4.18.6, 4.36.5, and Figure 6 (at the time of writing).
-
-CUTWIN_SNR_NODETECT
-------------------------
-
-.. code-block:: yaml
-
- OPTS:
-      CUTWIN_SNR_NODETECT: -100,10
-
-Default: ``None``
-
-Flag to tell SNANA to reject non-detection events with a signal-to-noise ratio below the min or above the max.
-
-Output
-======
-
-Within the ``$PIPPIN_OUTPUT/JOB_NAME/0_DATAPREP`` directory you will find a directory for each dataprep task. Here is an example of some of the files you might find in each directory:
-
-* ``clump.nml``: The clump fit input generated by Pippin and passed to ``snana.exe``.
-* ``config.yml``: A config file used to store all the options specified and generate the hash.
-* ``{RAW_DIR}.SNANA.TEXT``: The SNANA data file containing information on each supernova.
-* ``{RAW_DIR}.YAML``: The SNANA yaml file describing statistics and information about the dataset.
-* ``done.txt``: A file which should contain ``SUCCESS`` if the job was successful and ``FAILURE`` if the job was not successful.
-* ``hash.txt``: The Pippin-generated hash file which ensures tasks only get rerun if something changes.
-* ``output.log``: The output produced by the SBATCH job, which should include SNANA output as well.
-* ``slurm.job``: The slurm job file which Pippin ran.
diff --git a/docs/usage.rst b/docs/usage.rst
deleted file mode 100644
index ed087b09..00000000
--- a/docs/usage.rst
+++ /dev/null
@@ -1,377 +0,0 @@
-############
-Using Pippin
-############
-
-Using Pippin is very simple. In the top level directory, there is a ``pippin.sh`` script. If you're on midway and use SNANA, this script will be on your path already. Otherwise you can add it to your path by adding the following to your ``.bashrc``:
-
-.. code-block:: sh
-
- export PATH=$PATH:"path/to/pippin"
-
-To use Pippin, all you need is a config file ready to go. I've got a bunch of mine and some general ones in the configs directory, but you can put yours wherever you want. I recommend adding your initials to the front of the file to make it obvious in the shared output directory which folders are yours.
-
-If you have ``example.yml`` as your config file and want pippin to run it, simply run ``pippin.sh example.yml``.
-
-The file name that you pass in should contain a run configuration. Note that this is different to the global software configuration file ``cfg.yml``, so remember to ensure that your ``cfg.yml`` file is set up properly and that you know where you want your output to go. By default, I assume that the ``$PIPPIN_OUTPUT`` environment variable is set as the output location, so please either set said variable or change the associated line in ``cfg.yml``. For the morbidly curious, `here `__ is a very small demo video of using Pippin in the Midway environment.
-
-.. image:: _static/images/console.gif
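-
-If ``$PIPPIN_OUTPUT`` isn't already set up for you, a minimal sketch of setting it yourself in your ``.bashrc`` (the path below is just a placeholder) is:
-
-.. code-block:: sh
-
-    # point Pippin's output at a directory of your choosing
-    export PIPPIN_OUTPUT="/path/to/your/pippin_output"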
-
-Creating your own configuration file
-=====================================
-
-Each configuration file is represented by a yaml dictionary linking each stage (see stage declaration section below) to a dictionary of tasks, the key being the unique name for the task and the value being its specific task configuration.
-
-For example, to define a configuration with two simulations and one light curve fitting task (resulting in 2 output simulations and 2 output light curve tasks - one for each simulation), a user would define:
-
-.. code-block:: yaml
-
- SIM:
- SIM_NAME_1:
- SIM_CONFIG: HERE
- SIM_NAME_2:
- SIM_CONFIG: HERE
-
- LCFIT:
- LCFIT_NAME_1:
- LCFIT_CONFIG: HERE
-
-The available tasks and their configuration details can be found in the :doc:`Tasks ` section. Alternatively, you can see examples in the ``examples`` directory for each task.
-
-Command Line Arguments
-=======================
-
-Pippin has a number of useful command line arguments which you can quickly reference via ``pippin.sh -h``.
-
-.. code-block:: text
-
- -h, --help show this help message and exit
- --config CONFIG Location of global config (i.e. cfg.yml)
- -v, --verbose increase output verbosity
- -s START, --start START
- Stage to start and force refresh. Accepts either the
- stage number or name (i.e. 1 or SIM)
- -f FINISH, --finish FINISH
- Stage to finish at (it runs this stage too). Accepts
- either the stage number or name (i.e. 1 or SIM)
- -r, --refresh Refresh all tasks, do not use hash
- -c, --check Check if config is valid
- -p, --permission Fix permissions and groups on all output, don't rerun
- -i IGNORE, --ignore IGNORE
- Dont rerun tasks with this stage or less. Accepts
- either the stage number of name (i.e. 1 or SIM)
- -S [SYNTAX], --syntax [SYNTAX]
- Get the syntax of the given stage. Accepts either the
- stage number or name (i.e. 1 or SIM). If run without
- argument, will tell you all stage numbers / names.
- -C, --compress Compress pippin output during job. Combine with -c /
- --check in order to compress completed pippin job.
- -U, --uncompress Do not compress pippin output during job. Combine
- with -c / --check in order to uncompress completed
- pippin job. Mutually exclusive with -C / --compress
-
-As an example, to have a verbose output configuration run and only do data preparation and simulation, you would run ``pippin.sh -vf 1 configfile.yml``.
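-
-A few more illustrative invocations, using placeholder job names and the flags listed above:
-
-.. code-block:: sh
-
-    # validate the config without running anything
-    pippin.sh -c configfile.yml
-
-    # run verbosely, starting from (and force refreshing) stage 2, the light curve fitting stage
-    pippin.sh -v -s 2 configfile.yml
-
-    # list all stage numbers and names
-    pippin.sh -S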
-
-Pippin on Midway
-=================
-
-On midway, sourcing the SNANA setup will add environment variables and Pippin to your path.
-
-Pippin itself can be found at ``$PIPPIN``, output at ``$PIPPIN_OUTPUT`` (which goes to a scratch directory), and ``pippin.sh`` will automatically work from any location.
-
-Note that you only have 100 GB on scratch. If you fill that up and need to nuke some files, look both in ``$SCRATCH_SIMDIR`` to remove SNANA photometry and ``$PIPPIN_OUTPUT`` to remove Pippin's output. Running the ``dirusage`` command on midway will (after some time) give you a list of which directories are taking up the most space.
-
-Examples
-========
-
-If you want detailed examples of what you can do with Pippin tasks, have a look in the `examples directory `__, pick the task you want to know more about, and have a look over all the options.
-
-Here is a very simple configuration file which runs a simulation, does light curve fitting, and then classifies it using the debug FITPROB classifier.
-
-.. code-block:: yaml
-
- SIM:
- DESSIM:
- IA_G10_DES3YR:
- BASE: surveys/des/sim_ia/sn_ia_salt2_g10_des3yr.input
-
- LCFIT:
- BASEDES:
- BASE: surveys/des/lcfit_nml/des_5yr.nml
-
- CLASSIFICATION:
- FITPROBTEST:
- CLASSIFIER: FitProbClassifier
- MODE: predict
-
-You can see that unless you specify a ``MASK`` on each subsequent task, Pippin will generally try to run everything on everything. So if you have two simulations defined, you don't need two light curve fitting tasks: Pippin will make one light curve fit task for each simulation, and then two classification tasks, one for each light curve fit task.
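-
-If you do want to restrict which tasks feed into which, the ``MASK`` keyword does the matching by task name. A small sketch (the task names here are placeholders) might look like:
-
-.. code-block:: yaml
-
-    SIM:
-      SIM_LOWZ:
-        IA_G10_LOWZ:
-          BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
-      SIM_DES:
-        IA_G10_DES3YR:
-          BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
-
-    LCFIT:
-      FIT_DES_ONLY:
-        BASE: surveys/des/lcfit_nml/des_5yr.nml
-        # only runs on simulations whose name matches this mask
-        MASK: SIM_DES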
-
-Best Practice
-==============
-
-Here are a few best practices for improving your chance of success with Pippin.
-
-Use ``screen``
----------------
-
-Pippin jobs can take a long time, so to avoid having to keep a terminal open and an ssh session active for the length of the entire run, it is *highly recommended* you run Pippin in a ``screen`` session.
-
-For example, if you are doing machine-learning testing, you may create a new screen session called ``ml`` by running ``screen -S ml``. It will then launch a new instance of bash for you to play around in. conda will **not work out of the box**. To make it work again, run ``conda deactivate`` and then ``conda activate``, and you can check this works by running ``which python`` and verifying it's pointing to the miniconda install. You can then run Pippin as per normal: ``pippin.sh -v your_job.yml`` and get the coloured output. To leave the screen session, but **still keep Pippin running even after you log out**, press ``Ctrl-A``, ``Ctrl-D``. As in one, and then the other, not ``Ctrl-A-D``. This will detach from your screen session but keep it running. Just pressing ``Ctrl-D`` will disconnect and shut it down. To get back into your screen session, simply run ``screen -r ml`` to reattach. You can see your screen sessions using ``screen -ls``.
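-
-Putting that together, the whole workflow sketched above looks something like this (the session and job names are placeholders):
-
-.. code-block:: sh
-
-    screen -S ml                          # start a new screen session called ml
-    conda deactivate && conda activate    # get conda working again inside screen
-    which python                          # confirm python points at the miniconda install
-    pippin.sh -v your_job.yml
-    # Ctrl-A then Ctrl-D to detach, leaving Pippin running
-    screen -ls                            # list your sessions
-    screen -r ml                          # reattach later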
-
-You may notice if you log in and out of midway that your screen sessions might not show up. This is because midway has multiple head nodes, and your screen session exists only on one of them. This is why when I ssh to midway I specify a specific login node instead of being assigned one. To make it simpler, I'd recommend setting your ssh host in your ``.ssh/config`` to something along the lines of:
-
-.. code-block:: sh
-
- Host midway2
- HostName midway2-login1.rcc.uchicago.edu
- User username
-
-Make the most of command line options
----------------------------------------
-
-There are a number of command line options that are particularly useful. Foremost amongst them is ``-v, --verbose`` which shows debug output when running Pippin. Including this flag in your run makes it significantly easier to diagnose if anything goes wrong.
-
-The next time saving flag is ``-c, --check``, which will do an initial passthrough of your input yaml file, pointing out any obvious errors before anything runs. This is particularly useful if you have long jobs and want to catch bugs early.
-
-The final set of useful flags are ``-s, --start``, ``-f, --finish``, and ``-i, --ignore``. These allow you to customize exactly which parts of your full job Pippin runs. Pippin decides whether or not it should rerun a task based on a hash generated each time it's run. This hash is produced from the task's input, so these flags are particularly useful if you change your input but *don't want stages to rerun*, such as if you are making small changes to a final stage, or debugging an early stage.
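-
-As a sketch (the stage numbers and job name are placeholders), if you've only tweaked a late stage and don't want the earlier stages to rerun, something like the following should help:
-
-.. code-block:: sh
-
-    # don't rerun anything at stage 3 or earlier, even though the input changed
-    pippin.sh -v -i 3 your_job.yml
-
-    # force a refresh of stage 4 and everything after it
-    pippin.sh -v -s 4 your_job.yml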
-
-Advanced Usage
-==============
-
-The following are a number of advanced features which aren't required to use Pippin, but can drastically improve your experience with it.
-
-Yaml Anchors
--------------
-
-If you are finding that your config files contain lots of duplicated sections (for example, many simulations configured almost the same way, but with one difference), consider using yaml anchors. A thorough explanation of how to use them is available `here `__, however the basics are as follows. First you should add a new yaml section at the top of your input file. The name of this section doesn't matter as long as it doesn't clash with other Pippin stages, however I usually use ``ALIAS``. Within this section, you include all of the yaml anchors you need. An example is shown below:
-
-.. code-block:: yaml
-
- ALIAS:
- LOWZSIM_IA: &LOWZSIM_IA
- BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
-
- SIM:
- SIM_1:
- IA_G10_LOWZ:
- <<: *LOWZSIM_IA
- # Other options here
- SIM_2:
- IA_G10_LOWZ:
- <<: *LOWZSIM_IA
- # Different options here
-
-Include external aliases
-------------------------
-**This is new and experimental, use with caution**.
-
-*Note that this is* **not** *yaml compliant*.
-
-When dealing with especially large jobs, or suites of jobs, you might find yourself having very large ``ALIAS``/``ANCHOR`` blocks which are repeated amongst a number of Pippin jobs. A cleaner alternative is to have a number of ``.yml`` files containing your anchors, and then *include* these in the input files which will run Pippin jobs. This way you can share anchors amongst multiple Pippin input files and update them all at the same time. In order to achieve this, Pippin can *preprocess* the input file to directly copy the anchor file into the job file. An example is provided below:
-
-``base_job_file.yml``
-
-.. code-block:: yaml
-
-    # Values surrounded by % indicate preprocessing steps.
-    # The preprocess below will copy the provided yml files into this one before this one is read in, allowing anchors to propagate into this file
- # They will be copied in, in the order you specify, with duplicate tasks merging.
- # Note that whitespace before or after the % is fine, as long as % is the first and last character.
-
- # % include: path/to/anchors_sim.yml %
- # %include: path/to/anchors_lcfit.yml%
-
- SIM:
- DESSIM:
- IA_G10_DES3YR:
- BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
- GLOBAL:
- # Note that this anchor doesn't exist in this file
- <<: *SIM_GLOBAL
- LCSIM:
- IA_G10_LOWZ:
- BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
- GLOBAL:
- # Note that this anchor doesn't exist in this file
- <<: *SIM_GLOBAL
-
- LCFIT:
- LS:
- BASE: surveys/lowz/lcfit_nml/lowz.nml
- MASK: DATALOWZ
- FITOPTS: surveys/lowz/lcfit_fitopts/lowz.yml
- # Note that this anchor doesn't exist in this file
- <<: *LCFIT_OPTS
-
- DS:
- BASE: surveys/des/lcfit_nml/des_3yr.nml
- MASK: DATADES
- FITOPTS: surveys/des/lcfit_fitopts/des.yml
- # Note that this anchor doesn't exist in this file
- <<: *LCFIT_OPTS
-
-``anchors_sim.yml``
-
-.. code-block:: yaml
-
- ANCHORS_SIM:
- SIM_GLOBAL: &SIM_GLOBAL
- W0_LAMBDA: -1.0
- OMEGA_MATTER: 0.3
- NGEN_UNIT: 0.1
-
-``anchors_lcfit.yml``
-
-.. code-block:: yaml
-
- ANCHORS_LCFIT:
- LCFIT_OPTS: &LCFIT_OPTS
- SNLCINP:
- USE_MINOS: F
-
-This will be preprocessed to produce the following yaml file, which pippin will then run on.
-
-``final_pippin_input.yml``
-
-.. code-block:: yaml
-
- # Original input file: path/to/base_job_file.yml
-    # Values surrounded by % indicate preprocessing steps.
-    # The preprocess below will copy the provided yml files into this one before this one is read in, allowing anchors to propagate into this file
- # They will be copied in, in the order you specify, with duplicate tasks merging.
- # Note that whitespace before or after the % is fine, as long as % is the first and last character.
-
- # Anchors included from path/to/anchors_sim.yml
- ANCHORS_SIM:
- SIM_GLOBAL: &SIM_GLOBAL
- W0_LAMBDA: -1.0
- OMEGA_MATTER: 0.3
- NGEN_UNIT: 0.1
-
- # Anchors included from path/to/anchors_lcfit.yml
- ANCHORS_LCFIT:
- LCFIT_OPTS: &LCFIT_OPTS
- SNLCINP:
- USE_MINOS: F
-
- SIM:
- DESSIM:
- IA_G10_DES3YR:
- BASE: surveys/des/sims_ia/sn_ia_salt2_g10_des3yr.input
- GLOBAL:
- <<: *SIM_GLOBAL
- LCSIM:
- IA_G10_LOWZ:
- BASE: surveys/lowz/sims_ia/sn_ia_salt2_g10_lowz.input
- GLOBAL:
- <<: *SIM_GLOBAL
-
- LCFIT:
- LS:
- BASE: surveys/lowz/lcfit_nml/lowz.nml
- MASK: DATALOWZ
- FITOPTS: surveys/lowz/lcfit_fitopts/lowz.yml
- <<: *LCFIT_OPTS
-
- DS:
- BASE: surveys/des/lcfit_nml/des_3yr.nml
- MASK: DATADES
- FITOPTS: surveys/des/lcfit_fitopts/des.yml
- <<: *LCFIT_OPTS
-
-Now you can include the ``anchors_sim.yml`` and ``anchors_lcfit.yml`` anchors in any pippin job you want, and need only update those anchors once. There are a few caveats to be aware of. The preprocessing does no checking to ensure the given file is valid yaml, it simply copies the yaml directly in. As such you should always ensure that the name of your anchor block is unique; any duplicates will mean whichever block is lowest will overwrite all other blocks of the same name. Additionally, whilst you could technically use this to store Pippin task blocks in external yml files, this is discouraged as this feature was only intended for anchors and aliases.
-
-
-Use external results
----------------------
-
-Often you will want to reuse the results of one Pippin job in other Pippin jobs, for instance reusing a biascor sim so you don't need to resimulate it every time. This can be accomplished via the ``EXTERNAL`` and ``EXTERNAL_DIRS`` keywords.
-
-The ``EXTERNAL`` keyword is used when you only need to specify a single external result, such as when you are loading in a simulation. If that's the case you simply need to let Pippin know where the external results are located. An example loading in external biascor sims is below:
-
-.. code-block:: yaml
-
- SIM:
- DESSIMBIAS5YRIA_C11:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_C11
- DESSIMBIAS5YRIA_G10:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRIA_G10
- DESSIMBIAS5YRCC:
- EXTERNAL: $PIPPIN_OUTPUT/GLOBAL/1_SIM/DESSIMBIAS5YRCC
-
-The ``EXTERNAL_DIRS`` keyword is used when there isn't a one-to-one mapping between the task and the external results. An example of this is a light curve fitting task, where a single task will fit light curves from multiple simulations. If this is the case, you can specify a number of external results using the ``EXTERNAL_DIRS`` keyword:
-
-.. code-block:: yaml
-
- LCFIT:
- D:
- BASE: surveys/des/lcfit_nml/des_5yr.nml
- MASK: DESSIM
- EXTERNAL_DIRS:
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_C11
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRIA_G10
- - $PIPPIN_OUTPUT/GLOBAL/2_LCFIT/D_DESSIMBIAS5YRCC
-
-Note that in this case the name of the external results matches the name of the task. Any tasks which do not have an exact match in ``EXTERNAL_DIRS`` are run as normal, allowing you to mix and match both precomputed and non-precomputed tasks together.
-
-If you have external results which don't have an exact match but should still be used, you can specify how the external results should be used via the ``EXTERNAL_MAP`` keyword:
-
-.. code-block:: yaml
-
- LCFIT:
- D:
-        BASE: surveys/des/lcfit_nml/des_5yr.nml
- MASK: DESSIM
- EXTERNAL_DIRS:
- - $PIPPIN_OUTPUT/EXAMPLE_C11/2_LCFIT/DESFIT_SIM
- - $PIPPIN_OUTPUT/EXAMPLE_G10/2_LCFIT/DESFIT_SIM
- - $PIPPIN_OUTPUT/EXAMPLE/2_LCFIT/DESFIT_CCSIM
- EXTERNAL_MAP:
- # LCFIT_SIM: EXTERNAL_MASK
- D_DESSIMBIAS5YRIA_C11: EXAMPLE_C11 # In this case we are matching to the pippin job name, as the LCFIT task name is shared between two EXTERNAL_DIRS
- D_DESSIMBIAS5YRIA_G10: EXAMPLE_G10 # Same as C11
- D_DESSIMBIAS5YRCC: DESFIT_CCSIM # In this case we match to the LCFIT task name, as the pippin job name (EXAMPLE) would match with the other EXTERNAL_DIRS
-
-Changing SBATCH options
------------------------
-
-Pippin has sensible defaults for the sbatch options of each task, however it is possible you may sometimes want to overwrite some keys, or even replace the sbatch template entirely. You can do this via the ``BATCH_REPLACE`` and ``BATCH_FILE`` options respectively.
-
-In order to overwrite the default batch keys, add the following to any task which runs a batch job:
-
-.. code-block:: yaml
-
- BATCH_REPLACE:
- REPLACE_KEY1: value
- REPLACE_KEY2: value
-
-Possible options for ``BATCH_REPLACE`` are:
-
-* ``REPLACE_NAME``: ``--job-name``
-* ``REPLACE_LOGFILE``: ``--output``
-* ``REPLACE_WALLTIME``: ``--time``
-* ``REPLACE_MEM``: ``--mem-per-cpu``
-
-Note that changing these could have unforeseen consequences, so use at your own risk.
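-
-For example, a sketch of bumping the walltime and memory for a single heavy task (the values are placeholders, and sensible limits depend on your cluster) might look like:
-
-.. code-block:: yaml
-
-    BATCH_REPLACE:
-      REPLACE_WALLTIME: "10:00:00"   # passed to --time
-      REPLACE_MEM: 4GB               # passed to --mem-per-cpu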
-
-If replacing these keys isn't enough, you are able to create your own sbatch templates and get Pippin to use them. This is useful if you want to change the partition, or add some additional code which runs before the Pippin job. Note that your template **must** contain the keys listed above in order to work properly. In addition you **must** have ``REPLACE_JOB`` at the bottom of your template file, otherwise Pippin will not be able to load its jobs into your template. An example template is as follows:
-
-.. code-block:: bash
-
- #!/bin/bash
-
- #SBATCH -p broadwl-lc
- #SBATCH --account=pi-rkessler
- #SBATCH --job-name=REPLACE_NAME
- #SBATCH --output=REPLACE_LOGFILE
- #SBATCH --time=REPLACE_WALLTIME
- #SBATCH --nodes=1
- #SBATCH --mem-per-cpu=REPLACE_MEM
- echo $SLURM_JOB_ID starting execution `date` on `hostname`
-
- REPLACE_JOB
-
-To have Pippin use your template, simply add the following to your task:
-
-.. code-block:: yaml
-
- BATCH_FILE: path/to/your/batch.TEMPLATE