Update how models are compared using snakemake #42
Merged
Conversation
- Ubuntu latest (22.04) seems to be incompatible - long installations fail: conda-incubator/setup-miniconda#116 - try setting up manually (to avoid updating the env)
- test whether the mamba implementation works on the runner instance - the runner instance seems to run into a memory issue (the Kubernetes pod error hints at that)
- improve the cmd interface for two key notebooks (as scripts) - mamba (a replacement for conda) is better for large environments
- index is wrapped into an iterable (e.g. for 0 as an integer indicating the index column) - remove "%%time" cell magic -> it lets papermill miss errors in the cell - remove old input - drop deprecated parameter from read_csv
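A minimal sketch of the index-wrapping idea: the helper name `read_csv_with_index` is hypothetical, but it shows why wrapping a scalar `index_col` (e.g. `0`) into a list makes downstream handling of single- and multi-level indices uniform.

```python
import pandas as pd
from io import StringIO

def read_csv_with_index(path_or_buf, index_col=0):
    """Read a CSV, always passing index_col as a list.

    Hypothetical helper: wrapping a single integer into a list lets
    downstream code treat single- and multi-level indices the same way.
    """
    if isinstance(index_col, int):
        index_col = [index_col]  # wrap scalar into an iterable
    return pd.read_csv(path_or_buf, index_col=list(index_col))

csv = "sample,intensity\nS1,1.5\nS2,2.0\n"
df = read_csv_with_index(StringIO(csv), index_col=0)
```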
- option for csv would need to be specified - raise an error when a column Index (level) has no name.
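The unnamed-level check could look like the following sketch (the helper name is an assumption, not from the repo): unnamed column levels make long/wide transformations ambiguous, so failing early is safer.

```python
import pandas as pd

def check_column_index_named(df: pd.DataFrame) -> None:
    """Raise if any level of the column Index has no name.

    Hypothetical validation helper, not the project's actual code.
    """
    names = df.columns.names  # one entry per column level
    if any(name is None for name in names):
        raise ValueError(f"Column Index has unnamed level(s): {list(names)}")

# a default DataFrame has an unnamed column Index -> should raise
df = pd.DataFrame({"a": [1], "b": [2]})
try:
    check_column_index_named(df)
    raised = False
except ValueError:
    raised = True
```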
- format should be updated eventually (reading a seq of configs and metrics with same schema)
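One way such a shared schema pays off, sketched with made-up run records (ids, fields, and values are illustrative only): configs and metrics that share an `"id"` key can be merged into a single comparison table.

```python
import pandas as pd

# Hypothetical records: each config and each metrics entry carries
# the same "id" key, so they can be joined into one table per run.
configs = [
    {"id": "run_1", "model": "DAE", "batch_size": 64},
    {"id": "run_2", "model": "VAE", "batch_size": 32},
]
metrics = [
    {"id": "run_1", "MAE": 0.21},
    {"id": "run_2", "MAE": 0.19},
]

df = (
    pd.DataFrame(configs)
    .merge(pd.DataFrame(metrics), on="id")
    .set_index("id")
)
```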
- exit before DISEASES part -> should be at best separated into new script
- functionality copied from API description: https://www.uniprot.org/help/id_mapping - example provided
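A sketch of the request construction, following the API description linked above (endpoint and field names are as documented at https://www.uniprot.org/help/id_mapping at the time of writing; the helper name and example accessions are illustrative). Only the payload is built here, no network call is made.

```python
def build_idmapping_request(ids, from_db="UniProtKB_AC-ID", to_db="UniProtKB"):
    """Build the POST payload for the UniProt ID mapping service.

    Sketch based on https://www.uniprot.org/help/id_mapping:
    the job is submitted with "from", "to", and a comma-separated
    "ids" field.
    """
    url = "https://rest.uniprot.org/idmapping/run"
    data = {"from": from_db, "to": to_db, "ids": ",".join(ids)}
    return url, data

url, data = build_idmapping_request(["P05067", "P12345"])
```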
- disentangle preprocessing from analysis
- metadata expected for long-wide format transformations
start with single dev dataset - name parameters consistently - model_key: save and use as given (easier for connecting) - model abbrev.: RSN, CF, DAE, VAE (make consistent) - move helper function
- update parameter parsing - collect dump figure paths - move one function to package - pick up sample index name automatically
- model_key and model needed currently for grid search - "id" (constructed on loading config and metric files) + "model" name should be a unique combination in future - CF model: batch_size (not batch_size_collab) - ToDo: model pred should be saved by default (as currently done)
- add separate rules for interpolation and median imputation -> more separation - possibility: Optimize models one by one, write results and configs to shared database
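The per-method separation could be sketched as two independent Snakemake rules (all paths and script names below are hypothetical placeholders, not the repository's actual workflow):

```snakemake
# Hypothetical sketch: one rule per baseline method, so interpolation
# and median imputation no longer share a single rule.
rule impute_interpolation:
    input:
        "{folder_dataset}/data/train.csv"
    output:
        "{folder_dataset}/preds/interpolation.csv"
    script:
        "scripts/interpolate.py"

rule impute_median:
    input:
        "{folder_dataset}/data/train.csv"
    output:
        "{folder_dataset}/preds/median.csv"
    script:
        "scripts/impute_median.py"
```

Separate rules let Snakemake schedule, cache, and rerun each method independently, which also fits the idea of optimizing models one by one against a shared results database.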
- scripts need to be further adapted
- update model abbreviations in the notebooks (CF, DAE, VAE, RSN) - missing values (not real_na in comments)
- fake NA -> simulated missing values - drop n_obs from plot
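The "simulated missing values" terminology can be illustrated with a small sketch (function name, fraction, and seed are assumptions for illustration): observed entries are masked at random, and the held-out true values are kept so imputations can be scored against them.

```python
import numpy as np
import pandas as pd

def simulate_missing(df: pd.DataFrame, frac: float = 0.1, seed: int = 42):
    """Mask a random fraction of entries (simulated missing values).

    Hypothetical sketch: returns the masked data plus the held-out
    true values for later evaluation of imputation methods.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(df.shape) < frac  # True -> entry becomes NA
    masked = df.mask(mask)              # data with simulated NAs
    held_out = df.where(mask)           # the values that were removed
    return masked, held_out

data = pd.DataFrame(np.arange(20.0).reshape(4, 5))
masked, held_out = simulate_missing(data, frac=0.3)
```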
- each model type will have its own rules ToDo: - convoluted setup needs cleaning - folder layout will be adapted
- prepare for export to separate snakefiles
Make it easier to extend the search with further models
- folder_dataset/models/{model}/
- folder_dataset/models/{model}/run_id/ > single files are not dumped in subfolders (-> metrics/models)
Folders (ordered by hierarchy)
- folder_dataset: dataset-specific folder (here: level of the HeLa development dataset)
- root_model: all model runs (each in a subfolder)
- run_id_template: models with different hyperparameters
ToDo: Settle on how a model script has to look -> each run should produce the same folder structure.
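The hierarchy above can be captured in one path-building helper (the function name and example arguments are hypothetical), so every model script dumps into the same structure:

```python
from pathlib import Path

def run_folder(folder_dataset: str, model: str, run_id: str) -> Path:
    """Build the per-run folder: folder_dataset/models/{model}/{run_id}/.

    Hypothetical helper mirroring the layout described above.
    """
    return Path(folder_dataset) / "models" / model / run_id

path = run_folder("runs/dev_dataset", "DAE", "run_001")
```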
- remodel dumping of (real) missing values
- single experiment model setup needs to be adapted
- on the way to better composition -> next: refactor interpolation
- notebook needs a refactoring
- metrics format simplified (more general)
- remove interpolation dependence
- remove plot from training scripts
- not imputed -> "prop" gives the share a specific method can impute
- adapt comparison "performance_plots"
- metrics and "subset" had to be adapted for grid_search
- median: Median
Principles:
- papermill
  a. for splits of simulated missing data
  b. and predictions of missing data