Missing labels and captions in plots with default settings #221

peterjc · 2020-08-19T13:37:53Z

Following workflow produced plots without labels or class colors (edited for brevity):

pyani index pyani_sample_genomes
pyani anim pyani_sample_genomes pyani_sample_genomes_anim
pyani plot --formats png,pdf pyani_sample_genomes_anim 1

The solution is to explicitly set labels and classes when calling pyani anim:

pyani index pyani_sample_genomes
pyani anim pyani_sample_genomes pyani_sample_genomes_anim \
--labels pyani_sample_genomes/labels.txt --classes pyani_sample_genomes/classes.txt
pyani plot --formats png,pdf pyani_sample_genomes_anim 1

Would it make sense to have --labels and --classes default to $DIR/labels.txt and $DIR/classes.txt if present when run on input directory $DIR?

If no labels are given, would it make sense to use the filename stems as the default labels?
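A minimal sketch of the two defaults proposed above, assuming Python (pyani's own language). `resolve_metadata_file` and `stem_labels` are hypothetical helpers, not part of pyani's API. An explicitly supplied file always wins, and `$DIR/labels.txt` or `$DIR/classes.txt` is only used if it actually exists.

```python
from pathlib import Path
from typing import Optional


def resolve_metadata_file(
    indir: Path, filename: str, explicit: Optional[Path] = None
) -> Optional[Path]:
    """Pick the labels/classes file to use for an input directory.

    Hypothetical helper: an explicitly supplied path (e.g. from --labels or
    --classes) always wins; otherwise use indir/filename if that file
    exists; otherwise return None.
    """
    if explicit is not None:
        return explicit
    candidate = indir / filename
    return candidate if candidate.is_file() else None


def stem_labels(genome_files: list[Path]) -> dict[Path, str]:
    """Hypothetical default: label each genome by its filename stem."""
    return {path: path.stem for path in genome_files}
```

For example, `resolve_metadata_file(Path("pyani_sample_genomes"), "labels.txt")` returns `pyani_sample_genomes/labels.txt` when that file is present and `None` otherwise. Note that `Path.stem` only strips the last suffix, so `genome.fna.gz` would be labelled `genome.fna`.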

(I'm also puzzled why the classes and labels are tied to the run; I expected pyani index $DIR to record them from $DIR/labels.txt and $DIR/classes.txt)
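If `pyani index` were to record these files itself, it would need to read them. The sketch below assumes each line of `labels.txt`/`classes.txt` holds three tab-separated fields (genome hash, filename stem, label or class); that layout is an assumption here and should be checked against indexing.rst. `read_metadata` is a hypothetical helper.

```python
from typing import Iterable


def read_metadata(lines: Iterable[str]) -> dict[str, str]:
    """Map genome hash -> label (or class).

    ASSUMED layout, one genome per line: <hash>\t<filestem>\t<value>
    Blank lines are skipped; a line with fewer than three fields raises
    ValueError.
    """
    mapping: dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        genome_hash, _filestem, value = line.split("\t", 2)
        mapping[genome_hash] = value
    return mapping
```

An open file handle iterates over lines, so `with open("pyani_sample_genomes/labels.txt") as handle: labels = read_metadata(handle)` would work under the assumed layout.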


widdowquinn · 2020-08-19T14:05:39Z

That's expected behaviour, at the moment… v0.3.0a is in active development and should be considered unfinished.

Default behaviour might eventually turn out to use the hash, or the filestem, for labels (where not provided). My intention is that classes will be ignored if not specified.
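The fallback described here (hash or filestem for missing labels, no class at all for missing classes) could look roughly like this. It is a sketch under assumptions: the MD5 digest of the file contents stands in for the genome hash, and `file_md5`, `resolve_label` and `resolve_class` are hypothetical names, not pyani functions.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_md5(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file's contents, read in chunks (assumed hash)."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_label(
    genome_hash: str, filestem: str, labels: dict[str, str], fallback: str = "filestem"
) -> str:
    """Use the supplied label if present, else the filestem or the hash."""
    if fallback not in ("filestem", "hash"):
        raise ValueError(f"unknown fallback: {fallback!r}")
    if genome_hash in labels:
        return labels[genome_hash]
    return genome_hash if fallback == "hash" else filestem


def resolve_class(genome_hash: str, classes: dict[str, str]) -> Optional[str]:
    """Classes are simply absent (None) when not specified."""
    return classes.get(genome_hash)
```

The filestem is usually more readable on a plot; the hash is guaranteed unique even when two files share a stem in different directories.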

Tying classes and labels to the run allows the user to rerun the same analysis with different labels and classes, e.g. for generating plots. It may be helpful to provide a way to override the database classes/labels specifically for an analysis, but at the moment, this is intended behaviour.

peterjc · 2020-08-19T14:20:48Z

I agree that the hash would be another practical default for labels when not provided.

I don't yet understand why you would tie the class and label metadata to the comparison computation stage. I wouldn't want to recompute the comparisons (even with recovery mode) just to re-plot with different classes (e.g. sample sites, or sample year) or different labels (e.g. sampling method).

Can we supply the classes and labels to the plotting (and report?) commands?

widdowquinn · 2020-08-19T14:49:30Z

The database stores the results from previous comparisons, so you don't need to recompute them.

Considering a use-case:

1. I have a set of genomes I don't know how to classify
2. I use ANI to generate a likely classification (labels/classes are arbitrary at this point)
3. I plot the results of the analysis, and this tells me what the classes should be (and suggests labels, e.g. new species divisions)

Now I want to plot the results again, but with my new classes/labels. There are two obvious options:

1. replot, but use a new classes/labels file specific to that plot
2. "reanalyse" but with a new classes/labels file (this is computationally almost cost-free, as the results are stored)

Both will give the same file output. However, if you only redo the plot step, using a new class/label file, this gives an output that isn't consistent with the database (though you could make notes/log the changed labels/classes).

To have the database be "reproducible" such that plotting/writing tables for a particular run gives the same outputs each time with only the database as input, we'd need to capture the labels/classes files used for the plot, and remember that it's a combination of ANI run and plot command (and we could have arbitrarily many plots for a single run) that defines the output.
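Capturing "the labels/classes files used for the plot" for arbitrarily many plots per run amounts to a one-to-many relation. The sketch below uses an in-memory SQLite database; the schema and the `connect`, `record_plot` and `plots_for_run` helpers are hypothetical and are not pyani's actual database layout.

```python
import json
import sqlite3

# Hypothetical schema for illustration only; NOT pyani's database layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS plots (
    plot_id      INTEGER PRIMARY KEY,
    run_id       INTEGER NOT NULL REFERENCES runs (run_id),
    labels_json  TEXT NOT NULL,  -- labels actually used for this plot
    classes_json TEXT NOT NULL,  -- classes actually used for this plot
    plot_format  TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_plot(conn: sqlite3.Connection, run_id: int, labels: dict[str, str],
                classes: dict[str, str], plot_format: str) -> int:
    """Store the exact labels/classes used, so the plot can be regenerated."""
    cur = conn.execute(
        "INSERT INTO plots (run_id, labels_json, classes_json, plot_format) "
        "VALUES (?, ?, ?, ?)",
        (run_id, json.dumps(labels, sort_keys=True),
         json.dumps(classes, sort_keys=True), plot_format),
    )
    conn.commit()
    return cur.lastrowid


def plots_for_run(conn: sqlite3.Connection, run_id: int) -> list[dict]:
    """All plots recorded for one run, oldest first."""
    rows = conn.execute(
        "SELECT labels_json, classes_json, plot_format FROM plots "
        "WHERE run_id = ? ORDER BY plot_id",
        (run_id,),
    ).fetchall()
    return [
        {"labels": json.loads(labels_json), "classes": json.loads(classes_json),
         "format": plot_format}
        for labels_json, classes_json, plot_format in rows
    ]
```

With this shape, the output of any past plot is defined by the run (its comparisons) plus the stored plot row, so the database alone is enough to regenerate it.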

One goal is to have a Flask (or whatever is useful at the time) interface onto the local database, so that interactive plots can be produced as well as those written statically to a file. These will get their information from the database. In that context it makes sense to define a "run" as the genomes plus their corresponding labels/classes. Changing labels/classes (keeping the same genomes) corresponds to another "run", in the same way that removing genomes, but keeping labels/classes, corresponds to another "run". When no new calculations are required, this is a straightforward database update in both cases.
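The "run = genomes + labels/classes" definition can be sketched as a small data model. Everything here (`Run`, `freeze`, `pending_comparisons`, keying results by genome hash) is hypothetical, and for simplicity each comparison is treated as an unordered pair of genome hashes, which may not match how ANIm comparisons are actually stored. It shows that two runs over the same genomes with different labels are distinct runs, yet need no new comparisons.

```python
from dataclasses import dataclass
from itertools import combinations

# A label/class mapping frozen into a hashable, order-independent form.
FrozenMapping = tuple[tuple[str, str], ...]


def freeze(mapping: dict[str, str]) -> FrozenMapping:
    return tuple(sorted(mapping.items()))


@dataclass(frozen=True)
class Run:
    """Hypothetical model: a run is a genome set plus its labels/classes."""
    genomes: frozenset[str]  # genome hashes
    labels: FrozenMapping
    classes: FrozenMapping


def pending_comparisons(
    run: Run, done: set[frozenset[str]]
) -> list[frozenset[str]]:
    """Genome pairs in this run with no stored result yet (unordered pairs)."""
    pairs = (frozenset(pair) for pair in combinations(sorted(run.genomes), 2))
    return [pair for pair in pairs if pair not in done]
```

Relabelling produces a new `Run`, but `pending_comparisons` is empty because every pair is already stored, so creating it is only a database update.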

Now, I do see the utility of providing a classes/labels file at the pyani plot step, but it breaks that definition of "run" being "genomes + their labels/classes" that I want to keep for the more advanced interaction with the database. For quick and dirty outputs I see the attraction of having --classes/--labels options in pyani plot. Maybe that's worth implementing - but I'm quite keen on enforcing that "run" definition.
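A plot-time override, if it were added, might merge supplied values over the run's stored ones without touching the database. This is a sketch only: `pyani plot` is not assumed to accept `--labels`/`--classes`, and `effective_metadata` is a hypothetical name.

```python
from typing import Optional


def effective_metadata(
    stored: dict[str, str], override: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Return the labels (or classes) to draw for one plot.

    Values supplied at plot time win over those stored with the run. The
    stored mapping is copied, never modified, so the database still
    describes the run as it was defined.
    """
    merged = dict(stored)
    if override:
        merged.update(override)
    return merged
```

Because the merged values never reach the database, the resulting plot is exactly the kind of output that is no longer reproducible from the database alone unless the override is also recorded.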

peterjc · 2020-08-19T14:57:12Z

That did clarify your design goals, thank you.

The "quick and dirty" option of --classes/--labels in pyani plot is attractive, especially while "reanalyse" remains somewhat slow (even in --recovery mode).

widdowquinn · 2020-08-19T14:58:36Z

I should really write this stuff down somewhere ;)

widdowquinn · 2021-04-18T14:49:42Z

This is another thing that should go into the documentation: the design goals and motivation for the database integration, and how that affects the way we need to provide metadata for visualisation.

baileythegreen · 2022-04-29T10:37:46Z

Which part of the documentation? Design goals and motivation sounds like wiki material; there is already a bit of text in indexing.rst that seems related to the use of class and label files discussed here.

widdowquinn · 2022-04-29T11:04:41Z

> Which part of the documentation? Design goals and motivation sounds like wiki material;

It does.

> there is already a bit of text in indexing.rst that seems related to the use of class and label files discussed here.

Yes, there is. As ever there may be a judgement call involved to decide what is appropriately user-facing (so goes in ReadTheDocs) and what is "motivation/design detail" (so goes in the Wiki) - and some items may be represented, with different levels of detail perhaps, in both places.

widdowquinn added the "question" label (how can I do this? why does it do that? where can I get this? etc.) Aug 19, 2020

widdowquinn added the "documentation" label (documentation is unclear or incomplete) Apr 18, 2021

widdowquinn added this to the 0.3.0 milestone Apr 18, 2021

baileythegreen added the "visualisation" label (issues relating to plot outputs) Jul 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing labels and captions in plots with default settings #221

Missing labels and captions in plots with default settings #221

peterjc commented Aug 19, 2020

widdowquinn commented Aug 19, 2020 •

edited

Loading

peterjc commented Aug 19, 2020

widdowquinn commented Aug 19, 2020

peterjc commented Aug 19, 2020

widdowquinn commented Aug 19, 2020

widdowquinn commented Apr 18, 2021

baileythegreen commented Apr 29, 2022

widdowquinn commented Apr 29, 2022

Missing labels and captions in plots with default settings #221

Missing labels and captions in plots with default settings #221

Comments

peterjc commented Aug 19, 2020

widdowquinn commented Aug 19, 2020 • edited Loading

peterjc commented Aug 19, 2020

widdowquinn commented Aug 19, 2020

peterjc commented Aug 19, 2020

widdowquinn commented Aug 19, 2020

widdowquinn commented Apr 18, 2021

baileythegreen commented Apr 29, 2022

widdowquinn commented Apr 29, 2022

widdowquinn commented Aug 19, 2020 •

edited

Loading