Missing labels and captions in plots with default settings #221
Comments
That's expected behaviour, at the moment… v0.3.0a is in active development and should be considered unfinished. Default behaviour might eventually turn out to use the hash, or the filestem, for labels (where not provided). My intention is that classes will be ignored if not specified. Tying classes and labels to the run allows the user to rerun the same analysis with different labels and classes, e.g. for generating plots. It may be helpful to provide a way to override the database classes/labels specifically for an analysis, but at the moment, this is intended behaviour.
I agree that the hash would be another practical default for labels when not provided. I don't yet understand why you would tie the class and label metadata to the comparison computation stage. I wouldn't want to recompute the comparisons (even with recovery mode) just to re-plot with different classes (e.g. sample sites, or sample year) or different labels (e.g. sampling method). Can we supply the classes and labels to the plotting (and report?) commands?
The database stores the results from previous comparisons, so you don't need to recompute them. Considering a use-case:
Now I want to plot the results again, but with my new classes/labels. There are two obvious options:
Both will give the same file output. However, if you only redo the plot step, using a new class/label file, this gives an output that isn't consistent with the database (though you could make notes/log the changed labels/classes). To have the database be "reproducible", such that plotting/writing tables for a particular run gives the same outputs each time with only the database as input, we'd need to capture the labels/classes files used for the plot, and remember that it's the combination of ANI run and plot command (and we could have arbitrarily many plots for a single run) that defines the output.
One goal is to have a Flask (or whatever is useful at the time) interface onto the local database, so that interactive plots can be produced, as well as those which are written statically to a file. These will get their information from the database. It makes sense in that context to have a "run" defined as the genomes + corresponding labels/classes. Changing labels/classes (keeping the same genomes) corresponds to another "run", in the same way that removing genomes, but keeping labels/classes, corresponds to another "run". When no new calculations are required, this is a straightforward database update in both cases.
Now, I do see the utility of providing a classes/labels file at the pyani plot step, but it breaks that definition of "run" being "genomes + their labels/classes" that I want to keep for the more advanced interaction with the database. For quick and dirty outputs I see the attraction of having
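To make the "run = genomes + labels/classes" idea concrete, here is a minimal sketch. It is not pyani's actual schema or API: `Run`, `ComparisonStore`, and `make_run` are hypothetical names, and treating comparisons as unordered genome pairs is a simplification. It only illustrates that two runs sharing genomes but differing in labels are distinct runs that reuse the same stored comparisons.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Run:
    """Hypothetical 'run': a set of genomes plus their labels and classes."""
    genomes: frozenset   # genome identifiers (e.g. hashes)
    labels: tuple = ()   # sorted (genome, label) pairs
    classes: tuple = ()  # sorted (genome, class) pairs


def make_run(genomes, labels=None, classes=None):
    """Build a Run from plain collections (illustrative helper)."""
    return Run(
        frozenset(genomes),
        tuple(sorted((labels or {}).items())),
        tuple(sorted((classes or {}).items())),
    )


class ComparisonStore:
    """Hypothetical stand-in for the database's stored pairwise results."""

    def __init__(self, compare):
        self._compare = compare  # function computing one pairwise result
        self._results = {}       # (genome_a, genome_b) -> result
        self.computed = 0        # number of comparisons actually computed

    def results_for(self, run):
        """Return results for a run, computing only the missing pairs."""
        out = {}
        for pair in combinations(sorted(run.genomes), 2):
            if pair not in self._results:
                self._results[pair] = self._compare(*pair)
                self.computed += 1
            out[pair] = self._results[pair]
        return out
```

Under these assumptions, relabelling creates a new `Run` object, but `ComparisonStore.computed` does not increase, which mirrors the "straightforward database update" described above.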
That did clarify your design goals, thank you. The "quick and dirty" option of
I should really write this stuff down somewhere ;)
This is another thing that should go into the documentation - the design goals and motivation for the database integration and how that affects the way we need to provide metadata for visualisation.
Which part of the documentation? Design goals and motivation sounds like wiki material; there is already a bit of text in
It does.
Yes, there is. As ever there may be a judgement call involved to decide what is appropriately user-facing (so goes in ReadTheDocs) and what is "motivation/design detail" (so goes in the Wiki) - and some items may be represented, with different levels of detail perhaps, in both places.
The following workflow produced plots without labels or class colors (edited for brevity):
The solution is to explicitly set labels and classes when calling `pyani anim`.
Would it make sense to have `--labels` and `--classes` default to `$DIR/labels.txt` and `$DIR/classes.txt` if present when run on input directory `$DIR`?
If no labels are given, would it make sense to use the filename stems as the default labels?
(I'm also puzzled why the classes and labels are tied to the run; I expected `pyani index $DIR` to record them from `$DIR/labels.txt` and `$DIR/classes.txt`.)
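The defaulting behaviour proposed here could look roughly like the sketch below. This is not pyani code: `resolve_metadata_file` and `default_labels` are hypothetical helpers, and the only assumption taken from the issue is that the files would be named `labels.txt` and `classes.txt` inside the input directory.

```python
from pathlib import Path


def resolve_metadata_file(indir, explicit=None, default_name="labels.txt"):
    """Pick a labels/classes file for an input directory.

    An explicitly supplied path always wins. Otherwise fall back to
    indir/default_name if that file exists, and to None if it does not.
    """
    if explicit is not None:
        return Path(explicit)
    candidate = Path(indir) / default_name
    return candidate if candidate.is_file() else None


def default_labels(genome_files):
    """Fallback labels: map each genome file to its filename stem."""
    return {str(path): Path(path).stem for path in genome_files}
```

With no explicit option and no `labels.txt` present, `resolve_metadata_file` returns `None`, at which point a caller could fall back to `default_labels` so that plots are never left unlabelled.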