What is that dependant axes #12
This is part of the "Unmodeled" Measure type discussion. From that perspective:
With a formal model (what we did in Spectral)
At the Property Level in Mango
|
This is what Mango does, actually.
|
Right, the question is: should Mango do it, or is it the responsibility of the Measure? My feeling at the moment is that the user should be able to poll the Measure and identify what it is, so that decisions can be made. That 'poll' may be a check on the class type (easy) or something else for GenericMeasure.
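A minimal sketch of this 'poll' idea, in Python; the class names are hypothetical stand-ins, not the actual meas model API:

```python
# Hypothetical sketch of polling a Measure to identify what it represents.
class Measure:
    pass

class Position(Measure):        # a per-physics class: the type itself says "pos"
    pass

class GenericMeasure(Measure):  # no per-physics type: carries a tag instead
    def __init__(self, ucd):
        self.ucd = ucd

def quantity_kind(m):
    """Identify what a Measure represents so a client can decide what to do."""
    if isinstance(m, Position):        # the easy check on the class type
        return "pos"
    if isinstance(m, GenericMeasure):  # the 'something else' for GenericMeasure
        return m.ucd
    return "unknown"
```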
That becomes tricky..
Generally the idea has been to define things in the model which covers the domain. |
In any case it is not the responsibility of
François tried to work with a model derived from CUBE but using |
This question is related to the appropriate level of dependencies in a system
Now a comment about the model import.
This has 2 consequences
In the case of Coords it should be easy to import the PhotDM components that way. |
I agree with this. But Parameter.ucd identifies the Type of the contained measure (as a UCD) ("pos", "time", "phot.flux", "phys.mass")
The structure "Source -> Parameter -> Measure" is very similar to the cube "Cube -> Observable -> Measure" structure. This same issue will affect Cube, so it's good to hash that out here and decide where that solution belongs. |
I agree that better VODML modeling tools would be very useful!
In my experience from resolving/extracting the Dataset metadata content from Characterization, Spectrum, ObsCore models, this leads to a LOT of inconsistencies and maintenance issues. The 'copy' is rarely a true mirror. I'm not sure it was your goal with this element, but even in Mango, the PhotFilter object is maybe compatible with, but not a copy of the photDM.PhotCal object. And, in Mango, it is an extension of coords:CoordFrame, which it is not in photDM. |
Parameter.semantic comes in addition to Parameter.ucd |
The main difference is the UCD use. |
True while you are doing this by hand. If you have a system that is able to copy a VODML class from one file to another, things would be more seamless. Mango:PhotFilter is similar to PhotSys@VOTable. We do so until PhotDM is VODMLized. |
On Wed, Mar 10, 2021 at 10:07 AM Laurent MICHEL ***@***.***> wrote:
This has 2 consequences
1. I do not follow @msdemlei when he says that the evolution of a
component model will break the stack. If model1V1 imports model2V1
and model2V1 is updated to model2V2, then model1V1 remains unchanged
until it is upgraded to support model2V2.
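Laurent's point 1 can be rendered as a toy sketch (model names from the comment, data layout invented for illustration):

```python
# Version pinning: model1 v1 declares an import of model2 v1, so releasing
# model2 v2 changes nothing for model1 v1 until it is explicitly upgraded.
pinned_imports = {"model1-v1": {"model2": "v1"}}

# model2 v2 is released...
released = {"model2": ["v1", "v2"]}

# ...but model1-v1 still resolves the version it declared:
resolved = pinned_imports["model1-v1"]["model2"]
```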
This is migrating off topic of the issue.. but.
Markus' point on this is quite valid though. Let's say that the Spectral
model work, or something around PlanetaryScience (Orbits), requires a major
version change to Meas/Coords.
* For providers to use those elements in the context of Cube or Mango would
require a version change to those models because they specifically import
Meas/Coords version 1.
* It is harder for providers to include annotations for both versions in
the same file
* you can't just double-annotate the elements that are different
between versions because the parent container is only expecting 1 of them.
* so you end up double-annotating more content than you might like to.
Technically, I could probably rig the annotation:
* annotate the model imports for meas/coords to version 2
* annotate the content per version 2 of those models
So that the serialization *looks* like the containing model (cube/mango)
was updated, but the file would be invalid (not validate).
Where I differ with Markus is that I think this is an annotation problem,
not a model problem.
|
On Wed, Mar 10, 2021 at 11:14 AM Laurent MICHEL ***@***.***> wrote:
In Mango, the 'role' is provided by Parameter.semantic.. right?
Parameter.semantic comes in addition to Parameter.ucd
I would say that Measure is passive; it provides components to whoever
requests them.
It is not *responsible* for the usage of the provided elements. This is
the responsibility of the host model.
In the case of MANGO there is no safety guard preventing misuse of measures.
Last statement on this for now..
IF the purpose of Parameter.ucd is to identify the Type of Measure the
Parameter holds when that information is not available from the Measure
class itself, then I'd say that this job should be pushed into the Measure,
thereby removing the problem of Parameter.ucd being inconsistent with the
actual Measure.
IF it serves another purpose, then there may be reason to keep it at the
Mango level.
So far, I don't see another purpose, and the description in the Mango
document says "UCD1+ giving the type of the physical measure"
|
UCD tells more than measure type. I've no trouble with the risk of UCD/Class mismatch. It looks reasonable to me because we have a model that must be applicable to a very broad set of use-cases, past, present or future. This implies introducing somewhere a very flexible feature (a flexible seal?) connecting real-life data with model elements. |
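For illustration, splitting a UCD1+ value into its atoms shows why it "tells more than measure type" (a sketch, not a full UCD parser):

```python
# A UCD1+ value is a semicolon-separated list of atoms mixing the
# physical-type word with role-like words.
ucd = "pos;meta.main"
atoms = ucd.split(";")
primary = atoms[0]       # the physical-quantity atom: "pos"
qualifiers = atoms[1:]   # role-ish metadata: ["meta.main"]
```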
On 2021-03 -10, at 16:09, Mark Cresitello-Dittmar ***@***.***> wrote:
I'd love to see a UML utility that could generate the diagrams, XML and PDF; which, I think, Paul Harrison had started at one point.
I did - it is here https://github.com/pahjbo/vodsl and I think that it is better for what you are trying to do with sharing and refactoring models, mainly because the “source code” that you are working with is simple text (easy to compare/version control etc.). However, I gave up maintaining it, for lack of any interest in using it (and it will not work in the latest eclipse).
Although you lose a lot of the cleverness by not using eclipse, it is possible to just edit the files in your favourite text editor and then use the stand-alone parser (https://github.com/pahjbo/vodsl#using-the-stand-alone-parser) to convert back to VODML at the end (someone has actually done this).
If there was some interest, then there is a route towards making it work in modern javascript IDEs such as visual studio code - I am not sure that I would have time to do that, but I could point someone in the right direction.
Cheers,
Paul.
|
Not really, because the main issue is not the propagation of the meas/coord upgrades; it is the nature of the changes. If the new meas/coord keeps ascending compatibility, datasets annotated with different versions remain interoperable; otherwise they don't. That is the issue. If you limit the annotation to meas/coord, you lose the possibility to connect elements to each other. |
On Wed, Mar 10, 2021 at 09:08:07AM -0800, Laurent MICHEL wrote:
UCD tells more than measure type.
UCDs are two-word labels, e.g. `pos;meta.main`
Therefore you cannot put UCDs in measures as a built-in parameter.
I don't think I follow this "therefore" -- it's still a string, no?
Of course, I'd still not actually build UCDs into the models, as it's
already in VOTable, so we don't need data models for this kind of
thing.
What we do need models for is defining frames, linking values and
error, linking times and places, etc.
I've no trouble with the risk of UCD/Class mismatch. It looks
reasonable to me because we have a model that must be applicable
for a very broad set of use-cases, past, present or future. This
implies to introduce somewhere a very flexible feature (flexible
seal?) connecting real life data with model elements.
Well, the mismatch isn't the only worrying thing; for me, it's more
that we build something for which we already have a solution, or at
least very nearly so. I'd still like to see what exactly you can do
when you have your per-physics classes on top that you cannot do when
you just have the UCD.
This whole thing would be different if you proposed to get rid of
the UCDs (and there would be arguments in favour of that, though far
less than against) once we have your DMs.
But as long as we keep the UCDs I'd be very reluctant to build something
that's this closely related to them.
|
On Wed, Mar 10, 2021 at 12:08 PM Laurent MICHEL ***@***.***> wrote:
UCD tells more than measure type.
UCDs are two-word labels, e.g. pos;meta.main
Therefore you cannot put UCDs in measures as a built-in parameter.
I know it sounded like it, but I wasn't necessarily advocating that UCD
should move into Measure, but that all Measure classes should (maybe) be
responsible for identifying what physical quantity it represents. This may
not involve UCD.
Tying in with Markus' comments as well:
The UCD was developed early on to tag VOTable elements with some sort of
physical meaning. The words serve multiple purposes, and overlap with the
model class ("pos", "phot.flux", "phys.mass"), and role
("phys.angSize.smajAxis", "obs.exposure"). They are very useful and used
in VOTable serializations. But I don't think they should be used in the
Models (or at least not without qualifiers so that they ONLY identify the
type).
* in the above example "pos;meta.main", the "meta.main" word doesn't play
a part in the Property, does it? That has more to do with the Role, which
is covered by the Parameter.semantic element.
My impression is that using UCD here is taking a model requirement
* identify the physical quantity represented by a Generic class
and matching it with an existing serialization concept, designed for other
purposes, which happens to include that information.
|
Markus,
I feel like these statements answer your own question:
Additionally: the Position is complex, and the Annotation allows you to identify which 'roles' are filled by which VOTable elements: which FIELD is the 'latitude', which the 'longitude', which define the error ellipse. Again, regardless of whether or not the VOTable groups these elements or populates the ucd tag on the PARAM|FIELD. If you want to use UCDs in the Annotation, that is a different discussion, but you are still mapping the per-physics classes to particular UCDs.
If you're thinking we don't need to model Position, that we just need to model Measure and use UCDs for the physics (which I think is exactly what you've said), then I assert you have the same problem.
|
On Wed, Mar 10, 2021 at 09:35:58AM -0800, Laurent MICHEL wrote:
In the first case, updating models using meas/coord is
straightforward. We could even imagine a sort of errata process on
VODML files.
In the second case, we can get great damage, entangled models or not.
No, that is my point: if you don't entangle models, the "damage" is
limited to the model you're updating (i.e., not *great* damage).
With entangled models, you're taking down the entire annotation when
one model changes (i.e., great damage).
So, if you will: avoid entangled models to limit the damage radius of
incompatible updates.
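A toy rendering of this "damage radius" argument; the import graph here is invented for illustration:

```python
# With entangled models, a breaking change to one component invalidates the
# annotation of every model importing it.
imports = {
    "cube":  ["meas", "coords"],
    "mango": ["meas", "coords"],
    "phot":  [],
}

def broken_by(changed_model, graph):
    """Models whose annotation breaks when changed_model changes incompatibly."""
    return sorted(m for m, deps in graph.items() if changed_model in deps)
```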
If you limit the annotation to meas/coord, you lose the
possibility to connect elements to each other.
Again, I'd contradict here: The connection(s) are what the model
should do and what goes beyond conventional VOTable annotation.
But making this point in abstract perhaps is not terribly convincing,
so: What kind of connections are you thinking of in the use cases we
have?
|
Markus,
On Thu, Mar 11, 2021 at 3:50 AM msdemlei ***@***.***> wrote:
> If you limit the annotation to meas/coord, you lose the
> possibility to connect elements to each other.
Again, I'd contradict here: The connection(s) are what the model
should do and what goes beyond conventional VOTable annotation.
But making this point in abstract perhaps is not terribly convincing,
so: What kind of connections are you thinking of in the use cases we
have?
I'm not sure what you mean by "what kind of connections".. basically the
philosophy is that the models are 'building blocks', so that content is not
duplicated (reusable).
* Coordinates model the Coordinate systems, spaces and frames
* Measurements model uses these to define basically all of our physical
quantities ( coord + error )
* Cube/Mango defines a Framework which represents a data Cube or
Source/Catalog using Measurements model elements to describe the data.
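The layering in these bullets could be sketched as follows (field names illustrative, not the actual VODML definitions):

```python
# coords -> meas -> mango: each layer builds on the one below.
from dataclasses import dataclass

@dataclass
class SpaceFrame:        # coords: systems, spaces and frames
    name: str

@dataclass
class Position:          # meas: coordinate + error, built on coords
    lon: float
    lat: float
    frame: SpaceFrame

@dataclass
class Parameter:         # mango: framework element using meas building blocks
    ucd: str
    measure: Position

crab = Parameter("pos", Position(83.63, 22.01, SpaceFrame("ICRS")))
```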
The approach that you suggest, IMO, is contrary to all the work done in the
DM working group from day 1, so you can understand why there is
confusion/hesitation.
When considering that approach, I always get the impression of:
* toss out a bag of Lego-s and call it a "Death Star".. 'you just have to
put the pieces in the right order.'
A very significant obstacle to this approach is that there is no 'black
box' object in the base types (and IMO there shouldn't be.. but that isn't
the point).
Each attribute of the model MUST have a Type, and so, to disentangle the
models, you need an AnyType sort of thing
meas:Measure
+ coord: ivoa:AnyType << "object providing the 'value' of the
measure, with associated Coordinate System/Frame/Space information.. such
as coords:Coordinate"
+ error: meas:Error[0..1]
ds:Target
+ name: ivoa:string
+ position: ivoa:AnyType << "object providing the target position
with associated Coordinate system/Frame/Space and uncertainties.. such as
meas:Position"
of course, if we don't model meas:Position, then "such as meas:Measure
with ucd containing the primary atom 'pos'".
cube:Observable
+ measure: ivoa:AnyType << "a physical quantity with associated
errors, coordinate systems/frame/space.. such as meas:Measure"
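The sketch above, rendered in Python terms (all names hypothetical): the attribute is an untyped AnyType, while the value it holds still carries its own type:

```python
# The attribute carries no type constraint; the value's type is discoverable.
from dataclasses import dataclass

@dataclass
class Position:          # stands in for meas:Position
    lon: float
    lat: float

@dataclass
class Target:            # stands in for ds:Target
    name: str
    position: object     # ivoa:AnyType: any object providing the position

t = Target("M31", Position(10.68, 41.27))
value_type = type(t.position).__name__   # the value still knows its type
```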
This just doesn't make sense to me. The values cannot be AnyType and still
facilitate interoperability.
Laurent has an implementation of each of the cases (those with data anyway).
I have done several (working Standard Properties now). For mine, I try to
'do something' with the data which illustrates the usage, generally pulled
from the case description.
I've been looking forward to seeing your implementations on these to
compare and see how you envision this working in the larger scale.
Mark
|
On Thu, Mar 11, 2021 at 07:10:04AM -0800, Mark Cresitello-Dittmar wrote:
When considering that approach, I always get the impression of:
* toss out a bag of Lego-s and call it a "Death Star".. 'you just have to
put the pieces in the right order.'
This metaphor I think is very useful -- it has made me feel like I
understand this discussion better, at least. You know, I think it is
how we should present the question to the wider (VO) public.
This is easy for me to say because I'm convinced that if you asked a
bunch of programmers if they'd rather have a bunch of Legos or a
pre-assembled Death Star, nine out of ten would go for the Legos
(well, perhaps except if it was a real, working Death Star, but let's
rather not consider this possibility).
And there's a good reason for that: In actual implementation, "Do
Time Series" isn't a use case. "Plot error bars" or "transform
coordinates" is. Having large, pre-assembled structures makes for
clumsy programmes, and our attempt to produce these large structures
perhaps is part of the reason why our DM efforts so far have made
very little inroads to any sort of running code.
Each attribute of the model MUST have a Type, and so, to disentangle the
The *values* of course have types -- usually, the container format
provides them, and the INSTANCE-s have their dmtype, too.
For *attributes*, on the other hand, having types is a lot less
important, as evinced by the success of Python (that is strictly
typed on the value side but untyped for the attributes by default).
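The Python analogy can be shown in a few lines:

```python
# Values are strictly typed; attributes are not.
class Holder:
    pass

h = Holder()
h.x = 1                    # the attribute itself carries no type constraint
assert type(h.x) is int    # ...but the value is strictly typed
h.x = "now a string"       # rebinding the attribute to another type is legal
```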
Now, there are cases for providing guarantees on the properties of
attributes as well, and within models, giving such guarantees by
default probably helps implementors while not damaging much.
Across models or into VOTables, however, type annotation on
attributes should be limited to where there's a strong operational
reason to guarantee types.
You see, you *will* want to change the types of the target objects,
and most of the time the clients would still do the right thing after
the change, for instance, because a VOTable library abstracts away
the modification, or because time has passed and they can just deal
with things.
If you blindly fix the expected types of Attributes ("static
typing"), you'll have a lot of breakage in model evolution where
nothing bad would have happened without the static typing.
Cf. this with SCS's regulation that the VOTables returned MUST be
version 1.1. This has been a sea of pain in implementation without
buying anything at all; actually, plenty of SCS services just ignore
the regulation and work fine with all existing SCS clients.
Of course, you'll need to find a balance there; it certainly *was*
right for SCS to require that a VOTable be returned, and probably
even that it's to be a VOTable 1. Finding this balance is only
possible based on *actual* use cases -- which are not pieces of
annotation but actual tasks like the ones I've mentioned in
http://mail.ivoa.net/pipermail/dm/2020-September/006123.html
model meas:Position then "such as meas:Measure with ucd containing the
primary atom "pos".
cube:Observable
+ measure: ivoa:AnyType << "a physical quantity with associated
errors, coordinate systems/frame/space.. such as meas:Measure"
This just doesn't make sense to me. The values cannot be AnyType and still
facilitate interoperability.
Actually, having dynamic typing here is the only way we can have
interoperability in the long term, because both clients and servers
can support a significant number of incompatible measure models
without having to repeat all other annotations that perhaps will
never evolve again.
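A sketch of what dynamic typing buys here (the dmtype strings and node layout are invented): a client dispatches on the dmtype it finds, so it can support several incompatible measure models side by side:

```python
# Dispatch on dmtype: each supported model version gets its own handler.
handlers = {
    "meas:Position":  lambda node: ("pos", node["value"]),
    "meas2:Position": lambda node: ("pos", node["val"]),  # hypothetical v2 layout
}

def read_measure(node):
    """Return a (kind, value) pair for any dmtype we know; None otherwise."""
    handler = handlers.get(node["dmtype"])
    return handler(node) if handler else None
```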
I've been looking forward to seeing your implementations on these to
compare and see how you envision this working in the larger scale.
I'm happy to (provisionally) annotate other kinds of data, but
frankly, this, I think, isn't going to tell us anything we don't
already know until we get the client/library authors on board,
perhaps starting with a simple use case like "automatically plot
error bars" -- or the (to me) central one "transform this catalogue
to a different epoch".
|
My proposal comes with a Python client that provides model instances as Python dictionaries.
|
Markus,
This is one of my favorite exchanges on this subject!
I can't say I agree with you, but I think I am understanding your
point-of-view better.
Instead of:
* each model being a building block to be used/imported by other complex
models
You advocate:
* model the building blocks to be used by clients to construct complex
instances
On this part below..
1) Sorry, I thought I heard you volunteer to implement the use case at
the preview meeting.
I'm not sure how I could resolve my concerns about your approach
without seeing it in action.
2) In the Time Series case
<https://github.com/ivoa/dm-usecases/tree/main/usecases/time-series/mcd-implementation>:
I pull out the points and plot them.. with error bars.
In the Native Frames case
<https://github.com/ivoa/dm-usecases/tree/main/usecases/native_frames/mcd-implementation>:
I take the input Positions (in ICRS and GALACTIC), transform them to (FK5
J2015.5) and plot them.
On Fri, Mar 12, 2021 at 2:29 AM msdemlei ***@***.***> wrote:
> I've been looking forward to seeing your implementations on these to
> compare and see how you envision this working in the larger scale.
I'm happy to (provisionally) annotate other kinds of data, but
frankly, this, I think, isn't going to tell us anything we don't
already know until we get the client/library authors on board,
perhaps starting with a simple use case like "automatically plot
error bars" -- or the (to me) central one "transform this catalogue
to a different epoch".
|
There are (at least) two topics here that are getting entangled.
|
This thread comes in response to this post.
In conclusion, I'll say that an annotation scheme limited to simple cases is not really interesting. If we want to get all the benefits of the data annotation (a painful process for the data providers), we have to build a full-featured system. |
On Fri, Mar 12, 2021 at 08:47:50AM -0800, Laurent MICHEL wrote:
This thread comes in response to this [post](#12 (comment))
> Well, the mismatch isn't the only worrying thing; for me, it's more
> that we build something for which we already have a solution, or at
> least very nearly so. I'd still like to see what exactly you can do
> when you have your per-physics classes on top that you cannot do when
> you just have the UCD.
- The model does not do anything. It is just a piece of structured
Ok, let's say "the model *enables* certain things" -- if it didn't I
frankly wouldn't see much point in going to the trouble of defining
machine-readable models. So, I stand by my basic point: Everything
we do here should be grounded in some actual *use* case, i.e.,
something a client can do with the model annotation that it couldn't
do without it.
documentation that allows people to understand each other when
they talk about data content. In this context, having per-physics
Ummm... does "people" refer to actual humans? If so, I'd say no.
Humans don't need machine-readable models. They're much better
served by plain text and straight math.
- If I understand well, your question relates more to the data
annotation. The data annotation consists of inserting into data sets
In a way, yes, but again I'd say the only reason we're doing models
is that clients can rely on a certain structure of the annotations.
And while I'm usually all for divide-and-conquer when trying to solve
complex problems, in this particular case I think the attempt to
define models somehow detached from how they will be used hasn't
served us well in the past 10 years.
- If you have a very simple VOTable, the model mapping does not
help at all, you are right. Note that no one forces you to annotate
your data.
I'd never say that annotation doesn't help in one place or another.
On the contrary, even the simplest VOTable (id, ra, dec, say) is in
dire need of annotation, because you need to define the frame, the
epoch, and so on. The reason I'm here is that we still can't do that
properly (though, admittedly, in this very simple case, COOSYS helps
a bit).
- If you have something a bit more tricky, such as complex errors,
the annotation makes them understandable by any client. I hear
you saying, with good reason, that clients can already do a very
good job without model annotation. But this is not a reason for
not helping them (tools and libs) with clean data interfaces.
No, not at all: Clients can't do without annotation even for the
simplest errors, and that's why I still have to scroll a lot through
combo boxes just to make TOPCAT plot the right error bars.
I'm just saying that we should tackle the simple things first, making
*those* work, and then tackle more complex things *as clients want to
consume them*. Let's not waste time on quarreling about complex
error representations when clients can't even do the simple "plot an
error bar". We *will* get it wrong if we do this without guidance by
client authors.
- At the higher level you may want to add structured data (e.g.
Provenance) in your VOTable. This can only be done with an advanced
annotation system.
Right. But Provenance is an excellent example: Do you *really* want
to mingle provenance annotation with, say, dataset, or doesn't it
make a lot more sense to have provenance next to (and independently
of) all the other annotations we can have in a dataset?
In conclusion, I'll say that an annotation scheme limited to simple
cases is not really interesting. If we want to get all the benefits
of the data annotation (a painful process for the data providers),
we have to build a full-featured system.
So, to make this concrete, I've created a fork of astropy that
illustrates how I think you can work with arbitrarily complex data:
https://github.com/msdemlei/astropy.
The README explains, I hope, the basic outlook, and the code that's
given there should already work.
I'm happy to demonstrate complex use cases based on this if you throw
them at me.
|
On Fri, Mar 12, 2021 at 06:41:19AM -0800, Mark Cresitello-Dittmar wrote:
This is one of my favorite exchanges on this subject!
I can't say I agree with you, but I think I am understanding your
point-of-view better.
Instead of:
* each model being a building block to be used/imported by other complex
models
You advocate:
* model the building blocks to be used by clients to construct complex
instances
Right -- and that's because I believe that will make our annotations
work *with* how the programmes are already written.
I claim very few programmers will want to, say, generate code from
our DMs and then organise their programme around that. I'm rather sure
they'd much prefer to have an easy go at pulling out the two or three
pieces of information they need for the task at hand and then work
with these in whatever way they like.
This is what my proposal over at https://github.com/msdemlei/astropy
tries to achieve (with relatively little code, I'd claim). I'd hope
the readme illustrates that (although I'm well aware that we'll want
to evolve the annotation -- see "overly minimal" -- and even if not,
the code would need some robustness improvements).
On this part below..
1) Sorry, I thought I heard you volunteer to implement the use case at
the preview meeting.
I'm not sure how I could resolve my concerns about your approach
without seeing it in action.
Does the astropy prototype work for the seeing-in-action thing?
If you give me data and *use* cases (i.e., "*do* this or that with
the data"), I'd try to cover those as well.
2) In the Time Series case
<https://github.com/ivoa/dm-usecases/tree/main/usecases/time-series/mcd-implementation>:
I pull out the points and plot them.. with error bars.
In the Native Frames case
<https://github.com/ivoa/dm-usecases/tree/main/usecases/native_frames/mcd-implementation>:
I take the input Positions (in ICRS and GALACTIC), transform them to (FK5
J2015.5) and plot them.
Yeah... I'm not claiming these things won't work at all. I'm just
claiming we're making it unnecessarily hard to do them when
entangling data models, that we're making it unnecessarily hard for
us if we don't just re-use what already works (UCDs, xtypes, ...),
and we'll be regretting each entanglement of DMs the moment we need
to evolve the DMs.
So, the question I'm trying to raise can perhaps be succinctly put as:
Is what we've come up with the simplest thing (in concept and
implementation) we can come up with that still satisfies actual use
cases?
|
On Fri, Mar 12, 2021 at 06:40:16AM -0800, Laurent MICHEL wrote:
- The point is that the public API does not refer to any native
data element but only to model elements.
- This is the key point for interoperability.
Ummm... Can you explain why you think that? You see, I've tried to
make the exact opposite point a couple of times, and perhaps I can
do a better job on that if I understand why you'd like to avoid
talking about the things you annotate.
|
The scope of the annotations must go beyond simple column annotations, which must remain supported though. My point is: since we have a self-consistent model made of a hierarchy of elements identified with `dmtype`, `dmrole` and other things, the annotation must be something matching that structure. Once you have it, you can use accessors based on those identifiers. That is what I call a public API that does not refer to any native data element but only to model elements. In the examples I showed in these use-cases, I transform annotation blocks into Python dictionaries that are easily serializable in JSON (a good point for data exchange). In pseudo code, this would look like the snippet quoted in the reply below.
This wouldn't require Python classes implementing the model (fundamental point). I claim that the annotation must be designed in a way that allows this in addition to basic usages. Let's consider that all VizieR tables come with such annotations; the same API code could then get many things:
|
On Fri, Mar 19, 2021 at 07:23:56AM -0700, Laurent MICHEL wrote:
The scope of the annotations must go beyond simple column
annotations which must remain supported though.
I detailed it [here](https://github.com/ivoa-std/ModelInstanceInVot/blob/master/doc/model-instance-in-vot.pdf) section 2.
I'm starting to be unsure whether we are actually disagreeing on much
here -- and I've not found anything in that section 2 that I'd need
to contradict.
So, perhaps a clarification: is my time series use case "single
column annotation", and if so, why? What actual usage would go
beyond what's possible there?
My point is, since we have a self-consistent model made with a
hierarchy of elements identified with `dmtype`, `dmrole` and other
things, the annotation must be something matching that structure.
Well, the thing with dmrole and dmtype to me *is* the annotation, but
I think what you're saying here is that the annotation should be
directly derived from the model. That I wholeheartedly agree with,
and that's why I'm so concerned about the current MCT proposal -- if
it were some abstract musing, I'd be totally ok with it. But when
the model defines the annotation structure, whatever we do in the
model has concrete operational consequences. Which, mind you, is
fine -- we'll have to deal with them *somewhere* and the DM is the
right place for that.
Once you have it, you can use accessors based on those identifiers.
That is what I call a `public API does not refer to any native data
element but only to model elements`
...and I still cannot figure out why you want this -- after all, the
point of the whole exercise IMNSHO is to add information to VOTables
(and later perhaps other container formats) that is not previously in
there.
What would the use case for your free-floating annotation be, if this
is what you are proposing?
In the examples I showed in these use-cases, I transform the
annotation into Python dictionaries that are easily serializable in
JSON (a good point for data exchange).
In pseudo code, this would look like this:
```
annotation_reader = AnnotationReader(my_votable)
if not annotation_reader.support("mango"):
    sys.exit(1)
mango_instance = annotation_reader.get_first_row()
print(mango_instance.get_measures())
# ['pos', 'magField']
print("Magnetic field is: " + mango_instance.get_measure("magField"))
# Magnetic field is: 1.23e-6T +/- 2.e-7
```
This wouldn't require Python classes implementing the model
(fundamental point)
I claim that the annotation must be designed in a way that allows
this in addition to basic usages.
-- but why would you want to do this JSON serialisation? Wouldn't it
be much better overall to just put that value into a VOTable and
transmit that rather than fiddle around with custom JSON
dictionaries? In particular when there are quite tangible benefits
if you make it explicit in the model what exactly it is that you're
annotating?
By the way, if by "wouldn't require Python classes" you mean "You
don't have to map model classes into python classes" then yes, I
agree, that is a very desirable part of anything we come up with.
Let's avoid code generators and similar horrors as much as we can.
Nobody likes those.
Let's consider that all VizieR tables come with such annotations; the same API code could then get many things:
- Basic quantities (no significant gain I admit)
- Complex quantities (e.g. complex errors)
- Columns grouping
- Status values
- Associated data or services
I agree to all these use cases (except, as I said, even for basic
quantities the gain is enormous because we can finally express
frames, photometric systems, and the like in non-hackish ways).
But: which of these use cases would you miss with the non-entangled,
explicit-reference models?
|
discussion forked on #18 |
On Wed, Mar 10, 2021 at 12:21:24PM -0800, Mark Cresitello-Dittmar wrote:
* the models do not have UCDs, so you define a Class for the
concept (Position, Time)
* The per-physics class tells you what to expect: the
SphericalPosition should have a 'longitude' and 'latitude' and
'error' among other things. (illustrative, not exact)
Yeah, that's structural, and sure, you'll need classes for "scalar"
vs. "polar coordinate" vs. "cartesian coordinate" (where I'd hope
that's only necessary in coordinates for the time being).
But structurally, the scalar quantities all work the same way
(there's a single float). There's nothing to be gained by
introducing extra classes for "redshift scalar" versus "photometric
scalar" for all I can see; all these scalars essentially work the
same way.
Of course, a photometric scalar has different *additional* metadata
(information on the photometric system) than a redshift scalar (which
might also be part of some spatial annotation). But again I cannot
see how entangling this additional metadata into a particular class
that essentially *only* does this entanglement will help: a client
looking for this will plausibly look directly for photometric system
annotation rather than look for instances of "photometric scalar" and
then hope it has photometric system annotation.
* the VOTable serialization has UCDs:
* so if you are evaluating the VOTable content and find a PARAM with ucd="pos" or "time" you can infer (by interpreting the semantic word), that the PARAM represents a Position or Time concept, but no specific content expectation can be formed.
* The VOTable serialization, with Utype and ucd, was deemed insufficient for mapping content to models, so an Annotation scheme was requested and developed.
* the Annotation relates the model class meas:Position to a VOTable PARAM.
* NOTE: I know there is not a 1-1 match from Position to a VOTable PARAM, but this serves for illustration.
* this identifies the PARAM as a Position regardless of whether or not the PARAM includes a ucd="pos"
* my understanding is that the Annotation should not depend on the underlying VOTable ucd or Utype
* if a VOTable has no ucd or Utype assignments, you can fully identify the content from the Annotation.
* I can distinguish Flux from Time without use of ucd or Utype.
Positions, being vectors usually, aren't a terribly good example to
investigate for the question of whether we ought to have per-physics
scalar classes. Let's keep it at scalars, so flux and time are good
examples.
And there I'm convinced that just providing the annotations of, say,
a time system or a photometric system as appropriate will be what
clients want by the above reasoning.
What kind of usage do you have in mind where a client will stumble
into a column and will want to tell whether it's a time or a flux and
where it wouldn't be equally well served with basing that judgement
on the UCD?
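For instance, a client that only needs the coarse physics could base that judgement on the UCD alone; a toy sketch (the UCD strings and the mapping are invented for illustration, not a normative rule):

```python
# Illustrative only: deciding "time vs. flux" from the UCD alone,
# without any DM annotation. The mapping below is a toy, not UCD1+.
def classify(ucd: str) -> str:
    """Map a UCD's primary word to a coarse physical category."""
    primary = ucd.split(";")[0].strip().lower()
    if primary.startswith("time"):
        return "time"
    if primary.startswith("phot.flux") or primary.startswith("phot.mag"):
        return "photometry"
    if primary.startswith("pos"):
        return "position"
    return "unknown"

print(classify("time.epoch"))           # time
print(classify("phot.mag;em.opt.V"))    # photometry
print(classify("pos.eq.ra;meta.main"))  # position
```

A real client would consult the UCD1+ vocabulary, but the shape of the decision stays this simple.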
Conversely, saying "Ah, if people have been sloppy and haven't
defined a UCD, clients can fall back on the DM annotation" is I think
not very convincing: DM annotation is a lot more complex than just
slapping on a UCD. I don't see any chance that data providers that
don't manage to assign UCDs will get DM annotation remotely right.
Again, I'd perhaps be less concerned about this if we said: "Let's
scrap the UCDs, we'll do it all with models now" (though from my
Registry perspective I'd wail and cry if someone proposed that). But
as long as we don't do that, let's not try to address UCDs' use cases
in models unless we're very sure UCDs aren't enough for what a client
might want to do -- and I've not seen an indication for that yet.
Let me quote the Zen of python here:
```
$ python -c "import this" | grep obvious
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
```
Additionally: the Position is complex, and the Annotation allows
you to identify which 'roles' are filled by which VOTable elements.
Of course, totally agreed. But that doesn't require per-physics
scalar classes.
If you want to use UCDs in the Annotation, that is a different
discussion, but you are still mapping the per-physics classes to
particular UCDs.
No, we definitely should avoid having UCDs in DM annotations
(excepting perhaps a few oddities, perhaps in provenance or so).
UCDs are in use already, and they are part of the container formats.
If you're thinking we don't need to model Position, we just need to
model Measure and use UCDs for the physics; (which I think is
No, of course I'm not thinking that. My point is: models for the
structure, UCDs for the physics.
* I don't think this gets you out of the model dependencies,
since you'd still want to model that Measure has a 'coord'
attribute which needs a Type which would be a
"coords:Coordinate" .
No -- whenever halfway feasible, the value of a Measure should be a
reference to the annotated thing (FIELD, PARAM, TABLE, RESOURCE),
where perhaps we may need to allow references to array elements.
That is the crux of the matter that decides whether our scheme will
blow up the first time we need an incompatible change to one of our
DMs.
|
If you have multiple filters in your dataset, it is easier to have each magnitude instance referencing its proper filter than to have a set of filters and let the client do the filter/measure matching.
This is right, but nothing prevents a model from embedding attributes carrying the physics of the modeled quantities. I would say that it is even necessary if you want model instances to be self-consistent. |
On Mon, Mar 22, 2021 at 09:25:31AM -0700, Laurent MICHEL wrote:
> Of course, a photometric scalar has different *additional* metadata
If you have multiple filters in your dataset it is easier to have
each magnitude instance referencing its proper filter than having a
set of filters and letting the client do the filter/measure
matching.
I *think* I agree here, but perhaps you could point at examples for
the two approaches you envision here?
> My point is: models for the structure, UCDs for the physics.
This is right, but nothing prevents a model from embedding
attributes carrying the physics of the modeled quantities. I would say that is
Ah-hm... sorry, but "nothing prevents" is a weak reason to do
something in a standard. I'll keep pleading that we do the minimum
required to fulfill our use cases, as long as VO experience shows
that whatever extra bells and whistles we put into our standards
later turn into problems (see caproles).
Now, perhaps there are strong use cases that require per-physics
scalar classes, but I cannot see one yet, which may be because...
even necessary if you want model instances to be self-consistent.
I admit however that the way MANGO is doing this has to be
improved, but it has to do it.
...I still don't understand what you mean by self-consistent. Could
you perhaps try again to explain what you mean by that (is it "we can
serialise instances outside of container formats"?) and what use
cases you'd like to enable by this self-consistency?
|
Catching up a bit..
Nice to see this.
|
On Tue, Mar 23, 2021 at 12:24:23PM -0700, Mark Cresitello-Dittmar wrote:
> essentially *only* does this entanglement will help: A client
> looking for this will plausibly look directly for photometric
> system annotation rather than look for instances of "photometric
> scalar" and then hope it has photometric system annotation.
* I'll note that we had "scalar", "polar coordinate", "cartesian
coordinate" in the coords model, and were asked to remove them in
favor of a single multi-dimensional "Point", and scalar
"PhysicalCoordinate". I do think that one outcome of this effort
is an interest in restoring the space-centric types (cartesian,
spherical).
As an aside: I believe given our track record we should probably just
do 2d and 3d polar coordinates in the first round (and rejoice if we
do that properly), but that may just be my natural pessimism.
* When you say: "Of course, a photometric scalar has different
*additional* metadata (information on the photometric system) than
a redshift scalar"
* to me, this calls for a model element which tells the client
that "if you have come across a photometric scalar, look 'here'
for the additional photometric system metadata". We need to
define the association *in the model*
But what does this extra intermediary buy vs. looking for the
photometric system metadata directly?
* A client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and then hope it has photometric system annotation
* I think a client processing a cube will note it has
magnitudes, and then ask which bands are they in?
Well, there are certainly many ways a client may be prompted to look
for photometry metadata, units being one, UCDs another, but user
action ("plot this as a photometric time series") is IMHO the most
likely one. But whatever the reason, I don't see how "is there an
annotation as a photometric scalar?" will make a client's life
simpler than asking "is there photometric system annotation?".
Part of this is of course the outlook: In my metamodel you can't say a
column "mag" *is* a phot:PhotometricPoint because it can be part of a
large number of annotations (among them potentially also and
importantly phot2:PhotometricPoint). That clients look for
annotations they understand (or prefer) is normal in this system and
the reason for its robustness over evolution.
> my proposal over at https://github.com/msdemlei/astropy
Nice to see this.
* the interface looks very similar to the rama interface which I'm
using in my implementations... looks like your 'get_annotations()'
is similar to Rama's 'find_instances()'.
I'd not be surprised -- I think it's a rather natural API to this
kind of thing.
* a quick question about the [target position](https://github.com/msdemlei/astropy#choosing-a-target-position-palatable-to-the-client) example.
* > for ann in target.position:
> \# this iterates over the fields/params containing the target position
Right.
> pos_anns = ann.get_annotations("stc2:Coords")
* can you explain the path from looping over the ITEMs under the position ATTRIBUTE, to an stc2:Coords instance?
* I don't see how iteration resolves to a stc2:Coord
In case of doubt, you can use iter_annotations() on a column to see
how it works out. The basic scheme, however, is that whenever an
item (param, field, table, resource) is referenced from an annotation
("instance"), the software will add this annotation to the list of
annotations of that item. Hence, in this situation, where ra is the
*longitude* of the *space* attribute (type stc2:SphericalCoordinate)
of an stc2:Coords instance, ra will have annotations for both
stc2:SphericalCoordinate and stc2:Coords.
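A minimal sketch of this propagation rule, using an invented dict layout rather than the actual prototype's data structures: every instance that, directly or through nested instances, references a column gets attached to that column's annotation list.

```python
# Sketch, assuming annotations are plain dicts with a "dmtype", an
# "attrs" mapping, and "ref:<id>" strings for column references.
def collect_annotations(instance, annotations, stack=()):
    """Attach every enclosing instance dmtype to referenced columns."""
    stack = stack + (instance["dmtype"],)
    for value in instance.get("attrs", {}).values():
        if isinstance(value, dict):            # nested instance
            collect_annotations(value, annotations, stack)
        elif isinstance(value, str) and value.startswith("ref:"):
            col = value[4:]                     # referenced FIELD/PARAM id
            # the column is annotated by every enclosing instance
            annotations.setdefault(col, set()).update(stack)

coords = {
    "dmtype": "stc2:Coords",
    "attrs": {
        "space": {
            "dmtype": "stc2:SphericalCoordinate",
            "attrs": {"longitude": "ref:ra", "latitude": "ref:dec"},
        },
    },
}
anns = {}
collect_annotations(coords, anns)
# anns["ra"] now contains both stc2:Coords and stc2:SphericalCoordinate
```

This is just the mechanism in miniature; the astropy proposal does the equivalent while walking the VOTable annotation element.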
|
OK.. so, if we're iterating through the ITEMs, it should find:
So, you would find the Target position if you put ANY leaf from the stc2:Coords content into the Target.position collection.
Q: how does this play out if the "stc2:Coords" is made entirely of LITERALs? There will be no 'ref' content to match.
|
On Wed, Mar 24, 2021 at 07:26:57AM -0700, Mark Cresitello-Dittmar wrote:
```
<ATTRIBUTE dmrole="position">
<COLLECTION>
<ITEM ref="ra"/>
<ITEM ref="dec"/>
<ITEM ref="ssa_location"/>
</COLLECTION>
</ATTRIBUTE>
```
OK.. so, if we're iterating through the ITEMs, it should find:
* "ra" - included in "ds:AstroTarget" which is in "ds:Dataset",
"stc2:SphericalCoordinate" which is in "stc2:Coords"
* returns pos_anns[0] = the "stc2:Coords" instance
* "dec" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in "stc2:Coords"
* returns pos_anns[1] = the "stc2:Coords" instance (the same one)
* "ssa_location" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in a different "stc2:Coords"
* returns pos_anns[2] = the other "stc2:Coords" instance
So, you would find the Target position if you put ANY leaf from the
stc2:Coords content into the Target.position collection.
Right. The client gets to choose whatever it understands, or, in the
advanced cases, whatever it prefers (think: simple position vs. a
simple MOC vs. a spatial distribution of MOCs).
Q: how does this play out if the "stc2:Coords" is made entirely of
LITERALs? There will be no 'ref' content to match.
First, I'd really like to discourage the use of literals in favour of
properly making PARAM-s whenever that's not too inconvenient; this is
also because non-DM-enabled clients will still find the information,
and users can still play with it based on human understanding of the
stuff.
Using PARAMs, such quantities will also have types, units, xtypes,
clear serialisation rules and all the other VOTable luxuries (you may
remember me having argued against LITERAL-s in the VO-DML
discussions). But then, I grant you, just writing <ATTRIBUTE
dmrole="orientation" value="ICRS"/> is too convenient to miss.
But then *if* you really want immediates in COLLECTION-s my current
proposal lets you have INSTANCE-s in them (though that's untested and
might break on some little mistake yet). So, you'd write:
```
<ATTRIBUTE dmrole="position">
<COLLECTION>
<ITEM ref="ra"/>
<ITEM ref="dec"/>
<ITEM ref="ssa_location"/>
<INSTANCE dmtype="moc:WithLikelihood">
<ATTRIBUTE name="likelihood" dmtype="real" value="0.95"/>
<ATTRIBUTE name="value" dmtype="???"
value="3/23-27 5/290,332,560"/>
</INSTANCE>
</COLLECTION>
</ATTRIBUTE>
```
(but as the "???" indicates: in all but the most trivial cases I
think that's a bad idea as explained above).
Q: I've mentioned this before, but ... since the annotation
reflects the model structure. Using the 2 annotations of
"stc2:SphericalCoordinate", the underlying model would be:
* SphericalCoordinate
* frame
* longitude
* latitude
* value - ssa_location (which includes longitude, latitude and some frame info) is assigned to this attribute, which really should not be an attribute of SphericalCoordinate.
No, value/ssa_location doesn't really include the frame info any more
than longitude/ra and latitude/dec do. Yes, they're referencing a
COOSYS, but that we want to get rid of in the long run.
But yes, such a model would be possible, and I think our models
should acknowledge the existence of the DALI types and make them
annotatable in some way. Whether or not the ad-hoc thing I quickly
invented here is a good way is something I'm happy to discuss (and I
suspect it's not).
Also note that against the original annotation I've changed the
annotation of ssa_location to a hypothetical stc3:Coords model to
better make the intended point, which is: if better/newer ways to
describe, in this case, the target position come up over time, they
can be accommodated without having to touch the ds:Dataset model or
breaking legacy clients.
Apologies for having come up with a bad example initially.
|
Markus.
GAIA TS added in raw_data
The strong reason is that my model needs an attribute carrying the meaning of the physical measure, and there is no modeling rule preventing us from adding it to the model; such an attribute is valid.
**self-consistent**: The model must contain all attributes and relations required to describe the domain data. Instances of that model, whatever the serialization is, must have all of these attributes and relations properly set. |
On Thu, Mar 25, 2021 at 09:52:47AM -0700, Laurent MICHEL wrote:
> I *think* I agree here, but perhaps you could point at examples for
> the two approaches you envision here?
GAIA TS added in [raw_data](https://github.com/ivoa/dm-usecases/blob/main/usecases/time-series/raw_data/gaia_multiband.xml)
Ah... I think at some point we have to say "well, structure
your tables differently". Associating metadata to table cells (as
here, where the values G, BP, and RP in the rows would need to come
with photometric system annotation) is a recipe for disaster in so
many ways.
We've just almost gotten rid of the terrible Frame-and-whatnot
strings in STC-S geometries thanks to DALI geometries. Let's not
bring inhomogeneous-metadata columns back again.
If I'm adamant on one thing, it's that metadata needs to be associated
to columns and params, but not to individual table cells. Violate
this principle, and the tables you get are basically unhandlable. A
very simple example: with different metadata per row, you normally
cannot meaningfully compare two values in the same column any more,
which is the most basic thing you want in a table ("sorting").
And hence the Gaia folks should have written this table with three
photometry columns, one each for G, BP, and RP. I'm sure they'll do
this when we explain them the reasoning.
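For illustration, the restructuring argued for here is a simple pivot; a sketch with invented values, where epochs missing in a band become None (the "swiss cheese"):

```python
# Turn a per-row-band ("long") time series into one homogeneous
# magnitude column per band. All values here are invented.
long_rows = [
    {"time": 100.0, "band": "G",  "mag": 15.1},
    {"time": 100.1, "band": "BP", "mag": 15.6},
    {"time": 100.2, "band": "RP", "mag": 14.8},
    {"time": 200.0, "band": "G",  "mag": 15.0},
]

bands = ["G", "BP", "RP"]
wide = {}
for row in long_rows:
    # one output row per epoch, pre-filled with None for each band
    epoch = wide.setdefault(row["time"], {b: None for b in bands})
    epoch[row["band"]] = row["mag"]

for time in sorted(wide):
    print(time, [wide[time][b] for b in bands])
# 100.0 [15.1, None, None]  <- each column now has one fixed meaning
```

After the pivot, each magnitude column carries a single photometric system, so per-column annotation (and sorting) is meaningful again.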
So, on this I'll solemnly declare "not being able to annotate tables
that aren't actually tables is a feature rather than a bug".
> serialise instances outside of container formats"?) and what use
> cases you'd like to enable by this self-consistency?
**self-consistent** The model must contain all attributes and
relations required to describe the domain data. Instances of that
model, whatever the serialization is, must have all of these
attributes and relations properly set. The use-case is the
interoperability in general and, to be more specific, the capacity
to exchange model instances, e.g. by SAMP or DataLink. I'm aware
I'll note in passing that Datalink is of course VOTable, and that
VOTables are regularly exchanged through SAMP.
that many people are looking at other media than VOTable. I'm
thinking of JSON/YAML serializations, which are mid-term use-cases.
I'm not saying that you can't re-invent VOTable in JSON or YAML or
anywhere else; that actually wouldn't need too many conventions for
the more capable of the container formats (among them of course where
to put UCDs, units, xtypes and how to represent PARAMs and COLUMNs).
But that doesn't mean we need to encumber our models with things that
VOTable has already solved (it won't stop with UCDs; as soon as the
first clients consume your JSON, you'll see the discussion on date
formats flaming up again, and you'll have lots of fun at ADASS
sitting in JSON-for-Models BoFs).
No, let's concentrate the limited capacities we have on things that
VOTable cannot do. Teaching other container formats things VOTable
can do that they can't is a problem that can be solved entirely
independently when we actually have it.
|
I'm not the curator of the TABLE that was provided 2 years ago by ESAC. AFAIR the rationale for this structure was that time stamps are not the same for each band, and thus this avoids a Swiss-cheese table. |
But MANGO and CUBE mapping do resolve what VOTable cannot do. |
On Wed, Apr 07, 2021 at 12:04:26AM -0700, Laurent MICHEL wrote:
> And hence the Gaia folks should have written this table with three
> photometry columns, one each for G, BP, and RP. I'm sure they'll do
> this when we explain them the reasoning.
I'm not the curator of the TABLE that was provided 2 years ago
by ESAC. AFAIR the rationale for this structure was that time
stamps are not the same for each band, and thus this avoids a
Swiss-cheese table.
Yes, I trust they had good reasons for doing what they did, but the
result still is inhomogeneous metadata on the magnitude, flux, and
error columns, and hence this denormalisation results in a severely
irregular table. The most obvious irregularity: a sort by magnitude
has no physical interpretation.
If we try to bend our design so it works with broken data structures
like this, we will make it work a lot worse on regular data -- and
perhaps entirely break it. And I trust DPAC won't mind having to go
for per-band time series (or the "swiss cheese") if they adopt our
annotation; that will help their users, too, even the ones that
ignore our annotation.
|
On Wed, Apr 07, 2021 at 12:11:13AM -0700, Laurent MICHEL wrote:
> No, let's concentrate the limited capacities we have on things that
> VOTable cannot do.
But MANGO and CUBE mapping do resolve what VOTable cannot do.
Sure, but in managing UCDs, units and possibly serialisation (as in
xtypes), it also repeats things that VOTable can already do. And
this duplication of efforts is something we should only do if we are
very sure it is justified.
Until we are (and I still am not), it would seem wiser to me to
postpone this "VOTable model" until we have the very basic things
(STC, photometry) covered.
|
This time we are in agreement.
Mango makes extensive use of MCT and PhotDM.
|
Which is proving impossible to do unless we conduct this sort of workshop demonstrating that they are usable within the context of "real" usage in Source-s, Cube-s, TimeSeries-s. |
Hmm.. I'll maybe take a look at the GAIA multi-band example next. My initial reaction here is that if "reorganize your data" was an option, there wouldn't be a need for the work we are doing. It may not make sense to 'sort' on the "magnitude" columns, but it does make sense to 'screen Sources with associated G-band filter to magnitude>=X'. That is the benefit of the Models.. to turn the 'broken data structures' into meaningful entities. |
We are not trying to bend our design.
Nothing allows us to assert that such broken data structure will not be released ever.
|
The Gaia multi-band example dates back to when we started looking at how to represent time series in the IVOA. We asked data providers to send us their use cases, including examples of the kind of data that they wanted us to handle. If I remember correctly, the structure of the multi-band time series reflects the way that the data is collected on the spacecraft, how it is processed in their data processing pipelines, and how the project scientists are used to working with it. We asked them for examples, and they specifically requested that the IVOA time series should be flexible enough to be able to represent this use case. I don't think that telling them they are doing it wrong is an option. |
On Wed, Apr 07, 2021 at 07:44:50AM -0700, Laurent MICHEL wrote:
> ***@***.*** If we try to bend our design so it works with broken data structures ...
We are not trying to bend our design.
- `ModelInstanceInVot` has been designed on the basis of data sets we found around (TDIG work).
- `gaia_multiband` is a show case for using FILTERs.
I'd conjecture you wouldn't have introduced FILTER without this
particular example -- and that counts as "bending" to me.
More abstractly, VOTable right now has *no* per-row metadata.
There's one FIELD per table column. That's a very sane design, and
when we tried to break it with the STC-S strings we regretted that
(and are still mopping up the resulting mess).
Nothing allows us to assert that such *broken data structure* will
not be released ever.
Clearly people *have* released data like that, so such an assertion
would be silly, and I'm of course not making it.
But since it breaks a very sane metamodel (tables with per-column
metadata), it is something we should try hard to discourage, and it
is totally possible to say "if you want interoperability, then don't
do it like that."
- This allows more compact VOTables, which is something that many
people wish for.
Doing it properly increases the size of the VOTable by, what, 30%?
After gzip by perhaps 10%? In my book that's nowhere near a good deal
for complicating the metamodel by a large factor.
- They can, by the way, be consumed by specific clients or
pre-processed by associated data publishers (e.g. as you did, I
guess) to be made compliant with their infrastructure.
But we're not writing our specs for "specific clients" -- they don't
need the annotation, as they know what to expect anyway.
We're writing our spec so clients can do interesting things without a
prior contract. In that scenario, embedding a major part of
relational algebra in our metamodel (you already have Aggregation and
Selection, and I'm sure you'll end up having all kinds of joins as
well if you follow this path) is a *very* high price to pay, even
more so since we already have ADQL to write relational expressions.
*If* we really want to enable "canned" relational operations on our
tables (for which I personally don't see a credible use case yet),
we *could* think about embedding ADQL into VOTable, and given the
wide availability of SQLite, I think it could even be implemented
with a reasonable effort.
But whatever we do, let's not re-invent a SQL-in-XML.
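To illustrate how cheap the SQLite route would be, a sketch with invented rows (this is not a proposal for an actual serialisation, just the mechanism):

```python
import sqlite3

# Load annotated table rows into an in-memory database and run the
# "canned" selection there, instead of inventing SQL-in-XML.
# All data values are made up for the example.
rows = [("src1", "G", 15.1), ("src2", "G", 16.4), ("src3", "BP", 15.9)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phot (source TEXT, band TEXT, mag REAL)")
con.executemany("INSERT INTO phot VALUES (?, ?, ?)", rows)

# "screen Sources with associated G-band filter to magnitude >= X"
selected = con.execute(
    "SELECT source FROM phot WHERE band = 'G' AND mag >= 16"
).fetchall()
print(selected)  # [('src2',)]
```

The whole relational machinery (selection, joins, aggregation) comes for free from the embedded engine; the standard would only need to say where the query text lives.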
Proposing an annotation scheme that is able to map them is
meaningful in this context
Hm -- I'd say a sensible restriction as to what structures can be
annotated and what is just too irregular makes for a good standard.
"Do one thing and do it well" is what made the original Unix great.
I think that's a good precedent to follow.
|
On Wed, Apr 07, 2021 at 07:55:19PM -0700, Zarquan wrote:
If I remember correctly, the structure of the multi-band time
series reflects the way that the data is collected on the
spacecraft, how it is processed in their data processing pipelines,
No. The data on the spacecraft are cut-outs that are relayed down
and analysed per-image, then combined and cross-identified in a
rather complex process.
How this raw data is combined is certainly just a minor matter at the
very end of this process.
I don't think that telling them they are doing it wrong is an option.
Why not? I'm sure they'll at least listen to us. And I'd say the
point that column metadata changing per row is making things very
difficult is a fairly strong one.
|
This is what they are trying to represent in their time series data.
The rotation of the spacecraft generates a repeated sequence of blue then red measurements, offset by a small time delay |
If I understand your serialisation well, you map a list of
`NDPoint`s, each one being composed of a `time` and a
`GenericMeasure`.
I do not see how a client can see that the first dependent value is a
magnitude and the second a flux.
This question is related to the discussion we have been having here