What is that dependant axes #12

Open
lmichel opened this issue Mar 5, 2021 · 52 comments
Labels
question Further information is requested

Comments

@lmichel
Collaborator

lmichel commented Mar 5, 2021

If I understand your serialisation correctly, you map a list of NDPoint instances, each composed of:

  • one independent value typed as time
  • two dependent values typed as GenericMeasure

I do not see how a client can tell that the first dependent value is a magnitude and the second a flux.

  • Is it supposed to check the FIELD ucd or the coord system type?
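A minimal sketch of the first option (checking the FIELD ucd), assuming each dependent axis carries a copy of its FIELD's UCD. The `DependentAxis` class and the lookup table are hypothetical and not part of any model discussed here; `phot.mag` and `phot.flux` are UCD1+ words.

```python
from dataclasses import dataclass


@dataclass
class DependentAxis:
    """Hypothetical stand-in for one GenericMeasure column of an NDPoint."""
    field_name: str
    ucd: str   # UCD copied from the VOTable FIELD
    unit: str


# Hypothetical lookup: the primary word of the UCD decides the kind.
KIND_BY_UCD_PREFIX = {"phot.mag": "magnitude", "phot.flux": "flux"}


def kind_of(axis: DependentAxis) -> str:
    # The primary word is everything before the first ';'.
    primary = axis.ucd.split(";")[0].strip()
    for prefix, kind in KIND_BY_UCD_PREFIX.items():
        if primary == prefix or primary.startswith(prefix + "."):
            return kind
    return "unknown"


axes = [
    DependentAxis("mag", "phot.mag;em.opt.V", "mag"),
    DependentAxis("flux", "phot.flux;em.opt.V", "mJy"),
]
kinds = [kind_of(a) for a in axes]
print(kinds)
```

With this approach the client never looks at the measure class at all, which is exactly the concern raised in the question.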

This question is related to the discussion we have been having here

@lmichel lmichel added the question Further information is requested label Mar 5, 2021
@mcdittmar
Collaborator

mcdittmar commented Mar 5, 2021

This is part of the "Unmodeled" Measure type discussion.
Since there is no formal model containing Flux or Magnitude as a Measure (or any other type), then it MUST be handled by GenericMeasure.

From that perspective:

  1. there is no way to determine that one is Flux and one is Mag (other than the units)
    I'd say this may indicate a weakness in the Measurement model: should every Measurement instance be able to identify what physical entity it represents? either by class type, or semantic?
  2. there is no way of conveying that there "should be a corresponding PhotCal instance associated with this Measure"
    so, even if the Measurement provided its identity, it could not convey the dependency on other info.

With a formal model (what we did in Spectral)

  1. we would create a Measure type for it
  2. define a Frame which includes a reference to the appropriate PhotCal instance.

At the Property Level in Mango

  1. you add a 'ucd' to the mix, which at least lets you identify it as a Flux or Magnitude
  2. but still have no way of conveying that there "should be a PhotCal" and which one.
  3. one COULD use the Property.associatedProperty mechanism to make that connection, but it would be an abuse of the link.

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

there is no way to determine that one is Flux and one is Mag (other than the units)
I'd say this may indicate a weakness in the Measurement model: should every Measurement instance be able to identify what physical entity it represents? either by class type, or semantic?

This is what Mango does, actually.
I believe that meas should at least support some sort of photometric data (as well as LonLat position) with a filter definition (PhotCal) somewhere in coords

At the Property Level in Mango

PhotCal is a component of PhotometricCoordSystem named PhotFilter

@mcdittmar
Collaborator

mcdittmar commented Mar 10, 2021

there is no way to determine that one is Flux and one is Mag (other than the units)
I'd say this may indicate a weakness in the Measurement model: should every Measurement instance be able to identify what physical entity it represents? either by class type, or semantic?

This is what Mango does, actually.

Right, the question is should Mango do it? or is it the responsibility of the Measure?
ie: should users be able to determine the specific 'kind' of Measure from the Measure itself, whether in Mango or Cube or TimeSeries or etc.

My feeling at the moment, is that the user should be able to poll the Measure and identify what it is so that decisions can be made. That 'poll' may be a check on the class type (easy) or something else for GenericMeasure
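The "poll" described above could look roughly like the sketch below. The class names mirror the thread, but the `ucd` attribute on `GenericMeasure` is an assumption made for illustration; the Measurement model under discussion has no such field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measure:
    value: float
    unit: str


@dataclass
class Time(Measure):
    """A measure with a dedicated class."""


@dataclass
class GenericMeasure(Measure):
    # Hypothetical extra attribute, used here as the "something else".
    ucd: Optional[str] = None


def poll(measure: Measure) -> str:
    """Return a label for what the measure represents."""
    if isinstance(measure, GenericMeasure):
        # No dedicated class: fall back to the semantic tag, if any.
        return measure.ucd or "unidentified"
    # Dedicated class: the class name itself is the answer.
    return type(measure).__name__


print(poll(Time(59278.5, "d")))
print(poll(GenericMeasure(17.2, "mag", ucd="phot.mag")))
print(poll(GenericMeasure(3.1, "mJy")))
```

The last call shows the gap being discussed: without some semantic tag, a GenericMeasure cannot say what it is.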

I believe that meas should at least support some sort of photometric data (as well as LonLat position) with a filter definition (PhotCal) somewhere in coords

At the Property Level in Mango

PhotCal is a component of PhotometricCoordSystem named PhotFilter

That becomes tricky..
Meas having some Photometry type is not a problem.
Coords having a PhotometryFrame which relates the photDM:PhotCal object would add an awkward dependency since "Coords" is more core than "PhotoDM".
This is a case where Markus' modeling plan would be handy (not that I'm advocating it)

  • PhotometryFrame
    • filter: <something providing Photometry Filter specifications >[1]

Generally the idea has been to define things in the model which covers the domain.
So Photometry measure would be defined in the Spectral model, or perhaps photDM itself.

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

Right, the question is should Mango do it? or is it the responsibility of the Measure?

In any case, it is not the responsibility of Measure, which models measure classes but not their roles in a given context.
This is the responsibility of CUBE in this case: it has to assign a role to the components it uses.

My feeling at the moment, is that the user should be able to poll the Measure and identify what it is so that decisions can be made. That 'poll' may be a check on the class type (easy) or something else for GenericMeasure

François tried to work with a model derived from CUBE but using mango:Parameter instead of meas:measure
This was an elegant way to go through the issue.

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

Coords having a PhotometryFrame which relates the photDM:PhotCal object would add an awkward dependency since "Coords" is more core than "PhotoDM".

This question is related to the appropriate level of dependencies in a system

  • Too many dependencies between the components lead to self-locking
  • Too few lead to inconsistencies
  • It is up to us to set the cursor

Now a comment about the model import.

  • This is a nice feature of VODML but difficult to use in practice.

    • the proxy class trick you proposed for Modelio works fine
    • But it is not very safe because you have to cut/paste VODML ids and class names from the imported models into Modelio: big risk of mistakes (trust my experience)!
  • On a conceptual level

    • The import refers to a specific version (given by its URL) of the imported model.
    • Thus evolutions of the imported model do not affect the importer. The imported model is frozen from the importer point of view.

This has 2 consequences

  1. I do not follow @msdemlei when he says that the evolution of a component model will break the stack. If model1V1 imports model2V1 and model2V1 is updated to model2V2, then model1V1 remains unchanged until it is upgraded to support model2V2
  2. As the imported model is frozen, why should we continue to work with dynamic links? As models are well defined and versioned, I'm wondering whether we could consider working with class copies: a class of my model model1V1 would be a copy of its sibling in model2V1. This wouldn't break the consistency I mentioned above while making our job easier.

In the case of Coords, it should be easy to import the PhotDM components that way.
If this proposal looks too odd, using both Coords and PhotDM in measures will require to work with another model aggregating them.
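The version-pinning argument can be illustrated with a small sketch. The registry, the class names inside it, and the helper are hypothetical; only the model1/model2 naming comes from the comment above.

```python
# Hypothetical registry: (model name, version) -> class names it defines.
registry = {("model2", "1"): {"Coordinate", "Frame"}}

# model1 V1 pins the exact version of the model it imports.
model1_v1_imports = {"model2": "1"}


def visible_classes(imports: dict, registry: dict) -> set:
    """Collect the classes reachable through version-pinned imports."""
    classes = set()
    for name, version in imports.items():
        classes |= registry[(name, version)]
    return classes


before = visible_classes(model1_v1_imports, registry)

# model2 V2 is published; the pin in model1 V1 is untouched.
registry[("model2", "2")] = {"Coordinate", "Frame", "PhotometryFrame"}
after = visible_classes(model1_v1_imports, registry)

# Only an explicit upgrade of the importer exposes the new content.
upgraded = visible_classes({"model2": "2"}, registry)

print(before == after, sorted(upgraded - after))
```

Under these assumptions the importer sees a frozen view, which is the property the class-copy proposal relies on.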

@mcdittmar
Collaborator

Right, the question is should Mango do it? or is it the responsibility of the Measure?

In any case, it is not the responsibility of Measure, which models measure classes but not their roles in a given context

I agree with this.
In Mango, the 'role' is provided by Parameter.semantic.. right?

But Parameter.ucd identifies the Type of the contained measure (as a UCD) ("pos", "time", "phot.flux", "phys.mass")
And THIS is probably the responsibility of Measure to provide this info either by the class name, or by some other means in the case of GenericMeasure.

François tried to work with a model derived from CUBE but using mango:Parameter instead of meas:measure
This was an elegant way to go through the issue.

The structure "Source -> Parameter -> Measure" is very similar to the cube "Cube -> Observable -> Measure" structure. This same issue will affect Cube, so it's good to hash that out here and decide where that solution belongs.

@mcdittmar
Collaborator

  • This is a nice feature of VODML but difficult to use in practice.

    • the proxy class trick you proposed for Modelio works fine
    • But it is not very safe because you have to cut/paste VODML ids and class names from the imported models into Modelio: big risk of mistakes (trust my experience)!

I agree that better VODML modeling tools would be very useful!
I'd love to see a UML utility that could generate the diagrams, XML and PDF; which, I think, Paul Harrison had started at one point.

  1. As the imported model is frozen, why should we continue to work with dynamic links? As models are well defined and versioned, I'm wondering whether we could consider working with class copies: a class of my model model1V1 would be a copy of its sibling in model2V1. This wouldn't break the consistency I mentioned above while making our job easier.

In my experience from resolving/extracting the Dataset metadata content from Characterization, Spectrum, ObsCore models, this leads to a LOT of inconsistencies and maintenance issues. The 'copy' is rarely a true mirror.

I'm not sure it was your goal with this element, but even in Mango, the PhotFilter object is maybe compatible with, but not a copy of the photDM.PhotCal object. And, in Mango, it is an extension of coords:CoordFrame, which it is not in photDM.

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

In Mango, the 'role' is provided by Parameter.semantic.. right?

Parameter.semantic comes in addition to Parameter.ucd
I would say that Measure is passive: it provides components to whoever requests them.
It is not responsible for how the provided elements are used. This is the responsibility of the host model.
In the case of MANGO, there is no safeguard preventing the misuse of measures.

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

The structure "Source -> Parameter -> Measure" is very similar to the cube "Cube -> Observable -> Measure" structure. This same issue will affect Cube, so it's good to hash that out here and decide where that solution belongs.

The main difference is the UCD use.

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

In my experience from resolving/extracting the Dataset metadata content from Characterization, Spectrum, ObsCore models, this leads to a LOT of inconsistencies and maintenance issues. The 'copy' is rarely a true mirror.

True as long as you are doing this by hand. If you now have a system that is able to copy a VODML class from one file to another, things would be more seamless.

Mango:PhotFilter is similar to PhotSys@VOTable. We did so until PhotDM is VODMLized.
@loumir already complained about this and proposed a more consistent PhotDM clone

@mcdittmar
Collaborator

mcdittmar commented Mar 10, 2021 via email

@mcdittmar
Collaborator

mcdittmar commented Mar 10, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

A UCD tells more than the measure type.
UCDs can be two-word labels, e.g. pos;meta.main
Therefore you cannot put UCDs in measures as built-in parameters.

I have no trouble with the risk of a UCD/class mismatch. It looks reasonable to me because we have a model that must be applicable to a very broad set of use cases, past, present or future. This implies introducing somewhere a very flexible feature (a flexible seal?) connecting real-life data with model elements.

@pahjbo
Member

pahjbo commented Mar 10, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 10, 2021

Markus' point on this is quite valid though.

Not really because the main issue is not the propagation of the meas/coord upgrades, it is the nature of the changes.

If the new meas/coord keeps backward compatibility, datasets annotated with different versions remain interoperable; otherwise they don't. That is the issue.
In the first case, updating models using meas/coord is straightforward. We could even imagine a sort of errata process on VODML files.
In the second case, we can get great damage, entangled models or not.

If you limit the annotation to meas/coord, you lose the possibility of connecting elements to each other.

@msdemlei
Contributor

msdemlei commented Mar 10, 2021 via email

@mcdittmar
Collaborator

mcdittmar commented Mar 10, 2021 via email

@mcdittmar
Collaborator

Markus,

I'd still not actually build UCDs into the models, as it's already in VOTable

I'd still like to see what exactly you can do when you have your per-physics classes on top that you cannot do when you just have the UCD.

I feel like these statements answer your own question:

  • the models do not have UCDs, so you define a Class for the concept (Position, Time)

    • The per-physics class tells you what to expect: the SphericalPosition should have a 'longitude' and 'latitude' and 'error' among other things. (illustrative, not exact)
  • the VOTable serialization has UCDs:

    • so if you are evaluating the VOTable content and find a PARAM with ucd="pos" or "time" you can infer (by interpreting the semantic word), that the PARAM represents a Position or Time concept, but no specific content expectation can be formed.
  • The VOTable serialization, with Utype and ucd, was deemed insufficient for mapping content to models, so an Annotation scheme was requested and developed.

    • the Annotation relates the model class meas:Position to a VOTable PARAM.
      • NOTE: I know there is not a 1-1 match from Position to a VOTable PARAM, but this serves for illustration.
    • this identifies the PARAM as a Position regardless of whether or not the PARAM includes a ucd="pos"
    • my understanding is that the Annotation should not depend on the underlying VOTable ucd or Utype
      • if a VOTable has no ucd or Utype assignments, you can fully identify the content from the Annotation.
        • I can distinguish Flux from Time without use of ucd or Utype.

Additionally: the Position is complex, and the Annotation allows you to identify which 'roles' are filled by which VOTable elements. Which FIELD is the 'latitude', which 'longitude', which define the error ellipse. Again, regardless of whether or not the VOTable groups these elements or populates the ucd tag on the PARAM|FIELD.
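A sketch of how an annotation alone can assign roles, assuming a deliberately simplified annotation layout. The dictionary structure, column names, and values are invented for illustration; the role names follow the illustrative ones used above and are not the exact attributes of meas:Position.

```python
# FIELDs of a hypothetical VOTable; none of them carries a ucd or utype.
fields = {
    "col1": {"unit": "deg"},
    "col2": {"unit": "deg"},
    "col3": {"unit": "arcsec"},
}

# Hypothetical annotation block: model type plus role -> FIELD reference.
annotation = {
    "dmtype": "meas:Position",
    "roles": {"longitude": "col1", "latitude": "col2", "error": "col3"},
}


def resolve(annotation: dict, row: dict) -> dict:
    """Build a model instance from one data row using only the annotation."""
    instance = {"dmtype": annotation["dmtype"]}
    for role, ref in annotation["roles"].items():
        instance[role] = row[ref]
    return instance


row = {"col1": 10.68, "col2": 41.27, "col3": 0.3}  # made-up values
position = resolve(annotation, row)

# Units are still read from the native FIELD metadata, per role.
units = {role: fields[ref]["unit"] for role, ref in annotation["roles"].items()}
print(position, units)
```

Nothing in `resolve` inspects a ucd or Utype; the roles come entirely from the annotation.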

If you want to use UCDs in the Annotation, that is a different discussion, but you are still mapping the per-physics classes to particular UCDs .

  • Instead of <INSTANCE dmtype: "meas:Position"> you could have <INSTANCE ucd: "pos">.
    • this would be model agnostic, so you can have no idea what to expect inside this INSTANCE, so I don't think this helps with interoperability.

If you're thinking we don't need to model Position, we just need to model Measure and use UCDs for the physics; (which I think is exactly what you've said), I assert you have the same problem

  • <INSTANCE dmtype: "meas:Measure", ucd: "pos">
    what can I expect for the Attributes under here? can only be what is modeled in Measure
    There may be a very flexible path here, but it means that individuals are building implied models using the semantics. I'm not sure what the ramifications of this approach are.
    • I expect my earlier comment re: UCDs mixing concepts (Type, role, etc) comes into play.
    • I imagine it would be VERY hard to model this way, what needs to be modeled, what is left to semantics.
    • It is contrary to the method by which every IVOA data model to date has been done.
    • I don't think this gets you out of the model dependencies, since you'd still want to model that Measure has a 'coord' attribute which needs a Type which would be a "coords:Coordinate" .
    • Puts more onus on the users for creating the specialized classes by hand.. cannot be generated from the models
    • This moves into our discussion on issue #2, "Adding a time series/1d data use case"

@msdemlei
Contributor

msdemlei commented Mar 11, 2021 via email

@mcdittmar
Collaborator

mcdittmar commented Mar 11, 2021 via email

@msdemlei
Contributor

msdemlei commented Mar 12, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 12, 2021

I'm happy to (provisionally)

My proposal comes with a Python client that provides model instances as Python dictionaries.

  • From that point you can plot or do many other things
  • The point is that the public API does not refer to any native data element but only to model elements.
  • This is the key point for interoperability.
  • It is valuable even if you sometimes have to annotate basic quantities (FIELD + unit + UCD) that wouldn't need to be modeled

@mcdittmar
Collaborator

mcdittmar commented Mar 12, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 12, 2021

There are (at least) 2 topics here that are getting entangled.

@lmichel
Collaborator Author

lmichel commented Mar 12, 2021

This thread comes in response to this post

Well, the mismatch isn't the only worrying thing; for me, it's more
that we build something for which we already have a solution, or at
least very nearly so. I'd still like to see what exactly you can do
when you have your per-physics classes on top that you cannot do when
you just have the UCD.

  • The model does not do anything. It is just a piece of structured documentation that allows people to understand each other when they talk about data content. In this context, having per-physics classes makes sense because this allows people to get fine-grained descriptions of complex quantities.
  • If I understand correctly, your question relates more to the data annotation. The data annotation consists of inserting into data sets elements that bridge the actual data with the model nodes, and you are questioning the usefulness of mapping data onto such classes. My answer in a few points:
    • The annotation is a framework that allows curators to provide data views shared by all and to add missing meta-data. This is the basis of interoperability. Clients working with model instances no longer have to care about the exact structure of the data.
    • If you have very simple VOTables, the model mapping does not help at all, you are right. Note that no one forces you to annotate your data.
    • If you have something a bit more tricky, such as complex errors, the annotation makes them understandable by any client. I hear you saying, with good reason, that clients can already do a very good job without model annotation. But this is not a reason for not helping them (tools and libs) with clean data interfaces.
    • At a higher level, if you have data pieces connected to each other (error matrices, multi-table data), you need annotations. There is a very strong Vizier requirement (the main one) for grouping measures together. This can only be done with an advanced annotation system.
    • At a still higher level, you may want to add structured data (e.g. Provenance) to your VOTable. This can only be done with an advanced annotation system.

In conclusion, I'll say that an annotation scheme limited to simple cases is not really interesting. If we want to get all the benefits of data annotation (a painful process for the data providers), we have to build a full-featured system.

@msdemlei
Contributor

msdemlei commented Mar 18, 2021 via email

@msdemlei
Contributor

msdemlei commented Mar 18, 2021 via email

@msdemlei
Contributor

msdemlei commented Mar 18, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

The scope of the annotations must go beyond simple column annotations which must remain supported though.
I detailed it here section 2.

My point is that, since we have a self-consistent model made of a hierarchy of elements identified by dmtype, dmrole and other things, the annotation must be something matching that structure.

Once you have it, you can use accessors based on those identifiers. That is what I call a public API that does not refer to any native data element but only to model elements.

In the examples I showed for these use cases, I transform annotation blocks into Python dictionaries that are easily serializable in JSON (a good point for data exchange).

In pseudo code, this would look like this:

import sys

# Pseudo-code: AnnotationReader stands for the Python client mentioned above.
annotation_reader = AnnotationReader(my_votable)
if not annotation_reader.support("mango"):
    sys.exit(1)

mango_instance = annotation_reader.get_first_row()
print(mango_instance.get_measures())
# ['pos', 'magField']
print("Magnetic field is: " + str(mango_instance.get_measure("magField")))
# Magnetic field is: 1.23e-6T +/- 2.e-7

This wouldn't require Python classes implementing the model (fundamental point)

I claim that the annotation must be designed in a way that allows this in addition to basic usages.
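For readers who want something executable, here is a self-contained mock of that idea. It is not the actual client: the `AnnotationReader` signature, the annotation layout, the column names, and the position values are all assumptions, and the reader takes already-parsed dictionaries rather than a VOTable.

```python
import json


class AnnotationReader:
    """Mock reader working on pre-parsed annotations instead of a VOTable."""

    def __init__(self, annotations: dict, rows: list):
        # model name -> {measure name -> {role -> column name}}
        self._annotations = annotations
        self._rows = rows

    def support(self, model: str) -> bool:
        return model in self._annotations

    def get_first_row(self, model: str) -> dict:
        """Return the first data row as a plain dict of model elements."""
        row = self._rows[0]
        return {
            measure: {role: row[column] for role, column in roles.items()}
            for measure, roles in self._annotations[model].items()
        }


annotations = {
    "mango": {
        "pos": {"longitude": "ra", "latitude": "dec"},
        "magField": {"value": "b", "error": "e_b"},
    }
}
rows = [{"ra": 10.68, "dec": 41.27, "b": 1.23e-6, "e_b": 2e-7}]

reader = AnnotationReader(annotations, rows)
instance = reader.get_first_row("mango") if reader.support("mango") else {}
print(list(instance))
print(json.dumps(instance["magField"]))
```

Because the instance is a plain dictionary, no Python class implementing the model is needed, and JSON serialization comes for free.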

Let's consider that all Vizier tables come with such annotations: the same API code could then get many things:

  • Basic quantities (no significant gain I admit)
  • Complex quantities (e.g. complex errors)
  • Columns grouping
  • Status values
  • Associated data or services

@msdemlei
Contributor

msdemlei commented Mar 19, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 19, 2021

discussion forked on #18

@msdemlei
Contributor

msdemlei commented Mar 22, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 22, 2021

Of course, a photometric scalar has different additional metadata

If you have multiple filters in your dataset, it is easier to have each magnitude instance reference its proper filter than to have a set of filters and let the client do the filter/measure matching.
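A sketch of the per-instance reference, with invented dictionary keys and made-up magnitude values; only the G, BP and RP band names come from the Gaia example discussed in this thread.

```python
# Hypothetical filter definitions, keyed by an identifier.
filters = {
    "G": {"name": "Gaia G"},
    "BP": {"name": "Gaia BP"},
    "RP": {"name": "Gaia RP"},
}

# Each magnitude instance references its own filter (values are made up).
magnitudes = [
    {"value": 15.2, "filter_ref": "G"},
    {"value": 15.9, "filter_ref": "BP"},
    {"value": 14.6, "filter_ref": "RP"},
]


def filter_of(magnitude: dict) -> dict:
    """Direct lookup: no matching logic is needed on the client side."""
    return filters[magnitude["filter_ref"]]


names = [filter_of(m)["name"] for m in magnitudes]
print(names)
```

Without `filter_ref`, the client would have to guess the pairing from column names or UCDs.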

My point is: models for the structure, UCDs for the physics.

This is right, but nothing prevents a model from embedding attributes carrying the physics of the modeled quantities. I would say that it is even necessary if you want model instances to be self-consistent.
I admit however that the way MANGO is doing this has to be improved, but it has to do it.

@msdemlei
Contributor

msdemlei commented Mar 23, 2021 via email

@mcdittmar
Collaborator

On Wed, Mar 10, 2021 at 12:21:24PM -0800, Mark Cresitello-Dittmar wrote:

  • the models do not have UCDs, so you define a Class for the concept (Position, Time)
  • The per-physics class tells you what to expect: the SphericalPosition should have a 'longitude' and 'latitude' and 'error' among other things. (illustrative, not exact)

Yeah, that's structural, and sure, you'll need classes for "scalar" vs. "polar coordinate" vs. "cartesian coordinate" (where for now I'd hope that's only necessary in coordinates for the time being). But structurally, the scalar quantities all work the same way (there's a single float). There's nothing to be gained by introducing extra classes for "redshift scalar" versus "photometric scalar" for all I can see; all these scalars essentially work the same way. Of course, a photometric scalar has different additional metadata (information on the photometric system) than a redshift scalar (that might also be part of some spatial annotation). But again I cannot see how entangling this additional metadata into a particular class that essentially only does this entanglement will help: A client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and then hope it has photometric system annotation.

Catching up a bit..

  • I agree, for the most part, with "structurally, the scalar quantities all work the same way".
  • I'll note that we had "scalar", "polar coordinate", "cartesian coordinate" in the coords model, and were asked to remove them in favor of a single multi-dimensional "Point", and scalar "PhysicalCoordinate". I do think that one outcome of this effort is an interest in restoring the space-centric types (cartesian, spherical).
  • When you say: "Of course, a photometric scalar has different additional metadata (information on the photometric system) than a redshift scalar"
    • to me, this calls for a model element which tells the client that "if you have come across a photometric scalar, look 'here' for the additional photometric system metadata". We need to define the association in the model
  • A client looking for this will plausibly look directly for photometric system annotation rather than look for instances of "photometric scalar" and then hope it has photometric system annotation
    • I think a client processing a cube will note it has magnitudes, and then ask which bands are they in?

my proposal over at https://github.com/msdemlei/astropy

Nice to see this.

  • the interface looks very similar to the rama interface which I'm using in my implementations... looks like your 'get_annotations()' is similar to Rama's 'find_instances()'.
  • a quick question about the target position example.
    • for ann in target.position:
      # this iterates over the fields/params containing the target position
      pos_anns = ann.get_annotations("stc2:Coords")

    • can you explain the path from looping over the ITEMs under the position ATTRIBUTE, to an stc2:Coords instance?
      • I don't see how iteration resolves to a stc2:Coord

@msdemlei
Contributor

msdemlei commented Mar 24, 2021 via email

@mcdittmar
Collaborator

pos_anns = ann.get_annotations("stc2:Coords")

  • can you explain the path from looping over the ITEMs under the position ATTRIBUTE, to an stc2:Coords instance?
    • I don't see how iteration resolves to a stc2:Coord

In case of doubt, you can use iter_annotations() on a column to see how it works out. The basic scheme, however, is that whenever an item (param, field, table, resource) is referenced from an annotation ("instance"), the software will add this annotation to the list of annotations of that item. Hence, in this situation, where ra is the longitude of the space attribute (type stc2:SphericalCoordinate) of an stc2:Coords instance, ra will have annotations for both stc2:SphericalCoordinate and stc2:Coords.

  <ATTRIBUTE dmrole="position">
    <COLLECTION>
      <ITEM ref="ra"/>
      <ITEM ref="dec"/>
      <ITEM ref="ssa_location"/>
    </COLLECTION>
  </ATTRIBUTE>

OK.. so, if we're iterating through the ITEMs, it should find:

  • "ra" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in "stc2:Coords"
    • returns pos_anns[0] = the "stc2:Coords" instance
  • "dec" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in "stc2:Coords"
    • returns pos_anns[1] = the "stc2:Coords" instance (the same one)
  • "ssa_location" - included in "ds:AstroTarget" which is in "ds:Dataset", "stc2:SphericalCoordinate" which is in a different "stc2:Coords"
    • returns pos_anns[2] = the other "stc2:Coords" instance

So, you would find the Target position if you put ANY leaf from the stc2:Coords content into the Target.position collection.
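The back-referencing scheme walked through above can be sketched as follows. This is not the code of the astropy proposal: the tree layout, `index_annotations`, and this `get_annotations` helper are hypothetical, and the ds:AstroTarget and ds:Dataset containers are left out for brevity.

```python
from collections import defaultdict

# Hypothetical annotation trees: nested instances whose leaves reference
# VOTable items (FIELDs or PARAMs) by ID.
coords_a = {"dmtype": "stc2:Coords", "children": [
    {"dmtype": "stc2:SphericalCoordinate", "refs": ["ra", "dec"]},
]}
coords_b = {"dmtype": "stc2:Coords", "children": [
    {"dmtype": "stc2:SphericalCoordinate", "refs": ["ssa_location"]},
]}


def index_annotations(roots):
    """Map each referenced item to every instance that contains it."""
    index = defaultdict(list)

    def walk(node, ancestors):
        chain = ancestors + [node]
        for ref in node.get("refs", []):
            # The item is annotated by this node and by all its ancestors.
            index[ref].extend(chain)
        for child in node.get("children", []):
            walk(child, chain)

    for root in roots:
        walk(root, [])
    return index


index = index_annotations([coords_a, coords_b])


def get_annotations(item, dmtype):
    return [a for a in index[item] if a["dmtype"] == dmtype]


pos_anns = [
    get_annotations(item, "stc2:Coords")[0]
    for item in ("ra", "dec", "ssa_location")
]
print(pos_anns[0] is pos_anns[1], pos_anns[0] is pos_anns[2])
```

In this sketch an instance is only reachable through the items it references, so the lookup depends entirely on `refs` being present.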

Q: how does this play out if the "stc2:Coords" is made entirely of LITERALs? There will be no 'ref' content to match.
Q: I've mentioned this before, but ... since the annotation reflects the model structure. Using the 2 annotations of "stc2:SphericalCoordinate", the underlying model would be:

  • SphericalCoordinate
    • frame
    • longitude
    • latitude
    • value - ssa_location (which includes longitude, latitude and some frame info) is assigned to this attribute, which really should not be an attribute of SphericalCoordinate.

@msdemlei
Contributor

msdemlei commented Mar 25, 2021 via email

@lmichel
Collaborator Author

lmichel commented Mar 25, 2021

Markus.

I think I agree here, but perhaps you could point at examples for
the two approaches you envision here?

GAIA TS added in raw_data

Ah-hm... sorry, but "nothing prevents" is a weak reason

The strong reason is that my model needs an attribute carrying the physical meaning of the measure, and there is no modeling rule preventing me from adding it to the model. Such an attribute is valid.

...I still don't understand what you mean by self-consistent. Could
you perhaps try again to explain what you mean by that (is it "we can
serialise instances outside of container formats"?) and what use
cases you'd like to enable by this self-consistency?

Self-consistent: the model must contain all attributes and relations required to describe the domain data. Instances of that model, whatever the serialization is, must have all of these attributes and relations properly set.
The use case is interoperability in general and, to be more specific, the capacity to exchange model instances, e.g. by SAMP, DataLink or any other web endpoint.
I'm also aware that many people are looking at media other than VOTable.
I'm thinking of JSON/YAML serializations, which are mid-term use cases.
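A minimal sketch of what self-consistency could mean for an exchanged instance, assuming a made-up set of required roles. The `REQUIRED` set and the instance layout are illustrative only and are not taken from MANGO or any other model.

```python
import json

# Hypothetical required roles for a photometric measure.
REQUIRED = {"value", "unit", "ucd", "filter"}


def is_self_consistent(instance: dict) -> bool:
    """True when every required role is present and set."""
    return all(instance.get(role) is not None for role in REQUIRED)


# Made-up instance carrying its own physics (ucd) and its filter.
magnitude = {
    "value": 15.2,
    "unit": "mag",
    "ucd": "phot.mag",
    "filter": {"name": "Gaia G"},
}

payload = json.dumps(magnitude)   # what would travel to another client
received = json.loads(payload)

print(is_self_consistent(received))
print(is_self_consistent({"value": 15.2, "unit": "mag"}))
```

The second call fails the check because the physics and the filter stayed behind in the container format.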

@msdemlei
Contributor

msdemlei commented Mar 26, 2021 via email

@lmichel
Collaborator Author

lmichel commented Apr 7, 2021

And hence the Gaia folks should have written this table with three
photometry columns, one each for G, BP, and RP. I'm sure they'll do
this when we explain them the reasoning.

I'm not the curator of the table, which was provided 2 years ago by ESAC. AFAIR the rationale for this structure was that the time stamps are not the same for each band, and thus this avoids a Swiss cheese table.
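The "Swiss cheese" effect can be shown with a small pivot. The rows below are made up and the column names are assumptions; only the G, BP and RP band names come from the thread.

```python
# Long layout: one row per measurement, band given in a column.
long_rows = [
    {"time": 1.00, "band": "G", "mag": 15.2},
    {"time": 1.01, "band": "BP", "mag": 15.9},
    {"time": 1.02, "band": "RP", "mag": 14.6},
    {"time": 2.00, "band": "G", "mag": 15.3},
]


def to_wide(rows):
    """Pivot to one column per band; missing measurements become None."""
    bands = sorted({r["band"] for r in rows})
    wide = []
    for r in rows:
        wide_row = {"time": r["time"], **{b: None for b in bands}}
        wide_row[r["band"]] = r["mag"]
        wide.append(wide_row)
    return wide


wide_rows = to_wide(long_rows)
holes = sum(v is None for row in wide_rows for v in row.values())
print(holes, "empty cells for", len(long_rows), "measurements")
```

Since no two bands share a time stamp here, every wide row has a single filled magnitude cell and the rest are holes.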

@lmichel
Collaborator Author

lmichel commented Apr 7, 2021

No, let's concentrate the limited capacities we have on things that
VOTable cannot do.

But the MANGO and CUBE mappings do resolve what VOTable cannot do.

@msdemlei
Contributor

msdemlei commented Apr 7, 2021 via email

@msdemlei
Contributor

msdemlei commented Apr 7, 2021 via email

@lmichel
Collaborator Author

lmichel commented Apr 7, 2021

This time we are in agreement.
My mapping should be able to refer to FIELD meta-data instead of duplicating them.
This has already been discussed with @mcdittmar see and here.

  • Referring to native meta-data is a necessary improvement for ModelInstanceInVot
  • This is a work in progress with the SC_QUANTITY and SC_FIELD elements.
    • element definition must be refined
    • I won't change any feature during this workshop period
    • my client code does not support these elements, this is why they are not used in any example.

Mango makes extensive use of MCT and PhotDM.

  • It demonstrates how these component models can be used in various data contexts.
  • It is not a VOTable model
  • No reason to postpone it

@mcdittmar
Collaborator

Until we are (and I still am not), it would seem wiser to me to postpone this "VOTable model" until we have the very basic things (STC, photometry) covered.

Which is proving impossible to do unless we conduct this sort of workshop demonstrating that they are usable within the context of "real" usage in Source-s, Cube-s, TimeSeries-s.

@mcdittmar
Collaborator

Yes, I trust they had good reasons for doing what they did, but the result still is inhomogeneous metadata on the magnitude, flux, and error columns, and hence this denormalisation results in a severely irregular table. The most obvious irregularity: a sort by magnitude has no physical interpretation. If we try to bend our design so it works with broken data structures like this, we will make it work a lot worse on regular data -- and perhaps entirely break it. And I trust DPAC won't mind having to go for per-band time series (or the "swiss cheese") if they adopt our annotation; that will help their users, too, even the ones that ignore our annotation.

Hmm.. I'll maybe take a look at the GAIA multi-band example next.

My initial reaction here is that if "reorganize your data" was an option, there wouldn't be a need for the work we are doing.

It may not make sense to 'sort' on the "magnitude" columns, but it does make sense to 'screen Sources with associated G-band filter to magnitude>=X'. That is the benefit of the Models.. to turn the 'broken data structures' into meaningful entities.
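As a rough illustration of that screening, assuming the band of each row is known through the model annotation (represented here by a plain `band` key, with made-up values):

```python
# Made-up multi-band rows; "band" stands for the annotated filter reference.
rows = [
    {"time": 1.00, "band": "G", "mag": 15.2},
    {"time": 1.01, "band": "BP", "mag": 15.9},
    {"time": 1.02, "band": "RP", "mag": 14.6},
    {"time": 2.00, "band": "G", "mag": 15.3},
]


def screen(rows, band, min_mag):
    """Keep the rows of one band whose magnitude is at least min_mag."""
    return [r for r in rows if r["band"] == band and r["mag"] >= min_mag]


selected = screen(rows, "G", 15.25)

# A plain sort on the magnitude column interleaves the bands instead.
naive_order = [r["band"] for r in sorted(rows, key=lambda r: r["mag"])]
print(selected, naive_order)
```

The filter association is what turns the mixed column into something a client can query meaningfully.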

@lmichel
Collaborator Author

lmichel commented Apr 7, 2021

[@msdemlei] If we try to bend our design so it works with broken data structures ...

We are not trying to bend our design.

  • ModelInstanceInVot has been designed on the basis of data sets we found around (TDIG work).
  • gaia_multiband is a showcase for using FILTERs.

Nothing allows us to assert that such broken data structures will never be released.

  • This allows more compact VOTables, which is something that many people wish for.
  • They can be consumed by specific clients or pre-processed by associated data publishers (e.g. as you did I guess) to be compliant with their infrastructure, by the way.
    Proposing an annotation scheme that is able to map them is meaningful in this context

@Zarquan
Member

Zarquan commented Apr 8, 2021

The Gaia multi-band example dates back to when we started looking at how to represent time series in the IVOA. We asked data providers to send us their use cases, including examples of the kind of data that they wanted us to handle.

If I remember correctly, the structure of the multi-band time series reflects the way that the data is collected on the spacecraft, how it is processed in their data processing pipelines, and how the project scientists are used to working with it.

We asked them for examples, and they specifically requested that the IVOA time series should be flexible enough to be able to represent this use case.

I don't think that telling them they are doing it wrong is an option.

@msdemlei
Contributor

msdemlei commented Apr 8, 2021 via email

@msdemlei
Contributor

msdemlei commented Apr 8, 2021 via email

@Zarquan
Member

Zarquan commented Apr 8, 2021

This is what they are trying to represent in their time series data.

The rotation of the spacecraft generates a repeated sequence of blue then red measurements, offset by a small time delay
as light from a source passes over the blue and then red photometer strips.

5 participants