Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APE7 - A plan for NDData #8

Merged
merged 2 commits into from
Dec 17, 2014
Merged

APE7 - A plan for NDData #8

merged 2 commits into from
Dec 17, 2014

Conversation

astrofrog
Copy link
Member

This is ready for review. Comments are solicited until Friday 14th November after which a vote will be taken on whether to accept or reject this APE.

(note, this has been renamed to APE7 since there is already an APE6 proposed in #7)

@astrofrog astrofrog changed the title APE6 - A plan for NDData APE7 - A plan for NDData Oct 30, 2014
@keflavich
Copy link
Contributor

I'm fairly happy with this proposal, especially the @support_nddata decorator as a solution to the function API question. To me, that means that we can go on developing cube/spectrum/image utilities essentially without concern for nddata, then just drop on nddata support later.

I think the mask implementation may need further consideration. In SpectralCube, one of the requirements for a LazyMask is that it has a WCS that matches the cube's WCS. This is a sensible requirement but one that I have found frustrating to work with at times. As long as the mask and array are defined in the same pixel space, there is no problem, but lazily-evaluated masks are not defined on any pixel space.

Also, I think there is good reason to have nddata in place since the only natural thing to do with Cubes (given that monitors are flat) is subset them into 1D or 2D slices/projections, and it makes sense to have those subsets behave as similarly as possible to the original cube.

@mhvk
Copy link
Contributor

mhvk commented Oct 30, 2014

Main comment: I like the decorator in principle, but feel it should not be part of this APE, as it does not depend on how NDData itself is implemented (and is more general than nddata -- see my comment on the mailing list).

On the implementation: I like the principle of just defining the properties any function dealing with N-dimensional data should be able to count on being present, but I think that in python this most logically corresponds to not so much a supersimple implementation but rather an abstract base class [1]. I would suggest to actually use an ABC, which other classes can register with. For instance, I see many advantages to basing an image class on Quantity (numpy functions just work, etc.). If we have an ABC, I can just do:

class MyNDData(Quantity):
    @property
     def data(self):
        return self

    @property
    def wcs(self):
        ...
NDDataABC.register(MyNDData)
...

and, presto, all nddata functions will now work with my class, without me needing to subclass from both Quantity and NDData (although in this example that may actually be easy, not sure).

Note that of course this doesn't change anything for the proposal to have a simple NDData class; the only thing would be that code that wants to use NDData checks for isinstance(object, NDDataABC) -- as this would be hard to do retroactively, it is important to get it right in this APE.

If we go the ABC route, the items that i think do not belong in the metaclass are read and write; they should be in the simple implementation.

[1] https://docs.python.org/2/library/abc.html

@mhvk
Copy link
Contributor

mhvk commented Oct 30, 2014

More directly on the implementation: I think for consistency what we do elsewhere in astropy (Time, Coordinates, Table...), none of the data, wcs, mask, etc., attributes should be settable, so that one avoids the possibility of having an object in an inconsistent state. For __getitem__, this of course means going through a class initializer, but I think that's fine (in fact, better than assuming any subclass can handle being initialized with no argument as is done in the example!).

@mhvk
Copy link
Contributor

mhvk commented Oct 30, 2014

Another small comment: I see no reason to have a _slicable attribute; it should be up to a subclass to redefine __getitem__ is it has other attributes that need to be sliced (or not sliced). Arguably, it is also too much of a detail to be part of an APE.

@mhvk
Copy link
Contributor

mhvk commented Oct 30, 2014

Overall: very much in favour of a very simple NDData class!

@perrygreenfield
Copy link
Member

I'm not sure I buy into the idea that Image or Spectrum (or all the other variants possible) should be a subclass. The WCS property has sufficient information to indicate whether or not it contains an image or spectrum, or both. We can put the relevant operations into functions that test to see whether that nddata object has the necessary axes to support the kinds of operations intended. (We could stick all these functions as generic nddata methods, that raise an exception if it doesn't have the appropriate axis.

This would make using nddata objects a lot simpler in many respects and avoid the complications of having to deal with:

Image
1d, 2d, 3,d spectral
time series
polarization

etc.

and all the combinations that might be possible with these.

Instead, nddata could have a method (or a function) that one can use to see if is is usable as a spectrum (of any spatial dimension). Or if it supports image operations (perhaps as a time series, or IFU). So if I want to write a function that integrates a spectrum over a bandpass, it would work with 1d, 2d, and 3d spectral cases and return the appropriate result (e.g, an image for a spectral cube, but still nddata, but with a modified wcs and different .data dimensionalty).

If I want to sum the flux in a spatial region, that should work with any nddata object that has two spatial dimensions, e.g., images, spectral cubes, polarization image sets, or image time series.

If we go with the proposed subclasses, we'll end up either reimplementing the same methods in many cases.

@perrygreenfield
Copy link
Member

I think this should be clearer about the standard attributes and whether they are optional (and if so, how their absence is handled, e.g., value of None?)

@astrofrog
Copy link
Member Author

Just a quick comment to @mhvk regarding the decorator stuff - this APE is not just about how NDData is implemented, but about how we can use it, hence why this is included here. I don't think we can decouple the discussion of implementation of NDData from how it's used.

@mhvk
Copy link
Contributor

mhvk commented Oct 30, 2014

@astrofrog - I do think you are trying to do 2 things at the same time here. But underlying this is perhaps my impression that no real progress with NDData was made mostly because it was defined to have all kids of bells and wistles, like a working uncertainty and all those flags, for which no good implementation was available. If so, it may be that most problems are solved just by simplifying it -- once that is done, and we actually start using it, we will find out soon enough whether more is needed. My main worry would be that prescribing how it is used locks us in to something we may come to regret later.

Anyway, the bigger question in my mind is whether to do this through an ABC..

@astrofrog
Copy link
Member Author

@mhvk - the decorator way of doing things is not an only solution and anyone can chose to implement functions that use e.g. only NDData. I agree we are doing two things at one in this APE, but I think that's ok given that the scope of the APE is the overall plan for NDData :) If it looks as though the decorator part is going to be controversial (it has not been in previous discussions) then we can consider taking it out. But it sounds like you want to go even more general, which is not incompatible with what is in the APE.

Regarding the ABC - I originally had NDData as an ABC in this APE, and @cdeil and @embray convinced me that this isn't needed. In fact, @perrygreenfield's comments above go the other way by saying that NDData could be used directly in a lot of cases. Let's see what others think.

@embray
Copy link
Member

embray commented Oct 30, 2014

What if there were some NDDataBase which is an ABC, from which the plain NDData inherits. I like @mhvk's argument in favor of this, I just don't think the NDData class itself should be abstract, but rather a minimal concrete class (it could still have an ABC underlying it though which could be good).

@embray
Copy link
Member

embray commented Oct 30, 2014

Also regarding what @mhvk wrote:

Note that of course this doesn't change anything for the proposal to have a simple NDData class; the only thing would be that code that wants to use NDData checks for isinstance(object, NDDataABC) -- as this would be hard to do retroactively, it is important to get it right in this APE.

In addition to

NDDataABC.register(MyNDData)

you could also write

NDData.register(MyNDData)

then isinstance(MyNDData(...), NDData) will work.

@embray
Copy link
Member

embray commented Oct 30, 2014

In other words, I have argued that there should be some NDData base class that is minimally functional, and can be subclassed to provide additional functionality. But also having an ABC that defines the interface for "what is an NDData" means any class that implements this interface can work as an "NDData" even if it chooses not to be an explicit subclass of NDData.

@mhvk
Copy link
Contributor

mhvk commented Oct 30, 2014

@embray - yes, having both the ABC and the minimal implementation is what I had in mind. I hadn't realised though one would also be able to register with NDData; that makes it even neater!

(@taldcroft - I think the above discussion on ABC usage is relevant for how we deal with column-like objects as well; effectively, astropy code would have Column.register(Quantity), etc., and users could register their own column-like object the same way.)

@embray
Copy link
Member

embray commented Oct 30, 2014

With apologies if this is getting too off track from discussion of the broader points of this proposal (and if so we can take it elsewhere) but I think this minimal example demonstrates what @mhvk and I are talking about:

import abc

from astropy.units import Quantity


class NDDataMeta(abc.ABCMeta):
    def __instancecheck__(cls, obj):
        if cls is NDData and issubclass(type(obj), NDDataBase):
            return True
        return super().__instancecheck__(obj)   


class NDDataBase(metaclass=NDDataMeta):
    @abc.abstractproperty
    def data(self):
        """The raw data contained by this object."""

    @abc.abstractproperty
    def wcs(self):
        """The WCS mapping data to some coordinate system."""


class NDData(NDDataBase):
    @property
    def data(self):
        return "Hello World!"

    @property
    def wcs(self):
        raise NotImplementedError()


class Image(NDData):
    """
    Some application-specific NDData subclass that inherits useful
    functionality from the base class.
    """


@NDDataBase.register
class MyNDData(Quantity):
    """Not an NDData subclass but implements the appropriate interface."""

    @property
    def wcs(self):
        raise NotImplemented()

Then

>>> d = MyNDData(1)
>>> isinstance(d, NDDataBase)
True
>>> isinstance(d, NDData)
True
>>> isinstance(d, Image)
False

Something along those lines.

@astrofrog
Copy link
Member Author

@embray - I need to think about this more, but can you explain why:

>>> isinstance(d, NDData)
True

in the example you showed? I'm very confused as to how this tests as True (and why that should make sense).

@embray
Copy link
Member

embray commented Oct 30, 2014

Because NDDataMeta.__instancecheck__ is overridden so that any instance of a class that is a registered implementation of NDDataBase also tests true for isinstance(..., NDData). I don't think it's so important that that works over just isinstance(..., NDDataBase) but if it were really a point of confusion then that can be solved.

My example is probably more complicated than necessary. The point is that it may be useful to have an interface defined by an ABC that other classes can match themselves to, while at the same time having a minimally useful NDData base class.

@astrofrog
Copy link
Member Author

@perrygreenfield - regarding the sub-classes, I can see the argument for not needing the sub-classes. On the other hand, a Spectrum class may make sense in that it can define methods that only apply to spectra (for example specific types of interpolation). In addition, the I/O framework needs to know which readers and writers it can use, and relying on the WCS to choose what formats are available may not be very robust. The current I/O framework relies on the class of the data to determine which readers/writers will work. There's also the issue of uncalibrated data that has no WCS - how would you distinguish a position-position image from a 2d spectrum for example?

@astrofrog
Copy link
Member Author

@perrygreenfield - just to make sure I understand what you are suggesting - I take it you are not opposed to having classes such as CCDImage for example, which actually add functionality specific to CCD images? Are you just saying that one shouldn't have a sub-class just for the sake of constraining the WCS type/number of dimensions?

@perrygreenfield
Copy link
Member

@astrofrog - Isn't the I/O aspect best handled by a factory function? It could:

  1. is able to unambiguously determine which interpretation it should apply to create the appropriate axis types, and does so in the wcs of the nddata object.
  2. if it is ambiguous (some missing items in WCS, or multiple possible interpretations), then it is an error, and the user must supply a standard interpretation option. For example, if no WCS exists and it is a 2d array, then they could say axes=('spatial','spectral') (how we deal with the specification is a detail to work out; I'm just illustrating crudely; in principle, something has to be assumed about units and scale without specified inputs so that information has to be supplied somehow.

@perrygreenfield
Copy link
Member

@astrofrog - Right, subclasses would be permitted, just not necessary for specifying the kinds of axes it has.

@mhvk
Copy link
Contributor

mhvk commented Oct 31, 2014

Just to leave also here Steve Crawford's suggestion on the mailing list to include uncertainty in the list of pre-defined property names (i.e., does not have to be present, but when it is, it should be called uncertainy). I'm +1 on that.

not accept it - for example, if ``wcs`` is set, but the function cannot support
WCS objects, an error would be raised. On the other hand, if an argument in the
function does not exist in the ``NDData`` object or is not set, it is simply
left to its default value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is problematic to let the decorator fail is an NDData property is set, but the function does not support it. I image e.g. a function that shifts a spectrum from heliocentric to the reference frame of a star. All this function needs is the wcs (and maybe meta to record this shift in a keyword), but with this implementation is would also be required to take data, unit and uncertainty arguments just because the decorator fails if not every attribute of the NDData is accepted by the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hamogu - maybe we could make all this parametrizable in the decorator, e.g. you could say that you don't care if there is anything else that is not needed by the function, or that it can ignore specific attributes, etc.

Having said that, the way this decorator is designed, it would not work if the function takes only wcs - it would need to take a data argument too.

@astrofrog
Copy link
Member Author

@perrygreenfield - I have been thinking about your comments some more, and to some extent I think we can decide on which sub-classes exist after this APE. So I removed the paragraph that I think you were disagreeing with:

In practice, users would very rarely use the ``NDData`` class directly,
instead using sub-classes, such as ``Image`` or ``Spectrum`` for example
(although this APE does not specify what sub-classes should exist, and these
are given purely as illustrative examples). In addition, developers would not
need to write functions that can take fully generic ``NDData`` objects, but
could restrict themselves to specific sub-classes such as ``Image`` or
``Spectrum``.

and I have re-worded things in other places to avoid talking about this. With these changes, do you support this APE or do you still have concerns?

@astrofrog
Copy link
Member Author

I have implemented changes based on the comments so far:

@mhvk: I have removed the setters and mentioned why we don't want to include them. I have also removed the advanced details about the slicing and _sliceable since it's too detailed for an APE. Finally, I added a mention saying that uncertainty won't be in the base class (yet) but if sub-classes want to implement uncertainties they should use the term uncertainty.

@mhvk @embray: I can see where you are going with the ABC but I think that it requires additional consideration, and I'm not sure if it's really needed yet. I thought about it and if we implement things as suggested in this APE and later want to add the ABC, it will be possible to do it without breaking any backward compatibility, so I've taken the option of adding a paragraph briefly mentioning this option but saying we leave it for a future discussion beyond the APE.

@mwcraig
Copy link
Member

mwcraig commented Nov 14, 2014

@astrofrog -- this might not belong in the APE, but the PR astropy/astropy#2905 currently contains a mixin class for arithmetic and a subclass that provides the functionality of the old NDData. I could see:

  • mentioning it in the APE under Backward Compatibility
  • providing the mixin/subclass in astropy or ccdproc but not including it in the APE, which really focuses on the base NDData class

I'm fine with either one, so count this as a +1

@crawfordsm
Copy link
Member

I am very happy with the overall APE, but I still feel very strongly about including uncertainty. However, I don't want to hold up the development and I know even with this APE, it doesn't necessarily prevent an uncertainty property being included or necessarily removed. So consider this a +1 vote and if people feel strongly that uncertainty should not be included at all, then I am happy to support the consensus.

I would be much happier with including the following (or some variation of the following):

  • uncertainty -- An object describing the uncertainty associated with data. No restrictions are placed on what type of object this is and its behavior should be set in classes that inherit from NDData and use uncertainty.

I am not happy with the current APE that does not include this as a base property and also with the current language in the APE. First, I don't think we should place any restrictions on what the uncertainty can be (it could be a function, a probability distribution, and possible many things that we can't even think of right now). From my view, I think not specifying what it will be is a feature. We should never put any limitations on how uncertainty behaves as that will limit the possibility of what we can develop and how useful NDData is suppose to be. It is one of the things that started this whole conversation of changing NDData (ie that the base class was too specific for general use cases). So I think the phrasing of ' once we settle on an infrastructure for handling uncertainties.' is wrong and that we should never really settle on an infrastructure for uncertainty in NDData, itself.

In addition, we really want to prevent people from calling it error, variance, likelihood, or anything else when they are developing different classes that inherit from NDdata, and not linking those properties back to an uncertainty property. My feeling is that the average user won't read the APE or even the documentation, so I do not think mentioning it here is really sufficient to prevent that. Others may disagree with this or it might be such a small case situation as to not really be that big of a concern and I won't really argue about that.

@jehturner
Copy link
Member

This APE doesn't rule out that we could end up moving things to the base class or NDData in future if we find that they are general, but I think we'll only know what those are through real-life usage, rather than design-by-committee.

OK, so this is the important point. Perhaps it's worth saying up front that NDDataBase & NDData are a work in progress that are liable to acquire more specific definitions as the community thrashes out what works in practice? Also, separately from the APE, maybe encourage people to work on a small number of initial subclasses for specific data types rather than doing their own thing based on NDDataBase/NDData (without intending to discourage alternative experiments for a good reason).

Regarding the wording for #3, maybe just consider changing "could" to "will" or "will probably".

@astrofrog
Copy link
Member Author

I'd be in favor of adding a sentence saying that if we the community identify core functionality that make sense in these classes, there isn't any reason why these couldn't be added in future, but I think those should probably require an APE or at least a strong consensus.

@jehturner
Copy link
Member

Fair enough -- you certainly don't want it evolving erratically -- but I'd hope in the long run it will be expected to evolve further towards encouraging compatibility over lots of similar-but-not-quite-compatible conventions.

@mwcraig
Copy link
Member

mwcraig commented Dec 11, 2014

Sorry about the silence -- will have time to comment/update later today.

@mwcraig
Copy link
Member

mwcraig commented Dec 11, 2014

@jehturner -- thanks for condensing, and apologies for the delay in responding.

  1. If I want to propagate my datasets as nddata instances, where do I keep my data quality arrays, since "mask" is boolean? Is there any reason why this class doesn't include flags like the previous nddata? Perhaps because there's no standard for what the different flags mean?

I think the main reason that FlagCollection wasn't in the APE is that it seems not have been used at all in its current incarnation. That said, in the implementation PR so far I haven't actually removed the FlagCollection code, so it could remain even though it isn't part of NDData or NDDataBase.

  1. Shouldn't "meta" be defined a bit more precisely? Let's say I need to access keyword / value / description triplets (eg. a FITS header).

My experience with ccdproc, is that anything beyond dict-like is problematic. In ccdproc we started by requiring the meta-data to be a FITS header; that was a hassle if your data wasn't coming from a FITS file and resulted in warning messages if one, e.g., used a long key. Then we tried using something like an OrderedDict, but that rendered metadata that came from a FITS header unusable in the sense that going from FITS → meta → FITS resulted in a broken FITS file. We've settled (in ccdproc) on requiring that the meta-data be dict-like and providing a default OrderedDIct for meta if nothing is provided.

This approach for NDData would provide consistency across subclasses (any metadata is in meta) without putting format-specific constraints on the meta-data.

@mwcraig
Copy link
Member

mwcraig commented Dec 11, 2014

I'd be in favor of adding a sentence saying that if we the community identify core functionality...

I would strengthen this a bit: NDData needs more community feedback on core functionality than many other areas of astropy. I think this APE gets NDData to the point that it can be more widely used by the community so that down the road more functionality can be added.

I hope it does lead to future APEs once the updated NDData is out and gets some use.

Address all remaining line comments on APE
but does only very limited input validation.

* It provides generic ``read`` and ``write`` methods that connect to the I/O
registry, as for the ``Table`` class.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwcraig - damn I just noticed these two bullet points. The second is no longer true, right? Shall I re-word to say

The proposed ``NDDataBase`` simplifies the current ``NDData`` class to the extreme,
such that it essentially only defines the properties needed for ``NDData``?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@weaverba137
Copy link
Member

👍

@cdeil
Copy link
Member

cdeil commented Dec 15, 2014

👍

(Previously I argued for the alternative: remove NDData, because it didn't do anything useful. Now it sounds like it will provide some I/O and slicing infrastructure or implementations ... that sounds great to me!)

@adrn
Copy link
Member

adrn commented Dec 16, 2014

👍

1 similar comment
@Cadair
Copy link
Member

Cadair commented Dec 16, 2014

👍

@jehturner
Copy link
Member

@mwcraig Thanks for the edits. I haven't actually thought hard about whether the existing FlagCollection is the best way to achieve what it does & just need a way to propagate multiple flags, so if FlagCollection is currently not being used it might be worth revisiting for the sub classes.

I've tried to do FITS → meta → FITS previously for another project and it was clear that an ordered dictionary with comments as well as values would be the minimum needed to preserve the information usably. Did you find that still insufficient? I can see you might also need to track empty space to reproduce FITS files exactly (I'm sure Erik could comment definitively but maybe it's a bit of a tangent here at this point). I was thinking that might be better than nothing (otherwise the consistency provided may still be inadequate to write interoperable code) but it wouldn't accommodate every format and for now it seems this discussion has moved onto finalizing what's already here for the base class.

Sorry, I tried to nest this under your comment, out of the way of the votes, but it keeps ending up at the bottom.

@mwcraig
Copy link
Member

mwcraig commented Dec 16, 2014

@jehturner -- no worries! The bigger problem I ran into was long keywords getting mangled (because a dict doesn't know about HIERARCH) and CONTINUE cards for long values getting mangled. At that point I was going to have to teach the metadata to understand FITS, so I went the route of being more permissive, and writing short I/O wrappers. Standardizing on calling the metadata meta and the data data should make it easy to write one wrapper per output format.

@jehturner
Copy link
Member

I see. Yes, at least it facilitates 1 wrapper per format for I/O (maybe also for manipulating the metadata non-trivially, but whether small project developers would consistently find time to do that I'm not sure).

@astrofrog
Copy link
Member Author

@mhvk @embray @keflavich - since you all commented above, any final votes?

@astrofrog
Copy link
Member Author

@crawfordsm @mwcraig - since uncertainty is now included I'll assume your +1 votes still hold.

I'm 👍 on the APE as it is now.

@crawfordsm
Copy link
Member

+1

2 similar comments
@hamogu
Copy link
Member

hamogu commented Dec 17, 2014

+1

@keflavich
Copy link
Contributor

👍

@mhvk
Copy link
Contributor

mhvk commented Dec 17, 2014

I like the APE too.

I have one quick textual issue, which I'll comment on in-line, and a slightly larger one that I think the APE prescribes some things for the abstract base class that I'm not sure how to enforce in practice. In particular, a very nice thing is that one can set @abstractproperty which ensures that a given property has to be defined, but I don't think one can easily make a default one that returns None. In the end, I think this is a detail, though, as in practice it is most likely one will just subclass from NDData itself.

on-the-fly.

* ``unit`` - the unit of the data values, which will be internally
represented as an Astropy Unit. Sub-classes could choose to connect this to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the first instance of Sub-classes could choose to connect this to data.unit should be deleted, as it is repeated further down.

@mwcraig
Copy link
Member

mwcraig commented Dec 17, 2014

Wasn't sure how voting worked for APE authors, but yes, 👍

@eteq eteq merged commit 1f81555 into astropy:master Dec 17, 2014
eteq added a commit that referenced this pull request Dec 17, 2014
@eteq
Copy link
Member

eteq commented Dec 17, 2014

Seeing all the votes in favor, the coordination committee has accepted this proposal.

@mhvk - I'm 👍 on using @abstractproperty, but my thinking is that that's something of an implementation detail that can go in the actual code implementing this APE.

@mhvk
Copy link
Contributor

mhvk commented Dec 17, 2014

@eteq - yes, agreed, this is an implementation detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.