Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use skycell WCS in outlier detection #1615

Open
stscijgbot-rstdms opened this issue Feb 11, 2025 · 8 comments · May be fixed by #1642
Open

Use skycell WCS in outlier detection #1615

stscijgbot-rstdms opened this issue Feb 11, 2025 · 8 comments · May be fixed by #1642

Comments

@stscijgbot-rstdms
Copy link
Collaborator

Issue RCAL-1016 was created on JIRA by Eddie Schlafly:

For Roman our usual mode for creating coadded products is to create them on a common set of predefined WCSes which we call skycells.  We support coadding onto skycells in resample but we do not currently support it in outlier detection.  This means that we make larger coadds during outlier detection than we need, and detect outliers in those larger coadds, and then throw away those computed values.  

We should instead use the skycell WCSes during the outlier detection step as well in order to save time and memory.  An argument could be made that we should in fact use a WCS that is one or two pixels larger than the skycell WCS so that we guarantee the exact same treatment of outliers in the boundary region between skycells.  If that's easy to support, we should do it, but if it's annoying, we should pass for now since it's certainly harder to think about.

@braingram
Copy link
Collaborator

One option for this would be to read the wcs from the association data linked to the ModelLibrary fed into outlier detection. That could mean that providing an association/library with a skycell name would automatically trigger using the skycell for outlier detection. Would it be helpful to have an option to disable this (to provide an association with a skycell but to run outlier detection with the full field of view)?

@braingram braingram linked a pull request Feb 25, 2025 that will close this issue
10 tasks
@schlafly
Copy link
Collaborator

I don't think an option to do outlier detection for the full field of view when we're only going to resample on a skycell is valuable. Unless the code is easier for some reason I wouldn't do that. Our intent is now to populate the skycell_wcs_info in the association files that includes enough information to construct the output WCS. I think I would probably key on that information rather than doing the lookup into the tessellation file.

@braingram
Copy link
Collaborator

Do you expect there to be data differences (different outliers detected and data differences in the resampled to skycell image) when using the skycell wcs for outlier detection? So far I'm seeing differences for the output of test_mos_skycell_pipeline:
https://github.com/spacetelescope/RegressionTests/actions/runs/13523271813/job/37797551254#step:30:2069
here's the summary of the data differences:

                     "root['roman']['data']": {'abs_diff': np.float32(761662730000.0),
                                               'n_diffs': 23664,
                                               'worst_abs_diff': {'index': (np.int64(743),
                                                                            np.int64(2253)),
                                                                  'value': np.float32(343840030000.0)},
                                               'worst_fractional_diff': {'index': (np.int64(3418),
                                                                                   np.int64(1362)),
                                                                         'value': np.float32(209763600000.0)}},

I'm unsure if this is due to the skycell pixels being a different size (perhaps combined with the other resample bugs like the failure to scale intensity) or if this is caused by a different bug.
I understand the large number of reported outliers:
https://github.com/spacetelescope/RegressionTests/actions/runs/13523271813/job/37797551254#step:30:769
since the "blot" images are largely undefined (0 value at the moment although the fillval should likely be nan). These 0 values in the blot image end up far from the input values and result in many pixels outside the skycell being flagged as outliers. If I switch the fillval to nan (see #1630) the number of detected outliers is more reasonable (3574 w/nan fillval vs 6781457 w/indef/0 fillval vs 6970 on main). However the data differences are similar.

If the expectation is that the resampled results should be the same with this change I'd say we resolve some of the other resample/outlier detection issues first (fillval, use code from stcal, correct intensity scale) before updating outlier detection to use the skycell wcs.

@schlafly
Copy link
Collaborator

Thanks Brett. I don't expect the results to be identical---since the exact boundaries of the pixels that are used for making the median will move, that will have to tickle some pixels with values on the boundary of being cut. But conceptually I don't see a reason to prefer whatever the default WCS is that outlier detection constructs to cover the full footprint covered by the exposures to one that matches the skycell that we eventually output.

I agree that for values outside the skycells there will be big differences, and changing the behavior of the code to not compare those nonsense values (e.g., via the fill value) seems like the right decision.

i.e., another option would be not to look at the parts of the exposures that land off the skycell in the blot stage. I don't think there are big savings to be had there and the fill value change seems a better idea to me---just mentioning that we don't technically even need to compute those values.

@braingram
Copy link
Collaborator

braingram commented Feb 27, 2025

Thanks again. I spent some more time looking at this and convinced myself that a change in outliers within the bounding box is also not unexpected. Since the skycell wcs differs from the combined wcs (in pixel size, bounding box, etc) the computed median differs (leading to different outliers). We have quite a few resample changes and one way to queue these up would be:

@braingram
Copy link
Collaborator

@nden mentioned it may be useful to end users (and local runs) to continue to run outlier detection and resample using the combined wcs. I can think of 2 options for handling this:

  • ask the user to modify the association by removing the target name (we'll want to make sure we're testing this case) which will trigger these steps and the pipeline to used the combined wcs
  • add a pipeline/resample/outlier_detection spec argument to disable using the skycell information (perhaps use_skycell_wcs defaulting to True)

The spec update seems most convenient for quick testing (doesn't require file changes) but does require updating several specs and we'll want to define what happens if a user says use_skycell_wcs and there is no skycell information in the association). I think both have pros and cons. If I had to chose I'm currently leaning towards adding use_skycell_wcs which will take a bit more work to implement but seems most useful for quick testing.

@nden
Copy link
Collaborator

nden commented Feb 27, 2025

I am also in favor of this being controlled by a parameter. Is the name resample_on_skycell more informative?

@schlafly
Copy link
Collaborator

Exactly, thanks. I agree with everything you say re the median changing due to the exact boundaries of the output pixels.

I'm fine with use_skycell_wcs or resample_on_skycell (though the latter feels a little more awkward in outlier detection). One thing I would add though is that we added the skycell_wcs_info to the association in order to avoid needing the 'target' name trigger any special behavior. Relying on skycell_wcs_info makes it easier for people to define their own skycells without needing to download and use our patch table, etc.. So I would try to have special behavior trigger on skycell_wcs_info rather than target.

If someone includes skycell_wcs_info, it does seem to me that they probably want to use it for both outlier detection and resample. But as long as this parameter is default True, fine with me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants