Parallel and epoch based association #277

ajstewart · 2020-08-20T14:37:56Z

I've developed a basic parallel association method. Essentially, what it does is:

1. Analyses the sky regions to be processed and groups together those that overlap. For example, the attached plot shows the result of this grouping using the sky region information from the full run; each colour is a different 'group'.
2. The resulting groups do not overlap one another, so they don't need to be associated together. Instead, each group is shipped off to have its association run in parallel, and the results are gathered at the end.
3. The results are combined and the source ID numbers and relation information are corrected.
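The grouping and renumbering steps can be sketched as below. This is a minimal sketch, not the pipeline's code. It assumes each sky region can be approximated by a circle (centre RA/Dec and a radius, in degrees). Two regions count as overlapping when their centres are closer than the sum of their radii, and the groups are found as connected components with a union-find. `SkyRegion`, `group_sky_regions` and `offset_source_ids` are hypothetical names, and the parallel dispatch itself is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SkyRegion:
    # Hypothetical stand-in for a sky region record; angles in degrees.
    name: str
    ra: float
    dec: float
    radius: float  # approximate field radius


def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def group_sky_regions(regions):
    """Split regions into groups such that no two groups overlap (union-find)."""
    parent = list(range(len(regions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, a in enumerate(regions):
        for j in range(i + 1, len(regions)):
            b = regions[j]
            # Two circular regions overlap if their centres are closer than r1 + r2.
            if angular_sep(a.ra, a.dec, b.ra, b.dec) < a.radius + b.radius:
                parent[find(i)] = find(j)

    groups = {}
    for i, region in enumerate(regions):
        groups.setdefault(find(i), []).append(region)
    return list(groups.values())


def offset_source_ids(group_results):
    """Renumber per-group source IDs so they stay unique after combining.

    Each element is a list of (source_id, related_ids) pairs whose IDs start
    at 1 within their own group (a hypothetical layout).
    """
    combined, offset = [], 0
    for results in group_results:
        max_id = 0
        for source_id, related in results:
            combined.append((source_id + offset, [r + offset for r in related]))
            max_id = max(max_id, source_id)
        offset += max_id
    return combined


regions = [
    SkyRegion("A", 10.0, -30.0, 3.0),
    SkyRegion("B", 14.0, -30.0, 3.0),   # ~3.5 deg from A, so it overlaps A
    SkyRegion("C", 200.0, 20.0, 3.0),   # far from both
]
print([[r.name for r in g] for g in group_sky_regions(regions)])  # [['A', 'B'], ['C']]
```

Because the groups share no sky area, each one can be associated independently. The only cross-group bookkeeping is the ID offset applied when the results are merged.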

I've added this as an option because you should only really see speed improvements if you have a lot of images spread over a number of sky regions. For a small job it's better to keep it switched off: the scheduler's initialisation time alone makes it slower.

~~I've kept this as a draft for now as it needs to be tested on a large dataset first~~ I've tested it locally and it seems to run fine. This is about as parallel as it can get at the moment without changing the kind of ASKAP data we are getting. Ideally you could envision having one FITS file and one catalogue per epoch, but of course these would be a nightmare (and a silly idea) to deal with. So we work with what we have at the moment.

Fixes #259.

ajstewart · 2020-08-26T05:41:57Z

@srggrs with the recent updates this is ready for you to have a look and let me know what you think.

Along with the ability to split the input sky regions into groups to associate in parallel, as explained above, it now also allows the user to define epochs, i.e. which images count as one 'epoch' and should be considered as observed at the same overall time. For example, without epoch mode a source looks like this, because of the duplicate measurements within the epochs:

Using epoch mode results in the following:

So users can choose which behaviour they want.

Currently, epoch mode is activated if the user inputs dictionaries whose keys are the epoch numbers (for all file inputs), e.g.:

```python
import glob

IMAGE_FILES = {
    # insert images file path(s) here
    1: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH01/COMBINED/STOKESI_IMAGES/*.fits')),
    2: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH02/COMBINED/STOKESI_IMAGES/*.fits')),
    3: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH03x/COMBINED/STOKESI_IMAGES/*.fits')),
    4: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH04x/COMBINED/STOKESI_IMAGES/*.fits')),
    5: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH05x/COMBINED/STOKESI_IMAGES/*.fits')),
    6: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH06x/COMBINED/STOKESI_IMAGES/*.fits')),
    7: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH07x/COMBINED/STOKESI_IMAGES/*.fits')),
    8: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH08/COMBINED/STOKESI_IMAGES/*.fits')),
    9: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH09/COMBINED/STOKESI_IMAGES/*.fits')),
    10: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH10x/COMBINED/STOKESI_IMAGES/*.fits')),
    11: sorted(glob.glob('/import/ada1/askap/PILOT/release/EPOCH11x/COMBINED/STOKESI_IMAGES/*.fits')),
}
```

Internally, the images are sorted into 'epoch' groups. Importantly, the user sees no difference if they enter just a list of images: in that case the list is converted into a dictionary where each image is its own epoch (sorted in date order).
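Here is a minimal sketch of this normalisation, assuming the observation date of each image is already known. The function name `normalise_epochs` and the `obs_date` mapping are hypothetical. The point is that both input styles end up as the same `{epoch: [images]}` structure, so the rest of the pipeline only has to handle one shape.

```python
from datetime import datetime


def normalise_epochs(image_input, obs_date):
    """Return (epoch_mode, {epoch: [image paths]}).

    image_input: a dict keyed by epoch number (epoch mode) or a plain list of
    image paths. obs_date: a hypothetical mapping of image path -> observation
    datetime, standing in for the date the pipeline reads from each image.
    """
    if isinstance(image_input, dict):
        # Epoch mode: the user has already grouped the images into epochs.
        return True, {epoch: list(paths) for epoch, paths in image_input.items()}

    # Plain list: every image becomes its own epoch, numbered in date order.
    ordered = sorted(image_input, key=lambda path: obs_date[path])
    return False, {i: [path] for i, path in enumerate(ordered, start=1)}


dates = {
    "a.fits": datetime(2020, 1, 2),
    "b.fits": datetime(2019, 5, 1),
    "c.fits": datetime(2020, 3, 1),
}
print(normalise_epochs(["a.fits", "b.fits", "c.fits"], dates))
# (False, {1: ['b.fits'], 2: ['a.fits'], 3: ['c.fits']})
```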

In the association, what happens is:

1. The prep_df for the association now takes epochs into account: it loads all measurements from all the images requested in the epoch.
2. The measurements are then analysed to drop duplicates. A new user config option, duplicate_radius, governs what is considered to be a duplicate source. The most 'central' source is always taken as the actual measurement, i.e. the one closest to the centre of its respective sky region (in radio data this is commonly considered the most reliable datapoint).
3. Association is performed as normal.
4. If parallel is selected, this is done per sky region group.
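The duplicate-removal step can be sketched as a greedy pass, assuming each measurement knows its separation from the centre of its own sky region. This is a minimal illustration, not the pipeline's implementation. `Measurement` and `remove_duplicates` are hypothetical, `duplicate_radius` is taken in degrees here, and the pairwise loop is O(n²). A large catalogue would be better served by a spatial cross-match (for example astropy's `SkyCoord.match_to_catalog_sky`).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    # Hypothetical measurement record; all angles in degrees.
    id: int
    ra: float
    dec: float
    dist_from_centre: float  # separation from the centre of its own sky region


def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def remove_duplicates(measurements, duplicate_radius):
    """Drop duplicates within one epoch, keeping the most central measurement.

    Measurements are visited from most to least central; one is kept only if
    it lies further than duplicate_radius (degrees) from every measurement
    already kept.
    """
    kept = []
    for m in sorted(measurements, key=lambda x: x.dist_from_centre):
        if all(angular_sep(m.ra, m.dec, k.ra, k.dec) > duplicate_radius for k in kept):
            kept.append(m)
    return kept


epoch_measurements = [
    Measurement(1, 10.0, -30.0, 2.0),
    Measurement(2, 10.0 + 0.5 / 3600, -30.0, 0.5),  # same source, more central
    Measurement(3, 11.0, -30.0, 1.0),               # a different source
]
print([m.id for m in remove_duplicates(epoch_measurements, 2.5 / 3600)])  # [2, 3]
```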

After association, the only other change when using epoch mode is in the calculation of the 'missing' images used for new sources and forced extraction. Here the pipeline checks that each epoch in which the source was supposed to be seen is accounted for. If a forced extraction is required, it again always chooses the image from the sky region closest to the source.
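The missing-epoch check can be sketched like this, assuming the 'ideal coverage' for a source is available as the list of images, per epoch, whose sky regions contain it. `forced_extraction_targets` and the tuple layout are hypothetical. The sketch only shows the selection rule described above: skip epochs that are already accounted for, otherwise pick the image whose sky-region centre is closest to the source.

```python
import math


def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def forced_extraction_targets(src_ra, src_dec, ideal_coverage, seen_epochs):
    """Return {epoch: image_name} for the epochs that need a forced extraction.

    ideal_coverage: {epoch: [(image_name, centre_ra, centre_dec), ...]} listing
    every image in that epoch whose sky region covers the source (degrees).
    seen_epochs: epochs in which the source already has a measurement.
    """
    targets = {}
    for epoch in sorted(ideal_coverage):  # deterministic epoch order
        if epoch in seen_epochs:
            continue  # already measured in this epoch, whichever image it came from
        # Prefer the image whose sky-region centre is closest to the source.
        name, _, _ = min(
            ideal_coverage[epoch],
            key=lambda img: angular_sep(src_ra, src_dec, img[1], img[2]),
        )
        targets[epoch] = name
    return targets


coverage = {
    1: [("epoch1_field_a", 10.0, -30.0)],
    2: [("epoch2_field_a", 12.0, -30.0), ("epoch2_field_b", 10.5, -30.0)],
    3: [("epoch3_field_a", 10.0, -31.0)],
}
print(forced_extraction_targets(10.2, -30.0, coverage, seen_epochs={1}))
# {2: 'epoch2_field_b', 3: 'epoch3_field_a'}
```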

Bits that are not clear on how to do:

* How to alter the UI so that users can easily use the website to set up such a run. However, I'm not too fussed about this, as I consider it an advanced mode that you should only use if you know what you are doing, so allowing the user to edit the config in the website may be enough.
* Related to the above, there isn't an explicit switch to turn it on; it's just activated when the user enters dicts.
* Should the pipeline run record that it was run in epoch mode?

* Fixed the ordering of epochs in the ideal coverage dataframe (Dask seemed to randomise them).
* Added a check so that if an epoch was already in the source but wasn't from the ideal image, the pipeline won't force extract from the same epoch.

marxide · 2020-10-06T23:21:12Z

I see there's a switch to turn this on when creating a pipeline run with the UI, but there doesn't appear to be a way to define the epoch dict that it expects. Is that right?

ajstewart · 2020-10-06T23:31:12Z

> I see there's a switch to turn this on when creating a pipeline run with the UI, but there doesn't appear to be a way to define the epoch dict that it expects. Is that right?

Yeah, the parallel option can be used whether or not the run is epoch based, hence that option in the config.

For actually using the epoch based option, which is triggered when dictionaries are entered, my idea at the moment is to have a good documentation page on it. If users want to set it (purely through the website), they can use the text editor on the job config page.

I think a more sophisticated entry method is definitely required, but it could be done after the fact, partly because I expect our usage numbers to be low to begin with (in terms of people actually constructing and running custom jobs).

ajstewart · 2020-10-13T00:26:15Z

@srggrs conflicts resolved (it was just the change log).

ajstewart added 7 commits August 7, 2020 22:41

Added a parallel association option

bc2d154

Clearer function names

f3fd005

First epoch association steps

58433c1

Merge branch 'master' into epoch-based-association

0854cdf

Working remove duplocates

2c6b5da

New sources analysis needs updating

Working parallel and epoch association

37c0a36

Fixed related id correction

904d645

ajstewart added the enhancement New feature or request label Aug 20, 2020

ajstewart added 14 commits August 21, 2020 00:48

Tidy remove duplicates

1ba25d5

Images df fix in main

87efeff

Template config fix

b9a6671

Switch to full cpu usage

9aaa0f3

De ruiter and source number fix

b1b41e1

Efficiency improvements

3a7ee3b

Tidied duplicate sources

54c0de9

Attempts to keep memory down

147de5c

Merge branch 'master' into epoch-based-association

51e6fdb

Merge branch 'master' into epoch-based-association

7706cb5

Merge branch 'master' into epoch-based-association

27ec963

Added UI options

8ca970a

Config and log minor changes

4af9ce2

Added comment

51d63cc

ajstewart requested a review from srggrs August 26, 2020 05:42

ajstewart added 5 commits August 27, 2020 02:24

Merge branch 'master' into epoch-based-association

f8689a0

Edit logging message

badc3ce

Forced extraction drop flux_peak

8798616

Merge branch 'master' into epoch-based-association

5fd7750

ajstewart and others added 2 commits October 3, 2020 00:47

Apply suggestions from code review

3d5b991

Co-authored-by: Serg <[email protected]>

Changed 'image' to 'image_dj'

da5b756

ajstewart requested a review from marxide October 2, 2020 15:27

ajstewart added 2 commits October 3, 2020 13:14

Updated CHANGELOG.md

65922df

Move epoch based saving to start

a1a1a86

Also a minor template fix

marxide requested changes Oct 6, 2020

vast_pipeline/config_template.py.j2 (outdated, resolved)

vast_pipeline/pipeline/main.py (outdated, resolved)

vast_pipeline/pipeline/main.py (outdated, resolved)

ajstewart and others added 2 commits October 7, 2020 10:24

Update vast_pipeline/config_template.py.j2

3ecb414

Co-authored-by: Andrew O'Brien <[email protected]>

Update vast_pipeline/pipeline/main.py

b185131

Co-authored-by: Andrew O'Brien <[email protected]>

ajstewart added 3 commits October 7, 2020 12:36

Changed order of validation and image data linking

f780d1c

* Validation is now run before images are linked to selavy, noise and bkg. * Background images are able to be added even if Monitor is False. * Check added for whether background images are defined in data linking.

Completed docstring

fe8e5a5

Merge branch 'master' into epoch-based-association

a82d2a1

srggrs reviewed Oct 9, 2020

vast_pipeline/pipeline/association.py (outdated, resolved)

ajstewart added 2 commits October 10, 2020 13:30

Added comments to parallel association method

71e6f7b

Removed unnecessary indexing on corrections.

Changed wording on comment

f145c75

srggrs requested changes Oct 12, 2020

vast_pipeline/pipeline/association.py (resolved)

Fix parallel assoc

759756a

marxide previously approved these changes Oct 12, 2020

Merge branch 'master' into epoch-based-association

7660bfe

ajstewart dismissed marxide’s stale review via 7660bfe October 13, 2020 00:23

srggrs requested changes Oct 13, 2020

CHANGELOG.md (resolved)

CHANGELOG.md (resolved)

ajstewart requested a review from srggrs October 13, 2020 04:52

srggrs approved these changes Oct 13, 2020

ajstewart merged commit 1abc88f into master Oct 13, 2020

ajstewart deleted the epoch-based-association branch October 13, 2020 07:36
