
Stack overwrites previous data? (plus other issues) #371

Open · asyates opened this issue Sep 26, 2024 · 10 comments

Comments

asyates (Contributor) commented Sep 26, 2024

Taking a look at the stacking code, it seems that previously stacked data will always be overwritten by new days (e.g. with flag='T'), since overwrite is set to True by default:

xr_save_ccf(sta1, sta2, components, filterid, mov_stack, taxis, xx, overwrite=True)

I ran a quick test to confirm this is happening. Is this intended behaviour? i.e., if you were running in real time, the stacks directory would only ever contain stacked CCFs for the most recent data.

Perhaps the idea is to continue on and process dv/v, updating the output files (assuming these are not overwritten but have new results inserted), and not worry about keeping old stacked data (since it is all contained within the cross-correlation directory). But in that case, the plotting functions should also use the cross-correlation directory, not the stacked data.
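One alternative to overwriting would be to merge new stacks into the existing ones. A minimal sketch of that idea using xarray (the coordinate names here mirror the MSNoise layout, but `merge_stacks` is a hypothetical helper, not part of the actual API):

```python
# Hypothetical sketch: merge newly stacked CCFs into previously saved ones
# instead of overwriting. Where dates overlap, the new values win.
import numpy as np
import pandas as pd
import xarray as xr

def merge_stacks(existing: xr.DataArray, new: xr.DataArray) -> xr.DataArray:
    """Union of both 'times' axes; overlapping dates take the new values."""
    return new.combine_first(existing).sortby("times")

taxis = np.linspace(-10, 10, 5)
old = xr.DataArray(np.zeros((2, 5)),
                   coords={"times": pd.date_range("2024-09-01", periods=2),
                           "taxis": taxis},
                   dims=("times", "taxis"))
new = xr.DataArray(np.ones((2, 5)),
                   coords={"times": pd.date_range("2024-09-02", periods=2),
                           "taxis": taxis},
                   dims=("times", "taxis"))
merged = merge_stacks(old, new)
print(merged.sizes["times"])  # 3: old day kept, overlapping day updated
```

This keeps the stacks directory cumulative while still letting a re-run refresh any day that was re-processed.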

asyates commented Sep 26, 2024

On a similar note re. stacking (sub-daily) and real-time processing: imagine pulling in one day of data at a time and stacking sub-daily, e.g. 12 h windows with a 1 h sampling rate. Based on the current implementation, would I expect the first 11 stacks to be constructed from less data? e.g. the first stack from only one hour, the second from only two hours, etc.

I see an old comment/code from MSNoise 1.6 (below) suggesting that we should be pulling the updated days minus mov_stack, but I don't see this in the current get_results_all function. It appears, just from the code, that it is reading only the .h5 files for individual days where flag='T', not any previous days (so it would not stack properly). Am I wrong?

                    # TODO: load only the updated dates +- max(mov_stack)+1
                    # Note: this would no longer be needed if the stack is h5
                    nstack, stack_total = get_results(
                        db, sta1, sta2, filterid, components, datelist, format=format, params=params)
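The partial-window concern above is easy to reproduce with pandas' rolling mean, which is what the xarray stacking ultimately relies on (window sizes here are illustrative):

```python
# Sketch of the partial-window concern: with a time-based rolling mean, the
# first windows are built from fewer samples unless min_periods enforces a
# full window.
import numpy as np
import pandas as pd

hours = pd.date_range("2024-09-26", periods=24, freq="h")
ccfs = pd.Series(np.arange(24, dtype=float), index=hours)

partial = ccfs.rolling("12h").mean()            # first 11 stacks use < 12 hrs
full = ccfs.rolling(12, min_periods=12).mean()  # NaN until a full 12-h window

print(int(partial.notna().sum()))  # 24: every stack produced, early ones partial
print(int(full.notna().sum()))     # 13: only full 12-sample windows survive
```

So whether the first stacks of a run are "short" depends entirely on whether the history is loaded and whether min_periods is set to the full window.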

@asyates asyates changed the title Stack overwrites previous data? Stack overwrites previous data? (plus other issues) Sep 26, 2024
asyates commented Sep 26, 2024

Also, the current implementation presumably assumes that new CCF jobs are adjacent in time, which won't necessarily always be the case (it often is, but if someone gained access to more data for different time periods, these would all go into the same pandas DataFrame prior to the rolling average).

I guess simply resampling to corr_duration and filling with NaN prior to applying the rolling mean (stacking) would do the trick.
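The resample-then-roll idea could be sketched as follows (the 1800 s corr_duration is an illustrative value, not read from any config):

```python
# Sketch: resampling to the correlation window length inserts NaN rows for
# missing periods, so a subsequent rolling mean does not mix non-adjacent CCFs.
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2024-09-26 00:00", "2024-09-26 00:30",
                      "2024-09-27 00:00"])        # a one-day gap
ccfs = pd.Series([1.0, 2.0, 3.0], index=idx)

regular = ccfs.resample("1800s").asfreq()          # gap filled with NaN
stacked = regular.rolling(4, min_periods=1).mean() # NaN slots are skipped

print(int(regular.isna().sum()))  # 46 missing half-hour slots in the gap
```

After resampling, the rolling window only ever sees slots that are the right distance apart in time, whether or not data exists for them.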

ThomasLecocq (Member) commented
Re NaN + mov: the question is also whether we re-mask after the roll? I would say yes; we don't want to create data.
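The re-masking idea could be sketched like this (a minimal pandas sketch; window sizes and values are illustrative):

```python
# Sketch of "re-mask after rolling": remember where data was missing, roll
# with min_periods=1 so gaps don't kill neighbouring stacks, then blank the
# stacks that fall on originally-missing dates so no data is invented.
import numpy as np
import pandas as pd

days = pd.date_range("2024-09-20", periods=7, freq="D")
ccfs = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0, 6.0, 7.0], index=days)

missing = ccfs.isna()                           # mask BEFORE rolling
stacked = ccfs.rolling(3, min_periods=1).mean()
stacked[missing] = np.nan                       # re-mask: no stack on empty days

print(int(stacked.notna().sum()))  # 5: the two gap days stay empty
```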

asyates commented Sep 26, 2024

Yeah, I think that's a good idea.

I was similarly thinking, for fixing the issue of not pulling in prior data for stacking, that we could pull in additional CCFs via get_results_all (by modifying the datetime.datetime list input), and similarly remove the additional 'past' days post-roll.
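A hypothetical sketch of that extend-then-trim approach (load_ccfs is a placeholder for whatever wraps get_results_all, not the actual MSNoise API):

```python
# Hypothetical sketch: extend the requested date range backwards by the
# largest mov_stack so the rolling window has its history, then drop the
# extra leading days after stacking.
import pandas as pd

def stack_with_history(load_ccfs, days, mov_stack_days=5):
    """load_ccfs(datelist) -> pd.Series of daily CCFs indexed by date."""
    first = min(days)
    # pull extra past days so the first requested day has a full window
    extended = pd.date_range(first - pd.Timedelta(days=mov_stack_days),
                             max(days), freq="D")
    ccfs = load_ccfs(list(extended))
    stacked = ccfs.rolling(mov_stack_days, min_periods=1).mean()
    return stacked.loc[stacked.index >= pd.Timestamp(first)]  # trim past days

# demo with a stand-in loader returning constant CCF amplitude
demo = stack_with_history(
    lambda dl: pd.Series(1.0, index=pd.DatetimeIndex(dl)),
    days=pd.date_range("2024-09-26", periods=3, freq="D"))
print(len(demo))  # 3: only the requested days remain after trimming
```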

asyates commented Sep 26, 2024

In fact, rolling with gaps is okay. I hadn't noticed that a resample line already exists, so it is already filling with NaN and then dropping NaN afterwards. So only the issue of using past data remains to be fixed.

ThomasLecocq (Member) commented
Hehe ok! And reading "enough" data to allow the rolling stats!

asyates commented Sep 27, 2024

This also seems very problematic:

                c = get_results_all(db, sta1, sta2, filterid, components, days, format="xarray")
                # print(c)
                # dr = xr_save_ccf(sta1, sta2, components, filterid, 1, taxis, c)
                dr = c
                if stype == "ref":
                    start, end, datelist = build_ref_datelist(db)
                    start = np.array(start, dtype=np.datetime64)
                    end = np.array(end, dtype=np.datetime64)
                    _ = dr.where(dr.times >= start, drop=True)
                    _ = _.where(_.times <= end, drop=True)
                    # TODO add other stack methods here! using apply?
                    _ = _.mean(dim="times")
                    xr_save_ref(sta1, sta2, components, filterid, taxis, _)
                    continue

It looks to me like the reference will be built only from recently processed CCFs. A quick check using a print statement seems to confirm this (printing '_' to see whether it contains only newly processed data).
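If get_results_all is fed only the freshly processed days, the where(dr.times >= start) filter cannot restore dates that were never loaded. A fix would request the whole reference period instead of the job days; a minimal sketch (build_ref and load_all are placeholder names, not the actual MSNoise functions):

```python
# Sketch: build the REF from the full reference period, not the job days.
import numpy as np
import pandas as pd
import xarray as xr

def build_ref(load_all, ref_start, ref_end):
    """load_all(datelist) -> xr.DataArray with a 'times' coordinate."""
    datelist = pd.date_range(ref_start, ref_end, freq="D")  # full REF period
    dr = load_all(list(datelist))
    dr = dr.where((dr.times >= np.datetime64(ref_start)) &
                  (dr.times <= np.datetime64(ref_end)), drop=True)
    return dr.mean(dim="times")  # TODO: other stack methods here

# demo with a stand-in loader returning constant CCFs over 4 lag samples
ref = build_ref(
    lambda dl: xr.DataArray(np.ones((len(dl), 4)),
                            coords={"times": pd.DatetimeIndex(dl),
                                    "taxis": np.arange(4)},
                            dims=("times", "taxis")),
    "2024-01-01", "2024-01-10")
print(ref.shape)  # (4,): one mean CCF over the whole reference period
```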

ThomasLecocq (Member) commented
Very bad idea indeed :-) As said, the stack2 stuff was written for the "process the whole archive" case, which is a bad idea. Actually, I thought of moving the ref stack out of the "jobs" world: if you run stack -r you re-compute the REF, that's it, no need to check for jobs or anything else. That way, the STACK jobs will only be used for the -m or -s jobs. (To check: does -S have any meaning now that you can do the ('30d', '30d') mov_stack?)

asyates commented Sep 30, 2024

I think that's a good idea; running the stack reset between the commands always felt a bit untidy. I can make the change and update the pull request.

Actually, I'd forgotten about the existence of the -s job ^^. I guess because it is easy to extract anyway from the rolling output; but I would also assume it is now redundant if the tuple mov_stack is working as intended.

ThomasLecocq (Member) commented
It'd be amazing if you could do all this indeed in that PR! Please make sure to adapt the documentation :) (multiple places, including the how-tos :) )
