-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make maude_decom.get_aca_images robust agains missing/repeated VCDU frames #159
Conversation
chandra_aca/maude_decom.py
Outdated
- VCDU counters come in packets of four, with each group starting with a multiple of 4. | ||
""" | ||
current_packet = [] # to group in fours | ||
last_frame = 0 # for monotonicity check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last_frame
is never updated so the check below last_frame > vcdu_counter
is not doing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a fix for this and included a corresponding test case.
chandra_aca/maude_decom.py
Outdated
@@ -601,6 +601,29 @@ def _group_packets(packets, discard=True): | |||
yield res | |||
|
|||
|
|||
def filter_vcdu_jumps(vcdu_counters): | |||
""" | |||
Return a boolean mas to filter VCDU counters that are not continuous. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the annoying complication of the VCDU counter rolling over. While it is rare, this function should handle that. I think the rollover is 2**18
but that needs to be checked. Basically if you see a negative delta (from one VCDU to the next) of more than (2**18 - 3600 * 4
) then all the counters after that point should be incremented by 2**18
.
The 3600*4 is meant to reflect the largest number of VCDU's that could be dropped during a real-time support. You need to protect against a rollover happening right in the middle of a loss of comm during telemetry which would otherwise mask out every VCDU frame after the loss of comm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a fix for this and included a corresponding test case.
The VCDU frame rolls over at 2^24. I used the maximum number of missing frames as 3600*4 as you said here, but why is this the right number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the VCDU rollover, I see there a new test with what I would call "synthetic" VCDUs... from this work can you point me to a handy real rollover or should I go find one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 3600 * 4 is a bit arbitrary. That corresponds to a one-hour comm loss. In the end it doesn't matter all that much since the worst case scenario (which is incredibly unlikely) is dropping one image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I don't know how far we need to get to the solution in this PR, but it looks like your changes helped the aca_view test a bit, in that it looks like the aca_view playback was no longer stuck at 2023:296:22:20:35 . But in playback it also didn't show any new ACA data changes for the rest of the interval (time and att updating but all the ACA data stuck).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the behavior I see. I ran this:
python -m aca_view.scripts.aca_viewer --start 2023:296:22:20:26 --stop 2023:296:22:20:50
and I see the data updating after the rollover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, I was running
python -m aca_view.scripts.aca_viewer --start 2023:296:22:20:00.000 --stop 2023:296:22:24:00.000 --maude
And it looks like it stops working at about 2023:296:22:20:40.301
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I can confirm the issue in aca_view. One can check it by starting aca_view, opening the ipython widget, and typing:
from cxotime import CxoTime
CxoTime(timeline.table['TIME'][-1]).date
Out[3]: '2023:296:22:21:59.837'
CxoTime(timeline.slots[0].table['TIME'][-1]).date
Out[4]: '2023:296:22:20:39.887'
I think this is an aca_view issue, because running chandra_aca.maude_decom.get_aca_images
returns correct data:
In [1]: from chandra_aca import maude_decom
...: from cxotime import CxoTime
...: start = '2023:296:22:20:26'
...: stop = '2023:296:22:22:00'
...: images = maude_decom.get_aca_images(start, stop)
...: print(CxoTime(images[images['IMGNUM'] == 0]['TIME'][-1]).date)
...: np.asarray(images[images['IMGNUM'] == 0]['IMG'][-1])
2023:296:22:21:55.914
Out[1]:
array([[ 31. , 112. , 91.75, 71.5 , 139. , 98.5 , 44.5 , 17.5 ],
[ 132.25, 44.5 , 105.25, 267.25, 442.75, 226.75, 64.75, 31. ],
[ 71.5 , 105.25, 267.25, 706. , 753.25, 469.75, 166. , 132.25],
[ 91.75, 321.25, 1286.5 , 5788.75, 5289.25, 1192. , 280.75, 226.75],
[ 71.5 , 280.75, 1246. , 6855.25, 6841.75, 1711.75, 361.75, 98.5 ],
[ 24.25, 78.25, 287.5 , 1192. , 1151.5 , 510.25, 247. , 64.75],
[ 17.5 , 64.75, 152.5 , 152.5 , 247. , 233.5 , 145.75, 71.5 ],
[ 24.25, 24.25, 51.25, 71.5 , 118.75, 37.75, 64.75, 44.5 ]])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I think the integration issue is not part of this PR.
) | ||
assert len(packets) == 32 | ||
|
||
# the following checks the underlying function used to filter out repeated/missing frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write these tests like below. This will make it much easier for me to review the tests.
filter_vcdu_test_cases = [
{
"inp": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"exp": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
},
{
"inp": [0, 1, 3, 2, 4, 5, 6, 7, 8, 9],
"exp": [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
},
{
"inp": [0, 1, 2, 4, 5, 6, 7, 8, 9],
"exp": [0, 0, 0, 1, 1, 1, 1, 0, 0],
},
...,
]
@pytest.mark.parametrize("case", filter_vcdu_test_cases)
def test_filter_vcdu_jumps(case):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did something different than I asked. Your version does not demonstrate how the code behaves for repeated VCDU's.
{
"inp": np.array([1, 2, 3, 4, 5, 6, 6, 7, 8, 9]),
"exp": np.array([4, 5, 6, 7]), # Which vcdu=6 was selected?
},
Versus
{
"inp": [1, 2, 3, 4, 5, 6, 6, 7, 8, 9],
"exp": [0, 0, 0, 1, 1, 1, 0, 1, 0, 0],
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I just changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I give up. What you did still looks nothing like what I asked for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry there is some miscommunication. Perhaps you could invest just a couple of minutes (I doubt it would take you more than two minutes) to convince yourself that this checks what you requested.
The only difference is that what you wrote above gives an integer index for the resulting groups, whereas I gave a boolean mask that tells you which frame was selected.
For example, in this case:
"inp": np.array([1, 2, 3, 4, 5, 6, 6, 7, 8, 9]),
"exp": np.array(
[False, False, False, True, True, True, False, True, False, False]
),
},
frame 6
is repeated, and the one selected is the first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and by the way: I understood your request above as '1' and '0' being boolean, and you wanted to check which instance of a repeated frame was chosen, so what is in the test is equivalent to (with bool arrays as int)
"inp": np.array([1, 2, 3, 4, 5, 6, 6, 7, 8, 9]),
"exp": np.array(
[0, 0, 0, 1, 1, 1, 0, 1, 0, 0]
),
which is verbatim what you wrote above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I didn't explain. The entire point of what I wrote was so that the mask values would visually line up with the vcdu numbers. That would make it easy for me to review and make sure the output is what I would expect.
The way you wrote it is just too much work for me as a reviewer to mentally line things up, so I didn't bother and just 🤞 it's OK.
Also the np.array
wrappers are not in what I wrote and they can cause the numbers to not line up visually again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I totally did not get that what you meant was to line up visually. I understood that what you meant was to actually check which frame was being selected (the previous version of the test did not check that, and that is the change I made).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also did not pull the test cases out and use pytest.mark.parametrize
as I requested. I think you understand why that is important for testing.
It looks like @taldcroft's comments are substantive, so feel free to ping me for a new review when those issues are resolved. |
…e of missing/repeated frames in telemetry
…d test case) (last_frame was not being used)
2a95740
to
bcd2671
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with these changes and their testing.
I'm delegating to @jeanconn's review and bowing out. |
Description
This PR fixes two issues that were found while reading monitoring data:
maude_decom.get_aca_images
crashes ifstop
is after the last frame available in MAUDE. In this case some maude queries return no data, and the downstream code should not crash.maude_decom.get_aca_images
crashes if there are missing/repeated frames.A test was added for the case of repeated/missing frames.
Interface impacts
Testing
Unit tests
Independent check of unit tests by Jean
Functional tests
No functional testing.