-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added automated mask length determination in file #28
Conversation
@vbalderdash thanks for fixing this long-standing oversight, which is a sneaky way things can go wrong for all of us. Do you think this also fixes #3? Could you move this into LMADataset, instead? It already parses the data format line, and would solve the problem at the data reader instead of the flash-sort level. |
And this might not fix #3, which may have to do with storing enough characters in the station mask in the HDF5 file. But this solves the problem of automated determination, which is the first step. |
Looks like the root of #3 is line 57 of LMA_h5_write, and it should simple to carry that value through to the right function call. I'll try to move the fix to LMAarrayFile and get that carried through, too. |
For carrying values through, feel free to add a mask_length attribute to LMAdataFile. |
…le with matching string length
mask_length is now in the LMAdataFile and pulling the value during the unzipped file read. I moved the h5 column class to within the function with the mask length variable so that can be initialized with the variable string size, so it seems to be writing out what is expected! There was a hanging 'import gzip' in the gen_autorun file that needs to be removed |
This looks good in terms of the internals, and is what I wish I had written initially! Have you looked at whether the removal of the |
The mask_length parameter in lmatools.flashsort.autosort.autorun_sklearn.cluster should be irrelevant since it calls LMADataFile.read which now defines the mask_length, but it probably should be removed for cleanliness. The flash sorting is reasonable as far as I can tell (tested on s6 and s7 formatted files), although I have not verified anything downstream of gridding. I was selfishly focusing on the flashes and and flash extents. |
OK, so you've run things end to end and there's no breakage. I was worried about breaking other users' code if they were to update lmatools. If you want to remove the other mask_length stuff for cleanliness / saving us from future confusion that would be great, or I can merge this now and open an issue to remind us to come back to do the cleanup. |
Probably still worth testing on another machine first! |
I pushed some changes to your branch that I think fix the issue you reported with an empty station mask value that caused flash sorting to crash. The changes above also fix #29, I think. |
Added a quick parser to automatically pull the mask length from the header of the LMA '.dat.gz' file for the flash sorting.