
block-by-block IO - part 1 #478

Merged
yunjunz merged 8 commits into insarlab:main from big_data on Nov 25, 2020

Conversation

@yunjunz (Member) commented on Nov 24, 2020

Description of proposed changes

The first part of block-by-block IO for the following scripts, in preparation for a memory-efficient smallbaselineApp workflow:

  • dem_error.py
  • diff.py
  • multilook.py
  • subset.py

Reminders

  • Pass Codacy code review (green)
  • Pass Circle CI test (green)
  • If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

+ move the commonly shared groups of argument parsing functions from utils/plot.py into a new sub-module utils/arg_group.py

+ move the parallel computing related arguments into utils/arg_group.add_parallel_argument() (see the sketch below this group)

+ adjust usages of these argument groups across the whole mintpy package
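A minimal sketch of what a shared add_parallel_argument() group can look like with argparse; the option names, dests, and defaults below are illustrative assumptions rather than the exact MintPy interface:

```python
import argparse

def add_parallel_argument(parser):
    """Attach a reusable group of parallel-computing options to a parser.
    Sketch only -- the option names/defaults are assumptions."""
    par = parser.add_argument_group('parallel', 'parallel processing with dask')
    par.add_argument('--cluster', dest='cluster', type=str,
                     help='cluster type, e.g. local / slurm / pbs / lsf')
    par.add_argument('--num-worker', dest='numWorker', type=str, default='4',
                     help='number of workers (default: %(default)s)')
    return parser

# each script builds its own parser, then pulls in the shared group
parser = add_parallel_argument(argparse.ArgumentParser(description='example'))
```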
+ dem_error.py:
   - add/modify split2boxes() from ifgram_inversion.py
   - move the key msg and design_matrix into get_design_matrix4defo() for cleaner code
   - read_geometry(): explicit input arguments + box
   - add correct_dem_error_patch() based on the code of correct_dem_error()
   - use writefile.layout_hdf5() and writefile.write_hdf5_block() for block-by-block IO, as in ifgram_inversion.py (see the sketch after this list)
   - add --memory option with support of mintpy.compute.memorySize
   - run_or_skip(): check the file size to detect partially written files
+ utils/writefile.layout_hdf5(): support output files with a different spatial size from the reference file
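A minimal sketch of the block-by-block pattern above, with a simplified split2boxes() and plain h5py standing in for the writefile helpers; the dataset name 'timeseries' and the box layout are illustrative assumptions:

```python
import h5py

def split2boxes(length, width, max_lines=1000):
    """Split a (length, width) grid into row-wise boxes of (x0, y0, x1, y1).
    Simplified stand-in for the helper named in this PR."""
    return [(0, y0, width, min(y0 + max_lines, length))
            for y0 in range(0, length, max_lines)]

def process_file_block_by_block(in_file, out_file, func):
    """Read, process, and write one block at a time to bound peak memory."""
    with h5py.File(in_file, 'r') as fi, h5py.File(out_file, 'w') as fo:
        ds_in = fi['timeseries']                    # hypothetical dataset name
        ds_out = fo.create_dataset('timeseries', shape=ds_in.shape,
                                   dtype=ds_in.dtype)
        num_date, length, width = ds_in.shape
        for (x0, y0, x1, y1) in split2boxes(length, width):
            block = ds_in[:, y0:y1, x0:x1]          # only this block in memory
            ds_out[:, y0:y1, x0:x1] = func(block)   # process and write back
```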

+ subset.py:
   - add subset_dataset() to get the subset data from the input file
   - block-by-block IO for HDF5 files with 3D datasets, to significantly reduce memory usage; testing on a laptop (16 GB of memory) with a 140 GB ifgramStack.h5 file runs very smoothly
+ objects/cluster: move split_box2sub_boxes() out of the DaskCluster object for easy import; add print_msg (see the sketch after this list)
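A minimal sketch of what a module-level split_box2sub_boxes() can look like once it lives outside DaskCluster; the signature and the print_msg flag are assumptions for illustration:

```python
import numpy as np

def split_box2sub_boxes(box, num_split, dimension='y', print_msg=False):
    """Divide a box (x0, y0, x1, y1) into num_split sub-boxes along x or y.
    Sketch only -- the exact MintPy signature may differ."""
    x0, y0, x1, y1 = box
    size = (y1 - y0) if dimension == 'y' else (x1 - x0)
    step = int(np.ceil(size / num_split))
    sub_boxes = []
    for i in range(num_split):
        s0, s1 = i * step, min((i + 1) * step, size)
        if s0 >= s1:
            break
        if dimension == 'y':
            sub_boxes.append((x0, y0 + s0, x1, y0 + s1))
        else:
            sub_boxes.append((x0 + s0, y0, x0 + s1, y1))
    if print_msg:
        print(f'split {box} into {len(sub_boxes)} sub-boxes along {dimension}')
    return sub_boxes
```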

+ diff.py: add block-by-block IO for time-series files
   - if the number of valid pixels is > 1 million, apply uniform sampling for a faster ramp estimation, for less memory usage, and to avoid unnecessary temporal average computation (see the sketch after this list)
+ add -m / --method option to choose between average / nearest
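A minimal sketch of the uniform-sampling idea for ramp estimation, assuming a plain linear ramp model fit by least squares; the 1-million threshold is from the description above, everything else is illustrative:

```python
import numpy as np

def estimate_ramp_sampled(data, mask, max_samples=1_000_000):
    """Fit a linear ramp z = a*x + b*y + c on a uniform subsample of the
    valid pixels, then evaluate on the full grid. Illustrative sketch only;
    MintPy's deramp supports more ramp types."""
    yy, xx = np.where(mask)
    if yy.size > max_samples:                  # uniform subsampling
        step = int(np.ceil(yy.size / max_samples))
        yy, xx = yy[::step], xx[::step]
    G = np.column_stack([xx, yy, np.ones(xx.size)])
    a, b, c = np.linalg.lstsq(G, data[yy, xx], rcond=None)[0]
    ygrid, xgrid = np.mgrid[0:data.shape[0], 0:data.shape[1]]
    return a * xgrid + b * ygrid + c
```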

+ multilook.py:
   - add a check for lks_x/y == 1
   - multilook_data(): ensure the input and output matrices share the same data type
   - multilook_file(): support nearest downsampling using the readfile.read(x/ystep) option
   - multilook_file(): add block-by-block IO for HDF5 files, for efficient memory usage (see the sketch after this list)
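A minimal sketch of the two downsampling methods on an in-memory 2D array; names are illustrative, and the real multilook_data() handles more dimensions and edge cases:

```python
import numpy as np

def multilook_data(data, lks_y, lks_x, method='average'):
    """Downsample a 2D array by lks_y x lks_x looks. 'nearest' picks one
    pixel per window (no arithmetic); 'average' takes the window mean and
    casts back to the input dtype. Sketch only."""
    if lks_y == 1 and lks_x == 1:
        return data                            # nothing to do
    length = (data.shape[0] // lks_y) * lks_y  # crop to whole looks
    width = (data.shape[1] // lks_x) * lks_x
    if method == 'nearest':
        return data[:length:lks_y, :width:lks_x]
    out = data[:length, :width].reshape(length // lks_y, lks_y,
                                        width // lks_x, lks_x).mean(axis=(1, 3))
    return out.astype(data.dtype)              # keep in/out dtype consistent
```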

+ plot.read_pts2inps(): fix a bug when no lookup file is available

+ utils1.get_center_lat_lon(): support files in geo coordinates

+ writefile.write_isce_xml(): add an example usage in the comment
+ …mp_avg/read() and use numpy indexing instead
@yunjunz added this to the Big Data milestone on Nov 24, 2020
@yunjunz merged commit 8096888 into insarlab:main on Nov 25, 2020
@yunjunz deleted the big_data branch on Nov 25, 2020
@yunjunz mentioned this pull request on Dec 13, 2020