
block-by-block IO - part 1 #478

Merged
yunjunz merged 8 commits into insarlab:main from big_data on Nov 25, 2020

Conversation

@yunjunz (Member) commented on Nov 24, 2020

Description of proposed changes

The first part of block-by-block IO for the following scripts, in preparation for a memory-efficient smallbaselineApp workflow:

  • dem_error.py
  • diff.py
  • multilook.py
  • subset.py

Reminders

  • Pass Codacy code review (green)
  • Pass Circle CI test (green)
  • If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

+ move the commonly shared groups of argument parsing functions from utils/plot.py into a new sub-module utils/arg_group.py

+ move the parallel computing related arguments into utils/arg_group.add_parallel_argument() (see the sketch below this group)

+ adjust usages of these argument groups across the whole mintpy package
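A minimal sketch of what a shared add_parallel_argument() group can look like with argparse; the option names, dests, and defaults below are illustrative assumptions rather than the exact MintPy interface:

```python
import argparse

def add_parallel_argument(parser):
    """Attach a reusable group of parallel-computing options to a parser.
    Sketch only -- the option names/defaults are assumptions."""
    par = parser.add_argument_group('parallel', 'parallel processing with dask')
    par.add_argument('--cluster', dest='cluster', type=str,
                     help='cluster type, e.g. local / slurm / pbs / lsf')
    par.add_argument('--num-worker', dest='numWorker', type=str, default='4',
                     help='number of workers (default: %(default)s)')
    return parser

# each script builds its own parser, then pulls in the shared group
parser = add_parallel_argument(argparse.ArgumentParser(description='example'))
```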
+ dem_error.py:
   - add/modify split2boxes() from ifgram_inversion.py
   - move the key msg and design_matrix into get_design_matrix4defo() for cleaner code
   - read_geometry(): explicit input arguments + box
   - add correct_dem_error_patch() based on the code of correct_dem_error()
   - use writefile.layout_hdf5() and writefile.write_hdf5_block() for block-by-block IO, as in ifgram_inversion.py (see the sketch after this list)
   - add --memory option with support of mintpy.compute.memorySize
   - run_or_skip(): check the file size to detect partially written files
+ utils/writefile.layout_hdf5(): support output files with a different spatial size from the reference file
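A minimal sketch of the block-by-block pattern above, with a simplified split2boxes() and plain h5py standing in for the writefile helpers; the dataset name 'timeseries' and the box layout are illustrative assumptions:

```python
import h5py

def split2boxes(length, width, max_lines=1000):
    """Split a (length, width) grid into row-wise boxes of (x0, y0, x1, y1).
    Simplified stand-in for the helper named in this PR."""
    return [(0, y0, width, min(y0 + max_lines, length))
            for y0 in range(0, length, max_lines)]

def process_file_block_by_block(in_file, out_file, func):
    """Read, process, and write one block at a time to bound peak memory."""
    with h5py.File(in_file, 'r') as fi, h5py.File(out_file, 'w') as fo:
        ds_in = fi['timeseries']                    # hypothetical dataset name
        ds_out = fo.create_dataset('timeseries', shape=ds_in.shape,
                                   dtype=ds_in.dtype)
        num_date, length, width = ds_in.shape
        for (x0, y0, x1, y1) in split2boxes(length, width):
            block = ds_in[:, y0:y1, x0:x1]          # only this block in memory
            ds_out[:, y0:y1, x0:x1] = func(block)   # process and write back
```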

+ subset.py:
   - add subset_dataset() to get the subset data from the input file
   - block-by-block IO for HDF5 files with 3D datasets, to significantly reduce memory usage; testing on a laptop (16 GB of memory) with a 140 GB ifgramStack.h5 file runs very smoothly
+ objects/cluster: move split_box2sub_boxes() out of the DaskCluster object for easy import; add print_msg (see the sketch after this list)
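A minimal sketch of what a module-level split_box2sub_boxes() can look like once it lives outside DaskCluster; the signature and the print_msg flag are assumptions for illustration:

```python
import numpy as np

def split_box2sub_boxes(box, num_split, dimension='y', print_msg=False):
    """Divide a box (x0, y0, x1, y1) into num_split sub-boxes along x or y.
    Sketch only -- the exact MintPy signature may differ."""
    x0, y0, x1, y1 = box
    size = (y1 - y0) if dimension == 'y' else (x1 - x0)
    step = int(np.ceil(size / num_split))
    sub_boxes = []
    for i in range(num_split):
        s0, s1 = i * step, min((i + 1) * step, size)
        if s0 >= s1:
            break
        if dimension == 'y':
            sub_boxes.append((x0, y0 + s0, x1, y0 + s1))
        else:
            sub_boxes.append((x0 + s0, y0, x0 + s1, y1))
    if print_msg:
        print(f'split {box} into {len(sub_boxes)} sub-boxes along {dimension}')
    return sub_boxes
```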

+ diff.py: add block-by-block IO for time-series files
   - if the number of valid pixels is > 1 million, apply uniform sampling for a faster ramp estimation, for less memory usage, and to avoid unnecessary temporal average computation (see the sketch after this list)
+ add -m / --method option to choose between average / nearest
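A minimal sketch of the uniform-sampling idea for ramp estimation, assuming a plain linear ramp model fit by least squares; the 1-million threshold is from the description above, everything else is illustrative:

```python
import numpy as np

def estimate_ramp_sampled(data, mask, max_samples=1_000_000):
    """Fit a linear ramp z = a*x + b*y + c on a uniform subsample of the
    valid pixels, then evaluate on the full grid. Illustrative sketch only;
    MintPy's deramp supports more ramp types."""
    yy, xx = np.where(mask)
    if yy.size > max_samples:                  # uniform subsampling
        step = int(np.ceil(yy.size / max_samples))
        yy, xx = yy[::step], xx[::step]
    G = np.column_stack([xx, yy, np.ones(xx.size)])
    a, b, c = np.linalg.lstsq(G, data[yy, xx], rcond=None)[0]
    ygrid, xgrid = np.mgrid[0:data.shape[0], 0:data.shape[1]]
    return a * xgrid + b * ygrid + c
```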

+ multilook.py:
   - add a check for lks_x/y == 1
   - multilook_data(): ensure the input and output matrices share the same data type
   - multilook_file(): support nearest downsampling using the readfile.read(x/ystep) option
   - multilook_file(): add block-by-block IO for HDF5 files, for efficient memory usage (see the sketch after this list)
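A minimal sketch of the two downsampling methods on an in-memory 2D array; names are illustrative, and the real multilook_data() handles more dimensions and edge cases:

```python
import numpy as np

def multilook_data(data, lks_y, lks_x, method='average'):
    """Downsample a 2D array by lks_y x lks_x looks. 'nearest' picks one
    pixel per window (no arithmetic); 'average' takes the window mean and
    casts back to the input dtype. Sketch only."""
    if lks_y == 1 and lks_x == 1:
        return data                            # nothing to do
    length = (data.shape[0] // lks_y) * lks_y  # crop to whole looks
    width = (data.shape[1] // lks_x) * lks_x
    if method == 'nearest':
        return data[:length:lks_y, :width:lks_x]
    out = data[:length, :width].reshape(length // lks_y, lks_y,
                                        width // lks_x, lks_x).mean(axis=(1, 3))
    return out.astype(data.dtype)              # keep in/out dtype consistent
```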

+ plot.read_pts2inps(): fix a bug when no lookup file is available

+ utils1.get_center_lat_lon(): support files in geo coordinates

+ writefile.write_isce_xml(): add an example usage in the comment
+ …mp_avg/read() and use numpy indexing instead
@yunjunz added this to the Big Data milestone on Nov 24, 2020
@yunjunz merged commit 8096888 into insarlab:main on Nov 25, 2020
@yunjunz deleted the big_data branch on Nov 25, 2020
@yunjunz mentioned this pull request on Dec 13, 2020