Skip to content

Commit

Permalink
Support for -Dshimplify.remove.shim
Browse files Browse the repository at this point in the history
Signed-off-by: Gera Shegalov <[email protected]>
  • Loading branch information
gerashegalov committed Feb 17, 2023
1 parent 7826a38 commit 3d46182
Show file tree
Hide file tree
Showing 2 changed files with 82 additions and 44 deletions.
101 changes: 66 additions & 35 deletions build/shimplify.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,22 +23,23 @@
```bash
mvn clean install -DskipTests
mvn generate-sources antrun:run@shimplify-shim-sources
mvn generate-sources
```
Switches:
shimplify - property passed to task attribute `if` whether to modify files
shimplify.add.base - old buildver to base the new one provided by shimplify.add.shim
shimplify.add.shim - add new shim/buildver based on the one provided by shimplify.add.base
shimplify.dirs - comma-separated list of dirs to modify, supersedes shimplify.shims
shimplify.move - property to allow moving files to canonical location, otherwise update
without moving
shimplify.overwrite - property to allow shimplify executing file changes repeatedly,
error out otherwise
shimplify.shims - comma-separated list of shims to simplify instead of all, superseded by
shimplify.dirs
shimplify.trace - property to enable trace logging
shimplify - property passed to task attribute `if` whether to modify files
shimplify.add.base - old buildver to base the new one provided by shimplify.add.shim
shimplify.add.shim - add new shim/buildver based on the one provided by shimplify.add.base
shimplify.dirs - comma-separated list of dirs to modify, supersedes shimplify.shims
shimplify.move - property to allow moving files to canonical location, otherwise update
without moving
shimplify.overwrite - property to allow shimplify executing file changes repeatedly,
error out otherwise
shimplify.shims - comma-separated list of shims to simplify instead of all, superseded by
shimplify.dirs
shimplify.remove.shim - drop support for shim/buildver, its exclusive files are removed
shimplify.trace - property to enable trace logging
If the task attribute "if" evaluates to false shimplify does not alter files on the filesystem.
The task merely suggests to consolidate shim directories if it finds some shims that are comprised
Expand All @@ -62,7 +63,7 @@
git clean -f -d [--dry-run]
Each shim Scala/Java file receives a comment describing all Spark builds it
belongs to. Lines are sorted by the Spark buildver lexicographically.
belongs to. Lines are sorted by the Spark `buildver` lexicographically.
Each line is assumed to be a JSON to keep it extensible.
/*** spark-rapids-shim-json-lines
Expand Down Expand Up @@ -165,15 +166,20 @@ def __csv_as_arr(str_val):
# should we move files?
__should_move_files = __is_enabled_property('shimplify.move')

# allowed to overwrite the existing comment?
__should_overwrite = __is_enabled_property('shimplify.overwrite')

# enable log tracing?
__should_trace = __is_enabled_property('shimplify.trace')

__add_shim_buildver = __ant_proj_prop('shimplify.add.shim')
__add_shim_base = __ant_proj_prop('shimplify.add.base')

__remove_shim_buildver = __ant_proj_prop('shimplify.remove.shim')

# allowed to overwrite the existing comment?
__should_overwrite = (__is_enabled_property('shimplify.overwrite')
or __add_shim_buildver is not None
or __remove_shim_buildver is not None)


__shim_comment_tag = 'spark-rapids-shim-json-lines'
__opening_shim_tag = '/*** ' + __shim_comment_tag
__closing_shim_tag = __shim_comment_tag + ' ***/'
Expand Down Expand Up @@ -229,7 +235,7 @@ def __delete_prior_comment_if_allowed(contents, tag, filename):
# no work
return
if not __should_overwrite:
__fail("found shim comment from prior execution at %s:%d, use -Dshimplify.overwrite=true"
__fail("found shim comment from prior execution at %s:%d, use -Dshimplify.overwrite=true "
"to overwrite" % (filename, opening_shim_comment_line))
assert (opening_shim_comment_line is not None) and (closing_shim_comment_line is not None)
__log.debug("removing comments %s:%d:%d", filename, opening_shim_comment_line,
Expand Down Expand Up @@ -265,22 +271,24 @@ def __git_rename_or_copy(shim_file, owner_shim, from_shim=None):
__makedirs(new_shim_dir)
if from_path_comp is None:
shell_cmd = ['git', 'mv', shim_file, new_shim_file]
ret_code = subprocess.call(shell_cmd)
if ret_code != 0:
__fail("failed to execute %s" % shell_cmd)
__shell_exec(shell_cmd)
else:
with open(shim_file, 'r') as src_shim_fh:
with open(new_shim_file, 'w') as dst_shim_fh:
content = src_shim_fh.read()
dst_content = content.replace(from_path_comp, owner_path_comp)
dst_shim_fh.write(dst_content)
git_add_cmd = ['git', 'add', new_shim_file]
ret_code = subprocess.call(git_add_cmd)
if ret_code != 0:
__fail("failed to execute %s" % git_add_cmd)
__shell_exec(git_add_cmd)
return new_shim_file


def __shell_exec(shell_cmd):
ret_code = subprocess.call(shell_cmd)
if ret_code != 0:
__fail("failed to execute %s" % shell_cmd)


def __makedirs(new_dir):
try:
__log.debug("__makedirs %s", new_dir)
Expand Down Expand Up @@ -392,12 +400,12 @@ def __generate_symlink_to_file(buildver, src_type, shim_file_path, build_ver_arr
base_dir = str(__project().getBaseDir())
src_root = os.path.join(base_dir, 'src', src_type)
target_root = os.path.join(base_dir, 'target', "spark%s" % buildver, 'generated', 'src',
src_type)
src_type)
first_build_ver = build_ver_arr[0]
__log.debug("top shim comment %s", first_build_ver)
shim_file_rel_path = os.path.relpath(shim_file_path, src_root)
expected_prefix = "spark%s%s" % (first_build_ver, os.sep)
assert shim_file_rel_path.startswith(expected_prefix), "Unexpected: %s is not prefixed" \
assert shim_file_rel_path.startswith(expected_prefix), "Unexpected: %s is not prefixed " \
"by %s" % (shim_file_rel_path, expected_prefix)
shim_file_rel_path_parts = shim_file_rel_path.split(os.sep)
# drop spark3XY from spark3XY/scala/com/nvidia
Expand Down Expand Up @@ -435,6 +443,8 @@ def __shimplify_layout():
"shimplify.add.shim cannot be specified without shimplify.add.base and vice versa"
assert __add_shim_base is None or __add_shim_base in __shims_arr,\
"shimplify.add.base is not in %s" % __shims_arr
assert __add_shim_buildver is None or __remove_shim_buildver is None,\
"Adding and deleting a shim in a single invocation is not supported!"
# map file -> [shims it's part of]
files2bv = {}
for buildver in __all_shims_arr:
Expand All @@ -458,8 +468,8 @@ def __shimplify_layout():
files2bv[shim_path] = [buildver]

# if the user allows to overwrite / reorganize shimplified shims,
# commonly while adding or removing shims we must includes new shim locations
if __should_overwrite or __add_shim_buildver is not None:
# commonly while adding or removing shims we must include new shim locations
if __should_overwrite or __add_shim_buildver is not None or __remove_shim_buildver is not None:
for src_type in ['main', 'test']:
__traverse_source_tree_of_all_shims(
src_type,
Expand All @@ -470,15 +480,22 @@ def __shimplify_layout():
if __add_shim_buildver is not None:
__add_new_shim_to_file_map(files2bv)

if __remove_shim_buildver is not None:
__remove_shim_from_file_map(files2bv)

for shim_file, bv_list in files2bv.items():
sorted_build_vers = sorted(bv_list)
__log.debug("calling upsert_shim_json on shim_file %s bv_list=%s", shim_file,
sorted_build_vers)
owner_shim = sorted_build_vers[0]
if owner_shim in __shims_arr:
__upsert_shim_json(shim_file, sorted_build_vers)
if __should_move_files:
__git_rename_or_copy(shim_file, owner_shim)
if len(bv_list) == 0:
__log.info("Removing orphaned file %s", shim_file)
__shell_exec(['git', 'rm', shim_file])
else:
sorted_build_vers = sorted(bv_list)
__log.debug("calling upsert_shim_json on shim_file %s bv_list=%s", shim_file,
sorted_build_vers)
owner_shim = sorted_build_vers[0]
if owner_shim in __shims_arr:
__upsert_shim_json(shim_file, sorted_build_vers)
if __should_move_files:
__git_rename_or_copy(shim_file, owner_shim)


def __update_files2bv(files2bv, path, buildver_arr):
Expand Down Expand Up @@ -521,6 +538,20 @@ def __add_new_shim_to_file_map(files2bv):
bv_list.append(__add_shim_buildver)


def __remove_shim_from_file_map(files2bv):
__log.info("Removing %s shim, pom.xml should be updated manually.", __remove_shim_buildver)
# copy keys to be able to modify the original dictionary while iterating
for shim_file in set(files2bv.keys()):
bv_list = files2bv[shim_file]
try:
bv_list.remove(__remove_shim_buildver)
except ValueError as ve:
# __remove_shim_buildver is mot in the list
__log.debug("%s: file %s does not belong to shim %s, skipping it", ve, shim_file,
__remove_shim_buildver)
pass


def __warn_shims_with_multiple_dedicated_dirs(dirs2bv):
# each shim has at least one dedicated dir, report shims
# with multiple dedicated dirs because they can be merged
Expand Down
25 changes: 16 additions & 9 deletions docs/dev/shimplify.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ Previous `src/(main|test)/${buildver}` and
version-range-with-exceptions directories such as `src/main/311until340-non330db` are deprecated and
will be removed soon as a result of the conversion to the new structure.

`shimplify` changes the way the source code is shared among Shims by using an explicit
`shimplify` changes the way the source code is shared among shims by using an explicit
lexicographically sorted list of `buildver` property values
in a source-code level comment instead of the shared directories.

Expand All @@ -43,7 +43,7 @@ spark-rapids-shim-json-lines ***/
```

The content inside the tags `spark-rapids-shim-json-lines` is in the [JSON Lines][4] format where
each line is an extensible object with the Shim metadata currently consisting just of the Spark
each line is an extensible object with the shim metadata currently consisting just of the Spark
build dependency version. The top object in the comment, the minimum version in the comment
intuitively represents the first version of Spark requiring shimming in the plugin, albeit it might
not be the original one as support for older Spark releases is eventually dropped. This `buildver`
Expand Down Expand Up @@ -101,7 +101,7 @@ on the current state of the `spark-rapids` repo.
After that you can execute conversion in one or more iterations depending on specified -D parameters

```bash
mvn generate-sources antrun:run@shimplify-shim-sources -Dshimplify=true [-D...]
mvn generate-sources -Dshimplify=true [-D...]
```

With `-Dshimplify=true`, shimplify is put on the write call path to generate and inject
Expand All @@ -119,7 +119,7 @@ Once the shim comments looks good (as expected, it was tested), you can repeat i
move the files to designated locations by invoking

```bash
mvn generate-sources antrun:run@shimplify=shim-sources -Dshimplify=true -Dshimplify.move=true
mvn generate-sources -Dshimplify=true -Dshimplify.move=true
```

Now you can run a package build with the simplified directory structure and run a few integration
Expand Down Expand Up @@ -187,7 +187,7 @@ support for a new [maintenance][5] version of Spark, say 3.2.4, it's expected to
If just 3.2.3 or all shims after the full transition have already been converted you can execute

```bash
mvn generate-sources antrun:run@shimplify=shim-sources -Dshimplify=true \
mvn generate-sources -Dshimplify=true \
-Dshimplify.move=true -Dshimplify.overwrite=true \
-Dshimplify.add.shim=324 -Dshimplify.add.base=323
```
Expand All @@ -207,11 +207,18 @@ work on resolving potential compilation failures manually.

## Deleting a Shim

Every Spark build is de-supported eventually. To drop a build say 311 you can run a bulk
search&replace in your IDE deleting all occurrences of `{"spark": "311"}` including the newline
character an empty line. Shimplify will fail the build until all the orphaned files are removed.
Every Spark build is de-supported eventually. To drop a build say 311 you can run

After adding or deleting shims you can run the integration tests above.
```bash
mvn generate-sources -Dshimplify=true -Dshimplify.move=true \
-Dshimplify.remove.shim=311
```

This command will remove the comment line `{"spark": "311"}` from all source files contributing to
the 311 shim. If a file belongs exclusively to 311 it will be removed.

After adding or deleting shims you should sanity-check the diff in the local git repo and
run the integration tests above.

## Symlinks & IDE

Expand Down

0 comments on commit 3d46182

Please sign in to comment.