Remove stale references to the pre-shimplify dirs (NVIDIA#7965)
Closes NVIDIA#7847

Testing:

- premerge 
- `./build/buildall`
- `mvn -B generate-sources -Dshimplify=true -Dshimplify.move=true -Dshimplify.remove.shim=311`
- `mvn -B generate-sources  -Dshimplify=true -Dshimplify.add.base=323 -Dshimplify.add.shim=324`

Signed-off-by: Gera Shegalov <[email protected]>
gerashegalov authored and abellina committed Apr 14, 2023
1 parent 7535d40 commit e06dee1
Showing 5 changed files with 6 additions and 384 deletions.
33 changes: 2 additions & 31 deletions CONTRIBUTING.md
@@ -187,30 +187,6 @@ The following acronyms may appear in directory names:

The version-specific directory names have one of the following forms / use cases:

#### Version range directories

The following source directory system is deprecated. See below and [shimplify.md][1]

* `src/main/312/scala` contains Scala source code for a single Spark version, 3.1.2 in this case
* `src/main/312+-apache/scala` contains Scala source code for *upstream* **Apache** Spark builds,
only beginning with version Spark 3.1.2, and + signifies there is no upper version boundary
among the supported versions
* `src/main/311until320-all` contains code that applies to all shims from 3.1.1 *inclusive* to
3.2.0 *exclusive*
* `src/main/pre320-treenode` contains shims for the Catalyst `TreeNode` class before the
[children trait specialization in Apache Spark 3.2.0](https://issues.apache.org/jira/browse/SPARK-34906).
* `src/main/post320-treenode` contains shims for the Catalyst `TreeNode` class after the
[children trait specialization in Apache Spark 3.2.0](https://issues.apache.org/jira/browse/SPARK-34906).

For each Spark shim, we use Ant path patterns to compute the property
`spark${buildver}.sources` in [sql-plugin/pom.xml](./sql-plugin/pom.xml) that is
picked up as additional source code roots. Where possible, path patterns are reused following
the conventions outlined in the pom.

#### Simplified version directory structure

Going forward new shim files should be added under:

* `src/main/spark${buildver}`, example: `src/main/spark330db`
* `src/test/spark${buildver}`, example: `src/test/spark340`
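
As a sketch only (the file, object, and package names below are hypothetical, not taken from the
repository), a shim-specific source file for the `330db` build would simply live under the
corresponding per-shim root:

```scala
// Hypothetical path:
//   sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/ExampleShim.scala
package com.nvidia.spark.rapids.shims

// Illustrative only: an object whose implementation is specific to the 330db shim.
object ExampleShim {
  def describe(): String = "behavior specific to the Spark 3.3.0 Databricks shim"
}
```

Which shims share ownership of such a file is recorded in a special comment inside the file
itself, as described in [shimplify.md][1].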

@@ -253,13 +229,8 @@ is changed to `process-test-resources`. In the Maven tool window hit `Reload all
Known Issues:

* There is a known issue that the test sources added via the `build-helper-maven-plugin` are not handled
[properly](https://youtrack.jetbrains.com/issue/IDEA-100532). The workaround is to `mark` the affected folders
such as

* `tests/src/test/320+-noncdh-nondb`
* `tests/src/test/spark340`

manually as `Test Sources Root`
[properly](https://youtrack.jetbrains.com/issue/IDEA-100532). The workaround is to `mark` the affected
folders such as `tests/src/test/spark3*` manually as `Test Sources Root`

* There is a known issue where, even after selecting a different Maven profile in the Maven submenu,
the source folders from a previously selected profile may remain active. As a workaround,
19 changes: 0 additions & 19 deletions build/shimplify.py
@@ -447,25 +447,6 @@ def __shimplify_layout():
"Adding and deleting a shim in a single invocation is not supported!"
# map file -> [shims it's part of]
files2bv = {}
for buildver in __all_shims_arr:
src_roots = __csv_ant_prop_as_arr("spark%s.sources" % buildver)
test_src_roots = __csv_ant_prop_as_arr("spark%s.test.sources" % buildver)
__log.debug("check %s sources: %s", buildver, src_roots)
__log.debug("check %s test sources: %s", buildver, test_src_roots)
main_and_test_roots = src_roots + test_src_roots
# alternatively we can use range dirs instead of files, which is more efficient.
# file level provides flexibility until/unless the shim layer becomes unexpectedly
# large
for src_root in main_and_test_roots:
__log.debug("os.walk looking for shim files from %s", src_root)
for dir, _, shim_source_files in os.walk(src_root):
for shim_file in shim_source_files:
shim_path = os.path.join(dir, shim_file)
__log.debug("updating files2bv %s -> %s", shim_path, buildver)
if shim_path in files2bv.keys():
files2bv[shim_path] += [buildver]
else:
files2bv[shim_path] = [buildver]

# if the user allows to overwrite / reorganize shimplified shims,
# commonly while adding or removing shims we must include new shim locations
2 changes: 1 addition & 1 deletion docs/dev/shimplify.md
@@ -29,7 +29,7 @@ time of this writing) have a new set of special sibling directories

Previous `src/(main|test)/${buildver}` and
version-range-with-exceptions directories such as `src/main/311until340-non330db` are deprecated and
will be removed soon as a result of the conversion to the new structure.
are being removed as a result of the conversion to the new structure.

`shimplify` changes the way the source code is shared among shims by using an explicit
lexicographically sorted list of `buildver` property values
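
That sorted list is embedded in each shared source file as a special comment. For illustration
only (these particular `buildver` values are examples, not a claim about any real file):

```scala
/*** spark-rapids-shim-json-lines
{"spark": "320"}
{"spark": "321"}
{"spark": "330"}
spark-rapids-shim-json-lines ***/
package com.nvidia.spark.rapids.shims
```

In this sketch the file would live under the source root of the lexicographically smallest
`buildver` (`spark320`) while also being compiled for the `321` and `330` shims.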
6 changes: 3 additions & 3 deletions docs/dev/shims.md
@@ -48,8 +48,8 @@ inject an intermediate trait e.g. `com.nvidia.spark.rapids.shims.ShimExpression`
has varying source code, depending on the Spark version we compile against, to overcome this
issue, as you can see by comparing TreeNode:

1. [ShimExpression For 3.1.x](https://github.com/NVIDIA/spark-rapids/blob/main/sql-plugin/src/main/pre320-treenode/scala/com/nvidia/spark/rapids/shims/TreeNode.scala#L23)
2. [ShimExpression For 3.2.x](https://github.com/NVIDIA/spark-rapids/blob/main/sql-plugin/src/main/post320-treenode/scala/com/nvidia/spark/rapids/shims/TreeNode.scala#L23)
1. [ShimExpression For 3.1.x](https://github.com/NVIDIA/spark-rapids/blob/6a82213a798a81a5f32f8cf8b4c630e38d112f65/sql-plugin/src/main/spark311/scala/com/nvidia/spark/rapids/shims/TreeNode.scala#L28)
2. [ShimExpression For 3.2.x](https://github.com/NVIDIA/spark-rapids/blob/6a82213a798a81a5f32f8cf8b4c630e38d112f65/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/TreeNode.scala#L37)
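
The following sketch (simplified declarations, not copied from the linked files) shows the shape
of this technique: the same fully qualified trait name is compiled from a different per-shim
source root, so common code can mix it in without referring to version-specific `TreeNode`
details directly:

```scala
// Sketch of the pre-3.2.0 variant, e.g. under a src/main/spark311 source root:
package com.nvidia.spark.rapids.shims

import org.apache.spark.sql.catalyst.expressions.Expression

trait ShimExpression extends Expression
// The 3.2.0+ source root defines a trait with the same fully qualified name, adapted to the
// children-specialized TreeNode API. Common code extends
// com.nvidia.spark.rapids.shims.ShimExpression once and picks up the right parent per shim.
```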

This resolves the compile-time problems; however, we now face the problem at run time.

@@ -96,7 +96,7 @@ by having the documented facade classes with a shim specifier in their package name
The second issue is that every parent class/trait in the inheritance graph is loaded using a classloader outside
the Plugin's control. Therefore, all this bytecode must reside in the conventional jar location, and it must
be bitwise-identical across *all* shims. The only way to keep the source code for shared functionality unduplicated,
(i.e., in `sql-plugin/src/main/scala` as opposed to be duplicated in `sql-plugin/src/main/3*/scala` source code roots)
(i.e., in `sql-plugin/src/main/scala` as opposed to be duplicated in `sql-plugin/src/main/spark3*/scala` source code roots)
is to delay inheriting `ShuffleManager` until as late as possible, as close as possible to the facade class where we
have to split the source code anyway. Use traits as much as possible for flexibility.
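
A minimal sketch of that layering, with hypothetical class names standing in for the real
shuffle manager implementation:

```scala
// Shared, unshimmed code (sql-plugin/src/main/scala): the common logic lives in a trait that
// does not yet extend Spark's ShuffleManager, so its bytecode stays identical across shims.
package com.nvidia.spark.rapids

trait RapidsShuffleLogicBase {
  // version-independent shuffle handling would live here
  def shuffleDescription: String = "RAPIDS shuffle"
}

// Per-shim facade, kept in a spark${buildver} source root and shown here as a comment because
// it would be a separate file: the version-specific package is the shim specifier, and only
// this thin class finally inherits org.apache.spark.shuffle.ShuffleManager.
//
//   package com.nvidia.spark.rapids.spark330
//
//   class RapidsShuffleManager(conf: SparkConf, isDriver: Boolean)
//     extends ShuffleManager with RapidsShuffleLogicBase { /* delegate to shared logic */ }
```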
