Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scaladocs mentioning deprecated API #886

Merged
merged 1 commit into from
Dec 9, 2022
Merged

Conversation

tuxji
Copy link
Contributor

@tuxji tuxji commented Dec 6, 2022

Ensure nothing mentions the deprecated API removed in previous commits
for DAFFODIL-2743. Also fix some IDEA nits and improve DataProcessor
constructor.

SchemaSet.scala: Simplify scaladoc to stop mentioning deprecated API.
Fix some IDEA nits in the same code block with IDEA quick fixes.

SchemaSetRuntime1Mixin.scala: Make onPath call first DataProcessor
constructor instead of now-deleted second constructor.

DataProcessor.scala: Make SerializedDataProcessor extend first
DataProcessor constructor instead of now-deleted second
constructor. Change first DataProcessor constructor from private to
public, reorder its last 4 parameters, and make its last 2 parameters
optional. Delete second DataProcessor constructor and remove its
scaladoc which had been saying it was using deprecated
compilerExternalVars. Make copy call first constructor with reordered
parameters. Remove another scaladoc which had been mentioning tunables
not even passed to method anymore.

DAFFODIL-2743

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, one minor question if we even need this other DataProcessor constructor?

this(ssrd, tunables, ExternalVariablesLoader.loadVariables(compilerExternalVars, ssrd, ssrd.originalVariables),
false, None, validationMode, compilerExternalVars)
this(ssrd, tunables, ExternalVariablesLoader.loadVariables(externalVars, ssrd, ssrd.originalVariables),
false, None, validationMode, externalVars)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this constructor should even exist? Or if it should be changed to not even accept externalVars and validationMode? I think those parameters should never be passed in now, with the only way to change them via the withXYZ functions on the DataProcessor?

I think the only use of this constructor is in SchemaSetRuntime1Mixin.scala, and it doesn't pass anything in anything for these parameters. Seems like maybe this constructor can be removed, and the SSRD can be responsible for creating the variable map before it creates the DataProcessor and passing in the other variables that it currently doesn't explicitly set? Or maybe just remove these params so they can't be passed in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially thought SchemaSetRuntime1Mixin.scala was the only caller of this constructor too and removed the externalVars parameter from this constructor, but I got a compilation error which showed that there was a second call of this constructor from SerializableDataProcessor's constructor in DataProcessor.scala at line 107 that I didn't know about. SerializableDataProcessor extends DataProcessor(data, tunable, externalVars, validationModeArg) in order to preserve the externalVars and validationMode in case they may be needed by other serializations besides Daffodil save/reload such as Apache Spark which serializes in order to move objects for remote execution.

We can't remove the externalVars field from DataProcessor because we inform DataProcessor about external variable bindings in that field. DataProcessor's primary constructor is private (probably because of the mix of parameters some of which should be serialized all the time, some which should be serialized only sometimes, and some which should never be serialized). DataProcessor has both a class and an object, but the object DataProcessor has no apply method, only a private class SerializableDataProcessor. Therefore, this particular (public) constructor is the only way which callers outside DataProcessor.scala can create a DataProcessor. Our options are to keep this public constructor or add an apply method to the object DataProcessor which calls the private constructor.

At the very least, I need to fix this constructor's scaladoc. I had changed the scaladoc to say it was needed by SerializableDataProcessor, but I misstated the reason. I can make the scaladoc state that this constructor exists to construct a SerializableDataProcessor, not the other way around. Or I can remove this constructor and add an apply method to object DataProcessor, which would put that apply method next to the class SerializableDataProcessor and make the relationship between them more obvious. I would also add a second apply method with only 2 parameters and its own scaladoc to let external callers like SchemaSetRuntime1Mixin.scala create a DataProcessor.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good points.

Digging in some more based on what you've found, the only differences between the public and private DataProcessor constructors are the ability to pass in areDebugging and optDebugger. I'm not sure how important it is to restrict setting debugging when creating a DataProcessor (especially since this DataProcessor is internal only), so we could potentially just make the current private constructor public and remove the public one. And then the SerializableDataProcessor could just extend the current private one? And the current private constructor could set default values of the debugging parameters if we want the to be optional?

@tuxji tuxji changed the title Update scaladocs mentioning deprecated API (also fix CI) Update scaladocs mentioning deprecated API Dec 7, 2022
@tuxji
Copy link
Contributor Author

tuxji commented Dec 7, 2022

Rebased and updated pull request, now needs a second reviewer.

@@ -129,10 +131,10 @@ class DataProcessor private (
// The values these will have (since this is a base class) are the correct default values that we want
// back when the object is re-initialized.
//
protected val areDebugging : Boolean,
protected val optDebugger : Option[Debugger],
private val externalVars: Queue[Binding],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe externalVars can go away completely? The only way to set external variables should be with withExternalVariables, which I think should already create a copy of the variable map and mutate it with the new bindings?

I think externalVars was only needed because the compiler/processor factory could have variables set on them, so they needed to be carried down this way to the DataProcessor?

So maybe the SchemaSetRuntimeData just passes in the originalVariables Variable map when it creates the DataProcosser, and that is copied/mutated as needed?

externalVars: Queue[Binding] = externalVars,
validationMode: ValidationMode.Type = validationMode,
areDebugging: Boolean = areDebugging,
optDebugger: Option[Debugger] = optDebugger) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this indentation isn't the scala standard? I think the previous way is right.

@tuxji
Copy link
Contributor Author

tuxji commented Dec 7, 2022

I've removed all mentions/uses of externalVars in DataProcessor and restored copy's original formatting. I'm letting CI run the tests for me; if they pass, then this pull request is ready for a second reviewer to review it.

validationModeArg: ValidationMode.Type) // must be explicitly set from Full to Limited by save method.
extends DataProcessor(data, tunable, externalVars, validationModeArg) {
validationModeArg: ValidationMode.Type) // must be explicitly turned off by save method
extends DataProcessor(data, tunable, data.originalVariables, validationModeArg) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the same (or a copy of) the VariableMap from the DataProcessor? That way if something like Spark wants to serialize the existing VariableMap then it can? We just don't do it thorugh bindings anymore but directly serialize the actual VariableMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data.originalVariable is a method which returns a copy of the VariableMap stored in data so that we can proceed to serialize the SerializableDataProcessor even if another thread tries to mutate the VariableMap stored in data at the same time. Are you suggesting that SerializableDataProcessor should have 4 (instead of 3) arguments with VariableMap becoming the 3rd parameter and validationModeArg becoming the 4th parameter? All so that our writeReplace method can pass the DataProcessor's own VariableMap to the SerializableDataProcessor constructor instead of ssrd's (data's) VariableMap getting passed?

I think that makes sense since the withExternalVariables methods return a copy of the DataProcessor with a new VariableMap and we should serialize that new VariableMap instead of the original VariableMap we started with. VariableMap says it's not thread-safe, so I'll make writeReplace pass variableMap.copy() just to be as safe as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that SerializableDataProcessor should have 4 (instead of 3) arguments with VariableMap becoming the 3rd parameter and validationModeArg becoming the 4th parameter? All so that our writeReplace method can pass the DataProcessor's own VariableMap to the SerializableDataProcessor constructor instead of ssrd's (data's) VariableMap getting passed?

Yep. This way a SerializableDataProcessor is exactly the same as the associated DataProcessor, with the one difference being Full validation is not allowed. This is needed for things like Spark, which serializes objects and then sends them to different threads/machines to do the processing.

But if we use save() instead of built-in serialization, then we don't actually want all that state saved since it's a bit more confusing, so the save function resets that state before doing any serialization.

Would definitely like to hear @mbeckerle's view though. I think he implemented that logic?

@@ -224,14 +203,14 @@ class DataProcessor private (
}

// TODO Deprecate and replace usages with just tunables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do what this comment suggests and replace getTunable with tunbles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd taken a look and backed away when I saw getTunable was used in DFDL.DataProcessor, but I just looked again and I think it's feasible and wouldn't require deprecation since getTunable doesn't seem to be part of the Java or Scala API. The only places using getTunable seem to be some tests and runtime1 itself. I'll replace getTunable with tunables.

./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:66:    val t1 = dp1.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:67:    val t2 = dp2.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:72:    val t3 = dp2.getTunables() // modified tunables at 'run-time'
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:73:    val t4 = dp1.getTunables() // obtain first data processor to see if anything changed
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor.scala:94:    infosetInputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor1.scala:52:    ic.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader.scala:58:    inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader2.scala:66:    inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/util/TestUtils.scala:348:    override def getTunables(): DaffodilTunables = { tunables }
./daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala:195:    def getTunables(): DaffodilTunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilUnparseContentHandler.scala:81:  private lazy val tunablesBatchSize = dp.getTunables().saxUnparseEventBatchSize
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:206:  def getTunables(): DaffodilTunables = tunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:476:    inputter.initialize(ssrd.elementRuntimeData, getTunables())
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:661:    val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:691:    val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala:692:        dataProc.getTunables(),
./daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala:66:  override def getTunables(): DaffodilTunables = ???

Copy link
Contributor Author

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional changes seem reasonable but want a discussion whether a saved processor should preserve its own variableMap or reset variableMap.

validationModeArg: ValidationMode.Type) // must be explicitly set from Full to Limited by save method.
extends DataProcessor(data, tunable, externalVars, validationModeArg) {
validationModeArg: ValidationMode.Type) // must be explicitly turned off by save method
extends DataProcessor(data, tunable, data.originalVariables, validationModeArg) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data.originalVariable is a method which returns a copy of the VariableMap stored in data so that we can proceed to serialize the SerializableDataProcessor even if another thread tries to mutate the VariableMap stored in data at the same time. Are you suggesting that SerializableDataProcessor should have 4 (instead of 3) arguments with VariableMap becoming the 3rd parameter and validationModeArg becoming the 4th parameter? All so that our writeReplace method can pass the DataProcessor's own VariableMap to the SerializableDataProcessor constructor instead of ssrd's (data's) VariableMap getting passed?

I think that makes sense since the withExternalVariables methods return a copy of the DataProcessor with a new VariableMap and we should serialize that new VariableMap instead of the original VariableMap we started with. VariableMap says it's not thread-safe, so I'll make writeReplace pass variableMap.copy() just to be as safe as possible.

@@ -224,14 +203,14 @@ class DataProcessor private (
}

// TODO Deprecate and replace usages with just tunables.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd taken a look and backed away when I saw getTunable was used in DFDL.DataProcessor, but I just looked again and I think it's feasible and wouldn't require deprecation since getTunable doesn't seem to be part of the Java or Scala API. The only places using getTunable seem to be some tests and runtime1 itself. I'll replace getTunable with tunables.

./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:66:    val t1 = dp1.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:67:    val t2 = dp2.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:72:    val t3 = dp2.getTunables() // modified tunables at 'run-time'
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:73:    val t4 = dp1.getTunables() // obtain first data processor to see if anything changed
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor.scala:94:    infosetInputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor1.scala:52:    ic.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader.scala:58:    inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader2.scala:66:    inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/util/TestUtils.scala:348:    override def getTunables(): DaffodilTunables = { tunables }
./daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala:195:    def getTunables(): DaffodilTunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilUnparseContentHandler.scala:81:  private lazy val tunablesBatchSize = dp.getTunables().saxUnparseEventBatchSize
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:206:  def getTunables(): DaffodilTunables = tunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:476:    inputter.initialize(ssrd.elementRuntimeData, getTunables())
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:661:    val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:691:    val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala:692:        dataProc.getTunables(),
./daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala:66:  override def getTunables(): DaffodilTunables = ???

@mbeckerle
Copy link
Contributor

When you restore a saved compiled schema, you should get the initial values of any DFDL external variables, regardless of whether they had any values set before the schema was pre-compiled.

Saving the parser does NOT lock-down/freeze or even utilize any variable values. Saving the parser is supposed to be transparent, i.e., only faster, than using the uncompiled schema.

For example, expressions that refer to variables, even if the variable has been externally set, cannot be constant-folded by the schema compiler to remove the variable reference. Rather, any attempt to reference a variable at schema compile time causes the expression folding to stop.

There are, however, some tunables you can set which only affect schema compilation. Those are only used by the schema compiler so clearly one cannot subsequently modify them after restoring a saved processor. (You can, but it won't do anything because the compilation is already over. ) Example of this is a tunable to suppress a particular compile-time warning, or to specify a particular interpretation (UnqualifiedPathStepPolicy is an example.)

@tuxji tuxji requested a review from mbeckerle December 8, 2022 20:40
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Scrub constructor parameters and scaladocs of all remnants of the
deprecated API removed by previous DAFFODIL-2743 commits.  Remove 2nd
DataProcessor constructor; always call 1st constructor instead.  Also
replace all calls/definitions of DataProcessor's getTunables() with
tunables.  Make saving and reloading a DataProcessor reset both
variableMap and validationMode.  Fix some IDEA nits and formatting.

SchemaSet.scala: Scrub deprecated API from SchemaSet.root's scaladocs.
Fix some IDEA nits in root's code block.

SchemaSetRuntime1Mixin.scala: Make onPath call 1st DataProcessor
constructor instead of 2nd constructor.

TestTunables.scala, TestInfosetCursor.scala, TestInfosetCursor1.scala,
TestInfosetCursorFromReader.scala, TestInfosetCursorFromReader2.scala,
TestUtils.scala, DFDLParserUnparser.scala,
DaffodilUnparseContentHandler.scala, DataProcessor.scala,
PState.scala, UState.scala, Runtime2DataProcessor.scala:
- Replace getTunables() with tunables.

DataProcessor.scala: Remove externalVars from constructor parameters
and scaladocs since we no longer pass in compiler-set external
variable bindings.  Make SerializableDataProcessor's 4 fields same as
DataProcessor's first 4 fields, including variableMap to keep
preserving variables' mappings.  Make SerializableDataProcessor extend
1st DataProcessor constructor instead of 2nd constructor.  Format
constructors and a few methods according to Scala code style for
declarations (https://docs.scala-lang.org/style/declarations.html)
including trailing commas and closing parentheses to provide visual
separation between function arguments and extensions or body block.
Make 1st DataProcessor constructor public, reorder its parameters, and
make its last 2 parameters optional.  Make writeReplace save
variableMap instead of externalVars.  Delete 2nd DataProcessor
constructor since we no longer pass in compiler-set externalVars.
Reorder copy's parameters and make it call 1st constructor instead of
2nd constructor.  Remove loadExternalVariables and modify
withExternalVariables to get rid of externalVars.  Make save reset
variableMap as well as validationMode to original values.

Runtime2DataProcessor.scala: Put coverage off/on around unimplemented
methods.

DAFFODIL-2743
@tuxji tuxji merged commit b141ead into apache:main Dec 9, 2022
@tuxji tuxji deleted the ji/daf-2743 branch December 9, 2022 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants