-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update scaladocs mentioning deprecated API #886
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, one minor question if we even need this other DataProcessor constructor?
this(ssrd, tunables, ExternalVariablesLoader.loadVariables(compilerExternalVars, ssrd, ssrd.originalVariables), | ||
false, None, validationMode, compilerExternalVars) | ||
this(ssrd, tunables, ExternalVariablesLoader.loadVariables(externalVars, ssrd, ssrd.originalVariables), | ||
false, None, validationMode, externalVars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this constructor should even exist? Or if it should be changed to not even accept externalVars
and validationMode
? I think those parameters should never be passed in now, with the only way to change them via the withXYZ
functions on the DataProcessor
?
I think the only use of this constructor is in SchemaSetRuntime1Mixin.scala, and it doesn't pass anything in anything for these parameters. Seems like maybe this constructor can be removed, and the SSRD can be responsible for creating the variable map before it creates the DataProcessor
and passing in the other variables that it currently doesn't explicitly set? Or maybe just remove these params so they can't be passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had initially thought SchemaSetRuntime1Mixin.scala was the only caller of this constructor too and removed the externalVars parameter from this constructor, but I got a compilation error which showed that there was a second call of this constructor from SerializableDataProcessor's constructor in DataProcessor.scala at line 107 that I didn't know about. SerializableDataProcessor extends DataProcessor(data, tunable, externalVars, validationModeArg) in order to preserve the externalVars and validationMode in case they may be needed by other serializations besides Daffodil save/reload such as Apache Spark which serializes in order to move objects for remote execution.
We can't remove the externalVars field from DataProcessor because we inform DataProcessor about external variable bindings in that field. DataProcessor's primary constructor is private (probably because of the mix of parameters some of which should be serialized all the time, some which should be serialized only sometimes, and some which should never be serialized). DataProcessor has both a class and an object, but the object DataProcessor has no apply method, only a private class SerializableDataProcessor. Therefore, this particular (public) constructor is the only way which callers outside DataProcessor.scala can create a DataProcessor. Our options are to keep this public constructor or add an apply method to the object DataProcessor which calls the private constructor.
At the very least, I need to fix this constructor's scaladoc. I had changed the scaladoc to say it was needed by SerializableDataProcessor, but I misstated the reason. I can make the scaladoc state that this constructor exists to construct a SerializableDataProcessor, not the other way around. Or I can remove this constructor and add an apply method to object DataProcessor, which would put that apply method next to the class SerializableDataProcessor and make the relationship between them more obvious. I would also add a second apply method with only 2 parameters and its own scaladoc to let external callers like SchemaSetRuntime1Mixin.scala create a DataProcessor.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good points.
Digging in some more based on what you've found, the only differences between the public and private DataProcessor constructors are the ability to pass in areDebugging
and optDebugger
. I'm not sure how important it is to restrict setting debugging when creating a DataProcessor (especially since this DataProcessor is internal only), so we could potentially just make the current private constructor public and remove the public one. And then the SerializableDataProcessor
could just extend the current private one? And the current private constructor could set default values of the debugging parameters if we want the to be optional?
Rebased and updated pull request, now needs a second reviewer. |
@@ -129,10 +131,10 @@ class DataProcessor private ( | |||
// The values these will have (since this is a base class) are the correct default values that we want | |||
// back when the object is re-initialized. | |||
// | |||
protected val areDebugging : Boolean, | |||
protected val optDebugger : Option[Debugger], | |||
private val externalVars: Queue[Binding], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe externalVars
can go away completely? The only way to set external variables should be with withExternalVariables
, which I think should already create a copy of the variable map and mutate it with the new bindings?
I think externalVars was only needed because the compiler/processor factory could have variables set on them, so they needed to be carried down this way to the DataProcessor?
So maybe the SchemaSetRuntimeData
just passes in the originalVariables
Variable map when it creates the DataProcosser, and that is copied/mutated as needed?
externalVars: Queue[Binding] = externalVars, | ||
validationMode: ValidationMode.Type = validationMode, | ||
areDebugging: Boolean = areDebugging, | ||
optDebugger: Option[Debugger] = optDebugger) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this indentation isn't the scala standard? I think the previous way is right.
I've removed all mentions/uses of |
validationModeArg: ValidationMode.Type) // must be explicitly set from Full to Limited by save method. | ||
extends DataProcessor(data, tunable, externalVars, validationModeArg) { | ||
validationModeArg: ValidationMode.Type) // must be explicitly turned off by save method | ||
extends DataProcessor(data, tunable, data.originalVariables, validationModeArg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be the same (or a copy of) the VariableMap from the DataProcessor? That way if something like Spark wants to serialize the existing VariableMap then it can? We just don't do it thorugh bindings anymore but directly serialize the actual VariableMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data.originalVariable
is a method which returns a copy of the VariableMap stored in data
so that we can proceed to serialize the SerializableDataProcessor even if another thread tries to mutate the VariableMap stored in data
at the same time. Are you suggesting that SerializableDataProcessor should have 4 (instead of 3) arguments with VariableMap becoming the 3rd parameter and validationModeArg becoming the 4th parameter? All so that our writeReplace method can pass the DataProcessor's own VariableMap to the SerializableDataProcessor constructor instead of ssrd's (data's) VariableMap getting passed?
I think that makes sense since the withExternalVariables methods return a copy of the DataProcessor with a new VariableMap and we should serialize that new VariableMap instead of the original VariableMap we started with. VariableMap says it's not thread-safe, so I'll make writeReplace pass variableMap.copy() just to be as safe as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that SerializableDataProcessor should have 4 (instead of 3) arguments with VariableMap becoming the 3rd parameter and validationModeArg becoming the 4th parameter? All so that our writeReplace method can pass the DataProcessor's own VariableMap to the SerializableDataProcessor constructor instead of ssrd's (data's) VariableMap getting passed?
Yep. This way a SerializableDataProcessor is exactly the same as the associated DataProcessor, with the one difference being Full validation is not allowed. This is needed for things like Spark, which serializes objects and then sends them to different threads/machines to do the processing.
But if we use save()
instead of built-in serialization, then we don't actually want all that state saved since it's a bit more confusing, so the save function resets that state before doing any serialization.
Would definitely like to hear @mbeckerle's view though. I think he implemented that logic?
@@ -224,14 +203,14 @@ class DataProcessor private ( | |||
} | |||
|
|||
// TODO Deprecate and replace usages with just tunables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do what this comment suggests and replace getTunable
with tunbles
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd taken a look and backed away when I saw getTunable was used in DFDL.DataProcessor, but I just looked again and I think it's feasible and wouldn't require deprecation since getTunable doesn't seem to be part of the Java or Scala API. The only places using getTunable seem to be some tests and runtime1 itself. I'll replace getTunable with tunables.
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:66: val t1 = dp1.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:67: val t2 = dp2.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:72: val t3 = dp2.getTunables() // modified tunables at 'run-time'
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:73: val t4 = dp1.getTunables() // obtain first data processor to see if anything changed
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor.scala:94: infosetInputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor1.scala:52: ic.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader.scala:58: inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader2.scala:66: inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/util/TestUtils.scala:348: override def getTunables(): DaffodilTunables = { tunables }
./daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala:195: def getTunables(): DaffodilTunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilUnparseContentHandler.scala:81: private lazy val tunablesBatchSize = dp.getTunables().saxUnparseEventBatchSize
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:206: def getTunables(): DaffodilTunables = tunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:476: inputter.initialize(ssrd.elementRuntimeData, getTunables())
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:661: val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:691: val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala:692: dataProc.getTunables(),
./daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala:66: override def getTunables(): DaffodilTunables = ???
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional changes seem reasonable but want a discussion whether a saved processor should preserve its own variableMap or reset variableMap.
validationModeArg: ValidationMode.Type) // must be explicitly set from Full to Limited by save method. | ||
extends DataProcessor(data, tunable, externalVars, validationModeArg) { | ||
validationModeArg: ValidationMode.Type) // must be explicitly turned off by save method | ||
extends DataProcessor(data, tunable, data.originalVariables, validationModeArg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data.originalVariable
is a method which returns a copy of the VariableMap stored in data
so that we can proceed to serialize the SerializableDataProcessor even if another thread tries to mutate the VariableMap stored in data
at the same time. Are you suggesting that SerializableDataProcessor should have 4 (instead of 3) arguments with VariableMap becoming the 3rd parameter and validationModeArg becoming the 4th parameter? All so that our writeReplace method can pass the DataProcessor's own VariableMap to the SerializableDataProcessor constructor instead of ssrd's (data's) VariableMap getting passed?
I think that makes sense since the withExternalVariables methods return a copy of the DataProcessor with a new VariableMap and we should serialize that new VariableMap instead of the original VariableMap we started with. VariableMap says it's not thread-safe, so I'll make writeReplace pass variableMap.copy() just to be as safe as possible.
@@ -224,14 +203,14 @@ class DataProcessor private ( | |||
} | |||
|
|||
// TODO Deprecate and replace usages with just tunables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd taken a look and backed away when I saw getTunable was used in DFDL.DataProcessor, but I just looked again and I think it's feasible and wouldn't require deprecation since getTunable doesn't seem to be part of the Java or Scala API. The only places using getTunable seem to be some tests and runtime1 itself. I'll replace getTunable with tunables.
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:66: val t1 = dp1.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:67: val t2 = dp2.getTunables()
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:72: val t3 = dp2.getTunables() // modified tunables at 'run-time'
./daffodil-core/src/test/scala/org/apache/daffodil/general/TestTunables.scala:73: val t4 = dp1.getTunables() // obtain first data processor to see if anything changed
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor.scala:94: infosetInputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursor1.scala:52: ic.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader.scala:58: inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfosetCursorFromReader2.scala:66: inputter.initialize(rootERD, u.getTunables())
./daffodil-core/src/test/scala/org/apache/daffodil/util/TestUtils.scala:348: override def getTunables(): DaffodilTunables = { tunables }
./daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala:195: def getTunables(): DaffodilTunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilUnparseContentHandler.scala:81: private lazy val tunablesBatchSize = dp.getTunables().saxUnparseEventBatchSize
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:206: def getTunables(): DaffodilTunables = tunables
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala:476: inputter.initialize(ssrd.elementRuntimeData, getTunables())
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:661: val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PState.scala:691: val tunables = dataProc.getTunables()
./daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/unparsers/UState.scala:692: dataProc.getTunables(),
./daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala:66: override def getTunables(): DaffodilTunables = ???
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
Show resolved
Hide resolved
When you restore a saved compiled schema, you should get the initial values of any DFDL external variables, regardless of whether they had any values set before the schema was pre-compiled. Saving the parser does NOT lock-down/freeze or even utilize any variable values. Saving the parser is supposed to be transparent, i.e., only faster, than using the uncompiled schema. For example, expressions that refer to variables, even if the variable has been externally set, cannot be constant-folded by the schema compiler to remove the variable reference. Rather, any attempt to reference a variable at schema compile time causes the expression folding to stop. There are, however, some tunables you can set which only affect schema compilation. Those are only used by the schema compiler so clearly one cannot subsequently modify them after restoring a saved processor. (You can, but it won't do anything because the compilation is already over. ) Example of this is a tunable to suppress a particular compile-time warning, or to specify a particular interpretation (UnqualifiedPathStepPolicy is an example.) |
daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Scrub constructor parameters and scaladocs of all remnants of the deprecated API removed by previous DAFFODIL-2743 commits. Remove 2nd DataProcessor constructor; always call 1st constructor instead. Also replace all calls/definitions of DataProcessor's getTunables() with tunables. Make saving and reloading a DataProcessor reset both variableMap and validationMode. Fix some IDEA nits and formatting. SchemaSet.scala: Scrub deprecated API from SchemaSet.root's scaladocs. Fix some IDEA nits in root's code block. SchemaSetRuntime1Mixin.scala: Make onPath call 1st DataProcessor constructor instead of 2nd constructor. TestTunables.scala, TestInfosetCursor.scala, TestInfosetCursor1.scala, TestInfosetCursorFromReader.scala, TestInfosetCursorFromReader2.scala, TestUtils.scala, DFDLParserUnparser.scala, DaffodilUnparseContentHandler.scala, DataProcessor.scala, PState.scala, UState.scala, Runtime2DataProcessor.scala: - Replace getTunables() with tunables. DataProcessor.scala: Remove externalVars from constructor parameters and scaladocs since we no longer pass in compiler-set external variable bindings. Make SerializableDataProcessor's 4 fields same as DataProcessor's first 4 fields, including variableMap to keep preserving variables' mappings. Make SerializableDataProcessor extend 1st DataProcessor constructor instead of 2nd constructor. Format constructors and a few methods according to Scala code style for declarations (https://docs.scala-lang.org/style/declarations.html) including trailing commas and closing parentheses to provide visual separation between function arguments and extensions or body block. Make 1st DataProcessor constructor public, reorder its parameters, and make its last 2 parameters optional. Make writeReplace save variableMap instead of externalVars. Delete 2nd DataProcessor constructor since we no longer pass in compiler-set externalVars. Reorder copy's parameters and make it call 1st constructor instead of 2nd constructor. Remove loadExternalVariables and modify withExternalVariables to get rid of externalVars. Make save reset variableMap as well as validationMode to original values. Runtime2DataProcessor.scala: Put coverage off/on around unimplemented methods. DAFFODIL-2743
Ensure nothing mentions the deprecated API removed in previous commits
for DAFFODIL-2743. Also fix some IDEA nits and improve DataProcessor
constructor.
SchemaSet.scala: Simplify scaladoc to stop mentioning deprecated API.
Fix some IDEA nits in the same code block with IDEA quick fixes.
SchemaSetRuntime1Mixin.scala: Make onPath call first DataProcessor
constructor instead of now-deleted second constructor.
DataProcessor.scala: Make SerializedDataProcessor extend first
DataProcessor constructor instead of now-deleted second
constructor. Change first DataProcessor constructor from private to
public, reorder its last 4 parameters, and make its last 2 parameters
optional. Delete second DataProcessor constructor and remove its
scaladoc which had been saying it was using deprecated
compilerExternalVars. Make copy call first constructor with reordered
parameters. Remove another scaladoc which had been mentioning tunables
not even passed to method anymore.
DAFFODIL-2743