This discussion was converted from issue #768 on December 11, 2020 03:29.
Describe the bug
I am trying to use DataFrame.Persist() to cache computed results and avoid recomputing them, but it does not seem to work.
Here is an outline of my code:
var df1 = input.GroupBy(key).Apply(udf(batchRecode)).Persist();
var df2 = df1.Select();
To Reproduce
I am using:
Azure HDInsight cluster
Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_262
Branch Clearlake/Releases/hdi-2.4.4
.NET for Apache Spark worker: Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-1.0.0.tar.gz
Steps to reproduce the behavior:
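If you think the bug depends on external factors (e.g., dataset), please provide us with a minimal reproducible example that consists of the following items:
a minimal dataset, necessary to reproduce the error
the minimal runnable code necessary to reproduce the error, which can be run on the given dataset
the necessary information on any used packages, .NET runtime version, and system it is run on
in the case of random processes, a seed for reproducibility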
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
With Persist(): the job still shows the same part, "FlatMapGroupsInPandas [ContainerKey#10], Apache.Arrow.RecordBatch b__1(Apache.Arrow.RecordBatch)(ContainerKey#10, FeatureCol#23), [FeatureConnectivity#36"
Instead of calling Persist(), I saved the data to storage and reloaded it; here is the resulting job. All of the computation above FlatMapGroupsInPandas is gone.
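For reference, the save-and-reload workaround described above might look roughly like the sketch below. It is only an illustration, not the exact code from my job: the helper name SaveAndReload, the Parquet format, and the overwrite mode are assumptions made for the example, and the path is supplied by the caller.

```csharp
using Microsoft.Spark.Sql;

public static class MaterializeSketch
{
    // Hypothetical helper: writes `df` to storage and reads it back, so that
    // later queries start from the stored files rather than from the original
    // lineage (including the grouped UDF stage).
    // Assumptions: Parquet as the storage format and Overwrite as the save mode.
    public static DataFrame SaveAndReload(SparkSession spark, DataFrame df, string path)
    {
        // Materialize the result once by writing it out.
        df.Write().Mode(SaveMode.Overwrite).Parquet(path);

        // Reload it; the returned DataFrame's plan begins at the file scan.
        return spark.Read().Parquet(path);
    }
}
```

Calling Explain() on the reloaded DataFrame is one way to check whether the plan still contains the FlatMapGroupsInPandas node.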
Desktop (please complete the following information):
Additional context
Add any other context about the problem here.