Hi,

I'm having some problems converting my ROOT files into Parquet files. I'm using k8s with https://swan.cern.ch, and I used this notebook [1] as a guide; basically, I only changed the paths to the ROOT files. The conversion itself seems to run properly, but there is a problem while writing the Parquet file, and I get this error:
Py4JJavaError: An error occurred while calling o141.parquet.
: org.apache.spark.SparkException: Job aborted
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
[...]
Caused by: java.lang.StackOverflowError
at java.util.jar.JarFile.getEntry(JarFile.java:240)
at java.util.jar.JarFile.getJarEntry(JarFile.java:223)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1005)
at sun.misc.URLClassPath.getResource(URLClassPath.java:212)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
[...]
This is the full code:
import os
import glob
baseDir_Run2017_UL = '/eos/user/f/fcolombi/root/Run2017_UL'
fnamesMap = {
    'Z': {
        'Run2017_UL': {
            'Run2017B': [f for f in glob.glob(os.path.join(baseDir_Run2017_UL, 'Run2017B/tree*.root')) if 'hadd' not in f],
            'Run2017C': [f for f in glob.glob(os.path.join(baseDir_Run2017_UL, 'Run2017C/tree*.root')) if 'hadd' not in f],
            'Run2017D': [f for f in glob.glob(os.path.join(baseDir_Run2017_UL, 'Run2017D/tree*.root')) if 'hadd' not in f],
            'Run2017E': [f for f in glob.glob(os.path.join(baseDir_Run2017_UL, 'Run2017E/tree*.root')) if 'hadd' not in f],
            'Run2017F': [f for f in glob.glob(os.path.join(baseDir_Run2017_UL, 'Run2017F/tree*.root')) if 'hadd' not in f],
            'DY17': [f for f in glob.glob(os.path.join(baseDir_Run2017_UL, 'DY17/tree*.root')) if 'hadd' not in f],
        },
    },
    'JPsi': {
    },
}
def convert(resonance, era, subEra):
    # prepend the EOS xrootd redirector to each local path
    fnames = ['root://eosuser'+f for f in fnamesMap.get(resonance, {}).get(era, {}).get(subEra, [])]
    outDir = os.path.join('parquet', resonance, era, subEra)
    outname = os.path.join(outDir, 'tnp.parquet')
    treename = 'Events'
    # process 1000 files at a time
    # this is about the limit that can be handled when writing
    batchsize = 1000
    new = True
    while fnames:
        current = fnames[:batchsize]
        fnames = fnames[batchsize:]
        # 'spark' is the SparkSession provided by SWAN; the 'root' format
        # comes from the Laurelin jar
        rootfiles = spark.read.format("root").option('tree', treename).load(current)
        if new:
            rootfiles.write.parquet(outname)
            new = False
        else:
            rootfiles.write.mode('append').parquet(outname)
resonance = 'Z'
era = 'Run2017_UL'
subEra = 'DY17'
convert(resonance, era, subEra)
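For context, the read step by itself appears to be fine: a quick check along these lines (the file name here is just illustrative) loads a single file and shows the expected schema:

# Illustrative sanity check (hypothetical file name): read one file with
# the same Laurelin options and inspect it before writing anything.
test = spark.read.format("root") \
    .option('tree', 'Events') \
    .load('root://eosuser/eos/user/f/fcolombi/root/Run2017_UL/DY17/tree_1.root')
test.printSchema()
print(test.count())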
and I also execute the following commands before running it:
!wget -N https://repo1.maven.org/maven2/edu/vanderbilt/accre/laurelin/1.0.0/laurelin-1.0.0.jar && \
 wget -N https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-api/2.13.0/log4j-api-2.13.0.jar && \
 wget -N https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2.13.0/log4j-core-2.13.0.jar
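The jars then have to be visible to the Spark session. I don't think this is where it breaks, but for completeness, here is roughly how they end up on the classpath (on SWAN this is normally configured through the Spark connector, so treat the option below as an assumption rather than exactly what I run):

# Sketch (assumed configuration): attach the downloaded jars via the
# standard spark.jars option when building the session.
from pyspark.sql import SparkSession
spark = (SparkSession.builder
         .appName('RootToParquet')
         .config('spark.jars', 'laurelin-1.0.0.jar,'
                               'log4j-api-2.13.0.jar,'
                               'log4j-core-2.13.0.jar')
         .getOrCreate())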
Does anyone have an idea where the problem might be? I really can't figure it out. Thanks a lot for the help, and let me know if you need more details.
[1] https://github.com/dntaylor/spark_tnp/blob/master/notebooks/RootToParquet.ipynb