When running Byzer-lang with JuiceFS, loading a JuiceFS text file in Notebook with LOAD text.`jfs://test/access.log` AS nginx_raw_access_log; fails with an exception:
2022-02-12 20:37:51,607 INFO job.DefaultMLSQLJobProgressListener: [owner] [admin] [groupId] [6908c4e6-17fb-4c89-b256-5d692a25ed82] __MMMMMM__ Total jobs: 1 current job:1 job script:LOAD text.`jfs://test/access.log` AS nginx_raw_access_log
org.apache.spark.sql.AnalysisException: Path does not exist: file:/mlsql/admin/jfs:/test/access.log
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:803)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:800)
at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172)
Cause analysis
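The exception shows that the resolved path is file:/mlsql/admin/jfs:/test/access.log. Here /mlsql/admin is the home directory of user admin, and jfs://test is JuiceFS's scheme name, which is defined in core-site.xml. In other words, the user's home directory was prepended to the full JuiceFS URI, and the result was treated as a local file path.
The realPath logic is in DslAdaptor.scala: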
    def withPathPrefix(prefix: String, path: String): String = {
      val newPath = cleanStr(path)
      if (prefix.isEmpty) return newPath
      if (path.contains("..")) {
        throw new RuntimeException("path should not contains ..")
      }
      if (path.startsWith("/")) {
        return prefix + path.substring(1, path.length)
      }
      return prefix + newPath
    }
This works for paths starting with "/", but breaks if the path starts with a scheme such as jfs://, hdfs://, or wasb://.
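The failure can be reproduced outside Byzer with a minimal sketch. The `cleanStr` below is a hypothetical stand-in (it only strips one pair of surrounding quotes or backticks), and the prefix `/mlsql/admin/` with a trailing slash is an assumption based on the path in the exception.

```scala
object PathPrefixDemo {
  // Hypothetical stand-in for Byzer's cleanStr: strips one pair of
  // matching surrounding backticks or quotes, otherwise returns the input.
  def cleanStr(str: String): String = {
    val quotes = Set('`', '"', '\'')
    if (str.length >= 2 && quotes.contains(str.head) && str.head == str.last)
      str.substring(1, str.length - 1)
    else str
  }

  // Same logic as the snippet from DslAdaptor.scala quoted above.
  def withPathPrefix(prefix: String, path: String): String = {
    val newPath = cleanStr(path)
    if (prefix.isEmpty) return newPath
    if (path.contains("..")) {
      throw new RuntimeException("path should not contains ..")
    }
    if (path.startsWith("/")) {
      return prefix + path.substring(1, path.length)
    }
    prefix + newPath
  }

  def main(args: Array[String]): Unit = {
    // Local-style path: the prefix is applied as intended.
    println(withPathPrefix("/mlsql/admin/", "/access.log"))
    // Path with a scheme: the whole URI ends up under the home directory.
    println(withPathPrefix("/mlsql/admin/", "jfs://test/access.log"))
  }
}
```

The second call yields `/mlsql/admin/jfs://test/access.log`, which has no scheme of its own. That is consistent with it later being resolved against the local file system, as seen in the exception.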
Proposed Solutions
Code Change
Since Byzer-lang uses the HDFS-compatible API to access third-party storage, the real path format should be <storage_type>://<scheme>/<user_home_path>/<original_path>. For JuiceFS, the real path in this case should be jfs://test/mlsql/admin/access.log.
For the local file system, the real path is /<user_home_path>/<original_path>.
So the new logic should be:
If the original path does not start with "/", generate a real path like: <storage_type>://<scheme>/<user_home_path>/<original_path>.
If the original path starts with "/", it is on the local file system; generate a path like: /<user_home_path>/<original_path>.
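The two rules above could be sketched as follows. This is an illustration, not the actual patch: the object name `ProposedPathPrefix` is hypothetical, the use of `java.net.URI` to detect the scheme is an assumption, and a relative path without a scheme is treated as local.

```scala
import java.net.URI

object ProposedPathPrefix {
  // Sketch of the proposed rule: keep scheme and authority in front and
  // insert the user's home directory before the original path.
  def withPathPrefix(prefix: String, path: String): String = {
    if (prefix.isEmpty) return path
    if (path.contains("..")) {
      throw new RuntimeException("path should not contains ..")
    }
    // Normalize the home directory, e.g. "/mlsql/admin/" -> "mlsql/admin".
    val home = prefix.stripPrefix("/").stripSuffix("/")
    // Note: new URI(...) throws URISyntaxException for illegal characters
    // such as spaces; a real implementation would need to handle that.
    val uri = new URI(path)
    if (uri.getScheme == null) {
      // No scheme: treat as a local path under the home directory.
      val rest = path.stripPrefix("/")
      "/" + Seq(home, rest).filter(_.nonEmpty).mkString("/")
    } else {
      // Scheme present (jfs, hdfs, wasb, ...): rebuild the URI around the home directory.
      val authority = Option(uri.getAuthority).getOrElse("")
      val rest = Option(uri.getPath).getOrElse("").stripPrefix("/")
      val joined = Seq(home, rest).filter(_.nonEmpty).mkString("/")
      s"${uri.getScheme}://$authority/$joined"
    }
  }

  def main(args: Array[String]): Unit = {
    println(withPathPrefix("/mlsql/admin/", "jfs://test/access.log"))
    println(withPathPrefix("/mlsql/admin/", "/access.log"))
  }
}
```

With the inputs from this issue, the first call produces `jfs://test/mlsql/admin/access.log` and the second produces `/mlsql/admin/access.log`, matching the formats described above.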
Personally, I prefer this solution.
Configuration Change
Add the configuration to core-site.xml and change the code to LOAD text.`/access.log`;
Lindsaylin changed the title from "Failed to load JuiceFS text file in Notebook" to "Fix the issue of failing to load JuiceFS text files in Notebook" on Mar 8, 2022