[Bug] [Source] Temporary directories are not ignored #8644

Closed
3 tasks done
JeremyXin opened this issue Feb 11, 2025 · 1 comment

@JeremyXin
Contributor

Search before asking

  • I have searched the issues and found no similar ones.

What happened

This is similar to #4873.
That improvement only covers the case where .hive-staging_hive*** is a file, not the case where .hive-staging_hive*** is a temporary directory.
In my case, .hive-staging_hive is a directory, and the whole directory needs to be filtered out.

Is this a bug that needs to be fixed? If so, I can submit a PR to resolve it.
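
To make the requested behavior concrete, here is a minimal sketch of a filter that checks every path component below the source directory, so that files nested inside a staging directory are skipped as well. It is not the SeaTunnel implementation and not the change made in any PR. The class name StagingPathFilter and its methods are hypothetical, and treating every component that starts with "." or "_" as hidden is an assumption borrowed from the usual Hadoop convention.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of SeaTunnel. */
final class StagingPathFilter {

    private StagingPathFilter() {}

    /**
     * Returns true if any path component below basePath starts with '.' or '_'.
     * Assumption: such names mark hidden or temporary files and directories.
     */
    static boolean isUnderHiddenDir(String basePath, String filePath) {
        String prefix = basePath.endsWith("/") ? basePath : basePath + "/";
        if (!filePath.startsWith(prefix)) {
            return false; // not below the source directory at all
        }
        String relative = filePath.substring(prefix.length());
        for (String component : relative.split("/")) {
            if (component.startsWith(".") || component.startsWith("_")) {
                return true;
            }
        }
        return false;
    }

    /** Keeps only the files that are not located under a hidden directory. */
    static List<String> keepReadable(String basePath, List<String> filePaths) {
        return filePaths.stream()
                .filter(p -> !isUnderHiddenDir(basePath, p))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String base = "hdfs://xxx/xxx_mx_change_log_di";
        List<String> files = Arrays.asList(
                base + "/part-00000.parquet",
                base + "/.hive-staging_hive_2021-07-22_13-56-05_587_1779994300234021397-1"
                        + "/-ext-10000/_temporary/0/part-00000");
        System.out.println(keepReadable(base, files));
    }
}
```

Checking only the file name would let the nested file through, because its own name does not start with "." or "_"; that is why the sketch walks every component of the relative path.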

SeaTunnel Version

2.3.8

SeaTunnel Config

env {
  job.mode = "BATCH"
  parallelism = 10
}
source {
  HdfsFile {
    path = "hdfs://xxx/xxx_mx_change_log_di"
    file_format_type = "parquet"
    fs.defaultFS = "hdfs://cluster1"
    hdfs_site_path = "/opt/soft/seatunnel/hadoop-conf/hdfs-site.xml"
    krb5_path = "/opt/soft/seatunnel/hadoop-conf/krb5.conf"
    kerberos_principal = "xxx"
    kerberos_keytab_path = "/opt/soft/seatunnel/hadoop-conf/kerberos.keytab"
  }
}

sink {
  Doris {
    fenodes = "xxx:8030"
    username = "root"
    password = "root123"
    database = "test"
    table = "xxx_mx_change_log_di"
    sink.buffer-size = "2621440"
    sink.buffer-count = "10"
    doris.batch.size = "10240"
    doris.config {
      format = "json"
      read_json_by_line = "true"
    }
  }
}

Running Command

seatunnel.sh --config test.conf --deploy-mode local

Error Exception

Caused by: java.lang.RuntimeException: hdfs://xxx/xxx_mx_change_log_di/.hive-staging_hive_2021-07-22_13-56-05_587_1779994300234021397-1/-ext-10000/_temporary/0/_temporary/attempt_20210722135614_0000_m_000000_0/part-00000-effc5e0c-deaa-4d2f-941b-6644f5c945cc-c000 is not a Parquet file. Expected magic number at tail, but found [-2, 12, 0, 8]
		at org.apache.seatunnel.shade.connector.file.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:557)
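
For context on the message above: a Parquet file ends with the 4-byte magic "PAR1", and the reader reports the last four bytes it found instead ([-2, 12, 0, 8] here), which is what one would expect from a leftover temporary file. The following is a rough sketch of that tail check for a local file. The class ParquetMagicCheck is hypothetical, it is not how ParquetFileReader is implemented, and it assumes a plain (unencrypted) Parquet file on the local file system rather than HDFS.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of SeaTunnel or Parquet. */
final class ParquetMagicCheck {

    // Magic bytes found at the tail of a plain Parquet file.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    private ParquetMagicCheck() {}

    /** Returns true if the last four bytes of the local file are "PAR1". */
    static boolean hasParquetTailMagic(String localPath) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(localPath, "r")) {
            long length = file.length();
            if (length < MAGIC.length) {
                return false; // too short to contain the magic
            }
            byte[] tail = new byte[MAGIC.length];
            file.seek(length - MAGIC.length);
            file.readFully(tail);
            return Arrays.equals(tail, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + hasParquetTailMagic(path));
        }
    }
}
```

A check like this only helps to diagnose which files are not Parquet; it does not replace filtering the staging directory out of the file listing.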

Zeta or Flink or Spark Version

No response

Java or Scala Version

JDK 1.8

Screenshots

Image

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@JeremyXin
Contributor Author

I found that #8402 has already solved this problem, so I will close this issue.
