fix: reset the df schema after read to fix the "not null" issue when loading data #1341

vagetablechicken · 2022-02-28T08:18:54Z

When reading non-streaming files, Spark sets all schema fields to nullable, so the DataFrame schema should be reset after the read.
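
One way to reset the schema, sketched here rather than taken from this PR's diff, is to rebuild the DataFrame from its underlying RDD with the intended schema via `SparkSession.createDataFrame(rowRDD, schema)`. The object name `ResetSchemaSketch` and the helper `resetSchema` are hypothetical, and the snippet assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

object ResetSchemaSketch {
  // Hypothetical helper (not the PR's code): re-apply the intended schema,
  // including its nullable flags, to a DataFrame whose fields were all
  // marked nullable by the file reader.
  def resetSchema(spark: SparkSession, df: DataFrame, target: StructType): DataFrame = {
    if (df.schema == target) {
      // Nothing to fix: names, types and nullable flags already match.
      df
    } else {
      // Rebuild from the same rows with the target schema. Column order and
      // data types in `target` are assumed to match the data already.
      spark.createDataFrame(df.rdd, target)
    }
  }
}
```

Spark does not validate existing rows against the new nullable flags when the DataFrame is rebuilt, so a null in a column marked not null typically surfaces only when the data is evaluated.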

tobegit3hub

LGTM

github-actions · 2022-02-28T10:22:29Z

Linux Test Report

    102 files     228 suites 43m 0s ⏱️
  8 593 tests   8 590 ✔️ 3 💤 0 ❌
12 715 runs 12 712 ✔️ 3 💤 0 ❌

Results for commit 9f88fb7.

codecov · 2022-02-28T10:58:05Z

Codecov Report

Merging #1341 (9f88fb7) into branch-0.4 (d4a07c5) will decrease coverage by 0.05%.
The diff coverage is 77.61%.

@@               Coverage Diff                @@
##             branch-0.4    #1341      +/-   ##
================================================
- Coverage         65.50%   65.45%   -0.06%     
  Complexity          222      222              
================================================
  Files               570      570              
  Lines            106347   106386      +39     
  Branches            841      854      +13     
================================================
- Hits              69664    69636      -28     
- Misses            36537    36604      +67     
  Partials            146      146

Impacted Files	Coverage Δ
...openmldb/taskmanager/config/TaskManagerConfig.java	`0.00% <0.00%> (ø)`
...m/_4paradigm/openmldb/taskmanager/dao/JobInfo.java	`0.00% <0.00%> (ø)`
...nmldb/taskmanager/server/impl/TaskManagerImpl.java	`0.00% <0.00%> (ø)`
...paradigm/openmldb/taskmanager/JobInfoManager.scala	`0.00% <0.00%> (ø)`
src/cmd/sql_cmd.h	`21.41% <ø> (-0.10%)`	⬇️
..._4paradigm/openmldb/batch/utils/HybridseUtil.scala	`60.90% <38.46%> (-2.44%)`	⬇️
...4paradigm/openmldb/batch/nodes/WindowAggPlan.scala	`73.89% <66.66%> (-0.24%)`	⬇️
...4paradigm/openmldb/batch/api/OpenmldbSession.scala	`57.57% <94.11%> (+4.70%)`	⬆️
..._4paradigm/openmldb/batch/nodes/LoadDataPlan.scala	`60.37% <100.00%> (+4.58%)`	⬆️
..._4paradigm/openmldb/batch/utils/SparkRowUtil.scala	`66.66% <100.00%> (+14.03%)`	⬆️
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1977e88...9f88fb7.

fix: reset the df schema after read

9f88fb7

vagetablechicken requested review from tobegit3hub and dl239 February 28, 2022 08:18

vagetablechicken self-assigned this Feb 28, 2022

dl239 approved these changes Feb 28, 2022

tobegit3hub approved these changes Feb 28, 2022

dl239 merged commit 4dd3479 into 4paradigm:branch-0.4 Mar 1, 2022

This was referenced Mar 14, 2022

docs: add 0.4.3 changelog #1427

Merged

online load: reading data of "not null" columns in Spark causes data load failed #1271

Closed

lumianph changed the title ~~fix: reset the df schema after read~~ fix: reset the df schema after read to fix the "not null" issue when loading data Mar 14, 2022

lumianph mentioned this pull request Mar 28, 2022

OpenMLDB v0.4 Roadmap #1520

Closed
