
[Feature] Lightweight schema change of add/drop column #10136

Merged: 4 commits merged into apache:master on Jul 12, 2022

@Lchangliang (Contributor) commented Jun 14, 2022

Proposed changes

Issue Number: close #10135

Problem Summary:

The optimization approach is described in the associated issue.

Checklist (Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Have unit tests been added: (Yes/No/No Need)
  3. Has documentation been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions bot added the area/load, area/planner, area/vectorization, and kind/test labels Jun 14, 2022
@Gabriel39 (Contributor) commented

What a fancy improvement! After reading your issue, I'm curious: will it increase the pressure on metadata storage, since we persist the schema of every rowset?

@yixiutt (Contributor) commented Jun 15, 2022

> What a fancy improvement! After reading your issue, I'm curious: will it increase the pressure on metadata storage, since we persist the schema of every rowset?

The schema is really small, and storing one schema per rowset keeps the overhead well under control.
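To make the "one schema per rowset" idea concrete, here is a minimal C++ sketch. It assumes a simplified, hypothetical model (`SchemaInfo`, `RowsetInfo`, `SchemaPtr`, and `newest_schema` are illustrative names, not the actual Doris BE types) in which every rowset's metadata records the schema it was written with, and a compaction picks the highest schema version among its input rowsets, in the spirit of the "choose newest schema by schema version when compaction" commit listed further down.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified schema record: a version number plus column names.
struct SchemaInfo {
    int32_t schema_version;
    std::vector<std::string> column_names;
};
using SchemaPtr = std::shared_ptr<const SchemaInfo>;

// Hypothetical rowset metadata: each rowset points at the schema it was written with.
// Rowsets written under the same schema can share a single SchemaInfo object.
struct RowsetInfo {
    int64_t rowset_id;
    SchemaPtr schema;
};

// Return the schema with the highest version among the input rowsets
// (for example, as the output schema of a compaction). Returns nullptr for no input.
SchemaPtr newest_schema(const std::vector<RowsetInfo>& rowsets) {
    SchemaPtr newest;
    for (const RowsetInfo& rs : rowsets) {
        assert(rs.schema != nullptr);  // in this model every rowset records its schema
        if (!newest || rs.schema->schema_version > newest->schema_version) {
            newest = rs.schema;
        }
    }
    return newest;
}
```

In this sketch, rowsets written under the same schema share one `SchemaInfo` through `std::shared_ptr`, which is one way to keep the in-memory cost small; whether Doris deduplicates schemas this way is not stated in this PR.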

@Lchangliang force-pushed the feature_sc_optimize branch 9 times, most recently from 6b0ee82 to 0e62965 on June 22, 2022 10:32
@github-actions bot added the area/spark-load label Jun 27, 2022
@Lchangliang force-pushed the feature_sc_optimize branch 2 times, most recently from 923cd3b to ea7d504 on June 30, 2022 15:09
@yiguolei (Contributor) commented Jul 1, 2022

There is already schema information for every segment. I think you could use it.

@Lchangliang force-pushed the feature_sc_optimize branch 2 times, most recently from 5c1a22b to ccbf8f9 on July 2, 2022 06:32
Review comment on be/src/olap/tablet.cpp (outdated, resolved)
@morningman added the kind/feature and kind/meta-version-change labels Jul 2, 2022
@Lchangliang force-pushed the feature_sc_optimize branch 3 times, most recently from f1c42c3 to 3055438 on July 3, 2022 07:06
@Lchangliang force-pushed the feature_sc_optimize branch 9 times, most recently from ac6ac9b to 47c163e on July 10, 2022 14:33
Lchangliang and others added 4 commits July 12, 2022 10:01
* [feature](schema-change) support fast schema change. coauthor: yixiutt

* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang

* [feature](schema change) schema change optimize for add/drop columns.

1.add uniqueId field for class column.
2.schema change for add/drop columns directly update schema meta

Co-authored-by: yixiutt <[email protected]>
Co-authored-by: SWJTU-ZhangLei <[email protected]>

[Feature](schema change) fix write and add regression test (apache#69)

Co-authored-by: yixiutt <[email protected]>

[schema change] BE supports delete using the newest schema

add delete regression test

fix regression case (apache#107)

tmp

[feature](schema change) light schema change exclude rollup and agg/uniq/dup key type.

[feature](schema change) fe olapTable maxUniqueId write in disk.

[feature](schema change) add rpc iface for sc add column.

[feature](schema change) add columnsDesc to TPushReq for light sc.

resolve the deadlock during schema change (apache#124)

fix columns from FE not having the bitmap_index flag (apache#134)

add update/delete case

construct MATERIALIZED schema from origin schema when insert

fix non-vectorized compaction coredump

use segment cache

choose newest schema by schema version when compaction (apache#182)

[bugfix](schema change) fix light schema change problem.

[feature](schema change) light schema change add alter job. (#1)

fix be ut

[bug] (schema change) dropping a key column of a unique-key table should not use light schema change

[feature](schema change) add schema change regression-test.

fix regression test

[bugfix](schema change) fix multi alter clauses for light schema change. (#2)

[bugfix](schema change) fix multi clauses calculate column unique id (#3)

modify PushTask process (apache#217)

[Bugfix](schema change) fix jobId replay cause bdbje exception.

[bug](schema change) fix duplicate max col unique id. (apache#232)

[optimize](schema change) modify pendingMaxColUniqueId generate rule.

fix compaction error
* fix be ut

* fix snapshot load core

fix unique_id error (apache#278)

[refact](fe) remove redundant code for light schema change. (#4)

format fe code

format be code

fix be ut

modify fe meta version

fix rebase error

flush schema into rowset_meta in old table

[refactor](schema change) refact fe light schema change. (#5)

remove the schema hash change and support getting the max-version schema
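The core idea in the commits above (a `uniqueId` field per column, and add/drop column directly updating schema metadata instead of rewriting data) can be sketched as follows. This is a hypothetical illustration, not the Doris implementation: `Column`, `Schema`, `add_column`, `drop_column`, and `project_row` are invented names, rows are modeled as vectors of strings, and `max_unique_id` stands in for the persisted `maxUniqueId` counter mentioned in the FE commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical column descriptor: identified by a stable unique id, not by position or name.
struct Column {
    int32_t unique_id;
    std::string name;
    std::string default_value;  // returned for rows written before this column existed
};

// Hypothetical schema: an ordered column list plus a version bumped on every change.
struct Schema {
    int32_t version;
    std::vector<Column> columns;
};

// Metadata-only ADD COLUMN: take a fresh id from the (assumed persisted) counter
// and bump the version. No existing data is rewritten.
Schema add_column(const Schema& old, int32_t& max_unique_id, std::string name,
                  std::string default_value) {
    Schema next = old;
    next.version = old.version + 1;
    next.columns.push_back(Column{++max_unique_id, std::move(name), std::move(default_value)});
    return next;
}

// Metadata-only DROP COLUMN: remove it from the schema. Old data keeps the values,
// but readers using the new schema never look them up.
Schema drop_column(const Schema& old, const std::string& name) {
    Schema next = old;
    next.version = old.version + 1;
    auto& cols = next.columns;
    cols.erase(std::remove_if(cols.begin(), cols.end(),
                              [&name](const Column& c) { return c.name == name; }),
               cols.end());
    return next;
}

// Read one row stored under `written` through the `current` schema: match columns by
// unique id, fill defaults for columns added later, and skip columns dropped since.
std::vector<std::string> project_row(const Schema& written,
                                     const std::vector<std::string>& row,
                                     const Schema& current) {
    assert(row.size() == written.columns.size());
    std::unordered_map<int32_t, std::size_t> pos_by_id;
    for (std::size_t i = 0; i < written.columns.size(); ++i) {
        pos_by_id[written.columns[i].unique_id] = i;
    }
    std::vector<std::string> out;
    out.reserve(current.columns.size());
    for (const Column& col : current.columns) {
        auto it = pos_by_id.find(col.unique_id);
        out.push_back(it != pos_by_id.end() ? row[it->second] : col.default_value);
    }
    return out;
}
```

Matching by unique id rather than by name is what lets a column that is dropped and then re-added under the same name start out at its default: the new column gets a fresh id, so values stored under the old id are never read back under the new definition.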
@Lchangliang force-pushed the feature_sc_optimize branch from 47c163e to 672f617 on July 12, 2022 02:02
@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Jul 12, 2022
@github-actions (Contributor) commented

PR approved by at least one committer and no changes requested.

@github-actions (Contributor) commented

PR approved by anyone and no changes requested.

@dataroaring merged commit 486cf0e into apache:master Jul 12, 2022
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Aug 1, 2022
* [Schema Change] support fast add/drop column  (apache#49)

* modify for review

* fix be ut

* fix schema change test
Labels
approved Indicates a PR has been approved by one committer. area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner area/spark-load Issues or PRs related to the spark load area/vectorization kind/feature Categorizes issue or PR as related to a new feature. kind/meta-version-change Categorizes issue or PR as related to changing meta version kind/test reviewed
Development

Successfully merging this pull request may close these issues.

[Feature] Lightweight schema change of add/drop column
7 participants