Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issues of upstream first violation #6926

Open
Xuanwo opened this issue Aug 1, 2022 · 21 comments
Open

Tracking issues of upstream first violation #6926

Xuanwo opened this issue Aug 1, 2022 · 21 comments
Labels
C-improvement Category: improvement

Comments

@Xuanwo
Copy link
Member

Xuanwo commented Aug 1, 2022

Databend will adopt Upstream First rules across the whole project, which means:

  • We will try our best to keep our dependences in the latest stable version.
  • We will try our best to report issues we met upstream.
  • We will try our best to contribute back our own changes.

In this tracking issue, we will record the temporary forks that we are using and our plan for them:

  • sled = { git = "https://github.com/datafuse-extras/sled", tag = "v0.34.7-datafuse.1", default-features = false }

Already cleaned

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

cc @andylokandy @leiysky, maybe we can remove sqlparser after the new planner is ready?

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

cc @drmingdrmer @lichuang, will we maintain sled as a new upstream? Or we will use openkv to replace sled finally?

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

cc @PsiACE, why we need to use a fork of cargo-license, async-trait, clickhouse-driver and chrono?

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

cc @dantengsky, can we migrate parquet2 to upstream instead of our own fork?

@drmingdrmer
Copy link
Member

cc @drmingdrmer @lichuang, will we maintain sled as a new upstream? Or we will use openkv to replace sled finally?

To me. I'm not sure about this. sled has not been actively maintained for a while. Its author recently switched to another LSM storage project.
The extra/sled is mainly for easily adding a tag for databend use.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

To me. I'm not sure about this. sled has not been actively maintained for a while. Its author recently switched to another LSM storage project.

We need a decision here. Possible option are as follows:

  • Fork sled (like openraft from async-raft) and act as a new upstream, release as sled-databend (maybe)
  • Switch to other active k-v storage projects (I don't know, maybe rocksdb?)
  • Replace sled with openkv (A roadmap is enough for now)

@dantengsky
Copy link
Member

cc @dantengsky, can we migrate parquet2 to upstream instead of our own fork?

There is only one thing that prevents us from using the vanilla upstream parquet2: there might be some legacy data that is compressed by the non-standardard lz4.
let me try to make sure that no such data is left, and then switch to upstream parquet2.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

let me try to make sure that no such data is left, and then switch to upstream parquet2.

Looks nice!

Please create an issue for it and link to this issue. Thanks!

@drmingdrmer
Copy link
Member

To me. I'm not sure about this. sled has not been actively maintained for a while. Its author recently switched to another LSM storage project.

We need a decision here. Possible option are as follows:

  • Fork sled (like openraft from async-raft) and act as a new upstream, release as sled-databend (maybe)
  • Switch to other active k-v storage projects
  • Replace sled with openkv (A roadmap is enough for now)

datafuse-extras/sled is already a fork. Do we have to release it as a standalone crate?

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

Do we have to release it as a standalone crate?

Release it as a new crate looks better to me. Like tikv does for jemalloctor: https://github.com/tikv/jemallocator

So people in the rust community can benefit from our fork too.

And if we want to maintain sled, we need to set up the lowest bar like CI must pass, changes must with tests, etc.

@drmingdrmer
Copy link
Member

Do we have to release it as a standalone crate?

Release it as a new crate looks better to me. Like tikv does for jemalloctor: https://github.com/tikv/jemallocator

So people in the rust community can benefit from our fork too.

And if we want to maintain sled, we need to set up the lowest bar like CI must pass, changes must with tests, etc.

There is no plan to introduce any improvement to sled yet:(.
The global singleton model in sled is not easy to use or test.
And the transaction model is also half mature.

@PsiACE
Copy link
Member

PsiACE commented Aug 1, 2022

why we need to use a fork of cargo-license, async-trait, clickhouse-driver and chrono?

  • cargo-license: It's not as aggressively maintained, of course, it is perfectly possible to use specific commits from upstream. I will deal with it, the only problem is that there is no new version of the cargo-license released, I will consider communicating with its maintainer about this.
  • async-trait: We forked a fork that someone else was maintaining and assync-trait refused to accept the change, perhaps consider suggesting that the upstream maintainer release a new crate.
  • chrono Good news, chrono has some new maintainers and we can use 0.4.20-rc.1
  • clickhouse-driver It's not as aggressively maintained, I will move it to opensrv later to maintain it and release a new crate.

@PsiACE
Copy link
Member

PsiACE commented Aug 1, 2022

As for sled, if we still plan to use it, perhaps we can consider refactoring it to meet our needs. If we decide to abandon it, emm, for the time being, the Rust ecosystem may lack a really strong alternative.

I am not opposed to using rocksdb or implementing openkv, nor am I opposed to looking into the availability of redb/persy.

@Xuanwo
Copy link
Member Author

Xuanwo commented Aug 1, 2022

  • chrono Good news, chrono has some new maintainers and we can use 0.4.20-rc.1

Great, it's nice to know chrono come back again!

@aseaday
Copy link
Contributor

aseaday commented Aug 2, 2022

Awesome practice. How about give a lecture!

@BohuTANG BohuTANG added the C-improvement Category: improvement label Aug 5, 2022
@BohuTANG
Copy link
Member

BohuTANG commented Sep 6, 2022

Done.

@BohuTANG BohuTANG closed this as completed Sep 6, 2022
@Xuanwo
Copy link
Member Author

Xuanwo commented Sep 6, 2022

Done.

Not yet. We must address sqlparser and sled, which are still going on.

@Xuanwo Xuanwo reopened this Sep 6, 2022
@Xuanwo Xuanwo moved this to 📋 Backlog in Xuanwo's Work Sep 15, 2022
@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 10, 2022

@PsiACE do you know why we need to keep our own arrow-format?

@PsiACE
Copy link
Member

PsiACE commented Nov 11, 2022

@PsiACE do you know why we need to keep our own arrow-format?

We need support for higher versions of tonic and prost.

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 11, 2022

We need support for higher versions of tonic and prost.

Ok, I got it. Related to DataEngineeringLabs/arrow-format#5

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 14, 2022

@PsiACE do you know why we need to keep our own arrow-format?

This has been addressed by #8785.

Thanks @GPSnoopy @jorgecarleitao

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-improvement Category: improvement
Projects
None yet
Development

No branches or pull requests

6 participants