
Allow community maintenance of older SDK's #14426

Closed
faddat opened this issue Dec 27, 2022 · 6 comments

@faddat
Contributor

faddat commented Dec 27, 2022

Problem

  • older cosmos-sdk versions don't support newer Tendermint versions
  • SDK v0.42.* is in fact still in use
  • the SDK team doesn't have enough time to keep v0.42.* and v0.44.* up to date with:
    • Tendermint
    • iavl
    • etc., etc., etc.

Symptoms

The problem makes it harder for older chains to adopt newer technologies. This can be seen with:

  • LUNC on v0.44.*
  • older builds of chains that have been upgraded in place, like the Cosmos Hub and Osmosis

Solution

I'm proposing to bring these branches up to date and provide some proof of efficacy, such as a sync log. This will help older chains migrate, because there will be fewer discrete upgrades. I'm opening this issue because I hope to get the work merged, so that chains with long histories (like the Cosmos Hub, Osmosis, Akash, Sentinel, and others) can manage state more easily.

The solution should involve:

  • ensuring that tests still pass, and mentioning when a Go version change will break them
  • linting to the standard found on the main branch
  • standardizing around Go 1.18
  • standardizing around iavl 0.19.4

While working on some issues for Osmosis:

I was able to prove out that there's no problem with v0.42.x using iavl v0.19.4 and Tendermint v0.34.24, but I did have trouble getting all of the SDK's tests to pass. Eventually this led back upstream, and I figured the best course was to go through the older SDKs and give them a bit of a cleanup.

@julienrbrt
Member

julienrbrt commented Dec 27, 2022

First, just curious: why can't chains upgrade? Is there something we could do about that? Is it because the in-place / genesis migration doesn't work as expected? Or because of the breaking changes between versions, or something else (lack of docs, ...)? We've seen some chains make a big jump (v44 -> v46).

You assume that updating older versions will incentivize people to upgrade; why is that? Won't it make them stay forever on deprecated, and possibly vulnerable, software because it appears to be maintained?

Personally, I like community forks, but I think hosting them in the cosmos/cosmos-sdk repo sets the wrong expectations:

an expectation of maintenance by the SDK team, and an expectation of stability.

I feel the usual way to go is to fork (LibreOffice, Nextcloud, ...), and maybe instead have a community-maintained repo of deprecated versions (only) of the SDK (e.g. cosmos/cosmos-sdk-deprecated-ce).

Users would need to add a replace directive, but that sets expectations directly because the change is explicit, and it still lowers the burden on the SDK team (because if PRs show up here, we will read and test them anyway). We could always add a disclaimer about the community edition to the README of the unmaintained versions.

Again, just my two cents, I'm not the one deciding that anyway 😬

@faddat
Contributor Author

faddat commented Dec 27, 2022

:)

Super happy to walk you through this sir :)

So, I ended up on this path while working on getting iavl 0.19.4 into the oldest versions of Osmosis.

What I actually found is that on those older SDKs, even getting the tests (the ones that were originally there) to pass reliably is a bit of a challenge.

Now, as for the holdup to upgrading, I can tell you in one word:

performance

Examples

  • LUNC is tough to upgrade because its archive nodes are so large and now suffer from "the archive nodes are slowly dying faster" syndrome.
  • GenesisL1 is in a similar position to LUNC.
  • AFAIK, Crypto.org is also in this spot; their hesitancy to use iavl fast node centered on the upgrade times for their archive nodes.
  • the Cosmos Hub is likely either "in this spot" or "almost in this spot" too.

In all cases, the ideal solution is to upgrade progressively, e.g.:

sequentially

  1. add a fast-node-enabled iavl to the version of the SDK that the chain/community currently uses
  2. convert the db from goleveldb to pebble

result

  • upgrades that modify state become, conservatively, 10x faster
  • archive nodes are smaller and easier to handle
  • archive nodes don't randomly die over time

Result of fork-insistence

Suggestion, which respects the SDK team's limited time

Instead of being fork-insistent, change the README to explain the source of the code, and ensure users understand that everything past a certain commit has no backing from the SDK team.

@tac0turtle
Member

We do support older versions of the software with security releases; the idea has always been to maintain only 2 versions back in order to get people to upgrade. This way they get new features without pushing more maintenance onto others. Secondly, if a version is not EOL, the SDK team is responsible for it, no matter how we put it. If there is a new security vulnerability, it becomes the core team's issue; this is why we recommend people upgrade sooner rather than later.
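A toy sketch of the "2 versions back" window, assuming it means only the two newest release lines receive releases and identifying each line by its minor version (46 for v0.46.x). `maintainedLines` and `isMaintained` are hypothetical helpers, and the version list in `main` is illustrative rather than the actual release history.

```go
package main

import (
	"fmt"
	"sort"
)

// maintainedLines returns the newest `keep` release lines, newest first.
// The input slice is not modified.
func maintainedLines(minors []int, keep int) []int {
	if keep <= 0 {
		return nil
	}
	sorted := append([]int(nil), minors...) // copy before sorting
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	if keep < len(sorted) {
		sorted = sorted[:keep]
	}
	return sorted
}

// isMaintained reports whether a release line falls inside the window.
func isMaintained(minor int, minors []int, keep int) bool {
	for _, m := range maintainedLines(minors, keep) {
		if m == minor {
			return true
		}
	}
	return false
}

func main() {
	released := []int{42, 44, 45, 46}
	fmt.Println(maintainedLines(released, 2))  // [46 45]
	fmt.Println(isMaintained(42, released, 2)) // false
}
```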

@faddat
Contributor Author

faddat commented Dec 27, 2022

Hmmmm, I'm right with you on "upgrade sooner than later", which is why I want to grease the skids. I have another idea on this and will make an additional PR, but it won't touch code.

Instead, it will link people to the skid-greasing release. I'm basically looking to make these tools available because, like you, I think teams should upgrade (much) sooner rather than later.

Then there's the archive node issue.

@faddat
Contributor Author

faddat commented Dec 27, 2022

So, @tac0turtle, consider the scenario where you, as an infranerd, want to build an archive node from scratch.

For any chain that has SDK v0.42 in its history and used in-place upgrades, you kinda need to... go back in time, performance-wise.

  • You'll be stuck without iavl fast node.
  • Since you don't have iavl fast node, you'll eat vastly more time
  • Pebble won't shine as it should, because you don't have iavl fast node

I've proven out that upgrading to iavl v0.19.4 on the v0.42 series does not cause app hash mismatches.
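One way to back up a claim like this (and the "sync log" proof of efficacy from the opening post) is to record the app hash at each height from two syncs of the same chain, one with the original binary and one with the patched iavl, and find the first height where they diverge. A minimal sketch, assuming the hashes have already been extracted from the logs into maps keyed by height; `firstAppHashMismatch` is a hypothetical helper, not part of the SDK or Tendermint.

```go
package main

import (
	"fmt"
	"sort"
)

// firstAppHashMismatch compares per-height app hashes from two sync runs
// and returns the lowest height at which they differ. Only heights present
// in both runs are compared; ok is false when no mismatch is found.
func firstAppHashMismatch(baseline, patched map[int64]string) (height int64, ok bool) {
	heights := make([]int64, 0, len(baseline))
	for h := range baseline {
		if _, present := patched[h]; present {
			heights = append(heights, h)
		}
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	for _, h := range heights {
		if baseline[h] != patched[h] {
			return h, true
		}
	}
	return 0, false
}

func main() {
	baseline := map[int64]string{1: "A1", 2: "B2", 3: "C3"}
	patched := map[int64]string{1: "A1", 2: "B2", 3: "FF"}
	h, found := firstAppHashMismatch(baseline, patched)
	fmt.Println(h, found) // 3 true
}
```

Reporting the lowest diverging height matters because every later app hash depends on earlier state, so only the first mismatch points at the change that caused it.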

Additionally:

  • due to CI misconfiguration, v0.42 and v0.44 were never linted, so teams that rely on linters to highlight issues in code can't benefit from them when spinning up such a node

So, while it may seem like a giant misallocation of time, I am certain that, currently, building archives from scratch consumes more time than making these changes would.

If you check out the README, you'll note explicit deprecation warnings and also some usage guidance.

https://github.com/notional-labs/cosmos-sdk/tree/faddat/v0.42.x-modern

The problem with the Osmosis issues, for us, was finding where each issue began and ended. In the end, @catShaark and I were able to prepare branches that "worked fine" but did not pass tests.

So, this issue, and the pull requests related to it, are in fact intended to help teams adopt new SDK versions faster.

User stories

  • I'm an archive node operator, and it's hell.
    • Fast node helps me.
    • Pebble helps me.
  • I need to make modifications to a historical SDK, and when I step into it, I find a giant slew of lint issues
  • I'm trying to verify the state of the Cosmos Hub, but it takes a week, so instead of gaining additional confidence in the hub, I just "meh"

the point

  • SDK 42 isn't deprecated at all. It is in production on many chains, because it's required for archive sync.
  • SDK 44 isn't deprecated at all. It is in production on many chains, because it's required for archive sync.

Archive syncs are a billion times too hard, for example:

@tac0turtle
Member

So I would ask: why are you syncing from scratch instead of using something like a version DB that you can send the data to, so you don't have to maintain large DBs? If you look at other ecosystems, syncing from genesis also takes a while. In the SDK we have a strict policy that the latest 2 versions are maintained; this is something the Go language also follows. If we allow older versions to be maintained, teams are less inclined to upgrade because they know those versions will be maintained. In the near future we should get rid of syncing from scratch the way you are doing it, as it's inefficient.

A new tool for this could easily be built as an alternative. There are DBs on the network with the data, but they serve it very slowly; that is one issue. The second issue is execution, which we are trying to fix, but for older data there are simpler ways than syncing from scratch.

I'm sorry, I will have to close this issue, because backporting features to older releases is outside the scope of this repo's release and security process.

3 participants