dria: add DKN compute node #556

Merged
merged 1 commit into from
Aug 19, 2024


vpavlin
Contributor

@vpavlin vpavlin commented Aug 15, 2024

This PR adds the recently launched Dria Knowledge Network compute node to the Awesome Akash repo. It allows deployers to join the network and execute assigned tasks.

GPU support will be added in the future.

@Dimokus88
Contributor

Hey @vpavlin!
Can you edit the PR?

@vpavlin
Contributor Author

vpavlin commented Aug 19, 2024

@Dimokus88 I can, what do you need edited?

@Dimokus88
Contributor

@vpavlin Please replace latest with numbered version tags. Akash providers cache images, so using latest may be a problem.

@vpavlin
Contributor Author

vpavlin commented Aug 19, 2024

Hmm, on one hand I do agree with you, but for new deployments it is quite probable that the latest image will be pulled (assuming not everyone picks the same provider, and even then the deployment might happen on a new node). If people need to update or be sure which version they use, they can still modify the deployment, but I'd like to avoid having to update this manifest with every push, since the approval process is not straightforward here - unless I ping someone explicitly, my PRs usually stay unmerged:) So having latest there at least gives me hope that in many cases the image used will actually be the latest:D

@andy108369
Collaborator

> Hmm, on one hand I do agree with you, but for new deployments it is quite probable that the latest image will be pulled (assuming not everyone picks the same provider, and even then the deployment might happen on a new node). If people need to update or be sure which version they use, they can still modify the deployment, but I'd like to avoid having to update this manifest with every push, since the approval process is not straightforward here - unless I ping someone explicitly, my PRs usually stay unmerged:) So having latest there at least gives me hope that in many cases the image used will actually be the latest:D

Hey @vpavlin, using :latest tags in Production is generally not recommended. Here’s why:

  1. Inconsistency: The :latest tag can lead to inconsistencies, as :latest today might not be the same as :latest tomorrow. This means different environments could be running different versions without clear visibility.

  2. Caching Issues: Akash aggressively caches images, including those tagged with :latest. Once an image is pulled on the Akash provider, it won’t be re-pulled there, which can unintentionally result in very-very outdated :latest-tagged images being used. This behavior with :latest tags is something I'm not a fan of, and I raised this issue back in 2021.
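To make the distinction concrete, here is a minimal Python sketch that sorts an image reference into floating (`:latest` or no tag), version-tagged, or digest-pinned. The helper name `classify_image_ref` and the `example/app` image are hypothetical and not part of Akash or Docker tooling; the function only inspects the string and assumes sha256 digests.

```python
import re

# Hypothetical helper for illustration; it only parses the reference string
# and never contacts a registry.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def classify_image_ref(ref: str) -> str:
    """Return 'digest', 'tag', or 'floating' for a container image reference."""
    name, _, digest = ref.partition("@")
    if digest:
        if not DIGEST_RE.fullmatch(digest):
            raise ValueError(f"malformed digest in {ref!r}")
        return "digest"
    # Only look after the last '/' so a registry port such as
    # 'localhost:5000/app' is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "floating"  # an omitted tag defaults to :latest
    tag = last_segment.split(":", 1)[1]
    return "floating" if tag == "latest" else "tag"


print(classify_image_ref("example/app:latest"))  # floating
print(classify_image_ref("example/app"))         # floating
print(classify_image_ref("example/app:1.2.3"))   # tag
```

A reference classified as floating is the case where a provider-side cache can silently serve an old image.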

@andy108369
Collaborator

andy108369 commented Aug 19, 2024

@vpavlin FWIW: sometimes even non-:latest-tagged images wreak havoc; see for example #547, where we had to append the sha256 manifest digest right after the image tag 🙄 (That was a PITA to debug before we knew)
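For readers unfamiliar with the "digest right after the image tag" form, the sketch below builds a reference of the shape `repo:tag@sha256:<digest>`. Everything in it is illustrative: `pin_with_digest` is a hypothetical helper, `example/app` and `1.2.3` are made-up names, and the digest is derived from dummy bytes rather than from any real image. In practice the digest would be copied from the registry.

```python
import hashlib
import re


def pin_with_digest(repo: str, tag: str, digest: str) -> str:
    """Build 'repo:tag@sha256:<hex>' after checking the digest format."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"expected 'sha256:' plus 64 hex chars, got {digest!r}")
    # When a digest is present, the runtime resolves the image by digest;
    # the tag stays in the string mainly as a human-readable hint.
    return f"{repo}:{tag}@{digest}"


# Placeholder digest computed from dummy bytes. NOT the digest of a real image.
placeholder = "sha256:" + hashlib.sha256(b"placeholder").hexdigest()

print(pin_with_digest("example/app", "1.2.3", placeholder))
```

The resulting string can be used wherever a plain `repo:tag` reference is accepted, and it keeps pointing at the same image content even if the tag is later moved.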

@vpavlin
Contributor Author

vpavlin commented Aug 19, 2024

Yeah, I understand all of this - and I hate this behaviour, but the Dria compute node is really early and is being updated basically daily, so latest is the best bet right now IMO - at least it makes it clear to users that if something is not working, they might want to check a particular version. (That is my perception - I never trust latest:D)

I can also omit the tag altogether to indicate that users should add one (it will default to latest), if that helps:)

@vpavlin
Contributor Author

vpavlin commented Aug 19, 2024

> @vpavlin FWIW: sometimes even non-:latest-tagged images go havoc, example #547 where we had to use the sha256 digest manifest right after the image tag 🙄 (That was PITA to debug before we knew)

Absolutely, I tend to use shas in "production" deployments to make sure the image I deploy is actually the one I want, but again, that only makes sense for images and projects with a reasonable release cycle; otherwise someone would have to send a PR here at least daily:) So I'd say keep latest, and I'll add a note that it is still under fast development and iteration and that users should check for the latest image and use the sha?

@andy108369
Collaborator

andy108369 commented Aug 19, 2024

> Yeah, I understand all of this - and I hate this behaviour, but the Dria compute node is really early and is being updated basically daily, so latest is the best bet right now IMO - at least it makes it clear to users that if something is not working, they might want to check a particular version. (That is my perception - I never trust latest:D)
>
> I can also omit the tag altogether to indicate that users should add one (it will default to latest), if that helps:)

@vpavlin It seems like there might be some confusion regarding the use of the "latest" image tags on Akash. As mentioned above, Akash aggressively caches images, which means that the "latest" tag on the providers may not actually correspond to the most recent version of the image. This discrepancy could lead to unexpected behavior during deployment, where the "latest" tag doesn't pull the newest version as intended.

To mitigate this, it is safer to specify exact image versions (or digests) to ensure that the desired image is deployed consistently across providers. This would help avoid any inconsistencies that could arise from Akash's image caching mechanism.
Could be a PITA when it comes to people running different "latest" tags.

I know it's an extra complication to keep updating the image in the SDL each time, but that's unfortunately the only way until this non-default K8s behaviour (akash-network/support#50) gets sorted.
As a compromise, you could add a link (say, as a comment right above the image: directive) so people know where to find the actual most recent image tag (not ":latest").

@andy108369
Collaborator

You can use the v0.1.4 tag:
https://hub.docker.com/r/firstbatch/dkn-compute-node/tags
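Putting the two suggestions together (a pinned tag, plus a comment pointing at the tags page), a rough Python sketch of how an SDL could be checked for floating tags might look like this. The SDL fragment is illustrative only: the service names, layout, and the `ollama/ollama:latest` line are assumptions and are not taken from the merged manifest. `find_floating_images` is a hypothetical helper that assumes unquoted `image:` values, one per line.

```python
import re

# Illustrative SDL fragment; service names and the ollama line are assumptions.
SDL_FRAGMENT = """\
services:
  dkn-compute:
    # Newer tags: https://hub.docker.com/r/firstbatch/dkn-compute-node/tags
    image: firstbatch/dkn-compute-node:v0.1.4
  ollama:
    image: ollama/ollama:latest
"""

IMAGE_LINE = re.compile(r"^\s*image:\s*(\S+)\s*$")


def find_floating_images(sdl_text: str) -> list[tuple[int, str]]:
    """Return (line number, reference) for images using :latest or no tag."""
    findings = []
    for lineno, line in enumerate(sdl_text.splitlines(), start=1):
        match = IMAGE_LINE.match(line)
        if not match:
            continue
        ref = match.group(1)
        if "@sha256:" in ref:
            continue  # digest-pinned, nothing to flag
        last_segment = ref.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
        if tag == "latest":
            findings.append((lineno, ref))
    return findings


print(find_floating_images(SDL_FRAGMENT))  # [(6, 'ollama/ollama:latest')]
```

A check like this could run before opening a PR, so a reviewer does not have to spot `:latest` by eye.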

@vpavlin
Contributor Author

vpavlin commented Aug 19, 2024

Yeah, yeah, I'll use the tags/hashes:-) Waiting for 0.1.5 to test it with GPUs and will update the PR

@vpavlin vpavlin force-pushed the feat/add-dria-node branch from 1ea8729 to 5e10a46 Compare August 19, 2024 20:10
@vpavlin
Contributor Author

vpavlin commented Aug 19, 2024

Confirmed with Dria team that GPU integration works, pinned ollama image to sha and pinned dkn image to a dev tag until they release v0.1.5.

@vpavlin vpavlin force-pushed the feat/add-dria-node branch from 5e10a46 to 6fccf52 Compare August 19, 2024 20:24
@andy108369 andy108369 merged commit 8af5b54 into akash-network:master Aug 19, 2024
3 participants