IPFS and Gentoo Portage (distfiles) #296
A few things that I have hit so far:
|
Hi, thank you for pushing this! 👍
|
Please read the points below as suggestions for making something awesome even better.
Still using 0.4.16 so that sharding works locally, but the gateways are currently "broken" (ipfs/kubo#5270) - the main issue I had here was that
Yes, UrlStore seems interesting indeed; I just need to hook it into the existing sync process somehow.
Most of #212 is followed so far. However, I'm still having major issues with files that are deleted, and even more so with symlinks that are modified (this is due to using filestore).
It's up and "running" on /ipns/QmescA7sGoc4yZEe3Gof7dYt2qkkxDEXQPT2z84MpjVu8o/
One observation while exploring IPFS is that there are multiple commands that do almost the same thing; this is confusing to a degree, but even more so a time thief. |
You are absolutely correct. Please file a bug in go-ipfs.
Please file a bug in go-ipfs.
That often happens due to backwards compatibility concerns. Eventually, we'd like to release a new command with an entirely new, well-thought-out API (and make that the default for the 1.0 release). The current thinking is to make everything use something like "mfs". This should significantly reduce confusion, as all files will get names and can be managed through a file system. |
Thank you very much for your feedback!
It simply takes time because it is a huge dataset (I will try to collect actual figures), and I think that ipfs/kubo#4260 (comment) which includes
Thinking about it right now, I could use a
Ok, update: |
I'd expect it to take time, but it shouldn't be slower than an add. Could you try ipfs/kubo#5286? Is that any faster? Note: It may actually be slower as I haven't benchmarked it. |
With
One weird thing IMHO: old versions of the files, i.e. those that have changed or are missing, do not get removed by the verify command (the logs of the verify command can be found at QmZ35fVbUUMoTcz5a24f17qHvguZVwBiZ5nxz3pUwnhRjq). When I run verify again I get the same output; isn't it supposed to remove those links?
and similar lines |
Just a quick note about the time taken for verify... But the whole reason to run verify was to clean out deleted files and/or files whose contents have changed (timestamps and symlinks), which verify does not seem to do, and I have misunderstood how to deal with this. (for this to be viable the |
A smallish update: I have rewritten the sync script to loop over recently modified files, add them one by one with nocopy, and then update the relevant mfs nodes. Since everything is now in mfs, it is easy to get file lists as well. |
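The loop described above could look roughly like the following minimal Python sketch. It assumes the distfiles live in one local directory, that "recently modified" means an mtime newer than the previous sync, and that the mfs tree mirrors the disk layout under a hypothetical `/distfiles` root. `recently_modified`, `add_command` and `mfs_update_commands` are illustrative helper names, not part of the actual sync script, and the sketch only builds the `ipfs` command lines instead of running them.

```python
import os
from pathlib import Path


def recently_modified(root, since):
    """Return files under `root` whose mtime is newer than `since` (epoch seconds).

    lstat() is used so that a changed symlink is reported itself, not its target.
    """
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.lstat().st_mtime > since:
                recent.append(path)
    return sorted(recent)


def add_command(path):
    # --nocopy references the file via the filestore instead of copying it
    # (requires Experimental.FilestoreEnabled); -Q prints only the final CID.
    return ["ipfs", "add", "--nocopy", "-Q", str(path)]


def mfs_update_commands(cid, root, path, mfs_root="/distfiles"):
    # Hypothetical layout: the on-disk tree is mirrored below `mfs_root`.
    target = f"{mfs_root}/{Path(path).relative_to(root).as_posix()}"
    parent = target.rsplit("/", 1)[0]
    return [
        ["ipfs", "files", "mkdir", "-p", parent],
        # `files cp` will not overwrite an existing entry, so the old one is
        # removed first; this rm fails for brand-new files and the caller
        # has to tolerate that.
        ["ipfs", "files", "rm", target],
        ["ipfs", "files", "cp", f"/ipfs/{cid}", target],
    ]
```

In a real script each command would be run with `subprocess.run(cmd, check=True, capture_output=True, text=True)`, taking the CID from the stdout of the add step before building the mfs commands.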
Are you saying it regressed in 0.4.16? |
The test was done with commit 23f5cd4f0, in PR ipfs/kubo#5286 |
I've been working on something in this area; it's currently coming up here. The initial add is rather slow and takes days, which is probably because I'm using a 1TB storage instance from time4vps. A recheck with no new files added can complete in seconds, since all it does is list the folder structure in mfs and compare it against the files existing on disk based on file change time. (I would have liked to use xattrs, but, uh, nothing like that is supported there: Linux 2.6.32-042stab133.2 and all other kinds of weirdness.) So all in all: this might work, but it likely requires somewhat better hardware. |
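A rough sketch of such a recheck, assuming the paths currently in mfs have already been collected into a set (for example by walking `ipfs files ls`) and that the time of the previous sync is known. Comparing by change time follows the comment above; `scan_disk` and `plan_recheck` are hypothetical names used only for illustration, not ftp2mfs internals.

```python
import os
from pathlib import Path


def scan_disk(root):
    """Map each file's POSIX-style path relative to `root` to its change time."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            entries[path.relative_to(root).as_posix()] = path.lstat().st_ctime
    return entries


def plan_recheck(in_mfs, on_disk, last_sync):
    """Decide what to do with each path.

    in_mfs:    set of relative paths currently present in mfs
    on_disk:   mapping of relative path -> ctime, as returned by scan_disk
    last_sync: epoch seconds of the previous successful sync
    Returns (to_add, to_readd, to_remove) as sorted lists.
    """
    to_add = sorted(p for p in on_disk if p not in in_mfs)
    # Already mirrored, but changed (content, timestamps, symlink) since then.
    to_readd = sorted(p for p in on_disk if p in in_mfs and on_disk[p] > last_sync)
    # Deleted on disk: the stale mfs entry should be removed.
    to_remove = sorted(p for p in in_mfs if p not in on_disk)
    return to_add, to_readd, to_remove
```

When nothing has changed, all three lists come back empty and the only cost is the two listings, which is consistent with a recheck finishing in seconds.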
Okay, so I am able to put all the Gentoo distfiles onto IPFS in about half a day and even have a way to deal with the symlinks (not that that is necessary for a usable mirror). This could be optimized by increasing the block size to the maximum of 1 MB, which would leave a bit more than 400,000 blocks for the 400 GB. That's still too much. One further optimization would be to not provide all the blocks, but only files and folders. That might occasionally lead to situations where one has the root block of a file but not the actual content blocks, and can't find them either. But I suppose that would be rare. Problem is: there are 66941 files and 542 folders (today). |
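The block counts above can be checked with a little arithmetic. This sketch assumes decimal units (400 GB = 400 × 10^9 bytes, 1 MB = 10^6 bytes) to match the comment, and compares against the go-ipfs default fixed-size chunker of 256 KiB. It only counts leaf blocks for the total volume, so it is a lower bound: each file's final partial chunk and the intermediate DAG nodes add a few more, which fits the "a bit more than 400,000" estimate.

```python
TOTAL_BYTES = 400 * 10**9    # "the 400 GB", decimal units
DEFAULT_CHUNK = 256 * 1024   # go-ipfs default chunker (size-262144)
LARGE_CHUNK = 10**6          # "1 MB" blocks, decimal as in the comment
FILES, FOLDERS = 66941, 542  # counts quoted in the comment


def min_leaf_blocks(total_bytes, chunk_size):
    """Lower bound on leaf blocks (integer ceiling division)."""
    return -(-total_bytes // chunk_size)


def provider_records():
    """Records to announce under each strategy discussed in the comment."""
    return {
        "every block, 256 KiB chunks": min_leaf_blocks(TOTAL_BYTES, DEFAULT_CHUNK),
        "every block, 1 MB chunks": min_leaf_blocks(TOTAL_BYTES, LARGE_CHUNK),
        "files and folders only": FILES + FOLDERS,
    }


for strategy, count in provider_records().items():
    print(f"{strategy:>28}: {count:>9,}")
```

Providing only files and folders cuts the number of records by roughly a factor of six compared with 1 MB blocks, which is why it is attractive despite the risk of unfindable content blocks.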
I think it is important not to forget to use standard settings where possible, so that blocks can be reused between distros/mirrors |
To clarify - given the large number of files and the request that they each be individually findable (i.e. instead of finding the data as
Some things you can do about this now:
This problem (making huge numbers of files accessible over the network) is IMO a pretty important one to deal with and has a lot of moving parts. I'm hoping go-ipfs will make some progress here next year, but given some of the complexities we'll just have to see how it goes 😃. |
Thanks for putting in the effort and time to make this happen! |
I actually only need to find things by one root, e.g. as
[Edit:]
to be run after the sync but before the publish. It does seem to work with some oddities: e.g. I can't seem to |
@jcaesar Yes, that's absolutely a problem. I'd like to make Bitswap sessions a little more complex so that a session can track exactly why a request is being made. For example, I'm not just asking for QmFoo; I'm trying to download QmFoo because I was asked to get QmBar/field1. This means that if the DHT fails to find QmFoo, the session could go up the path and eventually hit QmBar. At the very least, we'd then ask the peer with QmBar whether they have QmFoo, which seems totally plausible. This isn't the be-all and end-all solution, but IMO it would be a useful step forward here. I suspect people will start putting together issues describing proposals for how to deal with this type of provider-record problem over the next couple of months, and if/when that happens I'll do my best to remember to link to this issue 🤞. |
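The fallback proposed above can be illustrated with a small sketch. This is not the Bitswap or DHT API: `find_providers` is a hypothetical callable standing in for a provider-record lookup, and `path_cids` is assumed to be the chain of blocks walked from the requested root (QmBar) down to the wanted block (QmFoo).

```python
from typing import Callable, List, Sequence


def providers_with_fallback(
    path_cids: Sequence[str],
    find_providers: Callable[[str], List[str]],
) -> List[str]:
    """Return candidate peers for the last CID in `path_cids`.

    Tries the wanted block first; if nobody advertises it, walks up the
    path toward the root, on the assumption that a peer holding a parent
    (e.g. QmBar) plausibly also holds its children (e.g. QmFoo).
    """
    for cid in reversed(path_cids):
        peers = find_providers(cid)
        if peers:
            return peers
    return []
```

Because the walk starts at the leaf, a specific record always wins when one exists, and ancestors are only consulted as a fallback. With something like this in place, announcing only files and folders would be less likely to strand content blocks that nobody advertises.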
I came across this and was wondering whether any of the previous authors (@NiKiZe, @jcaesar) have retried their experiments using the latest IPFS version. I would be interested to know whether the issues described here are still relevant or whether this is something that could actually be done. :) //edit: The issue regarding symlinks still seems to be open. |
I think they are still relevant. And the best (most scalable) approach would be if IPFS links could be included in the ebuilds |
I was running a Gentoo mirror with my ftp2mfs thingie on a time4vps storage VPS. I deleted it three months ago because:
I think it should be possible to run a working gentoo mirror on IPFS if you spend a bit more hardware on it. If you'd like to try ftp2mfs yourself, you roughly have to:
@NiKiZe: Having the IPFS CID for each distfile in the ebuilds actually worsens the problem with the number of hashes you have to keep available to the DHT / makes the "provide only the folder" trick impossible. And you can absolutely run a successful mirror without having IPFS information in the package files, as the Arch mirror demonstrates. (As for the symlink issue: ftp2mfs does have a mechanism to resolve symlinks with copies (all copies are shallow in IPFS after all) and keep those copies up to date, but that isn't really necessary for a functioning distfiles mirror.) |
Hello, how active is the project? Are the packages still up to date? |
Just as with #84, Gentoo could use the same concepts. Creating this issue to share information about and track progress.
Relevant Gentoo forum thread
Some info about Gentoo distfile mirrors: https://wiki.gentoo.org/wiki/Project:Infrastructure/Mirrors/Source
Don't know yet, but I expect everything to be 400-500 GB of data.
Currently working on creating a Gentoo distfiles mirror, using concepts similar to VictorBjelkholm/arch-mirror
Updates coming once I have succeeded with the initial sync and started testing.
Current WIP repo NiKiZe/Gentoo-distfiles-IPFS