Datastore interface is a bottleneck #165
Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
Finally, remember to use https://discuss.ipfs.io if you just need general support.
It might be beneficial to introduce the concept of the The The method In Go, see also SectionReader.
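A minimal Go sketch of the idea, assuming a toy in-memory store: once a stored value can be exposed as an io.ReaderAt, the standard library's io.NewSectionReader already provides the offset/size window and the Seek that this issue asks for. openValue and readRange are hypothetical helpers for illustration, not part of go-datastore.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// openValue is hypothetical: it stands in for a datastore that returns an
// io.ReaderAt over a stored value instead of a freshly allocated []byte.
func openValue(store map[string][]byte, key string) (io.ReaderAt, int64, bool) {
	v, ok := store[key]
	if !ok {
		return nil, 0, false
	}
	// *bytes.Reader implements io.ReaderAt without copying v.
	return bytes.NewReader(v), int64(len(v)), true
}

// readRange returns up to size bytes starting at offset. A window that runs
// past the end of the value is truncated rather than treated as an error.
func readRange(ra io.ReaderAt, offset, size int64) ([]byte, error) {
	if offset < 0 || size < 0 {
		return nil, fmt.Errorf("invalid range: offset=%d size=%d", offset, size)
	}
	sec := io.NewSectionReader(ra, offset, size)
	buf := make([]byte, size)
	n, err := io.ReadFull(sec, buf)
	if err == io.ErrUnexpectedEOF || err == io.EOF {
		err = nil
	}
	return buf[:n], err
}

func main() {
	store := map[string][]byte{"/blocks/a": []byte("hello, datastore")}
	ra, _, ok := openValue(store, "/blocks/a")
	if !ok {
		panic("missing key")
	}

	part, err := readRange(ra, 7, 9)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(part)) // datastore

	// A SectionReader also supports Seek within its window.
	sec := io.NewSectionReader(ra, 7, 9)
	if _, err := sec.Seek(4, io.SeekStart); err != nil {
		panic(err)
	}
	rest, err := io.ReadAll(sec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(rest)) // store
}
```

The datastore would only need to hand out something implementing io.ReaderAt; callers then get partial reads and seeking from the standard library without any new methods of their own.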
This doesn't really seem like a datastore problem so much as an issue in the FUSE implementation. Sure, you could create a … However, the application layer (i.e. your FUSE implementation) probably has a better idea of what it's trying to do than a datastore might. For example, if a block of data is 1 MiB and you're reading it in 4 KiB chunks, but are ultimately likely to read the full block, you could just read the 1 MiB from disk once and hold it until the file descriptor closes or some LRU cache limit is exceeded. Given that most of the current implementations of the datastore interface that I know of do not implement …

If you could implement this more efficiently in the application layer, we could then figure out if and how it might be extracted to work across more applications.

Note: If you're working with the go-ipfs FUSE implementation, I agree that it could use a number of improvements (performance and otherwise). You may want to check through this issue to see some work that was done. There was a contributor working on improving FUSE, but he hasn't had the time to push things along. Perhaps you'd be interested in helping out.
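A rough sketch of that application-layer caching, assuming a hypothetical fetch function in place of the real datastore Get (whose actual signature differs). Whole blocks are fetched once and kept in a small LRU bounded by a byte budget; the 4 KiB reads a FUSE layer issues are then served from memory. blockCache, newBlockCache, and readAt are illustrative names only.

```go
package main

import (
	"container/list"
	"errors"
	"fmt"
	"io"
)

// blockCache is a hypothetical application-layer cache: it fetches a whole
// block once and serves small reads from memory, evicting the least recently
// used block when the byte budget is exceeded.
type blockCache struct {
	fetch    func(key string) ([]byte, error) // stand-in for a datastore Get
	maxBytes int                              // budget for cached block data
	used     int                              // bytes currently cached
	order    *list.List                       // front = most recently used
	entries  map[string]*list.Element         // key -> element holding *cacheEntry
}

type cacheEntry struct {
	key  string
	data []byte
}

func newBlockCache(maxBytes int, fetch func(string) ([]byte, error)) *blockCache {
	return &blockCache{
		fetch:    fetch,
		maxBytes: maxBytes,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

// get returns the whole block, calling fetch only on a cache miss.
func (c *blockCache) get(key string) ([]byte, error) {
	if el, ok := c.entries[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*cacheEntry).data, nil
	}
	data, err := c.fetch(key)
	if err != nil {
		return nil, err
	}
	c.entries[key] = c.order.PushFront(&cacheEntry{key: key, data: data})
	c.used += len(data)
	// Evict least recently used blocks until within budget, always keeping
	// the block that was just added.
	for c.used > c.maxBytes && c.order.Len() > 1 {
		oldest := c.order.Back()
		e := oldest.Value.(*cacheEntry)
		c.order.Remove(oldest)
		delete(c.entries, e.key)
		c.used -= len(e.data)
	}
	return data, nil
}

// readAt copies block bytes starting at off into p, like a small FUSE read.
func (c *blockCache) readAt(key string, p []byte, off int) (int, error) {
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	data, err := c.get(key)
	if err != nil {
		return 0, err
	}
	if off >= len(data) {
		return 0, io.EOF
	}
	return copy(p, data[off:]), nil
}

func main() {
	fetches := 0
	fetch := func(key string) ([]byte, error) {
		fetches++
		return make([]byte, 1<<20), nil // pretend every block is 1 MiB
	}
	c := newBlockCache(4<<20, fetch)
	buf := make([]byte, 4<<10) // 4 KiB reads
	for off := 0; off < 1<<20; off += len(buf) {
		if _, err := c.readAt("/blocks/a", buf, off); err != nil {
			panic(err)
		}
	}
	fmt.Println("reads:", (1<<20)/len(buf), "fetches:", fetches) // reads: 256 fetches: 1
}
```

The trade-off is memory for repeated datastore round-trips: 256 small reads cost one fetch, and the byte budget caps how much is held at once.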
That's not the case :^/ I'm available to work on this if it's something the project wants, but I need someone with authority to discuss direction and decisions with me as well as just generally review things from time to time. Feel free to reach out and we can collaborate on this.
Oh hey @djdv. Didn't mean to offend you at all, sorry if it came off like that. If I left you hanging, I apologize. I've been a bit swamped, but things are starting to clear up and we're making plans for next year. I'll reach out so we can catch up and figure out the next steps.
This might in fact be seen as a problem of the go-ipfs FUSE implementation, but it will be inherent to any application using the datastore interface. This is the reason for the issue. I also think that applications using the datastore could be improved, as @aschmahmann said, but that would be putting lipstick on a pig. In the case of ds-flatfs, the current implementation can easily be improved in many ways. @djdv, because of Go's garbage collector, optimizations are less obvious than in most other languages. If the user can't manage the …
I think some might be concerned about backward compatibility and about porting all libraries at once, but this is not necessary when changing the interface in such a small way. A new datastore method can be added to the interface, keeping … The main problem is allocating new byte slices on each call to … In turn, this would allow us to provide alternative, lazy loading of values returned by the datastore interface. The same happens when you open a file on your file system.
FlatFS is only one of many datastores using these interfaces (Badger, LevelDB, S3, Azure, Memory, Pebble, ...). There's nothing wrong with adding more to go-ds-flatfs and then checking for optimizations if the datastore supports them. If it turns out that more than one or two databases support the given functionality, we can talk about adding a new extension interface (e.g. TTLStore), but until then Go lets you just check the interface type without needing to declare that interface globally.
@aschmahmann indeed, maybe that could be an extension checked with a Go interface type assertion.
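A minimal sketch of the extension-interface approach, assuming simplified string-keyed interfaces (the real go-datastore methods take a context and a ds.Key). GetRange is a hypothetical optional method: readRange uses it when the store happens to provide it, detected with a type assertion against a locally declared interface, and otherwise falls back to Get plus slicing.

```go
package main

import (
	"errors"
	"fmt"
)

// Getter is a simplified, hypothetical version of the base read method; the
// real go-datastore interface takes a context.Context and a ds.Key.
type Getter interface {
	Get(key string) ([]byte, error)
}

var errNotFound = errors.New("not found")

// mapStore implements only the base interface.
type mapStore map[string][]byte

func (m mapStore) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errNotFound
	}
	return v, nil
}

// rangeStore also offers a hypothetical GetRange extension. A real store
// would read only the requested bytes; this toy one slices an in-memory value.
type rangeStore struct {
	mapStore
	rangeReads int // counts calls that took the extension path
}

func (r *rangeStore) GetRange(key string, off, size int) ([]byte, error) {
	r.rangeReads++
	v, err := r.mapStore.Get(key)
	if err != nil {
		return nil, err
	}
	return clip(v, off, size), nil
}

// clip returns v[off:off+size], truncated to the bounds of v.
func clip(v []byte, off, size int) []byte {
	if off < 0 || size < 0 || off >= len(v) {
		return nil
	}
	end := off + size
	if end > len(v) {
		end = len(v)
	}
	return v[off:end]
}

// readRange checks for the extension with a type assertion against a
// locally declared interface, so the base Getter interface stays unchanged.
func readRange(s Getter, key string, off, size int) ([]byte, error) {
	if rs, ok := s.(interface {
		GetRange(key string, off, size int) ([]byte, error)
	}); ok {
		return rs.GetRange(key, off, size)
	}
	v, err := s.Get(key)
	if err != nil {
		return nil, err
	}
	return clip(v, off, size), nil
}

func main() {
	data := map[string][]byte{"k": []byte("hello, datastore")}
	plain := mapStore(data)
	fast := &rangeStore{mapStore: mapStore(data)}

	a, _ := readRange(plain, "k", 0, 5)
	b, _ := readRange(fast, "k", 7, 9)
	fmt.Println(string(a), string(b), fast.rangeReads) // hello datastore 1
}
```

Callers get the same result from either store; only stores that opt in by implementing GetRange take the optimized path, and no global interface declaration has to change.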
Curious to see what the flamegraph looks like on this branch: https://github.com/djdv/go-ipfs/tree/tmp
I don't remember much about the original FUSE implementation's use of the Datastore API directly, but I can say that in the branch above, a FileSystem interface is defined which is used to interface with FUSE and other host APIs. I suppose these things are considered to be a bridge between the 'File System' and 'Application' layers. For example, we're implementing … A good example of this is the UFS wrapper.
As was mentioned above, given that this request comes from the application layer (a FUSE implementation of go-ipfs wants to be able to read partial file data) and that a large number of steps and refactors would be required to provide that functionality natively rather than on top of the application layer (e.g. by just caching the bytes instead of repeatedly retrieving them), I'm going to close this issue. For context, some of the refactors that might be required to make this work include:
If someone is interested in this, I'd recommend going from the application layer downwards rather than from the lowest-layer interface upwards. Go makes it so you can just make new interfaces and do type checking on them rather than needing to change the base interface declaration, which makes this even easier to do.
Hey, I just want to raise an issue that the datastore interface is a bottleneck for high-performance usage, e.g. in a FUSE mount, because it doesn't provide Seek or a partial read function with offset and size.