-
TL;DR: What is the performance increase from implementing public S3 access and MongoDB? Has it ever been measured?

To the best of my knowledge, the companies using these libraries run production servers on Amazon Web Services. I am currently using your h5p-client and h5p-server ReactJS project, with several modifications, to implement development, testing, and production servers for my employer. Very soon I will be implementing the backend.

What is the performance difference between hosting everything inside one backend Lambda function and hosting each part separately: MongoDB, S3, etc. for the library, content, and metadata, respectively? Also, is there any indication of how much performance is gained by using public rather than private S3 endpoints?

I am preparing and collecting information about this problem, but if somebody with more experience on this issue could give some pointers, I would be much obliged. For context: I might be serving data to 100 users at first, then as many as 1,000, and probably never more than 10,000.
-
Do you mean the functionality to serve files directly from S3, as described in the docs section "Increasing scalability by getting content files directly from S3"? Or do you mean how the Mongo/S3 storages compare to the file system storages?

The Mongo/S3 storage classes are certainly superior to file system storage in general, especially when listing content. Their main benefit, however, is scalability beyond a single instance, as the file system storage classes won't work (well) across machines or containers. If you want to use the library in a scaled environment, you'll have to use Mongo/S3 (and Redis). If you use public S3 links as described in the docs section above, the performance gain depends on several factors.
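To make the trade-off concrete: with a private bucket, every content file is either streamed through your backend or served via a presigned URL you generate per request, whereas with public access the player fetches the object URL directly. Here is a minimal TypeScript sketch of the two URL styles; the bucket name, region, and key are hypothetical, and the presigned variant is only indicated in a comment because it requires the AWS SDK and credentials:

```typescript
// Builds the public virtual-hosted-style URL for an S3 object.
// This only works if the bucket policy allows public reads.
function publicS3Url(bucket: string, region: string, key: string): string {
    // encodeURIComponent would mangle the "/" separators in the key,
    // so encode each path segment individually instead.
    const encodedKey = key.split("/").map(encodeURIComponent).join("/");
    return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

// With a private bucket you would instead generate a short-lived
// presigned URL per request, e.g. with @aws-sdk/s3-request-presigner:
//   getSignedUrl(s3Client, new GetObjectCommand({ Bucket, Key }), { expiresIn: 300 })
// That signing step, plus the round trip through your backend to obtain
// the URL, is exactly the overhead the public-endpoint option removes.

console.log(publicS3Url("my-h5p-content", "eu-central-1", "content/42/images/earth.png"));
```

Whether removing that per-request signing work is worth making the bucket world-readable is the cost/benefit question discussed below.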
To sum it up, I don't think it's worth using the public S3 endpoint functionality unless you have a very big user base. Keep in mind that serving files directly from the S3 service is charged for by AWS (as far as I know) and that you have to put something like CloudFront in front of it to really get high speed.

We don't have measurements of the performance gains. The Mongo/S3 content storage adapter is not in large-scale use that we manage ourselves. I wrote it as contractual work, and I don't know exactly how the customer is using the adapter or how it performs depending on the configuration. I haven't heard any complaints, though.

If you want to use the Mongo/S3 library storage adapter, I would highly recommend using CloudFront to serve the library files and changing the library files URL accordingly. H5P creates a ton of HTTP requests, and they must be as fast as possible. h5p-server has a cache-busting mechanism that allows safe use of a CDN for library files. In general, I think most of the performance issues can be solved by having super-fast, cached library file storage. It's also important to use the cached library storage to improve metadata access times; the gains can be more than 1000% compared to uncached access.

I haven't used AWS Lambda, as we deploy on self-hosted Kubernetes. I think you should use AWS S3, DocumentDB, and their Redis implementation as backend services. It should be fine to put all H5P functionality into a single Lambda. H5P wasn't designed for a microservice approach, so splitting the core H5P functionality into several services is really hard: there are only a few routes, and each contains a lot of functionality. (I have a few thoughts on improving this, but I don't have time for it at the moment.) Your custom functionality to perform CRUD on H5Ps can be separated into its own Lambda, but I don't think you really have to.
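The cached-library-storage point can be illustrated with a generic memoizing wrapper. To be clear, this is not h5p-server's actual cached storage class, just a sketch of the idea with a hypothetical, much-simplified interface: metadata reads are answered from an in-memory map after the first hit, which is what turns repeated per-request storage round trips into dictionary lookups.

```typescript
// Hypothetical minimal interface for a library metadata store;
// the real h5p-server storage interfaces are much richer than this.
interface ILibraryMetadataStore {
    getMetadata(machineName: string): Promise<object>;
}

// Wraps any metadata store with an in-memory cache. H5P resolves
// library metadata on nearly every request, so even a naive cache
// like this removes the bulk of the storage round trips.
class CachedMetadataStore implements ILibraryMetadataStore {
    private cache = new Map<string, object>();

    constructor(private inner: ILibraryMetadataStore) {}

    async getMetadata(machineName: string): Promise<object> {
        const hit = this.cache.get(machineName);
        if (hit !== undefined) {
            return hit; // cache hit: no call to the backing storage
        }
        const metadata = await this.inner.getMetadata(machineName);
        this.cache.set(machineName, metadata);
        return metadata;
    }

    // Must be called whenever a library is updated, otherwise stale
    // metadata keeps being served.
    invalidate(machineName: string): void {
        this.cache.delete(machineName);
    }
}
```

A real implementation would also need invalidation across instances (e.g. via Redis), which is why the library's own caching plus CDN cache busting is the safer route.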
One thing to bear in mind is that you might run into issues if users upload massive .h5p files. The library extracts these files to local temporary storage to perform validation and then persists them in the backend storage. (Pure in-memory validation turned out to be a problem in the past.) From what I've seen, there is only about 500 MB of temporary storage in Lambdas, so this might be a problem.

Are your user numbers concurrent users or total users? If they are total users, I wouldn't worry too much about premature performance optimization.
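To guard against that /tmp limit, you could reject oversized uploads before extraction even starts. A rough sketch: the 512 MB figure is AWS's documented default for Lambda ephemeral storage (it is configurable up to 10 GB nowadays), but the 2x extraction headroom factor is an assumption you would tune for your content, not a measured ratio.

```typescript
// Default Lambda ephemeral storage (/tmp) is 512 MB; it can be
// raised via the function's ephemeral storage setting.
const DEFAULT_TMP_BYTES = 512 * 1024 * 1024;

// .h5p files are zip archives, so the extracted size exceeds the
// upload size. 2x is an assumed worst case; packages dominated by
// already-compressed media (images, video) expand much less.
const EXTRACTION_HEADROOM = 2;

// Returns true if an uploaded .h5p file of the given size can
// plausibly be extracted and validated within the available /tmp.
function canValidateInTmp(
    uploadBytes: number,
    tmpBytes: number = DEFAULT_TMP_BYTES
): boolean {
    return uploadBytes * EXTRACTION_HEADROOM <= tmpBytes;
}
```

A check like this in your upload route turns a hard-to-diagnose "no space left on device" failure inside the Lambda into a clean error response to the user.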