clickhouse-local build small enough to fit in a Lambda function #29378
Lowering the size to 50 MB should be doable.
The binary from master almost fits:
Packing the official build with UPX yields a 77 MB binary that takes ~1 second to start up. This doesn't use up much of the /tmp or /dev/shm storage, and the overhead is fine for a 30-60 second job. I think putting the binary in its own S3 bucket and downloading it at startup will work. Will report back later.
@occasionallydavid where does the 50 MB limit come from? I did some experiments with ClickHouse on AWS Lambda and was able to build and run even 2 GB custom debug Docker images successfully on Lambda. https://docs.aws.amazon.com/lambda/latest/dg/images-create.html
Just a little update on this: I got it working. ClickHouse needs a small patch to allow PR_SET_NAME to fail gracefully (Lambda does not support it).

With a 7076 MiB allocation (4 vCPUs), the container boots in around 5500 ms and can COUNT(*) 33 million rows from a zstd-compressed tab-separated file on S3 at around 44 MiB/s compressed / 651 MiB/s decompressed. With a 10000 MiB allocation (6 vCPUs), it reaches 59 MiB/s / 873 MiB/s decompressed. That works out to around $2.60/TB, compared to $5.00/TB for Redshift Spectrum, and that is still without trying an ARM build (which reduces cost by 20%). I haven't yet tried many complex queries, and Lambda only has 500 MB of local storage.

The ideal goal is to use around 220 functions to get 10 GiB/s of throughput, then perform the query in two steps: the first produces results for each individual file / partition, then a final query combines them. Do you think there is any easy way to tease useful information out of the ClickHouse parser to automatically write the combining (or the partitioned) query?
@occasionallydavid Good news! Maybe you can send a PR with the applied changes (a "draft" PR to be used as an example).
Yes, and it is already implemented; see the AggregateFunction documentation.
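To make the answer above concrete, here is a minimal sketch of the two-stage pattern using the -State / -Merge aggregate combinators; the bucket, file names, and column structure are invented for illustration, not taken from this thread:

```bash
# Stage 1 (run once per input object, e.g. one Lambda per file):
# emit a partial aggregate state rather than a final value.
clickhouse-local --query "
    SELECT countState() AS c
    FROM s3('https://example-bucket.s3.amazonaws.com/logs/part-0001.tsv.zst',
            'TSV', 'line String')
    FORMAT Native" > part-0001.state

# Stage 2 (single combining query): merge the partial states from all partitions.
clickhouse-local --query "
    SELECT countMerge(c) AS total
    FROM file('part-*.state', 'Native', 'c AggregateFunction(count)')"
```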
I will definitely send a PR for that PR_SET_NAME change.

Thinking about how this could be done properly, and especially about the cost of UPX (900 ms per invocation, roughly 10% of the entire runtime for my current input files): would it be crazy to think about a dedicated local-like mode just for Lambda? The current stripped + UPX'd binary is 48 MiB, almost comfortably small enough to fit in a ZIP rather than a Docker image (faster cold start). But to stay under 50 MiB, the wrapper code can be no larger than 2 MiB. A small statically linked HTTP server would do it, but it would still pay the UPX decompression cost on every invocation. So what about reusing the Poco HTTPServer that ClickHouse already links and implementing the Lambda runtime interface directly? UPX would then only run once during cold start, and the new special mode would take care of cleaning up and reinitializing after each invocation. It seems like it would not be a huge amount of work, but maybe cleaning up state between runs is more involved than I imagine right now.

In any case, it's still unclear how useful this whole setup is in general. Lambda's storage and RAM limits are quite severe, so this may always be of limited use, e.g. for simple log filtering / counting tasks.
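For reference, the Lambda runtime interface itself is quite small. A shell sketch of the documented Runtime API loop is below; it assumes the event payload is the raw SQL text and that the decompressed binary already sits in /tmp, both of which are simplifications rather than anything stated in this thread:

```bash
#!/bin/sh
# Minimal custom-runtime loop: long-poll for the next event, run it, post the result.
API="http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation"
while true; do
    HEADERS=$(mktemp)
    EVENT=$(curl -sS -LD "$HEADERS" "$API/next")
    REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)
    # Assumption for this sketch: the event body is the SQL query itself.
    RESULT=$(/tmp/clickhouse local --query "$EVENT" 2>&1)
    curl -sS -X POST "$API/$REQUEST_ID/response" -d "$RESULT" > /dev/null
    rm -f "$HEADERS"
done
```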
We can implement a self-extracting executable with zstd; it should be slightly better (I expect around 100..300 ms for decompression). It's also possible to create a Docker image with an overhead of only 1.5 MB. Implementing the Lambda API in the existing HTTP server is possible and fairly easy.

I think it's better to try Fargate.
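A rough shell sketch of the self-extracting idea (an in-process zstd unpacker as suggested above would avoid even this stub; the paths and names are placeholders):

```bash
# Build time: compress the real binary.
zstd -19 -f clickhouse -o clickhouse.zst

# Wrapper shipped as the entry point: decompress once per cold start into /tmp,
# which persists across warm invocations, then exec the real binary.
cat > clickhouse-wrapper <<'EOF'
#!/bin/sh
set -e
if [ ! -x /tmp/clickhouse ]; then
    zstd -d -f /opt/clickhouse.zst -o /tmp/clickhouse
    chmod 0755 /tmp/clickhouse
fi
exec /tmp/clickhouse "$@"
EOF
chmod +x clickhouse-wrapper
```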
My experience: I did implement the Lambda runtime interface and distributed query processing across multiple Lambdas. My conclusion is that it's unusable in the current Lambda design. The major issue is the limit on Lambda request/response size, 6 MB: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html That severely limits the queries you can run, especially when there are aggregations that need to transfer intermediate results, which can be on the order of gigabytes. There is also no support for streaming. An alternative would be to spill requests/results to S3 and use e.g. DynamoDB to implement communication between Lambdas, but things get complicated (hacky?) very quickly. I've noticed that Google Cloud Run supports streaming gRPC; that could be a more appropriate pay-per-use target. https://twitter.com/nvartolomei/status/1427409540790767616?t=yv5i60h0ljTYFXMvTpYlzQ&s=19
The 6 MB limit is ridiculous 🤣 And for distributed queries we would have to somehow discover the other Lambdas and connect to their exposed ports directly...
@nvartolomei For handling a mutating underlying set of files (e.g. growing logs), there is a lot of sense in writing individual results to S3, for a little extra cost in latency and price. This could be either a 1:1 mapping between an input object and some query result object, or a 1:1 mapping to an S3 multipart upload chunk. I was considering the former, as it means old results can be reused if a query is later re-run. It would also be possible to stream the result out over some TCP connection, but that would again need some extra design/infrastructure.

Re: Fargate, it is a good idea, but its cold start time is far too high for interactive use.

edit: using
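Separately, a sketch of the 1:1 input-object-to-result-object mapping described above, assuming a ClickHouse build recent enough to write through the s3 table function; bucket names are placeholders and credentials are taken from the environment purely for illustration:

```bash
# Each worker writes the partial aggregate state for its own input object back to S3;
# a later combining query can read results/* and merge (or reuse) them.
clickhouse-local --query "
    INSERT INTO FUNCTION s3('https://example-bucket.s3.amazonaws.com/results/part-0001.state',
                            '$AWS_ACCESS_KEY_ID', '$AWS_SECRET_ACCESS_KEY',
                            'Native', 'c AggregateFunction(count)')
    SELECT countState() AS c
    FROM s3('https://example-bucket.s3.amazonaws.com/logs/part-0001.tsv.zst',
            'TSV', 'line String')"
```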
Ok. Two more ideas:
Note: it won't work for queries that have more than one stage of coordination.
This sits far outside the Lambda model. It is definitely possible to start some background process to wire up networking, but having functions communicate while not serving an active request is not an intended use: every function must have a request active for its VM to remain unfrozen and to avoid random destruction by the orchestrator.

It would definitely be nice to explore running clickhouse-server inside Lambda. I'm content focusing on -local, as I probably don't have enough experience for the -server route yet. Your pointer to the AggregateFunction documentation was extremely useful, especially combined with discovering the ANTLR grammar. I will look at the higher-level execution problems later; for now, just getting a robust build of a wrapper + ClickHouse for Amazon Linux is enough of a pain :)
What is special about the build for Amazon Linux?
Not much: just an older libc, ancient compilers, and some build issues. There is a vendored …

I got it building with an ugly Dockerfile that also builds clang from scratch, as there don't seem to be any RPMs with a modern enough version of clang targeting the RHEL 7-like environment of Amazon Linux (though I didn't spend much time looking). The custom clang build may well be pointless; I tried it after seeing the readpassphrase() errors, assuming they might be a problem with the ancient linker.

Current config is:
Lambda definition:
Example event:
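The config, function definition, and example event attached to the original comment were not carried over into this page. Purely as an illustration (function name, ARNs, timeout, and the payload shape are all hypothetical; only the 7076 MiB memory size is taken from the earlier benchmark), a container-image function and a query-carrying test event might look like:

```bash
aws lambda create-function \
    --function-name clickhouse-local \
    --package-type Image \
    --code ImageUri=123456789012.dkr.ecr.eu-west-1.amazonaws.com/clickhouse-lambda:latest \
    --memory-size 7076 \
    --timeout 120 \
    --role arn:aws:iam::123456789012:role/clickhouse-lambda

aws lambda invoke \
    --function-name clickhouse-local \
    --cli-binary-format raw-in-base64-out \
    --payload '{"query": "SELECT count(*) FROM s3(...)"}' \
    response.json
```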
@occasionallydavid Can you share your Dockerfile? |
Hi Mike, IIRC the attached Dockerfile build will fail: the final link command line is missing a reference to libreadpassphrase. You must edit the command line to include libreadpassphrase (which does otherwise get built).

Dockerfile.clickhouse-build.txt

If you just want a binary to play with, there is one in the ZIP file at https://im-clickhouse-lambda.s3.eu-west-1.amazonaws.com/clickhouse.zip
@occasionallydavid Thanks, I had a hard time building CH until I used your patch.
Hi @mikeTWC1984, the custom bootstrap allows decompressing (via zstd) once at function cold start. Subsequent invocations do not pay the cost of decompression again, saving ~1-2 seconds per execution; see the comments above about memfd_create(). You can just use UPX, but in that case there is a large, expensive decompression step on every invocation, which was a double-digit percentage of my overall runs.
I tried all of these (and much more, including NAT piercing) approximately a year ago.
I've compiled ClickHouse with clang-16 (trunk) on my machine with the patch #40460 and the following options:
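The exact option set did not survive extraction of this comment. As a stand-in, a typical size-oriented configuration (these are real ClickHouse CMake switches, but whether they match the ones actually used here is an assumption) would look roughly like:

```bash
# ENABLE_LIBRARIES=0 drops most optional third-party integrations, which is the usual
# first lever for shrinking the binary; the embedded LLVM expression compiler and the
# auxiliary utils are also left out.
cmake .. -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER=clang-16 -DCMAKE_CXX_COMPILER=clang++-16 \
    -DENABLE_LIBRARIES=0 \
    -DENABLE_EMBEDDED_COMPILER=0 \
    -DENABLE_UTILS=0
ninja clickhouse
```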
It builds successfully, and the size of the binary is
Which is:
* PR_SET_NAME workaround

  AWS Lambda (and other virtualized platforms) lacks support for PR_SET_NAME, which causes a blocking exception. Pending an upstream PR or fix in ClickHouse, this patch allows the call to fail harmlessly. The resulting executable has been tested on various platforms without drawbacks, and is discussed in ClickHouse issue [29378](ClickHouse/ClickHouse#29378).

  $ sed -i '/Cannot set thread name/c\' /ClickHouse/src/Common/setThreadName.cpp

* Disable AVX2 support
Use case
clickhouse-local is probably the most powerful log-analysis tool I've ever come across, and it seems to obviate most of the pain of running a full ClickHouse installation (ops work, ETL) without sacrificing much (if any?) performance for offline batch tasks.

For use cases where large amounts of logs are stored in object stores like GCS or S3, the ability to stream-read from the object store and process with clickhouse-local is very desirable: no large, expensive VMs running a permanent database with a copy of the original data.

This raises a possibility: can ClickHouse be made totally serverless? It is very cheap to run short S3/GCS scan jobs from Lambda or Cloud Functions. Many organizations already use this pattern, but they usually write custom code to run in Lambda, while ClickHouse already covers many of these use cases in a much nicer form. However, the current ClickHouse binary is much too large to fit in a Lambda ZIP file (current max: 50 MB, vs. 300 MB+ for the current official binaries).
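As a concrete (entirely made-up) illustration of the pattern: stream a compressed log file straight from S3 and aggregate it locally, with no server and no copy of the data:

```bash
clickhouse-local --query "
    SELECT status, count() AS hits
    FROM s3('https://example-bucket.s3.amazonaws.com/logs/2021-09-*.tsv.zst',
            'TSV', 'ts DateTime, status UInt16, path String')
    GROUP BY status
    ORDER BY hits DESC"
```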
Describe the solution you'd like
Potentially a custom build, or simply some documented steps (PGO build?), to slim down clickhouse-local so it reads only from S3/GCS/URLs, with enough functionality disabled that it fits within 50 MB. Is it possible? I see lots of template-heavy C++; perhaps that is why the binary is so large, and it won't change :)

Describe alternatives you've considered
The obvious alternative is spinning up a container or spot instance to run a job, but this requires inventing some ops framework for managing the VMs and containers. Lambda functions can be fed e.g. by SNS and auto-scaled according to the queue of user queries, with zero management.
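For completeness, the SNS-fed wiring is only a few CLI calls; the topic name, region, and account ID below are placeholders, not anything from this issue:

```bash
aws sns create-topic --name clickhouse-queries
aws lambda add-permission \
    --function-name clickhouse-local \
    --statement-id sns-invoke \
    --action lambda:InvokeFunction \
    --principal sns.amazonaws.com \
    --source-arn arn:aws:sns:eu-west-1:123456789012:clickhouse-queries
aws sns subscribe \
    --topic-arn arn:aws:sns:eu-west-1:123456789012:clickhouse-queries \
    --protocol lambda \
    --notification-endpoint arn:aws:lambda:eu-west-1:123456789012:function:clickhouse-local
```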
Additional context
None. Low priority, just an idea, but one I've already tried because it makes so much sense over here.