[Question] Using threading.Thread in a Function #755

Closed
Stael opened this issue Oct 8, 2020 · 4 comments

Comments

Stael commented Oct 8, 2020

Hello!
We currently run several Azure Functions in production.

Use case:

  • Some of the functions receive data from HTTP triggers and send it to EventHub using the azure-eventhub package. (We previously used confluent-kafka with the Kafka API for Event Hub, but we recently ran into issues we could not solve because of how that API behaves, so we migrated to the azure-eventhub package, which went smoothly.)
  • The other functions process that data through EventHub triggers.

Usage:

from azure.eventhub import EventHubProducerClient, EventData

# Build the producer from the connection string, then send one event as a batch.
client = EventHubProducerClient.from_connection_string(_CONNECTION_STRING_)
client.send_batch([EventData(_SOME_DATA_)])
  • The first send takes between 600 ms and 1 sec.
  • After that it is very fast, around 50 ms per send.
  • However, if we do not send any data for more than 240 sec, the connection gets closed. Re-opening it takes ~1.5 sec, and sometimes more than 2.5 sec.
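
Timings like the ones above can be collected with a small helper. This is only a sketch: `timed_call` and `fake_send` are hypothetical names, and `fake_send` sleeps to stand in for `client.send_batch(...)` so the snippet runs without an Event Hub.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_send(payload):
    # Hypothetical stand-in for client.send_batch([EventData(payload)]);
    # the sleep mimics network latency.
    time.sleep(0.05)
    return len(payload)


result, elapsed_ms = timed_call(fake_send, "some data")
print(f"send took {elapsed_ms:.0f} ms")
```

Wrapping the real send call the same way would show the first-send, steady-state, and re-open latencies separately.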

Issue:

  • One of the functions redirects users (and sends data about the redirect to EventHub).
  • From a business standpoint, it is possible that no user is redirected for more than 240 sec.
  • It is not acceptable for the user to wait, which means we can neither open a new connection on every request nor make the user wait while the connection is re-opened.

Question:

  • Is it "ok" from a Function architecture standpoint to run client.send_batch([EventData(_SOME_DATA_)]) in a thread (threading.Thread)? That way we can redirect the user "almost instantaneously" and send the data to EventHub in the background.
  • I have done some testing and it seems to work fine, but I would like confirmation from you that there are no caveats before deploying it to production.
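
For illustration, the "respond first, send in the background" idea can also be sketched with a shared thread pool instead of a new thread per request. This is an assumption-laden sketch, not the setup described in this issue: `handle_request` and `slow_send` are hypothetical, and `slow_send` sleeps in place of the real `client.send_batch(...)` call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A small pool created once at import time, so it is reused across invocations.
_executor = ThreadPoolExecutor(max_workers=2)
sent = []


def slow_send(payload):
    # Hypothetical stand-in for client.send_batch([EventData(payload)]).
    time.sleep(0.5)
    sent.append(payload)


def handle_request(payload):
    # Hypothetical handler: queue the send, then answer without waiting for it.
    future = _executor.submit(slow_send, payload)
    return "302 redirect", future


start = time.perf_counter()
response, future = handle_request("click")
handler_ms = (time.perf_counter() - start) * 1000.0

# Waiting here is only for the demonstration; a real handler would not block.
future.result(timeout=5)
print(response, f"returned after {handler_ms:.1f} ms; sent={sent}")
```

A pool caps the number of concurrent sends, whereas one thread per request is unbounded under load.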

Thanks!

Contributor

Hazhzeng commented Oct 9, 2020

Hi @Stael, thanks for providing the example and sharing your scenario.
If you're running on Linux Consumption, the re-open time you mentioned is introduced by the cold start. The function instance is stateless and is deallocated after a long idle period (which also means the whole Python process shuts down, along with any threads inside it).
If your thread handles a stateless task, I think the Linux Consumption SKU is your best choice.
If you want the thread to handle stateful tasks and mitigate the cold start, I would suggest the Linux Dedicated or Linux Premium plan.

Author

Stael commented Oct 9, 2020

Hi @Hazhzeng,
Thank you for looking into my issue!

Here are more details on what actually happens.

We are using:

  • Premium V2 pricing tier
  • Operating system: Linux
  • Plan type: App service plan

I do not think that the re-open time is linked to the cold start. Here is why:

  1. When I reproduce the sequence (connect / send / wait for disconnection / send) locally, I get more or less the same execution time as when I run it inside an Azure Function. This shows, imho, that opening or re-opening a connection to EventHub simply takes some time (which is absolutely fine). Visible here
  2. When I use a thread in the Azure Function to send the data, the execution time is always < 50 ms, no matter how long I wait. This suggests, imho, that there is little to no cold start. Visible here

Here is a skeleton of our implementation:

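from functools import wraps
from threading import Thread

from azure.eventhub import EventHubProducerClient, EventData

def execute_in_thread(f):
    """Decorator: run f in a new thread and return to the caller immediately."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        # Fire and forget: the thread is neither joined nor returned.
        thread = Thread(target=f, args=args, kwargs=kwargs)
        thread.start()

    return wrapper

@execute_in_thread
def send_data(client: EventHubProducerClient, data: str):
    client.send_batch([EventData(data)])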
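
One thing worth keeping in mind with this pattern is that an exception raised inside the background thread never reaches the caller. A possible variant, sketched here under assumptions, logs failures and returns the thread so callers can join it if they want. `execute_in_thread_logged`, `failing_send`, and the logger name are hypothetical, and `failing_send` raises on purpose in place of a real `send_batch` failure.

```python
import functools
import logging
import threading

logger = logging.getLogger("background")


def execute_in_thread_logged(f):
    """Like execute_in_thread, but logs exceptions and returns the thread."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        def guarded():
            try:
                f(*args, **kwargs)
            except Exception:
                # Without this, the failure would only show up on stderr.
                logger.exception("background call %s failed", f.__name__)

        thread = threading.Thread(target=guarded, name=f"bg-{f.__name__}")
        thread.start()
        return thread

    return wrapper


errors = []


class _Collect(logging.Handler):
    """Test-style handler that keeps log messages in a list."""
    def emit(self, record):
        errors.append(record.getMessage())


logger.addHandler(_Collect())


@execute_in_thread_logged
def failing_send(data):
    # Hypothetical stand-in for a send_batch call that fails.
    raise ConnectionError("hub unreachable")


failing_send("payload").join()
print(errors)
```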

Given our configuration (Linux / App Service plan / Premium V2 pricing tier), can you confirm that there are no caveats with our use of Python's built-in threading library within an Azure Function app?

Thanks a lot!

Contributor

Hazhzeng commented Oct 9, 2020

Thanks @Stael for the quick turnaround. If your function app is running on Premium V2, I don't see any caveats.
I'm closing the thread now, but feel free to reopen it and let me know if you have any other questions.

Hazhzeng closed this as completed Oct 9, 2020

Author

Stael commented Oct 10, 2020

Awesome! Thank you very much for your help @Hazhzeng, really appreciate it :)
