LIVE-6506 parameter for concurrent message processing #1221
Conversation
Excellent work 🔥
```diff
  val deliveryServiceStream: Stream[IO, Fcm[IO]] =
-   Stream.emits(fcmClients).covary[IO].flatMap(_.fold(e => Stream.raiseError[IO](e), c => Stream.eval[IO, Fcm[IO]](IO.delay(new Fcm(c)))))
+   Stream.emits(fcmClients).covary[IO].flatMap(_.fold(e => Stream.raiseError[IO](e), c => Stream.eval[IO, Fcm[IO]](IO.pure(new Fcm(c)))))
```
I'm not super familiar with the `IO` API. Curious about the decision to change the `delay` function to a `pure` one?
If we create the `IO` with the `delay` method, `new Fcm` will be executed every time we evaluate this `IO` object. Now that we evaluate this `IO` to get the `Fcm` for every device token, it seemed better not to create a fresh `Fcm` instance for every device, so I used `pure` here to create the `Fcm` instance once and store that instance in the `IO`. The `Fcm` does not hold any state, so it should be safe to reuse the same instance.
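The `delay` vs `pure` distinction can be sketched without cats-effect. This is a toy model, not the real `IO` API: `delay` stores a by-name thunk that is re-run on every evaluation, while `pure` wraps a value computed once at construction time.

```scala
// Toy model of the IO.delay vs IO.pure distinction (NOT the cats-effect API):
// delay captures a by-name thunk re-run on every evaluation; pure captures a
// value computed once, so the same instance is reused.
final class SimpleIO[A](thunk: () => A) {
  def unsafeRun(): A = thunk()
}

object SimpleIO {
  // by-name parameter: evaluation deferred until unsafeRun, repeated each run
  def delay[A](a: => A): SimpleIO[A] = new SimpleIO(() => a)
  // strict parameter: evaluated once here, then returned on every run
  def pure[A](a: A): SimpleIO[A] = { val v = a; new SimpleIO(() => v) }
}

// Stand-in for the stateless Fcm client discussed above
final class Fcm

object FcmDemo {
  def main(args: Array[String]): Unit = {
    var constructed = 0
    val viaDelay = SimpleIO.delay { constructed += 1; new Fcm }
    viaDelay.unsafeRun(); viaDelay.unsafeRun()
    println(s"delay: $constructed constructions") // 2: one per evaluation

    constructed = 0
    val viaPure = SimpleIO.pure { constructed += 1; new Fcm }
    viaPure.unsafeRun(); viaPure.unsafeRun()
    println(s"pure: $constructed construction") // 1: built eagerly, instance reused
  }
}
```

This mirrors why `IO.pure(new Fcm(c))` runs the constructor once per client, whereas `IO.delay` would rebuild it on every evaluation. One trade-off to note: with `pure` the construction happens eagerly, outside `IO`'s error handling.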
Thanks for the explanation
```diff
  .broadcastTo(
-   reportBatchSuccesses(chunkedTokens, sentTime, functionStartTime, sqsMessageBatchSize),
+   reportBatchSuccesses(chunkedTokens, sentTime, functionStartTime, sqsMessageBatchSize, awsRequestId),
```
Good shout to log out the request id
```scala
trait SenderRequestHandler[C <: DeliveryClient] extends Logging {
  implicit val timer: Timer[IO] = IO.timer(ec)
  implicit val logger: Logger = LoggerFactory.getLogger(this.getClass)

  def reportSuccesses[C <: DeliveryClient](chunkedTokens: ChunkedTokens, sentTime: Long, functionStartTime: Instant, sqsMessageBatchSize: Int): Pipe[IO, Either[DeliveryException, DeliverySuccess], Unit] = { input =>
    logger.info("Java version: " + System.getProperty("java.version"))
    logger.info(s"Max heap size: ${Runtime.getRuntime().maxMemory()}")
```
Nice, good shout to log out available memory
```scala
  event
}
val localTestContext: Context = new Context {
```
I wonder whether a mock might be more appropriate here if we only need to have one of the variables defined?
If we want to use mock objects, we may need to use specs2 and extend the class with Mockito (correct me if I'm wrong), but this `NotificationWorkerLocalRun` is not a test case. It is more of a command-line entry class that lets us run the Android sender lambda code locally.
Oh I'm sorry, my mistake, I thought this was a test file. Yeah, that makes sense for a real invocation.
What does this change?
We are evaluating the throughput of our new implementation with the Firebase individual send operation on PROD.
When we tested it on fairly large notifications (with more than 40,000 subscribers), we saw the Android sender lambda throw an exception when it attempted to create threads. In these cases, the lambda was invoked with nearly 20 SQS messages, and when it processed the 17th or 18th message it threw the exception and stopped working; it was then terminated by the AWS Lambda service when it went over the 3-minute timeout. It appears that the lambda runtime may not have sufficient resources (threads / file descriptors) to support 20 HttpClient instances.
This PR addresses this problem by introducing a parameter for concurrent message processing -
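Since the exact change is listed above, here is only a generic sketch of the idea behind a concurrency parameter. All names here are hypothetical, and it uses plain `java.util.concurrent` rather than the project's fs2/cats-effect stack: a fixed-size pool bounds how many messages are handled at once, so resource usage no longer grows with the batch size.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch, not the PR's code: maxConcurrency stands in for the
// new parameter; a fixed-size pool caps how many messages run at once.
object BoundedProcessing {
  // Processes all messages with at most maxConcurrency running concurrently;
  // returns the peak concurrency actually observed (always <= maxConcurrency).
  def processAll(messages: Seq[String], maxConcurrency: Int)(handle: String => Unit): Int = {
    val pool = Executors.newFixedThreadPool(maxConcurrency)
    val inFlight = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    messages.foreach { m =>
      pool.execute { () =>
        val now = inFlight.incrementAndGet()
        peak.accumulateAndGet(now, (a, b) => math.max(a, b)) // record high-water mark
        try handle(m)
        finally inFlight.decrementAndGet()
      }
    }
    pool.shutdown()
    pool.awaitTermination(30, TimeUnit.SECONDS)
    peak.get()
  }
}
```

In the fs2 world the same effect is typically achieved with `parEvalMap(maxConcurrency)(...)` on the stream, which bounds concurrency without managing a dedicated pool.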
How to test
I ran the android lambda locally simulating a batch of 4 messages each with 5 device tokens. I set the configuration as -
I was able to observe the logs written in the expected order -
On CODE, we sent a test notification and my Android emulator received it successfully. We were also able to see the AWS request ID in some of the logs we use to measure the throughput:
(Screenshot 2024-04-30 at 12 46 45: throughput logs showing the AWS request ID)
(Screenshot 2024-04-30 at 12 36 11: throughput logs)
As an aside, we printed out the maximum JVM heap size obtained via the `Runtime.getRuntime().maxMemory()` call, and it was around 10GB, nearly the amount we set for the lambda. This confirmed that the Java runtime was using nearly all the memory allocated.

(Screenshot 2024-04-30 at 12 35 37: log line showing the maximum heap size)
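For reference, the heap figure quoted above comes from the standard `Runtime` API. A minimal helper (the object name is made up for this sketch):

```scala
// Minimal wrapper around the standard Runtime API used in the log line above.
object HeapInfo {
  // Maximum heap the JVM will attempt to use (-Xmx, or a container-derived limit)
  def maxHeapBytes: Long = Runtime.getRuntime().maxMemory()

  def maxHeapGiB: Double = maxHeapBytes.toDouble / (1024L * 1024L * 1024L)
}
```

On the lambda described above this value came out at roughly 10GB, close to the configured memory.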
How can we measure success?
We have been testing the implementation with small notifications on PROD. It is a success if
Have we considered potential risks?
Severe performance degradation or unexpected errors. We can revert to the multicast API by changing the `fcm.allowedTopicsForIndividualSend` parameter in SSM.