"Too many operations in progress" in FrontJsonFapi fetching #26335

Closed
rtyley opened this issue Jul 20, 2023 · 2 comments · Fixed by #26338
@rtyley
Member

rtyley commented Jul 20, 2023

There have been several recent incidents (13/07/23: retro, timeline) where the serving of fronts (e.g. https://www.theguardian.com/uk) to users became slow, and in fact failed.

There are a lot of things going on in these incidents, but in particular we see these `TooManyOperationsInProgress` errors in the logs:

```
Failed rendering world with concurrent.FutureSemaphore$TooManyOperationsInProgress: Too many operations in progress, cannot execute task
concurrent.FutureSemaphore$TooManyOperationsInProgress: Too many operations in progress, cannot execute task
	at concurrent.FutureSemaphore.execute(FutureSemaphore.scala:16)
	at services.fronts.FrontJsonFapi.parsePressedPage(FrontJsonFapi.scala:34)
	at services.fronts.FrontJsonFapi.$anonfun$pressedPageFromS3$2(FrontJsonFapi.scala:58)
```


The code in `FrontJsonFapi` where this issue occurs is interesting because it has two implementation details worth revisiting: it's non-async, and it uses a semaphore to put a hard limit on how many operations can run concurrently.

```scala
private def parsePressedPage(
    jsonStringOpt: Option[String],
)(implicit executionContext: ExecutionContext): Future[Option[PressedPage]] =
  futureSemaphore.execute {
    blockingOperations.executeBlocking {
      jsonStringOpt.map { jsonString =>
        DurationMetric.withMetrics(FaciaPressMetrics.FrontDecodingLatency) {
          // This operation is run in the thread pool since it is very CPU intensive
          Json.parse(jsonString).as[PressedPage]
        }
      }
    }
  }

private def loadPressedPageFromS3(path: String) =
  blockingOperations.executeBlocking {
    DurationMetric.withMetrics(FaciaPressMetrics.FrontDownloadLatency) {
      S3.getGzipped(path)
    }
  }

private def pressedPageFromS3(
    path: String,
)(implicit executionContext: ExecutionContext): Future[Option[PressedPage]] =
  errorLoggingF(s"FrontJsonFapi.pressedPageFromS3 $path") {
    for {
      s3FrontData <- loadPressedPageFromS3(path)
      pressedPage <- parsePressedPage(s3FrontData)
    } yield pressedPage
  }
```

georgeblahblah added a commit that referenced this issue Jul 20, 2023
We believe that the limit was too low. There are 2 instances of this
class sharing 128 threads, so it can probably be higher.

See #26335 and
#18331 for the original change.
rtyley added a commit that referenced this issue Jul 20, 2023
Context: #26335

In this change:

* Use AWS SDK v2 and non-blocking async code to avoid blocking a thread
  while downloading the Facia JSON data from S3
* Directly transform bytes into JsValue, without constructing a `String`
  first

Note that the work done between the two metrics `FrontDownloadLatency`
& `FrontDecodingLatency` has changed slightly - the conversion to the
basic JSON model (JsValue) now occurs in `FrontDownloadLatency`, rather
than the 'decoding' step.

We could get rid of the futureSemaphore and stop using the dedicated
blocking-operations pool here, but we'll leave that to another PR.

Co-authored-by: Ravi <[email protected]>
Co-authored-by: Ioanna Kokkini <[email protected]>
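
For illustration, here is a minimal sketch of that approach - a non-blocking S3 download with the AWS SDK v2 async client, parsing the gzipped bytes straight to a `JsValue`. The class and method names are illustrative rather than the actual code from #26338, and it assumes the whole object is buffered in memory before decompression:

```scala
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream

import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.FutureConverters._

import play.api.libs.json.{JsValue, Json}
import software.amazon.awssdk.core.async.AsyncResponseTransformer
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{GetObjectRequest, GetObjectResponse}

// Illustrative sketch (not the PR's code): fetch a pressed front from S3 without
// blocking a thread on I/O, then parse the gzipped bytes directly to JSON.
class AsyncFrontFetcherSketch(client: S3AsyncClient, bucket: String)(implicit ec: ExecutionContext) {

  def fetchJson(path: String): Future[JsValue] = {
    val request = GetObjectRequest.builder().bucket(bucket).key(path).build()
    client
      .getObject(request, AsyncResponseTransformer.toBytes[GetObjectResponse]())
      .asScala // CompletableFuture -> Scala Future; no thread is parked while S3 responds
      .map { responseBytes =>
        val gunzipped = new GZIPInputStream(new ByteArrayInputStream(responseBytes.asByteArray()))
        Json.parse(gunzipped) // play-json parses an InputStream directly, skipping the String step
      }
  }
}
```

In practice the decompression could also be streamed; the point here is only that the download itself no longer occupies a thread from the blocking-operations pool.
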
@rtyley
Member Author

rtyley commented Jul 27, 2023

We've had 2 TooManyOperationsInProgress outages recently (ELK logs):

  • 13th July: 12:35 - 12:55 (20 minutes) - 49,662 errors (peaking at 5,133 requests per minute) over just 356 paths. Initially there were 3 facia app-servers running (this was immediately after a deploy; facia scaled up to 12 over the course of the outage).

  • 19th July: 9:35 - 9:57 (22 minutes) - 58,577 errors (peaking at 4,618 requests per minute).

I've exported the data for these outages out of ELK, and am trying to understand what paths were being requested.

The rate of 5,133 errors per minute is really high, given that fronts have a CDN cache time of 1 minute and that only 356 different paths were being requested (or at least erroring) in the outage: the request rate is 5133/356 ≈ 14 times what an optimal request rate would be. In other words, if requests had been succeeding and getting cached, there would have been roughly 14x less traffic - but they were failing, throttled by `futureSemaphore`.

`parsePressedPage()` takes 140ms when healthy, meaning the upper limit for a single healthy app server, with the `futureSemaphore` 32-concurrent-actions limit, is ~13,700 operations per minute. During the outage, `parsePressedPage()` took 1,937ms as it contended for CPU with GC, giving an upper limit of ~990 per minute - well below the incoming request rate. It couldn't catch up without at least 6 app servers running.
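
For clarity, the arithmetic behind those limits (latency figures from the metrics above; the only assumption is the semaphore's 32-operation cap):

```scala
// Throughput ceiling imposed by a 32-operation semaphore, given per-operation latency.
object SemaphoreThroughputSketch extends App {
  val maxConcurrent = 32

  def opsPerMinute(latencyMillis: Double): Double = maxConcurrent * (60000.0 / latencyMillis)

  println(f"healthy  (140ms):  ${opsPerMinute(140)}%.0f ops/min")  // ~13,714
  println(f"degraded (1937ms): ${opsPerMinute(1937)}%.0f ops/min") // ~991, below the ~5,133/min error peak
}
```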

@rtyley
Member Author

rtyley commented Jul 28, 2023

I described a possible failure mode for Facia here: #26336 (comment) - basically, `futureSemaphore` would throw away 'download' work that had already been done (i.e. downloading entire S3 files to Strings in memory) because of its max-32-concurrent-actions limit on decoding, rather than holding on to that work and completing it so that a successful result could be returned to the CDN and cached. The lack of CDN caching meant that many requests were repeated, and this probably led to increased GC pressure.
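
The frontend's `FutureSemaphore` isn't reproduced here, but the failure mode depends only on one property visible in the stack trace: once the in-flight count exceeds the limit, work fails fast with `TooManyOperationsInProgress` rather than being queued. A rough sketch of that behaviour (the implementation details are illustrative, not the actual class):

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}

// Rough sketch of a reject-on-limit semaphore for Futures. When the limit is hit,
// the task is rejected outright - so an already-downloaded S3 payload waiting to be
// decoded is simply discarded rather than queued.
class FutureSemaphoreSketch(maxOperations: Int) {
  case class TooManyOperationsInProgress(msg: String) extends RuntimeException(msg)

  private val inFlight = new AtomicInteger(0)

  def execute[T](task: => Future[T])(implicit ec: ExecutionContext): Future[T] =
    if (inFlight.incrementAndGet() <= maxOperations) {
      val result = Future(task).flatten
      result.onComplete(_ => inFlight.decrementAndGet())
      result
    } else {
      inFlight.decrementAndGet()
      Future.failed(TooManyOperationsInProgress("Too many operations in progress, cannot execute task"))
    }
}
```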

This graph shows how many downloads vs decodes Facia was doing. When the server was healthy, these were the same number; when things went wrong, you can see we were doing many more downloads than decodes - all the extra download work achieved was generating more garbage (and so more GC) and consuming threads from the `blockingOperations` thread pool.

*(graph: downloads vs decodes over time)*

sophie-macmillan pushed a commit that referenced this issue Aug 9, 2023
* Switch to async AWS SDK v2 for Facia JSON download

Context: #26335

In this change:

* Use AWS SDK v2 and non-blocking async code to avoid blocking a thread
  while downloading the Facia JSON data from S3
* Directly transform bytes into JsValue, without constructing a `String`
  first

Note that the work done between the two metrics `FrontDownloadLatency`
& `FrontDecodingLatency` has changed slightly - the conversion to the
basic JSON model (JsValue) now occurs in `FrontDownloadLatency`, rather
than the 'decoding' step.

We could get rid of the futureSemaphore and stop using the dedicated
blocking-operations pool here, but we'll leave that to another PR.

Co-authored-by: Ravi <[email protected]>
Co-authored-by: Ioanna Kokkini <[email protected]>

* Reduce CPU & network consumption of Facia JSON download

This change introduces ETag-based caching with the new
https://github.com/guardian/etag-caching library, yielding
these benefits:

* Significant savings in terms of resources:
  * **Network**: `PressedPage`s are only re-downloaded if the S3 content has _changed_
  * **CPU**: Cached `PressedPage`s are only re-parsed if the S3 content has _changed_
* Retains the 'stay current' behaviour of the old, non-caching solution: S3 is queried with every request to ensure the ETag is up-to-date (sketched below).
* Also, given the cache is based on the ([S](https://github.com/blemale/scaffeine))[Caffeine](https://github.com/ben-manes/caffeine) caching library:
  * In-flight requests for a given key are unified, so 100 simultaneous requests for a key make just **1** fetch-and-parse
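
The core idea behind that caching is sketched below in plain Scala (this is not the etag-caching library's actual API - the names are illustrative): cache the parsed value keyed by path alongside the ETag it came from, re-check the ETag cheaply on every request, and only re-download and re-parse when it has changed.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

// Illustrative ETag cache (not the etag-caching library's API): parsed values are
// reused for as long as the S3 object's ETag is unchanged.
class ETagCacheSketch[V](
    fetchETag: String => Future[String],          // cheap check, e.g. an S3 HEAD request
    fetchAndParse: String => Future[(String, V)], // expensive: download + parse, returns (etag, value)
)(implicit ec: ExecutionContext) {

  private val cache = TrieMap.empty[String, (String, V)] // path -> (etag, parsed value)

  def get(path: String): Future[V] =
    fetchETag(path).flatMap { currentETag =>
      cache.get(path) match {
        case Some((cachedETag, value)) if cachedETag == currentETag =>
          Future.successful(value) // content unchanged: no re-download, no re-parse
        case _ =>
          fetchAndParse(path).map { case (etag, value) =>
            cache.put(path, (etag, value))
            value
          }
      }
    }
}
```

The real library also coalesces concurrent requests for the same key (via Caffeine) and bounds the cache size; this sketch omits both.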

Currently `facia` runs on [`c6g.2xlarge`](https://instances.vantage.sh/aws/ec2/c6g.2xlarge) instances (16GB RAM), with a JVM heap of 12GB. I'm going to suggest that it's reasonable for the `ETagCache` to consume up to 4GB of that 12GB heap.

There are ~300 fronts, each of which can have [4 different variants](https://github.com/guardian/frontend/blob/f64c5b681f53a9c87ae1e76c529d08b3ad16ef6b/common/app/model/PressedPage.scala#L68-L86), so currently there could be up to 1200 different `PressedPage` objects. From [heapdump](https://drive.google.com/file/d/1yjfqLMpWqww6-3L8RBN0yrAf9hQHHpSC/view?usp=sharing) analysis, the largest `PressedPage` retains ~22MB of memory (most instances are smaller, averaging 4MB). With the 4GB budget, and assuming a worst case of 22MB per `PressedPage`, we can afford to set a max size on the cache of ~180 entries. Although it's disappointing we can't hold _all_ of the `PressedPage`s, the [priorities of the eviction policy used by the Caffeine caching library](https://github.com/ben-manes/caffeine/wiki/Efficiency) should ensure that we get a good hit rate on the most in-demand fronts.
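
The sizing works out roughly as follows (figures from the heapdump analysis above):

```scala
// How many worst-case PressedPages fit in the 4GB cache budget?
val budgetMB = 4 * 1024       // 4GB of the 12GB heap earmarked for the cache
val worstCaseEntryMB = 22     // largest observed retained size of a PressedPage
val maxEntries = budgetMB / worstCaseEntryMB // = 186, hence a cautious max size of ~180
```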

_Incidentally, heapdump analysis also shows that some structures within `PressedPage` are memory-inefficient when considered from the perspective of holding many `PressedPage` objects in memory at once - using object pooling on `model.Tag`, for instance, would probably lead to an 80% reduction in total retained memory._

---------

Co-authored-by: George B <[email protected]>
Co-authored-by: Ravi <[email protected]>
Co-authored-by: Ioanna Kokkini <[email protected]>
rtyley added a commit that referenced this issue Aug 10, 2023
ETag Caching was introduced for Facia `PressedPage` JSON downloading
with #26338 in order to improve
scalability and address #26335,
but a limiting factor was the number of `PressedPage` objects that could
be stored in the cache.

With a max `PressedPage` size of 22MB and a memory budget of 4GB, a
cautious max cache size limit of only 180 `PressedPage` objects was set.
As a result, the cache hit rate was relatively low, and we saw elevated GC,
probably because of objects continually being evicted from the small cache:

#26338 (comment)

The change in this new commit dramatically reduces the combined size of
the `PressedPage` objects held in memory, taking the average retained size
per `PressedPage` from 4MB to 0.5MB (based on a sample of 125 `PressedPage`
objects held in memory at the same time).

It does this by deduplicating the `Tag` objects held by the `PressedPage`s.
Previously, as the `Tag`s for different `PressedPage`s were deserialised
from JSON, many identical tags would be created over and over again, and held
in memory. After deduplication, those different `PressedPage`s will all
reference the _same_ `Tag` object for a given tag.

The deduplication is done as the `Tag`s are deserialised - a new cache
(gotta love caches!) holds `Tag`s keyed by their hashcode and tag id,
and if a new `Tag` is created with a matching key, it's thrown away, and
the old one is used instead. Thus we end up with just one instance of
that `Tag`, instead of many duplicated ones.
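
A minimal sketch of that interning idea (the class and wiring are illustrative; the real change hooks into the JSON deserialisation of `model.Tag`):

```scala
import scala.collection.concurrent.TrieMap

// Illustrative interner: keyed by (hashCode, id), a freshly-deserialised value is
// discarded if an identical one has already been seen, so many PressedPages end up
// sharing a single instance for each tag.
class Interner[A](idOf: A => String) {
  private val cache = TrieMap.empty[(Int, String), A]

  def intern(fresh: A): A = cache.getOrElseUpdate((fresh.hashCode, idOf(fresh)), fresh)
}

// Hypothetical wiring at deserialisation time, assuming Tag exposes an `id` field:
//   val tagInterner = new Interner[Tag](_.id)
//   implicit val interningTagReads: Reads[Tag] = originalTagReads.map(tagInterner.intern)
```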

See also:

* https://en.wikipedia.org/wiki/String_interning - a similar technique
  used by Java for Strings: https://www.geeksforgeeks.org/interning-of-string/