Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parsing headers for pekko-http and akka-http #3575

Merged
merged 5 commits into from
Mar 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ private[akkahttp] object AkkaModel {

def parseHeadersOrThrowWithoutContentHeaders(hs: HasHeaders): Seq[HttpHeader] =
hs.headers
.map(parseHeaderOrThrow)
.filterNot(h => h.is(ctHeaderNameLowerCase) || h.is(clHeaderNameLowerCase) || h.is(teHeaderNameLowerCase))
.map(parseHeaderOrThrow)

def parseHeaderOrThrow(h: Header): HttpHeader =
HttpHeader.parse(h.name, h.value) match {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ private[akkahttp] class AkkaToResponseBody(implicit m: Materializer, ec: Executi
}

private def parseContentType(ct: String): ContentType =
ContentType.parse(ct).getOrElse(throw new IllegalArgumentException(s"Cannot parse content type: $ct"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if can get some "wild" media types, e.g. with some parameters, which could fill our cache with junk. We have a list of (currently) 1208 mime types in mimeByExtension.txt, but it has some very weird media types, and parsing them upfront would take significant time. Or maybe we could cache only if contentType.mediaType.params.isEmpty? (well .. maybe except for the charset)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of storing only values which have empty params (except the charset), this should significantly limit the cases where the cache is not useful after being filled with a lot of types.

ContentTypeCache.getOrParse(ct)

private def charsetToHttpCharset(charset: Charset): HttpCharset = HttpCharset.custom(charset.name())

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
package sttp.tapir.server.akkahttp

import akka.http.scaladsl.model.ContentType
import scala.collection.concurrent.TrieMap

/** Pekko-specific ConentType has to be created if an endpoint overrides it, but we want to reduce overhead of the expensive
* ContentType.parse operation if possible. Parsing may also happen for cases not listed explictly in
* PekkoToResponseBody.formatToContentType. This cache doesn't have to save atomically, because the worst case scenario is that we parse he
* same header a few times before it's saved. The cache is not cleared, because the number of different content types is limited and the
* cache is not expected to grow too much. The only exception is when there is a boundary in the header, but in such situation the endpoint
* contentType shouldn't be overriden. Just in case this happens, we limit the cache size.
*/
private[akkahttp] object ContentTypeCache {
private val cache = TrieMap[String, ContentType]()
private val Limit = 100

def getOrParse(headerValue: String): ContentType = {
cache.get(headerValue) match {
case Some(contentType) =>
contentType
case None =>
val contentType =
ContentType.parse(headerValue).getOrElse(throw new IllegalArgumentException(s"Cannot parse content type: $headerValue"))
// We don't want to fill the cache with parameterized media types (BTW charset does not appear in params)
val _ = if (cache.size <= Limit && contentType.mediaType.params.isEmpty) cache.putIfAbsent(headerValue, contentType)
contentType
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
package sttp.tapir.server.pekkohttp

import org.apache.pekko.http.scaladsl.model.ContentType
import scala.collection.concurrent.TrieMap

/** Pekko-specific ConentType has to be created if an endpoint overrides it, but we want to reduce overhead of the expensive
* ContentType.parse operation if possible. Parsing may also happen for cases not listed explictly in
* PekkoToResponseBody.formatToContentType. This cache doesn't have to save atomically, because the worst case scenario is that we parse he
* same header a few times before it's saved. The cache is not cleared, because the number of different content types is limited and the
* cache is not expected to grow too much. The only exception is when there is a boundary in the header, but in such situation the endpoint
* contentType shouldn't be overriden. Just in case this happens, we limit the cache size.
*/
private[pekkohttp] object ContentTypeCache {
private val cache = TrieMap[String, ContentType]()
private val Limit = 100

def getOrParse(headerValue: String): ContentType = {
cache.get(headerValue) match {
case Some(contentType) =>
contentType
case None =>
val contentType =
ContentType.parse(headerValue).getOrElse(throw new IllegalArgumentException(s"Cannot parse content type: $headerValue"))
// We don't want to fill the cache with parameterized media types (BTW charset does not appear in params)
val _ = if (cache.size <= Limit && contentType.mediaType.params.isEmpty) cache.putIfAbsent(headerValue, contentType)
contentType
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@ private[pekkohttp] object PekkoModel {

def parseHeadersOrThrowWithoutContentHeaders(hs: HasHeaders): Seq[HttpHeader] =
hs.headers
.map(parseHeaderOrThrow)
.filterNot(h => h.is(ctHeaderNameLowerCase) || h.is(clHeaderNameLowerCase) || h.is(teHeaderNameLowerCase))
.map(parseHeaderOrThrow)

def parseHeaderOrThrow(h: Header): HttpHeader =
HttpHeader.parse(h.name, h.value) match {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ private[pekkohttp] class PekkoToResponseBody(implicit m: Materializer, ec: Execu
}

private def parseContentType(ct: String): ContentType =
ContentType.parse(ct).getOrElse(throw new IllegalArgumentException(s"Cannot parse content type: $ct"))
ContentTypeCache.getOrParse(ct)

private def charsetToHttpCharset(charset: Charset): HttpCharset = HttpCharset.custom(charset.name())

Expand Down
Loading