fix: fix the slow xml marshaling when returning a large list of objects #1256
return nil, err
}
totalObjects = append(totalObjects, objects...)
err = b.db.Table(GetObjectsTableName(bucketName)).
Because all objects in the same bucket are stored in a single sharded object table, one query against GetObjectsTableName(bucketName) is enough.
modular/gater/metadata_handler.go
Outdated
@@ -277,14 +276,21 @@ func (g *GateModular) listObjectsByBucketNameHandler(w http.ResponseWriter, r *h
ContinuationToken: continuationToken,
}

respBytes, err = xml.Marshal(grpcResponse)
if format == "json" {
JSON support for all metadata APIs will be implemented soon. This change implements the JSON response for just one API so that the others can follow the same pattern later.
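A minimal sketch of that pattern, assuming the handler already has a `format` string taken from the request. The `listResponse` type, its field tags, the `marshalByFormat` helper, and the content-type strings are hypothetical names for illustration, not the ones used in this PR.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// listResponse is a hypothetical response type carrying both XML and JSON tags.
type listResponse struct {
	XMLName           xml.Name `xml:"ListObjectsResponse" json:"-"`
	Objects           []string `xml:"Objects" json:"objects"`
	ContinuationToken string   `xml:"ContinuationToken" json:"continuation_token"`
}

// marshalByFormat returns the encoded body and a matching content type.
// XML stays the default; JSON is used only when format is exactly "json".
func marshalByFormat(format string, v any) ([]byte, string, error) {
	if format == "json" {
		body, err := json.Marshal(v)
		return body, "application/json", err
	}
	body, err := xml.Marshal(v)
	return body, "application/xml", err
}

func main() {
	resp := listResponse{Objects: []string{"a", "b"}, ContinuationToken: "t1"}
	for _, format := range []string{"json", ""} {
		body, contentType, err := marshalByFormat(format, resp)
		if err != nil {
			fmt.Println("marshal failed:", err)
			return
		}
		fmt.Println(contentType, string(body))
	}
}
```

Keeping the choice in one helper means other handlers could adopt JSON by passing their own `format` value, without duplicating the branch.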
modular/gater/metadata_handler.go
Outdated
@@ -611,6 +609,9 @@ func (m GfSpListObjectsByIDsResponse) MarshalXML(e *xml.Encoder, start xml.Start
}

for k, v := range m.Objects {
for i, c := range v.ObjectInfo.Checksums {
v.ObjectInfo.Checksums[i] = []byte(base64.StdEncoding.EncodeToString(c))
Before the XML marshaling happens, I encode each checksum element as a base64 string and convert that string back to a byte slice. The XML marshaler still marshals []byte to a string by simply invoking string([]byte), but the result is now the correct base64-encoded string.
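A minimal sketch of the same idea in isolation. The `ObjectInfo` type here is a hypothetical stand-in with only two fields, and `encodeChecksums` and `marshalObjectXML` are illustrative helper names. Unlike the diff above, this sketch works on a copy rather than overwriting the checksums in place.

```go
package main

import (
	"encoding/base64"
	"encoding/xml"
	"fmt"
)

// ObjectInfo is a hypothetical stand-in for the real object type.
type ObjectInfo struct {
	ObjectName string   `xml:"ObjectName"`
	Checksums  [][]byte `xml:"Checksums"`
}

// encodeChecksums returns a copy of info in which every checksum is replaced
// by the bytes of its base64 text, so the caller's data is left untouched.
func encodeChecksums(info ObjectInfo) ObjectInfo {
	encoded := make([][]byte, len(info.Checksums))
	for i, c := range info.Checksums {
		encoded[i] = []byte(base64.StdEncoding.EncodeToString(c))
	}
	info.Checksums = encoded
	return info
}

// marshalObjectXML pre-encodes the checksums and then uses the default XML marshaler.
func marshalObjectXML(info ObjectInfo) (string, error) {
	out, err := xml.Marshal(encodeChecksums(info))
	return string(out), err
}

func main() {
	info := ObjectInfo{ObjectName: "a.txt", Checksums: [][]byte{{0x00, 0xff, 0x10}}}
	out, err := marshalObjectXML(info)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

Copying costs one extra allocation per object; mutating in place, as the PR does, avoids that but means the response value should not be reused after marshaling.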
@@ -2283,7 +2278,7 @@ func (g *GateModular) getGroupMembersHandler(w http.ResponseWriter, r *http.Requ
if requestStartAfter != "" {
if ok := common.IsHexAddress(requestStartAfter); !ok {
log.Errorw("failed to check start after", "start-after", requestStartAfter, "error", err)
err = ErrInvalidHeader
err = ErrInvalidQuery
minor correction
@@ -2522,7 +2513,7 @@ func (g *GateModular) listPaymentAccountStreamsHandler(w http.ResponseWriter, r

if ok := common.IsHexAddress(paymentAccount); !ok {
log.Errorw("failed to check payment account", "payment-account", paymentAccount, "error", err)
err = ErrInvalidHeader
err = ErrInvalidQuery
minor correction
@@ -2881,170 +2893,3 @@ func (g *GateModular) getBucketSizeHandler(w http.ResponseWriter, r *http.Reques
w.Header().Set(ContentTypeHeader, ContentTypeXMLHeaderValue)
w.Write(respBytes)
}

// processObjectsXmlResponse process the unhandled Uint id and checksum of object xml unmarshal
func processObjectsXmlResponse(respBytes []byte, objects []*types.Object) (respBytesProcessed []byte) {
This code, which searches and replaces XML content, can now be removed.
LGTM
gorm.io/driver/mysql v1.4.6
gorm.io/gorm v1.24.5
)

require (
google.golang.org/genproto/googleapis/api v0.0.0-20230711160842-782d3b101e98 // indirect
Merge this require block into the existing one.
Description
fix the slow xml marshaling when returning a large list of objects
The Problem To Be Solved
The default xml marshaler does not marshal the following type:
The default XML marshaler simply marshals []byte to a string by invoking string([]byte).
This leads to unreadable strings in the marshaled result.
By contrast, the Go JSON marshaler encodes []byte as a base64-encoded string by default; see https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/encoding/json/encode.go;l=56-58
The checksum data returned by the greenfield chain API (e.g. Head Object) is also base64-encoded.
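The difference can be reproduced with a small sketch. The `record` type and the `marshalBoth` helper are hypothetical and exist only to put the two standard-library marshalers side by side.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
)

// record is a hypothetical type with a single []byte field.
type record struct {
	Checksum []byte `json:"checksum" xml:"Checksum"`
}

// marshalBoth encodes the same value with encoding/json and encoding/xml.
func marshalBoth(r record) (jsonOut, xmlOut string, err error) {
	j, err := json.Marshal(r)
	if err != nil {
		return "", "", err
	}
	x, err := xml.Marshal(r)
	if err != nil {
		return "", "", err
	}
	return string(j), string(x), nil
}

func main() {
	// Arbitrary binary bytes: JSON prints base64 text, XML writes the bytes as text.
	j, x, err := marshalBoth(record{Checksum: []byte{0xde, 0xad, 0xbe, 0xef}})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(j)
	fmt.Println(x)
}
```

With printable input such as `[]byte("hi")`, the JSON side yields `aGk=` while the XML side yields `hi` unchanged, which shows that only the JSON marshaler applies base64.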
Original Solution
To resolve the above 2 issues, the original solution modified the default marshaled XML string by inserting the Int/Uint/checksum values. It works well in most situations but is very slow for large responses (e.g. querying 1000 objects for a given bucket in the ListObjectsByBucketName API).
New Solution in this PR
This PR improves the solution in the following 2 respects:
Changes
Notable changes: