Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix the slow xml marshaling when returning a large list of objects #1256

Merged
merged 6 commits into from
Nov 30, 2023

Conversation

ruojunm
Copy link
Collaborator

@ruojunm ruojunm commented Nov 29, 2023

Description

fix the slow xml marshaling when returning a large list of objects

The Problem To Be Solved

  • The default xml marshaler does not marshal the following type:

    • github.com/bnb-chain/greenfield-cosmos-sdk/math.int
    • github.com/bnb-chain/greenfield-cosmos-sdk/math.uint
  • The default xml marshaler will simply marshal the following type ([]byte) to string by invoking string([]byte).

    • [][]byte for *types.ObjectInfo.Checksums of GfSpListObjectsByBucketNameResponse
    • [][]byte for *types.ObjectInfo.Checksums of GfSpGetObjectMetaResponse
    • [][]byte for *types.ObjectInfo.Checksums of GfSpListObjectsByIDsResponse

    This will leads to unreadable strings in marshaled result.
    Comparing to go json marshaler, the latter by default encodes []byte as a base64-encoded string , see https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/encoding/json/encode.go;l=56-58

    And the checksums data returned by greenfield chain API (e.g. Head Object) is also encoded as base64-encoded string.

Original Solution

To resolve the above 2 issues, the original solution was to modify the default marshaled xml string by inserting Int/Uint/checksums values. It works well in most situations but will be very slow for cases with large data (e.g. query 1000 objects for a given bucket in ListObjectsByBucketName API).

New Solution in this PR

This PR improves the solution by the following 2 aspects:

  1. Implement "MarshalXML" method for
  • github.com/bnb-chain/greenfield-cosmos-sdk/math.int
  • github.com/bnb-chain/greenfield-cosmos-sdk/math.uint
  1. Encode checksums into base64-encoded string and convert it to byte[], before the real XML marshaling happens. See
  • func (m GfSpListObjectsByBucketNameResponse) MarshalXML
  • func (m GfSpGetObjectMetaResponse) MarshalXML
  • func (m GfSpListObjectsByIDsResponse) MarshalXML
  1. Remove code which was to search/replace in xml results

Changes

Notable changes:

  • Refine the Object query implementation from querying 64 sharded object tables to query 1 object table. (See filterObjects method in prefix.go)
  • xml marshaler for cosmos Unit type (feat: add xml marshal for Uint type greenfield-cosmos-sdk#364)
  • xml marshaler for cosmos Int type (feat: add xml marshal for Int type greenfield-cosmos-sdk#365)
  • Encode checksums into base64-encoded string before xml marshaling
  • Making UT coverage more than 80% for the following APIs
    1. ListObjectsByBucketNameHandler
    2. GetObjectMetaHandler
    3. ListObjectsByIDsHandler
    4. ListGroupsByIDsHandler
    5. GetGroupListHandler
    6. ListPaymentAccountStreamsHandler
    7. ListUserPaymentAccountsHandler
    8. ListBucketsByIDsHandler
    9. GetUserGroupsHandler
    10. GetGroupMembersHandler
    11. GetUserOwnedGroupsHandler
    12. GetBucketMetaHandler
    13. GetUserBucketsHandler

return nil, err
}
totalObjects = append(totalObjects, objects...)
err = b.db.Table(GetObjectsTableName(bucketName)).
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As all objects for the same buckets will only be stored into the same single sharded object table, only 1 query against GetObjectsTableName(bucketName) will be enough.

@@ -277,14 +276,21 @@ func (g *GateModular) listObjectsByBucketNameHandler(w http.ResponseWriter, r *h
ContinuationToken: continuationToken,
}

respBytes, err = xml.Marshal(grpcResponse)
if format == "json" {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting JSON format in all metadata API will be implemented soon. Here just implementing the json response for one API so that later others can follow the pattern.

@@ -611,6 +609,9 @@ func (m GfSpListObjectsByIDsResponse) MarshalXML(e *xml.Encoder, start xml.Start
}

for k, v := range m.Objects {
for i, c := range v.ObjectInfo.Checksums {
v.ObjectInfo.Checksums[i] = []byte(base64.StdEncoding.EncodeToString(c))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the xml marshal happens, here I encode the checksum elements into base64 encoding string and convert it to byte slice. Later the xml marshaler will still marshal []byte to string by simply invoking string([]byte) but produce the correct base64 encoded string.

@@ -2283,7 +2278,7 @@ func (g *GateModular) getGroupMembersHandler(w http.ResponseWriter, r *http.Requ
if requestStartAfter != "" {
if ok := common.IsHexAddress(requestStartAfter); !ok {
log.Errorw("failed to check start after", "start-after", requestStartAfter, "error", err)
err = ErrInvalidHeader
err = ErrInvalidQuery
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor correction

@@ -2522,7 +2513,7 @@ func (g *GateModular) listPaymentAccountStreamsHandler(w http.ResponseWriter, r

if ok := common.IsHexAddress(paymentAccount); !ok {
log.Errorw("failed to check payment account", "payment-account", paymentAccount, "error", err)
err = ErrInvalidHeader
err = ErrInvalidQuery
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor correction

@@ -2881,170 +2893,3 @@ func (g *GateModular) getBucketSizeHandler(w http.ResponseWriter, r *http.Reques
w.Header().Set(ContentTypeHeader, ContentTypeXMLHeaderValue)
w.Write(respBytes)
}

// processObjectsXmlResponse process the unhandled Uint id and checksum of object xml unmarshal
func processObjectsXmlResponse(respBytes []byte, objects []*types.Object) (respBytesProcessed []byte) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code to search and replace xml content could be removed now.

modular/gater/metadata_handler_test.go Show resolved Hide resolved
store/bsdb/prefix.go Outdated Show resolved Hide resolved
modular/gater/metadata_handler.go Outdated Show resolved Hide resolved
Copy link
Contributor

@BarryTong65 BarryTong65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

gorm.io/driver/mysql v1.4.6
gorm.io/gorm v1.24.5
)

require (
google.golang.org/genproto/googleapis/api v0.0.0-20230711160842-782d3b101e98 // indirect
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge this require area.

@ruojunm ruojunm merged commit f019f7f into develop Nov 30, 2023
11 checks passed
@ruojunm ruojunm deleted the fix-slow-xml-marshal branch November 30, 2023 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants