Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Draft PR Support Edit Values for JSON operator #1132 #761

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion pkg/component/operator/json/v0/config/definition.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
"availableTasks": [
"TASK_MARSHAL",
"TASK_UNMARSHAL",
"TASK_JQ"
"TASK_JQ",
"TASK_EDIT_VALUES"
],
"custom": false,
"documentationUrl": "https://www.instill.tech/docs/component/operator/json",
Expand All @@ -19,3 +20,4 @@
"description": "Manipulate and convert JSON entities",
"releaseStage": "RELEASE_STAGE_ALPHA"
}

42 changes: 42 additions & 0 deletions pkg/component/operator/json/v0/config/tasks.json
Original file line number Diff line number Diff line change
Expand Up @@ -198,5 +198,47 @@
"title": "Output",
"type": "object"
}
},
"TASK_EDIT_VALUES": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I found I put the wrong JSON schema in the issue.
Could you fetch this JSON schema?

So, it means your Golang reader will change the key name.
e.g. conflictResolution -> conflict-resolution

"instillShortDescription": "Edit JSON values.",
"input": {
"data":{
"type":"object",
"description":"Original data, which can be a JSON object or array of objects."
},
"updates":{
"type":"array",
"description": "An array of objects specifing the value to be updated.",
"items":{
"type":"object",
"properties":{
"field":{
"type":"string",
"description":"The field in the original data whose value needs to be updated, supports nested paths if \"supportDotNotaion\" is true."
},
"newValue":{
"description": "The new value that will replace the current value at specific fields."
}
}
},
"supportDotNotaion":{
"type":"boolean",
"default":"true",
"description":"Determines whether to interpret the field as paths using dot notation. If false, field is treated as a literal key."
},
"conflictResolution":{
"type":"string",
"enum":["create","skip","error"],
"default": "skip",
"description": "Defines hot to handle cases where the field does not exist in the data."
}
}
},
"output": {
"data":{
"type": "object",
"description": "The modified data with the specified data updated."
}
}
}
}
171 changes: 171 additions & 0 deletions pkg/component/operator/json/v0/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,17 @@ import (

"github.com/instill-ai/pipeline-backend/pkg/component/base"
"github.com/instill-ai/x/errmsg"

"strconv"
"strings"
"reflect"
)

const (
taskMarshal = "TASK_MARSHAL"
taskUnmarshal = "TASK_UNMARSHAL"
taskJQ = "TASK_JQ"
taskEditValues= "TASK_EDIT_VALUES"
)

var (
Expand Down Expand Up @@ -67,6 +72,8 @@ func (c *component) CreateExecution(x base.ComponentExecution) (base.IExecution,
e.execute = e.unmarshal
case taskJQ:
e.execute = e.jq
case taskEditValues:
e.execute = e.updateJson
default:
return nil, errmsg.AddMessage(
fmt.Errorf("not supported task: %s", x.Task),
Expand Down Expand Up @@ -155,3 +162,167 @@ func (e *execution) jq(in *structpb.Struct) (*structpb.Struct, error) {
func (e *execution) Execute(ctx context.Context, jobs []*base.Job) error {
return base.SequentialExecutor(ctx, jobs, e.execute)
}

func (e *execution) updateJson(in *structpb.Struct) (*structpb.Struct, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateJson

There is another function called TASK_RENAME_FIELDS in the near future.
So, let's make this function name specific. How about just editValue?

data := in.Fields["data"].AsInterface()
updates := in.Fields["updates"].GetListValue().AsSlice()
conflictResolution := in.Fields["conflictResolution"].GetStringValue()
supportDotNotation := in.Fields["supportDotNotation"].GetBoolValue()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other tasks in JSON, there are only 1 or 2 fields, so they don't use Struct to convert data from schema.

How about we make it as struct? It can convert data easily.
You can take this as sample.

	data := in.Fields["data"].AsInterface()
	updates := in.Fields["updates"].GetListValue().AsSlice()
	conflictResolution := in.Fields["conflictResolution"].GetStringValue()
	supportDotNotation := in.Fields["supportDotNotation"].GetBoolValue()


// Perform deep copy of the data before updates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think good comments are for

  • Exported document
  • Reason why the code is like this
  • If it is hard to understand, add what the code does

The code explains itself well, so I'd remove the comments here.
Same as other places.

updatedData := deepCopy(data)

switch data := updatedData.(type) {
case []interface{}:
// Process each object in the array
for i, obj := range data {
if err := applyUpdatesToObject(obj, updates, supportDotNotation, conflictResolution); err != nil {
msg := fmt.Sprintf("Error in object %d: %v\n", i, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg := fmt.Sprintf("Error in object %d: %v\n", i, err)
msg := fmt.Sprintf("apply updates to object %d: %v\n", i, err)

Now, we follow the pattern of describing what it does but not error statement when error handling.

return nil, errmsg.AddMessage(err, msg)
}
}
case map[string]interface{}:
// Process the single object
if err := applyUpdatesToObject(data, updates, supportDotNotation, conflictResolution); err != nil {
msg := fmt.Sprintf("Error in single object: %v\n", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg := fmt.Sprintf("Error in single object: %v\n", err)
msg := fmt.Sprintf("apply update to single object: %v\n", err)

return nil, errmsg.AddMessage(err, msg)
}
default:
msg := fmt.Sprintf("Invalid data format")
return nil, errmsg.AddMessage(fmt.Errorf("Error "),msg)
}
output := map[string]interface{}{
"data": updatedData,
}
outputStruct, err := structpb.NewStruct(output)
if err != nil {
msg := fmt.Sprintf("Failed to convert output to structpb.Struct:", err)
return nil, errmsg.AddMessage(err, msg)
}
return outputStruct,nil
}

func applyUpdatesToObject(data interface{}, updates []interface{}, supportDotNotation bool, conflictResolution string) error {
for _, update := range updates {
updateMap := update.(map[string]interface{})
field := updateMap["field"].(string)
newValue := updateMap["newValue"]

err := applyUpdate(data, field, newValue, supportDotNotation, conflictResolution)
if err != nil {
// Handle the "error" conflictResolution case by stopping and returning the error
if conflictResolution == "error" {
return err
}
// Continue for other conflictResolution cases
}
}
return nil
}

func applyUpdate(data interface{}, field string, newValue interface{}, supportDotNotation bool, conflictResolution string) error {
var fieldParts []string
if supportDotNotation {
fieldParts = strings.Split(field, ".")
} else {
fieldParts = []string{field}
}

current := data
for i, part := range fieldParts {
if i == len(fieldParts)-1 {
// We're at the final part of the path, apply the update

existingValue, fieldExisting := getFieldValue(current, part)

// Check if the field exists and compare types
if fieldExisting {
if !(reflect.TypeOf(existingValue)==reflect.TypeOf(newValue)){
return fmt.Errorf("type mismatch: existing field '%s' has type '%T' but got value of type '%T'", part, existingValue, newValue)
}
}
switch conflictResolution {
case "create":
// Create the field if it doesn't exist
setFieldValue(current, part, newValue)
case "skip":
// Skip if the field doesn't exist
if !fieldExists(current, part) {
return nil
}
setFieldValue(current, part, newValue)
case "error":
// Return an error if the field doesn't exist
if !fieldExists(current, part) {
return fmt.Errorf("Field '%s' does not exist", part)
}
setFieldValue(current, part, newValue)
}
} else {
// Traverse to the next part of the path
if next, ok := getFieldValue(current, part); ok {
current = next
} else {
// Field doesn't exist and we're not at the final part
if conflictResolution == "create" {
newMap := make(map[string]interface{})
setFieldValue(current, part, newMap)
current = newMap
} else {
return fmt.Errorf("Field '%s' does not exist", part)
}
}
}
}
return nil
}

func setFieldValue(data interface{}, field string, value interface{}) {
// Update the field in the map or array (handle different data structures)
switch data := data.(type) {
case map[string]interface{}:
data[field] = value
case []interface{}:
idx, _ := strconv.Atoi(field)
if idx >= 0 && idx < len(data) {
data[idx] = value
}
}
}

func getFieldValue(data interface{}, field string) (interface{}, bool) {
// Retrieve the field value from the map or array
switch data := data.(type) {
case map[string]interface{}:
val, ok := data[field]
return val, ok
case []interface{}:
idx, err := strconv.Atoi(field)
if err != nil || idx < 0 || idx >= len(data) {
return nil, false
}
return data[idx], true
}
return nil, false
}

func fieldExists(data interface{}, field string) bool {
// Check if the field exists in the map or array
switch data := data.(type) {
case map[string]interface{}:
_, ok := data[field]
return ok
case []interface{}:
idx, err := strconv.Atoi(field)
return err == nil && idx >= 0 && idx < len(data)
}
return false
}

func deepCopy(data interface{}) interface{} {
// Deep copy the data structure to avoid modifying the original input
b, _ := json.Marshal(data)
var copiedData interface{}
json.Unmarshal(b, &copiedData)
return copiedData
}
Loading