Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elastic Agent Policy is broken when JSON contains numeric string names #4200

Open
aleksmaus opened this issue Feb 6, 2024 · 12 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@aleksmaus
Copy link
Member

aleksmaus commented Feb 6, 2024

Tested with the latest Agent 8.13.
This is first uncovered with osquery when the users tried to use numbers as query id
elastic/kibana#175421

The incoming policy is correct.
Narrowed it down to to c, err := config.NewConfigFrom(action.Policy) in PolicyChangeHandler
https://github.com/elastic/elastic-agent/blob/main/internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go#L101

Repro code, for example, with JSON extracted from the policy

func TestNewConfigFrom(t *testing.T) {
	var m map[string]interface{}
	err := json.Unmarshal([]byte(testPolicy), &m)
	if err != nil {
		t.Fatal(err)
	}

	cfg, err := NewConfigFrom(m)
	if err != nil {
		t.Fatal(err)
	}
	m, err = cfg.ToMapStr()
	if err != nil {
		t.Fatal(err)
	}
	s, err := json.Marshal(m)
	_ = err
	fmt.Printf("RESULT:%s\n", string(s))
}

const testPolicy = `
{
    "inputs": [
        {
            "osquery": {
                "packs": {
                    "1233": {
                        "queries": {
                            "test": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "removed": false,
                                "snapshot": true,
                                "timeout": 60
                            }
                        }
                    },
                    "test": {
                        "queries": {
                            "12312": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "timeout": 60
                            }
                        },
                        "shard": 100
                    }
                }
            }
        }
    ]
}
`

Result (truncated) looks like this:

{
    "inputs": [
        {
            "osquery": {
                "packs": {
                    "0": null,
                    "1": null,
                    "10": null,
                    "100": null,
                    "1000": null,
                    "1001": null,
                    ...........................
                    "1225": null,
                    "1226": null,
                    "1227": null,
                    "1228": null,
                    "1229": null,
                    "123": null,
                    "1230": null,
                    "1231": null,
                    "1232": null,
                    "1233": {
                        "queries": {
                            "test": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "removed": false,
                                "snapshot": true,
                                "timeout": 60
                            }
                        }
                    },
                    "124": null,
                    "125": null,
                    "994": null,
                    .....
                    "995": null,
                    "996": null,
                    "997": null,
                    "998": null,
                    "999": null,
                    "test": {
                        "queries": [
                            null,
                            null,
                            null,
                            null,
                            null,
                            .....................
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            null,
                            {
                                "interval": 60,
                                "query": "select * from uptime",
                                "timeout": 60
                            }
                        ],
                        "shard": 100
                    }
                }
            }
        }
    ]
}

Full result is attached, since it doesn't fit into description
result.json

@aleksmaus aleksmaus added the bug Something isn't working label Feb 6, 2024
@aleksmaus
Copy link
Member Author

Assigning this to myself for now. Will dig some more today.

@aleksmaus aleksmaus self-assigned this Feb 6, 2024
@aleksmaus
Copy link
Member Author

Changing "numeric" strings to "text" in the input config eliminates the issue:

Input:

{
    "inputs": [
        {
            "osquery": {
                "packs": {
                    "a1233": {
                        "queries": {
                            "test": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "removed": false,
                                "snapshot": true,
                                "timeout": 60
                            }
                        }
                    },
                    "test": {
                        "queries": {
                            "a12312": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "timeout": 60
                            }
                        },
                        "shard": 100
                    }
                }
            }
        }
    ]
}

Output:

{
    "inputs": [
        {
            "osquery": {
                "packs": {
                    "a1233": {
                        "queries": {
                            "test": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "removed": false,
                                "snapshot": true,
                                "timeout": 60
                            }
                        }
                    },
                    "test": {
                        "queries": {
                            "a12312": {
                                "interval": 60,
                                "query": "select * from uptime",
                                "timeout": 60
                            }
                        },
                        "shard": 100
                    }
                }
            }
        }
    ]
}

@tomsonpl
Copy link

tomsonpl commented Feb 6, 2024

Thank you @aleksmaus 🙇

@aleksmaus
Copy link
Member Author

Abbreviated repro case with go-ucfg

func TestConfigBug(t *testing.T) {
	var m map[string]interface{}
	_ = json.Unmarshal([]byte(testConfig), &m)

	cfg, _ := ucfg.NewFrom(m)

	c := newConfigFrom(cfg)

	m, _ = c.ToMapStr()
	s, _ := json.Marshal(m)
	fmt.Printf("RESULT:%s\n", string(s))
}

const testConfig = `{"a":{"12":{}}}`

Output (the length of the array is 13 == 12+1):

RESULT:{"a":[null,null,null,null,null,null,null,null,null,null,null,null,null]}

Not super familiar with go-ucfg, but the field name is parsed into number here
https://github.com/elastic/go-ucfg/blob/main/path.go#L79

and here is where the array idx+1 is created
https://github.com/elastic/go-ucfg/blob/main/ucfg.go#L295

@aleksmaus
Copy link
Member Author

aleksmaus commented Feb 6, 2024

Another unrelated caveat

func TestConfigBug(t *testing.T) {
	var m map[string]interface{}
	_ = json.Unmarshal([]byte(testConfig), &m)

	cfg, _ := ucfg.NewFrom(m)

	c := newConfigFrom(cfg)

	m, _ = c.ToMapStr()
	s, _ := json.Marshal(m)
	fmt.Printf("RESULT:%s\n", string(s))
}

const testConfig = `{"a":{"b12":{}}}`

Results in the output:

RESULT:{"a":{"b12":null}}

Which is not quite correct, since the it was "b12":{} (empty object) in the input document.

@aleksmaus
Copy link
Member Author

the worse case

INPUT:

{"a":{"9223372036854775807":{}}}

RESULT:

panic: runtime error: makeslice: len out of range [recovered]
	panic: runtime error: makeslice: len out of range

INPUT:

{"a":{"9223372036854":{}}}

RESULT:

runtime: out of memory: cannot allocate 147573956411392-byte block (3833856 in use)
fatal error: out of memory

@aleksmaus
Copy link
Member Author

Made some changed in go-ucfg lib in order to address the issues above, PR is open for review.

@aleksmaus
Copy link
Member Author

aleksmaus commented Feb 8, 2024

Testing the change further with the agent and beats. Another place where it breaks is on the beats configuration parsing side: after the agent picked up the "workaround" and sends the correct configuration to the beats, the beats configuration parsing code breaks until go-ucfg and elastic-agent-libs changes are picked up.

Beats generates the config here:
https://github.com/elastic/beats/blob/main/x-pack/libbeat/management/generate.go#L156

using elastic-agent-libs conf.NewConfigFrom
https://github.com/elastic/elastic-agent-libs/blob/473983911d7c78e57bf30af91f66f5bca50d0a59/config/config.go#L81
where the config options are hardcoded
https://github.com/elastic/elastic-agent-libs/blob/473983911d7c78e57bf30af91f66f5bca50d0a59/config/config.go#L44

So it looks like the final scope of the update, if we start with go-ucfg, would include:

  1. go-ucfg
  2. elastic-agent
  3. elastic-agent-libs (config)
  4. beats (would need recompile with the latest go-ucfg and elastic-agent-libs)

@aleksmaus
Copy link
Member Author

After applying the change to elastic-agent-libs to call the newly added ucfg.EnableNumKeys(true) workaround API on go-ucfg, and recompiling the agent and beats with these changes, confirmed OS query is working with numeric query ID.

The incoming configuration is correct:
Screenshot 2024-02-08 at 2 20 11 PM

The results from the scheduled query are posted as expected:
Screenshot 2024-02-08 at 2 19 55 PM

Another possibly faster fix for osquery specific issue with much smaller change surface it to restrict users from assigning the numeric strings to the query ids. @tomsonpl opened a draft PR here elastic/kibana#176507

Still, any place that allows the numeric keys to be inserted into the policy will introduce this kind of issue. Another possible place for example is the oquery advanced configuration section that allows users to specify the free form JSON configuration for osquery.

Another possible approach is to rewrite how incoming map[string]interface{} is parsed into Config so we don't have to work around/fix the current ucfg implementation and possibly break some other users who count on the current behavior, or maybe ditch the ucfg Config altogether. Need some feedback from the agent/beats team here on all the possible consequences of making this change.

@cmacknz @blakerouse @andrewkroh

@cmacknz
Copy link
Member

cmacknz commented Feb 12, 2024

ucfg is still used widely enough in our own code (both beats and agent) that fixing the problem there feels correct. Getting rid of uses of go-ucfg in one place doesn't help the other uses of it.

I can't imagine anyone is depending on this currently completely broken behavior.

@aleksmaus
Copy link
Member Author

PR against go-ucfg is open elastic/go-ucfg#198. I don't have permissions to set reviewers or any labels there though.
I noticed @fearful-symmetry also had another PR opened a bit earlier that fixes/"works around" another oddity in that same library. The suggestion was to bring this up during the next agent meeting.

@aleksmaus
Copy link
Member Author

The PR is merged. Need to update the Agent/beats/libs to pick up the version with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants