Alternative implementation of XML parser #8121

Closed
wants to merge 13 commits into from

Conversation

M0rdecay
Contributor

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@ssoroka, @srebhan, please take a look.

This is a slightly different implementation of the parser from PR #7460.
This version is closer to the json parser.

This parser is less flexible than the other proposed solutions. On the other hand, because it names fields in a way similar to the json parser, it does not lose data when node or attribute names are repeated.
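To illustrate the naming idea, here is a minimal Python sketch, not the PR's Go code. It assumes that repeated sibling nodes get a numeric index in their flattened name (`NODE_0_...`, `NODE_1_...`), which matches the field names in the sample metrics later in this thread. The `flatten` helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(elem, prefix=""):
    """Flatten an element into {field_name: text}, indexing repeated siblings.

    Hypothetical helper: the naming scheme is an assumption, not the PR's code.
    Text of elements that also carry attributes or children is ignored here.
    """
    fields = {}
    # Attributes become fields named after the attribute.
    for name, value in elem.attrib.items():
        fields[prefix + name] = value
    counts = Counter(child.tag for child in elem)
    seen = Counter()
    for child in elem:
        key = prefix + child.tag
        # Only repeated sibling names get an index, so they cannot collide.
        if counts[child.tag] > 1:
            key = "%s_%d" % (key, seen[child.tag])
            seen[child.tag] += 1
        if len(child) == 0 and not child.attrib:
            fields[key] = (child.text or "").strip()
        else:
            fields.update(flatten(child, key + "_"))
    return fields


doc = (
    "<ROOT>"
    "<NODE><SOME_VALUE>11</SOME_VALUE><SOME_DATA>33</SOME_DATA></NODE>"
    "<NODE><SOME_VALUE>22</SOME_VALUE><SOME_DATA>44</SOME_DATA></NODE>"
    "</ROOT>"
)
fields = flatten(ET.fromstring(doc))
```

Without the index, the second `NODE` would overwrite the first one's `SOME_VALUE` and `SOME_DATA`, which is the data loss mentioned above.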

I would like to know your opinion.

@M0rdecay
Contributor Author

I think I would like to combine functionality from both.
The ability to add tags and fields from an arbitrary location seems useful.

Of course, if you approve of it.

@M0rdecay
Contributor Author

It looks like the test timed out because it produced no output for 10 minutes.
That should not be related to the last commit, which only changed the README.

@M0rdecay M0rdecay mentioned this pull request Sep 14, 2020
@M0rdecay
Contributor Author

Friends, @ssoroka, @reimda, anyone? :(
Please...

As I wrote earlier, to work with XML I currently wrap the parser and run it as an external processor, but this does not work as well as I would like.
When processing large documents (~7 thousand lines), reading the data back from the processor fails with an error:

E! [processors.execd] Error reading stdout: bufio.Scanner: token too long

The reason is clear, but the point is that we have to work with metrics this large.
If we use the parser inside Telegraf, everything is fine.
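For anyone hitting the same error: "token too long" is the message of Go's `bufio.ErrTooLong`, and a default-configured `bufio.Scanner` limits a token to `bufio.MaxScanTokenSize` (64 KiB). The rough check below assumes the reader on the other side uses that default limit, which the error suggests but this thread does not confirm. The `oversized_lines` helper is hypothetical, and the boundary is approximate because the scanner's buffer must also hold the newline.

```python
# Default token limit of Go's bufio.Scanner (bufio.MaxScanTokenSize).
GO_MAX_SCAN_TOKEN_SIZE = 64 * 1024


def oversized_lines(text, limit=GO_MAX_SCAN_TOKEN_SIZE):
    """Return (line_number, byte_length) for each line at or above the limit."""
    result = []
    for number, line in enumerate(text.split("\n"), start=1):
        size = len(line.encode("utf-8"))  # the limit is counted in bytes
        if size >= limit:
            result.append((number, size))
    return result
```

Running the wrapper's output through `oversized_lines` shows which serialized metrics are too long for a single scanner token.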

We are looking forward to the news.

@M0rdecay
Contributor Author

M0rdecay commented Sep 15, 2020

While I am here, I want to share the solution we use to split a metric that contains the result of parsing nested arrays into separate metrics:

Input:

measurement,tag=1115 SOME_VALUE=55i,SOME_DATA=66i 1600178359000000000
measurement,tag=1337 NODE_0_SOME_VALUE=11i,NODE_0_SOME_DATA=33i,NODE_1_SOME_VALUE=22i,NODE_1_SOME_DATA=44i 1600178359000000000
measurement,tag=1226 NODE_0_SOME_VALUE=11i,NODE_0_SOME_DATA=33i,NODE_1_SOME_VALUE=22i,NODE_1_SOME_DATA=44i 1600178359000000000

Configuration:

[[processors.strings]]
  order = 1
  namepass = [ "measurement" ]
  [[processors.strings.trim_prefix]]
    field_key = "*"
    prefix = "NODE_"
  [[processors.strings.trim_prefix]]
    tag_key = "*"
    prefix = "NODE_"

[[processors.starlark]]
  order = 4
  namepass = [ "measurement" ]
  source = '''
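def apply(metric):
    # Collect the numeric index prefixes ("0", "1", ...) left after NODE_ is trimmed.
    ids = []

    for key in metric.fields.keys():
        id = key.split("_")[0]
        if id.isdigit() and id not in ids:
            ids.append(id)

    # No indexed fields: pass the metric through unchanged.
    if len(ids) == 0:
        return metric

    metrics = []
    for id in ids:
        m = deepcopy(metric)
        prefix = "%s_" % id

        for k, v in m.fields.items():
            if k.startswith(prefix):
                # Strip only the leading prefix; replace() would also remove
                # later occurrences of it inside the field name.
                m.fields[k[len(prefix):]] = v

            # Drop every original key; only this index's renamed fields remain.
            m.fields.pop(k, None)

        metrics.append(m)
    return metrics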
'''

Output:

measurement,tag=1115 SOME_VALUE=55i,SOME_DATA=66i 1600178359000000000
measurement,tag=1337 SOME_VALUE=11i,SOME_DATA=33i 1600178359000000000
measurement,tag=1226 SOME_VALUE=22i,SOME_DATA=44i 1600178359000000000
measurement,tag=1337 SOME_VALUE=22i,SOME_DATA=44i 1600178359000000000
measurement,tag=1226 SOME_VALUE=11i,SOME_DATA=33i 1600178359000000000
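The splitting logic can also be tried outside Telegraf. The sketch below is a plain-Python model of the same idea, operating on dicts instead of Telegraf metrics. `split_indexed` is a hypothetical name, and the input is assumed to be the field set after the `NODE_` prefix has been trimmed.

```python
def split_indexed(fields):
    """Split {'0_A': 1, '1_A': 2} style fields into one dict per index.

    Plain-dict model of the Starlark script above; not Telegraf code.
    """
    # Collect numeric index prefixes in first-seen order.
    ids = []
    for key in fields:
        head = key.split("_")[0]
        if head.isdigit() and head not in ids:
            ids.append(head)

    # No indexed fields: return the field set unchanged, as a single item.
    if not ids:
        return [dict(fields)]

    result = []
    for index in ids:
        prefix = index + "_"
        # Keep only this index's fields and strip the leading prefix.
        result.append(
            {k[len(prefix):]: v for k, v in fields.items() if k.startswith(prefix)}
        )
    return result


trimmed = {"0_SOME_VALUE": 11, "0_SOME_DATA": 33, "1_SOME_VALUE": 22, "1_SOME_DATA": 44}
parts = split_indexed(trimmed)
```

As in the Starlark version, fields without an index prefix are dropped from a metric that has indexed fields, so anything shared across the split metrics should be carried as a tag.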

@M0rdecay M0rdecay mentioned this pull request Sep 17, 2020
@M0rdecay M0rdecay mentioned this pull request Oct 23, 2020
@M0rdecay
Contributor Author

Linked - #6968

@sjwang90
Contributor

XML Input Issue #1758

@sjwang90 sjwang90 mentioned this pull request Oct 23, 2020
@ssoroka
Contributor

ssoroka commented Nov 16, 2020

@M0rdecay
the long buffer line issue has since been resolved.
I'm going to close this in favor of #8047, which I think is going to be a more flexible approach. Are there any features from here that you'd like to see implemented there, as well?

@ssoroka ssoroka closed this Nov 16, 2020
@M0rdecay
Contributor Author

M0rdecay commented Nov 16, 2020

@ssoroka ah, never mind. I'll adjust to the accepted implementation.
At least if there are problems, I can open issues.
Hopefully the PR will be merged before the 1.17 release.

@ssoroka
Contributor

ssoroka commented Nov 16, 2020

@M0rdecay we're going to work on getting it merged soon. I do want to give you a chance to provide feedback on the other PR. If there's anything you would like to see added, please comment there. Feel free to do a full review and bring up any cases you want to see supported right away.

Thanks for your work here! I appreciate it.


3 participants