yaml/v2: use stack based parser. #7661

Merged
merged 50 commits into from
Aug 22, 2023

pwhelan
Contributor

@pwhelan pwhelan commented Jul 6, 2023

Summary

This PR refactors the YAML parser to use a stack-based parser instead of a single state that is continuously modified.

This simplifies the code and allows code to be reused where appropriate, i.e. the property handling for customs, inputs, filters, outputs and processors.

I also introduced debug output for the parser, which requires a modification to fluent-bit so that verbose mode can be set while the command-line arguments are parsed. A PR to do so is forthcoming.


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to the packaging of containers or native binaries, please confirm that it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.


@pwhelan
Contributor Author

pwhelan commented Jul 6, 2023

This is a test run using an input with a processor, passing through a filter and then on to two different outputs:

---
pipeline:
  inputs:
    - name: dummy
      dummy: '{"boo": "far"}'
      processors:
        logs:
          - name: record_modifier
            record: processed true
  filters:
    - name: record_modifier
      match: "*"
      record:
        - powered_by calyptia
        - hostname localhost
  outputs:
    - name: stdout
      format: json_lines
      match: "*"
    - name: exit
      match: "*"
      flush_count: 10

bash$ valgrind --leak-check=full ./bin/fluent-bit -c input-simple.yaml
==492741== Memcheck, a memory error detector
==492741== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==492741== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==492741== Command: ./build/bin/fluent-bit -c build/input-simple.yaml
==492741== 
Fluent Bit v2.1.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/07/06 14:14:36] [ info] [fluent bit] version=2.1.7, commit=fb7d4c8c9b, pid=492741
[2023/07/06 14:14:36] [ info] [storage] ver=1.2.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/07/06 14:14:36] [ info] [cmetrics] version=0.6.3
[2023/07/06 14:14:37] [ info] [output:stdout:stdout.0] worker #0 started
[2023/07/06 14:14:36] [ info] [ctraces ] version=0.3.1
[2023/07/06 14:14:37] [ info] [input:dummy:dummy.0] initializing
[2023/07/06 14:14:37] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2023/07/06 14:14:37] [ info] [sp] stream processor started
{"date":1688667277.249725,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667278.273059,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667279.236527,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667280.236557,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667281.253871,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667282.23624,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667283.236297,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667284.236255,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667285.236251,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667286.2378,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
{"date":1688667287.236259,"boo":"far","processed":"true","powered_by":"calyptia","hostname":"localhost"}
[2023/07/06 14:14:47] [ warn] [engine] service will shutdown in max 5 seconds
[2023/07/06 14:14:47] [ info] [input] pausing dummy.0
[2023/07/06 14:14:48] [ info] [engine] service has stopped (0 pending tasks)
[2023/07/06 14:14:48] [ info] [input] pausing dummy.0
[2023/07/06 14:14:48] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2023/07/06 14:14:48] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==492741== 
==492741== HEAP SUMMARY:
==492741==     in use at exit: 0 bytes in 0 blocks
==492741==   total heap usage: 2,974 allocs, 2,974 frees, 5,978,725 bytes allocated
==492741== 
==492741== All heap blocks were freed -- no leaks are possible
==492741== 
==492741== For lists of detected and suppressed errors, rerun with: -s
==492741== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@pwhelan
Contributor Author

pwhelan commented Jul 12, 2023

There is a pending compile bug on Windows:

D:\a\fluent-bit\fluent-bit\src\config_format\flb_cf_yaml.c(471): error C2065: 'state': undeclared identifier
D:\a\fluent-bit\fluent-bit\src\config_format\flb_cf_yaml.c(471): error C2223: left of '->file' must point to struct/union
D:\a\fluent-bit\fluent-bit\src\config_format\flb_cf_yaml.c(471): error C2198: 'read_config': too few arguments for call
D:\a\fluent-bit\fluent-bit\src\config_format\flb_cf_yaml.c(610): warning C4020: 'read_glob': too many actual parameters
D:\a\fluent-bit\fluent-bit\src\config_format\flb_cf_yaml.c(1416): warning C4047: '=': 'flb_sds_t' differs in levels of indirection from 'int'
NMAKE : fatal error U1077: '"C:\Program Files\CMake\bin\cmake.exe" -E cmake_cl_compile_depends --dep-file=CMakeFiles\fluent-bit-static.dir\config_format\flb_cf_yaml.c.obj.d --working-dir=D:\a\fluent-bit\fluent-bit\build\src --filter-prefix="Note: including file: " -- C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1435~1.322\bin\HostX86\x86\cl.exe @C:\Users\RUNNER~1\AppData\Local\Temp\nm1711.tmp' : return code '0x2'

@pwhelan pwhelan force-pushed the pwhelan-yaml-state-stack branch from 7a2cd40 to 7c1dfc1 July 18, 2023 18:42
pwhelan added 15 commits August 21, 2023 11:05
Signed-off-by: Phillip Whelan <[email protected]>
@pwhelan pwhelan force-pushed the pwhelan-yaml-state-stack branch from 72c6262 to 267e230 August 21, 2023 15:06
Contributor

@cosmo0920 cosmo0920 left a comment

Most of the PR looks good to me.
I added a nitpick comment.

src/config_format/flb_cf_yaml.c (resolved)
Collaborator

@niedbalski niedbalski left a comment

LGTM

@niedbalski niedbalski merged commit eabc1a1 into master Aug 22, 2023
@niedbalski niedbalski deleted the pwhelan-yaml-state-stack branch August 22, 2023 22:07
leonardo-albertovich pushed a commit that referenced this pull request Oct 5, 2023
* yaml/v2: move to a stack based parser.

Use a LIFO list (or stack) to save parser states. This allows code to be
reused between inputs, outputs, filters, customs and processors.

With this, support for list-based properties for processors now works.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: remove unused STATE_PROCESSOR_MAP.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: use debug printing for the structure/state event messages.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: fix assignment when adding multiple processors.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: move variable declarations out of switch case statements.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: wrap state_names behind state_str function.

Signed-off-by: Phillip Whelan <[email protected]>

* processors: update fallback call in flb_processor_unit_set_property.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: remove unused variables and functions.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: move variable declaration out of case statement.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: rename all instances of 'i' to idx and remove local declarations.

Signed-off-by: Phillip Whelan <[email protected]>

* processors: fix memory leak in updated processor test.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: improve tests.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: add single parser file test.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: finish test for parsers_file for yaml.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: move to a stack based parser.

Use a LIFO list (or stack) to save parser states. This allows code to be
reused between inputs, outputs, filters, customs and processors.

With this, support for list-based properties for processors now works.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: remove unused STATE_PROCESSOR_MAP.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: windows fixes.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: add missing parsers.conf test file.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: free flb_config in parser test.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: minor section fixes.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: refactor state_name to use a switch case.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: remove single letter variables.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return values.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: rewrite allocation flags as macros.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: move all type definitions to the start.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: rename get_last_included_file to state_get_last.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return value of snprintf in read_glob.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: fix return value for error in state_push_witharr.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: remove comment.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: adhere to code standard.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: redefine HAS_KEY and HAS_KEYVALS as macros.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: erase redefinitions.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: release local variable 'file' and not 'cfg_file'.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check list size when getting current state.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: move initialization out of declaration.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return value of get_current_state.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return value of flb_sds_printf when concatenating filename.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: syntax fixes.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check everything in yaml_error_event.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return values.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: fix bugs introduced by code deduplication.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return values for state_get_last.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: fix 'e' undeclared error.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: rename kv to prop.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: add missing error handling.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: fix coding violations in comments.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: rename short variables.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: check return values and fix memory leaks.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: rename file to include_dir.

Signed-off-by: Phillip Whelan <[email protected]>

* yaml/v2: add new line before if, break out else to next line.

Signed-off-by: Phillip Whelan <[email protected]>

---------

Signed-off-by: Phillip Whelan <[email protected]>
5 participants