Grok Processor does not support non-(a-zA-Z_) field characters for field names #21745
Comments
Maybe something like this would do the job
accept all non cc/ @ruflin |
@talevy ALL Unicode characters? Including vertical tab, non-breaking space, newline, combining characters? Maybe you want to use Unicode properties instead, and include allowed punctuation?
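To make the concern concrete, here is a minimal sketch of filtering field-name characters by Unicode general category. The policy itself (letters, decimal digits, and the punctuation `@_.-`) is an assumption chosen for illustration; it is not what Elasticsearch or Joni actually enforce, and `is_allowed` and `rejected_characters` are hypothetical helper names.

```python
import unicodedata

# Hypothetical policy, not Elasticsearch's or Joni's: allow letters,
# decimal digits, and a short explicit list of punctuation.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}
ALLOWED_PUNCTUATION = set("@_.-")


def is_allowed(ch: str) -> bool:
    """Return True if a single character passes the hypothetical policy."""
    return ch in ALLOWED_PUNCTUATION or unicodedata.category(ch) in ALLOWED_CATEGORIES


def rejected_characters(name: str) -> list:
    """List (character, general category) pairs that the policy rejects."""
    return [(ch, unicodedata.category(ch)) for ch in name if not is_allowed(ch)]


print(rejected_characters("@timestamp"))     # []
print(rejected_characters("bad\u000bname"))  # [('\x0b', 'Cc')]  vertical tab
print(rejected_characters("bad\u00a0name"))  # [('\xa0', 'Zs')]  non-breaking space
```

Under this particular policy a standalone combining mark (category `Mn`) is rejected as well; whether that is desirable is exactly the kind of decision the comment above is asking about.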
@clintongormley maybe not all :) Does Elasticsearch have a specific allowed set of field name characters? Ideally it should follow that as much as it can, right? It was mainly meant as an example to remind me what to change. I am still investigating how to properly declare these specific Unicode character types within Joni.
No we don't :( You can quite happily create a field named |
OK, then I guess we should allow that here as well. looking at Joni, I think I found a solution. called the and
another that is similar is called
so the change could be:
Running this locally, it looks pretty good. I'm not sure we want to match all of the types of Unicode characters that the test above handles ^^ I'll start a PR and incorporate these different character tests on our end as well.
Pinging @elastic/es-core-infra
* master:
  Do not check for object existence when deleting repository index files (#31680)
  Remove extra check for object existence in repository-gcs read object (#31661)
  Support multiple system store types (#31650)
  [Test] Clean up some repository-s3 tests (#31601)
  [Docs] Use capital letters in section headings (#31678)
  [DOCS] Add PQL language Plugin (#31237)
  Merge AzureStorageService and AzureStorageServiceImpl and clean up tests (#31607)
  TEST: Fix test task invocation (#31657)
  Revert "[TEST] Mute failing tests in NativeRealmInteg and ReservedRealmInteg"
  Fix RealmInteg test failures
  Extend allowed characters for grok field names (#21745) (#31653)
  [DOCS] Fix licensing API details (#31667)
  [TEST] Mute failing tests in NativeRealmInteg and ReservedRealmInteg
  Fix CreateSnapshotRequestTests Failure (#31630)
  Configurable password hashing algorithm/cost (#31234)
  [TEST] Mute failing NamingConventionsTaskIT tests
  [DOCS] Replace CONFIG_DIR with ES_PATH_CONF (#31635)
  Core: Require all actions have a Task (#31627)
* 6.x:
  Fix not waiting for Netty ThreadDeathWatcher in IT (#31758) (#31789)
  [Docs] Correct default window_size (#31582)
  S3 fixture should report 404 on unknown bucket (#31782)
  [ML] Limit ML filter items to 10K (#31731)
  Fixture for Minio testing (#31688)
  [ML] Return statistics about forecasts as part of the jobsstats and usage API (#31647)
  [DOCS] Add missing get mappings docs to HLRC (#31765)
  [DOCS] Starting Elasticsearch (#31701)
  Fix coerce validation_method in GeoBoundingBoxQueryBuilder (#31747)
  Painless: Complete Removal of Painless Type (#31699)
  Consolidate watcher setting update registration (#31762)
  [DOCS] Adds empty 6.3.1 release notes page
  ingest: Introduction of a bytes processor (#31733)
  [test] don't run bats tests for suse boxes (#31749)
  Add analyze API to high-level rest client (#31577)
  Implemented XContent serialisation for GetIndexResponse (#31675)
  [DOCS] Typos
  DOC: Add examples to the SQL docs (#31633)
  Add support for AWS session tokens (#30414)
  Watcher: Reenable start/stop yaml tests (#31754)
  JDBC: Fix stackoverflow on getObject and timestamp conversion (#31735)
  Support multiple system store types (#31650)
  Add write*Blob option to replace existing blob (#31729)
  Split CircuitBreaker-related tests (#31659)
  Painless: Add Context Docs (#31190)
  Docs: Remove missing reference
  Migrate scripted metric aggregation scripts to ScriptContext design (#30111)
  Watcher: Fix chain input toXcontent serialization (#31721)
  Remove _all example (#31711)
  rest-high-level: added get cluster settings (#31706)
  Docs: Match the examples in the description (#31710)
  [Docs] Correct typos (#31720)
  Extend allowed characters for grok field names (#21745) (#31653) (#31722)
  [DOCS] Check for Windows and *nix file paths (#31648)
  [ML] Validate ML filter_id (#31535)
  Fix gradle4.8 deprecation warnings (#31654)
  Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
With this commit we implement a workaround in the mapping of the http_logs track when we use the grok processor. Due to elastic/elasticsearch#21745 the grok processor fails when a field name contains a non-alpha character. Therefore, we replace `@timestamp` with `timestamp` in the mapping and the grok pattern for these cases. Relates elastic/elasticsearch#21745
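The renaming described in this commit message could look roughly like the sketch below. The mapping layout, the sample grok pattern, and the helper name `drop_at_prefix` are assumptions for illustration only; the actual http_logs track files are not shown in this thread.

```python
def drop_at_prefix(properties: dict, grok_pattern: str, field: str = "@timestamp"):
    """Rename `field` to its '@'-less form in both a mapping and a grok pattern.

    `properties` is assumed to be the "properties" object of an index mapping.
    """
    plain = field.lstrip("@")
    # Rebuild the mapping with the renamed key, keeping every other field as-is.
    renamed = {(plain if name == field else name): spec
               for name, spec in properties.items()}
    # Only touch grok references of the form %{PATTERN:@timestamp}.
    rewritten = grok_pattern.replace(":" + field + "}", ":" + plain + "}")
    return renamed, rewritten


# Hypothetical sample data, not taken from the track itself.
properties = {"@timestamp": {"type": "date"}, "status": {"type": "integer"}}
pattern = "%{HTTPDATE:@timestamp} %{NUMBER:status}"

new_properties, new_pattern = drop_at_prefix(properties, pattern)
print(new_pattern)           # %{HTTPDATE:timestamp} %{NUMBER:status}
print(list(new_properties))  # ['timestamp', 'status']
```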
Example ingest pipeline that fails because of the `@` in the field name.
Exception:
The Grok parser in ingest requires that field names match `a-zA-Z_`. This should be expanded to support all Unicode characters. The regex here must be updated to do so: https://github.com/talevy/elasticsearch/blob/82f7bfad98253e94305136df481cd1c7dc4e8ca8/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/Grok.java#L47-L47
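The effect of the restriction can be reproduced with a simplified stand-in for the grok reference syntax `%{PATTERN:field}`. Both regexes below are assumptions written for this illustration with Python's `re` module; they are not the expression in `Grok.java`, and Joni's syntax differs.

```python
import re

# Simplified, hypothetical stand-in: the field name may only use a-zA-Z_.
STRICT_REF = re.compile(r"%\{(?P<pattern>[A-Za-z0-9_]+):(?P<field>[a-zA-Z_]+)\}")

# Widened variant (an assumption): anything except whitespace and "}".
# In Python 3, \s also covers vertical tab and non-breaking space.
RELAXED_REF = re.compile(r"%\{(?P<pattern>[A-Za-z0-9_]+):(?P<field>[^\s}]+)\}")


def field_names(ref_regex, grok_expression):
    """Return the field names of all references the regex recognises."""
    return [m.group("field") for m in ref_regex.finditer(grok_expression)]


expr = "%{TIMESTAMP_ISO8601:@timestamp} %{WORD:verb}"
print(field_names(STRICT_REF, expr))   # ['verb']
print(field_names(RELAXED_REF, expr))  # ['@timestamp', 'verb']
```

With the strict variant the `@timestamp` reference is not recognised at all, which is consistent with the failure reported here.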
This might be a relevant issue in Joni: jruby/joni#13