[ML] Process delimited files like semi-structured text #56038
Changes the file upload functionality to process delimited files by splitting them into messages, then sending these to the ingest pipeline as a single field for further processing in Elasticsearch. The csv_importer has been removed and the old sst_importer replaced with a similar message_importer that has been enhanced to cover the edge cases required by delimited file processing. Previously the file upload functionality parsed CSV in the browser. Parsing CSV in the ingest pipeline instead makes the Kibana file upload functionality more easily interchangeable with Filebeat, so the configurations it creates can more easily be used to import data with the same structure repeatedly in production. Companion to elastic/elasticsearch#51492
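For readers unfamiliar with the flow, here is a rough sketch of what "sending each line as a single field and parsing it in the ingest pipeline" could look like. It is not the pipeline the file upload actually generates: `buildCsvPipeline`, the column names and the sample documents are all hypothetical. The `csv` and `remove` ingest processors and the `field`, `target_fields` and `separator` options are the only parts taken from Elasticsearch itself.

```javascript
// Hypothetical helper (not part of this PR): builds an ingest pipeline body
// that parses the raw `message` field with Elasticsearch's `csv` processor.
function buildCsvPipeline(columnNames, separator = ',') {
  return {
    description: 'Parse delimited lines uploaded as a single message field',
    processors: [
      // Split the raw line into the named target fields.
      { csv: { field: 'message', target_fields: columnNames, separator } },
      // Drop the raw line once it has been parsed.
      { remove: { field: 'message' } },
    ],
  };
}

// Documents shaped the way the description says they are sent:
// one unparsed line per document, in a single field.
const docs = [{ message: 'alice,30' }, { message: 'bob,25' }];

// Assumed column names for the sample data above.
const pipeline = buildCsvPipeline(['name', 'age']);
console.log(JSON.stringify(pipeline, null, 2));
```

Because the browser no longer interprets the columns, the same pipeline body could in principle be reused by any shipper that sends one line per `message`, which is the interchangeability with Filebeat that the description refers to.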
Pinging @elastic/ml-ui (:ml)
// multiline_start_pattern regex
// if it does, it is a legitimate end of line and can be pushed into the list,
// if not, it must be a newline char inside a field value, so keep looking.
async read(text) {
it doesn't seem like the method is async
This is a leftover from the original CSV parsing library, which needed to be async.
This can go, but you'll also have to change the read method in ndjson_importer.js, as well as remove the await from line 210 of import_view.js.
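A minimal sketch of the shape this change would take, assuming nothing about the real importer beyond the `read(text)` name: the class, the result object and the line splitting below are hypothetical stand-ins. The point is only that a method doing no asynchronous work can drop `async`, after which its caller gets a plain value instead of a Promise.

```javascript
// Hypothetical stand-in for an importer; only the shape of read() matters.
class MessageImporterSketch {
  // No `async` keyword: nothing in the body awaits anything.
  read(text) {
    const data = text
      .split('\n')
      .filter(line => line.length > 0)
      .map(message => ({ message }));
    // Assumed result shape, for illustration only.
    return { success: true, docCount: data.length, data };
  }
}

// The caller no longer needs `await`; the result is available immediately.
const result = new MessageImporterSketch().read('a\nb\n');
console.log(result.docCount);
```

Since `await` on a non-Promise value simply yields that value, a leftover `await` at a call site would keep working, so the importers and the call site could be cleaned up in either order.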
if (this.multilineStartRegex === null || line.match(this.multilineStartRegex) !== null) {
  message = message.replace(/\r$/, '');
  data.push({ message });
  message = '';
} else {
  message += '\n';
}
looks like the same as lines 39-50, might deserve a small dedicated method
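One way the suggested extraction could look, as a sketch rather than the code that was eventually merged. The function names (`pushMessage`, `processLine`, `readMessages`), the `null` sentinel for "no message started yet" and the use of `split('\n')` are assumptions made for this example. Only the regex check, the trailing `\r` removal and the `{ message }` shape come from the snippet above.

```javascript
// Hypothetical helper: strips a trailing \r (Windows line endings) and stores
// the finished message.
function pushMessage(data, message) {
  data.push({ message: message.replace(/\r$/, '') });
}

// Hypothetical helper holding the branch that was duplicated in the diff.
// Returns the message that is being accumulated after `line` is consumed.
function processLine(data, message, line, multilineStartRegex) {
  const startsNewMessage =
    multilineStartRegex === null || line.match(multilineStartRegex) !== null;
  if (startsNewMessage) {
    // The previous newline was a real end of message, so flush what we have.
    if (message !== null) {
      pushMessage(data, message);
    }
    return line;
  }
  // The newline was inside a field value: keep it and continue accumulating.
  return message === null ? line : message + '\n' + line;
}

function readMessages(text, multilineStartRegex = null) {
  const data = [];
  let message = null; // null means no message has been started yet
  const lines = text.split('\n');
  // A trailing newline produces one empty final element; ignore it.
  if (lines[lines.length - 1] === '') {
    lines.pop();
  }
  for (const line of lines) {
    message = processLine(data, message, line, multilineStartRegex);
  }
  // Flush the last message, which has no following line to trigger it.
  if (message !== null) {
    pushMessage(data, message);
  }
  return data;
}

// The first record has a newline inside a quoted field.
const messages = readMessages('1,"multi\nline"\r\n2,plain\r\n', /^\d+,/);
console.log(messages);
// [ { message: '1,"multi\nline"' }, { message: '2,plain' } ]
```

Both the per-line path and the end-of-input path go through the same two helpers, so the `\r` handling and the start-pattern check each live in one place.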
LGTM
@elasticmachine merge upstream
💚 Build Succeeded
* master: (21 commits)
  [SIEM][Detection Engine] critical blocker updates to latest ECS version
  [Monitoring] Fix inaccuracies in logstash pipeline listing metrics (elastic#55868)
  Resetting errors and removing duplicates (elastic#56054)
  Add flag to opt out from sub url tracking (elastic#55672)
  [SIEM][Detection Engine] critical bug, fixes duplicate tags
  [ML] Anomaly Detection: Fix persist/restore of refreshInterval in globalState. (elastic#56113)
  [ML] Single Metric Viewer: Fix annnotations refresh. (elastic#56107)
  adapt ObjectToConfigAdapter.getFlattenedPaths to consider arrays as final values (elastic#56105)
  Add Appender.receiveAllLevels option to fix LegacyAppender (elastic#55752)
  [ML] Process delimited files like semi-structured text (elastic#56038)
  Charts plugin (combining ui/color_maps and EuiUtils) (elastic#55469)
  fix tutorial documentation (elastic#55996)
  [ML] Fix persist/restore of time/refreshInterval in data visualizer. (elastic#56122)
  [Index Management] Fix errors with validation (elastic#56072)
  [Index Management] Add try/catch when parsing index filter from URI (elastic#56051)
  [NP] add HTTP resources testing strategies (elastic#54908)
  [ML] Single Metric Viewer: Fix brush update on short recent timespans. (elastic#56125)
  [Uptime] Add timeout for slow process to skipped functional tests (elastic#56065)
  refactor (elastic#56121)
  Move tests in dashboard into appropriate folders (elastic#55304)
  ...
Checklist
Use strikethroughs to remove checklist items you don't feel are applicable to this PR.
- [ ] This was checked for cross-browser compatibility, including a check against IE11
- [ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
- [ ] Documentation was added for features that require explanation or tutorials
- [ ] Unit or functional tests were updated or added to match the most common scenarios
- [ ] This was checked for keyboard-only and screenreader accessibility
For maintainers