[ML] File data visualizer time out on CSV file <1Mb #29821

Closed
dan-frohlich opened this issue Feb 1, 2019 · 4 comments

@dan-frohlich

Kibana version: 6.5.4

Elasticsearch version: 6.5.4

Server OS version: OSX 10.14.2

Browser version: Chrome Version 71.0.3578.98 (Official Build) (64-bit)

Browser OS version:

Original install method (e.g. download page, yum, from source, etc.): docker

Description of the problem including expected versus actual behavior: Cannot load a CSV file that is smaller than 1 MB. The import fails with:

[timeout_exception] Aborting structure analysis during [timestamp format determination] as it has taken longer than the timeout of [25s]

Steps to reproduce:

  1. Download the sample from https://s3.amazonaws.com/conventionshare.tabletop.events/schedules/gary-con-x-schedule.csv
  2. From the Kibana home page, select [Import a CSV, NDJSON, or log file]
  3. Select the file

Errors in browser console (if relevant): N/A

Provide logs and/or server output (if relevant):
elasticsearch_1 | [2019-02-01T12:23:00,570][WARN ][r.suppressed ] [4MElse6] path: /_xpack/ml/find_file_structure, params: {}
elasticsearch_1 | org.elasticsearch.ElasticsearchTimeoutException: Aborting structure analysis during [timestamp format determination] as it has taken longer than the timeout of [25s]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TimeoutChecker.check(TimeoutChecker.java:82) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TimeoutChecker.grokCaptures(TimeoutChecker.java:103) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TimestampFormatFinder.findFirstMatch(TimestampFormatFinder.java:236) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TimestampFormatFinder.findFirstMatch(TimestampFormatFinder.java:191) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TextLogFileStructureFinder.mostLikelyTimestamp(TextLogFileStructureFinder.java:158) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TextLogFileStructureFinder.makeTextLogFileStructureFinder(TextLogFileStructureFinder.java:35) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.TextLogFileStructureFinderFactory.createFromSample(TextLogFileStructureFinderFactory.java:45) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.FileStructureFinderManager.makeBestStructureFinder(FileStructureFinderManager.java:278) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.FileStructureFinderManager.findFileStructure(FileStructureFinderManager.java:150) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.filestructurefinder.FileStructureFinderManager.findFileStructure(FileStructureFinderManager.java:121) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.action.TransportFindFileStructureAction.buildFileStructureResponse(TransportFindFileStructureAction.java:50) ~[x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.xpack.ml.action.TransportFindFileStructureAction.lambda$doExecute$0(TransportFindFileStructureAction.java:39) [x-pack-ml-6.5.4.jar:6.5.4]
elasticsearch_1 | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.5.4.jar:6.5.4]
elasticsearch_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
elasticsearch_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
elasticsearch_1 | at java.lang.Thread.run(Thread.java:834) [?:?]


@jgowdyelastic jgowdyelastic changed the title Error: "[timeout_exception] Aborting structure analysis" for a CSV <1Mb [ML] File data visualizer time out on CSV file <1Mb Feb 1, 2019
@jgowdyelastic jgowdyelastic self-assigned this Feb 1, 2019
@elasticmachine
Contributor

Pinging @elastic/ml-ui

@droberts195
Contributor

droberts195 commented Feb 1, 2019

The stack trace contains TextLogFileStructureFinderFactory. This means that the structure finder rejected the idea of the file being CSV. The reason is this:

Not CSV because row [2] has a different number of fields to the first row: [24] and [21]

We have to have a rule that CSV files have the same number of fields on each line because if we just looked for lines with commas we'd say this was a CSV file:

[2018-10-08T10:49:15,240][INFO ][o.e.e.NodeEnvironment    ] [node-0] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [166.4gb], net total_space [464.7gb], types [hfs]
[2018-10-08T10:49:15,244][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]

We rule it out because the first line has 6 fields and the second only has 3 fields.
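
The field-count rule described above can be illustrated with a minimal Python sketch. This is not the Elasticsearch implementation (which is written in Java); the function names `field_counts` and `first_mismatch` are hypothetical, and the sketch assumes a plain comma delimiter with standard double-quote quoting.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the number of fields in each non-empty row of `text`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader if row]


def first_mismatch(text, delimiter=","):
    """Return (row_number, expected, actual) for the first row whose field
    count differs from the first row, or None if every row agrees.

    Row numbers are 1-based and count non-empty rows only.
    """
    counts = field_counts(text, delimiter)
    for row_number, count in enumerate(counts[1:], start=2):
        if count != counts[0]:
            return row_number, counts[0], count
    return None
```

Applied to the two log lines quoted above, `field_counts` reports 6 and 3 fields, so `first_mismatch` flags row 2 and the text would not be accepted as CSV under this rule.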

Getting that reason is currently hard. If the analysis doesn't time out then the ?explain option on the backend endpoint will tell you. But if there's a timeout then the explanation is lost. I'll open a PR to add the explanation so far in the exception when there's a timeout.

I got the reason by running this:

curl -s -H "Content-Type: application/json" -XPOST "localhost:9200/_xpack/ml/find_file_structure?pretty&explain&lines_to_sample=119" -T gary-con-x-schedule.csv

at the command line. (With 119 changed to 120 it times out. I'll do some more investigation to find out why analyzing it as a text file takes so long when line 120 is included.)
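
The search for the boundary between 119 and 120 lines can be automated. The following Python sketch builds the same URL as the curl command and bisects on `lines_to_sample`. The helper names are hypothetical, and `times_out` is a caller-supplied function (for example, a wrapper around the curl call) that returns True when the analysis times out for a given sample size. The bisection assumes that once a sample size times out, every larger one does too, which is an assumption rather than something the endpoint guarantees.

```python
def find_file_structure_url(host, lines_to_sample):
    """Build the endpoint URL used in the curl command for a given sample size."""
    return (
        f"{host}/_xpack/ml/find_file_structure"
        f"?pretty&explain&lines_to_sample={lines_to_sample}"
    )


def first_failing_sample_size(times_out, low, high):
    """Return the smallest sample size in (low, high] for which times_out is True.

    Assumes times_out(low) is False, times_out(high) is True, and that
    the result never flips back to False as the sample size grows.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if times_out(mid):
            high = mid   # failure is at mid or earlier
        else:
            low = mid    # mid still succeeds, so the failure is later
    return high
```

Each probe costs up to the full timeout, so a bisection keeps the number of slow requests logarithmic in the size of the file.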

Try analyzing gary-con-x-schedule-with-blank-fields-at-line-ends.csv with the file data visualizer. I created that by simply loading your CSV file into Excel and saving it under a different name. Excel padded all the lines to contain the same number of fields. The backend endpoint certainly treats that file as CSV, so I'd expect the UI to visualize it sensibly.
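
The same padding can be done without Excel. Here is a minimal Python sketch, assuming a comma-delimited file with standard double-quote quoting; `pad_rows` is a hypothetical helper name, and the output may differ from what Excel writes in details such as line endings and quoting.

```python
import csv
import io


def pad_rows(text, delimiter=","):
    """Pad every non-empty row with empty fields so all rows have the same width."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]          # drop blank lines
    width = max((len(row) for row in rows), default=0)

    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()
```

For example, `pad_rows("a,b,c\n1,2\n3\n")` gives three rows of three fields each, with the short rows ending in empty fields.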

@jgowdyelastic
Member

> The backend endpoint certainly treats that file as CSV, so I'd expect the UI to visualize it sensibly.

Which it does...

[image]

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Feb 1, 2019
The explanation so far can be invaluable for troubleshooting
as incorrect decisions made early on in the structure analysis
can result in seemingly crazy decisions or timeouts later on.

Relates elastic/kibana#29821
@droberts195
Contributor

> I'll open a PR to add the explanation so far in the exception when there's a timeout.

I opened elastic/elasticsearch#38191 for this.

> I'll do some more investigation to find out why analyzing it as a text file takes so long when line 120 is included.

Once the file is misclassified as semi-structured text rather than CSV, the problem documented in elastic/elasticsearch#35137 occurs.

Since this isn't a UI problem, I'll close this issue in the UI repo. If you want to follow along with the backend fixes, please subscribe to the PR and issue in the backend repo.

droberts195 added a commit to elastic/elasticsearch that referenced this issue Feb 4, 2019
The explanation so far can be invaluable for troubleshooting
as incorrect decisions made early on in the structure analysis
can result in seemingly crazy decisions or timeouts later on.

Relates elastic/kibana#29821