Question about obtaining the benchmark result #7

Closed
cherishwsx opened this issue May 27, 2020 · 5 comments

Comments

@cherishwsx

Thank you for all the amazing work you've done!

I successfully ran the training and prediction process of the DeepLog model using the same HDFS data file you are using (from loghub).

I'm using Drain as my parsing tool to get the structured log data, and I ended up with 48 unique event IDs in the templates. I'm using around 5,000 sessions for training; both the training loss and the validation loss converged to about 0.2 (starting from 0.8) after 300+ epochs. I didn't change the default parameter settings in deeplog.py except for the number of classes (48 in my case).
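
For reference, here is a minimal sketch (not the repository's code) of how parsed HDFS lines are typically grouped into sessions by block ID, and how the number of distinct event IDs gives the number of classes. It assumes the parser output is available as `(content, event_id)` pairs; `build_sessions` and `count_event_ids` are hypothetical names, and the log lines and event IDs in the usage part are made up for illustration.

```python
import re
from collections import OrderedDict

# HDFS block IDs look like "blk_-1608999687919862906" or "blk_7503483334202473044".
BLOCK_RE = re.compile(r"blk_-?\d+")


def build_sessions(parsed_lines):
    """Group (content, event_id) pairs into per-block event sequences.

    parsed_lines: iterable of (content, event_id) tuples, e.g. rows of a
    structured log produced by a parser such as Drain (assumed input format).
    Returns an OrderedDict mapping block ID -> list of event IDs in log order.
    """
    sessions = OrderedDict()
    for content, event_id in parsed_lines:
        # A line may mention several blocks; dedupe while keeping order,
        # then append the event to every block it mentions.
        for block_id in dict.fromkeys(BLOCK_RE.findall(content)):
            sessions.setdefault(block_id, []).append(event_id)
    return sessions


def count_event_ids(parsed_lines):
    """Number of distinct event IDs, i.e. the number of classes for the model."""
    return len({event_id for _, event_id in parsed_lines})


# Illustrative usage with made-up lines and hypothetical event IDs.
lines = [
    ("Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106", "E5"),
    ("Receiving block blk_7503483334202473044 src: /10.251.215.16:55695", "E5"),
    ("PacketResponder 1 for block blk_-1608999687919862906 terminating", "E11"),
]
sessions = build_sessions(lines)
print(dict(sessions))
print(count_event_ids(lines))  # 2
```

With a different parser (or different Drain settings), the same log lines can map to a different set of event IDs, which changes both the sessions' contents and the number of classes.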

The results I got from prediction are shown below; they do not look as promising as the benchmark.
[Image: prediction results]

I'm not sure why. Is it because of the parsing tool?

Any ideas or suggestions for improving the model results are welcome!!

@cherishwsx
Author

I also forgot to ask: could you briefly explain what the num_candidates parameter is for in prediction?

Thank you!!!!

@d0ng1ee
Owner

d0ng1ee commented May 28, 2020

It depends on your parsing tool. My benchmark result is based on the "ground truth" number of templates (28) in the dataset.
num_candidates means that if the actual next log key is among the top num_candidates predicted keys, the log is labeled as normal.
(Read the DeepLog paper to get a better understanding of num_candidates.)
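
As an illustration of that top-num_candidates rule, here is a minimal sketch, assuming the model outputs one score per log key for the next position in a sequence. The helper names are hypothetical and this is not the code in deeplog.py. The session-level rule (a session is anomalous if any step misses) reflects our reading of the DeepLog paper's HDFS evaluation.

```python
def top_candidates(scores, num_candidates):
    """Indices of the num_candidates highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:num_candidates]


def is_normal(scores, next_key, num_candidates):
    """A step is normal if the key that actually came next is in the top candidates."""
    return next_key in top_candidates(scores, num_candidates)


def is_anomalous_session(score_seq, key_seq, num_candidates):
    """Flag a session (e.g. one HDFS block) if any step falls outside the top candidates.

    score_seq: per-step score vectors predicted from the preceding window.
    key_seq: the log keys that actually followed, one per step.
    """
    return any(
        not is_normal(scores, key, num_candidates)
        for scores, key in zip(score_seq, key_seq)
    )


# Toy scores over 4 log keys (made up): key 1 is most likely, then key 2.
scores = [0.05, 0.60, 0.25, 0.10]
print(top_candidates(scores, 2))   # [1, 2]
print(is_normal(scores, 2, 2))     # True
print(is_normal(scores, 3, 2))     # False
```

A larger num_candidates accepts more keys as normal, so it lowers false positives but can miss real anomalies.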

  1. Try fine-tuning num_candidates to get a better F1 score.
  2. Try modifying your parsing code to get a result close to the ground truth (28 templates).
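
Suggestion 1 can be done with a simple sweep. The sketch below, with hypothetical helper names and made-up toy data, assumes you have already collected the model's per-step scores for each labeled session; it recomputes session predictions for each num_candidates value and keeps the one with the highest F1. Ideally, run the sweep on a held-out validation split rather than the test set.

```python
def f1_score(y_true, y_pred):
    """F1 for boolean labels, where True means anomalous."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def predict_session(steps, num_candidates):
    """steps: list of (scores, next_key). Anomalous if any next_key is outside the top-k."""
    for scores, next_key in steps:
        ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        if next_key not in ranked[:num_candidates]:
            return True
    return False


def sweep_num_candidates(sessions, labels, candidates):
    """Return (best num_candidates, {num_candidates: F1})."""
    results = {}
    for g in candidates:
        preds = [predict_session(steps, g) for steps in sessions]
        results[g] = f1_score(labels, preds)
    best = max(results, key=results.get)
    return best, results


# Toy data over 4 log keys (made up): one normal session, two anomalous ones.
sessions = [
    [([0.7, 0.2, 0.05, 0.05], 0), ([0.1, 0.6, 0.2, 0.1], 2)],  # normal
    [([0.5, 0.3, 0.15, 0.05], 3)],                             # anomalous
    [([0.4, 0.3, 0.2, 0.1], 2)],                               # anomalous
]
labels = [False, True, True]
best, results = sweep_num_candidates(sessions, labels, [1, 2, 3])
print(best)  # 2
```

With this toy data, num_candidates=1 flags the normal session (a false positive) and num_candidates=3 misses one anomaly (a false negative), which is the trade-off the tuning balances.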

@cherishwsx
Author

Thank you so much for the suggestions! That's really helpful!

I have one follow-up question, which may sound naive: do we always know the ground-truth number of log templates? And when using the parsing tool, should we modify the parsing code so that the resulting templates are as close as possible to the ground-truth number we know?

@d0ng1ee
Owner

d0ng1ee commented May 28, 2020

In industrial applications, constantly updated logs have no definite ground-truth templates; you need to continuously optimize the model based on performance indicators :)

@cherishwsx
Author

Got it! Thank you! I don't have any further questions for now! :))
