
[Logs Explorer] Indicate data stream activity in dataset selector #171394

Closed
weltenwort opened this issue Nov 16, 2023 · 14 comments
Labels
needs design Team:obs-ux-logs Observability Logs User Experience Team

Comments

@weltenwort
Member

📓 Summary

Upon installation, integrations install all of their datasets, even if shippers will populate only a subset of them. In the dataset selector, we want to indicate to the user whether the underlying data stream has any recent data, so they are less likely to visit empty datasets.

✔️ Acceptance criteria

  • The dataset selector annotates each dataset entry with whether the corresponding data stream contains data. This could take the form of a "recency indicator" such as "never" or "3 minutes ago".
  • The dataset selector's performance is not significantly worse than before. This could also be achieved by loading the information asynchronously.
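A minimal sketch of the "recency indicator" from the first criterion, assuming the last activity is available as an epoch-millisecond timestamp (or `undefined` when the stream has no documents). `formatRecency` is a hypothetical helper, not an existing Kibana function:

```typescript
// Hypothetical helper (not part of Kibana): turns a data stream's
// last-activity timestamp in epoch milliseconds into a short label.
// `undefined` means the data stream has never received a document.
function formatRecency(
  lastActivityMs: number | undefined,
  nowMs: number = Date.now()
): string {
  if (lastActivityMs === undefined) {
    return "never";
  }
  // Clamp to zero in case of clock skew between client and cluster.
  const elapsedSec = Math.max(0, Math.floor((nowMs - lastActivityMs) / 1000));
  // Largest unit first, so 90 minutes reads as "1 hour ago".
  const units: Array<[string, number]> = [
    ["day", 86400],
    ["hour", 3600],
    ["minute", 60],
  ];
  for (const [name, seconds] of units) {
    const count = Math.floor(elapsedSec / seconds);
    if (count >= 1) {
      return `${count} ${name}${count === 1 ? "" : "s"} ago`;
    }
  }
  return "just now";
}
```

For example, `formatRecency(Date.now() - 3 * 60 * 1000)` yields "3 minutes ago". A real implementation would presumably localize these strings; the standard `Intl.RelativeTimeFormat` is one option.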

🎨 Mock-ups

🚧 TODO

💡 Implementation hints

  • The data stream stats API provides storage size and recent timestamp information. The performance characteristics of this are unknown, though.
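Assuming the hint refers to Elasticsearch's data stream stats endpoint (`GET /_data_stream/<data-stream>/_stats`), a wildcard such as `logs-*` lets a single request cover every log data stream. Each entry in its `data_streams` array carries `store_size_bytes` and `maximum_timestamp` (the highest `@timestamp`, in epoch milliseconds). The sketch below only indexes such a response by name; the helper and interface names are hypothetical, and treating a `maximum_timestamp` of 0 as "no data" is an assumption:

```typescript
// Subset of the Elasticsearch data stream stats response
// (GET /_data_stream/<name>/_stats). Only the fields used below are typed.
interface DataStreamStatsResponse {
  data_streams: Array<{
    data_stream: string;
    store_size_bytes: number;
    maximum_timestamp: number;
  }>;
}

interface DataStreamActivity {
  sizeBytes: number;
  // undefined when the stream has no documents (assumption: the API
  // reports maximum_timestamp as 0 in that case).
  lastActivityMs: number | undefined;
}

// Hypothetical helper: index the stats by data stream name so each
// dataset selector entry can be annotated with a constant-time lookup.
function indexDataStreamStats(
  response: DataStreamStatsResponse
): Map<string, DataStreamActivity> {
  const byName = new Map<string, DataStreamActivity>();
  for (const ds of response.data_streams) {
    byName.set(ds.data_stream, {
      sizeBytes: ds.store_size_bytes,
      lastActivityMs: ds.maximum_timestamp > 0 ? ds.maximum_timestamp : undefined,
    });
  }
  return byName;
}
```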
@weltenwort changed the title Indicate data stream activity in dataset selector [Logs Explorer] Indicate data stream activity in dataset selector Nov 16, 2023
@weltenwort added the Team:obs-ux-logs Observability Logs User Experience Team label Nov 16, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@ruflin
Contributor

ruflin commented Nov 24, 2023

Currently the dataset selection under integrations also shows datasets for which no data stream even exists. We should grey these out or remove them completely.
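One way to detect those entries, sketched under the assumption that the selector can obtain the list of existing data stream names: Elastic's data stream naming scheme is `<type>-<dataset>-<namespace>`, and none of the three parts may contain `-`, so the dataset can be recovered by splitting the name. The function and type names are hypothetical:

```typescript
// Elastic's data stream naming scheme is <type>-<dataset>-<namespace>,
// and none of the three parts may contain "-", so splitting is safe.
function datasetOf(dataStreamName: string): string | undefined {
  const parts = dataStreamName.split("-");
  return parts.length === 3 ? parts[1] : undefined;
}

type DatasetAvailability = "available" | "no_data_stream";

// Hypothetical helper: marks integration datasets that have no data stream
// of the given type in any namespace, so the UI can grey them out or hide them.
function classifyDatasets(
  integrationDatasets: string[],
  existingDataStreams: string[],
  type = "logs"
): Map<string, DatasetAvailability> {
  const present = new Set<string>();
  for (const name of existingDataStreams) {
    if (name.startsWith(`${type}-`)) {
      const dataset = datasetOf(name);
      if (dataset !== undefined) {
        present.add(dataset);
      }
    }
  }
  return new Map(
    integrationDatasets.map((d): [string, DatasetAvailability] => [
      d,
      present.has(d) ? "available" : "no_data_stream",
    ])
  );
}
```

Only the data stream's existence is checked here; whether it actually holds documents is a separate question that the stats API answers.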

@tonyghiani
Contributor

Quoting the issue description: "The data stream stats API provides storage size and recent timestamp information. The performance characteristics of this are unknown, though."

@weltenwort Do you think we could get better performance by running this query in parallel inside the installed packages endpoint and attaching the status to each dataset there? It's mostly to avoid an additional round trip between server and client, and to avoid flashing UI changes when the user opens the selector before the dataset status has resolved.

Just thinking out loud: I haven't checked its performance, and it might be worse than I expect.

@weltenwort
Member Author

Let's think about it this way: We can't do much about the performance of the stats query itself, but we can influence whether it's on the critical path of the page load or not:

  • If the performance is not a problem, then putting it into the "installed packages" API is nice for simplicity.
  • If the performance is poor we should probably load the stats asynchronously so the dataset selector is already usable even if the activity annotation is not shown yet.
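Both options can share the same merge step. The following is a minimal sketch with hypothetical, simplified types: on the server it would be called once with the stats already resolved, while on the client it would be called first with `undefined` (so the selector is usable right away) and again when the stats arrive:

```typescript
// Hypothetical, simplified shapes; the real dataset selector types differ.
interface Dataset {
  name: string; // data stream name, e.g. "logs-nginx.access-default"
}

type Activity =
  | { status: "pending" } // stats not loaded yet
  | { status: "never" } // data stream has no documents
  | { status: "active"; lastActivityMs: number };

type AnnotatedDataset = Dataset & { activity: Activity };

// Option 1: call once on the server with the stats already resolved.
// Option 2: call on the client with `undefined` first, so the selector is
// usable immediately, and again once the stats request resolves.
function annotateDatasets(
  datasets: Dataset[],
  lastActivityByName: Map<string, number> | undefined
): AnnotatedDataset[] {
  return datasets.map((dataset): AnnotatedDataset => {
    if (lastActivityByName === undefined) {
      return { ...dataset, activity: { status: "pending" } };
    }
    const lastActivityMs = lastActivityByName.get(dataset.name);
    return {
      ...dataset,
      activity:
        lastActivityMs === undefined
          ? { status: "never" }
          : { status: "active", lastActivityMs },
    };
  });
}
```

The explicit "pending" state lets the UI render a neutral placeholder instead of briefly showing "never", which addresses the flashing concern raised above.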

@tonyghiani
Copy link
Contributor

Replying to @weltenwort's comment above that we can't speed up the stats query itself, but we can keep it off the critical path of the page load:

Agreed on everything. As additional context for measuring the performance impact, keep in mind that the installed packages (the first page of 15 integrations) are fetched in the background even if the DatasetSelector is never opened 👌

@isaclfreire

In this Figma file you can find some initial explorations and the questions I have about this issue.

[Screenshot of the Figma explorations, 2024-01-12]

@achyutjhunjhunwala
Contributor

achyutjhunjhunwala commented Jan 13, 2024

As an SRE, what benefit do I get from knowing the count of log documents? If it's only to let the user know that some data is present, we could use another visual indicator, since the number of documents could be overwhelming.

The idea of loading the Last Activity in a popover could solve the performance issue, since it can be loaded on demand for each dataset.

@isaclfreire

Hi @achyutjhunjhunwala thanks for the input! That's a good question. We still have many assumptions to look into and a user testing session will definitely help with that.

For the docs count, I referred to this issue's requirements, which say:

"The number of docs is also an indicator to the user, how 'active' a dataset is."

Regarding the # of docs, I wonder whether users remember the last count and can therefore tell whether it increased or decreased. That's why I'm asking whether we can track if new data has come in since the user's last session, but ultimately that feels very unreliable to me. I added a green dot as a marker of new activity, but I'm not sure it will be enough.
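A minimal sketch of the green-dot check, assuming a per-dataset "last seen" timestamp is persisted (for example in browser local storage) whenever the user views the selector. It compares that value to the data stream's last activity instead of relying on users remembering counts. `hasNewActivity` is hypothetical, and treating a first visit as "new" is a design choice, not a requirement from this issue:

```typescript
// Hypothetical: `lastSeenMs` is persisted per dataset when the user last
// viewed the selector; `lastActivityMs` is the stream's latest @timestamp.
function hasNewActivity(
  lastSeenMs: number | undefined,
  lastActivityMs: number | undefined
): boolean {
  if (lastActivityMs === undefined) {
    return false; // no data at all, so nothing new
  }
  if (lastSeenMs === undefined) {
    return true; // first visit: treat existing data as new (design choice)
  }
  return lastActivityMs > lastSeenMs;
}
```

This also illustrates why the approach can feel unreliable: local storage is per browser, and late-arriving documents whose `@timestamp` predates `lastSeenMs` never trigger the dot.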

(I still haven't addressed with the designs what happens when the count is 0, so bear in mind this is all up for discussion 👍)

Replying to @achyutjhunjhunwala's suggestion to load the Last Activity on demand in a popover:

Yep, that's what I had in mind to avoid slowing down performance. The remaining question is how often this would be updated: every x minutes, every x seconds...?

@ruflin
Contributor

ruflin commented Jan 15, 2024

Another idea triggered by the comment from @achyutjhunjhunwala: what if we showed a trendline? It would solve multiple problems at once: it would show whether there is recent activity (and how much), whether there is any activity at all, and whether the activity differs from other datasets / integrations. Unfortunately, the trendline could be expensive to compute...

@achyutjhunjhunwala
Contributor

achyutjhunjhunwala commented Jan 15, 2024

A date_histogram can give us this trendline, but yes, this could get expensive when run for each dataset. We can limit it to the last 24 hours, and we will have to run a test against a beefy cluster to see how it behaves:

{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": { "gte": "now-24h" }
    }
  },
  "aggs": {
    "Group By Hour": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "hour",
        "format": "k"
      }
    }
  }
}
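To draw the trendline, the hourly buckets from such a query (each with a `key` in epoch milliseconds and a `doc_count`) need to become a fixed-length series. Elasticsearch omits empty buckets before the first and after the last non-empty one unless `extended_bounds` is set, so the sketch below fills gaps with zeros. `toHourlySeries` is a hypothetical helper, and it assumes UTC-aligned hourly buckets (no `time_zone` on the aggregation):

```typescript
// Shape of one bucket in an Elasticsearch date_histogram response.
interface HistogramBucket {
  key: number; // bucket start, epoch milliseconds
  doc_count: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: maps hourly buckets onto `hours` slots ending with
// the current hour, filling missing hours with 0 and ignoring buckets
// that fall outside the window.
function toHourlySeries(
  buckets: HistogramBucket[],
  nowMs: number,
  hours = 24
): number[] {
  const series = new Array<number>(hours).fill(0);
  // Slots are aligned to whole UTC hours, like the buckets themselves.
  const currentHourStart = Math.floor(nowMs / HOUR_MS) * HOUR_MS;
  const firstSlotStart = currentHourStart - (hours - 1) * HOUR_MS;
  for (const bucket of buckets) {
    const slot = Math.round((bucket.key - firstSlotStart) / HOUR_MS);
    if (slot >= 0 && slot < hours) {
      series[slot] += bucket.doc_count;
    }
  }
  return series;
}
```

The buckets would come from `aggregations["Group By Hour"].buckets` in the search response. The partial hour at the start of the `now-24h` range falls outside the 24 slots and is dropped.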

Replying to @isaclfreire's question about how often this would be updated:

@isaclfreire My idea was real time: when the user clicks on the 3 dots, we fire the query for that dataset. If we fire only one (stats) query per dataset, this should not be expensive.
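A minimal sketch of that click-triggered loading, assuming some client function `fetchLastActivity` (hypothetical) that queries the stats for a single data stream. Repeated or concurrent opens of the same menu reuse a recent request. `maxAgeMs` is an assumed knob: setting it to 0 gives the "real time" behaviour of querying on every click, and a larger value is one possible answer to the refresh question:

```typescript
// Hypothetical on-demand loader for the popover idea. `fetchLastActivity`
// stands in for whatever client calls the stats endpoint for one data stream.
function createLastActivityLoader(
  fetchLastActivity: (dataStream: string) => Promise<number | undefined>,
  maxAgeMs = 60000,
  now: () => number = () => Date.now()
): (dataStream: string) => Promise<number | undefined> {
  const cache = new Map<
    string,
    { fetchedAt: number; value: Promise<number | undefined> }
  >();
  return (dataStream) => {
    const entry = cache.get(dataStream);
    // Reuse a recent (or still in-flight) request instead of re-querying.
    if (entry !== undefined && now() - entry.fetchedAt < maxAgeMs) {
      return entry.value;
    }
    const value = fetchLastActivity(dataStream);
    cache.set(dataStream, { fetchedAt: now(), value });
    // Forget failed requests so the next open retries.
    value.catch(() => {
      if (cache.get(dataStream)?.value === value) {
        cache.delete(dataStream);
      }
    });
    return value;
  };
}
```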

@weltenwort
Member Author

The "last activity" is an appealing option because it's part of the data stream stats API and therefore cheap to fetch. IMHO, to be really useful it would have to be visible right in the dataset list. If I have to click around to see it in a context menu, I might as well just select the dataset.

@achyutjhunjhunwala
Contributor

In that case, we can replace the document count with the last activity.

@isaclfreire

I have started some UX explorations in this Figma file, feel free to comment.

@isaclfreire removed their assignment Feb 2, 2024
@gbamparop
Contributor

There's another issue tracking the implementation; can this one be closed?

@gbamparop closed this as not planned Mar 20, 2024
Projects
None yet
Development

No branches or pull requests

7 participants