Better defaults for new users and other feedback #299

Open
davkean opened this issue Jul 4, 2017 · 10 comments

@davkean
Member

davkean commented Jul 4, 2017

For the past 2 weeks I've been using PerfView fairly heavily, and I found it took me over a week of real usage to figure out how to use it well. I know that @CyrusNajmabadi and @sharwell have individual feedback, but the following is mine. I'll open bugs for those that you think should be tracked as individual items.

  • Provide a 64-bit executable. The very first trace I opened OOM'd. I found that to be a common occurrence for the first few days, until I ran corflags on the binary and forced it to run 64-bit.
  • Optimized workflow for capturing a single executable.
    • Almost all my traces were captured on an existing process - it would be great to have an "Attach to Process"-like experience, similar to the debugger, for listening to just a single process to reduce trace size.
  • Check "No v3.X NGEN symbols" by default. I often ran into this: forgetting to check this box, which improves the throughput of making a change, running a trace, and inspecting the trace.
  • Turn off merging by default, or have a better default experience for capturing a trace for "someone else" vs. capturing a trace for "me" on my machine.
    • You can cancel this, but annoyingly this sometimes errors and also causes the trace to not show up in the tree.
  • Remember my settings between opens.
  • Broken stacks. For reasons I cannot understand, I often have 20% to 80% broken stacks.
    • I read the troubleshooting section; none of it seems to apply to me. When this happens I just start another trace and retry the scenario, and it often works. It's hit and miss.
  • "Just My App" - I read the docs, but I cannot figure out how this grouping works - it never shows the code that I'm looking for. I almost always end up switching to "no grouping" and delving in.
  • Tell me when a method I enter in "Find" is hidden due to the current grouping. I spent the first two days extremely confused, thinking Find was just broken (or that I was entering the wrong regex) - it turned out the method was just filtered out.
  • "Back" and "Forward" don't work how I'd expect them to. I often find myself getting lost, and I instinctively choose "Back" to go back to where I came from (like most Windows apps that have this concept), but it doesn't do that. I can't figure out what it actually does.
  • Let me peel off a new window with the exact same context that I'm already at. There's a concept of a new window, but it resets the view that I'm looking at back to the start.
  • Make it obvious, like the debugger does, when I don't have symbols. Grey out the stacks, and make it look exactly like the debugger. I spent a non-trivial amount of time not understanding why I couldn't see any useful data until I realized I needed to load symbols to get it.

cc @vancem

@CyrusNajmabadi

Please remember my column sizes, preferred sorting, GroupPats and FoldPats. Allow for horizontal scrolling.

@sharwell
Member

sharwell commented Jul 5, 2017

Provide a 64-bit executable.

A 64-bit executable is now available (starting with #194). Is this working for you?

For each of the other bullets, a separate issue would be good.

@vancem
Contributor

vancem commented Jul 5, 2017

Provide a 64-bit executable. My very first trace I opened OOM'd. I found that to be a common occurrence for the first few days, until I just ran corflags the binary and forced it to run 64-bit.

It is reasonable to provide a popup that tells people how to get/move to the 64-bit version. In fact it is not that hard to make PerfView simply relaunch itself as a 64-bit process (it could be a user-set default as well). You can make an issue for that if you wish (as Sam mentions, you can already get a 64-bit version if you want).

I have pushed back against simply making PerfView 64-bit because, frankly, large data sets will lead to a pretty poor (slow) experience, and the better solution is to determine how to get the data volume down for the scenarios in question. There is a good chance there is a memory leak or memory inefficiency that moving to 64-bit is simply masking.

Optimized workflow for capturing a single executable.
Almost all my traces were capturing on an existing process - it would be great to have a "Attach to Process"-like experience similar to the debugger for just listening to a single process to reduce trace size.

More details would be useful here. Fundamentally, ETW is a machine-wide data collection mechanism, so there is no 'attach' (and generally speaking this is a good thing). The 'Run' command lets you do it for a single executable (but it is still machine wide; it is just that the starting, running and stopping are done for you). If you need 'attach', frankly that is what 'Collect' does. The viewer already lets you pick a process quickly (the one you want is typically the first one listed, the process that used the most CPU). If you have a suggestion for improving this, please provide details.
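The viewer's habit of putting the heaviest process first can be pictured as ranking processes by how many CPU samples landed in each. This is a minimal Python sketch, not PerfView's code; the `(process_name, pid)` sample records are a hypothetical simplification of what a trace contains.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical sample record: (process_name, pid), one entry per CPU sample.
Sample = Tuple[str, int]

def rank_processes(samples: Iterable[Sample]) -> List[Tuple[Sample, int]]:
    """Return processes ordered by CPU sample count, heaviest first."""
    counts = Counter(samples)
    # most_common() sorts by count descending; ties keep first-seen order.
    return counts.most_common()

if __name__ == "__main__":
    trace = [("devenv", 1200), ("MSBuild", 4400), ("devenv", 1200),
             ("ServiceHub", 5100), ("devenv", 1200)]
    for (name, pid), n in rank_processes(trace):
        print(f"{name} ({pid}): {n} samples")
```

Because the whole machine is still recorded, this ordering only saves a click at view time; it does not shrink the file.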

  • Check "No v3.X NGEN symbols" by default. Often ran into this - forgetting to check this box to improve the throughput of making a change, running a trace, inspecting a trace.

Generally speaking, the defaults need to be SAFE (that is, if it takes longer, that is OK; it is more important that it work without hiccups). If you DID care about a 3.X runtime and the 3.X NGEN symbols were skipped by default, you would get a VERY bad experience (no symbols for NGEN images), and there is no great way to let users know the solution. Thus until 3.X scenarios really become very close to 0, I would prefer to keep this the way it is.

However, we do remember certain values from run to run (like whether you want to merge/zip your file; see App.ConfigData["Zip"] in the code), and adding the 3.X NGEN option to this list is reasonable (thus once a more advanced user changes the default, it will be remembered until they explicitly change it).
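The "remember a value once the user changes it" behavior can be sketched as a small persisted key/value store layered over safe defaults. This is a hypothetical Python illustration: `App.ConfigData` is PerfView's own C# code, while the JSON file, the `Settings` class, and key names such as `NoV3xNGenSymbols` are assumptions made for the sketch.

```python
import json
from pathlib import Path

# Safe defaults, used until the user explicitly changes a value.
# Key names are hypothetical, loosely modeled on the options discussed here.
DEFAULTS = {"Zip": True, "Merge": True, "NoV3xNGenSymbols": False}

class Settings:
    def __init__(self, path: Path):
        self.path = path
        # Only user overrides are stored on disk, never the defaults.
        self.overrides = json.loads(path.read_text()) if path.exists() else {}

    def get(self, key: str):
        # A user override wins; otherwise fall back to the safe default.
        return self.overrides.get(key, DEFAULTS[key])

    def set(self, key: str, value) -> None:
        # Persist the changed value so the next run starts with it.
        self.overrides[key] = value
        self.path.write_text(json.dumps(self.overrides))

    def reset(self) -> None:
        # Analogous to a 'clear user config' option: forget every override.
        self.overrides = {}
        if self.path.exists():
            self.path.unlink()
```

Storing only overrides keeps the persistent state small, and a reset returns every option to its safe default in one step.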

  • Turn off merging by default, or have a better default experience for capturing a trace for "some else" vs capturing a trace for "me" on my machine.

If you uncheck the 'merge' box in the GUI, PerfView will remember it run-to-run, and thus it will never merge from then on. I do this, and it works great (in fact there are popups that tell you about it).

Again, I want it this way because it is SAFE. Long ago PerfView did not merge by default, and even with LOTS of warnings, users screwed it up A LOT (copied data off the machine without merging). I think the current system works well (users who use it a lot can avoid the overhead, but they have to learn more by reading the hints).

Remember my settings between opens

PerfView does this for some things (merging), and should probably do it for others (3.X rundown), but it is not a slam dunk to do this uniformly: although there is a 'clear user config' option, it is easy to get into states you don't know how to get out of. Thus a case could be made that the amount of persistent state should be kept low. For any particular setting, however, it is straightforward to fix, and you should make an issue for it.

Broken stacks. For reasons I cannot understand, I often have 20% -> 80% of broken stacks.
Read the troubleshooting section, none of it seems to apply to me. When this happens I just start another trace and retry the scenario and it often works. It's hit and miss.

I suspect that you are running into the ETW 196-frame limit (which can happen with deep async calls). Historically this was rare, but it seems things are changing. More investigation is reasonable, and we can probably add a 'BROKEN_TOO_MANY_FRAMES' tag to make diagnosis easier, but actually fixing the broken stacks is much harder. Note that the bottom-up approaches suggested as a work-around continue to work.
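The proposed 'BROKEN_TOO_MANY_FRAMES' tag could be sketched as a classification pass over each sample's stack: a stack that does not end at a thread-start frame is broken, and if it is also at the frame limit, truncation is the likely cause. This is a hypothetical Python sketch, not PerfView's implementation; the 196 limit is the figure quoted in this comment, and treating `ntdll!RtlUserThreadStart` as the thread root is an assumption about typical Windows user-mode stacks.

```python
from typing import List

# Limit as quoted in the comment above; kept as a parameter, not a fact about ETW.
FRAME_LIMIT = 196
# Assumed thread-root frames for a typical Windows user-mode thread.
THREAD_ROOTS = {"ntdll!RtlUserThreadStart", "ntdll!_RtlUserThreadStart"}

def classify_stack(frames: List[str], limit: int = FRAME_LIMIT) -> str:
    """Classify a stack given leaf-first frames (the root is the last element)."""
    if frames and frames[-1] in THREAD_ROOTS:
        return "OK"
    if len(frames) >= limit:
        # The walk hit the collection limit before reaching the thread root.
        return "BROKEN_TOO_MANY_FRAMES"
    # Broken for some other reason (e.g. frames the stack walker could not unwind).
    return "BROKEN"
```

Splitting the two broken cases only aids diagnosis; as noted above, it does not recover the missing frames.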

"Just My App" - I read the docs, but I cannot figure out how this grouping works - it never shows the code that I'm looking for. I almost always end up just switching to "no grouping" and just delving in

Sounds like you wish the docs to be better. Specific suggestions are the way forward.

Tell me a method I enter in "Find" is hidden due to the current grouping. I spend the first two days extremely confused thinking Find was just broken (or that I was entering the wrong regex) - turns out it was just filtered out.

The thing is, unless your grouping, filtering and folding are all empty, this can happen. Moreover, because CPU is sampled, you STILL may not see a method you 'KNOW' has been called if it is not on the stack long enough.

Perhaps docs would help, but generally speaking people don't read them. I don't have a good solution but if you have a specific suggestion we can discuss it.
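One way to act on the original request is for Find to search the raw frame names as well as the grouped names, and to report when a match exists only before grouping. This is a minimal Python sketch, assuming grouping is modeled as a list of (regex, replacement) rewrites applied to each frame name; it does not use PerfView's GroupPats syntax or code, and `find_frame` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical grouping rule: (regex, replacement) applied to a frame name.
GroupRule = Tuple[str, str]

def apply_grouping(frame: str, rules: List[GroupRule]) -> str:
    for pattern, replacement in rules:
        if re.search(pattern, frame):
            # First matching rule wins, collapsing the frame into a group name.
            return re.sub(pattern, replacement, frame)
    return frame

def find_frame(pattern: str, frames: Iterable[str], rules: List[GroupRule]) -> str:
    frames = list(frames)
    visible = [apply_grouping(f, rules) for f in frames]
    if any(re.search(pattern, v) for v in visible):
        return "found"
    if any(re.search(pattern, f) for f in frames):
        return "hidden by current grouping"
    # Absent from the sampled data; a short-lived method may simply never appear.
    return "not found"
```

The third result matters as much as the second: with sampled CPU data, "not found" does not prove the method was never called.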

"Back" and "Forward" don't work how I'd expect them to work. I often find myself getting lost, and I instinctively choose "Back" to go back when I came from (like most Windows apps that have this concept) but it doesn't do that. I can't figure out what it actually does.

Back and Forward change the values in the text boxes to their previous values (as a unit). I agree it is not intuitive, but the intuitive behavior seems hard to implement (I could not do it easily back when this was being implemented). If an experienced GUI programmer can fix it, great.

Docs are also reasonable.
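The behavior described above, where Back and Forward restore all the text-box values together, amounts to a history of whole-view snapshots. This is a minimal Python sketch, assuming a view's state is just a dictionary of text-box values; the `History` class and the field names in the usage are hypothetical.

```python
from typing import Dict, List

State = Dict[str, str]  # hypothetical: text-box name -> value

class History:
    def __init__(self, initial: State):
        self.states: List[State] = [dict(initial)]
        self.pos = 0

    def push(self, state: State) -> None:
        # A new navigation discards any "forward" states, as in a browser.
        del self.states[self.pos + 1:]
        self.states.append(dict(state))
        self.pos += 1

    def back(self) -> State:
        if self.pos > 0:
            self.pos -= 1
        return dict(self.states[self.pos])

    def forward(self) -> State:
        if self.pos < len(self.states) - 1:
            self.pos += 1
        return dict(self.states[self.pos])
```

Because each snapshot holds every text box at once, Back undoes the last change to any of them as a unit. Making Back return to the previous *location* (for example, the previously focused node) would mean recording that location in the snapshot as well.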

Let me peel off a new Window with the exact same context that I'm already at. There's a concept of a new window, but it resets the view that I'm looking at back to the start.

This is a deficiency in New Window. It does set the text boxes and the tab to what you had before, but I notice it does not set the focus node for the 'Callers' and 'Callees' views. It could also restore the state of the tree view (which nodes were expanded), but that may be harder. This should be straightforward to fix and should have its own issue.

Note that you may find the 'Drill Into' feature on the right-click context menu helpful in working around this. Basically you can take any set of values (e.g. some chunk of the call tree) and create a new window with just those samples. It is true that the tree may not be expanded in quite the same way, but typically getting there is easy because there are no other samples to confuse things.

Make it obvious like the debugger makes it obvious when I don't have symbols. Grey out the stacks, and make it look exactly like the debugger. I spent a non-trivial amount of time not understand why I couldn't see any useful data until I realized I needed to load symbols to get it.

For what it is worth, this is a FAQ. I am not convinced greying things out is that much better than showing '!?', but if it is easy to do in the GUI, I have no real objection. We can make an issue for it.

Thanks for the feedback.

@davkean
Member Author

davkean commented Jul 6, 2017

Vance, I first want to say that PerfView has been invaluable these past few weeks for diagnosing performance issues: CPU, UI-blocking delays and GC. It's a much more elegant and less invasive approach than the instrumentation and plain CPU sampling that I've used in the past.

Before using it, I watched all the Channel 9 videos that you did on using it - mind you, some of my scenarios were a lot more complex than what you covered (blocked threads), but I didn't come to PerfView blind. I found myself using PerfView for 3 different scenarios:

  1. Finding and capturing issues to file against another team.
  2. Finding and capturing issues that I needed to fix.
  3. Verifying that I actually fixed an issue.

There were 3 major things that made these scenarios harder:

  • The amount of data that was captured. This affected two main things:
    • I have a 2 Mbps upload to Redmond; even though zipping ETL files cuts the size to about a tenth, it still takes 12-15 minutes to upload a 2 GB trace (compressed down to 200 MB).
    • The general time it takes to go from "I've captured this trace" to "I can start looking at stacks", while fast given how big the traces are, still adds up over time while you are tracing over and over again.

This is why I was looking for ways to reduce the amount of data and processing time here - such as being able to limit it to a single process that I absolutely know is the cause of the issue.
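The upload time quoted above follows from simple arithmetic: a 200 MB file is 1,600 megabits, which at 2 Mbps takes 800 seconds, about 13 minutes, in line with the 12-15 minutes observed. A small Python helper, assuming decimal megabytes and a fully used link with no protocol overhead:

```python
def upload_minutes(size_mb: float, link_mbps: float) -> float:
    """Ideal upload time in minutes (decimal MB, no protocol overhead)."""
    megabits = size_mb * 8
    return megabits / link_mbps / 60

if __name__ == "__main__":
    raw_mb = 2000            # ~2 GB trace
    zipped_mb = raw_mb / 10  # zipping cut the size to about a tenth
    print(f"raw:    {upload_minutes(raw_mb, 2):.1f} min")
    print(f"zipped: {upload_minutes(zipped_mb, 2):.1f} min")
```

The same helper shows why capture-side size matters: uploading the uncompressed 2 GB trace over the same link would take over two hours.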

  • Broken stacks.

    • From what I could see, this was affecting my results for scenario 3 above. When I had large amounts of broken stacks, my numbers didn't "make sense" for individual methods that I had fixed, causing me to redo the trace to get fewer broken stacks (it seems I'm destined, while tracing VS, to always get at least 20% broken stacks).
  • Usability.

    • I was often finding myself "lost" in the results - where I'd accidentally navigated to the wrong method, and it wasn't clear to me how to get back to what I was just looking at. This resulted in me restarting the navigation by going back to the "By Name" and navigating from there.
    • The GroupPat "Just My App" filter. Maybe this is just problematic because I'm debugging devenv, which doesn't fit the traditional app pattern, but I'd kind of expected that this would filter out CLR and Windows but leave everything else in.
    • Save my settings. It would be great if PerfView remembered everything that I'd changed (maybe with a way to reset everything back to "default"), from the GroupPat to any check box that I'd checked.

I'm not convinced that better docs are the way to resolve these issues - there's already a very comprehensive but overwhelming amount of docs. I think a few tweaks to some UI elements and to the defaults could really make a difference here.

I'll file individual issues here for what I believe are bugs, feel free to resolve them if you disagree.

@davkean
Member Author

davkean commented Jul 6, 2017

Just played around with GroupPats; I think I expected Just My App to work like Group CLR/OS, hence my troubles. If this were remembered between sessions, that would probably resolve the issue.

@mattwarren

I want to echo @davkean: PerfView is a great tool, and it lets you do so many things that are hard or impossible to do otherwise. Thank you so much for making it!

The 'Run' command lets you do it for a single executable (but it is still machine wide, it is just that the starting, running and stopping are done for you).
...
The viewing already allows you to pick a process quickly (it is typically the first (the one that used the most CPU)), so it is pretty quick. If you have a suggestion on improving this please provide details.

This has always confused me, why, when you specify the launch of a single executable, does it still collect traces from all processes that are currently running? I know ETW is machine-wide, but in this scenario couldn't the default option be just collection of events from the process that you ask PerfView to launch? (I know you are prompted with a list of processes whenever you look at some events, but it seems strange to need to select a process each time)

For instance, I use 'Run' quite often (but maybe my use case isn't that common?), e.g. with CoreRun.exe.

In this case, couldn't the default behaviour be to only capture data from CoreRun.exe, with a check-box to allow 'capture data from all other processes' if wanted?

@sharwell
Member

sharwell commented Jul 11, 2017

📝 I have comments ready for most of the above issues (in most cases either supporting or supporting with design conditions), but I'm waiting for separate issues to be filed so the discussion doesn't go all over the place attempting to discuss many things in a single GitHub issue.

@vancem
Contributor

vancem commented Jul 11, 2017

In this case, couldn't the default behavior be to only capture data from CoreRun.exe, with a check-box to allow 'capture data from all other processes' if wanted?

Perhaps it is better to wait for individual issues to be made, but just FYI: when PerfView was created, ETW did not have the option of collecting for a single process. You had no choice but to collect machine wide. Some per-process filtering was added in Win8; however, even today all kernel events are system wide (and these are the most interesting ones, in particular CPU samples and context switches). Also, for most investigations besides CPU (e.g. blocked time, disk I/O, etc.), knowing that other processes are interfering (stealing your processor or disk) is important. Finally, the main value of filtering is to lower file size and overhead, but this overhead tends to be concentrated in the process of interest anyway (and if other processes are generating lots of events, they are probably interesting!). Thus you typically don't save much. The only exception to this that I have seen is a multi-processor server box running many things where you only care about one of the services (and it is a CPU investigation). It does not seem worth tuning for... For what it is worth...

@davkean
Member Author

davkean commented Jul 13, 2017

Finally, the main value of filtering is to lower file-size/overhead, but this overhead tends to be concentrated in the process of interest anyway (and if other processes are generating lots of events they are probably interesting!).

@vancem That doesn't match my experience, here's why:

  • VS is split into many separate processes, including many ServiceHub, MSBuild, VBCSCompiler and VSTest.xxx.exe processes as well as devenv itself. All of these processes produce large numbers of events. For the past month I've only been investigating memory usage in devenv - that's all I'm interested in.
  • Because some scenarios take a long time to reproduce, I'm often doing other things on the box at the same time, including surfing the web, reading email, writing code, opening other instances of devenv, building the tree, etc. I don't care about any of those results.

I can see where it may be interesting to investigate events from all the processes on the box, but that shouldn't come at the expense of the case where I only care about a single process.

@vancem
Contributor

vancem commented Jul 13, 2017

Your scenario is a reasonable argument for support of process level filtering. However as mentioned, at the present time ETW simply does not support filtering of kernel events, which are the most important ones to filter. The work-around is to simply live with the larger files.
