Initial version of LavaProfiler user interface #233
base: main
Conversation
Good start, but lots left to do...
src/lava/utils/profiler.py
Outdated
@@ -22,7 +22,8 @@
 from lava.magma.runtime.runtime import Runtime
 from lava.magma.core.process.message_interface_enum import ActorType
 from lava.magma.compiler.compiler import Compiler
-from lava.magma.core.resources import Loihi1NeuroCore, Loihi2NeuroCore
+from lava.magma.core.resources import (
+    AbstractComputeResource, Loihi1NeuroCore, Loihi2NeuroCore)


 class LavaProfiler(ABC):
Always good to have some docstrings by the time of a PR in which you expect feedback from others.
@@ -34,7 +35,7 @@ def __init__(self, start: int = 0, end: int = 0,
         self.end = end
         self.bin_size = bin_size
         self.buffer_size = buffer_size
-        self.loihi_flag = False  # Exchange with list of resources
+        self.used_resources: ty.List[AbstractComputeResource] = []

     def profile(self, proc: AbstractProcess):
docstring
        self.buffer_size = buffer_size
        self.loihi_flag = False  # Exchange with list of resources

    def profile(self, proc: AbstractProcess):
We talked about two API variants: the old one, in which the profiler wraps the Proc, and probably this new version. Did you get feedback on which one is actually better? I believe our conclusion was that the old one seemed more suitable for what we want to do, so you wanted to prepare both side by side to show DR and PS the alternative.
Because one class overwriting a method attribute of another class looks borderline invasive in terms of unexpected side effects. Python allows it, but I'd imagine this would draw the anger of the Python community upon us.
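The side-effect concern can be illustrated with a minimal, self-contained sketch of the rebinding pattern used in this PR. The class names and return values here are hypothetical placeholders, not Lava's actual API:

```python
import types

class Process:
    def run(self):
        return "original run"

class Profiler:
    def run(self, proc_self):
        # Hypothetical replacement that wraps the original behavior.
        return "profiled " + Process.run(proc_self)

    def profile(self, proc):
        # Rebind the instance attribute `run` to the profiler's method.
        # The instance now behaves differently from its class definition.
        proc.run = types.MethodType(self.run, proc)

proc = Process()
Profiler().profile(proc)
print(proc.run())       # "profiled original run" -- instance behavior changed
print(Process().run())  # "original run" -- other instances are unaffected
```

This is legal Python, but anyone reading `Process` alone cannot tell that `run` may have been swapped out at runtime, which is the "unexpected side effects" worry above.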
I still think that the user API would be better if the user could simply define a profiler without having to change their line
proc.run(...),
i.e. the interface that PP coded here. But I also see Andreas' point: overwriting the run function feels bad. In particular, what happens if we ever change the general run method of Processes? Then we may forget to change it in the Profiler, and it will stop working.
What does the Profiler actually do under the hood? Does it just count how often the internal run_spk method is called, and how often spikes are being sent? In that case, could we
- ask people to just add a @counter decorator in front of the run_spk function to count runs:
class MyProcModel(list):
    @counter
    def run_spk(self, *args, **kwargs):
        ...
with
def counter(f):
    def count(*args, **kwargs):
        count.calls += 1
        return f(*args, **kwargs)
    count.calls = 0
    return count
This decorator would mean that we don't need to rewrite the whole run method.
- subscribe to the port of the Process to read its sent spikes?
To me, that would be way less invasive. But I know too little about what's going on under the hood to say if anything like that would work.
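The decorator idea from the comment above can be exercised standalone. This is a sketch with a hypothetical stand-in model class (not a real Lava ProcModel):

```python
def counter(f):
    """Wrap f so the number of calls is recorded on the wrapper itself."""
    def count(*args, **kwargs):
        count.calls += 1
        return f(*args, **kwargs)
    count.calls = 0
    return count

class MyModel:  # hypothetical stand-in for a ProcModel
    @counter
    def run_spk(self):
        pass  # neuron dynamics would go here

m = MyModel()
for _ in range(3):
    m.run_spk()
print(MyModel.run_spk.calls)  # 3
```

One caveat worth noting for the discussion: the counter lives on the function object, so the count is shared across all instances of the class; a per-instance count would need the counter state stored on `self` instead.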
Both versions are side by side in the PowerPoint ;)
I still prefer the black box version, but the functionality is the same either way so switching it should be simple.
@phstratmann We need to do quite a few things under the hood, especially modifying the compilation process (the sketch of what we need to do is in form of comments and mock methods in this PR).
We do not really count how often run_spk is called.
src/lava/utils/profiler.py
Outdated
    Functionally, this method does the same as run(..) of AbstractProcess,
    but modifies the chosen ProcModels and executables to be able to use the
    LavaProfiler. From the user perspective, it should not be noticeable as
It will be noticeable and cause confusion as soon as somebody uses a debugger with breakpoints.
src/lava/utils/profiler.py
Outdated
        ...
        return proc_map

    def _set_profiler(self, proc_map):
The name does not reflect what the method does.
You don't seem to be doing anything with the Profiler itself but with the ProcModels so they can work with the Profiler.
Changed to "_prepare_proc_models"
    def profile(self, proc: AbstractProcess):
        proc.run = types.MethodType(self.run, proc)

    def get_energy(self) -> np.array:
We probably need something more elaborate instead of or besides this method. Yes, we want a total time series, but in simulation we also need the ability to get time series for specific Procs or cores, or for specific contributors to the total energy.
How are we going to do this?
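One possible shape for such a richer query API is sketched below. The class, its method names, and the per-Process bookkeeping are hypothetical illustrations of the requirement (per-Proc series plus an element-wise total), not proposed final code:

```python
import numpy as np

class EnergyLog:
    """Hypothetical per-Process energy recorder (not Lava's actual API)."""

    def __init__(self):
        self._series = {}  # Process name -> list of per-timestep energies (uJ)

    def record(self, proc_name, value):
        self._series.setdefault(proc_name, []).append(value)

    def get_energy(self, proc_name=None):
        """Time series for one Process, or the element-wise total over all."""
        if proc_name is not None:
            return np.asarray(self._series[proc_name])
        return np.sum([np.asarray(v) for v in self._series.values()], axis=0)

log = EnergyLog()
for t in range(4):
    log.record("lif", 1.0)    # example contributor names are made up
    log.record("dense", 0.5)

print(log.get_energy("lif"))  # [1. 1. 1. 1.]
print(log.get_energy())       # [1.5 1.5 1.5 1.5]
```

The same keyed structure could be extended with keys for cores or energy contributors (synaptic ops, neuron updates, ...) without changing the query interface.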
src/lava/utils/profiler.py
Outdated
@@ -12,3 +12,125 @@
 elementary operation has a defined execution time and energy cost, which is
 used in a performance model to calculate execution time and energy.
 """
Docstring:
will contain -> contains [otherwise, this needs to be rewritten once it works next month, and we will forget it]
power and performance of workloads -> isn't power an aspect of performance?
Maybe a few lines on what the Profiler simulation is good for?
- Simulate a network on CPU before it is being deployed - in which situations is that feature helpful? If people don't have access to Loihi yet. If they want to understand the performance on a fine-granular layer. ...
I moved this Docstring to the Profiler class docstring without the "will".
        exchange the process models accordingly.
        Tell the user which Processes will not be profiled, as they lack a
        profileable ProcModel."""
        ...
Should we automatically switch process models? I could imagine that people may get confused in some cases. Let's assume the user first runs a process without profiler and the compiler chooses a process model that is not profilable. Then the user runs the same process with profiler. The profiler will automatically switch the process model to one that can be profiled. But these process models may differ - maybe because of a bug, maybe because of other reasons. The user will not have expected to see any different process behavior just because (s)he activated the profiler. Instead, I would expect that I receive an error message if the default process model cannot be profiled.
My approach would be that each ProcModel has a profileable counterpart ProcModel, which only adds the operation counters.
For the LIF neuron, for example, we would need a profileable ProcModel for "fixed_pt" and "floating_pt", and there is a 1:1 mapping to exchange them if the profiler is present. I do not mess with the compiler, but only afterwards look for ProcModels which can be exchanged.
If there is no such ProcModel, the user will be informed that this Process is not considered by the Profiler. If no chosen ProcModel has a profileable version and we run on simulation only, then an error states that no Process is considered by the Profiler.
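The exchange-after-compilation idea described above can be sketched as follows. All names and the shape of `proc_map` are hypothetical simplifications of Lava's internal structures, used only to make the warn/error behavior concrete:

```python
import warnings

# Hypothetical 1:1 mapping from a chosen ProcModel class to its
# profileable counterpart; populated by whoever provides the models.
PROFILEABLE_MAP = {}

def exchange_proc_models(proc_map):
    """Swap each ProcModel for its profileable counterpart, if one exists.

    `proc_map` maps Process names to ProcModel classes. Processes without
    a profileable counterpart are kept as-is but reported to the user.
    """
    exchanged = {}
    any_profiled = False
    for name, model_cls in proc_map.items():
        counterpart = PROFILEABLE_MAP.get(model_cls)
        if counterpart is None:
            warnings.warn(f"Process '{name}' will not be profiled: "
                          "no profileable ProcModel available.")
            exchanged[name] = model_cls
        else:
            exchanged[name] = counterpart
            any_profiled = True
    if not any_profiled:
        raise RuntimeError("No Process is considered by the Profiler.")
    return exchanged

# Hypothetical example models:
class PyLifFloat:
    pass

class ProfileablePyLifFloat(PyLifFloat):
    pass

PROFILEABLE_MAP[PyLifFloat] = ProfileablePyLifFloat
out = exchange_proc_models({"lif": PyLifFloat})
print(out["lif"].__name__)  # ProfileablePyLifFloat
```

Because the exchange runs after ProcModel selection, the compiler's own selection logic stays untouched, matching the "I do not mess with the compiler" intent.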
What does this mean in terms of code duplication? Does it mean there is a ProcModel class and then an almost identical ProcModel class that just adds counters?
Before we go into a whole lot of implementation, as usual, we should first write down an end-to-end (mock) example of what we are trying to enable and then agree that this is the best way to go. Such decisions are best made not in the abstract, for people who have not thought about the pros and cons deeply before, but using a concrete example.
Do we have such an example already?
If not I suggest you draft one and share it. We are at an important fork in the road, so we should get some wider input.
I added an example in proc/lif/models.py
We can inherit the ProcModel and add the code for operation counters.
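The inheritance approach keeps code duplication small: the counterpart only overrides what it counts. A minimal sketch, with hypothetical class names standing in for the real LIF ProcModels in proc/lif/models.py:

```python
class PyLifModelFloat:
    """Hypothetical stand-in for the floating-point LIF ProcModel."""

    def run_spk(self):
        # ... neuron dynamics would go here ...
        return "spikes"

class ProfileablePyLifModelFloat(PyLifModelFloat):
    """Same behavior as the parent, plus an operation counter."""

    def __init__(self):
        self.neuron_update_count = 0

    def run_spk(self):
        self.neuron_update_count += 1  # count the operations performed
        return super().run_spk()       # behavior itself is unchanged

m = ProfileablePyLifModelFloat()
m.run_spk()
m.run_spk()
print(m.neuron_update_count)  # 2
```

Since the subclass delegates to `super().run_spk()`, a behavioral divergence between the two models can only come from the counting code itself, which addresses the duplication concern raised above.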
        """Returns the energy estimate per time step in µJ."""
        ...
        return 0
Should we give the user the option to either receive the whole time series or just the total time/energy? I could imagine that for particularly long runs we run into memory or runtime problems if we store one value for each time step. If we just accumulate the values, that may often suffice.
In addition, what about a parameter to control the frequency of measurements? We may often not need a value for each time step, but only for each 1000th time step.
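Both options (running total only, or a strided time series) can be combined in one recorder. A sketch with hypothetical names, purely to make the memory trade-off concrete:

```python
class EnergyAccumulator:
    """Hypothetical recorder: always keeps a total, optionally a strided series."""

    def __init__(self, stride=1, keep_series=True):
        self.stride = stride          # record one sample every `stride` steps
        self.keep_series = keep_series
        self.total = 0.0
        self.series = []
        self._t = 0

    def record(self, value):
        self.total += value           # accumulation is always cheap
        if self.keep_series and self._t % self.stride == 0:
            self.series.append(value)  # memory grows only every `stride` steps
        self._t += 1

acc = EnergyAccumulator(stride=1000, keep_series=True)
for t in range(5000):
    acc.record(1.0)

print(acc.total)        # 5000.0
print(len(acc.series))  # 5 -- one sample per 1000 steps
```

With `keep_series=False`, memory use is constant regardless of run length, which covers the long-run concern above.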
Sure, this is just a first example of getting results. We can offer more sophisticated options, including standard plots etc.
Great work! Some comments and questions, though.
@PhilippPlank I propose to close this unmerged, as it has grown too far out of sync with our current plan for PnP. Are you okay with that?
Issue Number: #232
Objective of pull request: Implement a universal tool to capture power, energy and execution time in Lava, which works on all backends with one interface across backends and gives insights about the power and performance of the components. This tool will be the LavaProfiler.
Pull request checklist
Your PR fulfills the following requirements:
(flakeheaven lint src/lava tests/) and (bandit -r src/lava/.) pass locally
(pytest) passes locally
Pull request type
Please check your PR type:
What is the current behavior?
What is the new behavior?
Does this introduce a breaking change?
Supplemental information