Add event logging support #679

Merged
merged 13 commits into from
Oct 8, 2018

yuvipanda
Collaborator

@yuvipanda yuvipanda commented Oct 3, 2018

This PR aims to address parts of jupyterhub/mybinder.org-deploy#97

Our goal is to have an analytics pipeline that:

  1. Requires us to write as little code as possible
  2. Is not tied to any particular cloud vendor / product
  3. Keeps our event schemas transparent, so users &
    developers can see easily & exactly what is being
    collected.
  4. Balances the needs of analysts who are analyzing
    the event streams & developers who are adding the
    event emitting code.

This PR introduces an EventLog class. We'll iterate on
the design here, but eventually it should be spun out into
its own little library that can be integrated into other
parts of Jupyter.

For mybinder.org, we can sink these events into
Stackdriver with https://cloud.google.com/logging/docs/setup/python.
This should give us an easy sink that we don't have to maintain,
and an easy way to export to files / real-time query. This is
much easier than maintaining something like logstash or fluent-bit.

A lot of this work is inspired by the (IMO very well designed)
Wikimedia Event Logging system. I think it provides a good mix
of analytics, performance, privacy, transparency, and ease of
use for analysis.
@yuvipanda
Collaborator Author

@minrk if I set handler to logging.StreamHandler(), I get this error:

20:29 $ python3 -m binderhub -f testing/minikube/binderhub_config.py 
E1002 20:29:54.919839   16837 ip.go:48] Error getting IP:  Host is not running
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yuvipanda/code/binderhub/binderhub/__main__.py", line 3, in <module>
    main()
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "/home/yuvipanda/code/binderhub/binderhub/app.py", line 474, in initialize
    self.event_log = EventLog(parent=self)
  File "/home/yuvipanda/code/binderhub/binderhub/events.py", line 27, in __init__
    super().__init__(*args, **kwargs)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/config/configurable.py", line 84, in __init__
    self.config = config
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/traitlets.py", line 585, in __set__
    self.set(obj, value)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/traitlets.py", line 574, in set
    obj._notify_trait(self.name, old_value, new_value)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/traitlets.py", line 1139, in _notify_trait
    type='change',
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/traitlets.py", line 1176, in notify_change
    c(change)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/traitlets.py", line 819, in compatible_observer
    return func(self, change)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/config/configurable.py", line 186, in _config_changed
    self._load_config(change.new, traits=traits, section_names=section_names)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/site-packages/traitlets/config/configurable.py", line 153, in _load_config
    setattr(self, name, deepcopy(config_value))
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/yuvipanda/.local/share/virtualenvs/binderhub-QT7oPY12/lib/python3.6/copy.py", line 169, in deepcopy
    rv = reductor(4)
TypeError: can't pickle _thread.RLock objects

Looks like traitlets is trying to do a deep copy maybe? I'm not entirely sure why it's doing that! Any thoughts?

@minrk
Member

minrk commented Oct 3, 2018

traitlets does a deep copy of the config object. What are you passing to the config?
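
The failure in the traceback above can be reproduced without traitlets at all. The sketch below is only an illustration: it assumes the config value is a plain dict holding a live handler instance (the `config_value` name and its shape are made up for this example) and calls `copy.deepcopy` on it directly, which is the call the traceback ends in.

```python
import copy
import logging

# Hypothetical stand-in for a config value that holds a live handler instance.
config_value = {"handlers": [logging.StreamHandler()]}

try:
    copy.deepcopy(config_value)
    error = None
except TypeError as exc:
    # The handler owns a threading.RLock (and an open stream), and deepcopy
    # has no way to reconstruct either of them.
    error = exc

print(type(error).__name__, error)
```

The exact wording of the message differs between Python versions, but the exception type is `TypeError` in each case.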

- Define & emit events in a unified way with EventLog class
- Events can be sent to a sink via logging handlers
- Events are JSON formatted, with a 'capsule' that contains
  useful metadata
- Defaults to not doing anything at all unless a handler is
  configured
- Adds 'launch' events that are emitted when a launch is
  successful.
- Schemas are used to make sure we are only capturing the
  information we really need, and nothing more.
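
To make the bullet points above concrete, here is a rough sketch of an event log built only on the standard library. It is not the class added in this PR: the capsule keys, the `emit_launch` method, and the `provider` / `spec` fields are assumptions chosen for illustration, and the real implementation may format and validate events differently.

```python
import io
import json
import logging
from datetime import datetime, timezone


class EventLog:
    """Illustrative sketch only; not the class added in this PR."""

    def __init__(self, handlers=None):
        # One logger per instance, so handlers are not shared by accident.
        self.log = logging.getLogger(f"eventlog-sketch.{id(self)}")
        self.log.setLevel(logging.INFO)
        self.log.propagate = False  # keep events out of the regular app logs
        self.handlers = list(handlers or [])
        for handler in self.handlers:
            self.log.addHandler(handler)

    def emit(self, schema, version, event):
        if not self.handlers:
            return None  # no handler configured: do nothing at all
        # The "capsule" wraps the event with metadata (keys are assumptions).
        capsule = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "schema": schema,
            "version": version,
            "event": event,
        }
        line = json.dumps(capsule, sort_keys=True)
        self.log.info(line)
        return line

    def emit_launch(self, provider, spec):
        # A per-event method keeps the set of recorded fields explicit.
        return self.emit("launch", 1, {"provider": provider, "spec": spec})


# Without handlers, emitting is a no-op.
silent_result = EventLog().emit_launch("GitHub", "jupyterhub/binderhub/master")

# With a handler, each event becomes one JSON line in the sink.
stream = io.StringIO()
events = EventLog(handlers=[logging.StreamHandler(stream)])
events.emit_launch("GitHub", "jupyterhub/binderhub/master")
record = json.loads(stream.getvalue())
print(record["schema"], record["event"])
```

Because the sink is an ordinary `logging` handler, swapping the in-memory stream for a file or a cloud logging handler should not require changes to the emitting code.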

traitlets does a deep copy of config, so you cannot pass
unpicklable Instance objects through it. Most handlers are
unpicklable.
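
One way around this, sketched below, is to put a callable in the config and build the handlers only when they are needed, since `copy.deepcopy` returns functions unchanged. The `handlers_maker` name appears in the CI error quoted in this thread; the zero-argument signature and the `make_handlers` helper are assumptions for illustration and may not match what the PR does.

```python
import copy
import logging


def make_handlers():
    """Hypothetical factory: build the handlers only when asked."""
    return [logging.StreamHandler()]


# Functions are treated as atomic by deepcopy, so this config copies cleanly.
config_value = {"handlers_maker": make_handlers}
copied = copy.deepcopy(config_value)

# The handlers are created after the copy, so no live handler is ever copied.
handlers = copied["handlers_maker"]()
print(handlers)
```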

This installs binderhub and its dependencies into the environment.
Also runs npm install and webpack, so we don't have to run
them manually.

Doing this since Travis can't find the jsonlogger package
otherwise.

Trying to fix the following error from CI:

> traitlets.traitlets.TraitError: No default value found for handlers_maker trait

@yuvipanda
Collaborator Author

The test failures now seem unrelated?

@yuvipanda changed the title from "[WIP] Add Event Logging" to "Add event logging support" on Oct 3, 2018
@yuvipanda
Collaborator Author

Tests and docs added!

@minrk
Member

minrk commented Oct 4, 2018

As a design question: will each event type be added as a method on EventLogger, or will there be direct calls to EventLogger.emit(...) in the code?

@yuvipanda
Collaborator Author

@minrk I think to begin with we'll add methods to EventLogger, but I want to add schema validation + registration to this soon. Once we do that, we can just have emit. The methods are there just to enforce a schema until then.

This will also allow us to make this more generic later on.
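
As a rough idea of what "schema validation + registration" with a single emit path could look like, here is a small sketch. Everything in it is hypothetical: the `SchemaRegistry` class, its method names, and the field-set check are stand-ins for illustration, and a real implementation might well validate against JSON Schema documents instead.

```python
class SchemaRegistry:
    """Hypothetical registry mapping (name, version) to allowed field names."""

    def __init__(self):
        self._schemas = {}

    def register(self, name, version, fields):
        self._schemas[(name, version)] = frozenset(fields)

    def validate(self, name, version, event):
        try:
            allowed = self._schemas[(name, version)]
        except KeyError:
            raise ValueError(f"unregistered schema: {name} v{version}") from None
        unexpected = set(event) - allowed
        missing = allowed - set(event)
        if unexpected or missing:
            raise ValueError(
                f"{name} v{version}: unexpected={sorted(unexpected)}, "
                f"missing={sorted(missing)}"
            )
        return event


registry = SchemaRegistry()
registry.register("launch", 1, ["provider", "spec"])

# An event that matches the registered fields passes through unchanged.
ok = registry.validate("launch", 1, {"provider": "GitHub", "spec": "org/repo/master"})

# An event carrying an extra field is rejected instead of silently recorded.
try:
    registry.validate(
        "launch", 1, {"provider": "GitHub", "spec": "org/repo/master", "extra": 1}
    )
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```

With a check like this in front of a generic `emit`, per-event methods are no longer needed to keep the recorded fields in line with the schema.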
