This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

feat: Add event listener for course publish
Creates the edx-platform plugin plumbing, adds some new requirements, maps the appropriate Django Signal to push course structure to ClickHouse.
bmtcril committed May 2, 2023
1 parent 2a55abd commit 089b2d0
Showing 23 changed files with 1,363 additions and 38 deletions.
52 changes: 43 additions & 9 deletions README.rst
@@ -7,17 +7,25 @@ Event Sink ClickHouse
Purpose
*******

A listener for `Open edX events`_ to send them to ClickHouse. This project
acts as a plugin to the Edx Platform, listens for configured Open edX events,
and sends them to a ClickHouse database for analytics or other processing. This
is being maintained as part of the Open Analytics Reference System (OARS)
project.
This project acts as a plugin to the `Edx Platform`_, listens for
configured `Open edX events`_, and sends them to a `ClickHouse`_ database for
analytics or other processing. This is being maintained as part of the Open
Analytics Reference System (`OARS`_) project.

OARS consumes the data this plugin sends to ClickHouse, both to enrich reporting
data and to capture data that does not otherwise fit in xAPI.

Currently, the only sink is in the CMS. It listens for the ``COURSE_PUBLISHED``
signal and serializes a subset of the published course blocks into one table
and the relationships between blocks into another table. With these two tables
we can recreate the "graph" of the course and get relevant data, such as
block names, for reporting.

.. _Open edX events: https://github.com/openedx/openedx-events
.. _Edx Platform: https://github.com/openedx/edx-platform
.. _ClickHouse: https://clickhouse.com
.. _OARS: https://docs.openedx.org/projects/openedx-oars/en/latest/index.html

Getting Started
***************
@@ -75,12 +83,38 @@ Every time you develop something in this repo
Deploying
=========

TODO: How can a new user go about deploying this component? Is it just a few
commands? Is there a larger how-to that should be linked here?
This plugin will be deployed by default in an OARS Tutor environment. For other
deployments, install the library or add it to the private requirements of your
virtual environment (``requirements/private.txt``).

#. Run ``pip install openedx-event-sink-clickhouse``.

#. Run migrations:

   - ``python manage.py lms migrate``

   - ``python manage.py cms migrate``

PLACEHOLDER: For details on how to deploy this component, see the `deployment how-to`_
#. Restart the LMS and CMS services and the edx-platform Celery workers.

Configuration
===============

Currently, all events are listened to by default (there is only one), so
the only required configuration is a ClickHouse connection:

.. code-block::
.. _deployment how-to: https://docs.openedx.org/projects/openedx-event-sink-clickhouse/how-tos/how-to-deploy-this-component.html
    EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
        # URL to a running ClickHouse server's HTTP interface. ex: https://foo.openedx.org:8443/ or
        # http://foo.openedx.org:8123/ . Note that we only support the ClickHouse HTTP interface
        # to avoid pulling in more dependencies to the platform than necessary.
        "url": "http://clickhouse:8123",
        "username": "changeme",
        "password": "changeme",
        "database": "event_sink",
        "timeout_secs": 3,
    }

Getting Help
************
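The README explains that published blocks go into one table and block relationships into another, so the course "graph" can be rebuilt at reporting time. A minimal Python sketch of that reconstruction follows. The row keys (`location`, `display_name`, `parent_location`, `child_location`, `order`) and the short location strings are hypothetical, since this commit does not show the actual ClickHouse table schema.

```python
from collections import defaultdict


def build_course_tree(blocks, relationships):
    """Rebuild a nested course outline from two flat row lists.

    blocks: dicts with hypothetical keys 'location' and 'display_name'.
    relationships: dicts with hypothetical keys 'parent_location',
    'child_location' and 'order' (the child's position under its parent).
    """
    names = {block["location"]: block["display_name"] for block in blocks}

    # Group children under their parent and remember which blocks have a parent.
    children = defaultdict(list)
    has_parent = set()
    for rel in relationships:
        children[rel["parent_location"]].append((rel["order"], rel["child_location"]))
        has_parent.add(rel["child_location"])

    def node(location):
        # Sort by the stored order so children come back in published sequence.
        ordered = sorted(children.get(location, []))
        return {
            "location": location,
            "display_name": names.get(location),
            "children": [node(child) for _, child in ordered],
        }

    # Roots are blocks that never appear as a child (normally just the course block).
    return [node(block["location"]) for block in blocks if block["location"] not in has_parent]


# Illustrative rows only; real locations would be full usage keys.
blocks = [
    {"location": "course", "display_name": "Demo Course"},
    {"location": "ch1", "display_name": "Chapter 1"},
    {"location": "ch2", "display_name": "Chapter 2"},
    {"location": "seq1", "display_name": "Sequence 1"},
]
relationships = [
    {"parent_location": "course", "child_location": "ch2", "order": 1},
    {"parent_location": "course", "child_location": "ch1", "order": 0},
    {"parent_location": "ch1", "child_location": "seq1", "order": 0},
]
tree = build_course_tree(blocks, relationships)
```

Storing the child order on each relationship row, rather than relying on insertion order, is what lets the outline be rebuilt deterministically from an unordered query result.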
39 changes: 39 additions & 0 deletions event_sink_clickhouse/apps.py
@@ -3,6 +3,7 @@
"""

from django.apps import AppConfig
from edx_django_utils.plugins import PluginSettings, PluginSignals


class EventSinkClickhouseConfig(AppConfig):
@@ -11,3 +12,41 @@ class EventSinkClickhouseConfig(AppConfig):
    """

    name = 'event_sink_clickhouse'
    verbose_name = "Event Sink ClickHouse"

    plugin_app = {
        PluginSettings.CONFIG: {
            'lms.djangoapp': {
                'production': {PluginSettings.RELATIVE_PATH: 'settings.production'},
                'common': {PluginSettings.RELATIVE_PATH: 'settings.common'},
                'devstack': {PluginSettings.RELATIVE_PATH: 'settings.devstack'},
            },
            'cms.djangoapp': {
                'production': {PluginSettings.RELATIVE_PATH: 'settings.production'},
                'common': {PluginSettings.RELATIVE_PATH: 'settings.common'},
                'devstack': {PluginSettings.RELATIVE_PATH: 'settings.devstack'},
            }
        },
        # Configuration setting for Plugin Signals for this app.
        PluginSignals.CONFIG: {
            # Configure the Plugin Signals for each Project Type, as needed.
            'cms.djangoapp': {
                # List of all plugin Signal receivers for this app and project type.
                PluginSignals.RECEIVERS: [{
                    # The name of the app's signal receiver function.
                    PluginSignals.RECEIVER_FUNC_NAME: 'receive_course_publish',

                    # The full path to the module where the signal is defined.
                    PluginSignals.SIGNAL_PATH: 'xmodule.modulestore.django.COURSE_PUBLISHED',
                }],
            }
        },
    }

    def ready(self):
        """
        Import our Celery tasks for initialization.
        """
        super().ready()

        from . import tasks  # pylint: disable=import-outside-toplevel, unused-import
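`apps.py` registers `receive_course_publish` against `COURSE_PUBLISHED` under `cms.djangoapp` only, and the edx-django-utils plugin framework connects it when that service starts. To make the wiring concrete, here is a stdlib-only emulation. `MiniSignal`, `connect_receivers`, and the lowercase string keys are hypothetical stand-ins for Django's `Signal` and the `PluginSignals` constants, not the real edx-django-utils implementation.

```python
import types


class MiniSignal:
    """Hypothetical stand-in for django.dispatch.Signal: just enough to connect and send."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Call every connected receiver and collect (receiver, result) pairs.
        return [(receiver, receiver(sender=sender, **kwargs)) for receiver in self.receivers]


def connect_receivers(plugin_signals, project_type, signals_module, signal_registry):
    """Connect receivers by name for one project type (hypothetical helper).

    plugin_signals uses plain string keys in place of the PluginSignals constants;
    signal_registry maps a dotted signal path to a signal object.
    """
    for entry in plugin_signals.get(project_type, {}).get("receivers", []):
        receiver = getattr(signals_module, entry["receiver_func_name"])
        signal_registry[entry["signal_path"]].connect(receiver)


SIGNAL_PATH = "xmodule.modulestore.django.COURSE_PUBLISHED"
received = []


def receive_course_publish(sender, course_key, **kwargs):
    # Records the key instead of queuing a Celery task.
    received.append(str(course_key))


signals_module = types.ModuleType("fake_signals")
signals_module.receive_course_publish = receive_course_publish

plugin_signals = {
    "cms.djangoapp": {
        "receivers": [{
            "receiver_func_name": "receive_course_publish",
            "signal_path": SIGNAL_PATH,
        }],
    },
}

lms_registry = {SIGNAL_PATH: MiniSignal()}
cms_registry = {SIGNAL_PATH: MiniSignal()}
connect_receivers(plugin_signals, "lms.djangoapp", signals_module, lms_registry)
connect_receivers(plugin_signals, "cms.djangoapp", signals_module, cms_registry)

cms_registry[SIGNAL_PATH].send(sender=None, course_key="course-v1:edX+DemoX+2023")
```

Since only `cms.djangoapp` lists a receiver, the LMS registry stays empty, which matches the README's note that the only sink currently lives in the CMS.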
Empty file.
19 changes: 19 additions & 0 deletions event_sink_clickhouse/settings/common.py
@@ -0,0 +1,19 @@
"""
Default settings for the openedx_event_sink_clickhouse app.
"""


def plugin_settings(settings):
    """
    Adds default settings
    """
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
        # URL to a running ClickHouse server's HTTP interface. ex: https://foo.openedx.org:8443/ or
        # http://foo.openedx.org:8123/ . Note that we only support the ClickHouse HTTP interface
        # to avoid pulling in more dependencies to the platform than necessary.
        "url": "http://clickhouse:8123",
        "username": "changeme",
        "password": "changeme",
        "database": "event_sink",
        "timeout_secs": 3,
    }
13 changes: 13 additions & 0 deletions event_sink_clickhouse/settings/production.py
@@ -0,0 +1,13 @@
"""
Production settings for the openedx_event_sink_clickhouse app.
"""


def plugin_settings(settings):
    """
    Override the default app settings with production settings.
    """
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = settings.ENV_TOKENS.get(
        'EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG',
        settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG
    )
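`settings/production.py` reads the whole `EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG` dict from `ENV_TOKENS`, with the common defaults only as a fallback, so an environment value replaces the defaults wholesale rather than merging key by key. The sketch below contrasts that behavior with a merged variant. `SimpleNamespace` stands in for the Django settings object, and `production_settings_merged` is a hypothetical alternative, not something this commit does.

```python
from types import SimpleNamespace

DEFAULTS = {
    "url": "http://clickhouse:8123",
    "username": "changeme",
    "password": "changeme",
    "database": "event_sink",
    "timeout_secs": 3,
}


def production_settings_as_committed(settings):
    # Same logic as settings/production.py: the ENV_TOKENS value replaces the default dict.
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = settings.ENV_TOKENS.get(
        "EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG",
        settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG,
    )


def production_settings_merged(settings):
    # Hypothetical alternative: overlay ENV_TOKENS keys on top of the defaults.
    overrides = settings.ENV_TOKENS.get("EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG", {})
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
        **settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG,
        **overrides,
    }


def make_settings():
    # In this example the environment only provides the URL.
    return SimpleNamespace(
        EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG=dict(DEFAULTS),
        ENV_TOKENS={
            "EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG": {"url": "https://ch.example.com:8443/"},
        },
    )


replaced = make_settings()
production_settings_as_committed(replaced)

merged = make_settings()
production_settings_merged(merged)
```

With the committed behavior, a partial dict like the one above drops `username`, so `BaseSink.__init__` would raise `KeyError` when it reads that key; supplying all five keys in `ENV_TOKENS` avoids this.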
13 changes: 13 additions & 0 deletions event_sink_clickhouse/signals.py
@@ -0,0 +1,13 @@
"""
Signal handler functions, mapped to specific signals in apps.py.
"""


def receive_course_publish(sender, course_key, **kwargs):  # pylint: disable=unused-argument
    """
    Receives COURSE_PUBLISHED signal and queues the dump job.
    """
    # import here, because signal is registered at startup, but items in tasks are not yet able to be loaded
    from .tasks import dump_course_to_clickhouse  # pylint: disable=import-outside-toplevel

    dump_course_to_clickhouse.delay(str(course_key))
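`receive_course_publish` converts `course_key` to a string before calling `.delay()`. Presumably this keeps the Celery message serializable: with a JSON serializer, an arbitrary key object cannot be encoded, whereas its string form can. The sketch below checks that with stand-ins; `FakeCourseKey` and `RecordingTask` are hypothetical stubs replacing opaque-keys and Celery, and the key format is illustrative only.

```python
import json


class FakeCourseKey:
    """Hypothetical stand-in for a CourseKey; only __str__ matters here."""

    def __init__(self, org, course, run):
        self.org, self.course, self.run = org, course, run

    def __str__(self):
        return f"course-v1:{self.org}+{self.course}+{self.run}"


class RecordingTask:
    """Hypothetical stub for a Celery task: .delay() records its arguments instead of queuing."""

    def __init__(self):
        self.calls = []

    def delay(self, *args, **kwargs):
        self.calls.append((args, kwargs))


dump_course_to_clickhouse = RecordingTask()


def receive_course_publish(sender, course_key, **kwargs):  # pylint: disable=unused-argument
    # Same shape as signals.py: convert the key to a string before queuing.
    dump_course_to_clickhouse.delay(str(course_key))


receive_course_publish(sender=None, course_key=FakeCourseKey("edX", "DemoX", "2023"))
queued_args, queued_kwargs = dump_course_to_clickhouse.calls[0]

# The queued arguments encode cleanly as JSON...
payload = json.dumps({"args": queued_args, "kwargs": queued_kwargs})

# ...while the raw key object does not.
try:
    json.dumps(FakeCourseKey("edX", "DemoX", "2023"))
    raw_key_serializable = True
except TypeError:
    raw_key_serializable = False
```

The task side can then rebuild a real key from the string if it needs one, which keeps the message format independent of the key class.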
Empty file.
26 changes: 26 additions & 0 deletions event_sink_clickhouse/sinks/base_sink.py
@@ -0,0 +1,26 @@
"""
Base classes for event sinks
"""

from django.conf import settings


class BaseSink:
    """
    Base class for ClickHouse event sinks; allows overriding the default connection settings.
    """
    def __init__(self, connection_overrides, log):
        self.log = log
        self.ch_url = settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["url"]
        self.ch_auth = (settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["username"],
                        settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["password"])
        self.ch_database = settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["database"]
        self.ch_timeout_secs = settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["timeout_secs"]

        # Apply any overrides to the ClickHouse connection settings.
        if connection_overrides:
            self.ch_url = connection_overrides.get("url", self.ch_url)
            self.ch_auth = (connection_overrides.get("username", self.ch_auth[0]),
                            connection_overrides.get("password", self.ch_auth[1]))
            self.ch_database = connection_overrides.get("database", self.ch_database)
            self.ch_timeout_secs = connection_overrides.get("timeout_secs", self.ch_timeout_secs)
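`BaseSink` only resolves connection parameters (URL, credentials, database, timeout); the sinks that actually send data are not part of this diff. To illustrate how those parameters map onto ClickHouse's HTTP interface, the following sketch mirrors the override logic in a plain function, then builds (without sending) a POST request that passes `database` and `query` as URL parameters and authenticates with HTTP Basic auth. `resolve_connection`, `build_clickhouse_request`, and the table name `course_blocks_example` are hypothetical.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def resolve_connection(defaults, overrides=None):
    """Mirror BaseSink.__init__: each override key falls back to the default."""
    overrides = overrides or {}
    return {
        "url": overrides.get("url", defaults["url"]),
        "auth": (overrides.get("username", defaults["username"]),
                 overrides.get("password", defaults["password"])),
        "database": overrides.get("database", defaults["database"]),
        "timeout_secs": overrides.get("timeout_secs", defaults["timeout_secs"]),
    }


def build_clickhouse_request(conn, query, body=b""):
    """Build (but do not send) a POST to ClickHouse's HTTP interface."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = urlencode({"database": conn["database"], "query": query}, quote_via=quote)
    username, password = conn["auth"]
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"{conn['url'].rstrip('/')}/?{params}",
        data=body,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


defaults = {
    "url": "http://clickhouse:8123",
    "username": "changeme",
    "password": "changeme",
    "database": "event_sink",
    "timeout_secs": 3,
}
# Override only the database; everything else falls back to the defaults.
conn = resolve_connection(defaults, {"database": "event_sink_test"})
req = build_clickhouse_request(
    conn, "INSERT INTO course_blocks_example FORMAT CSV", b'"block-1","Intro"\n'
)
```

Sending it would be `urllib.request.urlopen(req, timeout=conn["timeout_secs"])`; an actual sink may well use a different HTTP client.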