This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Add Course Published event listener and plugin plumbing #1

Merged
merged 7 commits into from
May 5, 2023
4 changes: 3 additions & 1 deletion CHANGELOG.rst
Original file line number Diff line number Diff line change
@@ -14,7 +14,9 @@ Change Log
Unreleased
**********

*
* First functional version, includes a CMS listener for COURSE_PUBLISHED
* README updates
* New configuration settings for connection to ClickHouse

0.1.0 – 2023-04-24
**********************************************
52 changes: 43 additions & 9 deletions README.rst
@@ -7,17 +7,25 @@ Event Sink ClickHouse
Purpose
*******

A listener for `Open edX events`_ to send them to ClickHouse. This project
acts as a plugin to the Edx Platform, listens for configured Open edX events,
and sends them to a ClickHouse database for analytics or other processing. This
is being maintained as part of the Open Analytics Reference System (OARS)
project.
This project acts as a plugin to the `Edx Platform`_, listens for
configured `Open edX events`_, and sends them to a `ClickHouse`_ database for
analytics or other processing. This is being maintained as part of the Open
Analytics Reference System (`OARS`_) project.

OARS consumes the data sent to ClickHouse by this plugin as part of data
enrichment for reporting, or capturing data that otherwise does not fit in
xAPI.

Currently the only sink is in the CMS. It listens for the ``COURSE_PUBLISHED``

what does sink mean in this context? [curious]

Contributor Author

It's just a message receiver that, in the context of this code, is just saving the data elsewhere. It's not performing any meaningful work or operating in the transactional environment of the service.

signal and serializes a subset of the published course blocks into one table
and the relationships between blocks into another table. With those we are
able to recreate the "graph" of the course and get relevant data, such as
block names, for reporting.
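
As an illustration of how the two tables could be recombined into a course "graph", here is a minimal sketch in Python. The row shapes (`blocks` as `(location, display_name)` pairs, `relationships` as `(parent_location, child_location)` pairs) and the function name `build_course_graph` are assumptions made for this example; they are not the schema this plugin writes.

```python
from collections import defaultdict


def build_course_graph(blocks, relationships):
    """Return {parent_name: [child_name, ...]} from two flat row lists.

    Hypothetical row shapes (not the plugin's real schema):
      blocks:        (location, display_name)
      relationships: (parent_location, child_location)
    """
    names = dict(blocks)
    graph = defaultdict(list)
    for parent, child in relationships:
        # Fall back to the raw location when a block has no name row.
        graph[names.get(parent, parent)].append(names.get(child, child))
    return dict(graph)


blocks = [("course", "Demo Course"), ("ch1", "Week 1"), ("seq1", "Lesson 1")]
relationships = [("course", "ch1"), ("ch1", "seq1")]
graph = build_course_graph(blocks, relationships)
```

Joining on the block location is what lets a report show readable block names instead of opaque identifiers.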

.. _Open edX events: https://github.com/openedx/openedx-events
.. _Edx Platform: https://github.com/openedx/edx-platform
.. _ClickHouse: https://clickhouse.com
.. _OARS: https://docs.openedx.org/projects/openedx-oars/en/latest/index.html

Getting Started
***************
@@ -75,12 +83,38 @@ Every time you develop something in this repo
Deploying
=========

TODO: How can a new user go about deploying this component? Is it just a few
commands? Is there a larger how-to that should be linked here?
This plugin will be deployed by default in an OARS Tutor environment. For other
deployments install the library or add it to private requirements of your
virtual environment ( ``requirements/private.txt`` ).

#. Run ``pip install openedx-event-sink-clickhouse``.

#. Run migrations:

- ``python manage.py lms migrate``

- ``python manage.py cms migrate``

PLACEHOLDER: For details on how to deploy this component, see the `deployment how-to`_
#. Restart LMS service and celery workers of edx-platform.

Configuration
===============

Currently all events will be listened to by default (there is only one). So
the only necessary configuration is a ClickHouse connection:

.. code-block::

.. _deployment how-to: https://docs.openedx.org/projects/openedx-event-sink-clickhouse/how-tos/how-to-deploy-this-component.html
    EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
        # URL to a running ClickHouse server's HTTP interface. ex: https://foo.openedx.org:8443/ or
        # http://foo.openedx.org:8123/ . Note that we only support the ClickHouse HTTP interface
        # to avoid pulling in more dependencies to the platform than necessary.
        "url": "http://clickhouse:8123",
        "username": "changeme",
        "password": "changeme",
        "database": "event_sink",
        "timeout_secs": 3,
    }

Getting Help
************
16 changes: 4 additions & 12 deletions catalog-info.yaml
@@ -2,9 +2,9 @@
# https://open-edx-proposals.readthedocs.io/en/latest/processes/oep-0055-proc-project-maintainers.html

apiVersion: backstage.io/v1alpha1
kind: ""
kind: "Component"
metadata:
name: 'openedx_event_sink_clickhouse'
name: 'openedx-event-sink-clickhouse'
description: "A sink for Open edX events to send them to ClickHouse"
annotations:
# (Optional) Annotation keys and values can be whatever you want.
@@ -15,18 +15,10 @@ metadata:
spec:

# (Required) This can be a group(`group:<group_name>` or a user(`user:<github_username>`)
owner: ""
owner: "group:openedx-event-sink-clickhouse-maintainers"

# (Required) Acceptable Type Values: service, website, library
type: ''
type: 'library'

# (Required) Acceptable Lifecycle Values: experimental, production, deprecated
lifecycle: 'experimental'

# (Optional) The value can be the name of any known component.
subcomponentOf: '<name_of_a_component>'

# (Optional) An array of different components or resources.
dependsOn:
- '<component_or_resource>'
- '<another_component_or_resource>'
37 changes: 37 additions & 0 deletions event_sink_clickhouse/apps.py
@@ -3,6 +3,7 @@
"""

from django.apps import AppConfig
from edx_django_utils.plugins import PluginSettings, PluginSignals


class EventSinkClickhouseConfig(AppConfig):
@@ -11,3 +12,39 @@ class EventSinkClickhouseConfig(AppConfig):
    """

    name = 'event_sink_clickhouse'
    verbose_name = "Event Sink ClickHouse"

    plugin_app = {
        PluginSettings.CONFIG: {
            'lms.djangoapp': {
                'production': {PluginSettings.RELATIVE_PATH: 'settings.production'},
                'common': {PluginSettings.RELATIVE_PATH: 'settings.common'},
            },
            'cms.djangoapp': {
                'production': {PluginSettings.RELATIVE_PATH: 'settings.production'},
                'common': {PluginSettings.RELATIVE_PATH: 'settings.common'},
            }
        },
        # Configuration setting for Plugin Signals for this app.
@mariajgrimaldi, May 4, 2023

I think we can remove these inline comments since the configurations are pretty self-explanatory

        PluginSignals.CONFIG: {
            # Configure the Plugin Signals for each Project Type, as needed.
            'cms.djangoapp': {
                # List of all plugin Signal receivers for this app and project type.
                PluginSignals.RECEIVERS: [{
                    # The name of the app's signal receiver function.
                    PluginSignals.RECEIVER_FUNC_NAME: 'receive_course_publish',

                    # The full path to the module where the signal is defined.
                    PluginSignals.SIGNAL_PATH: 'xmodule.modulestore.django.COURSE_PUBLISHED',
                }],
            }
        },
    }

    def ready(self):
        """
        Import our Celery tasks for initialization.
        """
        super().ready()

        from . import tasks  # pylint: disable=import-outside-toplevel, unused-import

can we use absolute imports across the project?

Empty file.
19 changes: 19 additions & 0 deletions event_sink_clickhouse/settings/common.py
@@ -0,0 +1,19 @@
"""
Default settings for the openedx_event_sink_clickhouse app.
"""


def plugin_settings(settings):
    """
    Adds default settings
    """
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = {
        # URL to a running ClickHouse server's HTTP interface. ex: https://foo.openedx.org:8443/ or
        # http://foo.openedx.org:8123/ . Note that we only support the ClickHouse HTTP interface
        # to avoid pulling in more dependencies to the platform than necessary.
        "url": "http://clickhouse:8123",
        "username": "changeme",
        "password": "changeme",
        "database": "event_sink",
        "timeout_secs": 3,
    }
13 changes: 13 additions & 0 deletions event_sink_clickhouse/settings/production.py
@@ -0,0 +1,13 @@
"""
Production settings for the openedx_event_sink_clickhouse app.
"""


def plugin_settings(settings):
    """
    Override the default app settings with production settings.
    """
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = settings.ENV_TOKENS.get(
        'EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG',
        settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG
    )
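
To make the fallback behavior of this override concrete, the following is a small sketch that runs the same logic against stand-in settings objects. The `SimpleNamespace` objects and the example URL are assumptions for illustration; in edx-platform the real `settings` module is passed in by the plugin machinery.

```python
from types import SimpleNamespace


def plugin_settings(settings):
    """Same override logic as the production settings module above."""
    settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG = settings.ENV_TOKENS.get(
        "EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG",
        settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG,
    )


default_config = {"url": "http://clickhouse:8123", "timeout_secs": 3}

# Stand-in settings object with no operator override: the default survives.
no_override = SimpleNamespace(
    ENV_TOKENS={},
    EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG=default_config,
)
plugin_settings(no_override)

# Stand-in with an operator override: the whole dict is replaced, not merged.
with_override = SimpleNamespace(
    ENV_TOKENS={
        "EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG": {"url": "https://ch.example.com:8443"}
    },
    EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG=default_config,
)
plugin_settings(with_override)
```

Because the override replaces the dictionary wholesale, an operator-supplied value would need to include every key that the sink reads (`url`, `username`, `password`, `database`, `timeout_secs`), not only the ones that differ from the defaults.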
13 changes: 13 additions & 0 deletions event_sink_clickhouse/signals.py
@@ -0,0 +1,13 @@
"""
Signal handler functions, mapped to specific signals in apps.py.
"""


def receive_course_publish(sender, course_key, **kwargs):  # pylint: disable=unused-argument
    """
    Receives COURSE_PUBLISHED signal and queues the dump job.
    """
    # import here, because signal is registered at startup, but items in tasks are not yet able to be loaded
    from .tasks import dump_course_to_clickhouse  # pylint: disable=import-outside-toplevel

    dump_course_to_clickhouse.delay(str(course_key))
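
The receiver's contract can be exercised without Celery or edx-platform. The sketch below is an assumption-laden stand-in: `FakeTask` and `FakeCourseKey` are hypothetical test doubles, and the task is defined at module level here rather than imported lazily as in the real handler.

```python
class FakeTask:
    """Hypothetical stand-in for a Celery task; records .delay() calls."""

    def __init__(self):
        self.calls = []

    def delay(self, *args):
        self.calls.append(args)


class FakeCourseKey:
    """Hypothetical stand-in for an opaque course key object."""

    def __str__(self):
        return "course-v1:Org+Course+Run"


dump_course_to_clickhouse = FakeTask()


def receive_course_publish(sender, course_key, **kwargs):
    # The handler passes a plain string rather than the key object,
    # which keeps the queued task argument simple to serialize.
    dump_course_to_clickhouse.delay(str(course_key))


receive_course_publish(sender=None, course_key=FakeCourseKey())
```

The handler itself does no work beyond queuing, so the publish request is not blocked while the course is dumped to ClickHouse.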
Empty file.
47 changes: 47 additions & 0 deletions event_sink_clickhouse/sinks/base_sink.py
@@ -0,0 +1,47 @@
"""
Base classes for event sinks
"""
from collections import namedtuple

import requests
from django.conf import settings

ClickHouseAuth = namedtuple("ClickHouseAuth", ["username", "password"])


class BaseSink:
    """
    Base class for ClickHouse event sink, allows overwriting of default settings
    """
    def __init__(self, connection_overrides, log):
        self.log = log
why do we need the log to be part of the class?

Contributor Author

The next PR will be to add a management command that will call into here, so I'm using the pattern established in Coursegraph that passes in the log so that it can go to the celery log or normal IDA log based on how it's being run.

        self.ch_url = settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["url"]
        self.ch_auth = ClickHouseAuth(settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["username"],
                                      settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["password"])
        self.ch_database = settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["database"]
        self.ch_timeout_secs = settings.EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG["timeout_secs"]

        # If any overrides to the ClickHouse connection
        if connection_overrides:
            self.ch_url = connection_overrides.get("url", self.ch_url)
            self.ch_auth = ClickHouseAuth(connection_overrides.get("username", self.ch_auth.username),
                                          connection_overrides.get("password", self.ch_auth.password))
            self.ch_database = connection_overrides.get("database", self.ch_database)
            self.ch_timeout_secs = connection_overrides.get("timeout_secs", self.ch_timeout_secs)

    def _send_clickhouse_request(self, request):
        """
        Perform the actual HTTP requests to ClickHouse.
        """
        session = requests.Session()
        prepared_request = request.prepare()

        try:
            response = session.send(prepared_request, timeout=self.ch_timeout_secs)
            response.raise_for_status()
        except requests.exceptions.HTTPError as e:
            self.log.error(str(e))
            self.log.error(e.response.headers)
            self.log.error(e.response)
            self.log.error(e.response.text)
            raise
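
The per-key override behavior in `BaseSink.__init__` can be summarized in a few lines. This is a sketch, not code from the PR: `resolve_connection` and `DEFAULTS` are hypothetical names, and the values mirror the defaults in `settings/common.py`.

```python
# Hypothetical defaults mirroring EVENT_SINK_CLICKHOUSE_BACKEND_CONFIG.
DEFAULTS = {
    "url": "http://clickhouse:8123",
    "username": "changeme",
    "password": "changeme",
    "database": "event_sink",
    "timeout_secs": 3,
}


def resolve_connection(defaults, overrides=None):
    """Return the defaults with any matching override keys applied.

    As in BaseSink.__init__, each key falls back to its default
    individually, and keys not present in the defaults are ignored.
    """
    overrides = overrides or {}
    return {key: overrides.get(key, value) for key, value in defaults.items()}


resolved = resolve_connection(
    DEFAULTS, {"url": "https://ch.example.com:8443", "unknown": 1}
)
unchanged = resolve_connection(DEFAULTS, None)
```

Unlike the production settings override, which swaps the whole dictionary, this fallback is per key, so a caller can pass only the values that differ.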