Analytics are an essential component of an online learning platform: you need to know whether your courses are effective and which parts need improvement. You need to know if your students are falling behind and, if they are, you need to detect the early warning signs. When your courses are successful, you want to get periodic engagement reports.
We created a tool to help you answer all these questions. Cairn is a Tutor plugin that you install on top of an Open edX platform and that gives you access to a powerful, full-blown analytics stack. Cairn comes with the following features out of the box:
- Unified data lake of learner events and stateful data: both learner-triggered events, coming from the Open edX tracking logs, and stateful data, coming from the existing databases, are available for querying in a single unified interface. This means that you can, for instance, query the grades of the students that visited your platform in the past 24 hours, or collect the email addresses of the students who did not yet complete the latest assignment.
- Real-time: new events are visible immediately in your analytics interface. No more waiting for slow batch jobs to complete!
- Course- and org-based data access rights: your course staff is granted access only to the data rows that concern them. Cairn makes it easy to create new users with granular access permissions.
- Working dashboards out of the box: Cairn comes with a fully functional dashboard that you can start playing with right away.
- Fully customizable data and dashboards: your data scientists, business intelligence team and other tinkerers can freely explore your course data, create and share their own queries, datasets and dashboards. All it takes is a little bit of SQL.
- Scalable: Cairn scales as much as its backend, which was designed for Internet scale.
List of features | Cairn | Open edX Insights | Figures |
---|---|---|---|
Event aggregation | β | β | β |
Real-time data | β | β | β |
Easy to install | β | β | β |
Custom queries and dashboards | β | β | β |
Works with the latest Open edX versions | β | β | β |
Cairn uses the same collect/store/expose paradigm made popular by other frameworks such as the ELK Stack -- except that all the components are different and better suited to Open edX:
- On the server side, tracking logs are collected by Vector, an efficient, cloud-native log collector.
- Tracking log events are then stored in a Clickhouse table, which is the cornerstone of Cairn. Clickhouse also exposes MySQL data via live and materialized views. This is the magic piece of the puzzle which allows us to join event and MySQL data.
- The data inside Clickhouse is made visible to the end-users in a Superset frontend.
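To make the collect and store steps more concrete, here is a minimal Python sketch of how one tracking log line could be flattened into a table row. It reproduces neither Vector's configuration nor the actual Clickhouse schema: the `flatten_event` helper, the column names and the sample event are assumptions made for illustration. Only the general shape of an Open edX tracking event (a JSON object with fields such as `username`, `event_type`, `time` and `context.course_id`) is taken as given.

```python
import json


def flatten_event(line: str) -> dict:
    """Turn one JSON tracking log line into a flat row (hypothetical columns)."""
    event = json.loads(line)
    context = event.get("context") or {}
    return {
        "time": event.get("time"),
        "username": event.get("username", ""),
        "event_type": event.get("event_type", ""),
        "course_id": context.get("course_id", ""),
    }


# Made-up sample line, shaped like an Open edX tracking event.
sample = json.dumps(
    {
        "username": "alice",
        "event_type": "play_video",
        "time": "2024-01-15T10:00:00+00:00",
        "context": {"course_id": "course-v1:edX+DemoX+Demo_Course"},
    }
)
row = flatten_event(sample)
print(row["username"], row["event_type"], row["course_id"])
```

Once events are stored as flat rows like this, joining them with stateful data on a shared key such as the course ID or the username becomes an ordinary SQL join.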
Cairn used to be a commercial plugin, but is now available for free to all Tutor users. Install the plugin with:
tutor plugins install cairn
Enable the plugin with:
tutor plugins enable cairn
Then, restart your platform and run the initialization scripts:
tutor local launch
Open http(s)://data.<YOUR_LMS_HOST> in your browser. When running locally, this will be http://data.local.openedx.io (http://data.local.openedx.io:2247 in development). Users authenticate with their LMS account. By default, they have access to the data generated by the courses in which they have the "staff" role. Once a user has successfully logged in, their account can be modified with the cairn-createuser command.
For instance, to convert an existing user to administrator status, run:
tutor local do cairn-createuser --admin YOURUSERNAME YOUREMAIL
To add the default dashboards to the new user, add the --bootstrap-dashboards option:
tutor local do cairn-createuser --bootstrap-dashboards YOURUSERNAME YOUREMAIL
Some event data might be missing from your dashboards: just start using your LMS and refresh your dashboard. The new events should appear immediately.
By default, AUTH_ROLES_SYNC_AT_LOGIN is false, which means that administrators can customize the permissions associated with a user. To disable this behaviour, modify the CAIRN_AUTH_ROLES_SYNC_AT_LOGIN setting:
tutor config save --set CAIRN_AUTH_ROLES_SYNC_AT_LOGIN=true
tutor local restart
Cairn allows you to collect and view just about any metric from your Open edX platform, as long as the data is available from the tracking logs or the MySQL database. Out of the box, the default Cairn dashboard comes with visualizations for the following metrics:
- Weekly course engagement:
  - Number of enrolled learners
  - Number of active learners
  - Number of learners who watched a video
  - Number of learners who tried a problem
  - Average time spent in course
- Course progress:
  - Number of students who completed each unit
  - Grade distribution histogram
- Course demographics:
  - Gender distribution
  - Level of education distribution
- Video engagement:
  - Number of unique viewers
  - Average watch time
  - Total watch time
  - Second-per-second statistics: Number of unique viewers, Total number of views
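As an illustration of what the second-per-second video statistics measure, the following Python sketch computes, for each second of a video, the number of unique viewers and the total number of views from a list of watched segments. This is not how Cairn computes the metric (the dashboards rely on SQL queries in Superset). The `per_second_stats` function and the `(viewer, start, end)` segment format are assumptions made for this example.

```python
from collections import defaultdict


def per_second_stats(segments):
    """Compute per-second statistics from watched segments.

    segments: iterable of (viewer, start_second, end_second), end exclusive.
    Returns {second: (unique_viewers, total_views)}.
    """
    viewers = defaultdict(set)
    views = defaultdict(int)
    for viewer, start, end in segments:
        for second in range(start, end):
            viewers[second].add(viewer)
            views[second] += 1
    return {second: (len(viewers[second]), views[second]) for second in sorted(views)}


segments = [
    ("alice", 0, 3),  # alice watches seconds 0, 1 and 2
    ("alice", 1, 3),  # then rewinds and watches seconds 1 and 2 again
    ("bob", 2, 4),  # bob watches seconds 2 and 3
]
stats = per_second_stats(segments)
print(stats)
```

A second that a single learner replays several times counts once towards unique viewers but several times towards total views, which is why the two numbers can diverge.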
By default, authentication uses single sign-on (SSO) with the LMS, so that users do not have to create separate accounts in Superset. In previous versions of Cairn (v15 and earlier), user accounts had to be created manually. To restore this behaviour, modify the CAIRN_ENABLE_SSO setting:
tutor config save --set CAIRN_ENABLE_SSO=false
tutor local restart
SSO will then be disabled, and only manually created users will be able to log in. To create a user, run the cairn-createuser command, as explained above:
tutor local do cairn-createuser --password=yourpassword YOURUSERNAME YOUREMAIL
To restrict a given user to one or more courses or organizations, select the course IDs and/or organization IDs to which the user should have access:
tutor local do cairn-createuser --course-id='course-v1:edX+DemoX+Demo_Course' YOURUSERNAME YOUREMAIL
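If you create many restricted users, you may want to script the command. The sketch below only builds the argument list and prints it, without running anything. It rests on two assumptions that you should check against `tutor local do cairn-createuser --help`: that `--course-id` can be passed several times, and that an `--org-id` option is accepted by this command (this document only shows `--org-id` with the `cairn createuser` command run inside the Clickhouse container). The `createuser_command` helper and the `YOUREMAIL` placeholder are hypothetical.

```python
import shlex


def createuser_command(username, email, course_ids=(), org_ids=()):
    """Build the argument list for `tutor local do cairn-createuser`.

    Assumption: --course-id and --org-id may each be repeated.
    """
    args = ["tutor", "local", "do", "cairn-createuser"]
    args += [f"--course-id={course_id}" for course_id in course_ids]
    args += [f"--org-id={org_id}" for org_id in org_ids]
    args += [username, email]
    return args


command = createuser_command(
    "YOURUSERNAME",
    "YOUREMAIL",
    course_ids=["course-v1:edX+DemoX+Demo_Course"],
    org_ids=["edX"],
)
# shlex.join quotes arguments where needed, so the result can be pasted in a shell.
print(shlex.join(command))
```

Building a list of arguments rather than concatenating strings avoids quoting mistakes with course IDs, which contain characters such as `+` and `:`.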
Cairn has a cairn-watchcourses service that looks for changes to the course structure and refreshes the course structure in the data lake automatically. However, the changes may take up to 5 minutes to show up in Superset, as this service relies on batch processing.
If you would like to manually refresh the course structure, run:
tutor local do init --limit=cairn
Or, if you want to avoid running the full plugin initialization:
tutor local run \
    -v $(tutor config printroot)/env/plugins/cairn/apps/openedx/scripts/:/openedx/scripts \
    -v $(tutor config printroot)/env/plugins/cairn/apps/clickhouse/auth.json:/openedx/clickhouse-auth.json \
    lms python /openedx/scripts/importcoursedata.py
When running on Kubernetes instead of locally, most commands above can be rewritten with tutor k8s exec service "command" instead of tutor local run service command. For instance:
# Privileged user creation
tutor k8s exec cairn-superset "superset fab create-admin --username YOURUSERNAME --email YOUREMAIL"
# Unprivileged user creation
tutor k8s exec cairn-clickhouse "cairn createuser --course-id='course-v1:edX+DemoX+Demo_Course' --org-id='edX' YOURUSERNAME"
tutor k8s exec cairn-superset "cairn createuser YOURUSERNAME YOUREMAIL"
When Cairn is launched for the first time, past events that were triggered prior to the plugin installation will not be loaded in the data lake. If you are interested in loading past events, you should load them manually by running:
tutor local start -d cairn-clickhouse
tutor local run \
    --volume="$(tutor config printroot)/data/lms/logs/:/var/log/openedx/:ro" \
    --volume="$(tutor config printroot)/env/plugins/cairn/apps/vector/file.toml:/etc/vector/file.toml:ro" \
    -e VECTOR_CONFIG=/etc/vector/file.toml cairn-vector
The latter command will parse tracking log events from the $(tutor config printroot)/data/lms/logs/tracking.log file, which contains all the tracking logs since the creation of your platform. The command will take a while to run if you have a large platform that has been running for a long time. It can be interrupted at any time and started again, as the log collector keeps track of its position within the tracking log file.
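The reason the import can be interrupted and resumed is that the collector remembers how far it has read. The following Python sketch illustrates that general checkpointing idea. It is not Vector's actual implementation: the `read_new_lines` helper, the checkpoint file and its format are assumptions made for this example.

```python
import os
import tempfile


def read_new_lines(log_path, checkpoint_path):
    """Return the complete lines appended to log_path since the last call.

    The byte offset reached is saved in checkpoint_path (hypothetical format:
    a single integer), so a later run resumes where this one stopped.
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as checkpoint:
            offset = int(checkpoint.read().strip() or 0)
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Only consume complete lines: a partially written last line is left for later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    with open(checkpoint_path, "w") as checkpoint:
        checkpoint.write(str(offset + end))
    return lines


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "tracking.log")
    checkpoint_path = os.path.join(tmp, "checkpoint")
    with open(log_path, "w") as log:
        log.write('{"event_type": "a"}\n{"event_type": "b"}\n')
    first = read_new_lines(log_path, checkpoint_path)
    with open(log_path, "a") as log:
        log.write('{"event_type": "c"}\n')
    second = read_new_lines(log_path, checkpoint_path)

print(len(first), len(second))
```

The second call only returns the line that was appended after the first call, which is the behaviour that makes restarting a long import safe.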
Tables created in Clickhouse are managed by a lightweight migration system. You can view existing migrations that ship by default with Cairn in the following folder: $VIRTUAL_ENV/lib/python3.9/site-packages/tutorcairn/templates/cairn/apps/clickhouse/migrations.d/.
You are free to create your own migrations, which will automatically be applied to Clickhouse every time the tutor local launch or tutor local do init commands are run. To do so, as usual in Tutor, you should create a Tutor plugin. This plugin should include the CAIRN_MIGRATIONS_FOLDER configuration. This setting should point to a template folder, inside the plugin, where migration templates are defined. For instance, assuming you created the "customcairn" plugin:
config = {
    "defaults": {
        "CAIRN_MIGRATIONS_FOLDER": "customcairn/apps/migrations.d"
    }
}
In this example, the following folder should be created in the plugin: tutorcustomcairn/templates/customcairn/apps/migrations.d/. Then, you should add your migration files there. Migrations will be applied in alphabetical order whenever you run tutor local launch or tutor local do init.
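Because migrations run in alphabetical order, the way you name your files determines the order in which they are applied. The short Python sketch below shows how a plain alphabetical sort behaves with and without zero-padded numeric prefixes. The file names and the `.sql` extension are hypothetical, and the sketch assumes a simple string sort rather than Cairn's exact sorting code.

```python
def migration_order(filenames):
    """Return file names in the order produced by a plain alphabetical sort."""
    return sorted(filenames)


# Zero-padded prefixes keep the intended order.
padded = migration_order(["0010_add_view.sql", "0002_add_table.sql", "0001_init.sql"])

# Without padding, "10_..." sorts before "1_..." and "2_...",
# because strings are compared character by character.
unpadded = migration_order(["10_add_view.sql", "2_add_table.sql", "1_init.sql"])

print(padded)
print(unpadded)
```

Using a fixed-width numeric prefix for custom migrations is a simple way to keep the execution order predictable as the number of files grows.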
In development, the Superset user interface will be available at http://data.local.openedx.io:2247.
To reload Vector configuration after changes to vector.toml, run:
tutor config save && tutor local exec cairn-vector sh -c "kill -s HUP 1"
To explore the clickhouse database as root, run:
tutor local run cairn-clickhouse cairn client
To launch a Python shell in Superset, run:
tutor local run cairn-superset superset shell
Cairn is configured by several Tutor settings. Each one of these settings may be modified individually by running:
tutor config save --set SETTING_NAME=settingvalue
Then apply changes with:
tutor local launch
- CAIRN_HOST (default: "data.{{ LMS_HOST }}"): hostname at which the Cairn frontend (i.e. Superset) will be accessible. By default, this is the "data" subdomain of the LMS. Thus, if your students access the LMS at https://learn.mydomain.com then Cairn will be accessible at https://data.learn.mydomain.com.
- CAIRN_DOCKER_HOST_SOCK_PATH (default: "/var/run/docker.sock"): path to the Docker host socket on the host. This is required to collect logs from Docker when running locally, but it is not used when running on Kubernetes.
- CAIRN_RUN_CLICKHOUSE (default: true): set to false to run your own Clickhouse cluster separately from Cairn. In that case, you should also configure the Clickhouse credentials below.
- CAIRN_CLICKHOUSE_DOCKER_IMAGE (default: "{{ DOCKER_REGISTRY }}overhangio/cairn-clickhouse:{{ CAIRN_VERSION }}"): name of the Docker image that runs Clickhouse. Override this setting to build your own image of Clickhouse.
- CAIRN_CLICKHOUSE_HOST (default: "cairn-clickhouse"): hostname where Clickhouse will be accessible from Superset. By default, this is the internal docker-compose/Kubernetes service name.
- CAIRN_CLICKHOUSE_HTTP_PORT (default: 8123): port at which Clickhouse exposes its HTTP API, which is necessary to bulk import unit names.
- CAIRN_CLICKHOUSE_HTTP_SCHEME (default: "http"): HTTP scheme to access the Clickhouse HTTP API. If you self-host a Clickhouse cluster (CAIRN_RUN_CLICKHOUSE=false) then it is strongly recommended to set this to "https".
- CAIRN_CLICKHOUSE_PORT (default: 9000): native Clickhouse API port.
- CAIRN_CLICKHOUSE_DATABASE (default: "openedx"): name of the Clickhouse database which will store all analytics from your Open edX platform.
- CAIRN_CLICKHOUSE_USERNAME (default: "openedx"): username to access the CAIRN_CLICKHOUSE_DATABASE.
- CAIRN_CLICKHOUSE_PASSWORD (default: "{{ 20|random_string }}"): randomly-generated password for CAIRN_CLICKHOUSE_USERNAME.
- CAIRN_RUN_POSTGRESQL (default: true): set to false to run your own Postgresql cluster separately from Cairn. Postgresql is the database that stores all data related to Superset, which is the Cairn frontend.
- CAIRN_SUPERSET_LANGUAGE_CODE (default: "{{ LANGUAGE_CODE[:2] }}"): 2-letter code of the default language for the Superset frontend. View the list of all supported languages here. When different from "en", users will have the opportunity to switch from English to this language via a flag icon in the top-right corner.
- CAIRN_SUPERSET_DOCKER_IMAGE (default: "{{ DOCKER_REGISTRY }}overhangio/cairn-superset:{{ CAIRN_VERSION }}"): name of the Docker image that runs Superset.
- CAIRN_POSTGRESQL_DATABASE (default: "superset"): name of the Postgresql database.
- CAIRN_POSTGRESQL_USERNAME (default: "superset"): Postgresql username.
- CAIRN_POSTGRESQL_PASSWORD (default: "{{ 20|random_string }}"): Postgresql password.
- CAIRN_SUPERSET_SECRET_KEY (default: "{{ 20|random_string }}"): randomly-generated secret key for the Superset frontend.
Use the cairn-superset-settings patch to add or update Superset configurations. For example, to customize the application name and icon, add the following settings to the cairn-superset-settings patch:
APP_NAME = "<APP_NAME>"
APP_ICON = "<APP_ICON>"
APP_ICON_WIDTH = <APP_ICON_WIDTH>
Then apply changes with:
tutor local launch
This Tutor plugin is maintained by Danyal Faheem from Edly. Community support is available from the official Open edX forum. Do you need help with this plugin? See the troubleshooting section from the Tutor documentation.
This software is licensed under the terms of the AGPLv3.