Skip to content

Latest commit

 

History

History
278 lines (181 loc) · 13.9 KB

README.rst

File metadata and controls

278 lines (181 loc) · 13.9 KB

CourseGraph plugin for Tutor

Neo4j logo A plus sign, indicating the combination of Neo4j and Tutor Tutor logo

CourseGraph is a tool that allows Open edX developers and support specialists to inspect their platform instance's learning content. It consists of two simple components:

  1. The CourseGraph support application, which loads courses from the CMS's internal course store and dumps them into an instance of...
  2. Neo4j, a popular open-source graph database. Staff for an Open edX instance can query the course graph via Neo4j's Web console using the Cypher query language.

CourseGraph was initially an internal tool at edX, Inc., but as of the Maple release it was shared with the greater Open edX community. This Tutor plugin aims to provide an easy mechanism for developers and operators to trial and deploy CourseGraph.

Status & Contributing

This is Beta software. It was developed as part of the Tutor Adoption Initiative for the purpose learning about the Tutor Plugin API, and it has been tested to work for with Tutor and Open edX for local development, local deployment, and Kubernetes deployment. However, it has not yet been tested with a production Open edX instance. Although this plugin is nominally maintained by the Center for Reimagining Learning, we do not currently have plans to do any comprehensive production-tier testing. You have been warned!

If you choose to use this plugin, we're eager to hear about your experiences! Please let us know by opening an issue in this repository or posting on the Open edX or Tutor forums. Also, if you're interested in contributing, feel free to open an issue or a pull request. We'll try to give it a first look within a week.

Branches

This plugin uses the same branching model as Tutor:

nightly Latest development. Compatible with Tutor Nightly and Open edX master. Merged into master at each Open edX named release.
master Latest stable release and patches. Compatible with Tutor master and latest Open edX named release. Merged into nightly continuously.

The syncing between branches is currently done manually.

If your change is backwards-compatible with the last Open edX named release, then propose it against master. If not, then propose it against nightly.

This repository aims to adhere to all relevant Open edX Proposals, including OEP-55, Conventional Commits.

Installation

Install the latest stable version (requires the latest Tutor release):

pip install tutor-contrib-coursegraph

Or, install the latest nightly version (requires Tutor Nightly):

pip install git+https://github.com/openedx/tutor-contrib-coursegraph@nightly

Or, install the plugin to be hacked on:

git clone [email protected]:openedx/tutor-contrib-coursegraph
cd tutor-contrib-coursegraph
source {{ PATH TO A VIRTUAL ENVIRONMENT }}
make dev-requirements

Setup

Enable the plugin and re-generate your Tutor environment:

tutor plugins enable coursegraph
tutor config save

Then, run initialization in order to dump your platform's existing courses into CourseGraph:

tutor [dev|local|k8s] init --limit=coursegraph

Start CourseGraph:

tutor [dev|local|k8s] start coursegraph

Usage

Finding the Web interface

Once CourseGraph is started and courses have been dumped to it, the tool can be viewed at:

http://coursegraph.{{ LMS_HOST }}

For example, if your LMS is at openedx.example.edu and you're using the default Neo4j HTTP port, that'd be:

http://coursegraph.openedx.example.edu

When running in development mode, you'll also need to specify port 7474. For example:

http://coursegraph.local.overhang.io:7474

Using the Web interface

The Connect URL should already be populated correctly. If it's not, here's what to enter, substituting openedx.example.edu for your own LMS root domain:

Mode Connect URL
tutor k8s bolt://bolt.coursegraph.openedx.example.edu:80
tutor local bolt://bolt.coursegraph.openedx.example.edu:80
tutor dev bolt://coursegraph.local.overhang.io:7687

By default, you can log in using Username / Password authentication:

Username "neo4j"
Password $(tutor config printvalue COURSEGRAPH_NEO4J_PASSWORD)

If you set COURSEGRAPH_NEO4J_PASSWORD to null before initializing CourseGraph, then instead select No Authentication.

Now that you're in, try querying your courses!

The Neo4j Web interface can be used to visualize relationships between blocks in a course. Here, the query "MATCH (course)-[:PARENT_OF*]->(p:problem) WHERE p.data CONTAINS 'jsinput' RETURN * LIMIT 50" is used to visualize problem blocks that use custom JavaScript, along with their ancestry.

Operations

Operating CourseGraph is fairly straightforward, especially if you treat CourseGraph data as a non-critical secondary view into the CMS's course data. That is: you should be willing to completely drop and re-generate the CourseGraph data stord in Neo4j. By doing so, you avoid needing to back up Neo4j, and you de-risk the Neo4j schema version upgrades that you'll need to perform over time with new Open edX releases.

Managing Data

By default, this plugin configures CMS to dump each course to CourseGraph whenever it is published, allowing you to "set and forget" the tool. You can disable this behavior by setting COURSEGRAPH_DUMP_COURSE_ON_PUBLISH to false, regenerating your Tutor environment, and restarting CMS.

If you have disabled automatic dumping, then you'll need to periodically refresh the data in CourseGraph manually. You can do so via the CMS administration console at, under the COURSE GRAPH COURSE DUMPS page in the COURSE GRAPH app:

CourseGraph dump page in CMS admin console, demonstrating that individual courses can be selected for dump CourseGraph dump page in CMS admin console, showing message after course dumps are successfully enqueued

Alternatively, you can skip the admin console by using a CMS management command:

tutor [dev|local|k8s] exec coursegraph ./manage.py cms dump_to_neo4j

Managing Ingress Access

By default, CourseGraph's Neo4j service is accessible to the outside world for HTTP and Bolt traffic via Caddy, the proxy provided by Tutor. Neo4j's bfasic authentication scheme (described in Managing Users) may or may not be sufficient for your security needs. If it is not sufficient, then you have two options:

  1. Use your own Web proxy in place of Tutor's Caddy service by setting ENABLE_WEB_PROXY=false. Via your Web proxy, restrict ingress CourseGraph access as desired. If you're not already using your own Web proxy, then this is probably not what you want to do, as it'd affect your entire Tutor deployment just for the sake of this plugin.
  2. Set COURSEGRAPH_RUN_NEO4J=false and self-manage a separate Neo4j service, whose ingress access you could control as desired. Then, configure this plugin to point at your self-managed Neo4j service so that CMS can dump to content to it. See the Configuration section for specifics. If you choose to go this route, the authors of this plugin recommend the official Neo4j docker image as a starting point for self-hosting Neo4j.

Managing Users

This plugin currently ships with Neo4j 3.5 Community Edition, which provides some very rudimentary authentication controls via the Neo4j Web console. The Neo4j 3.5 page on procedures for native user management describes the available operations. Take note that only four operations are available in the Community Edition:

  • CALL dbms.security.createUser(<username>, <password>, <requirePasswordChange>)
  • CALL dbms.security.changePassword(<password>, <requirePasswordChange>)
  • CALL dbms.security.deleteUser(<username>)
  • CALL dbms.security.listUsers()

Since authorization control is not availble in the Neo4j Community Edition, all users will have full administrative control over Neo4j data, including ability to to read all graphs, modify nodes, modify relationships, and create new users. Because of this, access to any set of CourseGraph Neo4j credentials confers read access to your entire course catalog, as well as the ability to insert fake data into CourseGraph. As such, if your CourseGraph instance is accessible by the public, then all Neo4j credential sets should be treated as production secrets.

By default, this plugin initializes Neo4j with one user, whose username is neo4j and whose password is set from the COURSEGRAPH_NEO4J_PASSWORD Tutor setting. When CMS pushes data to Neo4j, it also authenticates with neo4j as its username and COURSEGRAPH_NEO4J_PASSWORD as its password. Keep in mind that:

  • If the neo4j user is destroyed, CMS will be unable to dump new data to Neo4j until a new neo4j user is created.
  • Once CourseGraph is initialized, changing COURSEGRAPH_NEO4J_PASSWORD alone does not update the neo4j user's password. You will need to use CALL dbms.security.changePassword(...) from the Neo4j Web console for the password change to take effect.

Finally, initializing CourseGraph with COURSEGRAPH_NEO4J_PASSWORD set to !!null disables authentication entirely. Anyone with access to CourseGraph will have full read and write access to the Neo4j database. Obviously, this is only suitable when CourseGraph endpoint access is restricted to trusted private network.

Configuration

The Tutor plugin can be configured with several settings. The names of all settings below are prefixed with COURSEGRAPH_.

CourseGraph Setting Type Default Description
NEO4J_PASSWORD str (20 random characters) Initial password set for neo4j user, as well as password used to by CMS to authenticate as neo4j user. See Managing Users for more details.
NEO4J_VERSION str "3.5.28" Version of Neo4j to use. Appended to default image. Overriding NEO4J_DOCKER_IMAGE annuls this setting.
NEO4J_DOCKER_IMAGE str "docker.io/neo4j:3.5.28" Neo4j Docker image to be pulled and used. By default, based on your DOCKER_REGISTRY and COURSEGRAPH_NEO4J_VERSION.
NEO4J_HOST str Prod: "coursegraph.www.openedx.com", Dev: "coursegraph.local.overhang.io" Hostname of CourseGraph. By default, based on your LMS_HOST.
DUMP_COURSE_ON_PUBLISH bool true Should CMS automatically dump a course to CourseGraph whenever it's published? If disabled, you will instead need to periodically dump courses via the management command or admin console.
RUN_NEO4J bool true Whether the Neo4j service should be started. Only disable if you wish to manage your own Neo4j instance outside of Tutor. If you disable this, then set COURSEGRAPH_NEO4J_HOST to the hostname of your self-managed instance, set COURSEGRAPH_NEO4J_PASSWORD to the password of your instance's neo4j user, and expose port 7687 for unencrypted Bolt traffic.

Development

Upgrade version pins:

make upgrade

Run just static checks:

make test-format test-lint test-types

Run all tests:

cp $(tutor config printroot)/config.yml tutor_config.bak.yml
make test  # clobbers some Tutor configuration
mv tutor_config.bak.yml $(tutor config printroot)/config.yml  # restore original config

License

This software is licensed under the terms of the AGPLv3.