This package generates a variety of test data used for integration and performance testing of Open edX Aspects. Currently it populates the following datasets:
- xAPI statements, simulating those generated by event-routing-backends
- Course and learner data, simulating that generated by event-sink-clickhouse
The xAPI events generated match the current specifications of the Open edX event-routing-backends package, but they are not yet maintained in step with it and may fall out of sync over time. Almost all current statements are simulated; statements that are not yet used in Aspects reporting have been skipped.
Once an appropriate database has been created using Aspects, data can be generated in the following ways:
- Via Ralph (backend ralph_clickhouse): Useful for testing configuration, integration, and permissions. This uses batch POSTs to Ralph for xAPI statements, but still writes directly to ClickHouse for course and actor data. It is the slowest method, but it exercises the largest surface area of the project.
- Directly to ClickHouse (backend clickhouse): Useful for getting a medium to large amount of data into the database to test configuration and view reports. xAPI statements are inserted in batches; other data is currently inserted one row at a time.
- CSV files (backend csv_file): Useful for creating datasets that can be reused to check performance changes against exactly the same data, and for extremely large tests. The files can be generated locally or on any service supported by smart_open. They can then optionally be imported to ClickHouse if written locally or to S3. They can also be imported directly from S3 to ClickHouse at any time using the load-db-from-s3 subcommand. This is by far the fastest method for large-scale tests.
A configuration file is required to run a test. If no file is given, a small test will be run using the default_config.yaml included in the project:
❯ xapi-db-load load-db
To specify a config file:
❯ xapi-db-load load-db --config_file private_configs/my_huge_test.yaml
There is also a subcommand that only loads previously generated CSV data from S3:
❯ xapi-db-load load-db-from-s3 --config_file private_configs/my_s3_test.yaml
There are a number of different configuration options for tuning the output. In addition to the documentation below, there are example settings files to review in the example_configs directory.
These settings apply to all backends, and determine the size and makeup of the test:
    # Location where timing logs will be saved
    log_dir: logs

    # xAPI statements will be generated in batches, the total number of
    # statements is ``num_batches * batch_size``. The batch size is the number
    # of statements sent to the backend (Ralph POST, ClickHouse insert, etc.)
    num_batches: 3
    batch_size: 100

    # Overall start and end date for the entire run. All xAPI statements
    # will fall within these dates. Different courses will have different start
    # and end dates between these days, based on course_length_days below.
    start_date: 2014-01-01
    end_date: 2023-11-27

    # All courses will be this long, they will be fit between start_date and
    # end_date, therefore this must be less than end_date - start_date days.
    course_length_days: 120

    # The number of organizations, courses will be evenly spread among these
    num_organizations: 3

    # The number of learners to create, random subsets of these will be
    # "registered" for each course and have statements generated for them
    # between their registration date and the end of the course
    num_actors: 10

    # How many of each size course to create. The sum of these is the total
    # number of courses created for the test. The keys are arbitrary, you can
    # name them whatever you like and have as many or few sizes as you like.
    # The keys must exactly match the definitions in course_size_makeup below.
    num_course_sizes:
      small: 1
      medium: 1
      ...

    # Course type configurations, how many of each type of object are created
    # for each course of this size. "actors" must be less than or equal to
    # "num_actors". Keys here must exactly match the keys in num_course_sizes.
    course_size_makeup:
      small:
        actors: 5
        problems: 20
        videos: 10
        chapters: 3
        sequences: 10
        verticals: 20
        forum_posts: 20
      medium:
        actors: 7
        problems: 40
        videos: 20
        chapters: 4
        sequences: 20
        verticals: 30
        forum_posts: 40
      ...
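The constraints stated in the comments above can be checked before starting a long run. The following is a minimal sketch and not part of xapi-db-load: `check_config` and `total_statements` are hypothetical helpers, and the config is written as a plain Python dict (reduced to the relevant keys) rather than loaded from the YAML file.

```python
from datetime import date


def check_config(config):
    """Return a list of problems found in a config dict (hypothetical helper)."""
    problems = []

    # course_length_days must be less than the number of days in the run.
    run_days = (config["end_date"] - config["start_date"]).days
    if config["course_length_days"] >= run_days:
        problems.append("course_length_days must be less than end_date - start_date")

    # Keys of num_course_sizes and course_size_makeup must match exactly.
    if set(config["num_course_sizes"]) != set(config["course_size_makeup"]):
        problems.append("num_course_sizes and course_size_makeup keys differ")

    # Each course size may use at most num_actors learners.
    for size, makeup in config["course_size_makeup"].items():
        if makeup["actors"] > config["num_actors"]:
            problems.append(f"{size}: actors exceeds num_actors")

    return problems


def total_statements(config):
    """Total xAPI statements generated: num_batches * batch_size."""
    return config["num_batches"] * config["batch_size"]


# Values copied from the example settings, reduced to the checked keys.
example = {
    "num_batches": 3,
    "batch_size": 100,
    "start_date": date(2014, 1, 1),
    "end_date": date(2023, 11, 27),
    "course_length_days": 120,
    "num_actors": 10,
    "num_course_sizes": {"small": 1, "medium": 1},
    "course_size_makeup": {
        "small": {"actors": 5},
        "medium": {"actors": 7},
    },
}

print(total_statements(example))  # 300
print(check_config(example))      # []
```

With the example values, the run produces 300 statements and no problems are reported. Lowering `num_actors` below 7 would flag the medium course size.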
Generates gzipped CSV files to a local directory:
    backend: csv_file
    csv_output_destination: logs/
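To illustrate what gzipped CSV output looks like, the sketch below writes a small file and reads it back using only the Python standard library. The file name, column names, and row contents are invented for the example and do not reflect the files or schema that xapi-db-load actually produces; the tool itself writes through smart_open rather than the `gzip` module.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path, rows):
    """Write rows (an iterable of lists) to a gzip-compressed CSV file."""
    # "wt" opens the gzip stream in text mode; newline="" lets csv control line endings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def read_gzipped_csv(path):
    """Read a gzip-compressed CSV file back into a list of rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Hypothetical rows, for illustration only.
rows = [["event_id", "emission_time"], ["1", "2014-01-01T00:00:00"]]

with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "example.csv.gz")
    write_gzipped_csv(out_path, rows)
    round_trip = read_gzipped_csv(out_path)

print(round_trip == rows)  # True
```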
Generates gzipped CSV files to remote location:
    backend: csv_file
    # This can be anything smart-open can handle (ex. a local directory or
    # an S3 bucket etc.) but importing to ClickHouse using this tool only
    # supports S3 or compatible services like MinIO right now.
    # Note that this *must* be an s3:// link, https links will not work
    # https://pypi.org/project/smart-open/
    csv_output_destination: s3://openedx-aspects-loadtest/logs/large_test/

    # These settings are shared with the ClickHouse backend
    s3_key:
    s3_secret:
Generates gzipped CSV files to a remote location, then automatically loads them to ClickHouse:
    backend: csv_file
    # csv_output_destination can be anything smart_open can handle, a local
    # directory or an S3 bucket etc., but importing to ClickHouse using this
    # tool only supports S3 or compatible services (ex: MinIO) right now
    # https://pypi.org/project/smart-open/
    csv_output_destination: s3://openedx-aspects-loadtest/logs/large_test/
    csv_load_from_s3_after: true

    # Note that this *must* be an https link, s3:// links will not work,
    # this must point to the same location as csv_output_destination.
    s3_source_location: https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/

    # This also requires all of the ClickHouse backend variables!
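The csv_output_destination and s3_source_location values above name the same location in two different URL forms. As a rough sketch, the https form for an AWS-hosted bucket can be derived from the s3:// form as shown below. `s3_to_https` is a hypothetical helper, not part of xapi-db-load, and it assumes AWS virtual-hosted-style addressing; S3-compatible services such as MinIO generally use a different host layout, so their https URL has to be written by hand.

```python
from urllib.parse import urlparse


def s3_to_https(s3_url):
    """Convert s3://bucket/key to https://bucket.s3.amazonaws.com/key (AWS only)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # netloc is the bucket name; path keeps its leading slash.
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"


print(s3_to_https("s3://openedx-aspects-loadtest/logs/large_test/"))
# https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/
```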
Setting backend to clickhouse is only necessary if you are writing directly to ClickHouse. For integrations with Ralph or CSV, use their backend setting instead:
    backend: clickhouse
Variables necessary to connect to ClickHouse, whether directly, through Ralph, or as part of loading CSV files:
    # ClickHouse connection variables
    db_host: localhost
    # db_port is also used to determine the "secure" parameter. If the port
    # ends in 443 or 440, the "secure" flag will be set on the connection.
    db_port: 8443
    db_username: ch_admin
    db_password: secret

    # Schema name for the xAPI schema
    db_name: xapi

    # Schema name for the event sink schema
    db_event_sink_name: event_sink

    # These S3 settings are shared with the CSV backend, but passed to
    # ClickHouse when loading files from S3
    s3_key: <...>
    s3_secret: <...>
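The port-based rule described in the db_port comment can be sketched as follows. This only illustrates the stated behavior; `is_secure_port` is a hypothetical name, and the tool's actual implementation may differ.

```python
def is_secure_port(db_port):
    """Return True when the port number ends in 443 or 440."""
    # str.endswith accepts a tuple of suffixes and matches any of them.
    return str(db_port).endswith(("443", "440"))


for port in (8443, 9440, 8123, 9000):
    print(port, is_secure_port(port))
```

Under this rule, ports such as 8443 and 9440 enable the secure flag, while 8123 and 9000 do not.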
Variables necessary to send xAPI statements via Ralph:
    backend: ralph_clickhouse
    lrs_url: http://ralph.tutor-nightly-local.orb.local/xAPI/statements
    lrs_username: ralph
    lrs_password: secret

    # This also requires all of the ClickHouse backend variables!
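To show how these three settings might fit together, the sketch below builds (without sending) a batch POST of xAPI statements using only the Python standard library. It is not the client code used by xapi-db-load. `build_statement_request` is a hypothetical helper, HTTP Basic authentication and the xAPI version value "1.0.3" are assumptions, and the statement is a placeholder rather than one the tool would generate.

```python
import base64
import json
import urllib.request


def build_statement_request(lrs_url, username, password, statements):
    """Build (but do not send) a batch POST of xAPI statements."""
    credentials = f"{username}:{password}".encode("utf-8")
    headers = {
        # Assumption: the LRS accepts HTTP Basic authentication.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/json",
        # The xAPI specification requires a version header; the value is assumed.
        "X-Experience-API-Version": "1.0.3",
    }
    body = json.dumps(statements).encode("utf-8")
    return urllib.request.Request(lrs_url, data=body, headers=headers, method="POST")


# Placeholder statement, for illustration only.
statements = [{
    "actor": {"mbox": "mailto:learner@example.com"},
    "verb": {"id": "http://adlnet.gov/expapi/verbs/completed"},
    "object": {"id": "http://example.com/course/1"},
}]

request = build_statement_request(
    "http://ralph.tutor-nightly-local.orb.local/xAPI/statements",
    "ralph",
    "secret",
    statements,
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send the whole list in a single POST, which is the batching behavior that batch_size controls.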
Variables necessary to run xapi-db-load load-db-from-s3, which skips the event generation process and just loads pre-existing CSV files from S3:
    # Note that this must be an https link, s3:// links will not work
    s3_source_location: https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/

    # This also requires all of the ClickHouse backend variables!
    # Clone the repository
    git clone git@github.com:openedx/xapi-db-load.git
    cd xapi-db-load

    # Set up a virtualenv using virtualenvwrapper with the same name as the repo
    # and activate it
    mkvirtualenv -p python3.11 xapi-db-load
    # Activate the virtualenv
    workon xapi-db-load

    # Grab the latest code
    git checkout main
    git pull

    # Install/update the dev requirements
    make requirements

    # Run the tests and quality checks (to verify the status before you make any
    # changes)
    make validate

    # Make a new branch for your changes
    git checkout -b <your_github_username>/<short_description>

    # Using your favorite editor, edit the code to make your change.
    vim ...

    # Run your new tests
    pytest ./path/to/new/tests

    # Run all the tests and quality checks
    make validate

    # Commit all your changes
    git commit ...
    git push

    # Open a PR and ask for review.
Start by going through the documentation (in progress!).
If you're having trouble, we have discussion forums at https://discuss.openedx.org where you can connect with others in the community.
Our real-time conversations are on Slack. You can request a Slack invitation, then join our community Slack workspace.
For anything non-trivial, the best path is to open an issue in this repository with as many details about the issue you are facing as you can provide.
https://github.com/openedx/xapi-db-load/issues
For more information about these options, see the Getting Help page.
The code in this repository is licensed under the AGPL 3.0 unless otherwise noted.
Please see LICENSE.txt for details.
Contributions are very welcome. Please read How To Contribute for details.
This project is currently accepting all types of contributions: bug fixes, security fixes, maintenance work, and new features. However, please discuss your new feature idea with the maintainers before beginning development to maximize the chances of your change being accepted. You can start a conversation by creating a new issue on this repo summarizing your idea.
All community members are expected to follow the Open edX Code of Conduct.
The assigned maintainers for this component and other project details may be
found in Backstage. Backstage pulls this data from the catalog-info.yaml
file in this repo.
Please do not report security issues in public. Please email security@openedx.org.