The Programs & Events Dashboard (outreachdashboard.wmflabs.org) is a web application designed to support the global Wikimedia community in organizing various programs, including edit-a-thons, education initiatives, and other events. See the Source Code and Phabricator Project for more details.
This guide provides an overview of the Programs & Events Dashboard infrastructure, detailing the servers, tools, and third-party dependencies that power the system. It also provides resources for managing and troubleshooting the system.
The Program & Events Dashboard is hosted within the Wikimedia Cloud VPS project globaleducation, which provides the infrastructure for all servers, allowing the dashboard to run on virtual machines that are flexible and easily managed within Wikimedia Cloud.
The dashboard relies on several core servers and external tools to function. These components ensure that different tasks are isolated to avoid bottlenecks and improve system performance.
The dashboard operates on a distributed server architecture to handle web requests, process background jobs, and store application data. Each server is dedicated to specific roles, minimizing competition for resources and improving reliability by isolating potential bottlenecks and failures.
Below is a breakdown of the key servers and their roles within the infrastructure:
- Web Server: `peony-web.globaleducation.eqiad1.wikimedia.cloud`
  - Hosts the main web application and core Sidekiq processes using RVM (Ruby Version Manager), Phusion Passenger, and Apache.
  - Capistrano is used for deployments.
  - Sidekiq processes hosted (how jobs are routed to queues is sketched after this list):
    - `sidekiq-default`: Manages frequently run tasks (e.g., adding courses to update queues).
    - `sidekiq-constant`: Handles transactional jobs (e.g., wiki edits, email notifications).
    - `sidekiq-daily`: Executes long-running daily update tasks.
- Sidekiq Servers: These dedicated servers handle the other Sidekiq processes to isolate bottlenecks and failures:
  - `peony-sidekiq.globaleducation.eqiad1.wikimedia.cloud`: Hosts `sidekiq-long` for long-running course updates with higher queue latency.
  - `peony-sidekiq-medium.globaleducation.eqiad1.wikimedia.cloud`: Hosts `sidekiq-medium` for typical course updates.
  - `peony-sidekiq-3.globaleducation.eqiad1.wikimedia.cloud`: Hosts `sidekiq-short` for short-running course updates.
- Database Server: `peony-database.globaleducation.eqiad1.wikimedia.cloud` stores program, user, and revision data. It supports the dashboard's data queries and updates.
- Redis Server: `p-and-e-dashboard-redis.globaleducation.eqiad1.wikimedia.cloud` stores all task (job) details and is shared across all Sidekiq processes for task queuing and caching.
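To illustrate how this split works in practice, here is a minimal, hypothetical Sidekiq worker. The class name, queue name, and job body are assumptions rather than the Dashboard's actual code, but the routing pattern matches the process/queue separation described above.

```ruby
# Minimal sketch (not the Dashboard's actual code) of routing a job to a
# dedicated Sidekiq queue. The class name, queue name, and job body are
# illustrative assumptions.
require 'sidekiq'

class LongCourseUpdateWorker
  include Sidekiq::Worker
  # Jobs on this queue would be picked up by a long-update Sidekiq process
  # (such as sidekiq-long on peony-sidekiq), keeping slow course updates
  # from blocking short or transactional jobs.
  sidekiq_options queue: 'long', retry: 3

  def perform(course_id)
    # Placeholder for the actual course-update logic.
    puts "Running long update for course #{course_id}"
  end
end

# perform_async serializes the job arguments into Redis
# (p-and-e-dashboard-redis), where the matching worker process dequeues them.
LongCourseUpdateWorker.perform_async(42)
```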
The dashboard also relies on the following external tools and third-party APIs; illustrative usage sketches for several of them follow the list.

- wikiedudashboard: The Dashboard uses this tool's PHP endpoints to query the Wikimedia Replica databases for detailed revision and article data. Which replica database the tool connects to depends on the wiki being queried. These endpoints support features like retrieving user contributions, identifying existing articles or revisions, and checking for deleted content. For example, the Dashboard uses the `/revisions.php` endpoint to fetch revisions made by specific users within a time range, and `/articles.php` to verify the existence of articles or revisions. See `replica.rb` for implementation details, and the request sketch below.
- Reference Counter API: Used to retrieve the number of references in a specified revision of a wiki (sketched below). The Dashboard interacts with the API through the `ReferenceCounterApi` class, which handles requests for reference counts by revision ID and processes multiple revisions in batches. Note that the `ReferenceCounterApi` class and the `reference-counter` Toolforge API do not support Wikidata, which uses a different method for calculating references.
- Suspected Plagiarism API: Used to detect and report suspected plagiarism in course-related content. It leverages CopyPatrol, which detects potential plagiarism by comparing revisions of Wikipedia articles. The API returns data on suspected plagiarism, including the revision ID, the user responsible, and the article involved. The `PlagiabotImporter` class uses this data to identify recent instances of suspected plagiarism and match them with relevant revisions in the Dashboard's database. When a new case is found, an alert for suspected plagiarism in course materials is generated and sent to content experts for review.
- CopyPatrol: A plagiarism detection tool that shows recent Wikipedia edits flagged as possible copyright violations. It detects potential plagiarism by comparing revisions of Wikipedia articles. [Live Tool, Source Code, Documentation, Phabricator Project]
- PagePile: PagePile manages static lists of wiki pages. The Dashboard uses it to fetch a permanent snapshot of article titles through PagePile IDs or URLs. This is integrated into the course creation process, where users can input PagePile IDs or URLs to define a set of articles for the course. The `PagePileApi` class retrieves page titles from PagePile, ensures the category's wiki is consistent with the PagePile data, and updates the system with the retrieved titles. The data is then used to scope course materials to specific articles; see `pagepile_scoping.jsx`.
- PetScan: The PetScan API is used to integrate dynamic lists of articles based on user-defined queries (sketched below). Users can enter PetScan IDs (PSIDs) or URLs to fetch a list of articles relevant to a course. The `PetScanApi` class retrieves the page titles associated with a given PSID by querying PetScan's API; this data is used to scope course materials to specific sets of articles (see `petscan_scoping.jsx`), ensuring the Dashboard reflects the most up-to-date results of a PetScan query. Invalid or unreachable PSIDs are handled gracefully so they do not disrupt the course creation process.
- WikiWho API: Used to parse historical revisions of Wikipedia articles and track the provenance of each word in an article. This data is particularly useful for displaying authorship information, such as identifying who added, removed, or reintroduced specific tokens (words) across revisions. The `URLBuilder` class constructs the URLs needed to interact with the WikiWho API, allowing the Dashboard to fetch parsed article data and token-level authorship highlights. This data is used in the `ArticleViewer` component to show detailed authorship information, providing insight into the contributions of different editors over time.
- WhoColor API: Used to add color-coding to the authorship data provided by the WikiWho API. It enhances the parsed article revisions by highlighting each token (word) with a color corresponding to its original author, making contributions easier to visualize. The Dashboard processes this color-coded data with the `highlightAuthors` function, which replaces the span elements in the HTML with styled versions that include user-specific color classes. This allows the `ArticleViewer` component to display the article text with visual cues showing which user contributed each part of it, so the contributions of different authors can be identified quickly.
- WikidataDiffAnalyzer: This gem is used to analyze differences between Wikidata revisions (sketched below). The `update_wikidata_stats.rb` service uses it to process a list of revision IDs and determine the changes made between them, such as added, removed, or changed claims, references, and labels. The results of the analysis are serialized and stored in the summary field of Wikidata revisions, providing detailed statistics about the nature of the edits. This enables the Dashboard to track and display revision-level changes.
- LiftWing API: Used to fetch article quality and item quality data by making predictions about pages and edits with machine learning models (sketched below). The Dashboard uses the `LiftWingApi` service to retrieve the quality scores and features associated with each revision. The `article_finder_action.js` module fetches and processes article data: it takes the revision IDs from fetched revision data and sends them to the LiftWing API by calling the `fetchPageRevisionScore` function; the LiftWing API then processes the revision data and returns the quality scores for the articles.
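The following hedged sketches show how a few of these integrations can be exercised. They are not the Dashboard's actual code: any host, parameter, method, or response shape not mentioned above is an assumption. First, a request to the wikiedudashboard replica endpoints; the host and query parameter names here are placeholders, and `replica.rb` defines the real query format.

```ruby
# Illustrative only: placeholder host and invented query parameters for the
# revisions.php endpoint, which returns revisions made by given users within
# a time range.
require 'net/http'
require 'json'
require 'uri'

uri = URI('https://example.org/revisions.php') # placeholder host
uri.query = URI.encode_www_form(
  'lang'      => 'en',                    # parameter names assumed
  'project'   => 'wikipedia',
  'usernames' => ['ExampleUser'].to_json,
  'start'     => '20240101000000',
  'end'       => '20240201000000'
)

response = Net::HTTP.get_response(uri)
revisions = response.is_a?(Net::HTTPSuccess) ? JSON.parse(response.body) : []
puts revisions.inspect
```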
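Next, a sketch of fetching a reference count for a single revision from the reference-counter Toolforge tool. The URL pattern and response field name are assumptions; the Dashboard's `ReferenceCounterApi` class wraps and batches requests like this.

```ruby
# Hedged sketch: endpoint shape and response field name are assumptions.
require 'net/http'
require 'json'
require 'uri'

def fetch_reference_count(project, lang, rev_id)
  # Assumed URL pattern for the reference-counter Toolforge API.
  uri = URI("https://reference-counter.toolforge.org/api/v1/references/#{project}/#{lang}/#{rev_id}")
  response = Net::HTTP.get_response(uri)
  return nil unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)['num_ref'] # field name assumed
end

# Wikidata revisions are not supported by this tool.
puts fetch_reference_count('wikipedia', 'en', 1234567890)
```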
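A similar sketch for PetScan: the `psid` and `format` parameters are part of PetScan's public interface, but the response parsing below is an assumption; the Dashboard's `PetScanApi` class performs the real handling, including error handling for invalid PSIDs.

```ruby
# Hedged sketch of fetching page titles for a PetScan query by PSID.
require 'net/http'
require 'json'
require 'uri'

def petscan_titles(psid)
  uri = URI('https://petscan.wmflabs.org/')
  uri.query = URI.encode_www_form(psid: psid, format: 'json')
  response = Net::HTTP.get_response(uri)
  return [] unless response.is_a?(Net::HTTPSuccess)

  data = JSON.parse(response.body)
  # Assumed response layout: a list of pages, each carrying a 'title' key.
  pages = data.dig('*', 0, 'a', '*') || []
  pages.map { |page| page['title'] }
end

puts petscan_titles(123456).first(5).inspect
```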
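For Wikidata statistics, a sketch of the wikidata-diff-analyzer gem. The require path, `analyze` entry point, and result structure shown here are assumptions; `update_wikidata_stats.rb` shows how the Dashboard actually stores the output.

```ruby
# Hedged sketch: require path, entry point, and result structure are assumptions.
require 'wikidata_diff_analyzer'

revision_ids = [1596238100, 1596236983] # example Wikidata revision IDs
result = WikidataDiffAnalyzer.analyze(revision_ids)

# The per-revision statistics (claims, references, and labels added, removed,
# or changed) would then be serialized into the revisions' summary fields.
puts result.inspect
```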
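Finally, a sketch of requesting an article-quality prediction from the LiftWing API for a single revision. The model name and response shape are assumptions; the Dashboard's `LiftWingApi` service handles the real request and scoring flow.

```ruby
# Hedged sketch: model name and response structure are assumptions.
require 'net/http'
require 'json'
require 'uri'

def lift_wing_quality(rev_id, model: 'enwiki-articlequality')
  uri = URI("https://api.wikimedia.org/service/lw/inference/v1/models/#{model}:predict")
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = { rev_id: rev_id }.to_json

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  return nil unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body) # expected to contain the predicted quality scores
end

puts lift_wing_quality(1234567890).inspect
```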
To view Kubernetes namespace details for a Toolforge tool, go to https://k8s-status.toolforge.org/namespaces/tool-toolName/, replacing `toolName` with the name of the tool (for example, https://k8s-status.toolforge.org/namespaces/tool-reference-counter/ for the reference-counter tool mentioned above).
- Internal Server Error: Restart the web server.
- Unresponsive Web Service:
  - Usually caused by high-activity events or surges in ongoing activity, leading to system overload.
  - Solution: Reboot the VM (instance) running the web server.
  - The web service typically recovers within a few hours.
- Full Disk: Free up space by deleting temporary tables.
- High-Edit / Long Courses Causing Errors:
  - Consider turning off the 'long' and 'very_long_update' queues.
- Stuck Transactions: If stuck transactions make the Rails server unresponsive, restart MySQL.
- Database Errors:
  - Verify that the app and database server versions are compatible.
- Performing a Dump for a table:
  1. Put the database in `innodb_force_recovery=1` mode.
     - Note: `OPTIMIZE TABLE revisions;` cannot run in recovery mode because the database is read-only.
  2. Start the dump process.
  3. Once the dump is complete, drop the table.
  4. Remove the database from recovery mode and restore the table.
Issues can also be caused by maintenance or outages affecting the third-party dependencies and other services listed above.