Back to README

Admin Guide for Programs & Events Dashboard

The Programs & Events Dashboard (outreachdashboard.wmflabs.org) is a web application designed to support the global Wikimedia community in organizing various programs, including edit-a-thons, education initiatives, and other events. See the Source Code and Phabricator Project for more details.

This guide provides an overview of the Programs & Events Dashboard infrastructure, detailing the servers, tools, and third-party dependencies that power the system, along with resources for managing and troubleshooting it.

Table of Contents

  1. Infrastructure Overview
  2. Monitoring and Logs
  3. Troubleshooting
  4. More Resources

Infrastructure Overview

The Programs & Events Dashboard is hosted in the Wikimedia Cloud VPS project globaleducation. All of its servers run as virtual machines within Wikimedia Cloud VPS, making them flexible and easy to manage.

The dashboard relies on several core servers and external tools to function. These components ensure that different tasks are isolated to avoid bottlenecks and improve system performance.

Servers

The dashboard operates on a distributed server architecture to handle web requests, process background jobs, and store application data. Each server is dedicated to specific roles, minimizing competition for resources and improving reliability by isolating potential bottlenecks and failures.

Below is a breakdown of the key servers and their roles within the infrastructure:

  1. Web Server

    • peony-web.globaleducation.eqiad1.wikimedia.cloud
      • Hosts the main web application and core Sidekiq processes using RVM (Ruby Version Manager), Phusion Passenger, and Apache.
      • Capistrano is used for deployments.
      • Sidekiq processes hosted:
        • sidekiq-default: Manages frequently run tasks (e.g., adding courses to update queues).
        • sidekiq-constant: Handles transactional jobs (e.g., wiki edits, email notifications).
        • sidekiq-daily: Executes long-running daily update tasks.
  2. Sidekiq Servers: These dedicated servers handle the other Sidekiq processes to isolate bottlenecks and failures:

    • peony-sidekiq.globaleducation.eqiad1.wikimedia.cloud: Hosts sidekiq-long for long-running course updates with higher queue latency.
    • peony-sidekiq-medium.globaleducation.eqiad1.wikimedia.cloud: Hosts sidekiq-medium for typical course updates.
    • peony-sidekiq-3.globaleducation.eqiad1.wikimedia.cloud: Hosts sidekiq-short for short-running course updates.
  3. Database Server

    • peony-database.globaleducation.eqiad1.wikimedia.cloud: Stores program, user, and revision data. It supports the dashboard’s data queries and updates.
  4. Redis Server

    • p-and-e-dashboard-redis.globaleducation.eqiad1.wikimedia.cloud: Stores all task (job) details and is shared across all Sidekiq processes for task queuing and caching.
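
The queue-to-server layout described above can be sketched as a simple lookup table. This is illustrative only, not code from the Dashboard: the real queue assignments live in each VM's Sidekiq process configuration, and the helper below is hypothetical.

```ruby
# Hypothetical sketch of the queue-to-server layout described above.
# The hostnames are real; the lookup helper is illustrative only.
QUEUE_HOSTS = {
  'default'  => 'peony-web.globaleducation.eqiad1.wikimedia.cloud',
  'constant' => 'peony-web.globaleducation.eqiad1.wikimedia.cloud',
  'daily'    => 'peony-web.globaleducation.eqiad1.wikimedia.cloud',
  'long'     => 'peony-sidekiq.globaleducation.eqiad1.wikimedia.cloud',
  'medium'   => 'peony-sidekiq-medium.globaleducation.eqiad1.wikimedia.cloud',
  'short'    => 'peony-sidekiq-3.globaleducation.eqiad1.wikimedia.cloud'
}.freeze

# Returns the VM that runs the Sidekiq process for a given queue name.
def host_for_queue(queue)
  QUEUE_HOSTS.fetch(queue) { raise ArgumentError, "unknown queue: #{queue}" }
end

puts host_for_queue('long')
# prints "peony-sidekiq.globaleducation.eqiad1.wikimedia.cloud"
```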

Integrated Toolforge Tools

  • wikiedudashboard
    The Dashboard uses this tool's PHP endpoints to query Wikimedia Replica databases for detailed revision and article data. Which replica database the tool connects to depends on the wiki being queried. These endpoints support features like retrieving user contributions, identifying existing articles or revisions, and checking for deleted content. For example, the Dashboard uses the /revisions.php endpoint to fetch revisions by specific users within a time range, and /articles.php to verify the existence of articles or revisions. See replica.rb for implementation details.

    [Live Tool, Source Code]

  • Reference Counter API
    The Reference Counter API is used to retrieve the number of references in a specified revision of a wiki page. The Dashboard interacts with the API through the ReferenceCounterApi class, which requests reference counts by revision ID and can process multiple revisions in batches. Note that the ReferenceCounterApi class and the reference-counter Toolforge API do not support Wikidata, which uses a different method for calculating references.

    [Live Tool, Source Code, Phabricator Documentation]

  • Suspected Plagiarism API
    This API is used to detect and report suspected plagiarism in course-related content. It leverages CopyPatrol to detect instances of potential plagiarism by comparing revisions of Wikipedia articles. The API then retrieves data on suspected plagiarism, which includes information such as the revision ID, the user responsible, and the article involved. The PlagiabotImporter class uses this data to identify recent instances of suspected plagiarism and match them with relevant revisions in the Dashboard's database. If a new case is found, an alert is generated for suspected plagiarism in course materials and sent to content experts for review.

    [Live Tool, Source Code]

  • CopyPatrol
    A plagiarism detection tool that surfaces recent Wikipedia edits flagged as possible copyright violations. It detects potential plagiarism by comparing revisions of Wikipedia articles.

    [Live Tool, Source Code, Documentation, Phabricator Project]

  • PagePile
    PagePile manages static lists of wiki pages. The Dashboard uses it to fetch a permanent snapshot of article titles through PagePile IDs or URLs. This is integrated into the course creation process, where users can input PagePile IDs or URLs to define a set of articles for the course. The PagePileApi class retrieves page titles from PagePile, ensures the category's wiki is consistent with the PagePile data, and updates the system with the retrieved titles. The data is then used to scope course materials to specific articles (see pagepile_scoping.jsx).

    [Live Tool, Source Code, Documentation]
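
As a rough illustration of how a query to the wikiedudashboard tool's endpoints might be assembled, consider the sketch below. The /revisions.php path comes from the description above, but the base URL (assumed from the standard Toolforge URL pattern) and the parameter names are assumptions for illustration only; see replica.rb for the real query construction.

```ruby
require 'uri'

# Assumed from the standard Toolforge URL pattern; not verified.
TOOL_BASE = 'https://wikiedudashboard.toolforge.org'

# Builds a hypothetical query URL for revisions by a set of users in a
# time range. Parameter names here are illustrative assumptions.
def revisions_query_url(usernames:, start_time:, end_time:)
  params = {
    'usernames' => usernames.join('|'), # assumed encoding of the user list
    'start'     => start_time,
    'end'       => end_time
  }
  "#{TOOL_BASE}/revisions.php?#{URI.encode_www_form(params)}"
end

puts revisions_query_url(usernames: ['ExampleUser'],
                         start_time: '20240101000000',
                         end_time:   '20240201000000')
```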

Other Integrated APIs and Third-Party Dependencies

  • PetScan
    The PetScan API is used in the Dashboard to integrate dynamic lists of articles based on user-defined queries. Users can enter PetScan IDs (PSIDs) or URLs to fetch a list of articles relevant to a course. The PetScanApi class retrieves the list of page titles associated with a given PSID by querying PetScan's API. This data is used to scope course materials to specific sets of articles (see petscan_scoping.jsx), ensuring the Dashboard reflects the most up-to-date results of PetScan queries. The system handles invalid or unreachable PSIDs gracefully to avoid disrupting the course creation process.

    [Source Code, Documentation]

  • WikiWho API
    The WikiWho API is used in the Dashboard to parse historical revisions of Wikipedia articles and track the provenance of each word in the article. This data is particularly useful for displaying authorship information, such as identifying who added, removed, or reintroduced specific tokens (words) across different revisions. The URLBuilder class constructs the necessary URLs to interact with the WikiWho API, allowing the Dashboard to fetch parsed article data and token-level authorship highlights. This data is then used in the ArticleViewer component to enhance the display of articles by showing detailed authorship information, providing insights into the contributions of different editors over time.

    [Source Code, Documentation]

  • WhoColor API
    The WhoColor API is used in the Dashboard to add color-coding to the authorship data provided by the WikiWho API. It enhances the parsed article revisions by highlighting each token (word) with a color corresponding to its original author, making it easier to visualize contributions. The Dashboard processes this color-coded data with the highlightAuthors function, which replaces the span elements in the HTML with styled versions that include user-specific color classes. This allows the ArticleViewer component to display the article text with visual cues showing which user contributed each part, so readers can quickly identify the contributions of different authors.

    [Source Code, Documentation]

  • WikidataDiffAnalyzer
    The WikidataDiffAnalyzer gem is used to analyze differences between Wikidata revisions. It is utilized by the update_wikidata_stats.rb service to process a list of revision IDs and determine the changes made between them, such as added, removed, or changed claims, references, and labels. The results of the analysis are serialized and stored in the summary field of Wikidata revisions, providing detailed statistics about the nature of the edits. This enables the Dashboard to track and display revision-level changes.

    [Source Code and Documentation]

  • LiftWing API
    The LiftWing API is used to fetch article quality and item quality data by making predictions about pages and edits using machine learning models. The Dashboard interacts with this API to assess the quality of articles and revisions, utilizing the LiftWingApi service to retrieve scores and features associated with each revision. The article_finder_action.js module fetches and processes article data: it takes revision IDs from fetched revision data and sends them to the LiftWing API via the fetchPageRevisionScore function, which returns quality scores for the articles.

    [Source Code, Documentation, Phabricator Project]
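
To make the token-level authorship idea above concrete, the sketch below counts how many tokens each editor contributed. The payload is a made-up, simplified stand-in for a WikiWho-style response; the real API's schema differs (see the WikiWho documentation), so treat both the field names and the structure as assumptions.

```ruby
require 'json'

# Simplified, made-up stand-in for a WikiWho-style token list.
# Real responses have more fields and a different structure.
sample = <<~JSON
  {
    "tokens": [
      { "str": "Hello", "editor": "101" },
      { "str": "world", "editor": "102" },
      { "str": "again", "editor": "101" }
    ]
  }
JSON

# Count how many tokens each editor contributed.
def tokens_per_editor(payload)
  JSON.parse(payload)['tokens']
      .group_by { |t| t['editor'] }
      .transform_values(&:length)
end

p tokens_per_editor(sample) # editor "101" => 2 tokens, "102" => 1
```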

Monitoring and Logs

Toolforge

To view Kubernetes namespace details for a Toolforge tool, go to https://k8s-status.toolforge.org/namespaces/tool-toolName/, replacing toolName with the name of the tool.
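
A trivial helper for that URL pattern (the helper itself is hypothetical, but the URL scheme is the one described above):

```ruby
# Builds the k8s-status URL for a Toolforge tool's Kubernetes namespace.
def k8s_status_url(tool_name)
  "https://k8s-status.toolforge.org/namespaces/tool-#{tool_name}/"
end

puts k8s_status_url('wikiedudashboard')
# prints "https://k8s-status.toolforge.org/namespaces/tool-wikiedudashboard/"
```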

Cloud VPS

Troubleshooting

Web Server Issues

  • Internal Server Error: Restart the web server.
  • Unresponsive Web Service:
    • Usually caused by high-activity events or surges in activity that overload the system.
      • Solution: Reboot the VM (instance) running the web server.
      • The web service typically recovers within a few hours.

Database Issues

  • Full Disk: Free up space by deleting temporary tables.
  • High-Edit / Long Courses Causing Errors:
    • Consider turning off the 'long' and 'very_long_update' queues.
  • Stuck Transactions: If stuck transactions make the Rails server unresponsive, restart MySQL.
  • Database Errors:
    • Verify that the app and database server versions are compatible.

Data Dumps and Recovery

  • Performing a Dump for a table:
    1. Put the database in innodb_force_recovery=1 mode.
      • Note: OPTIMIZE TABLE revisions; cannot run in recovery mode because the database is read-only.
    2. Start the dump process.
    3. Once the dump is complete, drop the table.
    4. Remove the database from recovery mode and restore the table.
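
The procedure above might translate into a command sequence like the one sketched below. This Ruby script only prints the commands rather than running them; the database name, table name, and flags are placeholders, and the exact invocations will vary with the deployment.

```ruby
# Illustrative only: prints the dump-and-restore command sequence
# described above without executing it. Names are placeholders.
TABLE = 'example_table'  # placeholder table name
DB    = 'dashboard_db'   # placeholder database name

steps = [
  '# 1. Enable recovery mode: add to my.cnf under [mysqld], then restart MySQL:',
  '#      innodb_force_recovery = 1',
  "mysqldump #{DB} #{TABLE} > #{TABLE}.sql   # 2. dump the table",
  "mysql -e 'DROP TABLE #{TABLE};' #{DB}     # 3. drop the table once dumped",
  '# 4. Remove innodb_force_recovery from my.cnf, restart MySQL, then restore:',
  "mysql #{DB} < #{TABLE}.sql"
]

puts steps
```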

Issues can also be caused by maintenance or outages affecting the third-party dependencies and services listed above.

More Resources