Server-Side Query Blacklisting #6

Open: wants to merge 23 commits into base: main

@Roaster05 Roaster05 commented Jun 21, 2024

Implementation Plan for Blacklist Functionality in Elasticsearch

Overview

The goal is to add blacklist functionality to Elasticsearch that blocks queries from executing once they meet blacklisting criteria, such as exceeding an execution time threshold. This requires storing the blacklist in ClusterState for distributed state management, delegating update tasks to the master node, and integrating the checks into search request handling and response processing.

Components and Classes

  1. ClusterState Integration

    • Introduce a new parameter in ClusterState to manage the blacklist across all nodes.
    • Implement methods to update ClusterState with new blacklist entries and synchronize changes across the cluster.
  2. Blacklist Class

    • Define Blacklist class:
      • BlacklistEntry: Represents a blacklisted query: its identifier, hashed query code, execution time, and timestamp.
      • Methods for adding new entries, checking if a query should be blocked, and updating the blacklist state.
  3. BlacklistData Class

    • Implement BlacklistData to manage:
      • Local storage of blacklisted queries on each node.
      • Methods for adding entries to ClusterState, checking query status against the local blacklist, and synchronizing with ClusterState updates.
      • Scheduler to perform decay logic, removing entries that exceed a certain age threshold.
  4. BlacklistUpdateTask

    • Create BlacklistUpdateTask:
      • Triggered when a query exceeds defined thresholds.
      • Responsible for updating ClusterState on the master node to reflect local blacklist changes.
      • Ensure synchronization and consistency across the cluster.
  5. BlacklistUpdateRequest/Response

    • Define communication objects (BlacklistUpdateRequest, BlacklistUpdateResponse) exchanged between a node and the master node when updating the blacklist.
    • Handle acknowledgments and failures for ClusterState updates.
  6. Settings

    • Configure boolean settings to enable/disable blacklist functionality.
    • Define threshold settings for execution time limits before queries are blacklisted.
  7. Exception Handling

    • Implement BlacklistException to throw when a query is blocked due to blacklisting, ensuring proper error handling and response management.
  8. Integration into Elasticsearch Components

    • Search Request Handling:

      • Modify RestSearchAction:
        • Extract query details and identifiers.
        • Validate against BlacklistData to determine if query execution should proceed.
        • Throw BlacklistException if query is blocked.
      • Adjust TransportSearchAction, AbstractSearchAsyncAction, SearchPhaseContext, and SearchResponse to incorporate query identifiers and integrate with BlacklistData for real-time blacklist checks during search execution.
    • ClusterState Update Flow:

      • When a query execution exceeds defined thresholds:
        • Add the query to local BlacklistData.
        • Set a flag (lock) to indicate the need for ClusterState update.
        • Trigger BlacklistUpdateTask to update ClusterState on the master node.
        • Propagate changes across the cluster and ensure consistency in blacklist data.
  9. BigArrayTracker

    • Implement BigArrayTracker to track the memory used by each query during the aggregation phase.
    • Methods to initialize, update, and remove entries based on memory usage.
    • Remove entries with the highest memory usage when a circuit breaker is triggered, and add them to the blacklist.
  10. API Endpoints

    • Cluster Blacklist State API:

      • Define a new API endpoint (_cluster/state/clusterblacklist) to retrieve the current state of the cluster blacklist. This can be used for implementing a visualization tool for blacklisted queries.
    • Unblacklist API:

      • Implement RestUnblacklistIdentifierAction to allow unblacklisting of identifiers via API calls, updating both local and cluster-level blacklist storage.
      • API routes:
        • POST /unblacklist
        • POST /unblacklist/{identifier}
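
The node-local part of the plan above (entries, the block check, decay, and unblacklisting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the class name BlacklistSketch, the method names, the use of String.hashCode() as the query hash, and the constructor parameters are all hypothetical, and BlockedQueryException merely stands in for BlacklistException. ClusterState synchronization is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical node-local blacklist; names and signatures are illustrative only. */
final class BlacklistSketch {

    /** Holds the fields the plan lists for BlacklistEntry. */
    static final class Entry {
        final String identifier;
        final int queryHash;            // assumption: String.hashCode() stands in for the real hash
        final long executionTimeMillis;
        final long timestampMillis;

        Entry(String identifier, int queryHash, long executionTimeMillis, long timestampMillis) {
            this.identifier = identifier;
            this.queryHash = queryHash;
            this.executionTimeMillis = executionTimeMillis;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Stand-in for BlacklistException. */
    static final class BlockedQueryException extends RuntimeException {
        BlockedQueryException(String identifier) {
            super("query [" + identifier + "] is blacklisted");
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long thresholdMillis; // assumed execution time threshold setting
    private final long maxAgeMillis;    // assumed age after which an entry decays

    BlacklistSketch(long thresholdMillis, long maxAgeMillis) {
        this.thresholdMillis = thresholdMillis;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Blacklists the query if it ran longer than the threshold; returns true if it was added. */
    boolean recordExecution(String identifier, String querySource, long tookMillis, long nowMillis) {
        if (tookMillis <= thresholdMillis) {
            return false;
        }
        entries.put(identifier, new Entry(identifier, querySource.hashCode(), tookMillis, nowMillis));
        return true;
    }

    /** Throws if the identifier is currently blacklisted; otherwise does nothing. */
    void ensureAllowed(String identifier) {
        if (entries.containsKey(identifier)) {
            throw new BlockedQueryException(identifier);
        }
    }

    /** Decay step: drops entries older than maxAgeMillis and returns how many were removed. */
    int decay(long nowMillis) {
        int before = entries.size();
        entries.values().removeIf(e -> nowMillis - e.timestampMillis > maxAgeMillis);
        return before - entries.size();
    }

    /** Removes one identifier; returns true if it was present. */
    boolean unblacklist(String identifier) {
        return entries.remove(identifier) != null;
    }
}
```

In this sketch a request handler would call ensureAllowed(identifier) before executing a search and recordExecution(...) once the search finishes, while a scheduler would call decay(now) periodically. Passing the clock value in explicitly keeps the decay logic easy to test.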

ClusterState Update Flow

  • When a query execution exceeds defined thresholds:

    • Add the query to local BlacklistData.
    • Set a flag (lock) to indicate the need for ClusterState update.
    • Trigger BlacklistUpdateTask to update ClusterState on the master node.
    • Propagate changes across the cluster and ensure consistency in blacklist data.
  • When a CircuitBreakingException is triggered:

    • Implement handling in BigArrayTracker to manage memory usage.
    • When memory usage surpasses limits, invoke checkParentLimit() in CircuitBreaker to assess the situation.
    • Remove the entry with the highest memory usage from BigArrayTracker.
    • Blacklist the query associated with the removed entry, adding it to BlacklistData.
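
The circuit-breaker branch of this flow can be sketched as below. It is only an illustration under assumptions: MemoryTrackerSketch, its method names, and the plain Set used in place of BlacklistData are hypothetical, and the real BigArrayTracker would hook into Elasticsearch's circuit breaker rather than being called directly.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-query memory tracker; not the PR's BigArrayTracker. */
final class MemoryTrackerSketch {

    // Bytes currently attributed to each query identifier.
    private final Map<String, Long> bytesByQuery = new ConcurrentHashMap<>();

    // Stand-in for BlacklistData: just the set of blacklisted identifiers.
    private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();

    /** Adds (or, with a negative delta, subtracts) bytes for a query. */
    void add(String identifier, long deltaBytes) {
        bytesByQuery.merge(identifier, deltaBytes, Long::sum);
    }

    /** Stops tracking a query, for example when it completes. */
    void remove(String identifier) {
        bytesByQuery.remove(identifier);
    }

    long trackedBytes(String identifier) {
        return bytesByQuery.getOrDefault(identifier, 0L);
    }

    boolean isBlacklisted(String identifier) {
        return blacklisted.contains(identifier);
    }

    /**
     * Called when the breaker trips: removes the query with the highest
     * tracked memory usage and blacklists it. Returns the evicted identifier,
     * or an empty Optional if nothing is being tracked.
     */
    Optional<String> onBreakerTripped() {
        Optional<String> largest = bytesByQuery.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
        largest.ifPresent(id -> {
            bytesByQuery.remove(id);
            blacklisted.add(id);
        });
        return largest;
    }
}
```

Selecting the maximum with a linear scan keeps the sketch short. If many queries are tracked concurrently, a different structure may be preferable, and the scan here is not atomic with respect to concurrent add calls.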

@Roaster05 Roaster05 closed this Jun 21, 2024
@Roaster05 Roaster05 reopened this Jun 21, 2024
@Roaster05 Roaster05 changed the title Blacklist final Blacklisting Queries on Server Side Jun 24, 2024
@Roaster05 Roaster05 changed the title Blacklisting Queries on Server Side Server-Side Query Blacklisting Jul 1, 2024