Server-Side Query Blacklisting #6

Roaster05 · 2024-06-21T12:59:15Z

Implementation Plan for Blacklist Functionality in Elasticsearch

Overview

The goal is to add blacklist functionality to Elasticsearch so that queries meeting defined criteria, such as exceeding an execution time threshold, are blocked from executing again. This requires storing the blacklist in ClusterState for distributed state management, delegating updates to the master node, and hooking into search request handling and response processing.

Components and Classes

ClusterState Integration
- Introduce a new parameter in ClusterState to manage the blacklist across all nodes.
- Implement methods to update ClusterState with new blacklist entries and synchronize changes across the cluster.
Blacklist Class
- Define Blacklist class:
  - BlacklistEntry: Represents a blacklisted query, holding its identifier, hashed query code, execution time, and timestamp.
  - Methods for adding new entries, checking if a query should be blocked, and updating the blacklist state.
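
A minimal sketch of how the entry and the lookup could fit together, written as plain Java with no Elasticsearch dependencies. The field names, the `hashQuery` helper, and the choice of SHA-256 are assumptions for illustration; the plan only states that the query is hashed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical record of one blacklisted query; field names are assumptions.
final class BlacklistEntry {
    final String identifier;     // caller-supplied query identifier
    final String queryHash;      // hash of the query source
    final long executionTimeMs;  // execution time that led to blacklisting
    final long timestampMs;      // creation time, later usable for decay

    BlacklistEntry(String identifier, String queryHash, long executionTimeMs, long timestampMs) {
        this.identifier = identifier;
        this.queryHash = queryHash;
        this.executionTimeMs = executionTimeMs;
        this.timestampMs = timestampMs;
    }
}

final class Blacklist {
    private final Map<String, BlacklistEntry> entriesByHash = new ConcurrentHashMap<>();

    // SHA-256 is an assumption; any stable hash of the query source would do.
    static String hashQuery(String querySource) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(querySource.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Adds (or overwrites) the entry for this query and returns it.
    BlacklistEntry add(String identifier, String querySource, long executionTimeMs, long nowMs) {
        BlacklistEntry entry = new BlacklistEntry(identifier, hashQuery(querySource), executionTimeMs, nowMs);
        entriesByHash.put(entry.queryHash, entry);
        return entry;
    }

    // True when a query with the same source has been blacklisted.
    boolean shouldBlock(String querySource) {
        return entriesByHash.containsKey(hashQuery(querySource));
    }

    int size() {
        return entriesByHash.size();
    }
}
```

Keying the map by hash keeps the lookup on the search path to a single hash computation and map probe. Note that hashing the raw source treats two semantically equal queries with different formatting as distinct.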
BlacklistData Class
- Implement BlacklistData to manage:
  - Local storage of blacklisted queries on each node.
  - Methods for adding entries to ClusterState, checking query status against the local blacklist, and synchronizing with ClusterState updates.
  - Scheduler to perform decay logic, removing entries that exceed a certain age threshold.
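
The decay logic could look roughly like the sketch below. It is not taken from the PR: `DecayingBlacklistStore`, `purgeExpired`, and `startDecay` are hypothetical names, and a plain `ScheduledExecutorService` stands in for whatever scheduling facility the implementation actually uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical local store that forgets entries older than maxAgeMs.
final class DecayingBlacklistStore {
    // query hash -> creation time in epoch milliseconds
    private final Map<String, Long> createdAtByHash = new ConcurrentHashMap<>();
    private final long maxAgeMs;

    DecayingBlacklistStore(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    void add(String queryHash, long nowMs) {
        createdAtByHash.put(queryHash, nowMs);
    }

    boolean contains(String queryHash) {
        return createdAtByHash.containsKey(queryHash);
    }

    // Removes entries strictly older than maxAgeMs and returns how many were
    // removed. The count is only exact when no other thread writes concurrently.
    int purgeExpired(long nowMs) {
        int before = createdAtByHash.size();
        createdAtByHash.values().removeIf(createdAt -> nowMs - createdAt > maxAgeMs);
        return before - createdAtByHash.size();
    }

    // Runs purgeExpired periodically; the caller must shut the scheduler down.
    ScheduledExecutorService startDecay(long periodMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> purgeExpired(System.currentTimeMillis()), periodMs, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Passing the current time into `purgeExpired` rather than reading the clock inside it keeps the decay rule testable without waiting for real time to pass.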
BlacklistUpdateTask
- Create BlacklistUpdateTask:
  - Triggered when a query exceeds defined thresholds.
  - Responsible for updating ClusterState on the master node to reflect local blacklist changes.
  - Ensure synchronization and consistency across the cluster.
BlacklistUpdateRequest/Response
- Define communication objects (BlacklistUpdateRequest, BlacklistUpdateResponse) exchanged between a node and the master node when updating the blacklist.
- Handle acknowledgments and failures for ClusterState updates.
Settings
- Configure boolean settings to enable/disable blacklist functionality.
- Define threshold settings for execution time limits before queries are blacklisted.
Exception Handling
- Implement BlacklistException to throw when a query is blocked due to blacklisting, ensuring proper error handling and response management.
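
As an illustration of how the enable/disable setting and the exception might interact, here is a standalone sketch. `BlacklistGuard` and `ensureAllowed` are hypothetical names, and the exception extends `RuntimeException` only to keep the example free of Elasticsearch dependencies; the real class may well have a different base type.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Stand-in for the PR's BlacklistException (the base class is an assumption).
final class BlacklistException extends RuntimeException {
    private final String identifier;

    BlacklistException(String identifier) {
        super("query [" + identifier + "] is blacklisted and was not executed");
        this.identifier = identifier;
    }

    String getIdentifier() {
        return identifier;
    }
}

// Hypothetical guard that a request handler could call before executing a search.
final class BlacklistGuard {
    private final Set<String> blockedIdentifiers;
    private final boolean enabled; // mirrors the boolean enable/disable setting

    BlacklistGuard(Set<String> blockedIdentifiers, boolean enabled) {
        this.blockedIdentifiers = Collections.unmodifiableSet(new HashSet<>(blockedIdentifiers));
        this.enabled = enabled;
    }

    // Throws BlacklistException when the feature is on and the identifier is blocked.
    void ensureAllowed(String identifier) {
        if (enabled && blockedIdentifiers.contains(identifier)) {
            throw new BlacklistException(identifier);
        }
    }
}
```

Carrying the identifier on the exception lets the response layer report which query was rejected without parsing the message.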
Integration into Elasticsearch Components
- Search Request Handling:
  - Modify RestSearchAction:
    - Extract query details and identifiers.
    - Validate against BlacklistData to determine if query execution should proceed.
    - Throw BlacklistException if query is blocked.
  - Adjust TransportSearchAction, AbstractSearchAsyncAction, SearchPhaseContext, and SearchResponse to incorporate query identifiers and integrate with BlacklistData for real-time blacklist checks during search execution.
BigArrayTracker
- Implement BigArrayTracker to maintain the memory used by queries during the aggregation phase.
- Methods to initialize, update, and remove entries based on memory usage.
- Remove entries with the highest memory usage when a circuit breaker is triggered, and add them to the blacklist.
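
The tracking and eviction behaviour described above might be sketched as follows. This is an assumption-laden illustration: `BigArrayTrackerSketch`, `adjust`, and `evictLargest` are hypothetical names, and the real tracker presumably hooks into Elasticsearch's big-array allocation rather than being called by hand.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-query memory accounting for the aggregation phase.
final class BigArrayTrackerSketch {
    private final Map<String, Long> bytesByQuery = new ConcurrentHashMap<>();

    // Adds deltaBytes (which may be negative) to the usage recorded for a query.
    void adjust(String queryId, long deltaBytes) {
        bytesByQuery.merge(queryId, deltaBytes, Long::sum);
    }

    // Drops the record for a query, for example once it completes.
    void remove(String queryId) {
        bytesByQuery.remove(queryId);
    }

    long usage(String queryId) {
        return bytesByQuery.getOrDefault(queryId, 0L);
    }

    // Removes and returns the query with the highest tracked usage, if any.
    // The caller would then add that query to the blacklist.
    Optional<String> evictLargest() {
        String largest = null;
        long max = Long.MIN_VALUE;
        for (Map.Entry<String, Long> entry : bytesByQuery.entrySet()) {
            if (entry.getValue() > max) {
                max = entry.getValue();
                largest = entry.getKey();
            }
        }
        if (largest != null) {
            bytesByQuery.remove(largest);
        }
        return Optional.ofNullable(largest);
    }
}
```

A linear scan is used because eviction only happens when a circuit breaker trips, which should be rare compared with the `adjust` calls on the hot path.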
API Endpoints
- Cluster Blacklist State API:
  - Define a new API endpoint (_cluster/state/clusterblacklist) to retrieve the current state of the cluster blacklist. This can be used for implementing a visualization tool for blacklisted queries.
- Unblacklist API:
  - Implement RestUnblacklistIdentifierAction to allow unblacklisting of identifiers via API calls, updating both local and cluster-level blacklist storage.
  - API routes:
    - POST /unblacklist
    - POST /unblacklist/{identifier}
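
For reference, a client could build requests against these routes roughly as shown below, using the JDK's `java.net.http` API (Java 11 or later). The base URL is an assumption, `BlacklistApiRequests` is a hypothetical helper, and the sketch does not assume anything about request bodies or response formats, which the plan does not specify.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that only builds requests; sending them is left to the caller.
final class BlacklistApiRequests {
    private final String baseUrl; // e.g. "http://localhost:9200" (assumed)

    BlacklistApiRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // GET _cluster/state/clusterblacklist
    HttpRequest clusterBlacklistState() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/state/clusterblacklist"))
            .GET()
            .build();
    }

    // POST /unblacklist/{identifier}
    HttpRequest unblacklist(String identifier) {
        // URLEncoder targets form encoding, so turn its "+" (space) into "%20" for a path segment.
        String encoded = URLEncoder.encode(identifier, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(baseUrl + "/unblacklist/" + encoded))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }
}
```

The built requests can be passed to `HttpClient.send` together with a body handler of the caller's choice.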

ClusterState Update Flow

When a query execution exceeds defined thresholds:
- Add the query to local BlacklistData.
- Set a flag (lock) to indicate the need for ClusterState update.
- Trigger BlacklistUpdateTask to update ClusterState on the master node.
- Propagate changes across the cluster and ensure consistency in blacklist data.
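
The threshold path above can be sketched in plain Java as follows. The names are hypothetical: `BlacklistUpdateCoordinator` is not a class from the PR, and the `Consumer` passed to the constructor merely stands in for submitting a BlacklistUpdateTask to the master node. The flag is modelled as an `AtomicBoolean`, which is one possible reading of "flag (lock)".

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical coordinator for the local-add, flag, and master-update steps.
final class BlacklistUpdateCoordinator {
    private final Set<String> localBlacklist = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean updatePending = new AtomicBoolean(false);
    private final long thresholdMs;
    private final Consumer<Set<String>> masterUpdate; // stand-in for BlacklistUpdateTask submission

    BlacklistUpdateCoordinator(long thresholdMs, Consumer<Set<String>> masterUpdate) {
        this.thresholdMs = thresholdMs;
        this.masterUpdate = masterUpdate;
    }

    // Called when a query finishes; blacklists it if it ran longer than the threshold.
    void onQueryFinished(String queryHash, long tookMs) {
        if (tookMs <= thresholdMs) {
            return;
        }
        localBlacklist.add(queryHash);
        // Only one update may be in flight; later additions ride along with the next one.
        if (updatePending.compareAndSet(false, true)) {
            masterUpdate.accept(new HashSet<>(localBlacklist));
        }
    }

    // Called when the master acknowledges the update, allowing the next one.
    void onUpdateAcknowledged() {
        updatePending.set(false);
    }
}
```

In this simplified form an entry added while an update is in flight is only sent with the next triggered update; a fuller implementation would probably re-check for unsent entries on acknowledgment.
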
When a CircuitBreakingException is triggered:
- Implement handling in BigArrayTracker to manage memory usage.
- When memory usage surpasses limits, invoke checkParentLimit() in CircuitBreaker to assess the situation.
- Remove the entry with the highest memory usage from BigArrayTracker.
- Blacklist the query associated with the removed entry, adding it to BlacklistData.

Roaster05 added 11 commits June 3, 2024 23:25

task2-initialised

cdbbcfa

added enable/disable setting

0c2c342

partial-cluster-based-cache

0c08f0b

Cluster based cache implemented

50677ea

Refactored

6e9406d

optimization on transport request

d29309c

fixed

7879f7b

added blacklist to persisted cluster state and fixed reset option

d939457

optimized inflight cluster state response

f6900c1

changed to hashed query and configured cluster state response

df5c93b

added scheduler for decaying entries based on some age factor

910e2d0

Roaster05 closed this Jun 21, 2024

Roaster05 reopened this Jun 21, 2024

Roaster05 added 3 commits June 22, 2024 15:00

refractored

69a502a

fixed extra changes

3a59499

fix on hashing

28c57f0

Roaster05 changed the title ~~Blacklist final~~ Blacklisting Queries on Server Side Jun 24, 2024

Roaster05 added 6 commits June 27, 2024 01:01

[DRAFT] BigArray Usecase and unblacklist API

cb9c775

refractored

124a792

minor fixes

38fedf3

minor fixes

cf7b578

added Javadoc's for prominent methods and classes

3404b97

minor fix

f01082c

Roaster05 changed the title ~~Blacklisting Queries on Server Side~~ Server-Side Query Blacklisting Jul 1, 2024

Roaster05 added 3 commits July 1, 2024 17:53

added testing scripts

5300436

minor fix

a2e5ba7

minor fix

e50d9fb
