Services dashboard

Time settings, including the overall time range and the granularity of aggregations, can be changed using the UTC time button in the top left.

Dashboard tiles:

  • Availability

    • (# Availability test failures) / (# Availability test runs)
    • Availability tests are run every 5 minutes by the health-monitor-timer-func in web-workers. The test verifies that we can:
      1. Ping the health endpoint to check that the service is available
      2. Submit a scan request
      3. Get the status of the submitted scan request
      4. Get the scan report once the scan completes
    • If any of these steps fails (either an HTTP error code or an internal scan failure), an availability failure is sent. Otherwise, once all steps complete, an availability success is sent.
    • If an availability test runs longer than the interval between tests, the next availability test starts running in parallel when the timer trigger fires.
  • Reliability

    • (# Requests failed with 5xx status code) / (# Total requests)
    • This is calculated over all requests, including both user requests and those generated by our availability tests
  • Performance

    • The percentage of scans whose scanExecutionTime (see the notes on the Scan duration graph) did not exceed the target
  • Scan duration (Seconds)

    • scanExecutionTime: The amount of time it took for the scan to run (starting when the batch worker begins the scan)
    • scanWaitTime: The time from when the scan was submitted to when the batch worker started the scan
    • scanTotalTime: The time from when the scan was submitted to when the scan completed and the report became available (scanExecutionTime + scanWaitTime)
  • API Response Time (Seconds)

    • Average response time for each Azure Function (including web-workers) during the time range
  • API Request Count

    • Total number of requests for each Azure Function (including web-workers) during the time range
  • Failed Requests By Function Name

    • Number of failed requests for each Azure Function (including web-workers) during the time range
    • Failure counts are aggregated over the set time granularity
  • API Response Codes Count

    • Count of API response codes by category
  • Scan Requests Accepted vs Rejected

    • The total number of submitted scan requests, as well as how many were accepted or rejected (requests are rejected for invalid URLs)
    • Counts are aggregated over the set time granularity
  • Scan Requests Succeeded vs Completed

    • The total number of scans completed, as well as how many succeeded or failed
    • Counts are aggregated over the set time granularity
    • Note: If there are no failures for the given time range, ScanTaskFailure may not appear in the legend
  • Nodes State

    • State of all nodes in the batch pools, which can be used to check batch pool availability. This is the same as the Node States graph on the batch account overview.
  • Average pool Sampling Interval (Seconds)

    • Sampling interval used for polling batch tasks
  • Batch Running Tasks

    • Number of running pool tasks compared to the maximum number of allowed parallel tasks
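The availability test steps listed under Availability can be sketched as follows. This is a minimal illustration, not the health-monitor-timer-func code: the step callables, the "completed"/"failed" status strings, and the polling limits (including treating a scan that never finishes as a failure) are hypothetical assumptions.

```python
import time
from typing import Callable


def run_availability_test(
    ping_health: Callable[[], None],
    submit_scan: Callable[[], str],
    get_scan_status: Callable[[str], str],
    get_scan_report: Callable[[str], object],
    max_status_polls: int = 10,
    poll_interval_s: float = 0.0,
) -> bool:
    """Return True for an availability success, False for a failure.

    The step callables are hypothetical stand-ins that raise on an HTTP
    error; any exception counts as an availability failure.
    """
    try:
        ping_health()  # 1. health endpoint responds
        scan_id = submit_scan()  # 2. scan request is accepted
        # 3. poll the scan status until it reaches a terminal state
        for _ in range(max_status_polls):
            status = get_scan_status(scan_id)
            if status == "completed":
                break
            if status == "failed":
                return False  # internal scan failure
            time.sleep(poll_interval_s)
        else:
            return False  # assumption: no terminal state within the poll budget
        get_scan_report(scan_id)  # 4. report can be retrieved
        return True
    except Exception:
        return False  # any step error counts as an availability failure
```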
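The Availability and Reliability tiles are both ratios of failures to the total number of test runs or requests. Here is a minimal sketch of those two calculations. It assumes the inputs are already filtered to the selected time range, and returning None for an empty range is also an assumption.

```python
from typing import Iterable, Optional


def failure_ratio(failures: int, total: int) -> Optional[float]:
    """failures / total, or None when there is no data in the time range."""
    return failures / total if total else None


def availability_ratio(test_results: Iterable[bool]) -> Optional[float]:
    """(# availability test failures) / (# availability test runs).

    Each element is True for an availability success, False for a failure.
    """
    results = list(test_results)
    return failure_ratio(sum(1 for ok in results if not ok), len(results))


def reliability_ratio(status_codes: Iterable[int]) -> Optional[float]:
    """(# requests failed with a 5xx status code) / (# total requests).

    status_codes should cover every request, including both user requests
    and those generated by the availability tests.
    """
    codes = list(status_codes)
    return failure_ratio(sum(1 for c in codes if 500 <= c <= 599), len(codes))
```

In this sketch, 4xx responses such as 404 do not count against reliability, which matches the 5xx-only definition in the Reliability tile.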
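The Scan duration metrics and the Performance tile can be derived from three timestamps per scan. This sketch uses hypothetical field names (submitted, started, completed). It takes the performance target as a parameter because the target value is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class ScanTimestamps:
    submitted: datetime  # scan request was submitted
    started: datetime  # batch worker began the scan
    completed: datetime  # scan finished and the report became available

    @property
    def scan_wait_time(self) -> float:
        return (self.started - self.submitted).total_seconds()

    @property
    def scan_execution_time(self) -> float:
        return (self.completed - self.started).total_seconds()

    @property
    def scan_total_time(self) -> float:
        # Equals scan_wait_time + scan_execution_time.
        return (self.completed - self.submitted).total_seconds()


def performance_percentage(
    scans: Iterable[ScanTimestamps], target_seconds: float
) -> Optional[float]:
    """Percentage of scans whose execution time did not exceed the target."""
    times = [s.scan_execution_time for s in scans]
    if not times:
        return None  # assumption: no scans in range means no value
    within_target = sum(1 for t in times if t <= target_seconds)
    return 100.0 * within_target / len(times)
```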
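The API Response Time and API Request Count tiles are per-function aggregates over the time range. Here is a minimal sketch, assuming each request is a hypothetical (function name, response time in seconds) pair.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_function_stats(
    requests: Iterable[Tuple[str, float]],
) -> Dict[str, Tuple[int, float]]:
    """Map function name -> (request count, average response time in seconds)."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for function_name, response_seconds in requests:
        counts[function_name] += 1
        totals[function_name] += response_seconds
    return {name: (counts[name], totals[name] / counts[name]) for name in counts}
```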
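Several tiles aggregate counts over the set time granularity, including Failed Requests By Function Name, Scan Requests Accepted vs Rejected, and Scan Requests Succeeded vs Completed. This is a sketch of that bucketing. It assumes timezone-aware UTC timestamps and uses hypothetical category labels such as "accepted" and "rejected".

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, granularity: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its granularity bucket."""
    return EPOCH + ((ts - EPOCH) // granularity) * granularity


def count_by_bucket(
    events: Iterable[Tuple[datetime, str]], granularity: timedelta
) -> Dict[datetime, Counter]:
    """Count events per time bucket and category (e.g. accepted vs rejected)."""
    buckets: Dict[datetime, Counter] = defaultdict(Counter)
    for ts, category in events:
        buckets[bucket_start(ts, granularity)][category] += 1
    return dict(buckets)
```

In this sketch, a category with no events in a bucket has no entry at all. This is similar to how ScanTaskFailure can be missing from the legend when there are no failures.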