test: collect detailed logs for tests in datadog[infeng-752] #9637

Merged
merged 84 commits into from
Jul 17, 2024
84 commits
a85d3b7
wip: agent testing
djanicekpach Jun 25, 2024
47b6084
syntax
djanicekpach Jun 25, 2024
0d2e66d
debug
djanicekpach Jun 25, 2024
eda783a
try config
djanicekpach Jun 25, 2024
c32321d
config permissions
djanicekpach Jun 25, 2024
bd71691
config permissions
djanicekpach Jun 25, 2024
febb27e
config update
djanicekpach Jun 25, 2024
575fae6
small syntax typo
djanicekpach Jun 25, 2024
cb29e93
make dir
djanicekpach Jun 25, 2024
7b11714
datadog config permission
djanicekpach Jun 25, 2024
d123d3c
syntax
djanicekpach Jun 25, 2024
906b07d
more permissions stuff
djanicekpach Jun 25, 2024
ebcf4bb
config test
djanicekpach Jun 25, 2024
c0fa5a9
reorder config
djanicekpach Jun 25, 2024
4ac57e9
reorder
djanicekpach Jun 25, 2024
eed3d5c
try quoting tags
djanicekpach Jun 25, 2024
af2555a
tag formatting
djanicekpach Jun 25, 2024
cb977ed
syntax
djanicekpach Jun 25, 2024
e1636c0
remove quotes
djanicekpach Jun 25, 2024
49965f0
remove url
djanicekpach Jun 25, 2024
20176fd
remove branch
djanicekpach Jun 25, 2024
ab6b38a
permissions
djanicekpach Jun 26, 2024
01e45f6
enable log collection
djanicekpach Jun 26, 2024
4551a26
add better tags
djanicekpach Jun 26, 2024
ee87b96
debug tracing
djanicekpach Jun 27, 2024
bcdd90f
try collecting logs form file
djanicekpach Jun 27, 2024
0254905
folder structure
djanicekpach Jun 27, 2024
7b45e60
folder permissions
djanicekpach Jun 28, 2024
0d01529
fixing quotes
djanicekpach Jun 28, 2024
217ab4c
syntax
djanicekpach Jun 28, 2024
f171c02
datadog debug
djanicekpach Jul 3, 2024
6b577d6
debug
djanicekpach Jul 3, 2024
7f75b30
move config folder
djanicekpach Jul 3, 2024
74579d3
fix mkdir location too
djanicekpach Jul 3, 2024
b37b248
filepath fixes
djanicekpach Jul 3, 2024
8be959d
debug
djanicekpach Jul 3, 2024
b482e5f
debug status
djanicekpach Jul 3, 2024
76a4334
ddtrace settings tweaks
djanicekpach Jul 3, 2024
9aca2a4
format
djanicekpach Jul 3, 2024
150654f
rename tags
djanicekpach Jul 3, 2024
e076604
format
djanicekpach Jul 3, 2024
b36d4c5
redundant variable
djanicekpach Jul 3, 2024
f35d536
debug logging
djanicekpach Jul 5, 2024
c242cae
config debug
djanicekpach Jul 5, 2024
85254ad
logging testing
djanicekpach Jul 5, 2024
d9678d9
debug
djanicekpach Jul 5, 2024
b5d559b
formatting
djanicekpach Jul 5, 2024
7cafb06
get task logs on failure too
djanicekpach Jul 5, 2024
3d0cccd
debugging
djanicekpach Jul 5, 2024
d6e5562
change log service
djanicekpach Jul 5, 2024
a30d5e6
set service field by test suite
djanicekpach Jul 5, 2024
b6fafd0
debug
djanicekpach Jul 5, 2024
0a410d1
test no apm
djanicekpach Jul 5, 2024
a9ebe19
debug config
djanicekpach Jul 5, 2024
67532cf
wrong log folder
djanicekpach Jul 5, 2024
1bb0edd
try no moitoring
djanicekpach Jul 8, 2024
e7cdd70
remove apm
djanicekpach Jul 8, 2024
2bcaa06
remove apm
djanicekpach Jul 8, 2024
d2bbb73
add additional logs
djanicekpach Jul 8, 2024
6f45e62
log permissisons
djanicekpach Jul 8, 2024
b498bd0
typo
djanicekpach Jul 8, 2024
8a56715
turn off container and process monitor
djanicekpach Jul 8, 2024
83b5df1
manually stop agent
djanicekpach Jul 8, 2024
28a8374
turn off container logs
djanicekpach Jul 8, 2024
5e16819
parameterize DD
djanicekpach Jul 8, 2024
b6836d8
stop needs parameter too
djanicekpach Jul 8, 2024
f15fd98
syntax
djanicekpach Jul 8, 2024
5c4a790
try env var
djanicekpach Jul 8, 2024
3bbc7f5
test env var
djanicekpach Jul 8, 2024
c6f6fa8
env var debug
djanicekpach Jul 8, 2024
e48587d
test flag
djanicekpach Jul 8, 2024
034bd0a
debug
djanicekpach Jul 8, 2024
177afae
extra parameter
djanicekpach Jul 8, 2024
fb8c725
debug pipeline
djanicekpach Jul 9, 2024
371c824
remove unused system-probe
djanicekpach Jul 10, 2024
fb69757
always collect logs
djanicekpach Jul 10, 2024
768612c
unbreak debug tes
djanicekpach Jul 11, 2024
adea345
eof newlines
djanicekpach Jul 11, 2024
e5045f5
cleanup config for ease of reading
djanicekpach Jul 16, 2024
afe833d
no agent on docker
djanicekpach Jul 17, 2024
9ab8c76
datadog only localhost
djanicekpach Jul 17, 2024
5a4bd14
fix condition
djanicekpach Jul 17, 2024
85c45d9
change config ordering
djanicekpach Jul 17, 2024
46855eb
Revert "change config ordering"
djanicekpach Jul 17, 2024
119 changes: 119 additions & 0 deletions .circleci/datadog/ci-local-config.yaml
@@ -0,0 +1,119 @@
# Config file taken from https://github.com/DataDog/datadog-agent/blob/main/pkg/config/config_template.yaml
# These values are overridden by environment variables:
# api_key
# dd_site
# dd_url
# DD_TAGS
# DD_EXTRA_TAGS
# DD_ENV
# apm is explicitly disabled here for cost reasons


##################################
## Log collection Configuration ##
##################################

## @param logs_enabled - boolean - optional - default: false
## @env DD_LOGS_ENABLED - boolean - optional - default: false
## Enable Datadog Agent log collection by setting logs_enabled to true.
#
logs_enabled: true

## @param logs_config - custom object - optional
## Enter specific configurations for your Log collection.
## Uncomment this parameter and the one below to enable them.
## See https://docs.*************/agent/logs/
#
logs_config:

  ## @param container_collect_all - boolean - optional - default: false
  ## @env DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL - boolean - optional - default: false
  ## Enable container log collection for all the containers (see ac_exclude to filter out containers)
  #
  container_collect_all: true


####################################
## Trace Collection Configuration ##
####################################

## @param apm_config - custom object - optional
## Enter specific configurations for your trace collection.
## Uncomment this parameter and the one below to enable them.
## See https://docs.*************/agent/apm/
#
apm_config:

  ## @param enabled - boolean - optional - default: true
  ## @env DD_APM_ENABLED - boolean - optional - default: true
  ## Set to true to enable the APM Agent.
  #
  enabled: false

######################################
## Process Collection Configuration ##
######################################

# @param process_config - custom object - optional
# Enter specific configurations for your Process data collection.
# Uncomment this parameter and the one below to enable them.
# See https://docs.*************/graphing/infrastructure/process/

process_config:

  # @param process_collection - custom object - optional
  # Specifies settings for collecting processes.
  process_collection:
    # @param enabled - boolean - optional - default: false
    # Enables collection of information about running processes.
    enabled: false

  # @param container_collection - custom object - optional
  # Specifies settings for collecting containers.
  container_collection:
    # @param enabled - boolean - optional - default: true
    # Enables collection of information about running containers.
    enabled: false

  # Deprecated - use `process_collection.enabled` and `container_collection.enabled` instead
  # @param enabled - string - optional - default: "false"
  # @env DD_PROCESS_CONFIG_ENABLED - string - optional - default: "false"
  # A string indicating the enabled state of the Process Agent:
  # * "false" : The Agent collects only containers information.
  # * "true" : The Agent collects containers and processes information.
  # * "disabled" : The Agent process collection is disabled.

  enabled: "false"

  # @param process_discovery - custom object - optional
  # Specifies custom settings for the `process_discovery` object.
  process_discovery:
    # @param enabled - boolean - optional - default: true
    # Toggles the `process_discovery` check. If enabled, this check gathers information about running integrations.
    enabled: false

    # @param interval - duration - optional - default: 4h - minimum: 10m
    # An interval in hours that specifies how often the process discovery check should run.
    interval: 10m


###########################
## Logging Configuration ##
###########################

## @param log_level - string - optional - default: info
## @env DD_LOG_LEVEL - string - optional - default: info
## Minimum log level of the Datadog Agent.
## Valid log levels are: trace, debug, info, warn, error, critical, and off.
## Note: When using the 'off' log level, quotes are mandatory.
#
log_level: 'debug'

## @param log_file - string - optional
## @env DD_LOG_FILE - string - optional
## Path of the log file for the Datadog Agent.
## See https://docs.*************/agent/guide/agent-log-files/
#
log_file: /tmp/artifacts/logs/dd-agent-log.txt
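
Most of this file is template commentary carried over from the upstream config. A minimal sketch for listing only the settings that are actually set, assuming a POSIX `grep`; the helper name `active_settings` and the small stand-in file are hypothetical and not part of the PR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the non-comment, non-blank lines of a YAML file.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# A small stand-in for .circleci/datadog/ci-local-config.yaml.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
## @param logs_enabled - boolean - optional - default: false
#
logs_enabled: true

apm_config:
  ## Set to true to enable the APM Agent.
  enabled: false
EOF

active_settings "$cfg"
```

Run against the real file, this would reduce the 119 added lines to the handful of keys the agent is actually given.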


13 changes: 13 additions & 0 deletions .circleci/datadog/e2e-log-settings.yaml
@@ -0,0 +1,13 @@
logs:
  - type: file
    path: "/tmp/artifacts/logs/*.log"
    service: "<SERVICE_NAME>"
    source: "determined-task-logs"
  - type: file
    path: "/tmp/devcluster/*.log"
    service: "<SERVICE_NAME>"
    source: "devcluster-logs"
  - type: file
    path: "/tmp/priority_scheduler/*.log"
    service: "<SERVICE_NAME>"
    source: "devcluster-priority-scheduler-logs"
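
The `<SERVICE_NAME>` placeholder is filled in at job time by the `sed` call in the "Install DataDog agent" step of `.circleci/real_config.yml`. A minimal sketch of that substitution in isolation; the helper name `render_log_settings`, the temporary paths, and the mark `e2e_cpu` are assumptions for illustration only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the <SERVICE_NAME> placeholder in a log-settings template.
# $1 = template path, $2 = service name, $3 = output path
render_log_settings() {
  sed -e "s/<SERVICE_NAME>/$2/g" "$1" > "$3"
}

workdir="$(mktemp -d)"
cat > "$workdir/template.yaml" <<'EOF'
logs:
  - type: file
    path: "/tmp/devcluster/*.log"
    service: "<SERVICE_NAME>"
    source: "devcluster-logs"
EOF

# "e2e_cpu" stands in for a pytest mark; it is an assumed value.
render_log_settings "$workdir/template.yaml" "determined-pytest-e2e_cpu" "$workdir/conf.yaml"
cat "$workdir/conf.yaml"
```

Because `/` is the `sed` delimiter here, a service name containing `/` would break the expression; the marks used in this workflow would need to avoid that character or the delimiter would need to change.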
95 changes: 84 additions & 11 deletions .circleci/real_config.yml
@@ -151,8 +151,8 @@ commands:
 description: Collect logs from the cluster tasks.
 steps:
 - run:
-when: on_fail
 name: "Ensure necessary Python packages are available."
+when: always
 command: |
 pkg_names="fire determined"
 for pkg_name in $pkg_names; do
@@ -161,8 +161,8 @@ commands:
 fi
 done
 - run:
-when: on_fail
 name: "Collect logs and calculate statistics"
+when: always
 command: |
 target_dir="<< parameters.store_path >>"
 mkdir -p $target_dir
@@ -626,6 +626,50 @@ commands:
 type: boolean
 default: true
 steps:
+- when:
+condition:
+and:
+- equal: [<<parameters.master-host>>,'localhost']
+- <<parameters.wait-for-master>>
+- not: <<parameters.managed-devcluster>>
+steps:
+- run:
+name: Install DataDog agent
+command: |
+if [ "$AIS_DD_ENABLE_MONITORING" == "true" ]; then
+host_tags="test.mark:<<parameters.mark>>,\
+ci.pipeline_id:${CIRCLE_PIPELINE_ID},\
+ci.workflow_id:${CIRCLE_WORKFLOW_ID},\
+ci.job_num:${CIRCLE_BUILD_NUM},\
+ci.username:${CIRCLE_USERNAME},\
+git.tag:${CIRCLE_TAG},\
+git.commit:${CIRCLE_SHA1},\
+git.repo:${CIRCLE_PROJECT_REPONAME},\
+ci.totalNodes:${CIRCLE_NODE_TOTAL},\
+ci.nodeIdx:${CIRCLE_NODE_INDEX},\
+git.pr_num:${CIRCLE_PR_NUMBER}"

+sudo mkdir -p /tmp/artifacts/logs
+sudo chmod -R a+rw /tmp/artifacts/logs

+DD_ENV="ci-${CIRCLE_JOB}" \
+DD_HOST_TAGS="$host_tags" \
+DD_SERVICE="determined-pytest-<<parameters.mark>>" \
+bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"

+# config files for the agent have an expected file structure
+sudo mkdir -p /etc/datadog-agent/conf.d/determined-master.d/
+sudo chmod a+rw /etc/datadog-agent/datadog.yaml
+sudo chmod -R a+rw /etc/datadog-agent/conf.d/determined-master.d/
+sudo cat .circleci/datadog/ci-local-config.yaml >> /etc/datadog-agent/datadog.yaml
+sudo sed -e "s/<SERVICE_NAME>/determined-pytest-<<parameters.mark>>/g" .circleci/datadog/e2e-log-settings.yaml > /etc/datadog-agent/conf.d/determined-master.d/conf.yaml
+# restart agent with config
+sudo usermod -a -G docker dd-agent
+sudo systemctl stop datadog-agent
+sudo systemctl start datadog-agent
+sleep 5
+sudo datadog-agent status
+fi
 # Wait for master before splitting tests, since so many splits depend on
 # asking master for its configuration in order to apply skipifs.
 - when:
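
The install step builds `host_tags` as one double-quoted string split across lines with backslash continuations. In bash, a backslash-newline inside double quotes is removed, but any leading whitespace on the following line stays in the value, so indentation can leak into the tag list. A minimal sketch of an alternative that joins an array with commas; the helper name `join_tags` and the placeholder defaults are assumptions, not what the PR does:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all arguments with commas, with no embedded whitespace.
join_tags() {
  local IFS=','
  printf '%s\n' "$*"
}

# CIRCLE_* variables are set by CircleCI; the defaults are placeholders for local runs.
tags=(
  "test.mark:e2e_cpu"
  "ci.job_num:${CIRCLE_BUILD_NUM:-0}"
  "git.commit:${CIRCLE_SHA1:-unknown}"
)
host_tags="$(join_tags "${tags[@]}")"
echo "$host_tags"
```

The array form keeps one tag per line for readability while leaving the indentation out of the resulting string.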
Expand Down Expand Up @@ -662,24 +706,38 @@ commands:
echo "No Determined master listening on '<<parameters.master-scheme>>://<<parameters.master-host>>:<<parameters.master-port>>'"
fi

cat /tmp/all-relevant-files | circleci tests run --command="DD_CIVISIBILITY_AGENTLESS_ENABLED=true \
DD_SITE='datadoghq.com' \
DD_ENV='ci-<<parameters.mark>>' DD_SERVICE='determined-pytest-<<parameters.mark>>' \
tags="test.mark:<<parameters.mark>>,\
ci.pipeline_id:${CIRCLE_PIPELINE_ID},\
ci.workflow_id:${CIRCLE_WORKFLOW_ID},\
ci.job_num:${CIRCLE_BUILD_NUM},\
ci.username:${CIRCLE_USERNAME},\
git.tag:${CIRCLE_TAG},\
git.commit:${CIRCLE_SHA1},\
ci.totalNodes:${CIRCLE_NODE_TOTAL},\
ci.nodeIdx:${CIRCLE_NODE_INDEX},\
git.pr_num:${CIRCLE_PR_NUMBER}"

CMD="DD_CIVISIBILITY_AGENTLESS_ENABLED=true \
DD_TAGS='${tags}' \
DD_ENV='ci-<<parameters.mark>>' \
DD_SERVICE='determined-pytest-<<parameters.mark>>' \
DET_MASTER_CERT_FILE=<<parameters.master-cert>> \
DET_MASTER_CERT_NAME=<<parameters.master-cert-name>> \
IS_CIRCLECI_JOB=1 XDG_CONFIG_HOME=/tmp \
xargs pytest --capture=tee-sys -vv \
-m '<<parameters.mark>>' \
--durations=0 \
--ddtrace \
--master-scheme="<<parameters.master-scheme>>" \
--master-host="<<parameters.master-host>>" \
--master-port="<<parameters.master-port>>" \
--master-scheme='<<parameters.master-scheme>>' \
--master-host='<<parameters.master-host>>' \
--master-port='<<parameters.master-port>>' \
-o junit_family=xunit1 \
--junit-xml="<<parameters.junit-path>>" \
<<parameters.extra-pytest-flags>>" \
--verbose --split-by=timings
--junit-xml='<<parameters.junit-path>>' \
<<parameters.extra-pytest-flags>>"

echo "$CMD"
cat /tmp/all-relevant-files | circleci tests run --command="$CMD" \
--verbose --split-by=timings
pytest_status=$?
echo Pytest exited with $pytest_status
exit $pytest_status
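
This hunk moves the pytest invocation into a `CMD` variable and switches the inner quotes on `--master-scheme`, `--master-host`, `--master-port`, and `--junit-xml` from double to single. Inside a double-quoted assignment an unescaped `"` would end the string early, while single quotes pass through and are interpreted by whichever shell finally runs the command. A minimal sketch of that behavior, with `printf` standing in for `pytest` and with illustrative values for the parameters:

```shell
#!/usr/bin/env bash
# Illustrative values; in the workflow these come from CircleCI parameters.
mark="e2e_cpu"
master_host="localhost"

# Outer double quotes let ${...} expand now; the inner single quotes survive
# into the string and are interpreted by the shell that eventually runs it.
CMD="printf '%s\n' -m '${mark}' --master-host='${master_host}'"

# Echoing the command first mirrors the `echo "$CMD"` added in the diff.
echo "$CMD"
bash -c "$CMD"
```

Printing the assembled command before running it makes the job log show exactly what the test runner received, which is useful when a tag or flag is quoted incorrectly.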
@@ -694,6 +752,21 @@ commands:
 master_address: "<<parameters.master-scheme>>://<<parameters.master-host>>:<<parameters.master-port>>"
 - store_artifacts:
 path: /tmp/artifacts/logs
+- when:
+condition:
+and:
+- equal: [<<parameters.master-host>>,'localhost']
+- <<parameters.wait-for-master>>
+- not: <<parameters.managed-devcluster>>
+steps:
+- run: # We don't know how long Circle leaves these machines running in the background. Take down the agent for safety.
+name: Stop DataDog agent
+when: always
+command: |
+if [ "$AIS_DD_ENABLE_MONITORING" == "true" ]; then
+sudo systemctl stop datadog-agent || true
+fi


 run-det-deploy-tests:
 parameters:
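
The "Stop DataDog agent" step combines two guards: it only acts when `AIS_DD_ENABLE_MONITORING` is `true`, and `|| true` keeps a failed stop from failing the job. A minimal sketch of that pattern as a reusable function; the name `stop_if_enabled` is hypothetical, and `false` stands in for a failing `systemctl stop datadog-agent`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a cleanup command only when monitoring is enabled,
# and never let its failure change the caller's exit status.
stop_if_enabled() {
  if [ "${AIS_DD_ENABLE_MONITORING:-}" = "true" ]; then
    "$@" || true
  fi
  return 0
}

# `false` always fails, standing in for a stop command that errors out.
AIS_DD_ENABLE_MONITORING=true stop_if_enabled false
echo "exit status: $?"
```

With the flag unset or set to anything other than `true`, the wrapped command is never invoked, which matches how the install step is skipped under the same condition.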