Dev to prod merge release ciprod01312022 #708

Merged 195 commits on Feb 1, 2022

Commits
eb2c5f3
separate build yamls for ci_prod branch (#415)
ganga1980 Aug 5, 2020
df29e35
re-enable adx path (#420)
vishiy Aug 6, 2020
bcc8506
Gangams/release changes (#419)
ganga1980 Aug 6, 2020
39534d6
fix for zero filled metrics (#423)
rashmichandrashekar Aug 6, 2020
5e0b429
consolidate windows agent image docker files (#422)
ganga1980 Aug 7, 2020
c5c28f0
Gangams/cluster creation scripts (#414)
ganga1980 Aug 13, 2020
d7a3750
fix: Pin to a particular version of ltsc2019 by SHA (#427)
bragi92 Aug 14, 2020
5e8de91
enable collecting npm metrics (optionally) (#425)
vishiy Aug 14, 2020
17e7ff8
Saaror patch 3 (#426)
saaror Aug 17, 2020
6c7c675
Gangams/add containerd support to windows agent (#428)
ganga1980 Aug 18, 2020
bac8a32
Gangams/arc k8s metrics (#413)
ganga1980 Aug 20, 2020
ab03640
fix: Reverting back to ltsc2019 tag (#429)
bragi92 Aug 21, 2020
af0f981
more kubelet metrics (#430)
vishiy Aug 27, 2020
7fc4d4c
fix nom issue when config is empty (#432)
vishiy Sep 1, 2020
281a77c
support multiple docker paths when docker root is updated thru knode …
vishiy Sep 1, 2020
d8d7f9f
Gangams/doc and other related updates (#434)
ganga1980 Sep 11, 2020
2d56087
add missing serviceprincipal in ps scripts (#435)
ganga1980 Sep 14, 2020
a28aaf0
fix telemetry bug (#436)
vishiy Sep 15, 2020
0062b32
Gangams/readmeupdates non aks 09162020 (#437)
ganga1980 Sep 16, 2020
1a7ef1c
Gangams/fix weird conflicts (#439)
ganga1980 Sep 16, 2020
bf75bf0
fix quote issue for the region (#441)
ganga1980 Sep 21, 2020
6287724
fix cpucapacity/limit bug (#442)
vishiy Sep 21, 2020
bd30a47
grwehner/pv-usage-metrics (#431)
gracewehner Sep 23, 2020
7304a6b
add new custom metric regions (#444)
vishiy Sep 23, 2020
2d8c03f
add 'Terminating' state (#443)
vishiy Sep 23, 2020
da06d76
Gangams/sept agent release tasks (#445)
ganga1980 Sep 25, 2020
5453054
grwehner/pv-collect-volume-name (#448)
gracewehner Sep 28, 2020
fe9f14d
Changes for september agent release (#449)
rashmichandrashekar Sep 30, 2020
f1657c6
Gangams/arc k8s related scripts, charts and doc updates (#450)
ganga1980 Oct 1, 2020
e6dad83
Install CA certs from wireserver (#451)
rashmichandrashekar Oct 1, 2020
23397ed
grwehner/pv-volume-name-in-mdm (#452)
gracewehner Oct 1, 2020
7562a96
Release changes for 10052020 release (#453)
vishiy Oct 5, 2020
4b47f44
Update onboarding_instructions.md (#456)
saaror Oct 12, 2020
3f86b23
chart update for sept2020 release (#457)
ganga1980 Oct 19, 2020
6203c3a
add missing version update in the script (#458)
ganga1980 Oct 19, 2020
5b15469
November release fixes - activate one agent, adx schema v2, win perf …
vishiy Oct 27, 2020
157ba20
remove hiphen for params in chart (#462)
vishiy Oct 28, 2020
7c448bc
Changes for cutting a new build for ciprod10272020 release (#460)
vishiy Oct 28, 2020
62b27d7
using latest stable version of msys2 (#465)
ganga1980 Oct 29, 2020
909cc16
fixing the windows-perf-dups (#466)
rashmichandrashekar Oct 29, 2020
d481c06
chart updates related to new microsoft/charts repo (#467)
ganga1980 Nov 6, 2020
aff1e13
Changes for creating 11092020 release (#468)
vishiy Nov 9, 2020
ca18850
MDM exception aggregation (#470)
rashmichandrashekar Nov 10, 2020
18c27dd
grwehner/mdm custom metric regions (#471)
gracewehner Nov 23, 2020
a5c12e9
updaitng rs limit to 1gb (#474)
rashmichandrashekar Dec 4, 2020
7453fd4
grwehner/pv inventory (#455)
gracewehner Dec 10, 2020
24b709f
Gangams/fix for build release pipeline issue (#476)
ganga1980 Dec 15, 2020
9061201
add pv fluentd plugin config to helm rs config (#477)
gracewehner Dec 15, 2020
064bc06
Gangams/fix rs ooming (#473)
ganga1980 Dec 16, 2020
9cb058c
Gangams/enable arc onboarding to ff (#478)
ganga1980 Dec 18, 2020
ef9d726
Convert PV type dictionary to json for telemetry so it shows up in lo…
gracewehner Jan 4, 2021
97bdb94
fix 2 windows tasks - 1) Dont log to termination log 2) enable ADX ro…
vishiy Jan 6, 2021
94237be
fix ci envvar collection in large pods (#483)
ganga1980 Jan 6, 2021
aacd496
grwehner/jan agent tasks (#481)
gracewehner Jan 7, 2021
148d739
updating fbit version and cpu limit (#485)
rashmichandrashekar Jan 8, 2021
bd33dd9
reverting to older version (#487)
rashmichandrashekar Jan 8, 2021
d5164d2
Gangams/add fbsettings configurable via configmap (#486)
ganga1980 Jan 11, 2021
908d9b0
Gangams/jan agent release tasks (#484)
ganga1980 Jan 11, 2021
8ede536
remove per container logs in ci (#488)
ganga1980 Jan 11, 2021
37e5218
updates for ciprod01112021 release (#489)
ganga1980 Jan 12, 2021
3c97af6
new yaml files (#491)
deagraw Jan 14, 2021
90e1a5b
Use cloud-specific instrumentation keys (#494)
daweim0 Jan 22, 2021
98b6d77
upgrade apt to latest version (#492)
ganga1980 Jan 22, 2021
ddcd3ee
Gangams/add support for extension msi for arc k8s cluster (#495)
ganga1980 Jan 27, 2021
0cd99e4
Gangams/arm template arc k8s extension (#496)
ganga1980 Jan 27, 2021
13521c5
Gangams/aks monitoring via policy (#497)
ganga1980 Feb 1, 2021
e4f36c7
revert to use operatingSystem from osImage for node os telemety (#498)
ganga1980 Feb 1, 2021
ec15ac1
Container log v2 schema changes (#499)
vishiy Feb 4, 2021
6031be8
Add priority class to the daemonsets (#500)
Michael-Sinz Feb 9, 2021
4212e1a
fix node metric issue (#502)
ganga1980 Feb 11, 2021
24644ce
Bug fixes for Feb release (#504)
rashmichandrashekar Feb 18, 2021
e56104c
Gangams/feb 2021 agent bug fix (#505)
ganga1980 Feb 23, 2021
e00b2aa
changes for release -ciprod02232021 (#506)
vishiy Feb 23, 2021
31f0e5f
Gangams/e2e test framework (#503)
ganga1980 Feb 23, 2021
91f954f
scrape new kubelet pod count metric name (#508)
gracewehner Feb 25, 2021
4a8ff23
Adding explicit json output to az commands as the script fails if az …
nyuen Mar 20, 2021
512e5c0
Gangams/arc proxy contract and token renewal updates (#511)
ganga1980 Mar 22, 2021
6b48b6a
doc updates for microsoft charts repo release (#512)
ganga1980 Mar 22, 2021
d93c680
Update enable-monitoring.sh (#514)
seenu433 Mar 23, 2021
4d386ce
Prometheus scraping from sidecar and OSM changes (#515)
rashmichandrashekar Mar 25, 2021
16936aa
add liveness timeout for exec (#518)
vishiy Mar 26, 2021
12964be
chart and other updates (#519)
rashmichandrashekar Mar 26, 2021
73548c0
Saaror osmdoc (#523)
saaror Apr 5, 2021
fea4ffa
telemetry bug fix (#527)
rashmichandrashekar Apr 6, 2021
e31cc87
Fix conflicting logrotate settings (#526)
gracewehner Apr 6, 2021
ca8fa12
bug fix (#528)
rashmichandrashekar Apr 6, 2021
1f6f6d2
Gangams/arc ev2 deployment (#522)
ganga1980 Apr 7, 2021
97678b6
added liveness and telemetry for telegraf (#517)
daweim0 Apr 9, 2021
63ea896
Windows metric fix (#530)
daweim0 Apr 13, 2021
42730a4
OSM doc update (#533)
rashmichandrashekar Apr 13, 2021
7ad52cd
Adding MDM metrics for threshold violation (#531)
rashmichandrashekar Apr 14, 2021
34d1f64
Rashmi/april agent 2021 (#538)
rashmichandrashekar Apr 21, 2021
fcc5048
add Read_from_Head config for all fluentbit tail plugins (#539)
gracewehner Apr 21, 2021
01e5529
fix programdata mount issue on containerd win nodes (#542)
ganga1980 Apr 22, 2021
b5d074a
Update sidecar mem limits (#541)
rashmichandrashekar Apr 22, 2021
5feeb3e
David/release 4 22 2021 (#544)
daweim0 Apr 22, 2021
1b2da4a
1m, 1m, 1s by default (#543)
daweim0 Apr 23, 2021
83e5816
David/aad stage 1 release (#556)
daweim0 May 12, 2021
8beabe3
Update ReleaseNotes.md (#558)
ganga1980 May 13, 2021
3805f44
Add wait time for telegraf and also force mdm egress to use tls 1.2 (…
vishiy May 19, 2021
7c5087f
partially disabled telegraf liveness probe check, we'll still have te…
daweim0 May 19, 2021
0d33489
changes for 05202021 release (#563)
daweim0 May 20, 2021
486acfd
Rashmi/jedi wireserver (#566)
rashmichandrashekar May 21, 2021
0fa350e
Update ReadMe.md (#565)
saaror May 21, 2021
c707539
Gangams/aad stage2 full switch to mdsd (#559)
ganga1980 May 22, 2021
959b455
Send perf metrics to MDM from windows daemonset (#568)
rashmichandrashekar May 27, 2021
e4da519
updating json gem to address CVE-2020-10663 (#567)
daweim0 May 28, 2021
49486a8
update recommended alerts readme (#570)
daweim0 May 28, 2021
ef23fc6
trying again to fix the json gem (#571)
daweim0 May 28, 2021
cfa804a
Addressing PR comments for - https://github.com/microsoft/Docker-Prov…
rashmichandrashekar Jun 1, 2021
0d3e4a1
Mem_Buf_limit is configurable via ConfigMap (#574)
tsubasaxZZZ Jun 10, 2021
50b99ff
add log rotation settings for fluentd logs (#577)
ganga1980 Jun 11, 2021
4cebe73
Gangams/release 06112021 (#578)
ganga1980 Jun 11, 2021
adabaf9
release note update (#579)
ganga1980 Jun 11, 2021
0c70120
Make sidecar fluentbit chunk size configurable (#573)
rashmichandrashekar Jun 14, 2021
a7a2d73
Fix vulnerabilities (#583)
vishiy Jun 15, 2021
154c11d
Windows build optimization (#582)
rashmichandrashekar Jun 16, 2021
68e90b6
fix windows build failure due to msys2 version
vishiy Jun 23, 2021
cf68a4f
Fix telegraf startup issue when endpoint is unreachable (#587)
rashmichandrashekar Jun 23, 2021
cd22753
revert fbit tail plugins defaults to std defaults (#586)
ganga1980 Jun 23, 2021
8c41a42
fixed another bug (#593)
daweim0 Jul 1, 2021
00f1a0d
feat: add new metrics to MDM for allocatable % calculation of cpu and…
bragi92 Jul 9, 2021
e1f9978
update adx sdk for perf issue (#601)
vishiy Jul 12, 2021
c9ade1b
remove md check
vishiy Jul 13, 2021
6e2732e
Gangams/release notes update for hotfix (#596)
ganga1980 Jul 13, 2021
6df299f
Cherry picking hotfix changes to ci_dev (#605)
rashmichandrashekar Jul 14, 2021
3b38337
release changes (#607)
rashmichandrashekar Jul 15, 2021
bcea7fc
Gangams/aad stage3 msi auth (#585)
ganga1980 Jul 19, 2021
13eb3a6
Gangams/remove chart version dependency (#589)
ganga1980 Jul 20, 2021
63f22d9
Gangams/july 2021 release tasks 3 (#613)
ganga1980 Jul 23, 2021
902c939
remove un-used output plugin (#614)
vishiy Jul 23, 2021
a76905a
fix telegraf telemetry and improve fluentd liveness (#611)
ganga1980 Jul 23, 2021
52612b5
Gangams/july 2021 release tasks 2 (#612)
ganga1980 Jul 23, 2021
5b5d048
Fix out_oms.go dependency vulnerabilities (#623)
gracewehner Aug 13, 2021
2a0f4ec
revert libsystemd0 update (#616)
ganga1980 Aug 13, 2021
45f35ae
updates for ci-prod release instructions (#619)
ganga1980 Aug 13, 2021
10b2ea6
cherry pick changes from ci_prod (#622)
ganga1980 Aug 13, 2021
ad31c55
Support az login for passwords starting with dash ('-') (#626)
vladimir-babichev Aug 14, 2021
57beb59
Gangams/add telemetry fbit settings (#628)
ganga1980 Aug 17, 2021
cf4775a
check onboarding status (#629)
ganga1980 Aug 19, 2021
da55fe5
Gangams/arc k8s conformance test updates (#617)
ganga1980 Aug 19, 2021
e39b83b
upgrade golang version for windows in pipeline build and locally (#630)
gracewehner Aug 20, 2021
3a02a4f
Updating a link in Readme.md (#632)
daweim0 Aug 25, 2021
e56c74b
Updating omsagent yaml to have parity with omsagent yaml file in AKS …
rashmichandrashekar Aug 25, 2021
d2817cb
Unit test tooling (#625)
daweim0 Aug 27, 2021
32f958b
run unit tests after a merge too (#634)
daweim0 Aug 27, 2021
c4a3bbc
flag stale PRs & issues
vishiy Aug 31, 2021
beb7f42
Adding script to collect logs (for troubleshooting) (#636)
daweim0 Sep 1, 2021
01e8178
Sarah/ev2 (#640)
sarahpeiffer Sep 11, 2021
ef7cb89
documenting fbit tail plugin configmap settings. (#638)
daweim0 Sep 13, 2021
6b42f13
Install unzip package on shell extension (#642)
sarahpeiffer Sep 13, 2021
7ef07e1
Changing installation in ev2 script (#644)
sarahpeiffer Sep 14, 2021
a025ce7
Adjust release pipeline to use cdpx acr (#647)
sarahpeiffer Sep 21, 2021
c6bc993
Sarah/ev2 prod (#649)
sarahpeiffer Sep 22, 2021
5e37947
CDPX repo naming change (#652)
sarahpeiffer Sep 23, 2021
a36d8df
Sarah/ev2 update (#654)
sarahpeiffer Sep 27, 2021
fdc99f6
change tag syntax for mcr repo check (#655)
sarahpeiffer Sep 28, 2021
6292218
Gangams/optimize win livenessprobe (#653)
ganga1980 Sep 28, 2021
cfacf39
Gangams/addon token adapter image tag to telemetry (#656)
ganga1980 Sep 29, 2021
ae9ebd7
Sarah/ev2 helm (#658)
sarahpeiffer Sep 30, 2021
a6c6c4a
Sarah/ev2 pipeline (#661)
sarahpeiffer Oct 4, 2021
9e2df4d
add charts directory to build artifacts (#662)
sarahpeiffer Oct 4, 2021
f1d0e43
Sarah/remove cdpx creds (#664)
sarahpeiffer Oct 6, 2021
6ff747c
chart updates for rbac api version change (#660)
ganga1980 Oct 7, 2021
f77587a
proxy support (for non-aks) (#665)
daweim0 Oct 8, 2021
34f5c52
Gangams/agent release ciprod10082021 & win-ciprod10082021 (#666)
ganga1980 Oct 8, 2021
c4d2254
use buildcommand for prod pipeline (#668)
sarahpeiffer Oct 8, 2021
3b008e5
fixed merge issues. (#671) (#672)
ganga1980 Oct 11, 2021
d16d84b
changes related to mdsd version update (#673) (#674)
ganga1980 Oct 11, 2021
ce65f2c
Sarah/enable metrics (#675)
sarahpeiffer Oct 12, 2021
608f92e
Gangams/chart updates oct2021 release (#676)
ganga1980 Oct 13, 2021
ab98c4b
Gangams/msi mode mdsd crash fix (#677)
ganga1980 Oct 13, 2021
a105a00
update to use extension GA api version (#679)
ganga1980 Oct 20, 2021
87ff281
Gangams/arm template msi onboarding (#659)
ganga1980 Oct 20, 2021
ac5dec3
Gangams/conf test updates to handle sidecar (#681)
ganga1980 Oct 22, 2021
0bd3056
Fix scan break due to latest trivy changes
vishiy Oct 26, 2021
761b641
Anjohans/configurable database name (#663)
asjddd Oct 26, 2021
fc955b3
Gangams/troubelshooting script for arc k8s (#682)
ganga1980 Oct 27, 2021
7c9cdc8
Sarah/remove cdpx creds (#685)
sarahpeiffer Nov 2, 2021
f75eea6
fix: subtract number instead of string + update fluentd version 1.14.…
bragi92 Nov 5, 2021
15ee6c5
Faster Linux builds (part 1) (#687)
daweim0 Nov 5, 2021
b4ca054
Sarah/fluentbit windows log (#688)
sarahpeiffer Jan 13, 2022
5b9988c
default to port 10250 & containerd for linux agent (#699)
ganga1980 Jan 21, 2022
4c460c6
Updating pod annotation for latest agent version (#697)
rashmichandrashekar Jan 24, 2022
f2c2904
fix windows build failure due to msys2 version (#700)
bragi92 Jan 25, 2022
78440cf
Jan agent tasks (#698)
gracewehner Jan 28, 2022
3dce72f
remove v1 fallback hidden option (#705)
ganga1980 Jan 28, 2022
2726d01
collect telemetry containerlog records with emptystamp (#703)
ganga1980 Jan 28, 2022
28599b3
Fixing telegraf bug for placeholder name (#706)
rashmichandrashekar Jan 28, 2022
7452ee2
Gangams/jan 2022 release tasks 3 (#702)
ganga1980 Jan 28, 2022
bfc41a4
Gangams/jan 2022 release tasks 2 (#701)
ganga1980 Jan 29, 2022
ec2b09f
release updates for ciprod01312022 & win-ciprod01312022release (#707)
ganga1980 Jan 31, 2022
fb82f31
merge latest dev changes to prod
ganga1980 Jan 31, 2022
9a292c0
fix merge issue
ganga1980 Jan 31, 2022
04ebd94
fix logger exception
ganga1980 Feb 1, 2022
3 changes: 1 addition & 2 deletions .github/workflows/pr-checker.yml
@@ -56,7 +56,7 @@ jobs:
format: 'table'
severity: 'CRITICAL,HIGH'
vuln-type: 'os,library'
- skip-dirs: 'opt/telegraf,usr/sbin/telegraf'
+ skip-dirs: '/usr/sbin'
exit-code: '1'
timeout: '5m0s'
WINDOWS-build:
@@ -94,4 +94,3 @@ jobs:
cd ./kubernetes/windows/ && docker build . --file Dockerfile -t $env:IMAGETAG --build-arg IMAGE_TAG=$env:IMAGETAG_TELEMETRY
- name: List-docker-images
run: docker images --digests --all

2 changes: 1 addition & 1 deletion .github/workflows/run_unit_tests.yml
@@ -26,7 +26,7 @@ jobs:
uses: actions/checkout@v2
- name: install fluent
run: |
- sudo gem install fluentd -v "1.12.2" --no-document
+ sudo gem install fluentd -v "1.14.2" --no-document
sudo fluentd --setup ./fluent
- name: Run unit tests
run: |
35 changes: 35 additions & 0 deletions ReleaseNotes.md
@@ -11,6 +11,41 @@ additional questions or comments.

Note : The agent version(s) below has dates (ciprod<mmddyyyy>), which indicate the agent build dates (not release dates)

### 1/31/2022 -
##### Version microsoft/oms:ciprod01312022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01312022 (linux)
##### Version microsoft/oms:win-ciprod01312022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod01312022 (windows)
##### Code change log
- Linux Agent
  - Configurable DB name via configmap for ADX (default DB name: containerinsights)
  - Default cAdvisor port to 10250 and container runtime to containerd
  - Update AgentVersion annotation in yamls (omsagent and chart) with released MDSD agent version
  - Increasing windows agent CPU limits from 200m to 500m
  - Ignore new disk path that comes from containerd starting with k8s version >= 1.19.x, which was adding unnecessary InsightsMetrics logs and increasing cost
  - Route the AI SDK logs to log file instead of stdout
  - Telemetry to collect ContainerLog Records with empty Timestamp
  - FluentBit version upgrade from 1.6.8 to 1.7.8
- Windows Agent
  - Update to use FluentBit for container log collection and remove the FluentD dependency for container log collection
  - Telemetry to track if any of the variable fields of windows container inventory records has field size >= 64KB
  - Add windows os check in in_cadvisor_perf plugin to avoid making call in MDSD in MSI auth mode
  - Bug fix for placeholder_hostname in telegraf metrics
  - FluentBit version upgrade from 1.4.0 to 1.7.8
- Common
  - Upgrade FluentD gem version from 1.12.2 to 1.14.2
  - Upgrade Telegraf version from 1.18.0 to 1.20.3
  - Fix for exception in node allocatable
  - Telemetry to track nodeCount & containerCount
- Other changes
  - Updates to Arc K8s Extension ARM Onboarding templates with GA API version
  - Added ARM Templates for MSI Based Onboarding for AKS
  - Conformance test updates related to sidecar container
  - Troubleshooting script to detect issues related to Arc K8s Extension onboarding
  - Remove the SP dependency for CDPX since it is configured to use MSI
  - Linux Agent Image build improvements
  - Update msys2 version to fix windows agent build
  - Add explicit exit code 1 across all the PS scripts


### 10/13/2021 -
##### Version microsoft/oms:ciprod10132021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10132021 (linux)
##### Version microsoft/oms:win-ciprod10132021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10132021 (windows)
@@ -3,6 +3,11 @@

@td_agent_bit_conf_path = "/etc/opt/microsoft/docker-cimprov/td-agent-bit.conf"

@os_type = ENV["OS_TYPE"]
if !@os_type.nil? && !@os_type.empty? && @os_type.strip.casecmp("windows") == 0
@td_agent_bit_conf_path = "/etc/fluent-bit/fluent-bit.conf"
end

@default_service_interval = "15"
@default_mem_buf_limit = "10"

@@ -20,14 +25,14 @@ def substituteFluentBitPlaceHolders
bufferMaxSize = ENV["FBIT_TAIL_BUFFER_MAX_SIZE"]
memBufLimit = ENV["FBIT_TAIL_MEM_BUF_LIMIT"]

- serviceInterval = (!interval.nil? && is_number?(interval) && interval.to_i > 0 ) ? interval : @default_service_interval
+ serviceInterval = (!interval.nil? && is_number?(interval) && interval.to_i > 0) ? interval : @default_service_interval
serviceIntervalSetting = "Flush " + serviceInterval

tailBufferChunkSize = (!bufferChunkSize.nil? && is_number?(bufferChunkSize) && bufferChunkSize.to_i > 0) ? bufferChunkSize : nil

tailBufferMaxSize = (!bufferMaxSize.nil? && is_number?(bufferMaxSize) && bufferMaxSize.to_i > 0) ? bufferMaxSize : nil

- if ((!tailBufferChunkSize.nil? && tailBufferMaxSize.nil?) || (!tailBufferChunkSize.nil? && !tailBufferMaxSize.nil? && tailBufferChunkSize.to_i > tailBufferMaxSize.to_i))
+ if ((!tailBufferChunkSize.nil? && tailBufferMaxSize.nil?) || (!tailBufferChunkSize.nil? && !tailBufferMaxSize.nil? && tailBufferChunkSize.to_i > tailBufferMaxSize.to_i))
puts "config:warn buffer max size must be greater or equal to chunk size"
tailBufferMaxSize = tailBufferChunkSize
end
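
The buffer-size check in this hunk can be read in isolation as a small rule: when a chunk size is set, the max size must be present and at least as large, otherwise it falls back to the chunk size. The following is a minimal sketch of that rule only, not the code from this PR; `positive_integer?` and `resolve_tail_buffer_sizes` are hypothetical helper names introduced for illustration.

```ruby
# Hypothetical helper (not in the PR): true only for values that parse
# as an integer greater than zero.
def positive_integer?(value)
  Integer(value) > 0
rescue ArgumentError, TypeError
  false
end

# Hypothetical helper (not in the PR): returns [chunk_size, max_size] as
# strings, with nil for anything unset or invalid.
def resolve_tail_buffer_sizes(chunk, max)
  chunk = positive_integer?(chunk) ? chunk.to_s : nil
  max = positive_integer?(max) ? max.to_s : nil

  # The max size must be greater than or equal to the chunk size, so a
  # missing or smaller max is raised to the chunk size.
  if !chunk.nil? && (max.nil? || chunk.to_i > max.to_i)
    puts "config:warn buffer max size must be greater or equal to chunk size"
    max = chunk
  end

  [chunk, max]
end

chunk_size, max_size = resolve_tail_buffer_sizes("5", "2")
puts "chunk=#{chunk_size} max=#{max_size}"
```

Collapsing the two branches of the original condition into `max.nil? || chunk.to_i > max.to_i` is a simplification made for this sketch; it covers the same two cases.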
264 changes: 264 additions & 0 deletions build/common/installer/scripts/tomlparser-agent-config.rb
@@ -0,0 +1,264 @@
#!/usr/local/bin/ruby

#this should be require relative in Linux and require in windows, since it is a gem install on windows
@os_type = ENV["OS_TYPE"]
if !@os_type.nil? && !@os_type.empty? && @os_type.strip.casecmp("windows") == 0
require "tomlrb"
else
require_relative "tomlrb"
end

require_relative "ConfigParseErrorLogger"

@configMapMountPath = "/etc/config/settings/agent-settings"
@configSchemaVersion = ""
@enable_health_model = false

# 250 Node items (15KB per node) account to approximately 4MB
@nodesChunkSize = 250
# 1000 pods (10KB per pod) account to approximately 10MB
@podsChunkSize = 1000
# 4000 events (1KB per event) account to approximately 4MB
@eventsChunkSize = 4000
# roughly each deployment is 8k
# 500 deployments account to approximately 4MB
@deploymentsChunkSize = 500
# roughly each HPA is 3k
# 2000 HPAs account to approximately 6-7MB
@hpaChunkSize = 2000
# stream batch sizes to avoid large file writes
# too low will consume higher disk iops
@podsEmitStreamBatchSize = 200
@nodesEmitStreamBatchSize = 100

# a higher chunk size raises rs pod memory consumption but lowers API latency
# a lower chunk size reduces memory consumption but incurs additional round-trip latency
# these need to be tuned based on the workload
# nodes
@nodesChunkSizeMin = 100
@nodesChunkSizeMax = 400
# pods
@podsChunkSizeMin = 250
@podsChunkSizeMax = 1500
# events
@eventsChunkSizeMin = 2000
@eventsChunkSizeMax = 10000
# deployments
@deploymentsChunkSizeMin = 500
@deploymentsChunkSizeMax = 1000
# hpa
@hpaChunkSizeMin = 500
@hpaChunkSizeMax = 2000

# emit stream sizes to prevent lower values which costs disk i/o
# max will be upto the chunk size
@podsEmitStreamBatchSizeMin = 50
@nodesEmitStreamBatchSizeMin = 50

# configmap settings related fbit config
@fbitFlushIntervalSecs = 0
@fbitTailBufferChunkSizeMBs = 0
@fbitTailBufferMaxSizeMBs = 0
@fbitTailMemBufLimitMBs = 0


def is_number?(value)
true if Integer(value) rescue false
end

# Use parser to parse the configmap toml file to a ruby structure
def parseConfigMap
begin
# Check to see if config map is created
if (File.file?(@configMapMountPath))
puts "config::configmap container-azm-ms-agentconfig for agent settings mounted, parsing values"
parsedConfig = Tomlrb.load_file(@configMapMountPath, symbolize_keys: true)
puts "config::Successfully parsed mounted config map"
return parsedConfig
else
puts "config::configmap container-azm-ms-agentconfig for agent settings not mounted, using defaults"
return nil
end
rescue => errorStr
ConfigParseErrorLogger.logError("Exception while parsing config map for agent settings : #{errorStr}, using defaults, please check config map for errors")
return nil
end
end

# Use the ruby structure created after config parsing to set the right values to be used as environment variables
def populateSettingValuesFromConfigMap(parsedConfig)
begin
if !parsedConfig.nil? && !parsedConfig[:agent_settings].nil?
if !parsedConfig[:agent_settings][:health_model].nil? && !parsedConfig[:agent_settings][:health_model][:enabled].nil?
@enable_health_model = parsedConfig[:agent_settings][:health_model][:enabled]
puts "enable_health_model = #{@enable_health_model}"
end
chunk_config = parsedConfig[:agent_settings][:chunk_config]
if !chunk_config.nil?
nodesChunkSize = chunk_config[:NODES_CHUNK_SIZE]
if !nodesChunkSize.nil? && is_number?(nodesChunkSize) && (@nodesChunkSizeMin..@nodesChunkSizeMax) === nodesChunkSize.to_i
@nodesChunkSize = nodesChunkSize.to_i
puts "Using config map value: NODES_CHUNK_SIZE = #{@nodesChunkSize}"
end

podsChunkSize = chunk_config[:PODS_CHUNK_SIZE]
if !podsChunkSize.nil? && is_number?(podsChunkSize) && (@podsChunkSizeMin..@podsChunkSizeMax) === podsChunkSize.to_i
@podsChunkSize = podsChunkSize.to_i
puts "Using config map value: PODS_CHUNK_SIZE = #{@podsChunkSize}"
end

eventsChunkSize = chunk_config[:EVENTS_CHUNK_SIZE]
if !eventsChunkSize.nil? && is_number?(eventsChunkSize) && (@eventsChunkSizeMin..@eventsChunkSizeMax) === eventsChunkSize.to_i
@eventsChunkSize = eventsChunkSize.to_i
puts "Using config map value: EVENTS_CHUNK_SIZE = #{@eventsChunkSize}"
end

deploymentsChunkSize = chunk_config[:DEPLOYMENTS_CHUNK_SIZE]
if !deploymentsChunkSize.nil? && is_number?(deploymentsChunkSize) && (@deploymentsChunkSizeMin..@deploymentsChunkSizeMax) === deploymentsChunkSize.to_i
@deploymentsChunkSize = deploymentsChunkSize.to_i
puts "Using config map value: DEPLOYMENTS_CHUNK_SIZE = #{@deploymentsChunkSize}"
end

hpaChunkSize = chunk_config[:HPA_CHUNK_SIZE]
if !hpaChunkSize.nil? && is_number?(hpaChunkSize) && (@hpaChunkSizeMin..@hpaChunkSizeMax) === hpaChunkSize.to_i
@hpaChunkSize = hpaChunkSize.to_i
puts "Using config map value: HPA_CHUNK_SIZE = #{@hpaChunkSize}"
end

podsEmitStreamBatchSize = chunk_config[:PODS_EMIT_STREAM_BATCH_SIZE]
if !podsEmitStreamBatchSize.nil? && is_number?(podsEmitStreamBatchSize) &&
podsEmitStreamBatchSize.to_i <= @podsChunkSize && podsEmitStreamBatchSize.to_i >= @podsEmitStreamBatchSizeMin
@podsEmitStreamBatchSize = podsEmitStreamBatchSize.to_i
puts "Using config map value: PODS_EMIT_STREAM_BATCH_SIZE = #{@podsEmitStreamBatchSize}"
end
nodesEmitStreamBatchSize = chunk_config[:NODES_EMIT_STREAM_BATCH_SIZE]
if !nodesEmitStreamBatchSize.nil? && is_number?(nodesEmitStreamBatchSize) &&
nodesEmitStreamBatchSize.to_i <= @nodesChunkSize && nodesEmitStreamBatchSize.to_i >= @nodesEmitStreamBatchSizeMin
@nodesEmitStreamBatchSize = nodesEmitStreamBatchSize.to_i
puts "Using config map value: NODES_EMIT_STREAM_BATCH_SIZE = #{@nodesEmitStreamBatchSize}"
end
end
# fbit config settings
fbit_config = parsedConfig[:agent_settings][:fbit_config]
if !fbit_config.nil?
fbitFlushIntervalSecs = fbit_config[:log_flush_interval_secs]
if !fbitFlushIntervalSecs.nil? && is_number?(fbitFlushIntervalSecs) && fbitFlushIntervalSecs.to_i > 0
@fbitFlushIntervalSecs = fbitFlushIntervalSecs.to_i
puts "Using config map value: log_flush_interval_secs = #{@fbitFlushIntervalSecs}"
end

fbitTailBufferChunkSizeMBs = fbit_config[:tail_buf_chunksize_megabytes]
if !fbitTailBufferChunkSizeMBs.nil? && is_number?(fbitTailBufferChunkSizeMBs) && fbitTailBufferChunkSizeMBs.to_i > 0
@fbitTailBufferChunkSizeMBs = fbitTailBufferChunkSizeMBs.to_i
puts "Using config map value: tail_buf_chunksize_megabytes = #{@fbitTailBufferChunkSizeMBs}"
end

fbitTailBufferMaxSizeMBs = fbit_config[:tail_buf_maxsize_megabytes]
if !fbitTailBufferMaxSizeMBs.nil? && is_number?(fbitTailBufferMaxSizeMBs) && fbitTailBufferMaxSizeMBs.to_i > 0
if fbitTailBufferMaxSizeMBs.to_i >= @fbitTailBufferChunkSizeMBs
@fbitTailBufferMaxSizeMBs = fbitTailBufferMaxSizeMBs.to_i
puts "Using config map value: tail_buf_maxsize_megabytes = #{@fbitTailBufferMaxSizeMBs}"
else
# tail_buf_maxsize_megabytes has to be greater or equal to tail_buf_chunksize_megabytes
@fbitTailBufferMaxSizeMBs = @fbitTailBufferChunkSizeMBs
puts "config::warn: tail_buf_maxsize_megabytes must be greater or equal to value of tail_buf_chunksize_megabytes. Using tail_buf_maxsize_megabytes = #{@fbitTailBufferMaxSizeMBs} since provided config value not valid"
end
end
# in scenario - tail_buf_chunksize_megabytes provided but not tail_buf_maxsize_megabytes to prevent fbit crash
if @fbitTailBufferChunkSizeMBs > 0 && @fbitTailBufferMaxSizeMBs == 0
@fbitTailBufferMaxSizeMBs = @fbitTailBufferChunkSizeMBs
puts "config::warn: since tail_buf_maxsize_megabytes not provided hence using tail_buf_maxsize_megabytes=#{@fbitTailBufferMaxSizeMBs} which is same as the value of tail_buf_chunksize_megabytes"
end

fbitTailMemBufLimitMBs = fbit_config[:tail_mem_buf_limit_megabytes]
if !fbitTailMemBufLimitMBs.nil? && is_number?(fbitTailMemBufLimitMBs) && fbitTailMemBufLimitMBs.to_i > 0
@fbitTailMemBufLimitMBs = fbitTailMemBufLimitMBs.to_i
puts "Using config map value: tail_mem_buf_limit_megabytes = #{@fbitTailMemBufLimitMBs}"
end
end
end
rescue => errorStr
puts "config::error:Exception while reading config settings for agent configuration setting - #{errorStr}, using defaults"
@enable_health_model = false
end
end

@configSchemaVersion = ENV["AZMON_AGENT_CFG_SCHEMA_VERSION"]
puts "****************Start Config Processing********************"
if !@configSchemaVersion.nil? && !@configSchemaVersion.empty? && @configSchemaVersion.strip.casecmp("v1") == 0 #note v1 is the only supported schema version , so hardcoding it
configMapSettings = parseConfigMap
if !configMapSettings.nil?
populateSettingValuesFromConfigMap(configMapSettings)
end
else
if (File.file?(@configMapMountPath))
ConfigParseErrorLogger.logError("config::unsupported/missing config schema version - '#{@configSchemaVersion}' , using defaults, please use supported schema version")
end
@enable_health_model = false
end

# Write the settings to file, so that they can be set as environment variables
file = File.open("agent_config_env_var", "w")

if !file.nil?
file.write("export AZMON_CLUSTER_ENABLE_HEALTH_MODEL=#{@enable_health_model}\n")
file.write("export NODES_CHUNK_SIZE=#{@nodesChunkSize}\n")
file.write("export PODS_CHUNK_SIZE=#{@podsChunkSize}\n")
file.write("export EVENTS_CHUNK_SIZE=#{@eventsChunkSize}\n")
file.write("export DEPLOYMENTS_CHUNK_SIZE=#{@deploymentsChunkSize}\n")
file.write("export HPA_CHUNK_SIZE=#{@hpaChunkSize}\n")
file.write("export PODS_EMIT_STREAM_BATCH_SIZE=#{@podsEmitStreamBatchSize}\n")
file.write("export NODES_EMIT_STREAM_BATCH_SIZE=#{@nodesEmitStreamBatchSize}\n")
# fbit settings
if @fbitFlushIntervalSecs > 0
file.write("export FBIT_SERVICE_FLUSH_INTERVAL=#{@fbitFlushIntervalSecs}\n")
end
if @fbitTailBufferChunkSizeMBs > 0
file.write("export FBIT_TAIL_BUFFER_CHUNK_SIZE=#{@fbitTailBufferChunkSizeMBs}\n")
end
if @fbitTailBufferMaxSizeMBs > 0
file.write("export FBIT_TAIL_BUFFER_MAX_SIZE=#{@fbitTailBufferMaxSizeMBs}\n")
end
if @fbitTailMemBufLimitMBs > 0
file.write("export FBIT_TAIL_MEM_BUF_LIMIT=#{@fbitTailMemBufLimitMBs}\n")
end
# Close file after writing all environment variables
file.close
else
puts "Exception while opening file for writing config environment variables"
puts "****************End Config Processing********************"
end

def get_command_windows(env_variable_name, env_variable_value)
return "[System.Environment]::SetEnvironmentVariable(\"#{env_variable_name}\", \"#{env_variable_value}\", \"Process\")" + "\n" + "[System.Environment]::SetEnvironmentVariable(\"#{env_variable_name}\", \"#{env_variable_value}\", \"Machine\")" + "\n"
end

if !@os_type.nil? && !@os_type.empty? && @os_type.strip.casecmp("windows") == 0
# Write the settings to file, so that they can be set as environment variables
file = File.open("setagentenv.ps1", "w")

if !file.nil?
if @fbitFlushIntervalSecs > 0
commands = get_command_windows('FBIT_SERVICE_FLUSH_INTERVAL', @fbitFlushIntervalSecs)
file.write(commands)
end
if @fbitTailBufferChunkSizeMBs > 0
commands = get_command_windows('FBIT_TAIL_BUFFER_CHUNK_SIZE', @fbitTailBufferChunkSizeMBs)
file.write(commands)
end
if @fbitTailBufferMaxSizeMBs > 0
commands = get_command_windows('FBIT_TAIL_BUFFER_MAX_SIZE', @fbitTailBufferMaxSizeMBs)
file.write(commands)
end
if @fbitTailMemBufLimitMBs > 0
commands = get_command_windows('FBIT_TAIL_MEM_BUF_LIMIT', @fbitTailMemBufLimitMBs)
file.write(commands)
end
# Close file after writing all environment variables
file.close
puts "****************End Config Processing********************"
else
puts "Exception while opening file for writing config environment variables for WINDOWS LOG"
puts "****************End Config Processing********************"
end
end
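
The chunk-size handling in the file above follows one pattern per setting: accept the configmap value only if it is an integer inside the allowed range, otherwise keep the default, then emit an `export` line. A compact sketch of that pattern is shown below, assuming a table-driven layout that the PR itself does not use; `CHUNK_LIMITS`, `integer_like?`, `effective_chunk_sizes`, and `export_lines` are hypothetical names, while the numeric defaults and bounds are copied from the file.

```ruby
# Hypothetical limits table (not in the PR); defaults and ranges are the
# values declared at the top of tomlparser-agent-config.rb.
CHUNK_LIMITS = {
  NODES_CHUNK_SIZE: { default: 250, range: 100..400 },
  PODS_CHUNK_SIZE: { default: 1000, range: 250..1500 },
  EVENTS_CHUNK_SIZE: { default: 4000, range: 2000..10000 },
}.freeze

# Hypothetical helper: true when Kernel#Integer can parse the value.
def integer_like?(value)
  Integer(value)
  true
rescue ArgumentError, TypeError
  false
end

# Keep a configured value only when it is an integer within the allowed
# range; otherwise fall back to the default for that setting.
def effective_chunk_sizes(chunk_config)
  CHUNK_LIMITS.each_with_object({}) do |(key, limits), result|
    raw = chunk_config[key]
    in_range = integer_like?(raw) && limits[:range].cover?(raw.to_i)
    result[key] = in_range ? raw.to_i : limits[:default]
  end
end

# Render the settings in the same "export NAME=value" form the script
# writes to its env var file.
def export_lines(sizes)
  sizes.map { |name, value| "export #{name}=#{value}\n" }.join
end

sizes = effective_chunk_sizes({ NODES_CHUNK_SIZE: 300, PODS_CHUNK_SIZE: 5000 })
puts export_lines(sizes)
```

A table keeps the bounds and defaults in one place, at the cost of diverging from the explicit per-setting blocks the file uses; the emit-stream batch sizes are left out because their upper bound depends on the resolved chunk size.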