Gangams/cost optimizations public preview #860

Merged
merged 305 commits into ci_prod on Nov 28, 2022

Conversation

Contributor

@ganga1980 ganga1980 commented Nov 21, 2022

This PR has the following changes:

  1. AKS, Arc K8s, and Provisioned cluster template updates for data collection settings
  2. Implementation of data collection settings
  3. Telemetry to track whether data collection settings are enabled and what they are set to

rashmichandrashekar and others added 30 commits January 8, 2021 13:47
* wip

* fbit config settings

* add config warn message

* handle one config provided but not other

* fixed pr feedback

* fix copy paste error

* rename config parameter names

* fix typo

* fix fbit crash in helm path

* fix nil check
* wip

* explicit amd64 affinity for hybrid workloads

* fix space issue

* wip

* revert vscode setting file
If APPLICATIONINSIGHTS_AUTH_URL is set/non-empty then the agent will now grab a custom IKey from a URL stored in APPLICATIONINSIGHTS_AUTH_URL
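The lookup described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the function names, the placeholder default key, the plain-text response format, and the fallback to the default key on a failed request are all assumptions. Only the `APPLICATIONINSIGHTS_AUTH_URL` variable name comes from the commit message.

```python
import os
import urllib.request

# Hypothetical placeholder; the real default key is not shown in this PR.
DEFAULT_IKEY = "default-ikey-placeholder"


def _http_get(url, timeout=10):
    """Fetch the body of `url` as text (assumes a plain-text response)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def resolve_ikey(env=os.environ, fetch=_http_get, default=DEFAULT_IKEY):
    """Return a custom IKey when APPLICATIONINSIGHTS_AUTH_URL is set and non-empty.

    Falling back to `default` on an empty response or a network error is an
    assumption of this sketch; the commit message does not say what happens.
    """
    url = env.get("APPLICATIONINSIGHTS_AUTH_URL", "").strip()
    if not url:
        return default
    try:
        key = fetch(url).strip()
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return default
    return key or default
```

Passing `env` and `fetch` as parameters keeps the sketch testable without touching the real environment or the network.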
* upgrade apt to latest version

* fix pr feedback
* wip

* add env var for the arc k8s extension name

* chart update

* extension msi updates

* fix bug

* revert chart and image to prod version

* minor text changes

* image tag to prod

* wip

* wip

* wip

* wip

* final updates

* fix whitespaces

* simplify crd yaml
* arm templates for arc k8s extension

* update to use official extension type name

* update

* add identity property

* add proxyendpointurl parameter

* add default values
* enable monitoring through policy

* wip

* handle tags

* wip

* add alias

* wip

* working

* updates

* working

* with deployment name

* doc updates

* doc updates

* fix typo in the docs
* make pod name in mdsd definition a str for consistency. msgp has no type checking, since the type metadata is carried in the message itself.
* Add priority class to the daemonsets

Add a priority class for omsagent and have the daemonsets use this
to be sure to schedule the pods.

Daemonset pods are constrained in scheduling to run on specific
nodes.  This is done by the daemonset controller.  When a node shows
up it will create a pod with a strong affinity to that node.  When a
node goes away, it will delete the pod with the node affinity to that
node.

Kubernetes pod scheduling does not know it is a daemonset but it does
know it is tied to a specific node.  With default scheduling, it is
possible for the pods to be "frozen out" of a node because the node
already is full.  This can happen because "normal" pods may already
exist and are looking for a node to get scheduled on when a node is
added to the cluster.  The daemonset controller only creates the
pod for the node at around the same time.  The kubernetes
scheduler is running async from all of this and thus there can be a
race as to who gets scheduled on the node.

The pod priority class (and thus the pod priority) is a way to indicate
that the pod has a higher scheduling priority than a default pod.

By default, all pods are at priority 0.  Higher numbers are higher
priority.  Setting the priority to something greater than zero will
allow the omsagent daemonsets to win a race against "normal" pods for
scheduled resources on a node - and will also allow for graceful
eviction in the case the node is too full.

Without this, omsagent can be left off a node in clusters that are
very busy, especially in dynamic scaling situations.

I did not test the windows pod as we have no windows clusters.
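
The two pieces this commit message describes, a PriorityClass object and a daemonset pod template that references it, can be sketched as plain manifests built in Python. This is an illustrative sketch only: the helper names are hypothetical, and the actual class name and priority value used by the chart are not shown here. The `scheduling.k8s.io/v1` PriorityClass fields and the pod-level `priorityClassName` field are standard Kubernetes API.

```python
import copy
import json


def priority_class(name, value, description=""):
    """Build a PriorityClass manifest as a plain dict.

    Pods without a priority class default to priority 0, so any positive
    `value` ranks above "normal" pods.
    """
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,  # only pods that name this class get it
        "description": description,
    }


def with_priority_class(daemonset, class_name):
    """Return a copy of a DaemonSet manifest whose pods use `class_name`."""
    patched = copy.deepcopy(daemonset)
    pod_spec = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["priorityClassName"] = class_name
    return patched


def to_json(manifest):
    """Serialize a manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

For example, `to_json(priority_class("omsagent-priority", 1000))` (name and value chosen purely for illustration) produces a document that could be applied with `kubectl apply -f`, and `with_priority_class` shows where the reference goes in the daemonset's pod template.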

* CR feedback
* bug fix for mdm metrics with no limits

* fix exception bug
* fix npe in getKubeServiceRecords

* use image fields from spec

* fix typo

* cover all cases

* handle scenario only digest specified
* add agent e2e fw and tests

* doc and script updates

* add validation script

* doc updates

* yaml updates

* fix typo

* doc updates

* more doc updates

* add ISTEST for helm chart to use arc conf

* refactor test code

* fix pr feedback

* fix pr feedback

* fix pr feedback

* fix pr feedback
* fix issue with crd status updates

* handle renewal token delays

* add proxy contract

* updates for proxy cert for linux

* remove proxycert related changes

* fix whitespace issue

* fix whitespace issue

* remove proxy in arm template
* doc updates for microsoft charts repo release

* wip
Lines 314 and 343 seem to have trailing spaces for some subscriptions, which causes the script to exit even in valid scenarios

Co-authored-by: Ganga Mahesh Siddem <[email protected]>
* Create ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Add files via upload

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md

* Update ReadMe.md
The node and the omsagent container both have a cron.daily file to rotate certain logs daily. These settings are the same for some files in /var/log (mounted from the node with read/write access), causing the rotation to fail when both try to rotate at the same time. As a result, the /var/log/*.1 file is written to forever. Because these files are always written to and never rotated, they cause high memory usage on the node after a while.

This fix removes the container logrotate settings for /var/log, which the container does not write to.
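
One rough way to spot this kind of overlap is to compare the path patterns named in two logrotate configurations. The sketch below is an assumption-laden illustration, not part of the fix: the function names are hypothetical, and the parser is deliberately simplified (it only recognizes stanzas whose paths sit on the same line as the opening `{`, and it ignores `include` directives and script bodies).

```python
import re

# Matches "path1 path2 {" at the start of a line; comments and closing
# braces are excluded. A simplification of the real logrotate grammar.
STANZA_RE = re.compile(r"^([^#\n{}]+?)\s*\{", re.MULTILINE)


def rotated_paths(conf_text):
    """Return the set of path patterns that have a stanza in `conf_text`."""
    paths = set()
    for match in STANZA_RE.finditer(conf_text):
        paths.update(match.group(1).split())
    return paths


def overlapping_paths(container_conf, node_conf, prefix="/var/log/"):
    """Patterns under `prefix` that both configurations try to rotate."""
    shared = rotated_paths(container_conf) & rotated_paths(node_conf)
    return {p for p in shared if p.startswith(prefix)}
```

Any pattern this returns is one that both the node and the container would try to rotate, which is the collision the commit message describes.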
Contributor

@pfrcks pfrcks left a comment

overall looks good. have left a few comments. please take a look

Contributor

@pfrcks pfrcks left a comment

some nameSpace mentions are remaining. pls fix them as well.

Contributor

@pfrcks pfrcks left a comment

nameSpace is still left in a couple of places. I have marked some of them (others remain; please replace across the change). There is also some extra whitespace, which I have pointed out in a comment.

pfrcks previously approved these changes Nov 23, 2022
Contributor

@pfrcks pfrcks left a comment

lgtm! thanks for addressing the nits

@ganga1980 ganga1980 requested a review from pfrcks November 24, 2022 04:13
Contributor

@pfrcks pfrcks left a comment

lgtm

@ganga1980 ganga1980 merged commit 6115caa into ci_prod Nov 28, 2022