Gangams/aad stage3 msi auth (#585)
* changes related to aad msi auth feature

* use existing envvars

* fix imds token expiry interval

* refactor the windows agent ingestion token code

* code cleanup

* fix build errors

* code clean up

* code clean up

* code clean up

* code clean up

* more refactoring

* fix bug

* fix bug

* add debug logs

* add nil checks

* revert changes

* revert yaml change since this was added on the AKS side

* fix pr feedback

* fix pr feedback

* refine retry code

* update mdsd env as per official build

* cleanup

* update env vars per mdsd

* update with mdsd official build

* skip cert gen & renewal in case of aad msi auth

* add nil check

* cherry windows agent nodeip issue

* fix merge issue

Co-authored-by: rashmichandrashekar <[email protected]>
ganga1980 and rashmichandrashekar authored Jul 19, 2021
1 parent 3b38337 commit bcea7fc
Showing 30 changed files with 1,612 additions and 183 deletions.
3 changes: 3 additions & 0 deletions build/linux/installer/datafiles/base_container.data
@@ -150,6 +150,9 @@ MAINTAINER: 'Microsoft Corporation'

/etc/fluent/plugin/omslog.rb; source/plugins/utils/omslog.rb; 644; root; root
/etc/fluent/plugin/oms_common.rb; source/plugins/utils/oms_common.rb; 644; root; root
/etc/fluent/plugin/extension.rb; source/plugins/utils/extension.rb; 644; root; root
/etc/fluent/plugin/extension_utils.rb; source/plugins/utils/extension_utils.rb; 644; root; root


/etc/fluent/kube.conf; build/linux/installer/conf/kube.conf; 644; root; root
/etc/fluent/container.conf; build/linux/installer/conf/container.conf; 644; root; root
@@ -36,14 +36,18 @@ def start

def enumerate
begin
puts "Calling certificate renewal code..."
maintenance = OMS::OnboardingHelper.new(
ENV["WSID"],
ENV["DOMAIN"],
ENV["CI_AGENT_GUID"]
)
ret_code = maintenance.register_certs()
puts "Return code from register certs : #{ret_code}"
if !ENV["AAD_MSI_AUTH_MODE"].nil? && !ENV["AAD_MSI_AUTH_MODE"].empty? && ENV["AAD_MSI_AUTH_MODE"].downcase == "true"
puts "skipping certificate renewal code since AAD MSI auth configured"
else
puts "Calling certificate renewal code..."
maintenance = OMS::OnboardingHelper.new(
ENV["WSID"],
ENV["DOMAIN"],
ENV["CI_AGENT_GUID"]
)
ret_code = maintenance.register_certs()
puts "Return code from register certs : #{ret_code}"
end
rescue => errorStr
puts "in_heartbeat_request::enumerate:Failed in enumerate: #{errorStr}"
# STDOUT telemetry should alredy be going to Traces in AI.
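The heartbeat plugin now skips certificate renewal only when `AAD_MSI_AUTH_MODE` is set, non-empty, and equal to `true` ignoring case. A minimal shell sketch of that check, assuming a hypothetical helper name `is_aad_msi_auth_mode` and using `tr` in place of Ruby's `downcase`:

```shell
#!/bin/sh
# Sketch of the truthiness test in in_heartbeat_request's enumerate:
# the value must be set, non-empty, and equal to "true" ignoring case.
# is_aad_msi_auth_mode is a hypothetical helper, not part of the agent.
is_aad_msi_auth_mode() {
  value="$1"
  # Unset or empty counts as "not in AAD MSI auth mode".
  [ -n "$value" ] || return 1
  # Lower-case the value (like Ruby's String#downcase) before comparing.
  lowered=$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')
  [ "$lowered" = "true" ]
}

if is_aad_msi_auth_mode "${AAD_MSI_AUTH_MODE:-}"; then
  echo "skipping certificate renewal code since AAD MSI auth configured"
else
  echo "Calling certificate renewal code..."
fi
```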
112 changes: 72 additions & 40 deletions kubernetes/linux/main.sh
@@ -12,7 +12,7 @@ waitforlisteneronTCPport() {
echo "${FUNCNAME[0]} called with incorrect arguments<$1 , $2>. Required arguments <#port, #wait-time-in-seconds>"
return -1
else

if [[ $port =~ $numeric ]] && [[ $waittimesecs =~ $numeric ]]; then
#local varlistener=$(netstat -lnt | awk '$6 == "LISTEN" && $4 ~ ":25228$"')
while true
@@ -57,7 +57,11 @@ else
export customResourceId=$AKS_RESOURCE_ID
echo "export customResourceId=$AKS_RESOURCE_ID" >> ~/.bashrc
source ~/.bashrc
echo "customResourceId:$customResourceId"
echo "customResourceId:$customResourceId"
export customRegion=$AKS_REGION
echo "export customRegion=$AKS_REGION" >> ~/.bashrc
source ~/.bashrc
echo "customRegion:$customRegion"
fi

#set agent config schema version
@@ -194,9 +198,15 @@ fi
if [ -z $domain ]; then
ClOUD_ENVIRONMENT="unknown"
elif [ $domain == "opinsights.azure.com" ]; then
CLOUD_ENVIRONMENT="public"
else
CLOUD_ENVIRONMENT="national"
CLOUD_ENVIRONMENT="azurepubliccloud"
elif [ $domain == "opinsights.azure.cn" ]; then
CLOUD_ENVIRONMENT="azurechinacloud"
elif [ $domain == "opinsights.azure.us" ]; then
CLOUD_ENVIRONMENT="azureusgovernmentcloud"
elif [ $domain == "opinsights.azure.eaglex.ic.gov" ]; then
CLOUD_ENVIRONMENT="usnat"
elif [ $domain == "opinsights.azure.microsoft.scloud" ]; then
CLOUD_ENVIRONMENT="ussec"
fi
export CLOUD_ENVIRONMENT=$CLOUD_ENVIRONMENT
echo "export CLOUD_ENVIRONMENT=$CLOUD_ENVIRONMENT" >> ~/.bashrc
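The new `elif` chain maps each Log Analytics domain to a `CLOUD_ENVIRONMENT` name. The same table can be written as a `case` statement; this is a sketch, `domain_to_cloud_environment` is a hypothetical helper, and the `unknown` fallback for unrecognized domains is an assumption of the sketch. The chain in the diff has no final `else`, and its empty-domain branch assigns `ClOUD_ENVIRONMENT` (lowercase `l`), so `CLOUD_ENVIRONMENT` is not assigned in either case.

```shell
#!/bin/sh
# Sketch: map a Log Analytics domain to the CLOUD_ENVIRONMENT value
# exported by main.sh. domain_to_cloud_environment is hypothetical.
domain_to_cloud_environment() {
  case "$1" in
    opinsights.azure.com)              echo "azurepubliccloud" ;;
    opinsights.azure.cn)               echo "azurechinacloud" ;;
    opinsights.azure.us)               echo "azureusgovernmentcloud" ;;
    opinsights.azure.eaglex.ic.gov)    echo "usnat" ;;
    opinsights.azure.microsoft.scloud) echo "ussec" ;;
    # Assumption: empty or unrecognized domains map to "unknown".
    *)                                 echo "unknown" ;;
  esac
}

CLOUD_ENVIRONMENT=$(domain_to_cloud_environment "${domain:-}")
export CLOUD_ENVIRONMENT
echo "CLOUD_ENVIRONMENT=$CLOUD_ENVIRONMENT"
```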
@@ -233,9 +243,9 @@ if [ ${#APPLICATIONINSIGHTS_AUTH_URL} -ge 1 ]; then # (check if APPLICATIONINSI
fi


aikey=$(echo $APPLICATIONINSIGHTS_AUTH | base64 --decode)
export TELEMETRY_APPLICATIONINSIGHTS_KEY=$aikey
echo "export TELEMETRY_APPLICATIONINSIGHTS_KEY=$aikey" >> ~/.bashrc
aikey=$(echo $APPLICATIONINSIGHTS_AUTH | base64 --decode)
export TELEMETRY_APPLICATIONINSIGHTS_KEY=$aikey
echo "export TELEMETRY_APPLICATIONINSIGHTS_KEY=$aikey" >> ~/.bashrc

source ~/.bashrc

@@ -421,7 +431,7 @@ export KUBELET_RUNTIME_OPERATIONS_ERRORS_METRIC="kubelet_docker_operations_error
if [ "$CONTAINER_RUNTIME" != "docker" ]; then
# these metrics are avialble only on k8s versions <1.18 and will get deprecated from 1.18
export KUBELET_RUNTIME_OPERATIONS_METRIC="kubelet_runtime_operations"
export KUBELET_RUNTIME_OPERATIONS_ERRORS_METRIC="kubelet_runtime_operations_errors"
export KUBELET_RUNTIME_OPERATIONS_ERRORS_METRIC="kubelet_runtime_operations_errors"
fi

echo "set caps for ruby process to read container env from proc"
@@ -445,34 +455,56 @@ DOCKER_CIMPROV_VERSION=$(dpkg -l | grep docker-cimprov | awk '{print $3}')
echo "DOCKER_CIMPROV_VERSION=$DOCKER_CIMPROV_VERSION"
export DOCKER_CIMPROV_VERSION=$DOCKER_CIMPROV_VERSION
echo "export DOCKER_CIMPROV_VERSION=$DOCKER_CIMPROV_VERSION" >> ~/.bashrc
echo "*** activating oneagent in legacy auth mode ***"
CIWORKSPACE_id="$(cat /etc/omsagent-secret/WSID)"
#use the file path as its secure than env
CIWORKSPACE_keyFile="/etc/omsagent-secret/KEY"
cat /etc/mdsd.d/envmdsd | while read line; do
echo $line >> ~/.bashrc
done
source /etc/mdsd.d/envmdsd
echo "setting mdsd workspaceid & key for workspace:$CIWORKSPACE_id"
export CIWORKSPACE_id=$CIWORKSPACE_id
echo "export CIWORKSPACE_id=$CIWORKSPACE_id" >> ~/.bashrc
export CIWORKSPACE_keyFile=$CIWORKSPACE_keyFile
echo "export CIWORKSPACE_keyFile=$CIWORKSPACE_keyFile" >> ~/.bashrc
export OMS_TLD=$domain
echo "export OMS_TLD=$OMS_TLD" >> ~/.bashrc
export MDSD_FLUENT_SOCKET_PORT="29230"
echo "export MDSD_FLUENT_SOCKET_PORT=$MDSD_FLUENT_SOCKET_PORT" >> ~/.bashrc

#skip imds lookup since not used in legacy auth path
#skip imds lookup since not used either legacy or aad msi auth path
export SKIP_IMDS_LOOKUP_FOR_LEGACY_AUTH="true"
echo "export SKIP_IMDS_LOOKUP_FOR_LEGACY_AUTH=$SKIP_IMDS_LOOKUP_FOR_LEGACY_AUTH" >> ~/.bashrc

# this used by mdsd to determine cloud specific LA endpoints
export OMS_TLD=$domain
echo "export OMS_TLD=$OMS_TLD" >> ~/.bashrc
cat /etc/mdsd.d/envmdsd | while read line; do
echo $line >> ~/.bashrc
done
source /etc/mdsd.d/envmdsd
MDSD_AAD_MSI_AUTH_ARGS=""
# check if its AAD Auth MSI mode via USING_AAD_MSI_AUTH
export AAD_MSI_AUTH_MODE=false
if [ "${USING_AAD_MSI_AUTH}" == "true" ]; then
echo "*** activating oneagent in aad auth msi mode ***"
# msi auth specific args
MDSD_AAD_MSI_AUTH_ARGS="-a -A"
export AAD_MSI_AUTH_MODE=true
echo "export AAD_MSI_AUTH_MODE=true" >> ~/.bashrc
# this used by mdsd to determine the cloud specific AMCS endpoints
export customEnvironment=$CLOUD_ENVIRONMENT
echo "export customEnvironment=$customEnvironment" >> ~/.bashrc
export MDSD_FLUENT_SOCKET_PORT="28230"
echo "export MDSD_FLUENT_SOCKET_PORT=$MDSD_FLUENT_SOCKET_PORT" >> ~/.bashrc
export ENABLE_MCS="true"
echo "export ENABLE_MCS=$ENABLE_MCS" >> ~/.bashrc
export MONITORING_USE_GENEVA_CONFIG_SERVICE="false"
echo "export MONITORING_USE_GENEVA_CONFIG_SERVICE=$MONITORING_USE_GENEVA_CONFIG_SERVICE" >> ~/.bashrc
export MDSD_USE_LOCAL_PERSISTENCY="false"
echo "export MDSD_USE_LOCAL_PERSISTENCY=$MDSD_USE_LOCAL_PERSISTENCY" >> ~/.bashrc
else
echo "*** activating oneagent in legacy auth mode ***"
CIWORKSPACE_id="$(cat /etc/omsagent-secret/WSID)"
#use the file path as its secure than env
CIWORKSPACE_keyFile="/etc/omsagent-secret/KEY"
echo "setting mdsd workspaceid & key for workspace:$CIWORKSPACE_id"
export CIWORKSPACE_id=$CIWORKSPACE_id
echo "export CIWORKSPACE_id=$CIWORKSPACE_id" >> ~/.bashrc
export CIWORKSPACE_keyFile=$CIWORKSPACE_keyFile
echo "export CIWORKSPACE_keyFile=$CIWORKSPACE_keyFile" >> ~/.bashrc
export MDSD_FLUENT_SOCKET_PORT="29230"
echo "export MDSD_FLUENT_SOCKET_PORT=$MDSD_FLUENT_SOCKET_PORT" >> ~/.bashrc
fi
source ~/.bashrc

dpkg -l | grep mdsd | awk '{print $2 " " $3}'

if [ "${CONTAINER_TYPE}" == "PrometheusSidecar" ]; then
echo "starting mdsd with mdsd-port=26130, fluentport=26230 and influxport=26330 in legacy auth mode in sidecar container..."
if [ "${CONTAINER_TYPE}" == "PrometheusSidecar" ]; then
echo "starting mdsd with mdsd-port=26130, fluentport=26230 and influxport=26330 in sidecar container..."
#use tenant name to avoid unix socket conflict and different ports for port conflict
#roleprefix to use container specific mdsd socket
export TENANT_NAME="${CONTAINER_TYPE}"
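With this change `main.sh` chooses between two mdsd configurations based on `USING_AAD_MSI_AUTH`. A condensed sketch of only the values that differ between the two branches; `select_mdsd_auth_mode` is a hypothetical helper, and the other exports on each path (`ENABLE_MCS`, `customEnvironment`, `CIWORKSPACE_id`, and so on) are left out.

```shell
#!/bin/sh
# Sketch of the auth-mode switch in main.sh. Only the mdsd args, the
# mode flag, and the fluent socket port are modeled here.
# select_mdsd_auth_mode is a hypothetical helper; values mirror the diff.
select_mdsd_auth_mode() {
  if [ "$1" = "true" ]; then
    # AAD MSI auth path (main.sh also exports ENABLE_MCS etc. here).
    MDSD_AAD_MSI_AUTH_ARGS="-a -A"
    AAD_MSI_AUTH_MODE=true
    MDSD_FLUENT_SOCKET_PORT="28230"
  else
    # Legacy auth path: workspace id/key files, no extra mdsd args.
    MDSD_AAD_MSI_AUTH_ARGS=""
    AAD_MSI_AUTH_MODE=false
    MDSD_FLUENT_SOCKET_PORT="29230"
  fi
  export AAD_MSI_AUTH_MODE MDSD_FLUENT_SOCKET_PORT
}

select_mdsd_auth_mode "${USING_AAD_MSI_AUTH:-}"
echo "mode=$AAD_MSI_AUTH_MODE port=$MDSD_FLUENT_SOCKET_PORT args='$MDSD_AAD_MSI_AUTH_ARGS'"
```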
@@ -482,23 +514,23 @@ if [ "${CONTAINER_TYPE}" == "PrometheusSidecar" ]; then
source ~/.bashrc
mkdir /var/run/mdsd-${CONTAINER_TYPE}
# add -T 0xFFFF for full traces
mdsd -r ${MDSD_ROLE_PREFIX} -p 26130 -f 26230 -i 26330 -e ${MDSD_LOG}/mdsd.err -w ${MDSD_LOG}/mdsd.warn -o ${MDSD_LOG}/mdsd.info -q ${MDSD_LOG}/mdsd.qos &
else
echo "starting mdsd in legacy auth mode in main container..."
# add -T 0xFFFF for full traces
mdsd -e ${MDSD_LOG}/mdsd.err -w ${MDSD_LOG}/mdsd.warn -o ${MDSD_LOG}/mdsd.info -q ${MDSD_LOG}/mdsd.qos &
mdsd ${MDSD_AAD_MSI_AUTH_ARGS} -r ${MDSD_ROLE_PREFIX} -p 26130 -f 26230 -i 26330 -e ${MDSD_LOG}/mdsd.err -w ${MDSD_LOG}/mdsd.warn -o ${MDSD_LOG}/mdsd.info -q ${MDSD_LOG}/mdsd.qos &
else
echo "starting mdsd mode in main container..."
# add -T 0xFFFF for full traces
mdsd ${MDSD_AAD_MSI_AUTH_ARGS} -e ${MDSD_LOG}/mdsd.err -w ${MDSD_LOG}/mdsd.warn -o ${MDSD_LOG}/mdsd.info -q ${MDSD_LOG}/mdsd.qos &
fi

# no dependency on fluentd for prometheus side car container
if [ "${CONTAINER_TYPE}" != "PrometheusSidecar" ]; then
# no dependency on fluentd for prometheus side car container
if [ "${CONTAINER_TYPE}" != "PrometheusSidecar" ]; then
if [ ! -e "/etc/config/kube.conf" ]; then
echo "*** starting fluentd v1 in daemonset"
fluentd -c /etc/fluent/container.conf -o /var/opt/microsoft/docker-cimprov/log/fluentd.log --log-rotate-age 5 --log-rotate-size 20971520 &
else
echo "*** starting fluentd v1 in replicaset"
fluentd -c /etc/fluent/kube.conf -o /var/opt/microsoft/docker-cimprov/log/fluentd.log --log-rotate-age 5 --log-rotate-size 20971520 &
fi
fi
fi
fi

#If config parsing was successful, a copy of the conf file with replaced custom settings file is created
if [ ! -e "/etc/config/kube.conf" ]; then
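`MDSD_AAD_MSI_AUTH_ARGS` is expanded without quotes on both `mdsd` command lines. That is what lets `-a -A` become two separate arguments and lets an empty value vanish in legacy mode. A small demonstration, with a hypothetical `count_args` function standing in for `mdsd`:

```shell
#!/bin/sh
# Sketch: unquoted vs quoted expansion of MDSD_AAD_MSI_AUTH_ARGS.
# count_args is a hypothetical stand-in that prints its argument count.
count_args() {
  echo "$#"
}

MDSD_AAD_MSI_AUTH_ARGS="-a -A"
count_args ${MDSD_AAD_MSI_AUTH_ARGS} -e err.log     # prints 4
count_args "${MDSD_AAD_MSI_AUTH_ARGS}" -e err.log   # prints 3

MDSD_AAD_MSI_AUTH_ARGS=""
count_args ${MDSD_AAD_MSI_AUTH_ARGS} -e err.log     # prints 2
count_args "${MDSD_AAD_MSI_AUTH_ARGS}" -e err.log   # prints 3 (one empty arg)
```

Quoting the variable would pass `-a -A` as one argument, and in legacy mode would pass an empty argument to `mdsd`.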
@@ -635,7 +667,7 @@ echo "getting rsyslog status..."
service rsyslog status

shutdown() {
pkill -f mdsd
pkill -f mdsd
}

trap "shutdown" SIGTERM
4 changes: 2 additions & 2 deletions kubernetes/linux/setup.sh
@@ -9,8 +9,8 @@ sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen && \
dpkg-reconfigure --frontend=noninteractive locales && \
update-locale LANG=en_US.UTF-8

#install oneagent - Official bits (05/17/2021)
wget https://github.com/microsoft/Docker-Provider/releases/download/05172021-oneagent/azure-mdsd_1.10.1-build.master.213_x86_64.deb
#install oneagent - Official bits (06/24/2021)
wget https://github.com/microsoft/Docker-Provider/releases/download/06242021-oneagent/azure-mdsd_1.10.3-build.master.241_x86_64.deb

/usr/bin/dpkg -i $TMPDIR/azure-mdsd*.deb
cp -f $TMPDIR/mdsd.xml /etc/mdsd.d
72 changes: 66 additions & 6 deletions kubernetes/windows/main.ps1
@@ -43,17 +43,49 @@ function Start-FileSystemWatcher {

function Set-EnvironmentVariables {
$domain = "opinsights.azure.com"
$cloud_environment = "public"
$mcs_endpoint = "monitor.azure.com"
$cloud_environment = "azurepubliccloud"
if (Test-Path /etc/omsagent-secret/DOMAIN) {
# TODO: Change to omsagent-secret before merging
$domain = Get-Content /etc/omsagent-secret/DOMAIN
$cloud_environment = "national"
if (![string]::IsNullOrEmpty($domain)) {
if ($domain -eq "opinsights.azure.com") {
$cloud_environment = "azurepubliccloud"
$mcs_endpoint = "monitor.azure.com"
} elseif ($domain -eq "opinsights.azure.cn") {
$cloud_environment = "azurechinacloud"
$mcs_endpoint = "monitor.azure.cn"
} elseif ($domain -eq "opinsights.azure.us") {
$cloud_environment = "azureusgovernmentcloud"
$mcs_endpoint = "monitor.azure.us"
} elseif ($domain -eq "opinsights.azure.eaglex.ic.gov") {
$cloud_environment = "usnat"
$mcs_endpoint = "monitor.azure.eaglex.ic.gov"
} elseif ($domain -eq "opinsights.azure.microsoft.scloud") {
$cloud_environment = "ussec"
$mcs_endpoint = "monitor.azure.microsoft.scloud"
} else {
Write-Host "Invalid or Unsupported domain name $($domain). EXITING....."
exit 1
}
} else {
Write-Host "Domain name either null or empty. EXITING....."
exit 1
}
}

Write-Host "Log analytics domain: $($domain)"
Write-Host "MCS endpoint: $($mcs_endpoint)"
Write-Host "Cloud Environment: $($cloud_environment)"

# Set DOMAIN
[System.Environment]::SetEnvironmentVariable("DOMAIN", $domain, "Process")
[System.Environment]::SetEnvironmentVariable("DOMAIN", $domain, "Machine")

# Set MCS Endpoint
[System.Environment]::SetEnvironmentVariable("MCS_ENDPOINT", $mcs_endpoint, "Process")
[System.Environment]::SetEnvironmentVariable("MCS_ENDPOINT", $mcs_endpoint, "Machine")

# Set CLOUD_ENVIRONMENT
[System.Environment]::SetEnvironmentVariable("CLOUD_ENVIRONMENT", $cloud_environment, "Process")
[System.Environment]::SetEnvironmentVariable("CLOUD_ENVIRONMENT", $cloud_environment, "Machine")
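On Windows, `Set-EnvironmentVariables` now validates the domain read from `/etc/omsagent-secret/DOMAIN` and exits for empty or unsupported values; when that file is absent the public-cloud defaults are kept. In all five supported pairs the MCS endpoint is the domain with `opinsights.` replaced by `monitor.`. The shell sketch below relies on that prefix swap, which is an observation about the pairs rather than how `main.ps1` works (it spells each pair out); `domain_to_mcs_endpoint` is a hypothetical name.

```shell
#!/bin/sh
# Sketch (shell, not PowerShell) of the Windows domain validation.
# Supported domains "opinsights.<suffix>" map to "monitor.<suffix>".
# domain_to_mcs_endpoint is a hypothetical helper name.
domain_to_mcs_endpoint() {
  case "$1" in
    opinsights.azure.com | opinsights.azure.cn | opinsights.azure.us)
      # Strip the leading "opinsights." and prepend "monitor.".
      echo "monitor.${1#opinsights.}" ;;
    opinsights.azure.eaglex.ic.gov | opinsights.azure.microsoft.scloud)
      echo "monitor.${1#opinsights.}" ;;
    *)
      # Like main.ps1, reject empty or unsupported domains.
      echo "Invalid or Unsupported domain name $1" >&2
      return 1 ;;
  esac
}

domain_to_mcs_endpoint "opinsights.azure.us"
```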
@@ -158,7 +190,7 @@ function Set-EnvironmentVariables {
Write-Host $_.Exception
}
}

# Check if the fetched IKey was properly encoded. if not then turn off telemetry
if ($aiKeyFetched -match '^[A-Za-z0-9=]+$') {
Write-Host "Using cloud-specific instrumentation key"
@@ -229,6 +261,21 @@ function Set-EnvironmentVariables {
Write-Host "Failed to set environment variable HOSTNAME for target 'machine' since it is either null or empty"
}

# check if its AAD Auth MSI mode via USING_AAD_MSI_AUTH environment variable
$isAADMSIAuth = [System.Environment]::GetEnvironmentVariable("USING_AAD_MSI_AUTH", "process")
if (![string]::IsNullOrEmpty($isAADMSIAuth)) {
[System.Environment]::SetEnvironmentVariable("AAD_MSI_AUTH_MODE", $isAADMSIAuth, "Process")
[System.Environment]::SetEnvironmentVariable("AAD_MSI_AUTH_MODE", $isAADMSIAuth, "Machine")
Write-Host "Successfully set environment variable AAD_MSI_AUTH_MODE - $($isAADMSIAuth) for target 'machine'..."
}

# check if use token proxy endpoint set via USE_IMDS_TOKEN_PROXY_END_POINT environment variable
$useIMDSTokenProxyEndpoint = [System.Environment]::GetEnvironmentVariable("USE_IMDS_TOKEN_PROXY_END_POINT", "process")
if (![string]::IsNullOrEmpty($useIMDSTokenProxyEndpoint)) {
[System.Environment]::SetEnvironmentVariable("USE_IMDS_TOKEN_PROXY_END_POINT", $useIMDSTokenProxyEndpoint, "Process")
[System.Environment]::SetEnvironmentVariable("USE_IMDS_TOKEN_PROXY_END_POINT", $useIMDSTokenProxyEndpoint, "Machine")
Write-Host "Successfully set environment variable USE_IMDS_TOKEN_PROXY_END_POINT - $($useIMDSTokenProxyEndpoint) for target 'machine'..."
}
$nodeIp = [System.Environment]::GetEnvironmentVariable("NODE_IP", "process")
if (![string]::IsNullOrEmpty($nodeIp)) {
[System.Environment]::SetEnvironmentVariable("NODE_IP", $nodeIp, "machine")
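`main.ps1` copies `USING_AAD_MSI_AUTH` into `AAD_MSI_AUTH_MODE`, and promotes `USE_IMDS_TOKEN_PROXY_END_POINT` to machine scope, only when the process value is non-empty. The Linux script persists variables by exporting them and appending to `~/.bashrc`. The sketch below combines the two patterns; `persist_if_set` and `PROFILE_FILE` are hypothetical, and it writes to a temporary file rather than a real profile.

```shell
#!/bin/sh
# Sketch: "copy only if set" (as in main.ps1) combined with the
# export-and-append idiom main.sh uses for ~/.bashrc.
# persist_if_set and PROFILE_FILE are hypothetical names.
PROFILE_FILE="${PROFILE_FILE:-$(mktemp)}"

persist_if_set() {
  name="$1"
  value="$2"
  # Mirror [string]::IsNullOrEmpty: skip unset or empty values.
  if [ -n "$value" ]; then
    export "$name=$value"
    echo "export $name=$value" >> "$PROFILE_FILE"
    echo "Successfully set environment variable $name - $value"
  fi
}

persist_if_set AAD_MSI_AUTH_MODE "${USING_AAD_MSI_AUTH:-}"
persist_if_set USE_IMDS_TOKEN_PROXY_END_POINT "${USE_IMDS_TOKEN_PROXY_END_POINT:-}"
```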
@@ -427,7 +474,15 @@ function Start-Telegraf {
else {
Write-Host "Failed to set environment variable KUBERNETES_SERVICE_PORT for target 'machine' since it is either null or empty"
}

$nodeIp = [System.Environment]::GetEnvironmentVariable("NODE_IP", "process")
if (![string]::IsNullOrEmpty($nodeIp)) {
[System.Environment]::SetEnvironmentVariable("NODE_IP", $nodeIp, "machine")
Write-Host "Successfully set environment variable NODE_IP - $($nodeIp) for target 'machine'..."
}
else {
Write-Host "Failed to set environment variable NODE_IP for target 'machine' since it is either null or empty"
}

Write-Host "Installing telegraf service"
C:\opt\telegraf\telegraf.exe --service install --config "C:\etc\telegraf\telegraf.conf"

@@ -524,8 +579,13 @@ if (![string]::IsNullOrEmpty($requiresCertBootstrap) -and `
Bootstrap-CACertificates
}

Generate-Certificates
Test-CertificatePath
$isAADMSIAuth = [System.Environment]::GetEnvironmentVariable("USING_AAD_MSI_AUTH")
if (![string]::IsNullOrEmpty($isAADMSIAuth) -and $isAADMSIAuth.ToLower() -eq 'true') {
Write-Host "skipping agent onboarding via cert since AAD MSI Auth configured"
} else {
Generate-Certificates
Test-CertificatePath
}
Start-Fluent-Telegraf

# List all powershell processes running. This should have main.ps1 and filesystemwatcher.ps1
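The Ruby heartbeat check and the new `main.ps1` certificate gate both compare the flag case-insensitively (`downcase`, `ToLower()`), whereas `main.sh` tests `"${USING_AAD_MSI_AUTH}" == "true"` exactly. A sketch of how the two styles classify a few inputs; both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: exact match (main.sh style) vs case-insensitive match
# (main.ps1 / Ruby style) for USING_AAD_MSI_AUTH.
# linux_style_check and windows_style_check are hypothetical helpers.
linux_style_check() {
  [ "$1" = "true" ]
}

windows_style_check() {
  [ -n "$1" ] && [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

for value in true True TRUE false ""; do
  if linux_style_check "$value"; then l=aad; else l=legacy; fi
  if windows_style_check "$value"; then w=aad; else w=legacy; fi
  echo "USING_AAD_MSI_AUTH='$value' linux=$l windows=$w"
done
```

Under these checks, a value such as `True` would select AAD MSI auth on Windows but legacy auth on Linux.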