diff --git a/eng/common/testproxy/transition-scripts/README.md b/eng/common/testproxy/transition-scripts/README.md new file mode 100644 index 000000000..862d22ed6 --- /dev/null +++ b/eng/common/testproxy/transition-scripts/README.md @@ -0,0 +1,131 @@ +# Transitioning recording assets from language repositories into + +## Setting some context + +The azure-sdk monorepos are growing quickly due to the presence of recordings. Due to this, the engineering system team has been tasked with providing a mechanism that allows recordings to live _elsewhere_. The actual implementation of this goal is already present within the `test-proxy` tool, and this document reflects how to TRANSITION to storing recordings elsewhere! + +The script `generate-assets-json.ps1` will execute the initial migration of your recordings from within a language repo to the [assets repo](https://github.com/Azure/azure-sdk-assets) and create the assets.json file for those assets. + +The script is [generate-assets-json.ps1](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/testproxy/transition-scripts/generate-assets-json.ps1) + +### Download the transition script locally + +```powershell +Invoke-WebRequest -OutFile "generate-assets-json.ps1" https://raw.githubusercontent.com/Azure/azure-sdk-tools/main/eng/common/testproxy/transition-scripts/generate-assets-json.ps1 +``` + +```bash +wget https://raw.githubusercontent.com/Azure/azure-sdk-tools/main/eng/common/testproxy/transition-scripts/generate-assets-json.ps1 -O generate-assets-json.ps1 +``` + +## Setup + +Before running the script, understand that **only services that have migrated to use the `test-proxy` as their record/playback solution can store recordings into the external assets repository.** The test-proxy itself contains the code for `restoring`/`push`ing recordings, so if it is NOT being used for record/playback, that work must be completed before recordings can be moved. 
+ +Running the script has these base requirements. + +- [x] The targeted library is already migrated to use the test-proxy. +- [x] Git version `>2.25.0` needs to be on the machine and in the path. Git is used by the script and test-proxy. +- [x] [Powershell Core](https://learn.microsoft.com/powershell/scripting/install/installing-powershell?view=powershell-7.2) at least version 7. +- [x] Ensure global git config settings for `user.name` and `user.email` are updated. [Reference](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup) + - Override with environment variables `GIT_COMMIT_EMAIL` and `GIT_COMMIT_OWNER`. If either of these is set, it overrides the corresponding default value pulled from `git config --global`. + +Once the above requirements are met, developers are welcome to choose one of the following paths. + +### `test-proxy` dotnet tool installed and called directly + +Pass `test-proxy` as the `TestProxyExe` argument or leave it **blank**. This is the default use-case of this transition script. + +- [x] Test-proxy needs to be on the machine and in the path. Instructions for that are [here](https://github.com/Azure/azure-sdk-tools/blob/main/tools/test-proxy/Azure.Sdk.Tools.TestProxy/README.md#installation). + +The newly installed test-proxy tool will be used during the recording migration portion of this script. + +### `docker` or `podman` invocation + +To utilize this methodology, the user must set input argument `TestProxyExe` to `docker` or `podman`. + +Other requirements: + +- [x] Install [docker](https://docs.docker.com/engine/install/) or [podman](https://podman.io/getting-started/installation.html) +- [x] Set the environment variable `GIT_TOKEN` to a valid token representing YOUR user + +## Permissions + +Check your GitHub group membership. If you are part of the group `azure-sdk-write` directly or through a sub-team, you have the necessary permissions to create tags in the assets repository. 
+ +You will not be able to clean them up however. There exists [planned work](https://github.com/Azure/azure-sdk-tools/issues/4298) to clean up unused assets repo tags. Erroneously pushed tags will be auto cleaned. + +## Nomenclature + +- `language` repo - An individual language repository eg. azure-sdk-for-python or azure-sdk-for-net etc. +- `assets` repo - The repository where assets are being moved to. + +The `test-proxy` tool is integrated with the ability to automatically restore these assets. This process is kick-started by the presence of an `assets.json` alongside a dev's actual code. This means that while assets will be cloned down externally, the _map_ to those assets will be stored alongside the tests. Normally, it is recommended to create an `assets.json` under the path `sdk/`. However, more granular storage is also possible. + +Service/Package-Level examples: + +- `sdk/storage/assets.json` +- `sdk/storage/azure-storage-file-datalake/assets.json` + +The location of the actual test code is referred to as the `language repo`. + +The location of the automatically restored assets is colloquially referred to as the `assets repo`. There is an individual `assets repo` cloned for **each `assets.json` in the language repo.** + +## Running the script + +[generate-assets-json.ps1](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/testproxy/transition-scripts/generate-assets-json.ps1) is a standalone powershell script with no supporting script requirements. The easiest way to run the script would be to use a one-liner [defined above](#download-the-transition-script-locally) to grab the file directly. **Please ensure you have the newest version of this script before continuing!** + +```powershell +# if downloading the file singly, cd to the directory containing generate-assets-json.ps1 +cd "/sdk/" +/generate-assets-json.ps1 +``` + +The script needs to be executed inside an `sdk/` or deeper and from within an up to date language repository. 
A good rule here is to look at where the ci.yml is for a service directory. In the case where each library for a given service directory has its own pipeline, at the `sdk//` level, it is recommended that the assets.json be created there. If the `ci.yml` exists deeper than the `sdk//` level, then it is recommended to run the script from that directory. + +```powershell +# calling transition script against tool, given local clones of azure-sdk-for-java and azure-sdk-tools +cd c:/src/azure-sdk-for-java/sdk/attestation +/generate-assets-json.ps1 -InitialPush +``` + +```powershell +# calling transition script against docker, given local clones of azure-sdk-for-java and azure-sdk-tools +$env:GIT_TOKEN="my git token" +cd c:/src/azure-sdk-for-java/sdk/attestation +/generate-assets-json.ps1 -TestProxyExe "docker" -InitialPush +``` + +After running the script, executing a `git status` from within the language repo, where the script was invoked from, will reflect two primary results: + +- A new `assets.json` present in the directory from which the transition script was invoked. +- A **bunch** of deleted files from where the recordings _were_ before they were pushed to the assets repo. + +Running the script without the `-InitialPush` option will just create the assets.json with an empty tag. No data is moved. + +### What's the script doing behind the scenes? + +Given the previous example of `sdk/attestation` transition script invocation, users should see the following: + +- Creation of the assets.json file in the `sdk/attestation` directory. + - If `-InitialPush` has not been specified, the script stops here and exits. +- test-proxy's CLI restore is called on the current assets.json. Since there's nothing there, it'll just initialize an empty assets directory under the `.assets` directory under repo root. +- The recordings are moved from their initial directories within the language repo into the assets directory that was initialized in the previous step. 
+ - The relative paths from root are preserved. + - For example, the recordings for `C:/src/azure-sdk-for-python/sdk/tables` live in the `azure-data-tables/tests/recordings` subdirectory and in the target repository they'll live in `python/sdk/tables/azure-data-tables/tests/recordings`. All the azure-sdk supported languages will leverage [Azure/azure-sdk-assets](https://github.com/Azure/azure-sdk-assets), so adding the `python` prefix to the output path ensures that these recordings can live alongside others in the assets repo. +- Call `test-proxy push` on the assets.json created in the first step. The push will happen automatically and not require a manual PR. + - On completion of the push, the newly created tag will be stamped into the assets.json. + +At this point the script is complete. The assets.json and deleted recording files will need to be pushed into the language repository as a manual PR. + +#### Why does the script analyze the remotes to compute the language? + +This is necessary because the language is used in several places. + +1. The AssetsRepoPrefixPath in assets.json is set to the language. +2. The TagPrefix is set to the `/` or `//` etc. +3. The language is also used to determine what the [recording directories within a repository are named](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/testproxy/transition-scripts/generate-assets-json.ps1#L47). + +## A final note about the initial push + +If a directory with several thousand recordings is being migrated, the move and the initial push can take several minutes. For example, java storage recordings were used as a stress test. There are 4,693 files, with a combined size of 666 MB, and the initial push took about 7 minutes. This is a one-time cost as the files do not exist yet within the assets repository. Subsequent pushes should be dramatically faster. 
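+
+For reference, here is a minimal sketch of what the generated `assets.json` might look like right after the `sdk/attestation` example is run inside azure-sdk-for-java without `-InitialPush`. The values are inferred from how the script computes each field, not copied from a real run, so treat them as illustrative.
+
+```json
+{
+  "AssetsRepo": "Azure/azure-sdk-assets",
+  "AssetsRepoPrefixPath": "java",
+  "TagPrefix": "java/attestation",
+  "Tag": ""
+}
+```
+
+After a successful `-InitialPush`, only the `Tag` value is expected to change, since the push stamps the newly created tag into the file.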
diff --git a/eng/common/testproxy/transition-scripts/generate-assets-json.ps1 b/eng/common/testproxy/transition-scripts/generate-assets-json.ps1 new file mode 100644 index 000000000..2b24bdcb1 --- /dev/null +++ b/eng/common/testproxy/transition-scripts/generate-assets-json.ps1 @@ -0,0 +1,401 @@ +<# +.SYNOPSIS +Create a default assets.json for a given ServiceDirectory or deeper. + +.DESCRIPTION +Requirements: +1. git will need to be in the path. +2. This script will need to be run locally in an azure-sdk-for- repository. Further, this +needs to be run at an sdk/ or deeper. For example sdk/core if the assets.json is +being created at the service directory level or sdk/core/ if the assets.json is being +created at the library level. A good rule here would be to run this in the same directory where the ci.yml +file lives. For most this is the sdk/, but some services place a ci.yml alongside each package. +In that case, the assets.json should live alongside the ci.yml in the sdk// directory. + +Generated assets.json file contents +- AssetsRepo: "Azure/azure-sdk-assets" - This is the assets repository, aka where your recordings will live after this script runs. +- AssetsRepoPrefixPath: "" - this will be computed from the repository it's being run in. +- TagPrefix: "/" or "//" or deeper if things + are nested in such a manner. All tags created for this assets.json will start with this name. +- Tag: "" - Initially empty, as nothing has yet been pushed. + +If flag InitialPush is set, recordings will be automatically pushed to the assets repo and the Tag property updated. + +.PARAMETER TestProxyExe +The executable used during the "InitialPush" action. Defaults to the dotnet tool test-proxy, but also supports "docker" or "podman". + +If the user provides their own value that doesn't match options "test-proxy", "docker", or "podman", the script will use this input as the test-proxy exe +when invoking commands. 
EG "$TestProxyExe push -a sdk/keyvault/azure-keyvault-keys/assets.json." + +.PARAMETER InitialPush +Pass this flag to automagically move all recordings found UNDER your assets.json to an assets repo. + +Detailed process: +- Create a temp directory. +- Call "restore" against that assets directory to prepare it to receive updates. +- Move all recordings found under the assets.json within the language repo to the assets directory prepared by the restore operation in the previous step. +- Push moved recordings to the assets repo. +- Update the assets.json with the new tag. + +.PARAMETER UseTestRepo +Enabling this parameter will result in an assets.json that points at repo Azure/azure-sdk-assets-integration. This is the +integration repo that the azure-sdk EngSys team uses to integration test this script and other asset-sync features. + +Most library devs should ignore this setting unless directed otherwise (or if they're curious!). Permissions to the integration +repo are identical to the default assets repo. + +#> +param( + [Parameter(Mandatory = $false)] + [string] $TestProxyExe = "test-proxy", + [switch] $InitialPush, + [switch] $UseTestRepo +) + +# Git needs to be in the path to determine the language and, if the initial push +# is being performed, for the CLI commands to work +$GitExe = "git" + +# The built test proxy on a dev machine will have the version 1.0.0-dev.20221013.1 +# whereas the one installed from nuget will have the version 20221013.1 (minus the 1.0.0-dev.) +$MinTestProxyVersion = "20221017.4" + +$DefaultAssetsRepo = "Azure/azure-sdk-assets" +if ($UseTestRepo) { + $DefaultAssetsRepo = "Azure/azure-sdk-assets-integration" + Write-Host "UseTestRepo was true, setting default repo to $DefaultAssetsRepo" +} + +# Unsure of the following language recording directories: +# 1. android +# 2. c +# 3. 
ios +$LangRecordingDirs = @{"cpp" = "recordings"; + "go" = "recordings"; + "java" = "session-records"; + "js" = "recordings"; + "net" = "SessionRecords"; + "python" = "recordings"; +}; + +class Assets { + [string]$AssetsRepo = $DefaultAssetsRepo + [string]$AssetsRepoPrefixPath = "" + [string]$TagPrefix = "" + [string]$Tag = "" + Assets( + [string]$AssetsRepoPrefixPath, + [string]$TagPrefix + ) { + $this.TagPrefix = $TagPrefix + $this.AssetsRepoPrefixPath = $AssetsRepoPrefixPath + } +} + +class Version { + [int]$Year + [int]$Month + [int]$Day + [int]$Revision + Version( + [string]$VersionString + ) { + if ($VersionString -match "(?<year>20\d{2})(?<month>\d{2})(?<day>\d{2}).(?<revision>\d+)") { + $this.Year = [int]$Matches["year"] + $this.Month = [int]$Matches["month"] + $this.Day = [int]$Matches["day"] + $this.Revision = [int]$Matches["revision"] + } + else { + # This should be a Write-Error however powershell apparently cannot utilize that + # in the constructor in certain cases + Write-Warning "Version String '$($VersionString)' is invalid and cannot be parsed" + exit 1 + } + } + [bool] IsGreaterEqual([string]$OtherVersionString) { + [Version]$OtherVersion = [Version]::new($OtherVersionString) + if ($this.Year -lt $OtherVersion.Year) { + return $false + } + elseif ($this.Year -eq $OtherVersion.Year) { + if ($this.Month -lt $OtherVersion.Month) { + return $false + } + elseif ($this.Month -eq $OtherVersion.Month) { + if ($this.Day -lt $OtherVersion.Day) { + return $false + } + elseif ($this.Day -eq $OtherVersion.Day) { + if ($this.Revision -lt $OtherVersion.Revision) { + return $false + } + } + } + } + return $true + } +} + +Function Test-Exe-In-Path { + Param([string] $ExeToLookFor) + if ($null -eq (Get-Command $ExeToLookFor -ErrorAction SilentlyContinue)) { + Write-Error "Unable to find $ExeToLookFor in your PATH" + exit 1 + } +} + +Function Test-TestProxyVersion { + param( + [string] $TestProxyExe + ) + + Write-Host "$TestProxyExe --version" + [string] $output = & "$TestProxyExe" --version 
+ + [Version]$CurrentProxyVersion = [Version]::new($output) + if (!$CurrentProxyVersion.IsGreaterEqual($MinTestProxyVersion)) { + Write-Error "$TestProxyExe version, $output, is less than the minimum version $MinTestProxyVersion" + Write-Error "Please refer to https://github.com/Azure/azure-sdk-tools/blob/main/tools/test-proxy/Azure.Sdk.Tools.TestProxy/README.md#installation to upgrade your $TestProxyExe" + exit 1 + } +} + +Function Get-Repo-Language { + + $GitRepoOnDiskErr = "This script can only be called from within an azure-sdk-for- repository on disk." + # Git remote -v is going to give us output similar to the following + # origin git@github.com:Azure/azure-sdk-for-java.git (fetch) + # origin git@github.com:Azure/azure-sdk-for-java.git (push) + # upstream git@github.com:Azure/azure-sdk-for-java (fetch) + # upstream git@github.com:Azure/azure-sdk-for-java (push) + # We're really only trying to get the language from the git remote + Write-Host "git remote -v" + [array] $remotes = & git remote -v + foreach ($line in $remotes) { + Write-Host "$line" + } + + # Git remote -v returned "fatal: not a git repository (or any of the parent directories): .git" + # and the list of remotes will be null + if (-not $remotes) { + Write-Error $GitRepoOnDiskErr + exit 1 + } + + # The regular expression needed to be updated to handle the following types of input: + # origin git@github.com:Azure/azure-sdk-for-python.git (fetch) + # origin git@github.com:Azure/azure-sdk-for-python-pr.git (fetch) + # fork git@github.com:UserName/azure-sdk-for-python (fetch) + # azure-sdk https://github.com/azure-sdk/azure-sdk-for-net.git (fetch) + # origin https://github.com/Azure/azure-sdk-for-python/ (fetch) + # ForEach-Object splits the string on whitespace so each of the above strings is actually + # 3 different strings. The first and last pieces won't match anything, the middle string + # will match what is below. 
If the regular expression needs to be updated, the + # link below goes to a regex playground + # https://regex101.com/r/auOnAr/1 + $lang = $remotes[0] | ForEach-Object { if ($_ -match "azure-sdk-for-(?<lang>[^\-\.\/ ]+)") { + #Return the named language match + return $Matches["lang"] + } + } + + if ([String]::IsNullOrWhitespace($lang)) { + Write-Error $GitRepoOnDiskErr + exit 1 + } + + Write-Host "Current language=$lang" + return $lang +} + +Function Get-Repo-Root { + [string] $currentDir = Get-Location + # -1 to strip off the trailing directory separator + return $currentDir.Substring(0, $currentDir.LastIndexOf("sdk") - 1) +} + +Function New-Assets-Json-File { + param( + [Parameter(Mandatory = $true)] + [string] $Language + ) + $AssetsRepoPrefixPath = $Language + + [string] $currentDir = Get-Location + + $sdkDir = "$([IO.Path]::DirectorySeparatorChar)sdk$([IO.Path]::DirectorySeparatorChar)" + + # if we're not in a /sdk/ or deeper then this script isn't + # being run in the right place + if (-not $currentDir.contains($sdkDir)) { + Write-Error "This script needs to be run at an sdk/ or deeper." + exit 1 + } + + $TagPrefix = $currentDir.Substring($currentDir.LastIndexOf("sdk") + 4) + $TagPrefix = $TagPrefix.Replace("\", "/") + $TagPrefix = "$($AssetsRepoPrefixPath)/$($TagPrefix)" + [Assets]$Assets = [Assets]::new($AssetsRepoPrefixPath, $TagPrefix) + + $AssetsJson = $Assets | ConvertTo-Json + + $AssetsFileName = Join-Path -Path $currentDir -ChildPath "assets.json" + Write-Host "Writing file $AssetsFileName with the following contents" + Write-Host $AssetsJson + $Assets | ConvertTo-Json | Out-File $AssetsFileName + + return $AssetsFileName +} + +# Invoke the proxy command and echo the output. 
+Function Invoke-ProxyCommand { + param( + [string] $TestProxyExe, + [string] $CommandArgs, + [string] $TargetDirectory + ) + $updatedDirectory = $TargetDirectory.Replace("`\", "/") + + if ($TestProxyExe -eq "docker" -or $TestProxyExe -eq "podman"){ + $token = $env:GIT_TOKEN + $committer = $env:GIT_COMMIT_OWNER + $email = $env:GIT_COMMIT_EMAIL + + if (-not $committer) { + $committer = & git config --global user.name + } + + if (-not $email) { + $email = & git config --global user.email + } + + if(-not $token -or -not $committer -or -not $email){ + Write-Error ("When running this transition script in `"docker`" or `"podman`" mode, " ` + + "the environment variables GIT_TOKEN, GIT_COMMIT_OWNER, and GIT_COMMIT_EMAIL must be set to reflect the appropriate user. ") + exit 1 + } + + $targetImage = if ($env:TRANSITION_SCRIPT_DOCKER_TAG) { $env:TRANSITION_SCRIPT_DOCKER_TAG } else { "azsdkengsys.azurecr.io/engsys/test-proxy:latest" } + + $CommandArgs = @( + "run --rm --name transition.test.proxy", + "-v `"${updatedDirectory}:/srv/testproxy`"", + "-e `"GIT_TOKEN=${token}`"", + "-e `"GIT_COMMIT_OWNER=${committer}`"", + "-e `"GIT_COMMIT_EMAIL=${email}`"", + $targetImage, + "test-proxy", + $CommandArgs + ) -join " " + } + + Write-Host "$TestProxyExe $CommandArgs" + [array] $output = & "$TestProxyExe" $CommandArgs.Split(" ") --storage-location="$updatedDirectory" + # echo the command output + foreach ($line in $output) { + Write-Host "$line" + } +} + +# Get the shorthash directory under PROXY_ASSETS_FOLDER +Function Get-AssetsRoot { + param( + [string] $AssetsJsonFile + ) + $repoRoot = Get-Repo-Root + $relPath = [IO.Path]::GetRelativePath($repoRoot, $AssetsJsonFile).Replace("`\", "/") + $assetsJsonDirectory = Split-Path $relPath + $breadcrumbFile = Join-Path $repoRoot ".assets" ".breadcrumb" + + $breadcrumbString = Get-Content $breadcrumbFile | Where-Object { $_.StartsWith($relPath) } + $assetRepo = $breadcrumbString.Split(";")[1] + $assetsPrefix = (Get-Content $AssetsJsonFile | 
Out-String | ConvertFrom-Json).AssetsRepoPrefixPath + + return Join-Path $repoRoot ".assets" $assetRepo $assetsPrefix $assetsJsonDirectory +} + +Function Move-AssetsFromLangRepo { + param( + [string] $AssetsRoot + ) + $filter = $LangRecordingDirs[$language] + Write-Host "Language recording directory name=$filter" + Write-Host "Get-ChildItem -Recurse -Filter ""*.json"" | Where-Object { `$_.DirectoryName.Split([IO.Path]::DirectorySeparatorChar) -contains ""$filter"" }" + $filesToMove = Get-ChildItem -Recurse -Filter "*.json" | Where-Object { $_.DirectoryName.Split([IO.Path]::DirectorySeparatorChar) -contains "$filter" } + [string] $currentDir = Get-Location + + foreach ($fromFile in $filesToMove) { + $relPath = [IO.Path]::GetRelativePath($currentDir, $fromFile) + + $toFile = Join-Path -Path $AssetsRoot -ChildPath $relPath + # Write-Host "Moving from=$fromFile" + # Write-Host " to=$toFile" + $toPath = Split-Path -Path $toFile + + Write-Host $toFile + if (!(Test-Path $toPath)) { + New-Item -Path $toPath -ItemType Directory -Force | Out-Null + } + Move-Item -LiteralPath $fromFile -Destination $toFile -Force + } +} + +Test-Exe-In-Path -ExeToLookFor $GitExe +$language = Get-Repo-Language + +# If the initial push is being performed, ensure that test-proxy is +# in the path and that we're able to map the language's recording +# directories +if ($InitialPush) { + Test-Exe-In-Path -ExeToLookFor $TestProxyExe + + if ($TestProxyExe -eq "test-proxy") { + Test-TestProxyVersion -TestProxyExe $TestProxyExe + } + + if (!$LangRecordingDirs.ContainsKey($language)) { + Write-Error "The language, $language, does not have an entry in the LangRecordingDirs dictionary." + exit 1 + } +} + +$repoRoot = Get-Repo-Root + +# Create the assets-json file +$assetsJsonFile = New-Assets-Json-File -Language $language + +# If the initial push is being done: +# 1. Do a restore on the assetsJsonFile, it'll setup the directory that will allow a push to be done +# 2. 
Move all of the assets over, preserving the directory structure +# 3. Push the repository which will update the assets.json with the new Tag +if ($InitialPush) { + try { + $assetsJsonRelPath = [System.IO.Path]::GetRelativePath($repoRoot, $assetsJsonFile) + + # Execute a restore on the current assets.json, it'll prep the root directory that + # the recordings need to be copied into + $CommandArgs = "restore --assets-json-path $assetsJsonRelPath" + Invoke-ProxyCommand -TestProxyExe $TestProxyExe -CommandArgs $CommandArgs -TargetDirectory $repoRoot + + $assetsRoot = (Get-AssetsRoot -AssetsJsonFile $assetsJsonFile) + Write-Host "assetsRoot=$assetsRoot" + + Move-AssetsFromLangRepo -AssetsRoot $assetsRoot + + $CommandArgs = "push --assets-json-path $assetsJsonRelPath" + Invoke-ProxyCommand -TestProxyExe $TestProxyExe -CommandArgs $CommandArgs -TargetDirectory $repoRoot + + # Verify that the assets.json file was updated + $updatedAssets = Get-Content $assetsJsonFile | Out-String | ConvertFrom-Json + if ([String]::IsNullOrWhitespace($($updatedAssets.Tag))) { + Write-Error "AssetsJsonFile ($assetsJsonFile) did not have its tag updated. Check above output messages for erroneous git output." + exit 1 + } + } + catch { + $ex = $_ + Write-Host $ex + exit 1 + } +} diff --git a/eng/ignore-links.txt b/eng/ignore-links.txt index 4e99ac1e5..11ac39d5b 100644 --- a/eng/ignore-links.txt +++ b/eng/ignore-links.txt @@ -1 +1,2 @@ # Placeholder file for links that need exceptions from link checking +https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/testproxy/transition-scripts/generate-assets-json.ps1