Skip to content

Guides and FAQ for AMD Accelerator Cloud Users

License

Notifications You must be signed in to change notification settings

sanjtrip/AMDAcceleratorCloudGuides

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Welcome to AMD Accelerator Cloud (AAC) Reference Documentation

Getting Started

Contact your AMD Sponsor to sign up for access to AMD Accelerator Cloud resources.

How to Login to the Web Interface Login

Go to https://aac.amd.com

How to SSH to the Plano Slurm Cluster

  1. From the laptop or system used to generate SSH keys for AAC User Account Registration, enter the following at the Terminal or PowerShell prompt to SSH to the AAC Plano Slurm cluster:
    ssh <your_userid>@aac1.amd.com
    
    The SSH keys should be accessible under $HOME/.ssh directory or via PuTTy or Mobaterm tools used to generate the SSH keys

Contacting AAC Support Team

For questions or support requests, please email them to [email protected]


July 13-14th AAC Maintenance Update Notice for AAC Users


The AAC Web Interface has moved to https://aac.amd.com

1. How to Fix ssh <USERID>@aac1.amd.com failed with Host key verification failed message

The Slurm login node was changed during maintenance, so the host key fingerprint is different. SSH users may see a failure to login with a WARNING message such as one shown below. Please update $HOME/.ssh/known_hosts by removing the existing entries for dell-r08-01 and retry ssh <USERID>@aac1.amd.com and accept the new fingerprints:

ECDSA key fingerprint is SHA256:u1u0/uh0GLcs19KNHrmZIA6EDLMvJACK5y2fMkVg1fg.
ECDSA key fingerprint is MD5:76:6a:a4:34:56:c0:04:fa:7f:84:e6:85:0b:f1:65:e5.

Solution:

  1. First remove existing fingerprint of old Slurm login host: ssh-keygen -R aac1.amd.com
  2. Login to the AAC Plano Slurm cluster: ssh <USERID>@aac1.amd.com
  3. Enter "yes" to accept new host key fingerprint at the prompt to continue to login.

2. How to Fix The selected queue is no longer available error

lhkcojnlehhnkkck

Solution:

The Slurm partition/queue names were changed during the maintenance to remove duplicate queue names and standardize on one set. The new partitition names can be used to allocate single node or a multi-node cluster using the Slurm commands.

1CN128C8G2H_2IB_MI210_RHEL9
1CN128C8G2H_2IB_MI210_RHEL8
1CN128C8G2H_2IB_MI210_SLES15
1CN128C8G2H_2IB_MI210_Ubuntu22
1CN96C8G1H_4IB_MI250_Ubuntu22

About

Guides and FAQ for AMD Accelerator Cloud Users

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published