Improved Installation / Prepare Script #160
This reverts commit fa942ce.
Initial testing was not successful, so please don't merge yet, @larsll. I need to do some investigation into what's causing it. Ideas?
What are you trying to do? (Platform, action, etc.)
Need to test several cases:
Trying to manually run from the created instance:
I assume some of the config that currently works no longer works with your slimmed-down changes, @larsll. Relevant bits of the code that work with the current main branch:
Creation of the AMI occurs here: https://github.com/aws-deepracer-community/deepracer-on-the-spot/blob/main/scripts/image-builder.yaml, which mainly deals with prepare and install. Perhaps I now need to add back some prerequisites you've removed?
Then, when the instance runs, this is the bit of code that runs when you're trying to use OpenGL (https://github.com/aws-deepracer-community/deepracer-on-the-spot/blob/main/spot-instance.yaml):
Hmm, that piece of code is a bit of a mystery. In my test with GPU + OpenGL, you only need to run setup-xorg.sh and start-xorg.sh; the rest seems to be pieces copied together from prepare.sh and those scripts. (There have also been changes to how xorg starts; have a look at the updated scripts.)
It's code that I added to get OpenGL to work; before I added it, OpenGL did not work. That could be because the approach Tyler took in the original set-up of DOTS is to run a bunch of things up front to 'bake' an AMI (where prepare and init are run) to speed up deployment, which means that when you deploy the instance you still run a few things specifically on that new instance. It's part of this wider code that runs on the new instance from the AMI:
#!/bin/bash
I'll have to do some further testing. Also, how long does the install take, and does it still need reboots now that you've stripped it back? Perhaps we could do away with the AMI approach and just run from a fresh Ubuntu if it only adds a short amount of time at startup (the AMI approach was designed to reduce the time from creating an instance to training starting).
The error when using OpenGL relates to running the ./utils/setup-xorg.sh script; output below:
Reading package lists... Done
ERROR: Unable to query GPU information
nvidia-xconfig: option "--busid" requires an argument.
Invalid commandline, please run
The BUS_ID var in the script is not being set. Running the command that sets the BUS_ID var results in:
The EC2 instance does have a GPU :-), it's a g4dn.2xlarge I've tested on. Looking through the PR, I noticed this and thought it might be related to not being able to find the GPU info:
After running that install line, I can now get back the GPU info:
So it appears the problem is that the updated DRfC code isn't correctly detecting the GPU on the EC2 instance and running the code to install the nvidia drivers, @larsll?
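For anyone hitting the same "--busid requires an argument" failure, here is a minimal sketch of how the bus ID lookup could be guarded so that it fails early with a clearer message. This is not the actual setup-xorg.sh code: the function names `extract_bus_id` and `require_bus_id` are hypothetical, and the parsing assumes the query output contains a line shaped like `PCI BusID : PCI:0:30:0`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read GPU query text on stdin and print the first
# PCI bus ID found. Assumes a line of the form "  PCI BusID : PCI:0:30:0".
extract_bus_id() {
  grep "PCI BusID" \
    | head -n 1 \
    | cut -d: -f2- \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Hypothetical guard: print the bus ID if it is non-empty, otherwise
# report the problem on stderr and return a non-zero status.
require_bus_id() {
  local bus_id="$1"
  if [ -z "$bus_id" ]; then
    echo "No GPU bus ID detected; is the NVIDIA driver installed?" >&2
    return 1
  fi
  echo "$bus_id"
}

# Possible use on a GPU instance (left commented out so the sketch runs anywhere):
#   BUS_ID=$(nvidia-xconfig --query-gpu-info 2>/dev/null | extract_bus_id)
#   require_bus_id "$BUS_ID" >/dev/null || exit 1
```

With a guard like this, an empty BUS_ID stops the script before nvidia-xconfig is called with a missing argument, which points more directly at the driver not being installed.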
GPU for robomaker and sagemaker on OpenGL now works, having removed the old nvidia driver install and replaced it with an up-to-date one:
GPU for sagemaker, CPU for robomaker with OpenGL also now works:
Think we're good to go, @larsll
Significantly simplified prepare script with the following improvements: