Deploy benchrunner to AWS #7
base: cinder/3.10
File contains the configuration Terraform needs to provision an EC2 t2.micro instance.
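A minimal sketch of such a configuration might look like the following. The region, AMI ID, and tag name are placeholders for illustration, not the values used in this PR:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1" # placeholder region
}

resource "aws_instance" "benchrunner" {
  ami           = "ami-00000000000000000" # placeholder AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "benchrunner" # placeholder tag
  }
}
```

Running `terraform init` followed by `terraform apply` against a file like this creates the instance; `terraform destroy` tears it down.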
Include Terraform state files, directories, and logs in .gitignore. Format Terraform configuration.
Add requirements.txt to create the new files. Update .gitignore to ignore venv files.
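The ignore entries these two commits describe are typically the standard Terraform and Python-venv patterns; the exact contents of the commit may differ, but a common version looks like:

```gitignore
# Terraform state files, directories, and logs
.terraform/
*.tfstate
*.tfstate.*
crash.log

# Python virtual environment
venv/
```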
Workflow file to be updated with steps to deploy benchrunner
Closes #4
Nice!

Self-Hosted Runner on AWS

I'll preface this by saying there's a lot of moving parts, so your questions are pretty on-point! I'm not sure what you already know though, so it might be better to discuss in real-time, but I'll try and summarize my understanding in case it helps.

Background

The Self-Hosted Runner is a server that needs to run in some environment, and we're choosing that environment to be AWS. The AWS secrets enable GitHub to automatically access that server on your behalf. There's a question of when and how that server can become available, especially because it's wasteful to keep it running when there's no work to do.

Options

Always On

For simplicity, you could manually create an AWS EC2 instance and use

Autoscaling

This approach lets GitHub communicate with AWS to automatically create and destroy EC2 instances based on configuration. There seems to be a fairly mature Terraform solution: https://github.com/philips-labs/terraform-aws-github-runner

Event-Based

The

Bare Metal

I wouldn't worry about cloud vs. bare-metal machines for now, since that should be easy to switch out once we get all the configurations coded up.
Thank you for summarizing, Johnston! The background you described matches what I understand as well.
This is also a question I have. The project requirements mention that the workflow should run nightly and be triggerable manually. In that case, it sounds like the event-based option is the best choice, since it starts up EC2 instances only while the workflow is running, so resources aren't wasted.
I know you mentioned the
When you mention switching out, do you mean that we are starting with the cloud machine first (AWS), but plan on having a bare-metal version available for the benchrunner? That's all the questions I have for now, and thanks again for taking a look at the PR!
Event-based sounds like a simple approach and seems like a reasonable implementation to target for the purposes of MLH. Eventually we might want to consider switching to autoscaling, depending on what we discover through implementation. It'd be a fantastic contribution if you're able to figure out tradeoffs for the options, in terms of ease of implementation/maintenance and cost. Consider that extra credit!
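For the event-based option, one commonly used building block is the third-party `machulav/ec2-github-runner` action, which starts an EC2-backed runner at the beginning of a workflow and stops it at the end. The sketch below is a hypothetical outline of that pattern, not this project's actual workflow; the AMI, subnet, security-group IDs, region, and secret names are all placeholders:

```yaml
name: benchrunner
on:
  schedule:
    - cron: "0 2 * * *"   # nightly run
  workflow_dispatch:       # manual trigger

jobs:
  start-runner:
    runs-on: ubuntu-latest
    outputs:
      label: ${{ steps.start.outputs.label }}
      instance-id: ${{ steps.start.outputs.ec2-instance-id }}
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1                 # placeholder region
      - id: start
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_PAT }}   # PAT with repo scope
          ec2-image-id: ami-00000000000000000   # placeholder AMI
          ec2-instance-type: t2.micro
          subnet-id: subnet-00000000            # placeholder
          security-group-id: sg-00000000        # placeholder

  benchmarks:
    needs: start-runner
    runs-on: ${{ needs.start-runner.outputs.label }}
    steps:
      - run: echo "run benchrunner here"        # placeholder step

  stop-runner:
    needs: [start-runner, benchmarks]
    runs-on: ubuntu-latest
    if: always()   # stop the instance even if benchmarks fail
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1                 # placeholder region
      - uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_PAT }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.instance-id }}
```

The `if: always()` on the stop job is what keeps this from leaking instances when the benchmark job fails.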
Yes, I'd prioritize a faster implementation for this manual approach (i.e. use systemd for manual testing and direct control to run commands/inspect system state). I see this PR as purely educational, to make it easier to see what's going on under the hood and inform the eventual implementation. We're going to have to make changes anyway once we migrate to a Terraform-based deployment. Manually integrating the full system and getting it to run end-to-end should help us understand how the pieces fit together and give ideas for how to debug them as we move onto the production implementation.
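For the systemd approach mentioned above, a unit file along these lines is the usual shape. This is a hypothetical sketch; the user, paths, and unit name are placeholders rather than anything checked into this repo:

```ini
# e.g. /etc/systemd/system/actions-runner.service (placeholder name)
[Unit]
Description=GitHub Actions self-hosted runner
After=network.target

[Service]
User=ubuntu                                      # placeholder user
WorkingDirectory=/home/ubuntu/actions-runner     # placeholder path
ExecStart=/home/ubuntu/actions-runner/run.sh
Restart=always

[Install]
WantedBy=multi-user.target
```

For manual testing and direct control, `sudo systemctl enable --now actions-runner` starts it, `systemctl status actions-runner` shows its state, and `journalctl -u actions-runner` shows its logs.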
I'm assuming that AWS has a (more expensive) option for running on bare metal vs VM. The configurations should pretty much be identical though, which is why I'm not too concerned about it while we're still performing this preliminary investigation. We'll eventually need to provision the production servers from our end within Meta. Ideally you can provide the mostly complete configuration that's tested against a VM, and we should be able to trivially tweak it to run on bare metal if any issues crop up.
Thanks for clarifying! I will look into using
Add Terraform configuration to benchrunner branch
Workflow will run on push instead of manually.
Add key pair for SSH access to instance. Add resources for traffic rules.
Existing key pair referenced in a separate resource instead of directly in instance resource.
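A rough sketch of what these two commits describe — a standalone key-pair resource, security-group rules for traffic, and the instance referencing both — might look like the following. The names, key path, AMI, and CIDR are illustrative placeholders, not the repo's actual values:

```hcl
# Key pair in its own resource, referenced by name from the instance
resource "aws_key_pair" "benchrunner" {
  key_name   = "benchrunner-key"              # placeholder name
  public_key = file("~/.ssh/benchrunner.pub") # placeholder path
}

# Traffic rules: allow inbound SSH, allow all outbound
resource "aws_security_group" "benchrunner" {
  name = "benchrunner-sg" # placeholder name

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # tighten to a trusted CIDR in practice
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "benchrunner" {
  ami                    = "ami-00000000000000000" # placeholder AMI
  instance_type          = "t2.micro"
  key_name               = aws_key_pair.benchrunner.key_name
  vpc_security_group_ids = [aws_security_group.benchrunner.id]
}
```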
Key pair resource does not match how the SSH key is created.
During initialization, a test script is run. Update workflow.
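The usual Terraform/EC2 mechanism for running a script during initialization is the instance's `user_data`, which executes once on first boot. A hypothetical fragment (the script body here is a placeholder, not the actual test script):

```hcl
resource "aws_instance" "benchrunner" {
  ami           = "ami-00000000000000000" # placeholder AMI
  instance_type = "t2.micro"

  # Runs once on first boot as root
  user_data = <<-EOF
    #!/bin/bash
    echo "benchrunner init test" > /tmp/init-test.txt
  EOF
}
```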
Overview
Initially I added requirements to get the benchrunner running, based on Faster CPython's documentation.
Since the self-hosted runner is not set up yet, I switched to getting it set up with GitHub Actions.
CURRENT STATUS: I set up a workflow and configured AWS credentials with GitHub Actions, but I'm unsure how to run something on my own EC2 instance through this workflow.
Any resources to help point me in the right direction would be appreciated!
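For context on the question above, one common pattern is to SSH from the GitHub-hosted runner into the EC2 instance inside a workflow step. This is a hypothetical sketch, not a recommendation from the maintainers; the secret names, user, and host are placeholders:

```yaml
- name: Run command on EC2 instance
  run: |
    # Write the private key from a repository secret (placeholder name)
    echo "${{ secrets.EC2_SSH_KEY }}" > key.pem
    chmod 600 key.pem
    # Placeholder user/host; the command is just a connectivity check
    ssh -o StrictHostKeyChecking=no -i key.pem \
      ubuntu@${{ secrets.EC2_HOST }} "echo hello from the workflow"
```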