Solr on ECS #36

Merged
merged 50 commits into from
Jun 1, 2022

@nickumia-reisys (Contributor) commented May 19, 2022

Related to GSA/data.gov#3826
Adds support for a standalone, reliable Solr instance on ECS with proper security and encryption.

New Additions:

  • New service plan solr-on-ecs
  • All the same features as SolrCloud on EKS

List of AWS Services:

  • VPC
    • Security Groups
    • NAT/Internet Gateway
    • Elastic IP
  • EC2
    • Load Balancers / Target Groups
  • Route53
    • Zone Hosting
    • CloudMap Private DNS
    • SSL Certificate Validation
  • CloudWatch
    • Log Groups
  • ACM
    • SSL Certificate Management
  • KMS
    • Customer-managed encryption key management
  • IAM
    • ECS Role Policies
    • EFS Communication Policies
    • Load Balancer Communication Policies
  • EFS
    • File Storage
    • Private VPC Mounting
  • ECS
    • Cluster Management
    • Service Management
    • Container Insights
    • Task Definitions
    • Service Discovery
    • EFS Mounting
    • VPC Networking

List of Terraform Providers:

- Solr running on ECS in Fargate
- Public IP (no load balancer)
- No authentication
- No persistent volumes
- Create IAM Role for ECS Task Role and ECS Execution Role (same role)
- Encrypt EFS Volume with key that is managed here
- Add EFS File Policy to ensure data transit is encrypted
- Optimize security groups for NFS communication from ECS to EFS
- Enable CloudWatch Logging from ECS Cluster and ECS Solr Task Definition
To avoid collisions between deployments, make each unique
- Use Load Balancer in front of ecs task
- Run all workloads in private subnets
- Only Load Balancer is public
- Enable SSL for Load Balancer domain
- Create custom domain (copy from eks brokerpak, dns, dnssec, et cetera)
- Update security groups for load balancer and ecs task
- Force HTTPS connection to solr
- For the load balancer to serve traffic, it needs to be in a public subnet.
- Everything else is in a private subnet and security groups are used to facilitate internal traffic
- Update name for vpc
- Disable deletion protection on the load balancer so that it can be deprovisioned properly
- Disable public IP for ecs task
- Fix the DNS Alias record to load balancer
- Use GSA Solr Image (with cyber bugfixes)
- Allow LB to talk to GHCR to pull image
- Refine the start command to allow the ckan core to be created automatically
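
For reference, here is a minimal sketch of the kind of EFS file system policy that the "ensure data transit is encrypted" commit above refers to. It is written in Python purely for illustration (the PR itself presumably sets this through its Terraform files), and the function name and the ARN in the usage line are hypothetical placeholders, not values from this repository. The policy shape follows the common AWS pattern of denying every action when `aws:SecureTransport` is false.

```python
import json


def efs_deny_insecure_transport_policy(file_system_arn: str) -> str:
    """Build an EFS file system policy that rejects non-TLS client connections.

    Assumption: a single Deny statement keyed on aws:SecureTransport is enough;
    the real policy in this PR may contain additional statements.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedTransport",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": "*",
                "Resource": file_system_arn,
                # Matches any request that did not arrive over TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder ARN, for illustration only.
    print(
        efs_deny_insecure_transport_policy(
            "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-00000000"
        )
    )
```

With a policy like this attached, an NFS mount that does not use TLS should be refused, which is why the ECS task has to mount the volume with encryption in transit enabled.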
@nickumia-reisys marked this pull request as draft May 19, 2022 20:41
The 'root_directory' config in an EFS access point allows a unique directory to be created with specific chmod/chown permissions. This is perfect for solr user ownership :)
- Fix IAM and Security Groups for LB Health Check Passing
- Fix a silent EFS mount issue (EFS would mount without error, but it was missing a security group rule to allow communication to the task)
- Do some work to ensure longer Broker names work (WIP)
- Temporarily allow all Egress until GHCR Image pull is fixed again..
- Lots of things were opened up to be less restrictive during testing.  Will iterate back to a more secure version since we have a working point now **wipes sweat**
- Don't use built-in EFS volume mount support, use a modified docker image with EFS-utils installed and manually mount volume during startup
- Add more permissions to IAM role
- Disable some restrictions on EFS file system policy
- ECS Service is mounting efs directly, so it needs to depend on the EFS volume so that it can unmount before EFS gets destroyed
- Restore EFS File System Policy to deny insecure connections
- Can mount /var/solr/data into a different directory on the EFS mount without an EFS access point :)
- Use a temporary password to create a randomly generated secure admin user/pass
- Output url/user/pass for user to see (hint: use 'terraform output -json' to see sensitive values)
Also, clean up some unused lines
This is a lot simpler since we are using the Solr URL API to add/remove users.  URL/USER/PASSWORD are inputs from the provisioning and the same previous outputs exist as outputs.
The authorization link was actually getting hit twice, this fixes that
NO KUBERNETES! ! ! !
Remove unused variables, update terraform files
Solr brokerpak is now specifying AWS Resources by itself and no longer depends on k8s; also, pass in aws creds
some variables are actually numbers, and the docker image needs a colon to separate the image from the tag
- Fix variables definitions in solr-cloud.yml
- Fix terraform version in brokerpak
- Fix reference to service/plan ids in Makefile
added missing parts of the commands
This might cause issues if it has a max length limit, but will see and fix later
Need to revive the original service so that both can be supported side by side; lots to do still
- In order for EFS mounting to work, the solr container needs to be set to root initially
- Attempt to hide the admin user/pass since the provision outputs are combined with the bind outputs (https://github.com/cloudfoundry/cloud-service-broker/blob/main/docs/brokerpak-dissection.md#outputs)
- Wait up to an hour for DNS to resolve (this is the next thing I'm going to fix)
- Implement ECS CloudMap Service Discovery to have container-to-container dns communication
- Create an init container to create the new admin user and delete the temporary admin user
- Copy terraform files to terraform/solrcloud
- Restore certain files (solr-cloud.yml, generate*.sh to their original forms)
- Caveat (unfortunately), solr-on-ecs won't be able to work without kind because the solrcloud service needs the configuration for kind in parallel to solr-cloud :/
- Lots of changes... mostly blind.. will have a lot of debugging to do
This file was important enough to make a separate commit
hopefully this works.. again, no formal tests, just make sure the brokerpak can provision, deprovision the services
Just waiting for secrets to be populated and this should put them where they need to be :)
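
The commits above mention using the Solr URL API to add and remove users, and an init container that creates the new admin user and then deletes the temporary one. A rough sketch of what those two calls could look like is below, assuming Solr's Basic Authentication plugin and its `/solr/admin/authentication` endpoint with the `set-user` and `delete-user` commands. The helper names, the example URL, and the credential format are hypothetical; the requests are only constructed here, not sent, and assigning roles to the new user through the authorization endpoint is not covered.

```python
import base64
import json
import secrets
import urllib.request

AUTH_PATH = "/solr/admin/authentication"


def _auth_request(base_url, admin_user, admin_pass, command):
    """Build (but do not send) a POST to Solr's authentication API."""
    token = base64.b64encode(f"{admin_user}:{admin_pass}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + AUTH_PATH,
        data=json.dumps(command).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def set_user_request(base_url, admin_user, admin_pass, new_user, new_pass):
    """Request that creates a user or updates that user's password."""
    return _auth_request(
        base_url, admin_user, admin_pass, {"set-user": {new_user: new_pass}}
    )


def delete_user_request(base_url, admin_user, admin_pass, user):
    """Request that removes a user, e.g. the temporary admin."""
    return _auth_request(base_url, admin_user, admin_pass, {"delete-user": [user]})


def random_credentials():
    """Generate a random username/password pair (format is an assumption)."""
    return secrets.token_hex(8), secrets.token_urlsafe(24)


if __name__ == "__main__":
    user, password = random_credentials()
    # Hypothetical URL and temporary credentials, for illustration only.
    create = set_user_request("https://solr.example.com", "temp-admin", "temp-pass", user, password)
    remove = delete_user_request("https://solr.example.com", user, password, "temp-admin")
    print(create.full_url, create.data.decode())
    print(remove.full_url, remove.data.decode())
    # To actually send one: urllib.request.urlopen(create)
```

Creating the new user first and deleting the temporary one second, authenticated as the new user, keeps the instance from ever being left without a working admin account.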
@nickumia-reisys marked this pull request as ready for review May 27, 2022 20:34
make sure to deprovision before destroying the broker
@nickumia-reisys marked this pull request as draft May 30, 2022 00:34
@nickumia-reisys (Contributor, Author) commented:

It was worrying me and my suspicions were correct... it didn't actually mount the EFS volume in the container... still need to fix EFS mounting.

I tested it by restarting the service and seeing if the data was still there.. it wasn't 😭

I verified this by creating the service, connecting it to catalog-dev, running a re-index, restarting the solr service and verifying that the data was able to load when the solr service was re-created...
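
The restart check described above can be turned into a small repeatable test: record the document count before the restart, restart the Solr service, and compare. The sketch below only covers the counting and comparison, assuming the core is named `ckan` (as in the earlier commit about the ckan core) and that Solr's standard `select` handler returns `response.numFound`; the function names and example URL are hypothetical, and the restart itself is left out.

```python
import json
from urllib.parse import urlencode


def count_url(base_url, core="ckan"):
    """URL that asks Solr for the total document count without returning rows."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{query}"


def num_found(response_body):
    """Extract the document count from a Solr select response body."""
    return json.loads(response_body)["response"]["numFound"]


def data_survived(before_body, after_body):
    """True only if the index was non-empty and has the same count after restart."""
    before = num_found(before_body)
    after = num_found(after_body)
    return before > 0 and after == before


if __name__ == "__main__":
    # Sample response bodies standing in for real Solr output.
    before = '{"response": {"numFound": 1200, "start": 0, "docs": []}}'
    lost = '{"response": {"numFound": 0, "start": 0, "docs": []}}'
    print(count_url("https://solr.example.com"))
    print(data_survived(before, before))
    print(data_survived(before, lost))
```

Requiring a non-zero count before the restart guards against a false pass when the re-index never ran in the first place.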
- This is not an optimal design, but it works, so if we want to improve from here, we can
@nickumia-reisys marked this pull request as ready for review May 31, 2022 16:31
@nickumia-reisys (Contributor, Author) commented:

The tests are failing because the AWS Development account needs to be cleaned up.

@nickumia-reisys changed the title from "[WIP] Solr on ECS" to "Solr on ECS" Jun 1, 2022
@nickumia-reisys requested review from a team and FuhuXia June 1, 2022 14:25
@nickumia-reisys enabled auto-merge June 1, 2022 14:26
2 participants