✨ Create AWS Data Sync Instance #5175

Open
4 of 5 tasks
darren1988 opened this issue Aug 27, 2024 · 39 comments

@darren1988

darren1988 commented Aug 27, 2024

Describe the feature request.

Describe the context.

We originally embarked on this earlier in the year, when a request came in for a DataSync instance that would allow OPG to move various pieces of unstructured/semi-structured data (PDFs, documents, etc.) into the Analytical Platform, so that they could be accessed directly from the AP without having to download files from a fileshare and manually re-upload them. This would allow the data to be replicated automatically from the fileshare to our account, meaning analysts would be able to access all their files natively. This was pending for a good while on the creation of a service account by ATOS, but that account has now been created.

Work required:

We need to create an AWS DataSync instance and configure it to connect to and authenticate with the fileshare, using the service account provided by ATOS.

Definition of done

  • Modernisation Platform's shared VPC added to analytical-platform-ingestion
  • Data Sync instance and agent deployed
  • Data Sync location configured
  • Data Sync destination configured
  • Tested
@darren1988 darren1988 changed the title ✨ < ✨ Create AWS Datasync Instance Aug 27, 2024
@YvanMOJdigital

Requires a defined architecture before planning.

@darren1988
Author

Meeting scheduled for 29/08/24 to discuss scope and technical architecture for this work

@bagg3rs
Contributor

bagg3rs commented Aug 29, 2024

I have the service account credentials from Gwion and have put them into 1Password OPG - AWS DataSync Service Account AP Shared Account

@darren1988
Author

To be discussed at refinement.

@jacobwoffenden jacobwoffenden self-assigned this Sep 24, 2024
@jacobwoffenden
Member

Reached out to @ministryofjustice/modernisation-platform about adding their shared platform VPC into our ingestion account

@jacobwoffenden
Member

Have agreed with @ministryofjustice/modernisation-platform that adding the shared VPC isn't a problem. Will inspect the environment code in modernisation-platform and modernisation-platform-environments.

@jacobwoffenden jacobwoffenden changed the title ✨ Create AWS Datasync Instance ✨ Create AWS Data Sync Instance Sep 26, 2024
@jacobwoffenden
Member

Shared VPC added to the ingestion account; however, on further reading, DataSync does not support shared VPCs.

@jacobwoffenden
Member

Plan is to create VPCs using existing ranges from MP that are soon to be retired and have never been connected to the MoJ TGW.
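Because these VPCs will later be attached to the MoJ TGW, whichever MP range is reused must not overlap anything already routed there. A minimal sketch of that check using Python's standard ipaddress module; the function name is hypothetical and the CIDRs in the example are placeholders, not the actual MP or TGW ranges.

```python
import ipaddress


def overlapping_ranges(candidate: str, existing: list[str]) -> list[str]:
    """Return the entries of `existing` that overlap the `candidate` CIDR.

    An empty list means the candidate does not clash with any listed range.
    """
    cand = ipaddress.ip_network(candidate)
    return [cidr for cidr in existing if cand.overlaps(ipaddress.ip_network(cidr))]


# Hypothetical ranges, for illustration only
routed_on_tgw = ["10.200.0.0/16", "10.172.0.0/16"]
print(overlapping_ranges("10.200.8.0/24", routed_on_tgw))  # ['10.200.0.0/16']
print(overlapping_ranges("10.201.0.0/24", routed_on_tgw))  # []
```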

@jacobwoffenden
Member

jacobwoffenden commented Sep 30, 2024

VPC build-out in progress, EC2 instance build-out also in progress.

However, DataSync registration is not programmatic: the DataSync server needs to be accessible from whatever machine is running Terraform or registering it manually in the console. This is problematic because:

  1. GitHub Actions is our primary CI/CD system; it has no internal connectivity to our VPC, and making it work isn't really in scope because it's MP's system
  2. Our VPC currently has no public connectivity
  3. From what I've read, the AWS-provided AMI does not include the SSM agent; instead you need to connect via SSH (I tried the serial console, but the default admin / password credentials didn't work)
  4. The above presents a chicken/egg problem 🤔

Do I open the endpoint to GitHub Actions? GlobalProtect?

Do I add user data to install the SSM agent and write the activation key to Secrets Manager? I don't even know whether the activation key is held on disk or whether I'd need to run a command...
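On the activation-key question: as we read the AWS DataSync docs, the key is obtained with an HTTP GET to the agent on port 80. That is the same step the Terraform aws_datasync_agent resource performs when given ip_address, which is why port 80 reachability is needed from wherever Terraform runs. A minimal sketch of building that request; the query parameter names reflect our reading of the docs and should be verified, and the function names, IPs and region are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def activation_url(agent_ip: str, region: str,
                   privatelink_endpoint_ip: Optional[str] = None) -> str:
    """Build the agent URL that should return a DataSync activation key.

    Parameter names follow our reading of the DataSync "get an activation key"
    docs; check them against the current documentation before relying on this.
    """
    params = {"gatewayType": "SYNC", "activationRegion": region}
    if privatelink_endpoint_ip is not None:
        # Agents activated through a VPC (PrivateLink) endpoint also pass the
        # endpoint's private IP and the endpoint type.
        params["privateLinkEndpoint"] = privatelink_endpoint_ip
        params["endpointType"] = "PRIVATE_LINK"
    # no_redirect is a bare flag: the key is returned in the response body
    # instead of via a redirect to the console.
    return f"http://{agent_ip}/?{urlencode(params)}&no_redirect"


def fetch_activation_key(url: str, timeout: float = 10.0) -> str:
    """Perform the GET; only works from a host that can reach the agent on port 80."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


# Hypothetical addresses and region, for illustration only
print(activation_url("10.0.0.10", "eu-west-2", privatelink_endpoint_ip="10.0.0.20"))
```

Run from a host inside the VPC, one option is to pass the resulting key to aws_datasync_agent's activation_key argument instead of ip_address, so the Terraform runner itself never needs to reach the agent.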

@jacobwoffenden
Member

10/10/24 update:

  • Instance up
  • Able to route via an NLB; however, when Terraform triggers activation, the target group becomes unhealthy (presumably because the DataSync activation server stops) and the apply just hangs. Possibly need to add private_link_endpoint

@jacobwoffenden
Member

jacobwoffenden commented Oct 16, 2024

16/10/24 update:

  • Call with AWS, who pointed out that we were missing an egress rule on the VPC endpoint security group and that the ENI security group needs attaching to the VPC endpoint. I don't think either of those things is documented 🤷
  • DataSync agent is now registered and, as expected, the agent's server has stopped, so the NLB is just sitting there
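A missing egress rule is easy to overlook. A minimal sketch that checks one security group, in the shape returned by EC2 DescribeSecurityGroups (e.g. boto3's describe_security_groups), for egress to a given IP and port. The function name is hypothetical, only IPv4 CIDR rules are considered, and the ports DataSync actually needs should come from its network requirements documentation rather than the illustrative values here.

```python
import ipaddress

# IpProtocol can appear as a name or as its protocol number
PROTOCOL_NUMBERS = {"tcp": "6", "udp": "17"}


def egress_allows(group: dict, port: int, dest_ip: str, protocol: str = "tcp") -> bool:
    """Return True if an IPv4 egress rule in `group` permits traffic to dest_ip:port.

    Prefix lists, IPv6 ranges and security-group references are ignored,
    so False is not conclusive.
    """
    dest = ipaddress.ip_address(dest_ip)
    accepted = {protocol}
    if protocol in PROTOCOL_NUMBERS:
        accepted.add(PROTOCOL_NUMBERS[protocol])
    for rule in group.get("IpPermissionsEgress", []):
        proto = rule.get("IpProtocol")
        if proto == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif proto in accepted:
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if port_ok and any(dest in ipaddress.ip_network(r["CidrIp"])
                           for r in rule.get("IpRanges", [])):
            return True
    return False


# Hypothetical group: HTTPS egress to the VPC CIDR only
endpoint_sg = {"IpPermissionsEgress": [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
]}
print(egress_allows(endpoint_sg, 443, "10.0.1.5"))   # True
print(egress_allows(endpoint_sg, 1024, "10.0.1.5"))  # False
```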

TODO:

  • Attach VPCs to MoJ TGW
  • Add credentials to Secrets Manager
  • Add DataSync location - can we look up dom1 by IP?
  • Add DataSync task
  • Add S3 destination bucket
  • Decide where data will be replicated to

@jacobwoffenden
Member

Currently blocked by ministryofjustice/modernisation-platform#8275

@darren1988
Author

Requested support from Modernisation Platform to help unblock this ticket

@jacobwoffenden
Member

NVVS/LAN&Wifi team have given me access to https://github.com/ministryofjustice/deployment-tgw, so I'm not as blocked as last week 🙏

@jacobwoffenden
Member

Moving back to blocked pending information on connecting to DOM1 from AWS

@jacobwoffenden
Member

🎉 I am able to connect to DOM1 from my debugging instance! 🎉

@jacobwoffenden
Member

Reached out to ATOS because I can't access one of the locations

@jacobwoffenden
Member

Have reached out to @gwionap for clarification on source data

@jacobwoffenden
Member

Updated locations received from @gwionap, will continue.

@jacobwoffenden
Member

Have created a task, but it is failing...


I can't explore this location with smbclient from the debug instance either

smb: \> ls hq/PGO/Shared/Group
do_connect: Connection to eucw4171nas002.dom1.infra.int failed (Error NT_STATUS_IO_TIMEOUT)
Unable to follow dfs referral [\eucw4171nas002.dom1.infra.int\mojshared002$]
do_list: [\hq\PGO\Shared\Group] NT_STATUS_IO_TIMEOUT

have escalated to @gwionap
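The "Unable to follow dfs referral" line indicates that the namespace path is redirected to a different file server (eucw4171nas002.dom1.infra.int, share mojshared002$), which the client then presumably has to reach directly. A small sketch that pulls referral targets out of smbclient output so they can be fed into further connectivity checks; the regex is written against the lines in this thread only, tolerates one or two leading backslashes, and the function name is hypothetical.

```python
import re

# Matches e.g. "Unable to follow dfs referral [\server.example\share$]"
REFERRAL_RE = re.compile(r"Unable to follow dfs referral \[\\+([^\\\]]+)\\([^\]]+)\]")


def referral_targets(smbclient_output: str) -> list[tuple[str, str]]:
    """Return (server, share) pairs for each DFS referral smbclient failed to follow."""
    return [(m.group(1), m.group(2)) for m in REFERRAL_RE.finditer(smbclient_output)]


log = r"Unable to follow dfs referral [\eucw4171nas002.dom1.infra.int\mojshared002$]"
print(referral_targets(log))  # [('eucw4171nas002.dom1.infra.int', 'mojshared002$')]
```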

@jacobwoffenden
Member

More verbose output from smbclient:

smb: \> ls hq/PGO/Shared/Group/
dos_clean_name [\hq\PGO\Shared\Group\]
unix_clean_name [\hq\PGO\Shared\Group\]
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
sitename_fetch: No stored sitename for realm ''
namecache_fetch: no entry for eucw4171nas002.dom1.infra.int#20 found.
resolve_hosts: Attempting host lookup for name eucw4171nas002.dom1.infra.int<0x20>
namecache_store: storing 1 address for eucw4171nas002.dom1.infra.int#20: 10.172.69.24
Connecting to 10.172.69.24 at port 445
convert_string_handle: E2BIG: convert_string(UTF-8,CP850): srclen=30 destlen=16 error: No more room
Connecting to 10.172.69.24 at port 139
do_connect: Connection to eucw4171nas002.dom1.infra.int failed (Error NT_STATUS_IO_TIMEOUT)
Unable to follow dfs referral [\eucw4171nas002.dom1.infra.int\mojshared002$]
do_list: [\hq\PGO\Shared\Group\] NT_STATUS_IO_TIMEOUT
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
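The verbose log shows name resolution succeeding (10.172.69.24) and smbclient then trying TCP 445 and 139 before timing out, which suggests the failure is at the TCP layer rather than authentication. A minimal sketch that repeats that test with SMB out of the way, using only Python's socket module; the function name and default timeout are arbitrary choices.

```python
import socket


def smb_ports_reachable(host: str, ports=(445, 139), timeout: float = 5.0) -> dict:
    """Attempt a plain TCP connect to each port, in the order smbclient tried them.

    Returns {port: True/False}. DNS failures, refusals and timeouts all count
    as False, since socket.gaierror and socket.timeout are OSError subclasses.
    """
    results = {}
    for port in ports:
        try:
            # create_connection resolves `host` and opens a TCP connection
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# From the debug instance, for example:
# print(smb_ports_reachable("eucw4171nas002.dom1.infra.int"))
```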

Seeing the following in the VPC flow logs:

2 730335344807 eni-05ad72cf6a35649b0 10.26.128.43 10.172.69.24 33266 139 6 3 180 1731627738 1731627767 ACCEPT OK

So maybe it's the return routing from ATOS?
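The record above is in the default flow log format: version, account ID, interface ID, source and destination address, source and destination port, protocol (6 = TCP), packets, bytes, capture window start/end (Unix seconds), action and log status. It shows 3 packets / 180 bytes from the VPC to 10.172.69.24:139 within a 29-second window and no reply in this excerpt; 60 bytes per packet is consistent with, though not proof of, unanswered TCP SYNs. ACCEPT only means the ENI's security groups and NACLs let the traffic out. A small parser for records in this format, as a sketch (names are hypothetical):

```python
# Field order of the default (version 2) VPC flow log format
FLOW_LOG_FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_flow_record(line: str) -> dict:
    """Parse one default-format record; NODATA/SKIPDATA records ('-' fields) are not handled."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}")
    record = dict(zip(FLOW_LOG_FIELDS, values))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


rec = parse_flow_record(
    "2 730335344807 eni-05ad72cf6a35649b0 10.26.128.43 10.172.69.24 "
    "33266 139 6 3 180 1731627738 1731627767 ACCEPT OK"
)
print(rec["protocol"], rec["dstport"], rec["bytes"] // rec["packets"])  # 6 139 60
```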

@jacobwoffenden
Member

SMB traffic is being dropped at the Palo Altos 💀

@jacobwoffenden
Member

@bagg3rs has raised a demand with Tech Services

@bagg3rs
Contributor

bagg3rs commented Nov 29, 2024

Update from Demand here

@jacobwoffenden
Member

Things are happening in TS, just slowly.

@YvanMOJdigital YvanMOJdigital moved this from 🚫 Blocked to 🚀 In Progress in Analytical Platform Dec 6, 2024
@jacobwoffenden
Member

Blocked again, potential permissions issue on the file server...

@jacobwoffenden jacobwoffenden moved this from 🚀 In Progress to 🚫 Blocked in Analytical Platform Dec 9, 2024
@jacobwoffenden
Member

Raised with ATOS using the original request

@jacobwoffenden jacobwoffenden moved this from 🚫 Blocked to 🚀 In Progress in Analytical Platform Dec 16, 2024
@jacobwoffenden
Member

ATOS have asked us to raise a new request, but we got around that by going through Networks.

Our DataSync user was added to a new group, and once we switched to testing with mount it worked (smbclient wasn't working as expected).

I initiated a run yesterday; it detected ~800k files but errored after ~120k.

@jacobwoffenden
Member

Spoke to @gwionap, and the suggestion is to scope the source to a specific repository.

@jacobwoffenden
Member

Source scoped as advised by @gwionap; a schedule has been added for 2300 on Wednesday, so it will run tonight.
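DataSync task schedules use the AWS cron format (minutes, hours, day-of-month, month, day-of-week, year), which, as far as we know, DataSync evaluates in UTC. A sketch of the expression for 23:00 every Wednesday, plus a helper that computes the next run so "will run tonight" is easy to sanity-check; the constant and helper names are hypothetical, not taken from the actual task configuration.

```python
from datetime import datetime, timedelta, timezone

# AWS cron format: cron(minutes hours day-of-month month day-of-week year).
# "?" is required in day-of-month when day-of-week is set.
WEDNESDAY_2300 = "cron(0 23 ? * WED *)"


def next_weekly_run(now: datetime, weekday: int = 2, hour: int = 23, minute: int = 0) -> datetime:
    """Next occurrence of weekday/hour/minute strictly after `now` (weekday: Monday=0)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Same weekday but the time has already passed: roll to next week
        candidate += timedelta(days=7)
    return candidate


now = datetime(2024, 12, 18, 12, 0, tzinfo=timezone.utc)  # a Wednesday
print(next_weekly_run(now))  # 2024-12-18 23:00:00+00:00
```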

@jacobwoffenden
Member

Scheduled run failed with a variety of errors, which I can't share via screenshot, but they are:

  • Source location responded with: Stale file handle (content changed unexpectedly or became unavailable)

@jacobwoffenden jacobwoffenden moved this from 🚀 In Progress to 🚫 Blocked in Analytical Platform Dec 19, 2024
@jacobwoffenden
Member

I've triggered a new task with changes to reporting and verification: ministryofjustice/modernisation-platform-environments@fdd8c9e
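The linked commit isn't reproduced here. For orientation only, a sketch of the DataSync task Options fields that govern verification and logging, as a dict suitable for boto3's datasync update_task(TaskArn=..., Options=...); the helper name and default values are examples, not what the commit sets. With a live source share that changes during a run (cf. the stale file handle errors above), ONLY_FILES_TRANSFERRED verifies just the files copied, whereas POINT_IN_TIME_CONSISTENT compares the whole source and destination at the end of the transfer.

```python
# Allowed values for the Options fields used below (DataSync API)
VERIFY_MODES = {"POINT_IN_TIME_CONSISTENT", "ONLY_FILES_TRANSFERRED", "NONE"}
TRANSFER_MODES = {"CHANGED", "ALL"}
LOG_LEVELS = {"OFF", "BASIC", "TRANSFER"}


def task_options(verify_mode: str = "ONLY_FILES_TRANSFERRED",
                 transfer_mode: str = "CHANGED",
                 log_level: str = "BASIC") -> dict:
    """Build a partial DataSync task Options dict, rejecting unknown values early."""
    for value, allowed, name in ((verify_mode, VERIFY_MODES, "VerifyMode"),
                                 (transfer_mode, TRANSFER_MODES, "TransferMode"),
                                 (log_level, LOG_LEVELS, "LogLevel")):
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return {"VerifyMode": verify_mode, "TransferMode": transfer_mode, "LogLevel": log_level}


# e.g. boto3.client("datasync").update_task(TaskArn=task_arn, Options=task_options())
print(task_options())
```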

@jacobwoffenden
Member

Still failed, but did transfer some data.

Pausing until the new year; we have a catch-up with OPG.

@jacobwoffenden
Member

Awaiting list of Excel databases to fetch

Projects
Status: 🚫 Blocked
Development

No branches or pull requests

4 participants