kubernetes - document how to get running on Amazon EKS #194

Closed
mmguero opened this issue May 10, 2023 · 3 comments
Labels
cloud Relating to deployment of Malcolm in the cloud and/or with Kubernetes doc Relating to Malcolm documentation

@mmguero
Collaborator

mmguero commented May 10, 2023

The Kubernetes deployment (#149) has been released, but I still need to figure out how to get it working on Amazon AWS EKS.

I've got a work in progress document in my development fork where I'm putting the steps I've taken as I've tried to figure it out.

So far I've actually gotten Malcolm to deploy and start up okay, but I've run into issues figuring out the right approach for the shared storage for the persistent volumes (a more complicated way of saying "the file systems the various Malcolm containers need to mount in order to share data"). I got a little further using the gp2 storage type (which is like an EC2 instance's default), but found that it doesn't support multi-attach. Barring some major architectural changes to Malcolm, multi-attach is a pretty firm requirement, because a bunch of containers share mount points for Zeek logs, PCAP files, Suricata logs, etc. I tried switching over to the io2 storage class, but when creating my storage volumes I ran into an error like "An error occurred (MaxIOPSLimitExceeded) when calling the CreateVolume operation: You have exceeded your maximum io2 IOPS limit of 100000 IOPS in this region. Please contact AWS Support to request an Elastic Block Store service limit increase", so I stopped pretty quickly and deleted the volumes. I tried io1 as well, but got a similar error saying MULTI_NODE_MULTI_WRITER is not supported for io1 storage, plus something about an instance not existing for the gp2 storage (which I created for the OpenSearch volumes, as they don't require ReadWriteMany).
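
The access-mode constraint described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Malcolm's actual manifests: the volume names, sizes, and the `efs-sc` StorageClass name are hypothetical, and the shared class is assumed to be backed by something that supports `ReadWriteMany` (EFS, as the thread later settles on). Volumes mounted by several containers request `ReadWriteMany`; the OpenSearch volumes, which are not shared, only need `ReadWriteOnce` and can stay on gp2.

```python
import json

# Hypothetical volume names, for illustration only; Malcolm's actual
# PersistentVolumeClaims may be named and sized differently.
SHARED_VOLUMES = ["zeek-logs", "pcap", "suricata-logs"]
UNSHARED_VOLUMES = ["opensearch"]


def pvc(name: str, size: str, shared: bool) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict.

    Shared volumes request ReadWriteMany so pods on different nodes can
    mount them read-write at the same time; unshared volumes only need
    ReadWriteOnce.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{name}-claim"},
        "spec": {
            "accessModes": ["ReadWriteMany" if shared else "ReadWriteOnce"],
            # "efs-sc" is an assumed name for a ReadWriteMany-capable class;
            # "gp2" is the EBS-backed class discussed above.
            "storageClassName": "efs-sc" if shared else "gp2",
            "resources": {"requests": {"storage": size}},
        },
    }


claims = [pvc(n, "100Gi", shared=True) for n in SHARED_VOLUMES]
claims += [pvc(n, "25Gi", shared=False) for n in UNSHARED_VOLUMES]

# kubectl accepts JSON as well as YAML; a v1 List groups several objects.
manifest = {"apiVersion": "v1", "kind": "List", "items": claims}
print(json.dumps(manifest, indent=2))
```

Writing the output to a file and passing it to `kubectl apply -f` would create all four claims at once.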

So right now I'm kind of stuck on the storage side of it. I'm thinking that maybe I should try AWS NFS File Shares with EFS (?).

@mmguero mmguero added doc Relating to Malcolm documentation cloud Relating to deployment of Malcolm in the cloud and/or with Kubernetes labels May 10, 2023
@mmguero mmguero added this to Malcolm May 10, 2023
@mmguero
Collaborator Author

mmguero commented May 10, 2023

This blog post says:

Comparing EFS to another popular Amazon service, EBS, the major advantage of EFS is that it offers shared storage. EBS has higher performance, but it is attached to a specific EC2 machine instance and cannot be shared across multiple machines.

This seems to indicate that EFS is what I need to look into.
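
For reference, a StorageClass for dynamic provisioning through the AWS EFS CSI driver might look like the sketch below. It assumes the EFS CSI driver is installed in the cluster and that an EFS file system already exists. The file system ID is a placeholder and `efs-sc` is a hypothetical name. With `provisioningMode: efs-ap`, the driver creates one EFS access point per PersistentVolume.

```python
import json


def efs_storage_class(file_system_id: str, name: str = "efs-sc") -> dict:
    """Return a StorageClass manifest for the AWS EFS CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "efs.csi.aws.com",
        "parameters": {
            # Dynamic provisioning: one EFS access point per PersistentVolume.
            "provisioningMode": "efs-ap",
            "fileSystemId": file_system_id,
            # Permissions of each access point's root directory; may need
            # adjusting depending on which UIDs the containers run as.
            "directoryPerms": "700",
        },
    }


# Placeholder ID; substitute the ID of your own EFS file system.
sc = efs_storage_class("fs-0123456789abcdef0")
print(json.dumps(sc, indent=2))
```

Unlike EBS volumes, an EFS file system is reachable over NFS from every node in the VPC, which is what lets claims against this class be `ReadWriteMany`.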

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue May 17, 2023
@mmguero
Collaborator Author

mmguero commented May 18, 2023

Just wanted to happily report that I was able to figure out getting Malcolm running on EKS (and even accessible from the internet). Using ALB (instead of Ingress-NGINX) when deploying on EKS seemed to work well.
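
A minimal Ingress for the AWS Load Balancer Controller might look like the following sketch. It is not the manifest from the Malcolm documentation. It assumes the controller is installed with an `alb` IngressClass and that the backend serves HTTPS itself. The Ingress name, the backend Service name `nginx-proxy`, and the certificate ARN are hypothetical placeholders.

```python
import json


def alb_ingress(service: str, port: int, cert_arn: str) -> dict:
    """Return an Ingress handled by the AWS Load Balancer Controller."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": "malcolm-ingress",  # hypothetical name
            "annotations": {
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                # Route directly to pod IPs rather than node ports.
                "alb.ingress.kubernetes.io/target-type": "ip",
                "alb.ingress.kubernetes.io/listen-ports": json.dumps([{"HTTPS": 443}]),
                # An HTTPS listener needs a certificate (e.g., from ACM).
                "alb.ingress.kubernetes.io/certificate-arn": cert_arn,
                # Assumes the backend itself speaks HTTPS.
                "alb.ingress.kubernetes.io/backend-protocol": "HTTPS",
            },
        },
        "spec": {
            "ingressClassName": "alb",
            "rules": [{
                "http": {
                    "paths": [{
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {
                            "service": {"name": service, "port": {"number": port}}
                        },
                    }]
                }
            }],
        },
    }


ing = alb_ingress(
    "nginx-proxy", 443, "arn:aws:acm:REGION:ACCOUNT:certificate/EXAMPLE"
)
print(json.dumps(ing, indent=2))
```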

I've been documenting everything in the Malcolm documentation in my development fork, the latest and greatest docs regarding Malcolm in K8S (and now EKS) can be found here as they've yet to be merged into the main Malcolm repos:

  • Deploying Malcolm with Kubernetes: Ingress Controllers
    • I've added a section here where I describe and give examples of using Ingress-NGINX for self-hosted or other non-AWS Kubernetes installations, as that has worked well for us in-house. I also give an example manifest for, and suggest using, the ALB controller for EKS deployments.
  • Deploying Malcolm on Amazon Elastic Kubernetes Service (EKS)
    • I've updated my instructions here for how I got my cluster and EFS configured to run Malcolm on EKS, including the manifest example and references for ALB. A lot of this is still pretty manual, @piercema and I are going to look at Terraform to do some of the cluster provisioning and whatnot. There are a few changes I need to make to some of my containers' initialization to ensure the necessary subdirectories get created underneath some of the persistent volume mount points on startup, but that won't be difficult.
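
The initialization change mentioned above (creating subdirectories under a PersistentVolume mount point before the main container starts) can be handled with a Kubernetes init container, which must run to completion before the pod's regular containers start. A minimal sketch follows. The claim name, mount path, subdirectory names, and images are all hypothetical.

```python
import json
import shlex


def pod_spec_with_init(claim: str, mount: str, subdirs: list[str]) -> dict:
    """Return a pod spec whose init container pre-creates subdirectories."""
    # Quote each path so unusual directory names are passed to sh safely.
    mkdir_cmd = "mkdir -p " + " ".join(shlex.quote(f"{mount}/{d}") for d in subdirs)
    volume_mount = {"name": "shared", "mountPath": mount}
    return {
        "volumes": [
            {"name": "shared", "persistentVolumeClaim": {"claimName": claim}}
        ],
        # Init containers run in order and must exit successfully
        # before any regular container is started.
        "initContainers": [{
            "name": "make-dirs",
            "image": "busybox",  # hypothetical choice of image
            "command": ["sh", "-c", mkdir_cmd],
            "volumeMounts": [volume_mount],
        }],
        "containers": [{
            "name": "app",
            "image": "example/app:latest",  # hypothetical
            "volumeMounts": [volume_mount],
        }],
    }


spec = pod_spec_with_init(
    "zeek-logs-claim", "/zeek-logs", ["current", "upload", "extract_files"]
)
print(json.dumps(spec, indent=2))
```

Because `mkdir -p` is idempotent, the init container is safe to run on every pod restart.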

One thing I still need to figure out is how (if?) ALB can handle a couple of the other services Malcolm uses to receive logs. I've got ALB working fine for the main HTTPS endpoint at 443/tcp and for the OpenSearch REST API endpoint at 9200/tcp, but traditionally Malcolm's Logstash instance accepts connections on 5044/tcp to receive logs from Beats. This is TLS-encrypted but not HTTPS. Can the ALB ingress controller be configured to allow me to accept arbitrary TCP socket connections that are not HTTP(S)-based?
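
For context on this question: an Application Load Balancer accepts only HTTP and HTTPS listeners, whereas a Network Load Balancer forwards raw TCP (and TLS). With the AWS Load Balancer Controller, one common way to expose a non-HTTP port such as 5044/tcp is a `Service` of type `LoadBalancer`, annotated so that the controller provisions an NLB. The sketch below assumes a recent version of the controller is installed. The Service name and pod selector are hypothetical.

```python
import json


def nlb_service(name: str, selector: dict, ports: list[int]) -> dict:
    """Return a LoadBalancer Service that the AWS Load Balancer Controller
    provisions as a Network Load Balancer passing raw TCP through."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # "external" hands the Service to the AWS Load Balancer Controller.
                "service.beta.kubernetes.io/aws-load-balancer-type": "external",
                "service.beta.kubernetes.io/aws-load-balancer-nlb-target-type": "ip",
                "service.beta.kubernetes.io/aws-load-balancer-scheme": "internet-facing",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": selector,
            # TLS is left to Logstash itself; the NLB just forwards TCP.
            "ports": [
                {"name": f"tcp-{p}", "protocol": "TCP", "port": p, "targetPort": p}
                for p in ports
            ],
        },
    }


svc = nlb_service("logstash-nlb", {"name": "logstash"}, [5044])
print(json.dumps(svc, indent=2))
```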

@mmguero mmguero self-assigned this May 18, 2023
@mmguero mmguero moved this to In Progress in Malcolm May 18, 2023
@mmguero mmguero added this to the v23.06.0 milestone May 18, 2023
@mmguero
Collaborator Author

mmguero commented May 18, 2023

Using the instructions I outlined in my last comment, I've got it up and running on EKS, connecting via HTTPS on 443 and 9200, and with fluent-bit on 5044 and 5045 for logs and metadata. There will surely be improvements, but I'm going to mark this as closed for now.

@mmguero mmguero closed this as completed May 18, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Malcolm May 18, 2023
mmguero added a commit to cisagov/Malcolm that referenced this issue Jul 19, 2023
Malcolm v23.07.0 is a feature release with a number of improvements, bug fixes and component updates.

v23.05.1...v23.07.0

* New features
    - scan docker images built via GitHub actions for vulnerabilities using Trivy (idaholab#218)
    - document building and deploying Malcolm with an AWS AMI image (idaholab#205)
    - handle Arkime field actions (idaholab#200)
    - kubernetes: document how to get running on Amazon EKS (idaholab#194)
    - Populate NetBox inventory via passively-gathered network traffic metadata (basic functionality, work in progress) (idaholab#135)

* Enhancements
    - use .tar.xz instead of .tar.gz for packaging Malcolm docker images for better compression (and smaller ISO file size)
    - Malcolm documentation edits (idaholab#204)
    - add option to enable SSH via password in hedgehog's configure-interfaces.py script (idaholab#158)
    - updated "Network Traffic Analysis with Malcolm" slides
    - use an init container in Kubernetes container startup to ensure necessary directories get created under PersistentVolume objects before startup
    - improvements to identifying source of third-party logs sent via fluent bit
    - don't do unnecessary clone of Zeek plugins, just install using URL
    - parse [bacnet_device_control.log](https://github.com/cisagov/icsnpp-bacnet/#device-control-log-bacnet_device_controllog) produced by the icsnpp-bacnet parser for Zeek

* Bug fixes
    - maxlogins value includes tmux sessions, can lock user out of SSH (idaholab#214)
    - curl rc file for connecting to external OpenSearch without auth enabled causes logstash startup to fail (idaholab#209)
    - failure to parse some suricata alerts due to integer type which should be indexed as long (idaholab#206)
    - netbox-restore doesn't work in Kubernetes (idaholab#202)
    - PCAP File with no `-` in pcapng Fails to Upload (#265)
    - disable NetBox telemetry

* Component version updates
    - Alpine (docker container image base) to [v3.18.0](https://www.alpinelinux.org/posts/Alpine-3.18.0-released.html)
    - Arkime to [v4.3.2](https://github.com/arkime/arkime/blob/8bd9d1ccaf3214eeb07da910c45d6172f9ff4ca8/CHANGELOG#L40-L55)
    - capa to [v6.0.0](https://github.com/mandiant/capa/releases/tag/v6.0.0)
    - filebeat to [v8.8.2](https://www.elastic.co/guide/en/beats/libbeat/current/release-notes-8.8.2.html)
    - NetBox to [v3.5.4](https://github.com/netbox-community/netbox/releases/tag/v3.5.4)
    - OpenSearch and OpenSearch Dashboards to [v2.8.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.8.0.md)
    - Supercronic to [v0.2.25](https://github.com/aptible/supercronic/releases/tag/v0.2.25)
    - YARA to [v4.3.2](https://github.com/VirusTotal/yara/releases/tag/v4.3.2)
    - Zeek to [v5.2.2](https://github.com/zeek/zeek/releases/tag/v5.2.2)

Malcolm and Hedgehog Linux may be obtained by pulling or building the Docker images and/or building the ISO installer images as described in the documentation. Unofficial ISO installer images for Malcolm and Hedgehog Linux are not hosted on GitHub, but may be downloaded from [https://malcolm.fyi/](https://malcolm.fyi/docs/download.html).
mmguero added a commit that referenced this issue Jul 19, 2023
@mmguero mmguero moved this from Done to Released in Malcolm Jul 20, 2023
Projects
Status: Released
Development

No branches or pull requests

1 participant