kubernetes - document how to get running on Amazon EKS #194
Comments
This blog post says:
That seems to indicate that EFS is what I need to look into.
I'm happy to report that I figured out how to get Malcolm running on EKS (and even accessible from the internet). Using ALB (instead of Ingress-NGINX) when deploying on EKS worked well. I've been documenting everything in the Malcolm documentation in my development fork; the latest docs on Malcolm in K8s (and now EKS) can be found here, as they have yet to be merged into the main Malcolm repos:
One thing I still need to figure out is how (or whether) ALB can handle a couple of the other services Malcolm uses to receive logs. I've got ALB working fine for the main HTTPS endpoint at 443/tcp and for the OpenSearch REST API endpoint at 9200/tcp, but traditionally Malcolm's Logstash instance accepts connections on 5044/tcp to receive logs from Beats. That traffic is TLS-encrypted but not HTTPS. Can the ALB ingress controller be configured to accept arbitrary TCP socket connections that are not HTTP(S)-based?
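On that question: as far as I know, ALB listeners only speak HTTP and HTTPS (layer 7), while Network Load Balancer (NLB) listeners accept TCP, TLS and UDP, so a TLS-but-not-HTTP port such as 5044 would need an NLB rather than an ALB. A minimal Python sketch of that split follows. The `SERVICES` table and the `pick_load_balancer` and `plan` helpers are hypothetical names for illustration, not part of Malcolm, and the protocol labels are my reading of this thread.

```python
# Hypothetical helper: decide which AWS load balancer type can front a port.
# Assumption: ALB listeners are HTTP/HTTPS only; NLB listeners take TCP, TLS, UDP.
ALB_PROTOCOLS = {"HTTP", "HTTPS"}
NLB_PROTOCOLS = {"TCP", "TLS", "UDP"}

# Ports mentioned in this thread; protocol labels are illustrative.
SERVICES = {
    443: "HTTPS",   # main Malcolm web interface
    9200: "HTTPS",  # OpenSearch REST API
    5044: "TLS",    # Logstash Beats input: TLS-encrypted, but not HTTP
}


def pick_load_balancer(protocol: str) -> str:
    """Return "ALB" for HTTP(S) traffic and "NLB" for raw TCP/TLS/UDP."""
    proto = protocol.upper()
    if proto in ALB_PROTOCOLS:
        return "ALB"
    if proto in NLB_PROTOCOLS:
        return "NLB"
    raise ValueError(f"unsupported listener protocol: {protocol}")


def plan(services: dict) -> dict:
    """Map each port to the load balancer type that can carry it."""
    return {port: pick_load_balancer(proto) for port, proto in services.items()}


if __name__ == "__main__":
    for port, lb in sorted(plan(SERVICES).items()):
        print(f"{port}/tcp -> {lb}")
```

The point of the sketch is only that the HTTPS endpoints and the Beats endpoint end up behind different load balancer types; it does not create any AWS resources.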
Using the instructions I outlined in the last comment, I've got it up and running on EKS, connecting over HTTPS on 443 and 9200, and with fluent-bit sending logs and metadata to 5044 and 5045. There will surely be improvements, but I'm going to mark this as closed for now.
Malcolm v23.07.0 is a feature release with a number of improvements, bug fixes and component updates.

v23.05.1...v23.07.0

* New features
    - scan docker images built via GitHub actions for vulnerabilities using Trivy (idaholab#218)
    - document building and deploying Malcolm with an AWS AMI image (idaholab#205)
    - handle Arkime field actions (idaholab#200)
    - kubernetes: document how to get running on Amazon EKS (idaholab#194)
    - Populate NetBox inventory via passively-gathered network traffic metadata (basic functionality, work in progress) (idaholab#135)
* Enhancements
    - use .tar.xz instead of .tar.gz for packaging Malcolm docker images for better compression (and smaller ISO file size)
    - Malcolm documentation edits (idaholab#204)
    - add option to enable SSH via password in hedgehog's configure-interfaces.py script (idaholab#158)
    - updated "Network Traffic Analysis with Malcolm" slides
    - use an init container in Kubernetes container startup to ensure necessary directories get created under PersistentVolume objects before startup
    - improvements to identifying source of third-party logs sent via fluent bit
    - don't do unnecessary clone of Zeek plugins, just install using URL
    - parse [bacnet_device_control.log](https://github.com/cisagov/icsnpp-bacnet/#device-control-log-bacnet_device_controllog) produced by the icsnpp-bacnet parser for Zeek
* Bug fixes
    - maxlogins value includes tmux sessions, can lock user out of SSH (idaholab#214)
    - curl rc file for connecting to external OpenSearch without auth enabled causes logstash startup to fail (idaholab#209)
    - failure to parse some suricata alerts due to integer type which should be indexed as long (idaholab#206)
    - netbox-restore doesn't work in Kubernetes (idaholab#202)
    - PCAP File with no `-` in pcapng Fails to Upload (#265)
    - disable NetBox telemetry
* Component version updates
    - Alpine (docker container image base) to [v3.18.0](https://www.alpinelinux.org/posts/Alpine-3.18.0-released.html)
    - Arkime to [v4.3.2](https://github.com/arkime/arkime/blob/8bd9d1ccaf3214eeb07da910c45d6172f9ff4ca8/CHANGELOG#L40-L55)
    - capa to [v6.0.0](https://github.com/mandiant/capa/releases/tag/v6.0.0)
    - filebeat to [v8.8.2](https://www.elastic.co/guide/en/beats/libbeat/current/release-notes-8.8.2.html)
    - NetBox to [v3.5.4](https://github.com/netbox-community/netbox/releases/tag/v3.5.4)
    - OpenSearch and OpenSearch Dashboards to [v2.8.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.8.0.md)
    - Supercronic to [v0.2.25](https://github.com/aptible/supercronic/releases/tag/v0.2.25)
    - YARA to [v4.3.2](https://github.com/VirusTotal/yara/releases/tag/v4.3.2)
    - Zeek to [v5.2.2](https://github.com/zeek/zeek/releases/tag/v5.2.2)

Malcolm and Hedgehog Linux may be obtained by pulling or building the Docker images and/or building the ISO installer images as described in the documentation. Unofficial ISO installer images for Malcolm and Hedgehog Linux are not hosted on GitHub, but may be downloaded from [https://malcolm.fyi/](https://malcolm.fyi/docs/download.html).
The Kubernetes deployment (#149) has been released, but I still need to figure out how to get it working on Amazon EKS.
I've got a work-in-progress document in my development fork where I'm recording the steps I've taken as I try to figure it out.
So far I've gotten Malcolm to deploy and start up okay, but I've run into issues figuring out the right approach to shared storage for the persistent volumes (a more complicated way of saying "the file systems the various Malcolm containers need to mount in order to share"). I got a little further along using the gp2 storage type (which is like an EC2 instance's default) but found that it doesn't support multi-attach, which is a pretty firm requirement: barring some major architectural changes to Malcolm, a bunch of containers share mount points for Zeek logs, PCAP files, Suricata logs, etc.

I tried switching over to the io2 storage class, but while creating my storage volumes I ran into an error like "An error occurred (MaxIOPSLimitExceeded) when calling the CreateVolume operation: You have exceeded your maximum io2 IOPS limit of 100000 IOPS in this region. Please contact AWS Support to request an Elastic Block Store service limit increase", so I stopped that pretty quickly and deleted the volumes. I tried io1 as well, but got a similar error about `MULTI_NODE_MULTI_WRITER not supported` for the `io1` storage, and something about an instance not existing for the gp2 storage (which I created for the OpenSearch volumes, as they don't require ReadWriteMany).

So right now I'm kind of stuck on the storage side of it. I'm thinking that maybe I should try AWS NFS file shares with EFS (?).
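For reference, a minimal sketch of what an EFS-backed volume could look like, assuming the AWS EFS CSI driver (`efs.csi.aws.com`) is installed in the cluster. EFS is NFS-based and supports the `ReadWriteMany` access mode that the shared Zeek, PCAP and Suricata mounts need. The volume name, file system ID, storage class name and size below are placeholders, and `efs_persistent_volume` is a hypothetical helper, not something from Malcolm.

```python
import json


def efs_persistent_volume(name: str, file_system_id: str, size: str = "25Gi") -> dict:
    """Build a statically provisioned PersistentVolume manifest backed by EFS.

    Assumptions: the EFS CSI driver is installed and `file_system_id` names an
    existing EFS file system. Kubernetes requires a capacity value, but EFS is
    elastic and does not enforce it.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "volumeMode": "Filesystem",
            # Many pods on many nodes may mount the volume read-write.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "efs-sc",  # placeholder storage class name
            "csi": {
                "driver": "efs.csi.aws.com",
                "volumeHandle": file_system_id,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and file system ID, for illustration only.
    pv = efs_persistent_volume("pcap-volume", "fs-0123456789abcdef0")
    print(json.dumps(pv, indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output could be piped to `kubectl apply -f -` once the placeholders are replaced with real values.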