add minio-sftp #825

Merged
merged 17 commits into from
Dec 24, 2024
10 changes: 7 additions & 3 deletions .github/workflows/tests-docker.yml
@@ -130,21 +130,25 @@ jobs:
curl -s http://localhost/data/metadata/$DISCOVERY_METADATA_ID.json --output /tmp/$DISCOVERY_METADATA_ID
pywcmp ets validate /tmp/$DISCOVERY_METADATA_ID
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
- name: add Congo synop data (synop2bufr) 🇨🇩
- name: add Congo synop data (synop2bufr) 🇨🇩, using sftp
env:
TOPIC_HIERARCHY: origin/a/wis2/cg-met/data/recommended/weather/surface-based-observations/synop
TERRITORY: COD
DISCOVERY_METADATA: /data/wis2box/metadata/discovery/cd-surface-weather-observations.yml
DISCOVERY_METADATA_ID: urn:wmo:md:cg-met:surface-weather-observations
STATION_METADATA: /data/wis2box/metadata/station/congo.csv
TEST_DATA: /data/wis2box/observations/congo
TEST_DATA: tests/data/observations/congo/SICG20FCBB_202308.txt
run: |
python3 wis2box-ctl.py execute wis2box dataset publish $DISCOVERY_METADATA
python3 wis2box-ctl.py execute wis2box metadata station publish-collection --path $STATION_METADATA --topic-hierarchy $TOPIC_HIERARCHY
curl -s http://localhost/data/metadata/$DISCOVERY_METADATA_ID.json --output /tmp/$DISCOVERY_METADATA_ID
#pywcmp ets validate /tmp/$DISCOVERY_METADATA_ID # uncomment once wis2box improves support for recommended data
python3 wis2box-ctl.py execute wis2box auth add-token --metadata-id $DISCOVERY_METADATA_ID -p token123 -y
python3 wis2box-ctl.py execute wis2box data ingest -mdi $DISCOVERY_METADATA_ID -p $TEST_DATA
sshpass -p 'minio123' sftp -P 8022 -oBatchMode=no -o StrictHostKeyChecking=no wis2box@localhost << EOF
mkdir wis2box-incoming/$DISCOVERY_METADATA_ID
put $TEST_DATA wis2box-incoming/$DISCOVERY_METADATA_ID/
exit
EOF
- name: add example hourly ship data (bufr2bufr) WMO
env:
TOPIC_HIERARCHY: origin/a/wis2/int-wmo-test/data/core/weather/surface-based-observations/ship
1 change: 1 addition & 0 deletions docker-compose.override.yml
@@ -11,6 +11,7 @@ services:
ports:
- "9000:9000"
- "9001:9001"
- "8022:8022"
deploy:
replicas: 1
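
With port 8022 published on the host, one quick way to check that the SFTP endpoint is reachable is to read the identification line that an SSH server sends when a client connects (per the SSH protocol it starts with `SSH-`). The following is a minimal Python sketch, not part of wis2box: the function names `sftp_banner` and `looks_like_ssh` are hypothetical, and the default host and port are assumptions taken from the port mapping above.

```python
import socket


def sftp_banner(host: str = "localhost", port: int = 8022,
                timeout: float = 5.0) -> str:
    """Return the first line sent by the server after connecting.

    Hypothetical helper for illustration. It reads byte by byte until a
    newline, the end of the stream, or 255 bytes, whichever comes first.
    """
    data = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while not data.endswith(b"\n") and len(data) < 255:
            chunk = sock.recv(1)
            if not chunk:  # server closed the connection
                break
            data += chunk
    return data.decode("ascii", errors="replace").strip()


def looks_like_ssh(banner: str) -> bool:
    """True if the line looks like an SSH identification string."""
    # Simplification: a server may send other lines before this one;
    # only the first line is inspected here.
    return banner.startswith("SSH-")
```

Calling `looks_like_ssh(sftp_banner())` on the host running wis2box would then indicate whether something speaking SSH answers on port 8022. It does not verify credentials or bucket access.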

6 changes: 2 additions & 4 deletions docker-compose.yml
@@ -55,10 +55,10 @@ services:
- MINIO_BROWSER_LOGIN_ANIMATION=off
- MINIO_BROWSER_REDIRECT=false
- MINIO_UPDATE=off
command: server --quiet --console-address ":9001" /data
# in a production-setup minio needs to be
command: server --quiet --console-address ":9001" --sftp="address=:8022" --sftp="ssh-private-key=/home/miniouser/.ssh/id_rsa" /data
volumes:
- minio-data:/data
- ${WIS2BOX_HOST_DATADIR}/.ssh:/home/miniouser/.ssh:ro
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
interval: 5s
@@ -128,8 +128,6 @@ services:
- ${WIS2BOX_HOST_DATADIR}:/data/wis2box:rw
- htpasswd:/home/wis2box/.htpasswd:rw
depends_on:
minio:
condition: service_healthy
mosquitto:
condition: service_started
wis2box-api:
Binary file added docs/source/_static/winscp_minio_sftp.png
45 changes: 14 additions & 31 deletions docs/source/user/data-ingest.rst
@@ -40,7 +40,7 @@

* `bufr2bufr` : the input is received in BUFR format and split by subset, where each subset is published as a separate BUFR message
* `synop2bufr` : the input is received in `FM-12 SYNOP format <https://library.wmo.int/idviewer/35713/33>`_ and converted to BUFR format. The year and month are extracted from the file pattern
* `csv2bufr` : the input is received in CSV format and converted to BUFR format. A mapping template is used to convert the CSV columns to BUFR encoded values. Custom mapping templates need to be placed in the ``$WIS2BOX_HOST_DATADIR/mappings`` directory. See :ref:`csv2bufr-templates` for examples of mapping templates

Check warning on line 43 in docs/source/user/data-ingest.rst (GitHub Actions / main): undefined label: csv2bufr-templates (if the link has no caption the label must precede a section header)

To publish data for other data formats you can use the 'Universal' plugin, which will pass through the data without any conversion.
Please note that you will need to ensure that the date timestamp can be extracted from the file pattern when using this plugin.
@@ -171,44 +171,28 @@

pip3 install minio

wis2box-ftp
-----------
Uploading data to MinIO over SFTP
---------------------------------

You can add an additional service to allow your data to be accessible over FTP.
Data can also be uploaded to MinIO over SFTP.

To use the ``docker-compose.wis2box-ftp.yml`` template included in wis2box, create a new file called ``ftp.env`` using any text editor, and add the following content:
By default, the SFTP service is enabled on port 8022. You can connect to it using the MinIO storage username and password.
Using a client such as WinSCP, you can connect to the SFTP service and browse the bucket structure, as shown below:

.. code-block:: bash

MYHOSTNAME=hostname.domain.tld

FTP_USER=wis2box
FTP_PASS=wis2box123
FTP_HOST=${MYHOSTNAME}

WIS2BOX_STORAGE_ENDPOINT=http://${MYHOSTNAME}:9000
WIS2BOX_STORAGE_USERNAME=wis2box
WIS2BOX_STORAGE_PASSWORD=XXXXXXXX

LOGGING_LEVEL=INFO
.. image:: ../_static/winscp_minio_sftp.png
:width: 600
:alt: Screenshot of WinSCP showing directory structure of MinIO over SFTP

ensure ``MYHOSTNAME`` is set to **your** hostname (fully qualified domain name) and ``WIS2BOX_STORAGE_PASSWORD`` is set to **your** MinIO password.
To ingest data this way, upload it to the ``wis2box-incoming`` bucket, in a directory whose name matches the dataset metadata identifier or the topic in the data mappings.

Then start the ``wis2box-ftp`` service with the following command:
For example, using the command line on the host running wis2box:

.. code-block:: bash

docker compose -f docker-compose.wis2box-ftp.yml --env-file ftp.env up -d

When using the wis2box-ftp service to ingest data, please note that the topic is determined by the directory structure in which the data arrives.

For example, to correctly ingest data on the topic ``it-meteoam/data/core/weather/surface-based-observations/synop`` you need to copy the data into the directory ``/it-meteoam/data/core/weather/surface-based-observations/synop`` on the FTP server:

.. image:: ../_static/winscp_wis2box-ftp_example.png
:width: 600
:alt: Screenshot of WinSCP showing directory structure in wis2box-ftp

See the GitHub repository `wis2box-ftp`_ for more information on this service.
sftp -P 8022 -oBatchMode=no -o StrictHostKeyChecking=no wis2box@localhost << EOF
mkdir wis2box-incoming/urn:wmo:md:it-meteoam:surface-weather-observations.synop
put /path/to/your/datafile.csv wis2box-incoming/urn:wmo:md:it-meteoam:surface-weather-observations.synop
EOF
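
The upload steps in the example above can also be generated programmatically. The following is a minimal Python sketch, not part of wis2box: the helper name `sftp_upload_commands` is hypothetical, and it only builds the text of the `mkdir`, `put` and `exit` commands, assuming the bucket and directory layout described above.

```python
from typing import List

INCOMING_BUCKET = "wis2box-incoming"  # bucket name taken from the text above


def sftp_upload_commands(target: str, local_file: str) -> List[str]:
    """Build the sftp commands that upload local_file for one dataset.

    target is the dataset metadata identifier (or the topic from the data
    mappings). Hypothetical helper for illustration only.
    """
    target = target.strip("/")
    if not target:
        raise ValueError("target directory must not be empty")
    remote_dir = f"{INCOMING_BUCKET}/{target}"
    return [
        f"mkdir {remote_dir}",
        f"put {local_file} {remote_dir}/",
        "exit",
    ]


if __name__ == "__main__":
    commands = sftp_upload_commands(
        "urn:wmo:md:it-meteoam:surface-weather-observations.synop",
        "/path/to/your/datafile.csv",
    )
    print("\n".join(commands))
```

The printed lines can be supplied to the ``sftp`` command shown above in place of the here-document.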

wis2box-data-subscriber
-----------------------
@@ -248,7 +232,6 @@
Next: :ref:`public-services-setup`

.. _`MinIO`: https://min.io/docs/minio/container/index.html
.. _`wis2box-ftp`: https://github.com/wmo-im/wis2box-ftp
.. _`wis2box-data-subscriber`: https://github.com/wmo-im/wis2box-data-subscriber
.. _`WIS2 topic hierarchy`: https://github.com/wmo-im/wis2-topic-hierarchy
.. _`csv2bufr-templates`: https://github.com/wmo-im/csv2bufr-templates
14 changes: 14 additions & 0 deletions wis2box-management/docker/entrypoint.sh
@@ -29,6 +29,20 @@ set -e
#ensure environment-variables are available for cronjob
printenv | grep -v "no_proxy" >> /etc/environment

# create .ssh directory if not exists
if [ ! -d /data/wis2box/.ssh ]; then
echo "Creating /data/wis2box/.ssh"
mkdir /data/wis2box/.ssh
fi

# create private key file if not exists
if [ ! -f /data/wis2box/.ssh/id_rsa ]; then
echo "Creating /data/wis2box/.ssh/id_rsa"
# generate private key
ssh-keygen -t rsa -b 4096 -f /data/wis2box/.ssh/id_rsa -N ""
chmod 600 /data/wis2box/.ssh/id_rsa
fi

# wis2box commands
# TODO: avoid re-creating environment if it already exists
# TODO: catch errors and avoid bounce in conjunction with restart: always
6 changes: 5 additions & 1 deletion wis2box-management/wis2box/pubsub/subscribe.py
@@ -148,9 +148,13 @@ def on_message_handler(self, client, userdata, msg):
# store notification in messages collection
upsert_collection_item('messages', message)
elif (topic == 'wis2box/storage' and
message.get('EventName', '') == 's3:ObjectCreated:Put'):
message.get('EventName', '') in ['s3:ObjectCreated:Put', 's3:ObjectCreated:CompleteMultipartUpload']): # noqa
LOGGER.debug('Storing data')
key = str(message['Key'])
# if key ends with / then it is a directory
if key.endswith('/'):
LOGGER.info(f'Do not process directories: {key}')
return
filepath = f'{STORAGE_SOURCE}/{key}'
if key.startswith(STORAGE_ARCHIVE):
LOGGER.info(f'Do not process archived-data: {key}')
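
The filtering in this hunk can be read as a single predicate: accept only object-creation events (now including completed multipart uploads), then skip directory keys and archived data. The sketch below restates that logic in isolation. It is an illustration, not the wis2box implementation: the function name `should_process` is hypothetical, the value of `STORAGE_ARCHIVE` is assumed, and it assumes the handler stops processing after logging an archived key.

```python
from typing import Any, Dict

# Event names accepted by the condition in the diff above.
ACCEPTED_EVENTS = (
    "s3:ObjectCreated:Put",
    "s3:ObjectCreated:CompleteMultipartUpload",
)

# Assumed value for illustration; wis2box takes this from its configuration.
STORAGE_ARCHIVE = "wis2box-archive"


def should_process(topic: str, message: Dict[str, Any]) -> bool:
    """Return True if a storage notification refers to a file to ingest."""
    if topic != "wis2box/storage":
        return False
    if message.get("EventName", "") not in ACCEPTED_EVENTS:
        return False
    key = str(message.get("Key", ""))
    if not key or key.endswith("/"):
        # a key ending with "/" is a directory, not a data file
        return False
    if key.startswith(STORAGE_ARCHIVE):
        # archived data is not ingested again (assumed behaviour)
        return False
    return True
```

Keeping the checks in one function makes it straightforward to unit-test the directory case that SFTP clients trigger when they run `mkdir`.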