A cloud native data mesh implementation
This repo will set up a vagrant box (fredrikhgrelland/hashistack) on your local machine with HashiStack software and an integrated suite of tools;
- MinIO (in gateway mode)
- Hive-metastore (custom repo)
- Presto
- Sqlpad
Installed software:
- Vagrant
- Make
- Virtualbox
- Consul
See fredrikhgrelland/vagrant-hashistack for installation of prerequisites.
This stack requires at least 4
cpu cores and 16GB
memory to run stable. This can be tweaked in Vagrantfile. It has been tested to run on linux and macos.
If you find yourself behind a transparent proxy for any reason , you need to set the environment variables SSL_CERT_FILE
and CURL_CA_BUNDLE
. You have three options:
- Prefix
vagrant up
;SSL_CERT_FILE=<path/to/ca-certificates-file> CURL_CA_BUNDLE=<path/to/ca-certificates-file> vagrant up
. - Set the environment variables in your current session by running
export SSL_CERT_FILE=<path/to/ca-certificates-file>
andexport CURL_CA_BUNDLE=<path/to/ca-certificates-file>
in the terminal. Eg:export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
. - Set the environment variables permanently by adding the above export commands to your
~/.bashrc
orgedit ~/.profile
in the terminal Window.
Data is stored in MinIO
, an S3 compliant object storage. Data in S3 can be accessed by Presto
, a "SQL on anything" distributed database through table definitions stored in hive-metastore
. You may use Presto-CLI to send queries to Presto, or you may use the integrated SQL-interface called SQLPad
. In this interface you can write and visualize SQL-queries executed by Presto or other SQL-engines. SQLPad
has a default connection to our Presto
.
Under is a fairly high-level view of the whole system.
- From the same folder that contains this
README
, runmake up
. This will start the process of provisioning a vagrant box on your machine. Everything is automated, and it will set up the components mentioned in the introduction.
After everything is set up you need to establish connections to the components inside your vagrant box. There are four commands, and they will all need their own terminal-window as long as you want the connection to stay open. You can choose what components you want to open a connection to. If you don't need to go into the presto-dashboard, you don't have to open a connection.
make connect-to-minio
make connect-to-hive
make connect-to-presto
make connect-to-sqlpad
After running the commands above, the URL to access their respective components are:
- MinIO: localhost:8090
- Hive: localhost:8070
- Presto: localhost:8080
- SQLPad: localhost:3000
To access the consul-dashboard open the URL localhost:8500 in your browser (you do not need to run any make connect-to-something
command). In this dashboard you can see all your services, and whether they are ready or not. The consul-dashboard will be available shortly after running make up
.
To access the nomad-dashboard open the URL localhost:4646 in your browser (you do not need to run any make connect-to-something
command). This dashboard shows all your running services, and gives an overview of which hosts the containers run on, their use of resources, their logs, etc.. You do not need to check this dashboard unless you want to know about the inner workings of the data-mesh
system.
To access the presto-dashboard open the URL localhost:8080 in your browser (NB: make connect-to-presto
must be run first). Here you can see every query that has been executed by Presto
, both failed and successful ones. You can also see general statistics of the Presto
instance.
To access the minio-dashboard open the URL localhost:8090 in your browser (NB: make connect-to-minio
must be run first). The username and password is minioadmin
. This dashboard shows all files you have stored, and they will be organised in what is called buckets. You can treat these as normal directories. There will already be two existing buckets, hive
and default
. To upload your own files you need to create a new bucket to put it in, which can be done by pressing the plus sign in the right hand corner, then the button looking like a hard drive, which is create bucket
. You can now access your new bucket by clicking the name of your new bucket in the left hand column.
To access the SQLPad-dashboard open the URL localhost:3000 in your browser (NB: make connect-to-sqlpad
must be run first). Default mail and password is [email protected]
, admin
. This is an interface where you can write and run sql queries that will be executed by our Presto
instance. To begin using it you need to select a connection in the top left corner. There is a default connection to our Presto
you can use, called Presto - hiveCatalog - defaultSchema
. After selecting the connection you can see an example table already laying in our schema in the visualisation on the left hand side. To query it you can run SELECT * FROM iris
.
If you want to use a local presto-cli and connect that directly to presto, instead of using a sql-interface like SQLPad
, you can do that by running these commands in sequence (NB: make connect-to-presto
must be run first):
make download-presto-cli
make presto-cli
There will now be a live-connection to presto in your terminal. Write show tables;
in the terminal and hit enter. You will see there is one table available. This can be queried with select * from iris;
.