The Cloudmesh catalog can be used to store information about a service, software component, or project. The information in it can be categorized so that entries can be compared. The catalog is implemented as a REST service, so it can be integrated into other projects and searched programmatically.
The catalog depends on the cloudmesh command shell, which allows easy integration of new commands into a command line environment. It provides a sample interface for the catalog from the command line.
We can also create static web pages from the catalog by using the export feature and integrating the pages into, for example, Hugo.
We are currently exploring Hugo Docsy, as it provides an easy way to generate hierarchical web pages and also leverages Hugo's tags and categories. Other export formats include markdown and BibTeX.
If you do not have one yet, create an SSH key and upload it to GitHub.
$ ssh-keygen
Upload the ~/.ssh/id_rsa.pub key to GitHub.
Download cloudmesh with its source repositories.
Make sure you have Python 3.10.2.
On macOS or Linux, do:
$ python3.10 -m venv ~/ENV3
$ source ~/ENV3/bin/activate
On Windows (in a bash shell such as Git Bash), do:
$ py --version    # make sure it is 3.10.2
$ py -m venv ~/ENV3
$ source ~/ENV3/Scripts/activate
After that, the installation is the same on all operating systems.
$ mkdir cm
$ cd cm
$ pip install cloudmesh-installer
$ cloudmesh-installer --ssh install catalog
$ cms help
This will download all source code for the cloudmesh shell and install it from source.
Now you are ready to program and enhance cloudmesh-catalog. If you have any issues, contact [email protected]
A manual page should be implemented in
cloudmesh-catalog/catalog/command/catalog.py
This manual page can be displayed with the following command:
$ cms help catalog
To see just the usage, type:
$ cms catalog
TODO: The integration of data into the service is not yet completed.
TODO: service management on Windows is not yet completed.
On Linux and macOS, we can already experiment with an early prototype that allows starting, stopping, and getting the status of the service. This service has not yet been integrated with a database.
TODO: The data is not yet integrated, and we would like to use cloudmesh/yamldb for it.
TODO: To add catalog and registry data for new services, create new .yaml files in the appropriate folders: 'data/catalog/my_example.yaml' and 'data/registry/my_example.yaml'. Each file must follow YAML formatting similar to the following example.
Example file: Amazon Comprehend (Catalog), amazon_comprehend.yaml
---
id: amazon_comprehend
name: Amazon Comprehend
title: Amazon Comprehend
author: Amazon
slug: amazon-comprehend
public: true
description: |
  Comprehend is Amazon's solution for cloud-based NLP.
  It is available with an AWS account. To use it,
  one needs either the AWS Command Line
  Interface or an AWS SDK for Python, Java, or .NET.
  Notable features include functionality for giving
  batches of documents to be processed as well as
  submission of multiple jobs in a list. The DetectEntities
  function also allows use of a custom-trained
  model, but many other functions do not.
version: unknown
license: unknown
microservice: no
protocol: AWS API
owner: Amazon Web Services
modified: 2021-09-29
created: 2017-11-29
documentation: https://docs.aws.amazon.com/comprehend/index.html
source: unknown
specification: unknown
tags: ["nlp", "nlp service", "machine learning", "cloud service", "nlp api",
  "deep learning", "natural language processing", "artificial intelligence"]
categories: ["NLP"]
additional_metadata: unknown
endpoint: unknown
sla: https://aws.amazon.com/machine-learning/language/sla/
authors: The AWS team can be contacted through a support ticket at https://aws.amazon.com/contact-us/
data: |
  User data is stored on Amazon servers under the associated AWS account and is protected under the AWS
  shared responsibility model as detailed here https://aws.amazon.com/compliance/shared-responsibility-model/
The files catalog.py and registry.py contain classes that can read and store the data from the .yaml files. Both classes use the same interface. Here is an example of the Catalog class in action:
# assumes the Catalog class from catalog.py has been imported
# initialize the catalog using data found in the given directory
catalog = Catalog('data/catalog/')
# query the catalog for Amazon Comprehend data, save result to amazon_catalog_data
amazon_catalog_data = catalog.query({'name': 'Amazon Comprehend'})
# add a new data file to the catalog
catalog.add('new_example/azure_language.yaml')
# save entire catalog to a pickle file
catalog.to_pickle('catalog.pkl')
# load from pickle file
catalog.from_pickle('catalog.pkl')
# print catalog data
print(catalog.data)
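To make the interface above more concrete, here is a minimal in-memory sketch of a class with the same method names (`add`, `query`, `to_pickle`, `from_pickle`, and the `data` attribute). It is not the code in catalog.py: the class name `MiniCatalog` is hypothetical, entries are passed in as already-parsed Python dicts instead of being read from YAML files, and `query` is assumed to return every entry whose fields match all key/value pairs of the query dict.

```python
import pickle
from typing import Any


class MiniCatalog:
    """Hypothetical in-memory stand-in for the Catalog class (not the real implementation)."""

    def __init__(self) -> None:
        # maps an entry id to its attribute dictionary
        self.data: dict[str, dict[str, Any]] = {}

    def add(self, entry: dict[str, Any]) -> None:
        # assumption: entries are already-parsed dicts with a unique "id" field
        self.data[entry["id"]] = entry

    def query(self, conditions: dict[str, Any]) -> list[dict[str, Any]]:
        # return all entries whose fields equal every key/value pair in conditions
        return [
            entry
            for entry in self.data.values()
            if all(entry.get(key) == value for key, value in conditions.items())
        ]

    def to_pickle(self, filename: str) -> None:
        # persist the whole catalog dictionary
        with open(filename, "wb") as f:
            pickle.dump(self.data, f)

    def from_pickle(self, filename: str) -> None:
        # replace the current contents with the pickled catalog
        with open(filename, "rb") as f:
            self.data = pickle.load(f)


if __name__ == "__main__":
    catalog = MiniCatalog()
    catalog.add({"id": "amazon_comprehend", "name": "Amazon Comprehend", "author": "Amazon"})
    catalog.add({"id": "azure_language", "name": "Azure Language", "author": "Microsoft"})
    print(catalog.query({"name": "Amazon Comprehend"}))
```

Exact-match filtering keeps the sketch small; a richer query language would be needed for nested or partial matches.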
The catalog command includes several prototype export formats. Each takes all files in a directory (recursively) or an explicitly specified file and converts them to the specified output format. These include:
cms catalog export bibtex --source=SOURCE
cms catalog export hugo --source=SOURCE
cms catalog export md --source=SOURCE
The commands create BibTeX, Hugo markdown, and markdown entries next to the YAML files.
The templates are just suggestions and we may improve them based on our findings.
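The export commands are described as filling a template with attribute values (see the `catalog export --template` entry in the manual page). A minimal sketch of that idea, assuming an entry is already parsed into a dict, uses Python's `str.format_map`, which raises `KeyError` when the template names an attribute the entry lacks. The template mirrors the `@misc` layout documented for `catalog export bibtex`. The names `export_entry` and `BIBTEX_TEMPLATE` are hypothetical, mapping `url` to the entry's `documentation` field is an assumption, and so is deriving `month` and `year` from a `created` date in YYYY-MM-DD form.

```python
from datetime import date

# Hypothetical template following the @misc layout in the manual page.
# Literal BibTeX braces are doubled so that str.format_map keeps them.
BIBTEX_TEMPLATE = """@misc{{{id},
  author={{{author}}},
  title={{{title}}},
  abstract={{{description}}},
  url={{{documentation}}},
  howpublished={{Web Page}},
  month={{{month}}},
  year={{{year}}}
}}"""


def export_entry(entry: dict, template: str) -> str:
    """Fill the template with the entry's attributes.

    A placeholder that names a missing attribute raises KeyError.
    """
    values = dict(entry)
    # assumption: "created" is a YYYY-MM-DD date, as enforced by "cms catalog check"
    created = date.fromisoformat(str(values["created"]))
    values["month"] = created.month
    values["year"] = created.year
    return template.format_map(values)


if __name__ == "__main__":
    entry = {
        "id": "amazon_comprehend",
        "author": "Amazon",
        "title": "Amazon Comprehend",
        "description": "Comprehend is Amazon's solution for cloud-based NLP.",
        "documentation": "https://docs.aws.amazon.com/comprehend/index.html",
        "created": "2017-11-29",
    }
    print(export_entry(entry, BIBTEX_TEMPLATE))
```

The markdown and Hugo exports could reuse the same function with a different template string, for example `"# {title}\n\n{author}\n\n## Description\n\n{description}\n"` for the md layout.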
It is very important that every entry is checked for minimal YAML compliance. Hence, we implemented the command
cms catalog check --source=SOURCE
which checks all files in the specified directory. The check ignores line length limits if the line contains an http or https reference. We also check that dates use the format YYYY-MM-DD.
We know that it may be problematic to distinguish automatically between YYYY-MM-DD and YYYY-DD-MM. Hence, we encourage you to be careful when adding entries.
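The two checks described above can be sketched in a few lines of Python. This is an illustration, not the code behind `cms catalog check`: the function name `check_lines`, the 80-character limit, and the assumption that dates live in `created` and `modified` fields are all hypothetical. It also shows why the ambiguity is hard to catch: `2021-05-04` parses as a valid YYYY-MM-DD date even if the author meant April 5.

```python
import re
from datetime import datetime

MAX_LINE_LENGTH = 80  # assumed limit; the real check may use another value
DATE_KEYS = ("created", "modified")  # assumed fields that carry dates


def check_lines(lines: list[str]) -> list[str]:
    """Return human-readable problems found in the lines of one YAML file."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # long lines are tolerated when they contain an http or https reference
        if len(line) > MAX_LINE_LENGTH and not re.search(r"https?://", line):
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        key, _, value = line.partition(":")
        if key.strip() in DATE_KEYS:
            try:
                # strptime rejects values that do not match YYYY-MM-DD
                datetime.strptime(value.strip(), "%Y-%m-%d")
            except ValueError:
                problems.append(f"line {number}: date '{value.strip()}' is not YYYY-MM-DD")
    return problems


if __name__ == "__main__":
    sample = ["modified: 9/29/2021", "created: 2017-11-29"]
    for problem in check_lines(sample):
        print(problem)
```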
We are providing a number of developer video tutorials that help in understanding how we develop code and leverage the cloudmesh-cmd5 shell features:
- Cloudmesh Catalog. How to improve the check feature
- Cloudmesh Catalog. Overview of the converter
- Cloudmesh Catalog. How to use the integration with hugo
- Cloudmesh Catalog. Managing the server with start, stop, info, status
- Cloudmesh Catalog. Running the Server on a Mac on port 8001
- GitHub Tips and Project management
- Overview Cloudmesh NIST project
Other videos are available at
Command catalog
===============
::
Usage:
catalog info
catalog start [--docker] [--name=NAME]
catalog stop [--docker] [--name=NAME] [--pid=PID]
catalog status [--docker] [--name=NAME]
catalog list
catalog default [--name=NAME]
catalog init DIR [--name=NAME] [--port=PORT] [--docker]
catalog query QUERY [--name=NAME]
catalog table --attributes=ATTRIBUTES [--name=NAME]
catalog print [--format=FORMAT] [--name=NAME]
catalog copy [--docker] [--name=NAME] [--source=URL]
catalog federate [--docker] [--name=NAME] [--source=URL]
catalog load [--docker] [--name=NAME] [--source=URL]
catalog export bibtex [--source=SOURCE] [--destination=DESTINATION]
catalog export md [--source=SOURCE] [--destination=DESTINATION]
catalog export hugo [--source=SOURCE] [--destination=DESTINATION]
catalog export --template=TEMPLATE [--source=SOURCE] [--destination=DESTINATION]
catalog check [--source=SOURCE]
This command manages the catalog service.
Arguments:
DIR the directory path containing the entries
Options:
--docker run the service in a docker container
--name=NAME the name of the catalog service
--port=PORT the port number of the service
Description:
catalog list
lists all available catalog services. There can be multiple
catalog services.
catalog default [--name=NAME]
sets the default catalog server to the given name.
The names of all services are stored in a yaml file at
~/.cloudmesh/catalog.services.yaml
> cloudmesh:
>   catalog:
>     - name: my-service-a
>       mode: native
>       port: 10000
>     - name: my-service-b
>       mode: docker
>       port: 10001
catalog init DIR [--name=NAME] [--port=PORT] [--docker]
This command initializes a given catalog service, while using the
directory DIR as a content dir for the entries.
The dir can have multiple subdirectories for better organization.
Each subdirectory name is automatically added as a "tag" to the entries it contains,
in addition to any tags already listed in the entry. If the tag is already in the
entry, it is not added again.
The name identifies the catalog in case multiple catalogs exist.
The port is the port number. If it is not specified, the next available port is
determined from the catalog list. If no prior catalog service with a port exists,
port 40000 is used.
If the docker flag is specified, the catalog is not started natively but in a
docker container. The uid and gid are automatically forwarded to the container, so data
changes are conducted as the host user.
If the image does not exist, it will be built before the container is started. The
Dockerfile is located in the code base and dynamically retrieved from the pip installed
package in
cloudmesh/catalog/Dockerfile
catalog query QUERY [--name=NAME]
issues a query to the given catalog service. If the name is omitted, the default service is used.
The query is formulated in JMESPath (see https://jmespath.org/tutorial.html).
catalog print [--format=FORMAT] [--name=NAME]
prints all entries of the given catalog. With attributes, you can select a subset of the attributes.
Nested attributes can be addressed with a dot notation.
The format is table by default, but can also be set to json, yaml, or csv.
catalog start [--docker] [--name=NAME]
This command starts the service. If docker is used, the service is started
as a container. The name specifies the service, so multiple services can be started.
If the name is omitted, the default service is used. If only one service is specified,
it is the default.
catalog stop [--docker] [--name=NAME]
This command stops the service. If docker is used, the service is stopped
as a container. The name specifies which of several services is stopped.
If the name is omitted, the default service is used. If only one service is specified,
it is the default.
catalog status [--docker] [--name=NAME]
This command gets the status of the service. If docker is used, the status of the
service container is reported. The name specifies which of several services is queried.
If the name is omitted, the default service is used. If only one service is specified,
it is the default.
catalog copy [--docker] [--name=NAME] [--source=URL]
This command copies the contents from all catalogs specified by the
source URLs. Please note that the URLs are of the form host:port.
However, it can also load data from a file or directory when specified as
file://path. A relative path can be specified as file://../data
catalog federate [--docker] [--name=NAME] [--source=URL]
This command federates the contents from all catalogs specified by the
source URLs. Please note that the URLs are of the form host:port.
When the federation service is queried, parallel queries will be issued to
all sources and the query results will be reduced to a single result.
When the cache option is specified, the result will be cached, and the next
time the same query is issued, the cached result is used. A time to live (ttl)
is specified to ensure that the cached result is deleted after the ttl has expired.
catalog load [--docker] [--name=NAME] [--source=URL]
In contrast to the copy command, the load command reads the data from
directories or files and not from URLs.
However, copy can also read from file://path.
catalog export bibtex [--source=SOURCE] [--destination=DESTINATION]
Exports the information from the catalog as a single bibtex file.
If a name is specified, only the named entries are exported.
The format of the entries will be
> @misc{id,
> author={the author field of the entry},
> title={the title of the entry},
> abstract={the description of the entry},
> url={the url of the entry},
> howpublished={Web Page},
> month={the month of the date the entry was created},
> year={the year of the date when the entry was created}
> }
catalog export md [--source=SOURCE] [--destination=DESTINATION]
Exports the information from the catalog as a directory tree
equivalent to the original.
If a name is specified only the named entries are exported.
The format of the entries will be
> # {title}
>
> {author}
>
> ## Description
>
> {description}
>
> and so on
catalog export hugo [--source=SOURCE] [--destination=DESTINATION]
Exports the information from the catalog as Hugo markdown pages. The format of an entry will be
> ---
> title: "Running GPU Batch jobs on Rivanna"
> linkTitle: "GPU@Rivanna"
> author: {author of the technology}
> date: 2017-01-05
> weight: 4
> description: >
> Short Description of the entry
> ---
>
> {{% pageinfo %}}
> Short description from the entry
> {{% /pageinfo %}}
> ## Description
>
> {description}
>
> and so on
catalog export --template=TEMPLATE [--source=SOURCE] [--destination=DESTINATION]
formats the source file(s) based on the template that is provided.
The template is a file that uses curly brackets for replacement of
the attribute names. If a name is not in the source, an error will
be produced.
catalog check [--source=SOURCE]
performs some elementary checks on all files in the directory tree
starting with SOURCE.
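The behaviour described for `catalog federate` (parallel queries to all sources, a reduction to a single result, and an optional cache with a time to live) can be sketched with the standard library. This is a hypothetical illustration, not the service code: each source is modelled as a Python callable instead of a host:port URL, the reduction simply concatenates the per-source results, and the names `FederatedCatalog`, `ttl`, and `cache` are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# a source takes a query string and returns a list of matching entries
Source = Callable[[str], list]


class FederatedCatalog:
    """Hypothetical federation helper: fan a query out to all sources and merge."""

    def __init__(self, sources: list[Source], ttl: float = 60.0, cache: bool = True):
        self.sources = sources
        self.ttl = ttl  # seconds a cached result stays valid
        self.cache = cache
        self._cache: dict[str, tuple[float, list]] = {}

    def query(self, query: str) -> list:
        now = time.monotonic()
        if self.cache and query in self._cache:
            stored_at, result = self._cache[query]
            if now - stored_at < self.ttl:
                return result  # cached result is still within its time to live
            del self._cache[query]  # expired: drop it and ask the sources again
        # issue the query to all sources in parallel; map keeps the source order
        with ThreadPoolExecutor(max_workers=max(1, len(self.sources))) as pool:
            partial_results = list(pool.map(lambda source: source(query), self.sources))
        # reduce the per-source results to a single list
        result = [entry for part in partial_results for entry in part]
        if self.cache:
            self._cache[query] = (now, result)
        return result
```

A real implementation would replace the callables with HTTP requests to each host:port and would probably remove duplicate entries during the reduction.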