#132: Fixed outdated documentation #150

Merged
2 changes: 1 addition & 1 deletion doc/changes/changes_0.6.0.md
@@ -22,6 +22,6 @@ T.B.D

### Documentation

- n/a
- #132: Fixed outdated information in documentation


34 changes: 22 additions & 12 deletions doc/developer_guide/developer_guide.md
Collaborator
We need to elaborate on how to upload a model by downloading it from the Huggingface Hub to a local drive.

First, sample code like the one below would be useful. This code creates a model cache, and this cache is what we actually need to upload to the BucketFS.

from transformers import AutoTokenizer, AutoModel

# model_name, model_dir and user_token are placeholders; example values follow below
AutoTokenizer.from_pretrained(model_name, cache_dir=model_dir, token=user_token)
AutoModel.from_pretrained(model_name, cache_dir=model_dir, token=user_token)

For example, if model_name == 'me/my-awesome-model' and model_dir == 'my_model_dir', then the above code will create some files in the directory '.../my_model_dir/models--me--my-awesome-model'.

The '.../my_model_dir' is what we need to provide to exasol_transformers_extension.upload_model in the local-model-path parameter. However, and this is important, it will grab EVERYTHING it finds in this directory. If you have downloaded several models there, it will wrap them all in a single tar.gz file and start uploading. It is therefore important to cache every model in its own sub-directory. For example, if the root cache directory is 'my-models-cache', then I would set the model_dir to something like '.../my-models-cache/my-awesome-model-cache'.
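A minimal sketch of that per-model layout, assuming the plain transformers download API (the model names are illustrative):

```python
import os
from transformers import AutoModel, AutoTokenizer

cache_root = "my-models-cache"
for model_name in ["me/my-awesome-model", "me/my-other-model"]:
    # one sub-directory per model, so a later upload wraps only this model
    model_dir = os.path.join(cache_root, model_name.split("/")[-1] + "-cache")
    AutoTokenizer.from_pretrained(model_name, cache_dir=model_dir)
    AutoModel.from_pretrained(model_name, cache_dir=model_dir)
```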

Collaborator Author

Yes, you already mentioned that in #133. I don't think it is relevant to this ticket, so I would suggest adding the missing information to the documentation in #133, since that is the focus of that ticket.

@@ -5,9 +5,10 @@ In this developer guide we explain how to build this project and how you can add
new transformer tasks and tests.


## Building the Project
## Installation
There are two ways to install the Transformers Extension Package:

### 1. Build the Python Package
### 1. Build and install the Python Package
This project needs Python 3.8 or above installed on the development machine.
In addition, in order to build Python packages you need to have the [Poetry](https://python-poetry.org/)
(>= 1.1.11) package manager. Then you can install and build the `transformers-extension` as follows:
@@ -16,21 +17,30 @@
```bash
poetry install
poetry build
```

### 2. Install the Project
The latest version of the Python package of this extension can be downloaded
from the Releases in GitHub Repository (see [the latest release](https://github.com/exasol/transformers-extension/releases/latest)).
Please download the built archive `transformers_extension.whl` and install it as follows:
### 2. Download and install the pre-built wheel
Instead of building the package yourself, you can download the latest version of this extension's Python package
from the Releases in the GitHub Repository (see [the latest release](https://github.com/exasol/transformers-extension/releases/latest)).
Please download the built archive
`exasol_transformers_extension-<version-number>-py3-none-any.whl` (`transformers_extension.whl` in older versions)
and install it as follows:
```bash
pip install dist/transformers_extension.whl --extra-index-url https://download.pytorch.org/whl/cpu
pip install <path/wheel-filename.whl> --extra-index-url https://download.pytorch.org/whl/cpu
```
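As a quick, optional sanity check (a suggestion, not from the original guide), the top-level package should now be importable:

```python
# minimal smoke test after installation
import exasol_transformers_extension
print(exasol_transformers_extension.__file__)
```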

### 3. Run All Tests

### Run Tests
All unit and integration tests can be run within the Poetry environment created
for the project as follows:
for the project using nox. See [the nox file](../../noxfile.py) for all tasks run by nox. Three of these tasks concern tests.

Run unit tests:
```bash
poetry run pytest tests
poetry run nox -s unit_tests
```
Start a test database and run integration tests:
```bash
poetry run nox -s start_database
poetry run nox -s integration_tests
```


## Add Transformer Tasks
In the transformers-extension library, the 8 most popular NLP tasks provided by
@@ -39,7 +49,7 @@ been defined. We created separate UDF scripts for each NLP task. You can find
these tasks and UDF script usage details in the [User Guide](../user_guide/user_guide.md#prediction-udfs).
This section shows you step by step how to add a new NLP task to this library.

### 1. Add UDF Template
### 1. Add a UDF Template
The new task's UDF template should be added to the `exasol_transformers_extension/resources/templates/`
directory. Please note that the UDF script uses a _"SET UDF"_ and that the inputs
are received ordered by pre-determined columns. In addition, the first 4 input
25 changes: 13 additions & 12 deletions doc/user_guide/user_guide.md
@@ -6,7 +6,8 @@ use of pre-trained NLP models provided by the [Transformers API](https://hugging

The extension provides two types of UDFs:
- DownloaderUDF: It is responsible for downloading the specified pre-trained model into the Exasol BucketFS.
- Prediction UDFs: These are a group of UDFs for each supported task. Each of them uses the downloaded pre-trained model and perform prediction. These supported tasks:
- Prediction UDFs: These are a group of UDFs, one for each supported task. Each of them uses the downloaded pre-trained
model and performs prediction. These are the supported tasks:
1. Sequence Classification for Single Text
2. Sequence Classification for Text Pair
3. Question Answering
@@ -67,17 +68,18 @@
### The Python Package
#### Download The Python Wheel Package
- The latest version of the python package of this extension can be
downloaded from the Releases in GitHub Repository
(see [the latest release](https://github.com/exasol/transformers-extension/releases/latest)).
downloaded from the [GitHub Releases](https://github.com/exasol/transformers-extension/releases/latest).
Please download the following built archive:
```buildoutcfg
transformers_extension.whl
exasol_transformers_extension-<version-number>-py3-none-any.whl
```
If you need to use a version < 0.5.0, the built archive is called `transformers_extension.whl`.


#### Install The Python Wheel Package
- Install the packaged transformers-extension project as follows:
Install the packaged transformers-extension project as follows:
```shell
pip install transformers_extension.whl --extra-index-url https://download.pytorch.org/whl/cpu
pip install <path/wheel-filename.whl> --extra-index-url https://download.pytorch.org/whl/cpu
```

### The Pre-built Language Container
@@ -87,12 +89,12 @@ extension to run. It can be installed in two ways: Quick and Customized
installations

#### Quick Installation
The desired language container is downloaded and installed by executing the
deployment script below with the desired version. (see GitHub Releases
[the latest release](https://github.com/exasol/transformers-extension/releases).
The language container is downloaded and installed by executing the
deployment script below with the desired version. Make sure the version matches your installed version of the
Transformers Extension Package. See [the latest release](https://github.com/exasol/transformers-extension/releases) on GitHub.

```shell
python -m exasol_transformers_extension.deploy language-container
python -m exasol_transformers_extension.deploy language-container \
--dsn <DB_HOST:DB_PORT> \
--db-user <DB_USER> \
--db-pass <DB_PASSWORD> \
@@ -107,8 +109,7 @@
--language-alias <LANGUAGE_ALIAS> \
--version <RELEASE_VERSION> \
--ssl-cert-path <ssl-cert-path> \
--use-ssl-cert-validation \
--no-use-ssl-cert-valiation
--use-ssl-cert-validation
```
The `--ssl-cert-path` is only needed if your certificate is not in the OS truststore.
Collaborator
We need to say which certificate this is. There are two, and they look similar. One is basically a list of trusted CAs; it is needed for the client to validate the server's certificate (that is when you use --use-ssl-cert-validation). The other is the client's own certificate. It may or may not include the private key; in the latter case the key may be provided as a separate file.
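A minimal Python illustration of the difference, using only the standard library (the file names are placeholders):

```python
import ssl

# Case 1: a bundle of trusted CA certificates -- used by the client to
# validate the *server's* certificate (what --use-ssl-cert-validation needs).
ctx = ssl.create_default_context(cafile="ca-bundle.pem")

# Case 2: the client's *own* certificate; the private key may be embedded
# in the same file or provided separately, as here.
ctx.load_cert_chain(certfile="client-cert.pem", keyfile="client-key.pem")
```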

Collaborator Author

I will add it to #133.

The option `--use-ssl-cert-validation` is the default; you can disable it with `--no-use-ssl-cert-validation`.