feat(frontend)(api): add datasource type specific input validation (#55)
This PR:
1. Enforces validation for each existing datasource type so that only the expected fields are stored in the connection object.
2. Updates the documentation to fix invalid inputs in the local run instructions.

Closes #54
Closes #31 
Closes #30
mawandm authored Apr 27, 2024
1 parent a59d1b3 commit 360abaf
Showing 29 changed files with 492 additions and 132 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/documentation.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements-docs.txt
pip install -r docs/requirements.txt
- name: Build with MkDocs
run: |
mkdocs build --clean --config-file docs/mkdocs.yml --site-dir ../_site
Expand Down
17 changes: 9 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,13 +68,13 @@ To get started with Nesis
2. Enter the details;

1. Type: **MinIO**
4. Name: **documents**
5. Host: **http://minio:9000/**
6. Username: `YOUR_USERNAME`
7. Password: `YOUR_PASSWORD`
8. Dataobjects: **documents**
9. Click **Create**
10. Then, run an adhoc ingestion by clicking the **Ingest** button of the datasource.
2. Name: **documents**
3. Host: **http://minio:9000/**
4. Access Key: `your_username`
5. Access Secret: `your_password`
6. Buckets: **documents**
7. Click **Create**
8. Then, run an adhoc ingestion by clicking the **Ingest** button of the datasource.

- *Note*: Replace `your_username` and `your_password` with the correct values of your `username` and `password`.

Expand All @@ -95,7 +95,8 @@ If enough users support to have the feature, we will be sure to include it in ou

🐞If you find any functionality not working as expected, please feel free to open a bug report.

🌟 If you think that this project has been useful to you, please give it a star.
## Stars let us know you visited
Please give us a star to let us know you visited this page. You are already awesome.

## Origins
This project has been inspired by other open-source projects. Here is a list of some of them;
Expand Down
17 changes: 17 additions & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Nesis Documentation

## Overview
We use mkdocs to create Nesis documentation.

## Editing documentation locally

1. Install dependencies with
```commandline
pip install -r docs/requirements.txt
```
2. Serve the documentation locally with
```commandline
cd docs
mkdocs serve
```
3. The documentation should now be reachable at [http://127.0.0.1:8000/](http://127.0.0.1:8000/)
4 changes: 4 additions & 0 deletions docs/mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ site_name: Nesis - Your AI Powered Enterprise Knowledge Partner
site_description: Your AI Powered Enterprise Knowledge Partner
site_url: https://ametnes.github.io/nesis/
repo_url: https://github.com/ametnes/nesis/
repo_name: ametnes/nesis
edit_uri: edit/main/docs/src/
docs_dir: 'src'
theme:
name: material
Expand All @@ -22,6 +24,8 @@ theme:
name: Switch to light mode
features:
- content.code.copy
- content.action.edit
- content.action.view

markdown_extensions:
- attr_list
Expand Down
File renamed without changes.
19 changes: 9 additions & 10 deletions docs/src/installing/compose.md
Original file line number Diff line number Diff line change
Expand Up @@ -142,8 +142,8 @@ volumes:
- *password* = `password`

3. Connect to your minio instance via http://localhost:59001/ with the following login credentials:
- *username* = `YOUR_USERNAME`
- *password* = `YOUR_PASSWORD`
- *username* = `your_username`
- *password* = `your_password`


4. Upload some documents into your minio `documents` bucket.
Expand All @@ -153,12 +153,11 @@ volumes:
2. Enter the details;

1. Type: **S3 Compatible**
4. Name: **documents**
5. Host: **http://minio:9000/**
6. Username: `YOUR_USERNAME`
7. Password: `YOUR_PASSWORD`
8. Click **Create**
9. Then, run an adhoc ingestion by clicking the **Ingest** button of the datasource.

- *Note*: Replace `YOUR_USERNAME` and `YOUR_PASSWORD` with the correct values of your `username` and `password`.
2. Name: **documents**
3. Host: **http://minio:9000/**
4. Access Key: `your_username`
5. Access Secret: `your_password`
6. Buckets: **documents**
7. Click **Create**
8. Then, run an adhoc ingestion by clicking the **Ingest** button of the datasource.

2 changes: 0 additions & 2 deletions nesis/api/core/controllers/datasources.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,8 +37,6 @@ def operate_datasources():
return jsonify(error_message("Unauthorized access")), 401
except util.PermissionException:
return jsonify(error_message("Forbidden action on resource")), 403
except util.ValidationException:
return jsonify(error_message("Unable to validate datasource connection")), 403
except:
_LOG.exception("Error getting user")
return jsonify(error_message("Server error")), 500
Expand Down
15 changes: 14 additions & 1 deletion nesis/api/core/document_loaders/minio.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
get_documents,
ingest_file,
)
from nesis.api.core.util import clean_control
from nesis.api.core.util import clean_control, isblank
from nesis.api.core.util.constants import DEFAULT_DATETIME_FORMAT
from nesis.api.core.util.dateutil import strptime

Expand Down Expand Up @@ -261,3 +261,16 @@ def _unsync_s3_documents(

except:
_LOG.warning("Error fetching and updating documents", exc_info=True)


def validate_connection_info(connection: Dict[str, Any]) -> Dict[str, Any]:
_valid_keys = ["endpoint", "user", "password", "dataobjects"]
assert not isblank(connection.get("endpoint")), "An endpoint must be supplied"
assert not isblank(
connection.get("dataobjects")
), "One or more buckets must be supplied"
return {
key: val
for key, val in connection.items()
if key in _valid_keys and not isblank(connection[key])
}
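
The validator above can be exercised in isolation. The sketch below copies `isblank` and the MinIO `validate_connection_info` from this diff into one file so that it runs standalone; the sample connection (including the stray `region` key and the credentials) is made up for illustration.

```python
from typing import Any, Dict


def isblank(item: Any) -> bool:
    # Same check as the isblank helper added to nesis.api.core.util in this commit.
    return item is None or item == "" or str(item).isspace()


def validate_connection_info(connection: Dict[str, Any]) -> Dict[str, Any]:
    # Mirrors the MinIO validator: require an endpoint and buckets,
    # then keep only recognised keys whose values are not blank.
    _valid_keys = ["endpoint", "user", "password", "dataobjects"]
    assert not isblank(connection.get("endpoint")), "An endpoint must be supplied"
    assert not isblank(
        connection.get("dataobjects")
    ), "One or more buckets must be supplied"
    return {
        key: val
        for key, val in connection.items()
        if key in _valid_keys and not isblank(val)
    }


# Hypothetical input: "region" is not a MinIO key and "password" is blank,
# so both are dropped from the returned dictionary.
cleaned = validate_connection_info(
    {
        "endpoint": "http://minio:9000/",
        "user": "your_username",
        "password": "  ",
        "dataobjects": "documents",
        "region": "us-east-1",
    }
)
print(cleaned)
```

Because the required-field checks are `assert` statements, they are skipped when Python runs with `-O`. The service layer in this commit catches `AssertionError` alongside `ValueError` and converts either into a `ServiceException`.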
15 changes: 14 additions & 1 deletion nesis/api/core/document_loaders/s3.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
get_documents,
ingest_file,
)
from nesis.api.core.util import clean_control
from nesis.api.core.util import clean_control, isblank
from nesis.api.core.util.dateutil import strptime

_LOG = logging.getLogger(__name__)
Expand Down Expand Up @@ -286,3 +286,16 @@ def _unsync_documents(

except:
_LOG.warning("Error fetching and updating documents", exc_info=True)


def validate_connection_info(connection: Dict[str, Any]) -> Dict[str, Any]:
_valid_keys = ["endpoint", "user", "password", "region", "dataobjects"]
assert not isblank(connection.get("region")), "A valid region must be supplied"
assert not isblank(
connection.get("dataobjects")
), "One or more buckets must be supplied"
return {
key: val
for key, val in connection.items()
if key in _valid_keys and not isblank(connection[key])
}
50 changes: 22 additions & 28 deletions nesis/api/core/document_loaders/samba.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
ValidationException,
ingest_file,
)
from nesis.api.core.util import http, clean_control
from nesis.api.core.util import http, clean_control, isblank
from nesis.api.core.util.constants import DEFAULT_DATETIME_FORMAT, DEFAULT_SAMBA_PORT
from nesis.api.core.util.dateutil import strptime

Expand Down Expand Up @@ -52,43 +52,37 @@ def fetch_documents(
_LOG.exception(f"Error unsyncing documents")


def validate_connection_info(connection):
port = connection.get("port")
if port is None or not port:
connection["port"] = DEFAULT_SAMBA_PORT
elif not port.isnumeric():
raise ValidationException("Port value cannot be non numeric")

if connection.get("endpoint") is None or not connection.get("endpoint"):
raise ValidationException("Endpoint value cannot be null or empty")
def validate_connection_info(connection: Dict[str, Any]) -> Dict[str, Any]:
port = connection.get("port") or DEFAULT_SAMBA_PORT
_valid_keys = ["port", "endpoint", "user", "password", "dataobjects"]
if not str(port).isnumeric():
raise ValueError("Port value cannot be non numeric")

if connection.get("user") is None or not connection.get("user"):
raise ValidationException("Username value cannot be null or empty")

if connection.get("password") is None or not connection.get("password"):
raise ValidationException("Password value cannot be null or empty")
assert not isblank(
connection.get("endpoint")
), "A valid share address must be supplied"

try:
_connect_samba_server(connection)
except ValidationException as sb:
except Exception as ex:
_LOG.exception(
f"Failed to connect to samba server at {connection['endpoint']}",
stack_info=True,
)
raise
return connection
raise ValueError(ex)
connection["port"] = port
return {
key: val
for key, val in connection.items()
if key in _valid_keys and not isblank(connection[key])
}


def _connect_samba_server(connection):
username = connection["user"]
password = connection["password"]
endpoint = connection["endpoint"]
port = connection["port"]
try:
scandir(endpoint, username=username, password=password, port=port)
except Exception as ex:
_LOG.exception(f"Error while connecting to samba server {endpoint} - {ex}")
raise
username = connection.get("user")
password = connection.get("password")
endpoint = connection.get("endpoint")
port = connection.get("port")
next(scandir(endpoint, username=username, password=password, port=port))


def _sync_samba_documents(
Expand Down
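
The port handling in the new samba validator can be looked at separately from the network check. This is a minimal sketch: `resolve_port` is a hypothetical helper, not a function in the repository, and the value `445` for `DEFAULT_SAMBA_PORT` is an assumption (the real constant lives in `nesis.api.core.util.constants`).

```python
from typing import Any, Dict

# Assumption: 445 is the usual SMB port; the actual constant is defined elsewhere.
DEFAULT_SAMBA_PORT = 445


def resolve_port(connection: Dict[str, Any]) -> Any:
    # A missing, None, or empty port falls back to the default.
    port = connection.get("port") or DEFAULT_SAMBA_PORT
    # str() lets the check work for both int and str inputs.
    if not str(port).isnumeric():
        raise ValueError("Port value cannot be non numeric")
    return port


print(resolve_port({}))                # no port supplied
print(resolve_port({"port": "1445"}))  # numeric string is kept as supplied
```

Note that the value keeps its original type: a port supplied as a string stays a string, and the default is whatever type the constant has.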
19 changes: 18 additions & 1 deletion nesis/api/core/document_loaders/sharepoint.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import os
import pathlib
import uuid
from typing import Dict, Any

from office365.sharepoint.client_context import ClientContext
from office365.runtime.client_request_exception import ClientRequestException
Expand All @@ -15,6 +16,7 @@
delete_document,
get_documents,
)
from nesis.api.core.util import isblank
from nesis.api.core.util.constants import DEFAULT_DATETIME_FORMAT
from nesis.api.core.util.dateutil import strptime

Expand Down Expand Up @@ -203,4 +205,19 @@ def _unsync_sharepoint_documents(**kwargs):
delete_document(document_id=document.id)

except:
_LOG.warn("Error fetching and updating documents", exc_info=True)
_LOG.warning("Error fetching and updating documents", exc_info=True)


def validate_connection_info(connection: Dict[str, Any]) -> Dict[str, Any]:
_valid_keys = ["endpoint", "client_id", "thumbprint", "certificate", "dataobjects"]
assert not isblank(connection.get("endpoint")), "A site url must be supplied"
assert not isblank(connection.get("client_id")), "A client_id must be supplied"
assert not isblank(connection.get("thumbprint")), "A thumbprint must be supplied"
assert not isblank(
connection.get("certificate")
), "A valid certificate must be supplied"
return {
key: val
for key, val in connection.items()
if key in _valid_keys and not isblank(connection[key])
}
9 changes: 9 additions & 0 deletions nesis/api/core/document_loaders/validators.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
from nesis.api.core.document_loaders import samba
from nesis.api.core.document_loaders import s3
from nesis.api.core.document_loaders import minio
from nesis.api.core.document_loaders import sharepoint
from nesis.api.core.models.entities import DatasourceType


Expand All @@ -22,5 +25,11 @@ def validate_datasource_connection(datasource) -> dict:
match datasource_type:
case DatasourceType.WINDOWS_SHARE:
return samba.validate_connection_info(connection=connection)
case DatasourceType.S3:
return s3.validate_connection_info(connection=connection)
case DatasourceType.MINIO:
return minio.validate_connection_info(connection=connection)
case DatasourceType.SHAREPOINT:
return sharepoint.validate_connection_info(connection=connection)
case _:
return connection
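
The per-type dispatch above could also be expressed as a lookup table, which may be easier to extend as datasource types are added. This is only a sketch of that alternative, not the repository's code: the `DatasourceType` members and values, the `_require` helper, and both validators are simplified stand-ins.

```python
from enum import Enum
from typing import Any, Callable, Dict

Connection = Dict[str, Any]


class DatasourceType(Enum):
    # Stand-in for nesis.api.core.models.entities.DatasourceType; values are made up.
    MINIO = 1
    S3 = 2
    OTHER = 3  # hypothetical type without a dedicated validator


def _require(connection: Connection, key: str, message: str) -> None:
    # Hypothetical helper: reject missing or whitespace-only values.
    value = connection.get(key)
    if value is None or str(value).strip() == "":
        raise ValueError(message)


def validate_minio(connection: Connection) -> Connection:
    _require(connection, "endpoint", "An endpoint must be supplied")
    return connection


def validate_s3(connection: Connection) -> Connection:
    _require(connection, "region", "A valid region must be supplied")
    return connection


_VALIDATORS: Dict[DatasourceType, Callable[[Connection], Connection]] = {
    DatasourceType.MINIO: validate_minio,
    DatasourceType.S3: validate_s3,
}


def validate_datasource_connection(
    datasource_type: DatasourceType, connection: Connection
) -> Connection:
    # Types without a registered validator pass through unchanged,
    # matching the `case _` branch in the diff.
    validator = _VALIDATORS.get(datasource_type)
    if validator is None:
        return connection
    return validator(connection)


print(validate_datasource_connection(DatasourceType.OTHER, {"any": "thing"}))
```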
4 changes: 3 additions & 1 deletion nesis/api/core/models/entities.py
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,9 @@ def __init__(

def to_dict(self, **kwargs) -> dict:
connection = copy.deepcopy(self.connection or {})
connection.pop("password", None)
secret_keys = ["password", "certificate", "thumbprint"]
for secret_key in secret_keys:
connection.pop(secret_key, None)
dict_value = {
"id": self.uuid,
"name": self.name,
Expand Down
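
The redaction step in `to_dict` can be illustrated on a plain dictionary. In this sketch `redact_connection` is a hypothetical free function (the real logic sits inside the entity's `to_dict`), and the SharePoint-style sample values are invented.

```python
import copy
from typing import Any, Dict, Optional

SECRET_KEYS = ["password", "certificate", "thumbprint"]


def redact_connection(connection: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Work on a deep copy so the stored connection is never mutated.
    redacted = copy.deepcopy(connection or {})
    for secret_key in SECRET_KEYS:
        # pop with a default does nothing when the key is absent.
        redacted.pop(secret_key, None)
    return redacted


stored = {
    "endpoint": "https://example.sharepoint.com/sites/docs",
    "client_id": "example-client-id",
    "thumbprint": "example-thumbprint",
    "certificate": "example-certificate",
}
public = redact_connection(stored)
print(public)
```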
15 changes: 7 additions & 8 deletions nesis/api/core/services/datasources.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,15 +88,14 @@ def create(self, **kwargs):

try:
connection = validators.validate_datasource_connection(datasource)
except ValueError as ve:
except (ValueError, AssertionError) as ve:
raise ServiceException(ve)

if not is_valid_resource_name(name):
raise ServiceException(
"Invalid resource name. Must be at least five characters in length and only include [a-z0-9_-]"
)
if not has_valid_keys(connection):
raise ServiceException("Missing connection details")

try:
datasource_type = DatasourceType[source_type.upper()]
except Exception:
Expand Down Expand Up @@ -292,13 +291,13 @@ def update(self, **kwargs):
if datasource.get("connection"):
try:
connection = validators.validate_datasource_connection(datasource)
datasource_record.connection = connection
except ValueError as ve:
datasource_record.connection = {
**datasource_record.connection,
**connection,
}
except (ValueError, AssertionError) as ve:
raise ServiceException(ve)

if not has_valid_keys(connection):
raise ServiceException("Missing connection details")

# We validate the schedule (if supplied), before we create the datasource
self._validate_schedule(datasource)

Expand Down
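
The update path now merges the validated connection into the stored one instead of replacing it. A small sketch of that merge with made-up values follows; presumably this lets a client omit secrets (which `to_dict` never returns) without wiping them, though the diff does not state that intent.

```python
# Connection as stored on the datasource record (illustrative values).
stored = {
    "endpoint": "http://minio:9000/",
    "user": "your_username",
    "password": "your_password",
    "dataobjects": "documents",
}

# Validated connection from an update request that omits the credentials.
incoming = {
    "endpoint": "http://minio:9000/",
    "dataobjects": "documents,reports",
}

# Later keys win, so incoming values override stored ones
# while keys missing from the request keep their stored values.
merged = {**stored, **incoming}
print(merged)
```

One consequence of this merge is that a key can be changed by an update but not removed by omitting it.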
7 changes: 6 additions & 1 deletion nesis/api/core/services/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

from nesis.api.core.models import DBSession
from nesis.api.core.models.entities import Document
from nesis.api.core.util import isblank
from nesis.api.core.util.http import HttpClient

from apscheduler.triggers.cron import CronTrigger, BaseTrigger
Expand Down Expand Up @@ -152,7 +153,11 @@ def has_valid_keys(value: dict) -> bool:
value is not None
and isinstance(value, dict)
and len(
{key: val for key, val in value.items() if len(key) != 0 and len(val) != 0}
{
key: val
for key, val in value.items()
if not isblank(key) and not isblank(val)
}
)
!= 0
)
Expand Down
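
The check this helper appears intended to perform is "at least one entry has a non-blank key and a non-blank value". A minimal standalone sketch of that reading, with `isblank` copied from this commit and `any()` used in place of building a dictionary and measuring its length:

```python
from typing import Any


def isblank(item: Any) -> bool:
    # Copied from the isblank helper added in this commit.
    return item is None or item == "" or str(item).isspace()


def has_valid_keys(value: Any) -> bool:
    # True when value is a dict with at least one entry whose
    # key and value are both non-blank.
    return isinstance(value, dict) and any(
        not isblank(key) and not isblank(val) for key, val in value.items()
    )


print(has_valid_keys({"endpoint": "http://minio:9000/"}))
print(has_valid_keys({"endpoint": "   "}))
```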
4 changes: 4 additions & 0 deletions nesis/api/core/util/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,7 @@ def run_sql(engine, path):
with open(path) as file:
query = text(file.read())
con.execute(query)


def isblank(item: str) -> bool:
return item is None or item == "" or str(item).isspace()
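
A quick way to see what `isblank` treats as blank is to run it over a few inputs. The function body below is copied from the diff, with the annotation widened to `Any` because callers pass values of several types; the sample list is illustrative.

```python
from typing import Any


def isblank(item: Any) -> bool:
    return item is None or item == "" or str(item).isspace()


# None, empty strings, and whitespace-only strings count as blank.
# Non-string values are judged by their str() form, so 0 and [] are not blank.
samples = [None, "", "   ", "\t\n", "minio", 0, []]
for sample in samples:
    print(repr(sample), isblank(sample))
```

Since an empty list or dict is not considered blank, a validator that needs to reject empty collections would have to check for them separately.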