Implementing reading Data Links #9215

Merged on Mar 1, 2024 (50 commits)
Changes from 40 commits
be298ff
initial data link structure
radeusgd Feb 22, 2024
9fa95e9
fix imports
radeusgd Feb 22, 2024
c62de9d
update schema
radeusgd Feb 22, 2024
9d70e63
remove version for now
radeusgd Feb 22, 2024
f65e8d0
basic infra for datalinks + S3 - draft
radeusgd Feb 22, 2024
2e99238
parsing the data link
radeusgd Feb 24, 2024
f0f9eb5
imports
radeusgd Feb 24, 2024
d2cf16f
first tests
radeusgd Feb 24, 2024
9f0f5b7
recognize `.datalink` filetype
radeusgd Feb 24, 2024
4594c2e
make method static
radeusgd Feb 24, 2024
2d4cd2d
split type name from registered name
radeusgd Feb 24, 2024
7d312c1
more datalinks
radeusgd Feb 26, 2024
628a43a
add stub for testing schema
radeusgd Feb 26, 2024
ccf3e75
try actual test
radeusgd Feb 26, 2024
2e28de9
fix regex for S3 paths validity
radeusgd Feb 26, 2024
3d15fa3
incl max S3 key limit
radeusgd Feb 26, 2024
6c68437
remove print no longer necessary
radeusgd Feb 26, 2024
871c884
add tests for other cases
radeusgd Feb 26, 2024
92ea3e6
add example http datalink
radeusgd Feb 26, 2024
b070d69
update tests after rebasing on AJV PR
radeusgd Feb 27, 2024
bf459f8
add missing type field (AJV was complaining), run prettier on schema
radeusgd Feb 27, 2024
9a8c10d
javafmt
radeusgd Feb 27, 2024
8dba0bc
fix
radeusgd Feb 27, 2024
be1931a
allow formats to register if they provide datalink JSON parsing logic…
radeusgd Feb 27, 2024
cf3769a
simplify schema
radeusgd Feb 28, 2024
9c78e75
update schema: move libraryName requirement to bottom, add comment
radeusgd Feb 28, 2024
1900f62
add more http datalinks, update tests
radeusgd Feb 28, 2024
adfee59
report validation errors in tests more clearly
radeusgd Feb 28, 2024
14512e2
update schema, add negative test
radeusgd Feb 28, 2024
fdf1d7a
common test setup
radeusgd Feb 28, 2024
fa04a22
implement and test HTTP datalink
radeusgd Feb 28, 2024
eeb084b
switch to `from` conversions for parsing the format
radeusgd Feb 28, 2024
792144b
update codeowners
radeusgd Feb 28, 2024
6079937
javafmt
radeusgd Feb 28, 2024
5f0fdcf
fix asset type mapping
radeusgd Feb 29, 2024
cd242b5
add test for a datalink
radeusgd Feb 29, 2024
95154fa
implement reading data links from Cloud
radeusgd Feb 29, 2024
0e9cb02
CR: both teams can approve tests changes in datalink schema
radeusgd Feb 29, 2024
a42ee1c
CR: move a helper to a better place
radeusgd Feb 29, 2024
81f9e40
changelog
radeusgd Feb 29, 2024
cf59f7f
fix import and name
radeusgd Feb 29, 2024
b04bbe5
Merge branch 'develop' into wip/radeusgd/9123-first-data-links
radeusgd Feb 29, 2024
a08fca2
fixing TS lints
radeusgd Feb 29, 2024
253970f
fix path
radeusgd Feb 29, 2024
36d13e0
fixing lints 2
radeusgd Feb 29, 2024
a316ca0
prettier
radeusgd Feb 29, 2024
5bf71b3
Merge branch 'develop' into wip/radeusgd/9123-first-data-links
radeusgd Mar 1, 2024
32a03dc
fix import
radeusgd Mar 1, 2024
21d4471
make dir test more stable to changes of files
radeusgd Mar 1, 2024
f958447
Merge branch 'develop' into wip/radeusgd/9123-first-data-links
radeusgd Mar 1, 2024
3 changes: 3 additions & 0 deletions .github/CODEOWNERS
@@ -47,3 +47,6 @@ Cargo.toml

 # Dashboard, Cloud & Authentication
 /app/ide-desktop/ @PabloBuchu @indiv0 @somebody1234
+# The data-link schema is owned by the libraries team
+/app/ide-desktop/lib/dashboard/src/data/dataLinkSchema.json @radeusgd @jdunkerley @GregoryTravis @AdRiley
+/app/ide-desktop/lib/dashboard/src/data/__tests__ @radeusgd @jdunkerley @GregoryTravis @AdRiley @PabloBuchu @indiv0 @somebody1234
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -618,6 +618,7 @@
 - [Separate `Group_By` from `columns` into new argument on `aggregate`.][9027]
 - [Allow `copy_to` and `move_to` to work between local and S3 files.][9054]
 - [Adjusted expression handling and new `Simple_Expression` type.][9128]
+- [Allow reading Data Links configured locally or in the Cloud.][9215]

 [debug-shortcuts]:
   https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -892,6 +893,7 @@
 [9027]: https://github.com/enso-org/enso/pull/9027
 [9054]: https://github.com/enso-org/enso/pull/9054
 [9128]: https://github.com/enso-org/enso/pull/9128
+[9215]: https://github.com/enso-org/enso/pull/9215

 #### Enso Compiler

@@ -0,0 +1,58 @@
import * as fs from 'node:fs'
import * as path from 'node:path'

import * as v from 'vitest'

import * as validateDataLink from '#/utilities/validateDataLink'

v.test('correctly rejects invalid values as not matching the schema', () => {
  v.expect(validateDataLink.validateDataLink({})).toBe(false)
  v.expect(validateDataLink.validateDataLink('foobar')).toBe(false)
  v.expect(validateDataLink.validateDataLink({ foo: 'BAR' })).toBe(false)
})

function loadDataLinkFile(path: string): object {
  const text: string = fs.readFileSync(path, { encoding: 'utf-8' })
  return JSON.parse(text)
}

function testSchema(json: object, fileName: string): void {
  const validate = validateDataLink.validateDataLink
  if (!validate(json)) {
    v.assert.fail(`Failed to validate ${fileName}:\n${JSON.stringify(validate.errors, null, 2)}`)
  }
}

// We need to go up from `app/ide-desktop/lib/dashboard/` to the root of the repo
const repoRoot = '../../../../'
const baseDatalinksRoot = path.resolve(repoRoot, 'test/Base_Tests/data/datalinks/')
const s3datalinksRoot = path.resolve(repoRoot, 'test/AWS_Tests/data/')

v.test('correctly validates example HTTP .datalink files with the schema', () => {
  const schemas = [
    'example-http.datalink',
    'example-http-format-explicit-default.datalink',
    'example-http-format-delimited.datalink',
    'example-http-format-json.datalink',
  ]
  for (const schema of schemas) {
    const json = loadDataLinkFile(path.resolve(baseDatalinksRoot, schema))
    testSchema(json, schema)
  }
})

v.test('rejects invalid schemas (Base)', () => {
  const invalidSchemas = ['example-http-format-invalid.datalink']
  for (const schema of invalidSchemas) {
    const json = loadDataLinkFile(path.resolve(baseDatalinksRoot, schema))
    v.expect(validateDataLink.validateDataLink(json)).toBe(false)
  }
})

v.test('correctly validates example S3 .datalink files with the schema', () => {
  const schemas = ['simple.datalink', 'credentials-with-secrets.datalink', 'formatted.datalink']
  for (const schema of schemas) {
    const json = loadDataLinkFile(path.resolve(s3datalinksRoot, schema))
    testSchema(json, schema)
  }
})
24 changes: 15 additions & 9 deletions app/ide-desktop/lib/dashboard/src/data/dataLinkSchema.json
@@ -6,7 +6,8 @@
       "anyOf": [
         { "$ref": "#/$defs/S3DataLink" },
         { "$ref": "#/$defs/HttpFetchDataLink" }
-      ]
+      ],
+      "$comment": "The fields `type` and `libraryName` are required for all data link types, but we currently don't add a top-level `required` setting to the schema, because it was confusing the code that is generating the modal."
     },
     "SecureValue": {
       "title": "Secure Value",
@@ -88,32 +89,35 @@
       "type": "object",
       "properties": {
         "type": { "title": "Type", "const": "S3", "type": "string" },
+        "libraryName": { "const": "Standard.AWS" },
         "uri": {
           "title": "URI",
           "description": "Must start with \"s3://\".",
           "type": "string",
-          "pattern": "^s3://[\\w.~-]+/[/\\w.~-]+$"
+          "pattern": "^s3://[a-z0-9.-]{3,63}/.{1,1024}$"
Comment on lines -95 to +97 (Member, Author):

The S3 bucket name can only contain lowercase letters, digits, `.` and `-`. It has to be between 3 and 63 characters long.

There are a few more restrictions, e.g. it cannot start with `-`, it cannot contain a double `--`, etc. But I thought that checking just the simple ones is enough. This is only a basic sanity check: even if the bucket name is valid, the bucket might not exist or might not be accessible anyway, so the check does not need to be comprehensive.

The key can be arbitrary, but it is limited to 1024 bytes, so the length limit in the pattern is a heuristic for that.
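
To make the trade-off described in this comment concrete, here is a minimal TypeScript sketch that runs the old and new `uri` patterns against a few sample URIs. Only the two regular expressions come from the diff; the helper `matchesNewS3Pattern` and all sample URIs are made up for illustration.

```typescript
// Patterns copied from the schema diff, with the JSON string escaping removed.
const OLD_S3_PATTERN = /^s3:\/\/[\w.~-]+\/[\/\w.~-]+$/
const NEW_S3_PATTERN = /^s3:\/\/[a-z0-9.-]{3,63}\/.{1,1024}$/

// Hypothetical helper, not part of the PR.
function matchesNewS3Pattern(uri: string): boolean {
  return NEW_S3_PATTERN.test(uri)
}

// Sample URIs are assumptions chosen to exercise one rule each.
const s3Samples: string[] = [
  's3://my-bucket/path/to/file.csv', // plain case
  's3://My_Bucket/file.csv', // uppercase letter and `_` in the bucket name
  's3://ab/file.csv', // bucket name shorter than 3 characters
  's3://my-bucket/dir with spaces/data.csv', // key outside the old character class
  's3://-odd--name/file.csv', // leading and doubled `-` in the bucket name
  's3://my-bucket/' + 'a'.repeat(1025), // key longer than 1024 characters
]

for (const uri of s3Samples) {
  const shown = uri.length > 50 ? uri.slice(0, 47) + '...' : uri
  console.log(shown, { old: OLD_S3_PATTERN.test(uri), new: matchesNewS3Pattern(uri) })
}
```

The fifth sample is accepted by the new pattern even though the comment lists such names as invalid, which is the leniency the comment accepts on purpose. Note also that `.{1,1024}` counts characters as the regex engine sees them, not UTF-8 bytes, so a key with non-ASCII characters can pass the pattern and still exceed the byte limit.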

         },
         "auth": { "title": "Authentication", "$ref": "#/$defs/AwsAuth" },
         "format": { "title": "Format", "$ref": "#/$defs/Format" }
       },
-      "required": ["type", "uri", "auth"]
+      "required": ["type", "libraryName", "uri", "auth"]
     },
     "HttpFetchDataLink": {
-      "$comment": "missing <headers with secrets> OR <query string with secrets>",
+      "$comment": "missing <headers with secrets> and <query string with secrets>",
       "title": "HTTP Fetch",
       "type": "object",
       "properties": {
         "type": { "title": "Type", "const": "HTTP", "type": "string" },
+        "libraryName": { "const": "Standard.Base" },
         "uri": {
           "title": "URI",
           "description": "Must start with \"http://\" or \"https://\".",
           "type": "string",
           "pattern": "^https?://[\\w.~-]+/?.*$"
         },
-        "method": { "title": "Method", "const": "GET", "type": "string" }
+        "method": { "title": "Method", "const": "GET", "type": "string" },
+        "format": { "title": "Format", "$ref": "#/$defs/Format" }
       },
-      "required": ["type", "uri", "method"]
+      "required": ["type", "libraryName", "uri", "method"]
     },

     "Format": {
@@ -147,7 +151,8 @@
           "title": "Delimiter",
           "description": "Must not be blank.",
           "type": "string",
-          "minLength": 1
+          "minLength": 1,
+          "maxLength": 1
         },
         "encoding": { "title": "Encoding", "const": "utf8", "type": "string" },
         "headers": {
@@ -162,9 +167,10 @@
       "title": "JSON",
       "type": "object",
       "properties": {
-        "type": { "title": "Type", "const": "json", "type": "string" }
+        "type": { "title": "Type", "const": "format", "type": "string" },
+        "subType": { "title": "Type", "const": "json", "type": "string" }
       },
-      "required": ["type"]
+      "required": ["type", "subType"]
     }
   }
 }
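
As a rough illustration of what the updated schema now requires, the TypeScript sketch below hand-checks the `HttpFetchDataLink` required fields and the new JSON format shape. It is not the schema validator used by the dashboard; the function names and the example objects are hypothetical, and only the field names, constants and URI pattern are taken from the schema above.

```typescript
// URI pattern copied from the `HttpFetchDataLink` definition (JSON escaping removed).
const HTTP_URI_PATTERN = /^https?:\/\/[\w.~-]+\/?.*$/

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Hypothetical check for the JSON format, which now needs both `type` and `subType`.
function looksLikeJsonFormat(value: unknown): boolean {
  return isRecord(value) && value['type'] === 'format' && value['subType'] === 'json'
}

// Hypothetical check for the required fields of an HTTP data link.
// The optional `format` field is not inspected here.
function looksLikeHttpFetchDataLink(value: unknown): boolean {
  if (!isRecord(value)) return false
  const uri = value['uri']
  return (
    value['type'] === 'HTTP' &&
    value['libraryName'] === 'Standard.Base' &&
    value['method'] === 'GET' &&
    typeof uri === 'string' &&
    HTTP_URI_PATTERN.test(uri)
  )
}

// Illustrative objects only; the URI is a placeholder.
const newStyleLink = {
  type: 'HTTP',
  libraryName: 'Standard.Base',
  uri: 'https://example.com/data.json',
  method: 'GET',
  format: { type: 'format', subType: 'json' },
}
const oldStyleLink = { type: 'HTTP', uri: 'https://example.com/data.json', method: 'GET' }

console.log(looksLikeHttpFetchDataLink(newStyleLink), looksLikeJsonFormat(newStyleLink.format))
console.log(looksLikeHttpFetchDataLink(oldStyleLink), looksLikeJsonFormat({ type: 'json' }))
```

An object written against the old schema (no `libraryName`, JSON format tagged only with `type: "json"`) fails both checks, which mirrors the two `required` changes in the diff.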
@@ -0,0 +1,22 @@
private

from Standard.Base import all
import Standard.Base.Errors.Illegal_State.Illegal_State
from Standard.Base.Enso_Cloud.Public_Utils import get_required_field
from Standard.Base.Enso_Cloud.Data_Link import parse_secure_value

## PRIVATE
   Decodes the JSON representation of `AWS_Credential` as defined in `dataLinkSchema.json#/$defs/AwsAuth`.
decode_aws_credential json -> AWS_Credential | Nothing =
    case get_required_field "type" json of
        "aws_auth" -> case get_required_field "subType" json of
            "default" -> Nothing
            "profile" ->
                profile = get_required_field "profile" json
                AWS_Credential.Profile profile
            "access_key" ->
                access_key_id = get_required_field "accessKeyId" json |> parse_secure_value
                secret_access_key = get_required_field "secretAccessKey" json |> parse_secure_value
                AWS_Credential.Access_Key access_key_id secret_access_key
            unexpected -> Error.throw (Illegal_State.Error "Unexpected subType inside of `auth` field of a datalink: "+unexpected.to_text)
        unexpected -> Error.throw (Illegal_State.Error "Unexpected type inside of `auth` field of a datalink: "+unexpected.to_text)
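
For readers less familiar with Enso, the nested `type` / `subType` dispatch above can be sketched in TypeScript as follows. This is an illustration only, not code from the PR: the names `AwsCredentialSketch` and `decodeAwsCredentialSketch` are hypothetical, secure values are passed through unparsed, and a missing field is reported as an unexpected value rather than through a dedicated helper.

```typescript
// Hypothetical result type; `null` plays the role of `Nothing` (default credentials).
type AwsCredentialSketch =
  | { kind: 'profile'; profile: unknown }
  | { kind: 'access_key'; accessKeyId: unknown; secretAccessKey: unknown }

function decodeAwsCredentialSketch(json: Record<string, unknown>): AwsCredentialSketch | null {
  // Outer dispatch on `type`: only "aws_auth" is understood.
  if (json['type'] !== 'aws_auth') {
    throw new Error('Unexpected type inside of `auth` field of a datalink: ' + String(json['type']))
  }
  // Inner dispatch on `subType`.
  switch (json['subType']) {
    case 'default':
      return null
    case 'profile':
      return { kind: 'profile', profile: json['profile'] }
    case 'access_key':
      return {
        kind: 'access_key',
        accessKeyId: json['accessKeyId'],
        secretAccessKey: json['secretAccessKey'],
      }
    default:
      throw new Error(
        'Unexpected subType inside of `auth` field of a datalink: ' + String(json['subType'])
      )
  }
}

// Example payloads are assumptions shaped after the field names used above.
console.log(decodeAwsCredentialSketch({ type: 'aws_auth', subType: 'default' }))
console.log(decodeAwsCredentialSketch({ type: 'aws_auth', subType: 'profile', profile: 'my-profile' }))
```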
@@ -0,0 +1,26 @@
from Standard.Base import all
from Standard.Base.Enso_Cloud.Public_Utils import get_required_field
from Standard.Base.Enso_Cloud.Data_Link import parse_format

import project.AWS_Credential.AWS_Credential
import project.S3.S3_File.S3_File
from project.Internal.Data_Link_Helpers import decode_aws_credential

## PRIVATE
type S3_Data_Link
    ## PRIVATE
    Value (uri : Text) format (credentials : AWS_Credential | Nothing)

    ## PRIVATE
    parse json -> S3_Data_Link =
        uri = get_required_field "uri" json
        auth = decode_aws_credential (get_required_field "auth" json)
        format = parse_format (json.get "format" Nothing)
        S3_Data_Link.Value uri format auth

    ## PRIVATE
    as_file self -> S3_File = S3_File.new self.uri self.credentials

    ## PRIVATE
    read self (on_problems : Problem_Behavior) =
        self.as_file.read self.format on_problems
@@ -0,0 +1,93 @@
import project.Any.Any
import project.Data.Json.JS_Object
import project.Data.Text.Encoding.Encoding
import project.Data.Text.Text
import project.Enso_Cloud.Enso_Secret.Enso_Secret
import project.Error.Error
import project.Errors.Illegal_State.Illegal_State
import project.Errors.Problem_Behavior.Problem_Behavior
import project.Errors.Unimplemented.Unimplemented
import project.Nothing.Nothing
import project.System.File.File
import project.System.File.Generic.Writable_File.Writable_File
import project.System.File_Format.Auto_Detect
import project.System.File_Format.Infer
import project.System.File_Format.JSON_Format
import project.System.File_Format_Metadata.File_Format_Metadata
import project.System.Input_Stream.Input_Stream
from project.Enso_Cloud.Public_Utils import get_required_field

polyglot java import org.enso.base.enso_cloud.DataLinkSPI
polyglot java import org.enso.base.file_format.FileFormatSPI

## PRIVATE
   A file format for reading data links.
type Data_Link_Format
    ## PRIVATE
       If the File_Format supports reading from the file, return a configured instance.
    for_read : File_Format_Metadata -> Data_Link_Format | Nothing
    for_read file:File_Format_Metadata =
        case file.guess_extension of
            ".datalink" -> Data_Link_Format
            _ -> Nothing

    ## PRIVATE
       Currently writing data links is not supported.
    for_file_write : Writable_File -> Nothing
    for_file_write file =
        _ = file
        Nothing

    ## PRIVATE
       Implements the `File.read` for this `File_Format`
    read : File -> Problem_Behavior -> Any
    read self file on_problems =
        json = JSON_Format.read file on_problems
        read_datalink json on_problems

    ## PRIVATE
       Implements decoding the format from a stream.
    read_stream : Input_Stream -> File_Format_Metadata -> Any
    read_stream self stream:Input_Stream (metadata : File_Format_Metadata) =
        json = JSON_Format.read_stream stream metadata
        read_datalink json Problem_Behavior.Report_Error

## PRIVATE
interpret_json_as_datalink json =
    typ = get_required_field "type" json
    case DataLinkSPI.findDataLinkType typ of
        Nothing ->
            library_name = get_required_field "libraryName" json
            Error.throw (Illegal_State.Error "The data link for "+typ+" is provided by the library "+library_name+" which is not loaded. Please import the library, and if necessary, restart the project.")
        data_link_type ->
            data_link_type.parse json

## PRIVATE
read_datalink json on_problems =
    data_link_instance = interpret_json_as_datalink json
    data_link_instance.read on_problems

## PRIVATE
parse_secure_value (json : Text | JS_Object) -> Text | Enso_Secret =
    case json of
        raw_text : Text -> raw_text
        _ : JS_Object ->
            case get_required_field "type" json of
                "secret" ->
                    secret_path = get_required_field "secretPath" json
                    _ = secret_path
                    Unimplemented.throw "Reading secrets from a path is not implemented yet, see: https://github.com/enso-org/enso/issues/9048"
                other -> Error.throw (Illegal_State.Error "Unexpected value inside of a data-link: "+other+".")

## PRIVATE
parse_format json = case json of
    Nothing -> Auto_Detect
    _ : JS_Object -> case get_required_field "subType" json of
        "default" -> Auto_Detect
        sub_type : Text ->
            format_type = FileFormatSPI.findFormatForDataLinkSubType sub_type
            if format_type.is_nothing then Error.throw (Illegal_State.Error "Unknown format inside of a datalink: "+sub_type+". Perhaps the library providing that format needs to be imported?") else
                format_type.from json
        other ->
            Error.throw (Illegal_State.Error "Expected `subType` to be a string, but got: "+other.to_display_text+".")
    other -> Error.throw (Illegal_State.Error "Unexpected value inside of a data-link `format` field: "+other.to_display_text+".")
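
The lookup in `interpret_json_as_datalink` follows a registry pattern: the `type` field selects a parser, and when no parser is registered the `libraryName` field is used to tell the user which library to import. A minimal TypeScript sketch of that idea follows, with a plain `Map` standing in for the Java `DataLinkSPI` lookup. All names in it are hypothetical and it does not reflect how the SPI is actually implemented.

```typescript
// Hypothetical parser signature: takes the data link JSON, returns some parsed value.
type DataLinkParserSketch = (json: Record<string, unknown>) => unknown

// Stand-in for the SPI: parsers keyed by the value of the `type` field.
const dataLinkRegistry = new Map<string, DataLinkParserSketch>()

function interpretJsonAsDataLinkSketch(json: Record<string, unknown>): unknown {
  const typ = String(json['type'])
  const parser = dataLinkRegistry.get(typ)
  if (parser === undefined) {
    // No parser registered: use `libraryName` to produce an actionable message.
    const libraryName = String(json['libraryName'])
    throw new Error(
      `The data link for ${typ} is provided by the library ${libraryName} which is not loaded.`
    )
  }
  return parser(json)
}

// Registering a parser models importing the library that provides it.
dataLinkRegistry.set('HTTP', json => ({ kind: 'http', uri: json['uri'] }))

console.log(
  interpretJsonAsDataLinkSketch({
    type: 'HTTP',
    libraryName: 'Standard.Base',
    uri: 'https://example.com/data.json',
  })
)
```

This is why the schema makes `libraryName` required: it is only needed on the failure path, when the type is unknown to the running project.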
@@ -29,7 +29,8 @@ import project.System.File_Format_Metadata.File_Format_Metadata
 import project.System.Input_Stream.Input_Stream
 import project.System.Output_Stream.Output_Stream
 from project.Data.Boolean import Boolean, False, True
-from project.Enso_Cloud.Utils import get_required_field
+from project.Enso_Cloud.Data_Link import read_datalink
+from project.Enso_Cloud.Public_Utils import get_required_field
 from project.Data.Text.Extensions import all
 from project.System.File.Generic.File_Write_Strategy import generic_copy
 from project.System.File_Format import Auto_Detect, Bytes, File_Format, Plain_Text_Format
@@ -66,7 +67,7 @@ type Enso_File
         Enso_Asset_Type.Directory -> Utils.directory_api + "/" + self.id
         Enso_Asset_Type.File -> Utils.files_api + "/" + self.id
         Enso_Asset_Type.Project -> Utils.projects_api + "/" + self.id
-        Enso_Asset_Type.Data_Link -> Utils.secrets_api + "/" + self.id
+        Enso_Asset_Type.Data_Link -> Utils.datalinks_api + "/" + self.id
         Enso_Asset_Type.Secret -> Error.throw (Illegal_Argument.Error "Secrets cannot be accessed directly.")

     ## GROUP Metadata
@@ -184,7 +185,9 @@ type Enso_File
     read self format=Auto_Detect (on_problems=Problem_Behavior.Report_Warning) = case self.asset_type of
         Enso_Asset_Type.Project -> Error.throw (Illegal_Argument.Error "Projects cannot be read within Enso code. Open using the IDE.")
         Enso_Asset_Type.Secret -> Error.throw (Illegal_Argument.Error "Secrets cannot be read directly.")
-        Enso_Asset_Type.Data_Link -> Unimplemented.throw "Reading from a Data Link is not implemented yet."
+        Enso_Asset_Type.Data_Link ->
+            json = Utils.http_request_as_json HTTP_Method.Get self.internal_uri
+            read_datalink json on_problems
         Enso_Asset_Type.Directory -> if format == Auto_Detect then self.list else Error.throw (Illegal_Argument.Error "Directories can only be read using the Auto_Detect format.")
         Enso_Asset_Type.File -> case format of
             Auto_Detect ->
@@ -341,8 +344,8 @@ Enso_Asset_Type.from (that:Text) = case that of
     "file" -> Enso_Asset_Type.File
     "directory" -> Enso_Asset_Type.Directory
     "secret" -> Enso_Asset_Type.Secret
-    "connection" -> Enso_Asset_Type.Data_Link
-    _ -> Error.throw (Illegal_Argument.Error "Invalid asset type.")
+    "connector" -> Enso_Asset_Type.Data_Link
+    _ -> Error.throw (Illegal_Argument.Error "Invalid asset type: "+that.pretty+".")

 ## PRIVATE
 File_Format_Metadata.from (that:Enso_File) = File_Format_Metadata.Value Nothing that.name (that.extension.catch _->Nothing)
@@ -10,7 +10,7 @@ import project.Network.HTTP.HTTP
 import project.Network.HTTP.HTTP_Method.HTTP_Method
 import project.Nothing.Nothing
 from project.Data.Boolean import Boolean, False, True
-from project.Enso_Cloud.Utils import get_required_field
+from project.Enso_Cloud.Public_Utils import get_required_field

 type Enso_User
     ## PRIVATE
@@ -0,0 +1,10 @@
import project.Enso_Cloud.Errors.Enso_Cloud_Error
import project.Error.Error
import project.Data.Json.JS_Object

## PRIVATE
   A helper that extracts a field from a response and handles unexpected
   response structure.
get_required_field key js_object = case js_object of
    _ : JS_Object -> js_object.get key if_missing=(Error.throw (Enso_Cloud_Error.Invalid_Response_Payload "Missing required field `"+key+"` in "+js_object.to_display_text+"."))
    _ -> Error.throw (Enso_Cloud_Error.Invalid_Response_Payload "Expected a JSON object, but got "+js_object.to_display_text+".")
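
The helper distinguishes two failure modes: the value is not a JSON object at all, or it is an object that lacks the requested key. A hedged TypeScript equivalent is sketched below; the name `getRequiredFieldSketch` is hypothetical, a plain `Error` stands in for `Enso_Cloud_Error.Invalid_Response_Payload`, and `JSON.stringify` stands in for `to_display_text`.

```typescript
// Hypothetical TypeScript counterpart of `get_required_field`.
function getRequiredFieldSketch(key: string, value: unknown): unknown {
  // First failure mode: the payload is not a JSON object.
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    throw new Error(`Expected a JSON object, but got ${JSON.stringify(value)}.`)
  }
  const record = value as Record<string, unknown>
  // Second failure mode: the object does not contain the key.
  if (!Object.prototype.hasOwnProperty.call(record, key)) {
    throw new Error(`Missing required field \`${key}\` in ${JSON.stringify(value)}.`)
  }
  return record[key]
}

// Illustrative payload only.
const samplePayload = { type: 'HTTP', uri: 'https://example.com/data.json' }
console.log(getRequiredFieldSketch('type', samplePayload))
```

Keeping the two messages separate makes it easier to tell a malformed response apart from a response that is well-formed but incomplete.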
@@ -6,7 +6,6 @@ import project.Enso_Cloud.Enso_Secret.Enso_Secret
 import project.Enso_Cloud.Errors.Enso_Cloud_Error
 import project.Enso_Cloud.Errors.Not_Logged_In
 import project.Data.Json.Invalid_JSON
-import project.Data.Json.JS_Object
 import project.Data.Map.Map
 import project.Data.Pair.Pair
 import project.Data.Text.Text
@@ -67,6 +66,10 @@ projects_api = cloud_root_uri + "projects"
    Root address for Secrets API
 secrets_api = cloud_root_uri + "secrets"

+## PRIVATE
+   Root address for DataLinks API
+datalinks_api = cloud_root_uri + "connectors"
+
 ## PRIVATE
    The current project directory that will be used as the working directory,
    if the user is running in the Cloud.
@@ -109,10 +112,3 @@ http_request (method : HTTP_Method) (url : URI) (body : Request_Body = Request_B
     case handler of
         Nothing -> Error.throw (Enso_Cloud_Error.Unexpected_Service_Error response.code payload)
         _ : Function -> handler json_payload
-
-## PRIVATE
-   A helper that extracts a field from a response and handles unexpected
-   response structure.
-get_required_field key js_object = case js_object of
-    _ : JS_Object -> js_object.get key if_missing=(Error.throw (Enso_Cloud_Error.Invalid_Response_Payload "Missing required field `"+key+"` in "+js_object.to_display_text+"."))
-    _ -> Error.throw (Enso_Cloud_Error.Invalid_Response_Payload "Expected a JSON object, but got "+js_object.to_display_text+".")