[meta] HDP Declarative Programming (working draft) #16
Comments
…f the YAML files, will mention the '[meta] HDP Declarative Programming (working draft) #16'
…SE; comment: maybe JSON Schemas could allow conditional loading, so an *.mul.hdp.yml or *.hdp.yml could allow completion all the time?; also playing with loops;
…n't contain .LLL.hdp.(yml|json) sufixes
TL;DR: we're also using testinfra for some tests
While for hxlm.core, in particular the Htypes, it makes sense to test the functions directly, for hdpcli it seems reasonable, at least in the short term and since the internals are changing, to test at a higher level. By "higher level" I mean simulating the CLI interface. While this is not as detailed, it has a better chance of catching overall errors while allowing the internals to move faster. Another problem is that attaching too many tests to internal methods not only forces changes to things that don't matter to the end user, but the tests themselves could also take more time to write than the code they cover.

Another advantage of such top-level testing (again, not for internal parts that are intended to be reused) is that it may later be easy to run a lot of tests in batch. I mean that, in addition to the tests in this repository itself, it would be possible to have more of them elsewhere, either public or for private groups who want to guarantee even more compliance with whatever HDP does; we would then already have a draft for this.

Top level tests could help with moving even faster

While backward compatibility is desired, in the worst case the HDP syntax could simply require an upgrade (think of a human having to edit files, even if this is very automatable in batch). As long as HDP at its very core allows this while still keeping a chain of accountability, I think this could at least give some room to serious users who may have a more localized community and may still rely on undocumented features. In general, both having internal tests and (if at some point relevant) documenting how to do the same from another repository also helps to test the full chain of the environment.

If exchanged HDP files need to meet higher level of compliance

Even if people trust public collaboration more than their own ability to do very deep checks, there is still the idea of auditing the code used as part of agreements that allow data exchange. I personally believe that the best approach for whoever does this is, in addition to the initial evaluation, to implement some sort of automated testing from the start (for example, to check whether filters that should anonymize or block something still work). This approach seems more reasonable because without it, those who have to meet governmental compliance in particular would very soon end up with outdated versions for the weirdest reasons: for example, the initial work was paid human review for a period of time, then leadership changes (or budget priorities change), and then there is no one left to do even the bare minimum of checking.

Note that I am by no means saying that human auditing wouldn't be required (and, if it is not obvious, code released under public domain does not grant liability). What I'm saying is that such automation would still require humans to push buttons, so while whoever does the auditing would be a requirement for continuous use, whoever pays for it should require some bare minimum of automated tests.

And what about "air gapped network"? This alone would not avoid the need to update software?

Even if we're drafting something that could be used without any access to the internet at all (which would mean it does not need to receive updates), no matter how paranoid the organization's threat model is, sooner or later people would need to update. If this is really feared, at least consider the possibility that a fix may exist and, if it is not a critical fix to implement within hours, at least plan for the scenario where the fix should be implemented within one week at most. But, again, an "air gapped network" is something more specific, and whoever uses one should already know what they are doing.

The reason for trying to automate extra tests is that if a country or community exchanges data with other people from the same community, these people would still very likely use new versions, so it is still a good idea to somewhat protect others or detect any issue early. Also, automated tests help with issues not related to security at all, and in a context that tolerates broader use of natural language instead of exact keywords, they are actually very pertinent.
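As a rough illustration of what "testing at the CLI level" could look like, here is a minimal sketch. It uses only the standard library `subprocess` module rather than testinfra, and it runs a stand-in command instead of the real hdpcli, because the actual arguments and outputs of hdpcli are not described in this comment. The names `run_cli`, `STAND_IN_CLI` and `test_cli_smoke` are hypothetical.

```python
import subprocess
import sys


def run_cli(args, stdin_text=""):
    """Run a command the way an end user would and capture the outcome."""
    result = subprocess.run(
        args,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode, result.stdout, result.stderr


# Assumption: a stand-in for the real CLI so the sketch runs anywhere.
# In the repository this list would start with the hdpcli executable and
# its real arguments, which are not shown here.
STAND_IN_CLI = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]


def test_cli_smoke():
    """Check only what the end user sees: exit code and printed output."""
    code, out, _err = run_cli(STAND_IN_CLI + ["hsilo"])
    assert code == 0
    assert out.strip() == "HSILO"


if __name__ == "__main__":
    test_cli_smoke()
```

A test like this does not care how the internals are organized, which is the point: the internals can be refactored freely as long as the command still behaves the same from the outside.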
… created; hdpcli is already doing too much; lets break functionlity more on other parts of hxlm lib
….vocab (good to know that is possible to add some sort of test even on an class or method itself
…f_hsilo() & get_languages_of_words()
Another point is that I recently discovered Python doctest (inline comments that look like interactive Python sessions can be used to test code, instead of only the tests/ folder). This seems good for testing smaller files or things that are not higher-level functionality.

1. Some points

1.2 Fact: it is already feasible to make every output also a valid source

If we really rushed, it would be possible to make even the output generated from the Latin (the internal representation of HDP metadata) a valid source. It may not take many lines beyond what already exists. But it would be one hell of a for loop! The new point is both to break up the functionality a bit more and to find some way to add more meaning to the keywords without changing much of the natural language the human uses.

1.2 We may actually use …

Even if we allow the use of any natural language (though, for the sake of internationalization, at least with known words), doing this for at least one term (the one that could mean the equivalent of hsilo, or maybe something more generic like meta) could simplify checks. The idea is to maybe use some printable characters like … But here there is a design decision: we could either use a very generic term, like … Note that the ideal (at least for whoever creates a document) is to be able to write using only their own script, to the point that baseline HDP could work even without Latin characters.
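For reference, the doctest approach mentioned at the start of this comment could look like the sketch below. The function `normalize_keyword` is a hypothetical example and not something from hxlm; only the `doctest` module itself is real.

```python
import doctest


def normalize_keyword(word):
    """Return a keyword stripped of outer spaces and lowercased.

    The lines starting with >>> are executed by doctest, and the line
    after each one is the expected result.

    >>> normalize_keyword("  Hsilo ")
    'hsilo'
    >>> normalize_keyword("META")
    'meta'
    """
    return word.strip().lower()


if __name__ == "__main__":
    # Runs every example found in the docstrings of this module.
    doctest.testmod()
```

The same examples can also be run without the `__main__` block by calling `python -m doctest` on the file.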
OMG, I think I discovered a better approach: we use as a hint the language name already in its exact script! This not only solves the issue of having a term that is really unique (so it is feasible to know the language without enforcing ISO 639-3 codes by default), but also solves the problem of detecting the script! Also, to be really sure about the context, we could enforce that the prefix …

The problem with script systems

While Arabic (the macrolanguage) and Chinese (the macrolanguage) can each have at least 2 to 4 writing systems (so with this we know not only the language but also the writing system, without any additional hint!), to my current knowledge Japanese [1] has several. Like, a lot. I'm not saying that we would be able to implement this in the short term, but if HDP becomes more widely used, interested local communities could propose new vocabularies. Note that we're already defining vocabularies in different files.
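A minimal sketch of the "language name in its own script" idea, under several assumptions: the `AUTONYMS` table is a hypothetical three-entry vocabulary, and the script is guessed from the first word of the Unicode character name of the first letter, which is a simplification and not how the hxlm vocabularies are defined.

```python
import unicodedata

# Assumption: a tiny illustrative table, language autonym -> ISO 639-3 code.
AUTONYMS = {
    "português": "por",
    "русский": "rus",
    "العربية": "ara",
}


def detect_script(text):
    """Guess the script from the Unicode name of the first letter."""
    for char in text:
        if char.isalpha():
            # For example "CYRILLIC SMALL LETTER ER" gives "CYRILLIC".
            return unicodedata.name(char).split(" ")[0]
    return None


def detect_language_and_script(hint):
    """Return (language code or None, script name or None) for a hint."""
    key = unicodedata.normalize("NFC", hint).strip().lower()
    return AUTONYMS.get(key), detect_script(key)


if __name__ == "__main__":
    for name in ("Português", "русский", "العربية"):
        print(name, detect_language_and_script(name))
```

This shortcut has clear limits: the character name alone cannot tell simplified from traditional Chinese, and Japanese text mixes several scripts, so a real vocabulary would probably need to store the writing system next to each autonym.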
…with use of languages names for thenselves already with their original script (this could allow also detect both language AND script; see #16 (comment))
Oh boy. I think that, to simplify things, I will implement the loading of files as a recursive call. I'm not finding a lot of examples using Python (or at least not with classes), so I may just create a separate file in hxlm.core.hdp for this sort of thing. While it is not as dangerous as what the hazmat layer of the Python cryptography package covers (usage not intended for end users), I think I may reuse that name.
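To make the recursive loading idea concrete, here is a minimal sketch. It assumes JSON files and a hypothetical `include` key that lists other files relative to the current one; neither the key name nor the function `load_recursive` comes from hxlm.

```python
import json
from pathlib import Path


def load_recursive(path, _seen=None):
    """Load a JSON file and, recursively, every file listed under 'include'.

    Returns the loaded documents in depth-first order. Files already
    visited are skipped, so circular references do not loop forever.
    """
    _seen = set() if _seen is None else _seen
    path = Path(path).resolve()
    if path in _seen:
        return []
    _seen.add(path)

    document = json.loads(path.read_text(encoding="utf-8"))
    loaded = [document]
    # 'include' is a hypothetical key used only for this illustration.
    for relative in document.get("include", []):
        loaded.extend(load_recursive(path.parent / relative, _seen))
    return loaded
```

Tracking the visited paths in a set is the part that matters most here: without it, two files that reference each other would recurse until Python raises `RecursionError`.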
…icy.{_get_user_know_what_is_doing(),_get_bunker()}
…ady have an draft to allow localized exceptions on the user language! (ok that we may need to make it stable engouth to not trow exceptions on exceptions themselves)
…ed; the idea is abstract ram text messages to allow future l10n
v0.8.5 started. At https://hdp.etica.ai/hxlm-js/ we have part of the HDP, originally written in Python, ported to JavaScript.

Automated transcompiled code and (if necessary) open room for desktop application

Automated transcompiled code (see ceddfa6#diff-423a7725ffc65651ef9b48047bf6f5ccbeb669bef88aaa3deaf97701bc8db885), even if it works, is not as pretty. Also, since we're trying to make the ontologies as simple as possible to parse, it may actually not be hard to port at least the more important parts.

HDP JavaScript ports and even more sensitive content

Some parts of HDP may require steps that need a GPG signature (especially for potential use cases where humans centralize a lot of work). For individuals who actually work with data transformation, I will assume they can deal with the command line (or at least with a combination of a code editor like VSCode, which helps to write HDP files, and then calling the command line). But I was already thinking about some way to provide at least simple GUIs that let people press buttons to know whether a file is valid or not (note that in this case I am already considering people with content so sensitive that they cannot be, or are expected not to be, online). But considering the alternatives, it is harder to create such interfaces as GUIs. Also, even if we do things with Lisp-like dialects, those strategies also have ugly GUIs. So, for more sensitive content, even if proofs of concept in JavaScript may exist, I would still recommend that in the next years people just do a port that binds the JavaScript implementations.

Web version for less sensitive content (or last-resort check if this is signed by who should be).

By design, HDP files are not meant to contain secrets (like file passwords or direct access to resources without authentication). So there are cases where people may receive HDP files and still not do the full thing, like having the software installed (or maybe they don't need it at all).
So in such cases, for verifying signatures where HTTPS is acceptable, I think any web version is a perfect fit.
… Peter v1) will not be engouth; I think we could refactor all the things and already plan ahead 'programming by contract' already on the prototypes of functions when user create them
…hon port to do heavy processing of HDPLisp until API be stable
As of 2021-04-17, the HXL Standard does not have …, either in code maintained by the HXL working group or from the community. Since we will need to deal (even if in a very primitive way) with already well-formatted CSV files generated by HXL tools, we would otherwise start creating a lot of ad hoc functions just to repeat what HXL would do. This means we will take at least a few extra days just to make a functional draft of HXL on Racket that should at least work with our ontologies written in …
The triggering motivation
Some drafted goals/restrictions (as of 2021-03-16):
Both the documentation on how to write the concept of HDP and the proofs of concept that implement it are dedicated to the public domain. BSD-0 can be used as an alternative.
Be in the creator's own language. This means that the underlying tool should allow the exchange of HDP files (which in practice describe how to find datasets or manipulate them) written, for example, in Portuguese or Russian, and others could still (with the help of HDP) convert the key terms even if they don't understand such languages.
2. Note that special care is taken with HDP keywords and instructions that are likely to be used by people who, de facto, need to homologate how data can be used. Since the data columns may often be in the native language, one or two humans with both technical skills and a way to understand the native language may need to create a filter, label it with the tasks that accomplish what those users want, and then digitally sign this filter.
The syntax of HDP must be planned in such a way that it is intentionally hard for the average user to save information that would make the file itself a secret (like passwords or direct URLs to private resources).
Be offline by default and, when applicable, be air-gapped network friendly
"Batteries included", i.e. try to already offer tools that do syntax checks of HDP files.
1. If you use a code editor that supports JSON Schema, v0.7.5 already has an early version that warns about misuse. At the moment it still requires writing with the internal terms (Latin). But if the schema eventually becomes generated from the internal core_vocab, other languages would get such support too.
The average HDP file should be optimized for being printed on paper as it is, and have ways to express complex but common items of acceptable use policy as some sort of constant (as of 2021-03-16, not sure if this is the best approach). (This means the ideal maximum characters per line and typical indentation level should be carefully planned ahead.) (This type of hint was based on suggestions we heard.)
There are other ideas, but as much as possible, both the syntax of HDP files (for which it may be easier to just have translations of the core key terms) and, if necessary, the creation of constants to abstract concepts should ideally allow the exact file (either digitally signed or accompanied by, literally, a PDF of a judge's authorization, so the "authorization" could be a link to such a file) to be understood even outside the original country.
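The goal above about making it hard to store secrets in an HDP file could be supported by a simple check run over the parsed file. The sketch below is only one possible reading of that goal: the list of suspicious key names, the function `find_possible_secrets` and the sample keys are all hypothetical, and the check works on any already-parsed dict/list structure rather than on real HDP syntax.

```python
from urllib.parse import urlsplit

# Assumption: key names that hint at stored secrets (illustrative only).
SUSPICIOUS_KEYS = {"password", "passwd", "secret", "token", "api_key"}


def has_embedded_credentials(text):
    """True when a string looks like a URL with user or password in it."""
    try:
        parts = urlsplit(text)
        return bool(parts.scheme and (parts.username or parts.password))
    except ValueError:
        # Malformed URLs are not treated as secrets here.
        return False


def find_possible_secrets(data, path=""):
    """Walk parsed data and return the paths that may hold a secret."""
    findings = []
    if isinstance(data, dict):
        for key, value in data.items():
            here = f"{path}.{key}" if path else str(key)
            if str(key).lower() in SUSPICIOUS_KEYS:
                findings.append(here)
            findings.extend(find_possible_secrets(value, here))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            findings.extend(find_possible_secrets(item, f"{path}[{index}]"))
    elif isinstance(data, str) and has_embedded_credentials(data):
        findings.append(path)
    return findings
```

A check like this only catches the obvious cases; it would complement, not replace, a syntax that simply offers no place to write a password.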