H-2504: harpc wire protocol (#4283)
Co-authored-by: Tim Diekmann <[email protected]>
indietyp and TimDiekmann authored Apr 29, 2024
1 parent 01d7f95 commit 1fa6614
Showing 51 changed files with 5,383 additions and 157 deletions.
438 changes: 328 additions & 110 deletions Cargo.lock

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions Cargo.toml
@@ -16,6 +16,8 @@ members = [
"libs/@local/temporal-versioning",
"tests/hash-graph-integration",
"tests/hash-graph-test-data/rust",
"libs/@local/harpc/wire-protocol",
"libs/@local/harpc/types"
]
exclude = [
"libs/antsi",
@@ -48,6 +50,8 @@ graph-api.path = "apps/hash-graph/libs/api"
validation.path = "libs/@local/hash-validation"
hash-tracing.path = "libs/@local/tracing"
type-system.path = "libs/@blockprotocol/type-system/rust"
+harpc-types.path = "libs/@local/harpc/types"
+harpc-wire.path = "libs/@local/harpc/wire-protocol"

# External dependencies owned by HASH
error-stack = { version = "0.4.1", default-features = false }
3 changes: 2 additions & 1 deletion apps/hash-graph/libs/query-builder-derive/Cargo.toml
@@ -2,8 +2,9 @@ cargo-features = ["edition2024"]

[package]
name = "query-builder-derive"
version = "0.1.0"
version = "0.0.0"
edition.workspace = true
+publish = false

[lib]
proc-macro = true
2 changes: 1 addition & 1 deletion apps/hash-graph/libs/query-builder-derive/package.json
@@ -1,6 +1,6 @@
{
"name": "@rust/query-builder-derive",
"version": "0.1.0",
"version": "0.0.0-private",
"private": true,
"dependencies": {},
"devDependencies": {}
8 changes: 4 additions & 4 deletions apps/hashdotai/glossary/agent-based-modeling.mdx
@@ -20,10 +20,10 @@ ABMs consequently have the potential to be highly-explainable in a way that many

An agent-based model consists of 4 core components:

1. **Agents**: entities in the model that can interact with one another as well as their environments, and pass information between each other. Agents might represent animals, individuals, households, organisations, or even entire countries.
1. **Properties**: agents have properties. A property might be a memory; a state, characteristic, or attribute, such as hunger, speed, or health. Properties are discrete, and can be binary (yes/no), numerical (e.g. on a 1–100 scale), or contain any other fixed value (e.g. a tag, name, or other label).
1. **Environment**: the virtual world in which agents act and interact. An environment is any context in which agents are situated — it could be 2D, 3D, spatial or not — a neutral medium with no effect on agents whatsoever, or a prime determinant of their ability to act. Environments can be abstract and imagined, or digital twins and replications of real-world buildings or cities.
1. **Rules**: rules are the logic that govern what happens when agents interact (or come into contact) with each other, or their environments. They may also govern how learning and adaptation occur within an environment. These rules may be pre-programmed, or automatically inferred/evolved in ABM platforms like HASH.
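
To make the four components concrete, here is a minimal Rust sketch of the same ideas: agents that carry properties, a shared environment, and a single rule applied each step. Everything in it (the `Agent` and `Environment` types, the hunger rule) is an illustrative assumption, not code from HASH.

```rust
// A minimal agent-based model: agents with properties, a shared
// environment, and one rule applied each simulation step.
// All names here are illustrative, not part of HASH.

#[derive(Debug)]
struct Agent {
    hunger: u32, // a numerical property
    alive: bool, // a binary property
}

struct Environment {
    food: u32, // a shared resource agents interact with
}

/// Rule: each step an agent eats if food is available, otherwise its
/// hunger grows; past a threshold it stops acting.
fn step(agent: &mut Agent, env: &mut Environment) {
    if !agent.alive {
        return;
    }
    if env.food > 0 {
        env.food -= 1;
        agent.hunger = agent.hunger.saturating_sub(1);
    } else {
        agent.hunger += 1;
    }
    if agent.hunger > 10 {
        agent.alive = false;
    }
}

fn main() {
    let mut env = Environment { food: 5 };
    let mut agents = vec![
        Agent { hunger: 0, alive: true },
        Agent { hunger: 8, alive: true },
    ];
    for _ in 0..10 {
        for agent in &mut agents {
            step(agent, &mut env);
        }
    }
    println!("{agents:?}, food left: {}", env.food);
}
```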

The complexity of a model may be constrained by either:

10 changes: 5 additions & 5 deletions apps/hashdotai/glossary/business-process-modeling.mdx
@@ -14,11 +14,11 @@ Typically in BPM, an organization’s current processes are captured _as they ex

In total a BPM project consists of several unique stages:

1. **Process mapping:** defining what needs doing, who is responsible, to what standard a process should be completed, and what the acceptance or success criteria are.
1. **Process discovery:** capturing the steps involved in a current business process (and any variations of it) in writing (for example as a checklist), or in illustrated form (e.g. a flowchart).
1. **Process simulation:** capturing a process in model form (i.e. as code). This allows processes to be tested safely in virtual environments under a wide variety of pre-specified or dynamically generated conditions.
1. **Process analysis:** actively using process simulations and models to identify opportunities for improvement. This could be through running [optimization experiments](/cases/optimization) in a simulation, or more traditional means (e.g. conducting cost-benefit or value-added analyses).
1. **Process improvement:** testing improvements to processes, measuring their performance, and iterating on them to achieve more, in less time, with fewer resources, with a greater degree of consistency and reliability.

Many BPM tools are only useful during steps one and two. [HASH](/platform) is an integrated, end-to-end solution for completing all five steps of a business process reengineering project.
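
As a sketch of what "process simulation" (step three) can look like in practice, the following Rust snippet captures a process as an ordered list of steps with owners and expected durations, then checks a run against a target cycle time. The `Step` type, the invoice-approval steps, and the 60-minute target are all hypothetical, not a HASH interface.

```rust
// A business process captured as code: ordered steps with owners and
// expected durations, plus a simple check against a target cycle time.
// Purely illustrative; none of these names are a HASH API.

struct Step {
    name: &'static str,
    owner: &'static str,
    minutes: u32, // expected duration
}

fn total_minutes(process: &[Step]) -> u32 {
    process.iter().map(|s| s.minutes).sum()
}

fn main() {
    let invoice_approval = [
        Step { name: "Submit invoice", owner: "Requester", minutes: 10 },
        Step { name: "Manager review", owner: "Manager", minutes: 45 },
        Step { name: "Finance sign-off", owner: "Finance", minutes: 30 },
    ];

    let target = 60; // acceptance criterion agreed during process mapping
    let actual = total_minutes(&invoice_approval);
    println!(
        "cycle time: {actual} min (target {target}) -> {}",
        if actual <= target { "meets target" } else { "needs improvement" }
    );
}
```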

4 changes: 2 additions & 2 deletions apps/hashdotai/glossary/dag.mdx
@@ -10,8 +10,8 @@ tags: ["Data Science", "Graphs", "Software Engineering"]

A Directed Acyclic Graph (or _DAG_) is a special type of graph made up of nodes (also known as _vertices_), and edges, in which:

1. all edges have a direction associated with them, and
1. the graph as a whole contains no cycles (aka. _loops_).

The below figure illustrates a classic DAG, in which all nodes are connected by at least one directional edge, and all pathways lead to a single end-state.
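
One common way to make the definition operational is to store the graph as an adjacency list and run Kahn's algorithm: repeatedly remove nodes with no incoming edges, and if every node can be removed this way, the graph contains no cycles and is therefore a DAG. A minimal Rust sketch, with a hypothetical `is_dag` helper, follows.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm: returns true if the directed graph (given as an
/// adjacency list) contains no cycles, i.e. it is a DAG.
fn is_dag(adj: &[Vec<usize>]) -> bool {
    let n = adj.len();
    let mut indegree = vec![0usize; n];
    for edges in adj {
        for &to in edges {
            indegree[to] += 1;
        }
    }
    // Start from nodes with no incoming edges.
    let mut queue: VecDeque<usize> = (0..n).filter(|&v| indegree[v] == 0).collect();
    let mut visited = 0;
    while let Some(v) = queue.pop_front() {
        visited += 1;
        for &to in &adj[v] {
            indegree[to] -= 1;
            if indegree[to] == 0 {
                queue.push_back(to);
            }
        }
    }
    visited == n // every node could be ordered, so there is no cycle
}

fn main() {
    // 0 -> 1 -> 2 and 0 -> 2: acyclic
    assert!(is_dag(&[vec![1, 2], vec![2], vec![]]));
    // 0 -> 1 -> 0: contains a cycle
    assert!(!is_dag(&[vec![1], vec![0]]));
}
```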

26 changes: 13 additions & 13 deletions apps/hashdotai/glossary/data-mining.mdx
@@ -28,23 +28,23 @@ There are two main approaches to Data Mining: **Supervised** and **Unsupervised*

A typical Data Mining workflow often looks something like this:

1. **Business understanding is established**: analysts and management with an understanding of an organization’s goals agree what type of information they hope to uncover through Data Mining, as well as how they will organize and use that data. Performance can later be analyzed against these predefined business objectives.
1. **Data identification and preparation**: data from sources of interest are loaded into a data warehouse, or other datastore. Some of these may be “analysis-ready”. Others may require transformation or normalization in order to get them into a state from which they can be analyzed. Data quality/integrity tests may form part of the “pipeline” that this process constitutes. Frameworks like _Great Expectations_ assist greatly in this, and users can make use of them within _[HASH Flows](/platform/core#flows)_. At this stage, the number of features in a dataset may be reduced to those most closely aligned with business goals. Some amount of manual human-led data exploration may be performed at this point, in conjunction with the information acquired during step one, to guide later expectations.
1. **Data mining**: test scenarios are generated to gauge the validity and quality of the data, and data mining begins. Various types of mining may take place, including:
1. **Automated classification**: placing a data point into a predefined category, if an existing structure is known to the user (used by banks to check credit scores when issuing loans/mortgages, by companies looking to gender-stratify their audiences, and platforms looking to identify the genre or category of user-created content).
1. **Regression analysis**: identifying the relationship between 1 dependent and ≥1 independent variables. Frequently utilized in the context of predictive analytics, to forecast future data, based on that available today.
1. **Association analysis**: identifying which independent variables are frequently correlative (used by online retailers to recommend similar items).
1. **Outlier recognition**: identifying the data points that fall outside a predefined category (and do not follow the rules of mutual interdependence between variables). Anomaly detection is often used in security and investigations contexts, as well as by arbitrageurs in financial markets.
1. **Cluster identification**: extending the idea of ‘automated classification’, Data Mining can be used to identify recurring similarities between data, and suggest new groups by which entities might be accordingly clustered (frequently used in marketing to target specific consumer demographics, and in [process mining](/glossary/process-mining)).
1. **Collation and evaluation**: initial results are viewed in light of the original project objectives mapped out before work began. Data visualization tools such as Power BI, Looker or [hCore](/platform/core) can be used to neatly present findings from research to end-users, so they can decide what to do with the information.
1. **Application**: lessons learned can be applied in-business. Some insights can be shared across functions to help improve all levels of business performance — through a data and experiment catalog like [hIndex](/platform/index) — but it is up to the user to selectively act upon any insights drawn from data mining processes (in contrast to Machine Learning, in which results may be fed back into algorithms as part of an automated workflow, powered by a system such as [hCloud](/platform/cloud)).
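
As a worked example of the "regression analysis" step above, the simplest case (one dependent and one independent variable) reduces to fitting y = a + b·x by ordinary least squares, where b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. The Rust sketch below implements exactly that closed form; the `linear_fit` helper and the ad-spend data are illustrative assumptions, not a HASH feature.

```rust
/// Ordinary least-squares fit of y = a + b*x for one independent variable.
/// Returns (intercept a, slope b). Expects at least two points.
fn linear_fit(xs: &[f64], ys: &[f64]) -> (f64, f64) {
    assert!(xs.len() == ys.len() && xs.len() >= 2);
    let n = xs.len() as f64;
    let mean_x = xs.iter().sum::<f64>() / n;
    let mean_y = ys.iter().sum::<f64>() / n;
    // Covariance of x and y, and variance of x (both unnormalized).
    let cov: f64 = xs.iter().zip(ys).map(|(x, y)| (x - mean_x) * (y - mean_y)).sum();
    let var: f64 = xs.iter().map(|x| (x - mean_x).powi(2)).sum();
    let slope = cov / var;
    (mean_y - slope * mean_x, slope)
}

fn main() {
    // Forecasting example: ad spend (x) against sales (y).
    let spend = [1.0, 2.0, 3.0, 4.0];
    let sales = [2.1, 3.9, 6.2, 7.8];
    let (a, b) = linear_fit(&spend, &sales);
    println!("sales ≈ {a:.2} + {b:.2} * spend");
}
```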

<KeyConcepts>

### 3 Types of Data Modeling

1. **Descriptive**: reveals similarities and differences between data points in a dataset, identifying any anomalies and highlighting relationships between variables.
1. **Predictive**: forecasts what will happen in the future, based on the data available today.
1. **Prescriptive**: using an initial predictive model, prescriptive modeling recommends potential steps to alter the outcome of this model.

</KeyConcepts>
6 changes: 3 additions & 3 deletions apps/hashdotai/glossary/data-pipelines.mdx
@@ -40,9 +40,9 @@ Different software packages exist for orchestrating data pipelines such as _Airf

There are three ways to utilize data pipelines in HASH:

1. With an external data warehouse and pipelines, using HASH solely as a modeling engine
1. With an external data warehouse, using HASH to manage data pipelines as well as models
1. Using HASH as your data warehouse, pipeline, and modeling engine

Users are free to create, run and maintain their data pipelines outside of HASH, both operating on and feeding external data warehouses which can then be accessed from or imported into HASH.

10 changes: 5 additions & 5 deletions apps/hashdotai/glossary/deep-reinforcement-learning.mdx
@@ -38,11 +38,11 @@ It is particularly useful as an iterative and adaptive process through which act

The goal of DRL is to develop a robust ‘policy network’ — the learned mapping that converts presented problems into output actions. This functions as a loop of _learned behavior_ for the agent. Each time it reacts to a problem, it produces a new and better-informed action.

1. Using a sample distribution of available actions, we can vary the information we feed to the agent. This allows it to explore the possibilities available to it (the ‘action spaces’), through randomization of possible actions. With this sample distribution of potential actions, the balance of probability determines that the agent will theoretically find the best possible action to take.
1. We provide feedback to the agent whenever it completes a task. If successful, we reward the agent with a positive integer. Using this integer (a ‘policy gradient’), we can make the probability of the agent selecting the _successful actions_ _more likely_ in the future. (Inversely, we make the _unsuccessful actions less likely_ to be selected, by feeding the agent a negative integer.)
1. During this process, the policy network ‘records’ what it has learned through updates to its weights, driven by the feedback signal it constantly receives from the value (or cost) function.
1. The cumulative effect of maximizing the chances of the agent selecting successful actions and minimizing the chances of it selecting unsuccessful actions optimizes the agent’s future behavior. The reinforcement of positive behavior incentivizes the agent to figure out the best method for tackling future problems, learning from its successes and failures (to improve its ‘expected return’).
1. This trial-and-error approach is constantly improving and compounding the agent’s actions over time (in the form of ‘value functions’), so that a human modeler no longer needs to intervene.
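
Steps one and two above are, in essence, the standard REINFORCE policy-gradient estimator: sample actions from the policy's distribution, then nudge the weights so that rewarded actions become more probable. In the usual notation (policy π_θ, return R, learning rate α), one common formulation, given here as general background rather than HASH-specific detail, is:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}
    \bigl[\, \nabla_\theta \log \pi_\theta(a \mid s)\, R \,\bigr],
\qquad
\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
```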

## How can we use DRL?

