# CI: Manual deploy to gh-pages #3716

Merged (1 commit, Jun 20, 2024)
36 changes: 36 additions & 0 deletions _pages/CONTRIBUTING.md
@@ -53,6 +53,42 @@ Note the names of the branch must follow proper Docker names:

>A tag name must be valid ASCII and may contain lowercase and uppercase letters, digits, underscores, periods and dashes. A tag name may not start with a period or a dash and may contain a maximum of 128 characters.

#### Signing Off Commits

To enhance the integrity of contributions to the Armada repository, we've adopted the DCO (Developer Certificate of Origin) plug-in. This means that for every commit you contribute via pull requests, you'll need to sign off your commits to certify that you have the right to submit them under the open-source license used by this project.

**Every commit in your PRs must have a `Signed-off-by` line.**

When committing to the repository, ensure you use the `--signoff` option with `git commit`. This will append a sign-off message at the end of the commit log to indicate that the commit has your signature.

You sign off by adding the following to your commit messages:

```
Author: Your Name <[email protected]>
Date: Thu Feb 2 11:41:15 2018 -0800

This is my commit message

Signed-off-by: Your Name <[email protected]>
```

Notice the `Author` and `Signed-off-by` lines match. If they don't, the PR will
be rejected by the automated DCO check.

Git has a `-s` command-line option to do this automatically:

```
git commit -s -m 'This is my commit message'
```

If you forgot to do this and have not yet pushed your changes to the remote
repository, you can amend your commit with the sign-off by running:

```
git commit --amend -s
```

This command will modify the latest commit and add the required sign-off.
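If several commits are missing the sign-off, recent versions of Git can also amend them in one pass with `git rebase --signoff` (a sketch; confirm that your Git version supports this flag):

```
# sign off the last three commits in one go
git rebase --signoff HEAD~3
```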

For more details, check out [DCO](https://github.com/apps/dco).


## Chat & Discussions

Sometimes, it's good to hash things out in real time.
22 changes: 22 additions & 0 deletions consistency.md
@@ -0,0 +1,22 @@
# A note on consistency

The data stream approach taken by Armada is not the only way to maintain consistency across views. Here, we compare it with the two other commonly used solutions.

Armada stores its state across several databases. Whenever Armada receives an API call to update its state, all of those databases need to be updated. However, if each database were updated independently, some of those updates could succeed while others fail, leaving the application in an inconsistent state. Detecting and correcting such partial failures would require complex logic, and even then we could not guarantee consistency: if Armada crashes before it has had time to correct a partial failure, the application may remain in an inconsistent state.

There are three commonly used approaches to address this issue:

* Store all state in a single database with support for transactions. Changes are submitted atomically and are rolled back in case of failure; there are no partial failures.
* Distributed transaction frameworks (e.g., X/Open XA), which extend the notion of transactions to operations involving several databases.
* Ordered idempotent updates.

The first approach results in tight coupling between components and would limit us to a single database technology. Adding a new component (e.g., a new dashboard) could break existing components, since all operations that are part of the transaction are rolled back if any one of them fails. The second approach allows us to use multiple databases (as long as they support the distributed transaction framework), but components are still tightly coupled, since they have to be part of the same transaction. Further, there are performance concerns associated with both options, since transactions may not scale easily. Hence, we use the third approach, which we explain next.

First, note that if we can replay the sequence of state transitions that led to the current state, then after a crash we can recover the correct state by truncating the database and replaying all transitions from the beginning of time. Because operations are ordered, this always results in the same end state. If, for each database, we also store the id of the most recent transition successfully applied to that database, we only need to replay transitions more recent than that id. This saves us from having to start over from a clean database; because we know where we left off, we can keep going from there. For this to work, we need transactions, but not distributed transactions. Essentially, applying a transition already written to the database results in a no-op, i.e., the updates are idempotent (applying the same update twice has the same effect as applying it once).
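To make this concrete, here is a minimal Go sketch of applying one ordered, idempotent update inside a single local transaction. The schema (a `progress` table holding the id of the last applied transition) and all names are hypothetical illustrations, not Armada's actual implementation:

```go
package main

import (
	"database/sql"
	"fmt"
)

// Transition is one entry from the ordered log of state transitions.
type Transition struct {
	ID      int64  // position in the ordered log
	Payload string // the update to apply
}

// applyTransition applies a transition to one database idempotently:
// the data update and the new high-water mark are committed in the same
// local transaction, so replaying an already-applied transition is a no-op.
func applyTransition(db *sql.DB, t Transition) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	var last int64
	if err := tx.QueryRow(`SELECT last_id FROM progress`).Scan(&last); err != nil {
		return err
	}
	if t.ID <= last {
		return nil // already applied; skipping keeps the update idempotent
	}
	if _, err := tx.Exec(`INSERT INTO state (data) VALUES ($1)`, t.Payload); err != nil {
		return err
	}
	if _, err := tx.Exec(`UPDATE progress SET last_id = $1`, t.ID); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	fmt.Println("sketch only; wire up a real *sql.DB to use applyTransition")
}
```

After a crash, the consumer simply resumes from `last_id + 1`; replaying older transitions is harmless because they are skipped.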

The two principal drawbacks of this approach are:

* Eventual consistency: Whereas the first two approaches result in a system that is always consistent, with the third approach, because databases are updated independently, there will be some replication lag during which some part of the state may be inconsistent.
* Timeliness: There is some delay between submitting a change and that change being reflected in the application state.

Working around eventual consistency requires some care, but is not impossible. For example, it is fine for the UI to show a job as "running" for a few seconds after the job has finished before showing "completed". Regarding timeliness, it is not a problem if there is a few seconds' delay between a job being submitted and the job being considered for queueing. However, poor timeliness may prevent clients (i.e., the entities submitting jobs to the system) from reading their own writes for some time, which may lead to confusion (e.g., there may be some delay between a client submitting a job and that job showing as "pending"). This issue can be worked around by storing the set of submitted jobs in memory, either at the client or at the API endpoint, as sketched below.
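As an illustration, here is a minimal Go sketch of the in-memory workaround described above; the types and names are hypothetical:

```go
package main

import "fmt"

// PendingSet is a client-side overlay that provides read-your-writes:
// jobs the client has submitted are reported as "pending" until they
// become visible in the eventually consistent server-side view.
type PendingSet struct {
	pending map[string]struct{}
}

func NewPendingSet() *PendingSet {
	return &PendingSet{pending: make(map[string]struct{})}
}

// MarkSubmitted records a job id immediately after a successful submit call.
func (p *PendingSet) MarkSubmitted(jobID string) {
	p.pending[jobID] = struct{}{}
}

// Status overlays locally known submissions on top of the server view;
// lookup returns (state, found) from the eventually consistent store.
func (p *PendingSet) Status(jobID string, lookup func(string) (string, bool)) string {
	if state, ok := lookup(jobID); ok {
		delete(p.pending, jobID) // visible server-side now; drop the overlay
		return state
	}
	if _, ok := p.pending[jobID]; ok {
		return "pending" // submitted by us but not yet replicated
	}
	return "unknown"
}

func main() {
	ps := NewPendingSet()
	ps.MarkSubmitted("job-1")
	// The server has not seen the job yet, but the client still reads its own write.
	fmt.Println(ps.Status("job-1", func(string) (string, bool) { return "", false }))
}
```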
57 changes: 52 additions & 5 deletions developer.md
@@ -47,10 +47,11 @@ Please see these documents for more information about Armada's design:
* [Using OIDC with Armada](./developer/oidc.md)
* [Building the Website](./developer/website.md)
* [Using Localdev Manually](./developer/manual-localdev.md)
* [Inspecting and Debugging etcd in Localdev setup](./developer/etc-localdev.md)

## Pre-requisites

- [Go](https://go.dev/doc/install) (version 1.21 or later)
- gcc (for Windows, see, e.g., [tdm-gcc](https://jmeubank.github.io/tdm-gcc/))
- [mage](https://magefile.org/)
- [docker](https://docs.docker.com/get-docker/)
@@ -74,12 +75,40 @@ LocalDev provides a reliable and extendable way to install Armada as a developer

It has the following options to customize further steps:

* `mage localdev full` - Runs all components of Armada, including the Lookout UI.
* `mage localdev minimal` - Runs only the core components of Armada (such as the API server and an executor).
* `mage localdev no-build` - Skips the build step; set `ARMADA_IMAGE` and `ARMADA_TAG` to choose the Docker image to use.

`mage localdev minimal` is used to test the CI pipeline and is the recommended way to test changes to the core components of Armada.
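For example, a typical core-development loop using the targets documented in this guide might be:

```bash
mage localdev minimal   # start the core Armada components locally
mage testsuite          # run the full test suite against them
```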

## Debugging a "port 6443 is already in use" error after running `mage localdev full`

### Identifying the Conflict

Before making any changes, it's essential to identify which process is holding the port. Port 6443 is a common source of conflicts. You can check for existing bindings to this port using commands like `netstat` or `lsof`, as shown below.
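For example (exact flags vary by platform):

```bash
# macOS/Linux: list processes bound to port 6443
sudo lsof -i :6443

# or, with netstat:
netstat -an | grep 6443
```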

1. The `kind.yaml` file is where you define the configuration for your Kind clusters. To resolve port conflicts:

* Open your [kind.yaml](https://github.com/armadaproject/armada/blob/master/e2e/setup/kind.yaml) file.

2. Locate the relevant section where the `hostPort` is set. It may look something like this:

```yaml
- containerPort: 6443 # control plane
  hostPort: 6443 # exposes control plane on localhost:6443
  protocol: TCP
```

* Modify the `hostPort` value to a port that is not in use on your system. For example:

```yaml
- containerPort: 6443 # control plane
  hostPort: 6444 # exposes control plane on localhost:6444
  protocol: TCP
```

You are not limited to port 6444; choose any available port that doesn't conflict with other services on your system.

### Testing if LocalDev is working

Running `mage testsuite` will run the full test suite against the localdev cluster. This is the recommended way to test changes to the core components of Armada.
@@ -121,7 +150,7 @@ mage LocalDevStop
And then run

```bash
mage LocalDev minimal
```

Ensure your local dev environment is completely torn down when switching between Pulsar-backed and legacy
@@ -201,6 +230,24 @@ External Debug Port Mappings:
|jobservice |localhost:4008|


## GoLand Run Configurations

We provide a number of run configurations within the `.run` directory of this project. These will be accessible when opening the project in GoLand, allowing you to run Armada in both standard and debug mode.

The following high-level configurations are provided, each composed of sub-configurations:
1. `Armada Infrastructure Services`
- Runs Infrastructure Services required to run Armada, irrespective of scheduler type
2. `Armada (Legacy Scheduler)`
- Runs Armada with the Legacy Scheduler
3. `Armada (Pulsar Scheduler)`
- Runs Armada with the Pulsar Scheduler (recommended)
4. `LookoutV2 UI`
- Script which configures a local UI development setup

A minimal local Armada setup using these configurations would be `Armada Infrastructure Services` plus either `Armada (Legacy Scheduler)` or `Armada (Pulsar Scheduler)`. Running the `LookoutV2 UI` script on top of this would let you develop the Lookout UI live from GoLand and see the changes in your browser. **These configurations (the executor specifically) require a Kubernetes config in `$PROJECT_DIR$/.kube/internal/config`.**
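One way to produce that file with a kind cluster is sketched below; the cluster name `armada-test` is an assumption and may differ in your setup:

```bash
# export a kubeconfig for the kind cluster into the path GoLand expects
mkdir -p .kube/internal
kind get kubeconfig --name armada-test > .kube/internal/config
```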

GoLand does not allow us to specify an ordering for services within docker compose configurations. As a result, some database migration services may need to be rerun.

### Other Debugging Methods

Run `mage debug local` to only spin up the dependencies of Armada, and then run the individual components yourself.
27 changes: 11 additions & 16 deletions developer/api.md
@@ -36,7 +36,6 @@ There are additional API methods defined in proto specifications, which are used

- [event.proto](https://github.com/armadaproject/armada/blob/master/pkg/api/event.proto) - methods for event reporting
- [queue.proto](https://github.com/armadaproject/armada/blob/master/pkg/api/queue.proto) - methods related to job leasing by executor

## REST
The REST API exposes only the public part of the gRPC API, and it is implemented using [grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway).
@@ -68,15 +67,11 @@ Armada will determine which actions you are able to perform based on your user's
These are defined as global or on a per queue basis.

Below is the list of global Armada permissions (defined [here](https://github.com/armadaproject/armada/blob/master/internal/armada/permissions/permissions.go)):
* `submit_any_jobs`
* `create_queue`
* `delete_queue`
* `cancel_any_jobs`
* `reprioritize_any_jobs`
* `watch_all_events`

In addition, the following queue-specific permission verbs control what actions can be taken per individual queues (defined [here](https://github.com/armadaproject/armada/blob/master/pkg/client/queue/permission_verb.go)):
@@ -88,14 +83,14 @@
The table below shows which permissions are required for a user to access each API endpoint (either directly or via a group). Queue-specific permissions must be granted to the user on the queue in question.

| Endpoint | Global Permissions | Queue Permissions |
|--------------------|-------------------------|-------------------|
| `SubmitJobs` | `submit_any_jobs` | `submit` |
| `CancelJobs` | `cancel_any_jobs` | `cancel` |
| `ReprioritizeJobs` | `reprioritize_any_jobs` | `reprioritize` |
| `CreateQueue` | `create_queue` | |
| `UpdateQueue` | `create_queue` | |
| `DeleteQueue` | `delete_queue` | |
| `GetQueue` | | |
| `GetQueueInfo` | `watch_all_events` | `watch` |
| `GetJobSetEvents` | `watch_all_events` | `watch` |
2 changes: 1 addition & 1 deletion developer/usage_metrics.md
@@ -1,6 +1,6 @@
## Usage metrics

The executor can report how much CPU and memory jobs are using.

This is turned on by changing the executor config file to include:
``` yaml
# (the relevant configuration is collapsed in this view)
```
20 changes: 7 additions & 13 deletions helm.md
@@ -249,7 +249,7 @@ The applicationConfig section of the values file is purely used to override the

It can override any value found in /config/armada/config.yaml

Commonly, this will involve overriding the Redis URL, for example.

As an example, this section is formatted as:

@@ -331,25 +331,19 @@ Armada allows you to specify these permissions for users:

| Permission         | Details                                                                                     |
|--------------------|---------------------------------------------------------------------------------------------|
| `submit_any_jobs`  | Allows users to submit jobs to any queue.                                                   |
| `create_queue`     | Allows users to create queues.                                                              |
| `cancel_any_jobs`  | Allows users to cancel jobs in any queue.                                                   |
| `watch_all_events` | Allows users to watch all events.                                                           |
| `execute_jobs`     | Protects APIs used by the executor; only the executor service should have this permission.  |
Permissions can be assigned to users by group membership, like this:

```yaml
permissionGroupMapping:
  submit_any_jobs: ["administrators"]
  create_queue: ["administrators"]
  cancel_any_jobs: ["administrators"]
  watch_all_events: ["administrators"]
  execute_jobs: ["armada-executor"]
```
@@ -433,7 +427,7 @@ If you have many tiny jobs or very small clusters, you may want to decrease this

`maximalClusterFractionToSchedule` is the maximum fraction of a cluster's resources that may be scheduled per round.

If a cluster had 1000 CPUs, the above settings would mean only 250 CPUs would be scheduled each scheduling round.
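A sketch of the corresponding configuration, assuming the fraction is specified per resource type (check the config reference for your Armada version):

```yaml
scheduling:
  maximalClusterFractionToSchedule:
    cpu: 0.25
    memory: 0.25
```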

#### Queue resource limits

@@ -455,7 +449,7 @@ scheduling:

All limits are proportional to overall amount of resources in the system.

In this example, a queue can use at most 25% of all available CPU **and** memory.

`maximalResourceFractionPerQueue` is the maximum resource a queue can hold, as a percentage of the total resource of this type over all clusters.

@@ -465,11 +459,11 @@ Currently scheduling is done in parallel, so it can happen that we exceed the re

To mitigate this, `maximalResourceFractionToSchedulePerQueue` specifies how much can be scheduled in a single round and can be thought of as the margin for error.

Using an example of having 1000 CPUs over all your clusters:
* `maximalResourceFractionPerQueue` limits a queue to 250 CPUs.
* `maximalResourceFractionToSchedulePerQueue` limits the amount of resource a queue can be allocated in a single round to 50 CPUs.

So in the extreme case that two clusters request resources at exactly the same time, a queue could in theory get up to 300 CPUs.

We have tested this with many extremely large clusters and even when empty, it is pretty safe to assume the resource limit in the worst case is:
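A plausible form of this worst-case bound, inferred from the example above (250 + 50 = 300 CPUs with two clusters):

```
maximalResourceFractionPerQueue
  + (number of clusters - 1) * maximalResourceFractionToSchedulePerQueue
```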

21 changes: 13 additions & 8 deletions libraries.md
@@ -1,19 +1,24 @@
# Libraries

## Overview
End users can submit jobs to Armada using five different methods: a Python client, a C# client, the armadactl CLI, the REST API, and the gRPC API. This document gives an overview of the Armada client libraries.

## Python
Here, we give an overview of the Armada Python client.

### Installation and Quick-Start
For an installation and quick-start guide, please see the [Python client readme](https://github.com/armadaproject/armada/blob/master/client/python/README.md).

### API Documentation
For full documentation of our Python module API, please see our [autogenerated Python API docs](https://armadaproject.io/python_armada_client).

To see more about the Armada Server API, please see the [Armada API docs](https://armadaproject.io/api).

### Development
Information relevant to developers working to improve the Python client can be
found in the [Python client readme](https://github.com/armadaproject/armada/blob/master/client/python/README.md).

## C#
Armada provides [C# client bindings](https://github.com/armadaproject/armada/tree/master/client/DotNet).

## Armadactl (Command-Line Tool)
Armadactl is a command-line tool for managing jobs in Armada. To learn how to use `armadactl`, refer to the [armadactl readme](https://github.com/armadaproject/armada/blob/master/cmd/armadactl/README.md), and see the sketch below.
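As a quick illustration, a typical `armadactl` flow might look like this (a sketch; the exact subcommands and arguments are documented in the armadactl readme):

```bash
armadactl create queue my-queue     # create a queue to submit into
armadactl submit my-job.yaml        # submit a job spec file
armadactl watch my-queue my-job-set # stream events for a job set
```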