Dag for local data eval #13155
Conversation
Branch force-pushed from 9946fe1 to 81511d4.
Branch force-pushed from 7fae5b6 to ac566c9.
Largely this LGTM. I think we should clean up the unused DAG functions and consider adding some more tests; let me know what you think!
// StronglyConnected returns the list of strongly connected components
// within the Graph g. This information is primarily used by this package
// for cycle detection, but strongly connected components have widespread
Nit, but I think we should just mention what it's used for in the case of HCL component DAGs.
Suggested change:
- // for cycle detection, but strongly connected components have widespread
+ // for cycle detection.
Yeah, I imagine we can cut the comment short here, good call.
return &retGraph, nil
}

func (cfg *PackerConfig) evaluateBuildPrereqs(skipDatasources bool) hcl.Diagnostics {
The logic in this parser seems minimally unit tested. I notice that we have the packer_tests that invoke a larger section of the code, but I feel it may be worth adding more testing to the logic in the parser. Maybe we could make these functions public instead of private and invoke them in unit tests; just small tests that make sure we skip things aliased as local, for example.
I agree it is not unit tested, and to be frank there are a bunch of prerequisites to make this unit-testable; at the moment we rely on the Initialize tests (and subsequent phases) to test it, at least for regression checks.
Also, to be clear: that code is in a parser.go file, but at this point we're not doing any kind of parsing, that was done before, so it's kinda clumsy to have that logic in this file. I opted to add it here for this PR as this is where the phased code was already located, but honestly we should split this logic into separate files to avoid the confusion.
I'm approving this now; I think the remaining open concerns should not prevent this from going out, and it is a relevant addition to maturing Packer. I appreciate all your work on this one, Lucas.
As I mentioned offline, I would personally remove the unused DAG code and do think it's the best choice, but I don't consider it blocking feedback.
The hcl2template package already contains references, but these are linked to a particular type. This becomes problematic if we want to support cross-type references, so this commit adds a new abstraction: refString. A refString contains the component's kind, its type (if applicable), and its name, so that the combination of those points to a cty object that can be linked to a block in the configuration. Right now, only `var`, `local` and `data` are supported, but the type is extensible enough that anything else fitting this model, such as sources, could potentially be supported in the future.
The `Name` function is used by the DAG library to determine which vertex is associated with a component, so the `Name` attribute/function needs to replicate the combination of type and name so that the vertex can be accurately fetched from the graph.
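For illustration, here is a minimal, self-contained sketch of what such a reference could look like; the field names and the `String` method are assumptions for this example, not the PR's actual definitions:

```go
package sketch

import "fmt"

// refString identifies a referenceable component: its kind ("var", "local"
// or "data"), an optional sub-type (e.g. a data source type), and its name.
type refString struct {
	Kind string // "var", "local" or "data"
	Type string // sub-type, e.g. a data source type; empty for var/local
	Name string
}

// String rebuilds an HCL-style address such as "local.foo" or
// "data.amazon-ami.ubuntu". Since the DAG identifies vertexes by name, this
// combination must be unique and stable for graph lookups to work.
func (r refString) String() string {
	if r.Type == "" {
		return fmt.Sprintf("%s.%s", r.Kind, r.Name)
	}
	return fmt.Sprintf("%s.%s.%s", r.Kind, r.Type, r.Name)
}
```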
Since we are in the process of integrating a new way to orchestrate dependency management and evaluation for datasources and local variables, we need to split the current function that manages the evaluation of local variables, so that the recursion and the actual evaluation of a single local variable become two separate functions. This will allow evaluating a single variable once the DAG is introduced.
As with variables, this commit introduces a new function whose purpose is evaluating the datasource directly, without looking at its dependencies or recursively trying to execute them. Instead, we rely on the DAG to determine when it is safe to execute it; if using the phased approach, the current logic still applies.
When registering dependencies for datasources and locals, we now use refString. This allows the dependency-detection functions to register not only components of the same type as dependencies, but any type that refString supports, i.e. data, local and var. This can then be leveraged for orchestrating evaluation of those components in a non-phased way (i.e. with a DAG for dependency management).
The dag package is a port from Terraform to Packer, changing the little that was needed to fit our current dependency ecosystem. Most of the changes are to the type of diagnostics returned, as Terraform has its own type for them while we rely on hcl's Diagnostics. Other than that, the functionality is essentially equivalent, and the code was barely touched.
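As a rough illustration of the kind of change involved (the exact type names in the ported package may differ), the walk callback ends up speaking hcl.Diagnostics:

```go
package dag

import "github.com/hashicorp/hcl/v2"

// Vertex can be anything; the graph identifies it by its name.
type Vertex interface{}

// WalkFunc is called once per vertex during a walk. Terraform's version
// returns its own tfdiags.Diagnostics type; the port returns hcl.Diagnostics
// so results merge naturally with the rest of Packer's HCL2 handling.
type WalkFunc func(v Vertex) hcl.Diagnostics
```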
As we have finished setting up the codebase for it, this commit adds the logic that uses the internal DAG package and is able to orchestrate evaluation of datasources and locals in a non-phased way. This code acts by first detecting the dependencies of those components, then building a graph with the components as vertexes and edges representing the dependency links between them, and finally walking the graph breadth-first to evaluate those components. This can act as a drop-in replacement for the current phased logic, but both should be supported until we are confident that the approach works and that there are little to no bugs left to squash.
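The following is a deliberately simplified, standalone sketch of that flow; it does not use the internal dag package, and all names are made up. Components carry their detected dependencies, and anything whose dependencies are already evaluated can be evaluated next:

```go
package sketch

import "github.com/hashicorp/hcl/v2"

// component is the common shape shared by locals and data sources in this
// simplified model: an address and the references it depends on. Deps are
// assumed to reference other keys of the components map.
type component struct {
	ref      string   // e.g. "local.foo" or "data.amazon-ami.ubuntu"
	deps     []string // refs that must be evaluated before this component
	evaluate func() hcl.Diagnostics
}

// evaluateAll stands in for the graph walk: on every pass, evaluate each
// component whose dependencies are already done, and repeat until nothing is
// left. The real code validates the graph for cycles up front; here a pass
// that makes no progress is reported as a (likely circular) dependency error.
func evaluateAll(components map[string]*component) hcl.Diagnostics {
	var diags hcl.Diagnostics
	done := map[string]bool{}
	for len(done) < len(components) {
		progressed := false
		for ref, c := range components {
			if done[ref] || !allDone(c.deps, done) {
				continue
			}
			diags = append(diags, c.evaluate()...)
			done[ref] = true
			progressed = true
		}
		if !progressed {
			return append(diags, &hcl.Diagnostic{
				Severity: hcl.DiagError,
				Summary:  "Dependency cycle between locals/data sources",
			})
		}
	}
	return diags
}

func allDone(deps []string, done map[string]bool) bool {
	for _, d := range deps {
		if !done[d] {
			return false
		}
	}
	return true
}
```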
Following up on the DAG work, this commit adds a new initialisation option that disables the DAG on request. By default we are going to use the DAG approach, with an option to fall back to the older evaluation algorithm in case users end up in an edge case that prevents them from building a template.
For all the commands that call Initialise, we introduce a new flag: UseSequential. This disables the newly introduced DAG scheduling for evaluating datasources and locals, acting as a fallback to the previous approach. `hcl2_upgrade` is a special case here: as the template is always JSON, there cannot be any datasource, so the DAG becomes meaningless in this case and is not integrated into that code path.
The implementation of the DAG as extracted from Terraform relied on a Root vertex being injected into the graph as the last node to visit. This is used as a sanity check by Terraform, but doesn't apply to our use case for now, as we always execute everything and have no need for this root node. Instead, we change how Validate operates so that it does not error when there is no valid root node for the graph, while still letting us call it to check for self-referencing edges and circular dependencies.
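A minimal, standalone illustration of validation without a root requirement (the real Validate in the ported dag package is more involved): self-references are rejected first, then a depth-first search flags cycles.

```go
package sketch

import "fmt"

// validate checks a dependency map (component -> components it depends on)
// for self-referencing edges and circular dependencies, without requiring
// the graph to have a single root vertex.
func validate(edges map[string][]string) error {
	for from, tos := range edges {
		for _, to := range tos {
			if from == to {
				return fmt.Errorf("%s depends on itself", from)
			}
		}
	}
	// Colors: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored.
	state := map[string]int{}
	var visit func(n string) error
	visit = func(n string) error {
		switch state[n] {
		case 1:
			return fmt.Errorf("circular dependency involving %s", n)
		case 2:
			return nil
		}
		state[n] = 1
		for _, dep := range edges[n] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[n] = 2
		return nil
	}
	for n := range edges {
		if err := visit(n); err != nil {
			return err
		}
	}
	return nil
}
```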
Local variables had an attribute called Name, holding the name of the local variable. However, when an error was encountered during validation while walking the DAG of locals/datasources, the raw structure of the vertex was printed out, making the error message hard to understand. Therefore, to clean it up, we rename the `Name` attribute for local variables to `LocalName`, and introduce a `Name()` function for that block so that the complete name of the variable is clearly reported.
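Roughly, the shape described above (other fields of the real LocalBlock are omitted here):

```go
package sketch

// LocalBlock represents a single local variable definition; only the field
// relevant to the rename is shown.
type LocalBlock struct {
	LocalName string // previously just "Name"
}

// Name returns the fully-qualified name, so DAG validation errors print
// "local.foo" rather than dumping the raw struct.
func (l *LocalBlock) Name() string {
	return "local." + l.LocalName
}
```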
Walk visits the graph in reverse topological order, doing that visit concurrently when possible. This is nice as it can speed up execution of datasources and locals; however, since the `Variables` map stored in the config, and the production of the evaluation context from it, are not meant to be used concurrently, we end up in cases where Packer crashes because of concurrent accesses to that map. So until we can change this behaviour, we fall back to the sequential visit algorithm for those vertexes, limiting the risk of such conflicts.
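For context on the class of crash being avoided: unsynchronised concurrent writes to a plain Go map are fatal. This small standalone program (unrelated to Packer's actual code) can abort with "fatal error: concurrent map writes":

```go
package main

import "sync"

func main() {
	variables := map[string]string{} // stand-in for a shared variables map
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Unsynchronised write: the Go runtime may detect this and abort.
			variables[string(rune('a'+i))] = "value"
		}(i)
	}
	wg.Wait()
}
```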
When preparing the datasources to add to the DAG for evaluating the build prerequisites, we ended up in a weird situation in which the datasource pointer for every vertex pointed to the same block. This is because of Go's loop semantics, where the same loop variable is reused across iterations, so in the end every datasource vertex pointed to the same instance of a datasource block. To avoid this, we instead grab each block through its reference, making the reference to the datasource purely local and pointing to the actual datasource block, not the one scoped to the function.
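A self-contained illustration of that pitfall with Go's pre-1.22 loop semantics (the types here are placeholders, not Packer's actual datasource structs):

```go
package main

import "fmt"

type DatasourceBlock struct{ Name string }

func main() {
	blocks := map[string]DatasourceBlock{
		"one": {Name: "one"},
		"two": {Name: "two"},
	}

	// Buggy (pre-Go 1.22): every appended pointer is the address of the same
	// loop variable `ds`, so all vertexes would share one datasource block.
	buggy := []*DatasourceBlock{}
	for _, ds := range blocks {
		buggy = append(buggy, &ds)
	}

	// Fixed: fetch the block through its reference (the map key), so the
	// pointer targets a value local to this iteration.
	fixed := []*DatasourceBlock{}
	for key := range blocks {
		block := blocks[key]
		fixed = append(fixed, &block)
	}

	fmt.Println("buggy:", buggy[0].Name, buggy[1].Name) // same name twice before Go 1.22
	fmt.Println("fixed:", fixed[0].Name, fixed[1].Name) // two distinct names
}
```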
Evaluated local variables used to be written directly to the PackerConfig as each variable was created. This was somewhat of an issue for testing, as we have a bunch of tests that rely on `PackerConfig.Variables` being set only when we actually write something. This is not really a concern for normal use, just for testing, but to limit the number of changes to the tests in hcl2template, I opted to change how variables' values are retained: evaluating a single variable now returns a Variable in addition to hcl.Diagnostics, so the caller only creates the map of variables once something has actually been evaluated.
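A simplified sketch of that shape; LocalBlock and Variable here are stand-ins for the real hcl2template types, and the function name is illustrative:

```go
package sketch

import (
	"github.com/hashicorp/hcl/v2"
	"github.com/zclconf/go-cty/cty"
)

type LocalBlock struct {
	LocalName string
	Expr      hcl.Expression
}

type Variable struct {
	Name  string
	Value cty.Value
}

// evaluateLocalVariable evaluates a single local and returns the result
// instead of mutating shared state, so the caller decides when (and whether)
// to create the variables map and store the value.
func evaluateLocalVariable(local *LocalBlock, ctx *hcl.EvalContext) (*Variable, hcl.Diagnostics) {
	val, diags := local.Expr.Value(ctx)
	if diags.HasErrors() {
		return nil, diags
	}
	return &Variable{Name: local.LocalName, Value: val}, diags
}
```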
Since this series of commits introduces the DAG only for locals and data sources, we need to make sure the behaviour is what we expect. Therefore, this commit adds a basic test, with packer build and packer validate, that evaluates a template with locals and data sources depending on one another. This is rejected by the sequential evaluation method, as it processes the different types one by one, whereas the DAG allows mixing the order between the two, while still rejecting circular dependencies (and doing so before they even get evaluated) and self-references.
Branch force-pushed from ac566c9 to 2c552e7.
Since the DAG package was lifted from Terraform, its contents are more than what we need for now, so this commit cleans up the package to keep only the currently needed parts of the code. If we need to support more in the future, we can revert this commit or pick up the changes from Terraform again.
Branch force-pushed from 2c552e7 to 539b85d.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
This PR is a proof-of-concept for a DAG-based approach for scheduling evaluation of datasources and locals.
This is meant to address the current limitation that data sources cannot use locals in their configuration, which prevents post-processing through functions in a data source's configuration and daisy-chaining multiple datasources and locals.
This is also meant to serve as a base for later work on introducing DAG-based scheduling for the rest of Packer core in the future.
This code works in several phases:
1- Gather dependencies: for each local and datasource, we round up the dependencies we can extract from the HCL code (see the sketch after this list). This approach is not flawless and can be thrown off by local aliasing, as we rely on the traversals extracted from the raw code, without knowledge of the scope in which the reference is made.
2- Build a DAG with components as vertexes and dependencies as edges: in this step we build a simple DAG from the locals and datasources. Since we already have the detected dependencies, we can add them all to the graph, and since the graph can be validated (i.e. checked for circular dependencies), this removes the need to recursively evaluate the components.
3- Walk the graph: the final step is walking the graph with a function that evaluates each node. Since the dependencies are closer to the root of the graph, they are evaluated before proceeding to the next vertex, so it is safe to evaluate each component when it is visited.
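To make step 1 concrete, here is a standalone sketch of traversal-based dependency detection under exactly that caveat: it looks only at the traversals an expression reports, with no scope awareness, so an aliased `local` would fool it. The helper is made up for this example; only the hcl calls (`Variables`, `RootName`, `TraverseAttr`) are the real library API.

```go
package sketch

import "github.com/hashicorp/hcl/v2"

// dependenciesOf lists the references an expression appears to use, based
// purely on its traversals.
func dependenciesOf(expr hcl.Expression) []string {
	var refs []string
	for _, traversal := range expr.Variables() {
		root := traversal.RootName()
		switch root {
		case "var", "local":
			// var.<name> / local.<name>: one attribute step after the root.
			if len(traversal) > 1 {
				if attr, ok := traversal[1].(hcl.TraverseAttr); ok {
					refs = append(refs, root+"."+attr.Name)
				}
			}
		case "data":
			// data.<type>.<name>: two attribute steps after the root.
			if len(traversal) > 2 {
				dsType, okType := traversal[1].(hcl.TraverseAttr)
				dsName, okName := traversal[2].(hcl.TraverseAttr)
				if okType && okName {
					refs = append(refs, "data."+dsType.Name+"."+dsName.Name)
				}
			}
		}
	}
	return refs
}
```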
This DAG approach is gated behind a command-line flag so we don't make it the default just yet. Assuming this makes its way into the Packer code, this should stay gated for a while so users can experiment with it and report potential bugs/limitations that we can address before making this the default.
Once we are confident this can become the default, the `UseDAG` flag can be repurposed/inverted to specify that we want to roll back to the current, phased approach to evaluation, as an escape hatch in case the DAG breaks their workflow.