Orleans.TestingHost support test clusters without legacy configuration #3878

Merged
merged 1 commit into from
Jan 29, 2018

Conversation

ReubenBond
Member

@ReubenBond ReubenBond commented Jan 17, 2018

This should let us migrate configuration over to non-legacy classes bit by bit, and hopefully let us run our tests on .NET Core (i.e., by removing the reliance on creating AppDomains).

cc @jdom, who performed the initial work

}

private class TestSiloBuilderFactory : ISiloBuilderFactory
private class TestSiloBuilderFactory : ISiloBuilderConfigurator
Member

nit: you could rename to TestSiloBuilderConfigurator

Member Author

ah, yeah, I renamed some, but I'm sure I missed many

@jdom
Member

jdom commented Jan 18, 2018

  public virtual TestCluster CreateTestCluster()
  {
      return new TestCluster();
  }

Is this fine or should it use new TestClusterBuilder().Build() instead? Just asking, because I don't remember if the defaults are useable


Refers to: test/TestExtensions/HostedTestClusterBase.cs:62 in 2db2741.

@ReubenBond
Member Author

The latter, will fix. BTW, your comment isn't anchored to the code.

{
public interface ITestClusterManagementGrain : IGrainWithGuidKey
{
Task<T> GetService<T>();
Member

I wouldn't add this grain here (in the package that we ship), since almost no service would be serializable. The only one I can think of is the legacy config objects, so I wouldn't pollute the surface with this.

Member Author

Alright, should I make a more specific legacy-themed grain to get config objects or try to retrieve them some other way?

Member

If it was just to get the gateway endpoint, you probably don't even need that, you can derive it from the options, or even ask the client

{
var gwEndpoint = this.HostedCluster.Primary.NodeConfiguration.ProxyGatewayEndpoint;
var clusterConfig = await this.Client.GetGrain<ITestClusterManagementGrain>(Guid.Empty).GetService<ClusterConfiguration>();
var gwEndpoint = clusterConfig.Overrides[Silo.PrimarySiloName].ProxyGatewayEndpoint;
Member

There are other ways to get the proxy endpoint without the need to resort to serializing the config back from the silo (and in fact that would break if you stop using the legacy config).
If you hold on to the TestClusterOptions, you can peek in there. Otherwise there's also the opportunity to get the gateway list provider options from the client itself without asking the silo for the config

/// <summary>Get the proxy address of the silo</summary>
public SiloAddress ProxyAddress => SiloAddress.New(this.NodeConfiguration.ProxyGatewayEndpoint, 0);
///// <summary>Get the proxy address of the silo</summary>
public SiloAddress GatewayAddress { get; set; }
Member

the default implementation I think could just look for this information in the TestClusterOptions if it uses test cluster membership.

This way other subclasses do not have to rely on extended RPC for it to work (such as MarshalByRefObject as we have in the AppDomainSiloHandle implementation)

Member Author

I wanted this so that I could query it for some tests. It seems right that if SiloAddress is there, then GatewayAddress should also be there.

On the other hand, I'm not confident about the overall class structure for this change (i.e., could things be separated more cleanly?).

Member

Sure, it's fine, I don't really mind if we keep it this way for now. The reality is that I was initially retrieving it from the container that way, before I did all the work to move to strongly typed config. Today you can just as well get it directly from the configuration.
But as I said, it's fine... Both AppDomain and in-domain strategies can RPC from the TestCluster... If we have other approaches (like out of process or containers) we can refactor it so that the silo handle infers it from the config instead of asking the remote host.

var clientConfiguration = GetOrCreateClientConfiguration(services, configuration);

// Test is running inside debugger - Make timeout ~= infinite
clientConfiguration.ResponseTimeout = TimeSpan.FromMilliseconds(1000000);
Member

Note: Once we move this to strongly typed options, we can get rid of the dependency to the legacy config objects in the TestClusterHostFactory

Member Author

Moved this to LegacyTestClusterConfiguration

Member

The reality is that this is useful in the non-legacy case too going forward. I would keep this by default, it's just that when you eventually move this setting out to a strongly typed option, then you can stop relying on the legacy configuration entirely
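
As a rough illustration of the debugger timeout discussed in this thread, the sketch below picks a very long response timeout only when a debugger is attached. The class and method names are hypothetical, and the `Debugger.IsAttached` check is an assumption based on the code comment in the snippet; the actual condition used in the PR is not shown here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Orleans.TestingHost.
public static class SketchTimeouts
{
    // Value taken from the snippet under review (1,000,000 ms, about 16.7 minutes).
    public static readonly TimeSpan DebuggerTimeout = TimeSpan.FromMilliseconds(1000000);

    // Pure helper so the choice can be exercised without a real debugger.
    public static TimeSpan Choose(TimeSpan normal, bool debuggerAttached)
    {
        return debuggerAttached ? DebuggerTimeout : normal;
    }

    // Convenience overload that asks the runtime whether a debugger is attached.
    public static TimeSpan Choose(TimeSpan normal)
    {
        return Choose(normal, Debugger.IsAttached);
    }
}
```

Keeping the decision in a small pure method means the same logic could later be fed from a strongly typed option instead of the legacy client configuration.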

@ReubenBond
Member Author

ReubenBond commented Jan 18, 2018

I'm thinking of modifying TestClusterPerTest.CreateTestCluster (and similar base class methods) so that instead of returning a TestCluster, it consumes a TestClusterBuilder. That way, the base class has a chance to modify the configuration and, e.g., insert the defaults loaded from OrleansTestSecrets.json / env vars (the ones loaded in TestDefaultConfiguration).

Something like void ConfigureTestCluster(TestClusterBuilder builder). We could also modify LegacyTestClusterBuilder so that it's no longer a subclass of TestClusterBuilder, but instead offers an extension method to let the consumer configure the legacy configuration, e.g.:

void ConfigureTestCluster(TestClusterBuilder builder)
{
  // probably this would be done in the base class.
  builder.AddTestEnvironmentConfiguration();

  // Set some legacy configuration options.
  builder.ConfigureLegacyConfiguration(cfg =>
  {
    cfg.ClusterConfiguration.SomeProperty = 3;

    cfg.ClientConfiguration.SomeProperty = 3;
  });
}

What do you think?

EDIT: the latest commit implements that change to ConfigureTestCluster, but not "AddTestEnvironmentConfiguration"

@jdom
Member

jdom commented Jan 18, 2018

I do like the change to use extension methods on the one and only test cluster builder instead of inheritance 👍 👍 In fact, now that I remember, I thought of avoiding inheritance too, but never got to it.

@@ -56,13 +50,6 @@ public void Shutdown()
{
this.host.StopAsync().GetAwaiter().GetResult();
}

private void InitializeTestHooksSystemTarget()
Contributor

Where was this logic moved to?

Member Author

{
var siloName = options.GetSiloSpecificOptions(i).SiloName;

this.ClusterConfiguration.GetOrCreateNodeConfigurationForSilo(siloName);
Member

Why go back to defining these? Can't you avoid it and only do so specifically if a particular test was incorrectly relying on it?
I'd like to prevent re-adding this at the infrastructure level if it was just for an edge case test

Member Author

I was hoping to avoid it, too, but it's used in non-test code in a non-trivial way which was making tests fail. I had a fix for the one test usage I sent to you, but I reverted it and went with this approach.

return _clusterConfiguration.Overrides.Keys.ToList();

It will [have to] be addressed when we move away from ClusterConfiguration internally

{
public Task<Guid> GetServiceId()
{
return Task.FromResult(this.ServiceProvider.GetRequiredService<GlobalConfiguration>().ServiceId);
Member

Why not get this from the strongly typed option (instead of the legacy config) which is the authoritative source of truth?

Member Author

Will fix

@ReubenBond
Member Author

@dotnet-bot test bvt

@xiazen
Contributor

xiazen commented Jan 19, 2018

Can we have a write-up of the new interfaces and major changes, so it is easier to review?

@ReubenBond
Member Author

@xiazen the gist of it is this:

  • TestCluster creation now has a builder pattern via TestClusterBuilder
  • Legacy configuration was moved out of TestCluster and into separate classes, which can be added to the TestClusterBuilder by using the ConfigureLegacyConfiguration extension method
  • Instead of serializing configuration objects over an AppDomain boundary, everything is configured via the IConfiguration system
  • The hope is to be able to swap the AppDomainSiloHandle implementation for different implementations (eg, in-proc, new proc, container, whatever) later, since it's easier to serialize the IConfigurationSource objects which are used to create configuration.
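
To make the points above more concrete, here is a minimal, self-contained sketch of the general pattern: a single builder, an extension method that layers legacy settings on top of it, and flat key/value output that is easy to hand across an AppDomain or process boundary. Every type and key name here (`SketchClusterBuilder`, `SketchLegacyConfig`, `ConfigureLegacy`, `Legacy:InitialSilos`) is a hypothetical stand-in chosen for illustration; none of this is the Orleans.TestingHost API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for legacy configuration objects.
public sealed class SketchLegacyConfig
{
    public int InitialSilos { get; set; } = 2;
    public string ClusterId { get; set; } = "test";
}

// Hypothetical stand-in for a test cluster builder.
public sealed class SketchClusterBuilder
{
    private readonly List<Action<Dictionary<string, string>>> configurators =
        new List<Action<Dictionary<string, string>>>();

    public SketchClusterBuilder AddConfigurator(Action<Dictionary<string, string>> configurator)
    {
        this.configurators.Add(configurator);
        return this;
    }

    // Produces flat key/value settings, which are simple to serialize.
    public Dictionary<string, string> Build()
    {
        var settings = new Dictionary<string, string>();
        foreach (var configurator in this.configurators)
        {
            configurator(settings);
        }

        return settings;
    }
}

// Legacy support is bolted on via an extension method rather than a subclass.
public static class SketchLegacyExtensions
{
    public static SketchClusterBuilder ConfigureLegacy(
        this SketchClusterBuilder builder,
        Action<SketchLegacyConfig> configure)
    {
        return builder.AddConfigurator(settings =>
        {
            var legacy = new SketchLegacyConfig();
            configure(legacy);
            settings["Legacy:InitialSilos"] = legacy.InitialSilos.ToString(CultureInfo.InvariantCulture);
            settings["Legacy:ClusterId"] = legacy.ClusterId;
        });
    }
}
```

A caller would write something like `new SketchClusterBuilder().ConfigureLegacy(cfg => cfg.InitialSilos = 4).Build()` and pass the resulting dictionary to whatever hosts the silo.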

if (useTestClusterMemebership)
if (useTestClusterMemebership
&& services.All(svc => svc.ServiceType != typeof(IMembershipTable))
&& services.All(svc => svc.ServiceType != typeof(IMembershipOracle)))
Member

nit: could avoid double enumeration

Member Author

I'll remove this and make sure UseTestClusterMembership is correctly set instead
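
For reference, the nit about double enumeration could be addressed along the following lines, although the author chose to drop the check instead. This is only a sketch: `SketchServiceEntry` and the two marker interfaces are hypothetical stand-ins for the service descriptors and membership interfaces in the diff.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker interfaces standing in for the membership services.
public interface ISketchMembershipTable { }
public interface ISketchMembershipOracle { }

// Hypothetical stand-in for a service registration entry.
public sealed class SketchServiceEntry
{
    public SketchServiceEntry(Type serviceType)
    {
        this.ServiceType = serviceType;
    }

    public Type ServiceType { get; }
}

public static class SketchMembershipCheck
{
    // Returns true when neither membership service is registered.
    // A single Any() call walks the collection at most once,
    // instead of two separate All() passes.
    public static bool NoMembershipRegistered(IEnumerable<SketchServiceEntry> services)
    {
        return !services.Any(svc =>
            svc.ServiceType == typeof(ISketchMembershipTable) ||
            svc.ServiceType == typeof(ISketchMembershipOracle));
    }
}
```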

@ReubenBond ReubenBond changed the title from "[WIP] Orleans.TestingHost support test clusters without legacy configuration" to "Orleans.TestingHost support test clusters without legacy configuration" Jan 22, 2018
@ReubenBond
Member Author

https://ci.dot.net/job/dotnet_orleans/job/master/job/functional_prtest/1013/

14:12:09 No executable found matching command "dotnet-xunit"
14:12:09     + CategoryInfo          : NotSpecified: (No executable f... "dotnet-xunit" 
14:12:09    :String) [], RemoteException
14:12:09     + FullyQualifiedErrorId : NativeCommandError
14:12:09     + PSComputerName        : localhost

@jdom
Member

jdom commented Jan 23, 2018

@dotnet-bot test this please

@ReubenBond
Member Author

Please review but don't merge (I'm sure there will be changes requested anyway) - I need to ensure we have a green run in VSO first.

@jdom
Member

jdom commented Jan 23, 2018

Yup, sure. Pay special attention to the Azure tests, since they don't run here, and tests that use different membership providers are directly affected by how this new test infrastructure works. Should be trivial to resolve if there are issues, though.

@@ -284,42 +237,39 @@ public static TimeSpan GetLivenessStabilizationTime(GlobalConfiguration global,
/// <summary>
/// Start an additional silo, so that it joins the existing cluster.
/// </summary>
/// <returns>SiloHandle for the newly started silo.</returns>
/// <returns>SiloHandle2 for the newly started silo.</returns>
Member

Search for old usages of SiloHandle2 and replace with SiloHandle

///// <summary>
///// Restart a previously stopped.
///// </summary>
///// <param name="siloName">Silo to be restarted.</param>
Member

uncomment XML comments

@jdom
Member

jdom commented Jan 23, 2018

LGTM. Ship it! :)

Contributor

@xiazen xiazen left a comment

LGTM except two minor comments

/// <summary>
/// Configures the client builder.
/// </summary>
public abstract void Configure(IConfiguration configuration, IClientBuilder clientBuilder);
Contributor

Why do we have an extra ClientBuilderConfiguratorBase.Configure method here on top of IClientBuilderConfigurator.Configure? If they are supposed to do different things, maybe name them differently.

Member Author

The explicit IClientBuilderConfigurator implementation does some work before calling the abstract method. I could collapse the class hierarchy instead

Contributor

I might be dumb here, but why can't we merge the two Configure methods?

Member Author

we can, it just means moving this code into any consumer. The idea was that this class would be reusable, but ... YAGNI
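
The arrangement being discussed, where an explicit interface implementation does shared work and then calls an abstract method with the same signature, can be sketched as follows. All names are hypothetical stand-ins, and the "shared setup" step is only a placeholder, since the thread does not show what work the real base class performs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the configurator interface.
public interface ISketchConfigurator
{
    void Configure(IDictionary<string, string> configuration, List<string> log);
}

public abstract class SketchConfiguratorBase : ISketchConfigurator
{
    // Explicit interface implementation: runs the shared step first,
    // then delegates to the subclass hook below.
    void ISketchConfigurator.Configure(IDictionary<string, string> configuration, List<string> log)
    {
        log.Add("base: shared setup");
        this.Configure(configuration, log);
    }

    // Same name and signature, but a separate member that subclasses override.
    public abstract void Configure(IDictionary<string, string> configuration, List<string> log);
}

public sealed class SketchTimeoutConfigurator : SketchConfiguratorBase
{
    public override void Configure(IDictionary<string, string> configuration, List<string> log)
    {
        log.Add("derived: timeout=" + configuration["ResponseTimeout"]);
    }
}
```

Calling through the interface runs both steps, while calling the public method directly on the concrete type skips the shared step. That asymmetry is one reason a reviewer might prefer a single merged method.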

}
catch (FileNotFoundException)
{
configuration = new ClientConfiguration();
Contributor

I'm not sure swallowing FileNotFoundException is a good idea. Doesn't this change the original behavior? For users who wanted to do StandardLoad from a config file, it hides the exception from them, which makes it harder for them to notice that they made a mistake somewhere.

Member

Not really, this is for the path where users didn't explicitly pass a config object. If they do want to, then they'll get the exception in their code when they try to load the config.
Btw, this code path will be removed as soon as the runtime itself stops requiring the legacy config in the first place
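
A minimal sketch of the distinction drawn in this reply: the fallback to an empty configuration applies only when the caller supplied nothing, so an explicit load in user code still surfaces its own exception. The types and the `loadFromFile` delegate are hypothetical stand-ins, not the code from the PR.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a client configuration object.
public sealed class SketchClientConfig
{
    public string Source { get; set; } = "defaults";
}

public static class SketchConfigLoader
{
    // explicitConfig: what the user passed in, or null if nothing was passed.
    // loadFromFile: the implicit "standard load" attempted on the user's behalf.
    public static SketchClientConfig GetOrCreate(
        SketchClientConfig explicitConfig,
        Func<SketchClientConfig> loadFromFile)
    {
        if (explicitConfig != null)
        {
            // The user loaded this themselves; any failure already happened in their code.
            return explicitConfig;
        }

        try
        {
            return loadFromFile();
        }
        catch (FileNotFoundException)
        {
            // No file and no explicit config: fall back to defaults.
            return new SketchClientConfig();
        }
    }
}
```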

@@ -41,7 +41,7 @@ protected override void ConfigureTestCluster(TestClusterBuilder builder)
{
Guid serviceId = Guid.NewGuid();
builder.Options.InitialSilosCount = 4;

builder.Options.UseTestClusterMembership = false;
Member

Aren't other external membership tests (non-Azure) affected by the same issue? They wouldn't show up, as we don't run them in CI.

<ItemGroup>
<Compile Remove="Grains\**" />
<EmbeddedResource Remove="Grains\**" />
<None Remove="Grains\**" />
Member

What's all this?

Member Author

Oh, I thought I fixed that - my bad.

@jdom
Member

jdom commented Jan 24, 2018

There are conflicts now. If you resolve them and tests are passing, I'll merge it asap

@ReubenBond
Member Author

@dotnet-bot test functional

@ReubenBond
Member Author

@dotnet-bot test bvt

@ReubenBond
Member Author

I'm happy for us to merge this now

@jdom
Member

jdom commented Jan 29, 2018

I was about to on Friday, but some multi cluster tests in vso are failing because of this. Should I merge anyway or wait until they are fixed?

@sergeybykov
Contributor

There were other issues with the multi-cluster tests. I think there's no need to block this waiting for them to be investigated.

@jdom
Member

jdom commented Jan 29, 2018

Oh, OK then. I'll merge as soon as the conflicts are resolved.

@ReubenBond
Member Author

resolved

@xiazen xiazen merged commit f7c3a73 into dotnet:master Jan 29, 2018
@jdom
Member

jdom commented Jan 29, 2018

oh, too slow :(
Was waiting for tests to validate the rebase

@xiazen
Contributor

xiazen commented Jan 29, 2018

:) yeah. you are late for the party ;) . I verified that it passed in vso with Reuben. So I merged it

@ReubenBond ReubenBond deleted the test-cluster branch January 29, 2018 21:51
@ReubenBond
Member Author

ReubenBond commented Jan 29, 2018

The tests passed in .NET CI, so it's all good. I'm just glad this is merged.

@jdom
Member

jdom commented Jan 29, 2018

Yeah, sorry for the delay, I was thinking that you were planning to fix the multi-cluster tests... I was running tests in VSO for the entire week last week, waiting for the moment they passed so I could merge :)

@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2023

4 participants