Console app prototype for stream analytics replacement #78

Merged: 17 commits merged on Jan 6, 2021


@wi-y (Contributor) commented Nov 21, 2020:

This is the initial prototype of a console app that serves as an OSS replacement for Stream Analytics.

Before running the app, you will also need to fill out appsettings.json with the appropriate Event Hub and storage account information, as well as tell the app which event hub to read from (set by Console:EventHub).

To run the app, open the console folder of the project, set Microsoft.Health.Fhir.Ingest.Console as the startup project, and then build and run it. Some logging output, such as the number of events processed for a given window, will appear in the console.

You can adjust the batching window in appsettings.json via EventBatching:FlushTimespan (the window length) and EventBatching:MaxEvents (the maximum number of events per batch).
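As a minimal sketch of how the two settings could drive flushing, assuming MaxEvents acts as a count threshold and FlushTimespan sets the window length (EventBatchingOptions and ShouldFlush are hypothetical names, not the project's types):

```csharp
using System;
using System.Globalization;

// Hypothetical options type mirroring EventBatching:FlushTimespan and
// EventBatching:MaxEvents; it is not the project's actual class.
public sealed class EventBatchingOptions
{
    public EventBatchingOptions(TimeSpan flushTimespan, int maxEvents)
    {
        if (flushTimespan <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(flushTimespan));
        if (maxEvents <= 0) throw new ArgumentOutOfRangeException(nameof(maxEvents));
        FlushTimespan = flushTimespan;
        MaxEvents = maxEvents;
    }

    public TimeSpan FlushTimespan { get; }
    public int MaxEvents { get; }

    // Parses raw setting strings such as "00:00:30" and "500".
    public static EventBatchingOptions Parse(string flushTimespan, string maxEvents) =>
        new EventBatchingOptions(
            TimeSpan.Parse(flushTimespan, CultureInfo.InvariantCulture),
            int.Parse(maxEvents, CultureInfo.InvariantCulture));

    // Flush when the queue holds MaxEvents events (count threshold) or when an
    // event's enqueued time reaches the current window end (time threshold).
    public bool ShouldFlush(int queuedCount, DateTimeOffset enqueuedTime, DateTimeOffset windowEnd) =>
        queuedCount >= MaxEvents || enqueuedTime >= windowEnd;
}
```

Parsing with the invariant culture keeps "00:00:30" meaning 30 seconds regardless of the machine's locale.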

Remaining TODOs include:

  • removing hard-coded template files
  • logging
  • managed identity (for Event Hubs, storage accounts, and the FHIR server)
  • testing minimum and maximum batch window intervals and queue sizes

src/console/devicecontent.json (outdated, resolved)
{
while (currentEnqueudTime >= _windowEnd)
{
_windowEnd = _windowEnd.Add(_flushTimespan);
Member comment:

You should be able to do this with math instead of a loop. Subtracting the two times gives a TimeSpan, and dividing that by the flush timespan tells you how many windows the enqueued time is past the window end. You can then advance the window end by that many multiples of the timespan.
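A minimal sketch of the suggested arithmetic, assuming a positive flush timespan (WindowMath and AdvanceWindowEnd are hypothetical names). It returns the same value as the while loop in the diff, in constant time:

```csharp
using System;

public static class WindowMath
{
    // Returns the smallest windowEnd + k * flushTimespan (k >= 0) that is strictly
    // greater than enqueuedTime, i.e. the same result as
    // "while (enqueuedTime >= windowEnd) windowEnd = windowEnd.Add(flushTimespan);"
    public static DateTimeOffset AdvanceWindowEnd(
        DateTimeOffset windowEnd, DateTimeOffset enqueuedTime, TimeSpan flushTimespan)
    {
        if (flushTimespan <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(flushTimespan), "Must be positive.");
        if (enqueuedTime < windowEnd)
            return windowEnd; // event already inside the current window: k = 0

        // Whole windows between windowEnd and the event, plus one so the event falls inside.
        long windows = (enqueuedTime - windowEnd).Ticks / flushTimespan.Ticks + 1;
        return windowEnd.AddTicks(windows * flushTimespan.Ticks);
    }
}
```

For example, with a 30-second timespan and an event 70 seconds past the window end, the window end moves forward by 3 × 30 = 90 seconds.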

{
Console.WriteLine($"Threshold wait reached. Flushing {_eventQueues[queueId].GetQueueCount()} events up to: {windowEnd}");
var events = await GetQueue(queueId).Flush(windowEnd);
await _eventConsumerService.ConsumeEvents(events);
Member comment:

We need to increment the window time here as well, correct? If we don't, any new event that arrives will immediately trigger a new batch.

Author reply (Contributor):

I don't think we need to increment here. This method is only called when we haven't received an event in a while, and it only flushes if the window end is sufficiently far in the past.

If a new event then arrives, it must fall past the current window end, so we flush the previous window (which has 0 events) and advance the window to contain the new event.

The main reason I didn't want to increment here is to avoid a race condition with the other flushing mechanisms (ThresholdTimeReached and ThresholdCountReached). Perhaps this should be designed better, though.
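A sketch of the idle check described above, assuming a hypothetical thresholdWait setting and a wall-clock `now` value; the method only reports whether the flush should happen and leaves the window end untouched:

```csharp
using System;

public static class ThresholdWait
{
    // Hypothetical idle-path check: flush only when the current window end is at
    // least thresholdWait in the past. The window end is deliberately NOT advanced
    // here; the next arriving event advances it instead.
    public static bool Reached(DateTimeOffset now, DateTimeOffset windowEnd, TimeSpan thresholdWait) =>
        now - windowEnd >= thresholdWait;
}
```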

Member comment:

Is there actually a race condition? They should be processed on the same thread. We had a race condition in my version because a timer was triggering the processing.
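One way to rule out the race is to funnel every flush trigger through a single gate. The sketch below swaps in a SemaphoreSlim rather than a dedicated thread; SerializedBatcher and its members are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical batcher: every flush path (threshold wait, time, count) calls FlushAsync,
// and a SemaphoreSlim(1, 1) lets only one caller touch the queue at a time, so a
// timer-driven flush cannot interleave with an event-driven one.
public sealed class SerializedBatcher
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly List<string> _queue = new List<string>();
    private int _activeFlushes;

    public int MaxConcurrentFlushes { get; private set; }
    public int FlushedEvents { get; private set; }
    public List<string> FlushReasons { get; } = new List<string>();

    public async Task EnqueueAsync(string evt)
    {
        await _gate.WaitAsync();
        try { _queue.Add(evt); }
        finally { _gate.Release(); }
    }

    public async Task FlushAsync(string reason)
    {
        await _gate.WaitAsync();
        try
        {
            // Records how many flushes overlap; the gate should keep this at 1.
            MaxConcurrentFlushes = Math.Max(MaxConcurrentFlushes, Interlocked.Increment(ref _activeFlushes));
            var batch = _queue.ToArray();
            _queue.Clear();
            await Task.Delay(10); // stand-in for consuming the batch downstream
            FlushedEvents += batch.Length;
            FlushReasons.Add(reason);
        }
        finally
        {
            Interlocked.Decrement(ref _activeFlushes);
            _gate.Release();
        }
    }
}
```

A single consumer loop (for example, one reading from a System.Threading.Channels channel) would give the same guarantee without a lock; which fits better depends on how the timer is wired in.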

public void ConfigureServices(IServiceCollection services)
{
var outputEventHubConnection = Configuration.GetSection("OutputEventHub").Value;
var outputEventHubName = outputEventHubConnection.Substring(outputEventHubConnection.LastIndexOf('=') + 1);
Member comment:

We should be able to use the Event Hubs connection string builder here.
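For illustration, here is a stdlib-only parser showing why splitting on the last '=' is fragile: SAS key values are typically base64 and can end in '=' padding, and EntityPath is not guaranteed to be the last segment. ConnectionStringSketch is hypothetical; in practice the Event Hubs SDK's own connection string parsing would be preferable:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, stdlib-only stand-in for a connection string builder; not an SDK API.
public static class ConnectionStringSketch
{
    // Splits on ';', then splits each segment at its FIRST '=', so values that
    // end in '=' padding stay intact. Keys are matched case-insensitively.
    public static IReadOnlyDictionary<string, string> Parse(string connectionString)
    {
        var parts = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var segment in connectionString.Split(';', StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = segment.IndexOf('=');
            if (eq <= 0) throw new FormatException($"Malformed segment: '{segment}'");
            parts[segment.Substring(0, eq).Trim()] = segment.Substring(eq + 1).Trim();
        }
        return parts;
    }

    // Looks up EntityPath by name instead of assuming it is the final segment.
    public static string GetEventHubName(string connectionString) =>
        Parse(connectionString).TryGetValue("EntityPath", out var name)
            ? name
            : throw new ArgumentException("Connection string has no EntityPath.", nameof(connectionString));
}
```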

Member comment:

Once we move to Managed Identity, we probably want separate settings for the different components the connection string represents (minus the SAS token):

  • ConsumerGroup
  • FullyQualifiedNamespace
  • EventHubName
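A hypothetical settings shape for the three components above, with no SAS key. The class name, the validation, and the Uri-based host extraction are assumptions for illustration:

```csharp
using System;

// Hypothetical Managed Identity settings: the pieces a connection string carries, minus the SAS key.
public sealed class EventHubIdentitySettings
{
    public EventHubIdentitySettings(string fullyQualifiedNamespace, string eventHubName, string consumerGroup)
    {
        FullyQualifiedNamespace = Require(fullyQualifiedNamespace, nameof(fullyQualifiedNamespace));
        EventHubName = Require(eventHubName, nameof(eventHubName));
        ConsumerGroup = Require(consumerGroup, nameof(consumerGroup));
    }

    public string FullyQualifiedNamespace { get; } // a host name, e.g. "example.servicebus.windows.net"
    public string EventHubName { get; }
    public string ConsumerGroup { get; }

    // Derives the namespace host from a connection string's Endpoint value
    // ("sb://example.servicebus.windows.net/" -> "example.servicebus.windows.net").
    public static string NamespaceFromEndpoint(string endpoint) => new Uri(endpoint).Host;

    private static string Require(string value, string name) =>
        string.IsNullOrWhiteSpace(value) ? throw new ArgumentException($"{name} is required.", name) : value;
}
```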

Member comment:

Yes, once this is in, I am planning to create stories for supporting MI. There are a few things I would like to iterate on in this code overall, but I would like to get this initial build in as a base committed to master so we have the history.


@dustinburson self-requested a review on December 15, 2020 17:46

@dustinburson (Member) left a comment:

@wi-y I re-reviewed, and I think you addressed most of the must-fix items. I would still like to get at least one other person's review here for another point of view. It also looks like the build isn't triggering for this PR. Is that something you could look into?

Overall, I think we may want to iterate on this some more, but rather than do it here, I would like to get it into master and follow up with additional PRs. Make sense?

@wi-y (Contributor, Author) commented Dec 15, 2020:

@dustinburson thank you for reviewing. I agree with getting this into master and iterating. I'll look into why the builds aren't triggering.

return Task.CompletedTask;
}

async Task ProcessInitializingHandler(PartitionInitializingEventArgs initArgs)
Contributor comment:

Simplifying the codebase by using the built-in checkpointing sounds like a reasonable suggestion. If we find that the extra values being checkpointed (i.e., enqueuedDateTime) don't provide value, then we could explore a refactor.

private ITelemetryLogger _logger;

public Processor(
[Blob("template/%Template:FhirMapping%", FileAccess.Read)] string templateDefinition,
Member comment:

I had to replace ":" with "_" in order to deploy this in a Linux container.

src/console/IomtLogging.cs (outdated, resolved)
_publisherTimer.Enabled = true;
}

private async void OnTimedEvent(object source, ElapsedEventArgs e)
Member comment:

I agree with Dustin. I think the checkpoint should be the "source of truth" for completed work. If this process crashes for some reason, we could lose checkpoints that were held only in memory. Also, ListCheckpointsAsync lists only the checkpoints that are in blob storage and will not include the in-memory ones. This seems likely to cause trouble.
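A minimal sketch of checkpointing as the source of truth for completed work: the durable checkpoint is written only after the batch has been consumed. ICheckpointStore, FakeCheckpointStore, and CheckpointAfterFlush are hypothetical stand-ins, not the Event Hubs SDK checkpoint API:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical abstraction over a durable checkpoint store (for example, blob storage).
public interface ICheckpointStore
{
    Task SetCheckpointAsync(string partitionId, long offset);
}

// In-memory fake, used here only to exercise the sketch.
public sealed class FakeCheckpointStore : ICheckpointStore
{
    public Dictionary<string, long> Saved { get; } = new Dictionary<string, long>();

    public Task SetCheckpointAsync(string partitionId, long offset)
    {
        Saved[partitionId] = offset;
        return Task.CompletedTask;
    }
}

public static class CheckpointAfterFlush
{
    // Consume the batch first; persist the last offset only once that succeeds.
    public static async Task FlushAsync(
        string partitionId,
        IReadOnlyList<(long Offset, string Body)> batch,
        Func<IReadOnlyList<(long Offset, string Body)>, Task> consume,
        ICheckpointStore store)
    {
        if (batch.Count == 0) return;                            // nothing to acknowledge
        await consume(batch);                                    // 1. complete the work
        long lastOffset = batch[batch.Count - 1].Offset;
        await store.SetCheckpointAsync(partitionId, lastOffset); // 2. then persist progress
    }
}
```

If consume throws, no checkpoint is written, so after a crash the batch is replayed from the last durable checkpoint (at-least-once delivery).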

DateTimeOffset enqueuedTime,
IDictionary<string, object> properties,
IReadOnlyDictionary<string, object> systemProperties)
{
Member comment:

I'm wondering if it makes sense to do null checks here. Some of these properties seem to be "required", so enforcing that in the initializer might make sense, or perhaps in the factory.
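A minimal sketch of enforcing required members at construction, assuming properties and systemProperties are the required ones (EventMessageSketch is a hypothetical type, not the project's event class):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type that validates its required members once, at construction,
// so downstream code can rely on them being non-null.
public sealed class EventMessageSketch
{
    public EventMessageSketch(
        DateTimeOffset enqueuedTime,
        IDictionary<string, object> properties,
        IReadOnlyDictionary<string, object> systemProperties)
    {
        EnqueuedTime = enqueuedTime; // value type: cannot be null, so no check needed
        Properties = properties ?? throw new ArgumentNullException(nameof(properties));
        SystemProperties = systemProperties ?? throw new ArgumentNullException(nameof(systemProperties));
    }

    public DateTimeOffset EnqueuedTime { get; }
    public IDictionary<string, object> Properties { get; }
    public IReadOnlyDictionary<string, object> SystemProperties { get; }
}
```

If a factory builds these objects, the same checks could live there instead; keeping them in the constructor means no code path can skip them.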

@dustinburson (Member) left a comment:

:shipit:

wi-y and others added 5 commits January 5, 2021 16:43
* template and script for deploying SA replacement as WebJobs
* fix template bug
* log measurement metrics
* fix params
* update ARM template and deploy script
* set alwaysOn
* fix errors/telemetry
@wi-y merged commit d7fc484 into master on Jan 6, 2021
@dustinburson deleted the personal/wiyochum/event-hub-read-and-batch branch on November 8, 2022 18:34
4 participants