Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration tools to migrate serialized data from 3.x to 7.0 #8214

Draft
wants to merge 5 commits into
base: 3.x
Choose a base branch
from

Conversation

benjaminpetit
Copy link
Member

@benjaminpetit benjaminpetit commented Dec 7, 2022

The goal of this PR is to introduce a set of tools to enable data migration from 3.x to 7.0.

For the moment:

  • only offline data migration will be supported
  • no in-place conversion: all data should be written to another storage account/table name/container
  • Azure Table/Blob are the first implementations to be supported.
  • Consider these tools as experimental. Please test them in non-production data before.
Microsoft Reviewers: Open in CodeFlow

@benjaminpetit benjaminpetit marked this pull request as draft December 7, 2022 12:20
@ReubenBond
Copy link
Member

What needs to be done before this is ready for review?

@ReubenBond ReubenBond self-assigned this Jan 19, 2023
/// Get all entries in storage
/// </summary>
/// <returns>The entries in storage</returns>
IAsyncEnumerable<StorageEntry> GetAll(CancellationToken cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this cancellation token be marked with the [EnumeratorCancellation] attribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there should be an optional "fromKey" parameter so that enumeration can skip already-processed keys. Maybe it's not necessary at this stage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ReadAllAsync would align better with other method names

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we dont need it on the interface type, since it requires the specific implementation of the method:
image

So we should use it on every single method implementation, but not on the interface level. Also @benjaminpetit added it in the Orleans.Storage.AzureBlobGrainStorage in example

/// <param name="input">The data to deserialize.</param>
/// <typeparam name="T">The output type.</typeparam>
/// <returns>The deserialized object.</returns>
T Deserialize<T>(BinaryData input);
Copy link
Member

@ReubenBond ReubenBond Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to solve the issue of string vs binary data here. Eg, for JSON-based storage systems like Postgres (json columns), Mongo DB, & CosmosDB

/// <inheritdoc/>
public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
throw new NotImplementedException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to delete these commented-out bits before merge.

{
private readonly string _filename;

public CsvDataReader(string filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool


[SkippableTheory]
[CsvDataReader("grain-references.csv")]
public void GrainRefenceSerializerTest(Type grainInterfaceType, string key, string keyExt, string expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we re-generate this csv, should that be documented?

@galvesribeiro
Copy link
Member

How far are we from have this? We are already talking about Orleans 8 but we can't move to 7 because of the storage/reminders migration.

public async IAsyncEnumerable<StorageEntry> GetAll([EnumeratorCancellation] CancellationToken cancellationToken)
{
var regex = new Regex("(?<name>[^-]+)-(?<reference>[^-]+).json");
await foreach (var item in this.container.GetBlobsAsync(cancellationToken: cancellationToken))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need a way to store progress periodically so that we make forward progress after a failure. Eg, store the last migrated key periodically and scan blob names starting from that id on recovery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be advantageous to support a graceful "pausing" of the migration. Some environments may only want to run this during periods of lower systems load.

Copy link
Member

@ReubenBond ReubenBond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

@ghost
Copy link

ghost commented Jun 15, 2023

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

@ReubenBond
Copy link
Member

bad bot!

@ghost
Copy link

ghost commented Jun 23, 2023

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

@ghost ghost closed this Jun 30, 2023
@galvesribeiro
Copy link
Member

What happened? Bot went rogue?

@benjaminpetit
Copy link
Member Author

Bad bot!

@benjaminpetit benjaminpetit reopened this Jun 30, 2023
@ghost ghost removed the Needs: author feedback label Jun 30, 2023
@DeagleGross
Copy link

we need to support translation of the Reminders as well (done by Benjamin in src/Orleans.Persistence.Migration/GrainReferenceExtractor.cs, I just need to connect it in the IReminderTable)

@DeagleGross
Copy link

supported reminders and finalized the grains migrations here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants