Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logic for respecting the package auto-delete configuration #5415

Merged
merged 1 commit into from
Feb 8, 2018

Conversation

joelverhagen
Copy link
Member

Address https://github.com/NuGet/Engineering/issues/921.

Could I get some crazy nitpick 👀s on this PR? There is risk associated with this change as it allows users to delete their own packages in some cases.

I check both the DB and the statistics reports for download counts. This is to improve resiliency as separate jobs write these two artifacts.

if (package.PackageRegistration.DownloadCount > _config.MaximumDownloadsForPackageId.Value)
{
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are cases when we can avoid _statisticsService.Refresh() which is expensive. Restructure the code to be cheaper.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I really only need it before the stale check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

var sinceCreated = now - package.Created;

// Always refresh the statistics.
await _statisticsService.Refresh();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_statisticsService.Refresh() [](start = 18, length = 28)

We will want to analyze the success of the feature later on and understand how to improve it. Add traces that help understand why we returned false. Consider structuring if in a way that will allow creating a table of input properties and result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@scottbommarito scottbommarito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything looks fine to me


// Handle the "early" delete case, where the package version download count is not considered but total
// download count on the entire ID (all versions) is considered.
if (_config.StatisticsUpdateFrequencyInHours.HasValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the name StatisticsUpdateFrequencyInHours is a bit misleading...potentially something along the lines of ExpectedStatisticsUpdateFrequencyInHours would be more accurately convey that stats are not GUARANTEED to be updated every X hours.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected is good. I'll add that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I decided to not make this change as it would require a config change in NuGetDeployment. This is not worth the effort.

var sinceCreated = now - package.Created;

// Always refresh the statistics.
await _statisticsService.Refresh();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we will refresh stats every time someone goes to the ReportMyPackage page or submit the ReportMyPackage form?

I'm not sure about the logistics of this method (does it always refresh or is it just if the cache is stale?) but if this refresh is not cached, somebody could create a lot of turmoil on the site by hitting either of these two methods repeatedly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refresh is cached and the cache is a singleton.

&& !_config.StatisticsUpdateFrequencyInHours.HasValue;
}

private bool AreStatisticsStale(DateTime now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a method that should be on the StatisticsService itself, but if it's too much work to move it there it's fine here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale is a relative thing. Different scenarios may or may not have different "stale" definitions. For now the concept only exists in the auto-delete workflow.

$"{nameof(_config.HourLimitWithMaximumDownloads)}.",
nameof(config));
}
}

public Task<bool> CanPackageBeDeletedByUserAsync(Package package)
public async Task<bool> CanPackageBeDeletedByUserAsync(Package package)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on only emitting telemetry when the support request form is actually submitted?

This would reduce the amount of de-duping we would have to do on the telemetry -- but not eliminate all duplicates, since a user could support multiple support requests for a package. I figured it would just be simpler to submit telemetry whenever this method is called and de-dupe later when we are analyzing the data.

As it stands today, telemetry is emitted before the support request form is loaded and after the form is submitted with a deletion request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we'll do the check before, and only show the link if it's possible? I think we should only submit telemetry on submit, not view rendering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we'll do the check before, and only show the link if it's possible?

Correct.

I think we should only submit telemetry on submit, not view rendering.

Yeah, that's where I'm leaning too.

// Handle the "early" delete case, where the package version download count is not considered but total
// download count on the entire ID (all versions) is considered.
if (_config.ExpectedStatisticsUpdateFrequencyInHours.HasValue
&& details.SinceCreated < TimeSpan.FromHours(_config.ExpectedStatisticsUpdateFrequencyInHours.Value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean owners of popular packages that want to delete will need to unlist immediately, then can delete after the hour limit has expired?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlisted is not a factor here.

Owners of popular packages (IDs > 125,000 downloads) cannot delete their packages at all.

/// allowed since the download data is too stale. If <see cref="StatisticsUpdateFrequencyInHours"/> is null,
/// <see cref="ExpectedStatisticsUpdateFrequencyInHours"/> hours. Also, if the package has not had its download
/// statistics updated in the last <see cref="ExpectedStatisticsUpdateFrequencyInHours"/> hours, the delete is not
/// allowed since the download data is too stale. If <see cref="ExpectedStatisticsUpdateFrequencyInHours"/> is null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be configured based on the frequency of the stats jobs? Should we have a specific recommendation in comments, based on job?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be configured based on the frequency of the stats jobs?

That's not too necessary. As mentioned in the XML doc, this value is used for a stale check (so the value can be larger than the average frequency).

Should we have a specific recommendation in comments, based on job?

Yeah, I'll try to improve this.

StaleStatistics,
TooManyVersionDatabaseDownloads,
TooManyVersionReportDownloads,
TooLate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the UI always show an option to delete, but present the error message if the criteria aren't met?

Or do we check before showing a link, and after / on submit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the UI always show an option to delete, but present the error message if the criteria aren't met?

Nope, we only show the option if at the time of form load their package is eligible.

Or do we check before showing a link, and after / on submit?

Correct. And if the eligibility has changed since the form was presented, we just treat it as a normal support request.

$"{nameof(_config.HourLimitWithMaximumDownloads)}.",
nameof(config));
}
}

public Task<bool> CanPackageBeDeletedByUserAsync(Package package)
public async Task<bool> CanPackageBeDeletedByUserAsync(Package package)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we'll do the check before, and only show the link if it's possible? I think we should only submit telemetry on submit, not view rendering.


private bool HasVersion(StatisticsFact fact, string version)
{
if (fact != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if [](start = 12, length = 2)

Consider return xyz; instead of if (xyz) { return true } return false;,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

}

// If no time ranges are configured, allow downloads any time.
if (_config.HourLimitWithMaximumDownloads.HasValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if [](start = 12, length = 2)

Could we convert the chains of if to else ifs? It'd be a little more clear I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

}
catch
{
// logging failed, don't allow exception to escape
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// logging failed, don't allow exception to escape [](start = 16, length = 50)

Should we maybe log these exceptions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


var hours = details.SinceCreated.TotalHours;

TrackMetric(Events.UserPackageDeleteAfterHours, hours, properties => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

properties [](start = 67, length = 10)

The action here seems unnecessary. I think this just creating a dictionary and using its initializer syntax would be better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it saves the initialization of the dictionary)

@joelverhagen joelverhagen merged commit 044866c into dev Feb 8, 2018
@joelverhagen joelverhagen deleted the jver-fresh-stats branch February 8, 2018 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants