Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22528 Improve date formatting performance #2653

Merged
merged 1 commit into from
Oct 14, 2023

Conversation

mohd-akram
Copy link
Contributor

@mohd-akram mohd-akram commented Oct 3, 2023

This improves date formatting performance to be 1.5-2x as fast.

Script to generate pattern char lookup table
const chars = [
  0x47, 0x79, 0x4D, 0x64, 0x6B, 0x48, 0x6D, 0x73, 0x53, 0x45,
  0x44, 0x46, 0x77, 0x57, 0x61, 0x68, 0x4B, 0x7A, 0x59, 0x65,
  0x75, 0x67, 0x41, 0x5A, 0x76, 0x63, 0x4c, 0x51, 0x71, 0x56,
  0x55, 0x4F, 0x58, 0x78, 0x72, 0x62, 0x42,
  0x3a,
];
process.stdout.write(' '.repeat(7));
for (let c = 0; c < 128; c++) {
  if (c % 16 == 0) process.stdout.write('\n'+' '.repeat(7));
  process.stdout.write(String(chars.indexOf(c)).padStart(3)+',');
}
process.stdout.write('\n');
Benchmarking program
#include <chrono>
#include <iostream>
#define U_SHOW_CPLUSPLUS_API 1
#include "unicode/smpdtfmt.h"
#include "unicode/ustream.h"

using namespace icu;

int main(int argc, char *argv[])
{
	UErrorCode success = U_ZERO_ERROR;
	auto format = argc > 2 ? argv[2] : "hh:mm:ss a OOOO";
	SimpleDateFormat df(UnicodeString(format), success);
	UnicodeString myString;
	for (int32_t i = 0; i < 1; ++i) {
		myString.remove();
		auto x = df.format(1695464381915, myString);
	}
	int n = argc > 1 ? atoi(argv[1]) : 10000;
	auto start = std::chrono::high_resolution_clock::now();
	for (int32_t i = 0; i < n; ++i) {
		myString.remove();
		auto x = df.format(1695464381915, myString);
	}
	std::cout << myString << "\n";
	auto finish = std::chrono::high_resolution_clock::now();
	auto ms = std::chrono::duration_cast<std::chrono::microseconds>(finish-start).count()/1000.0;
	std::cout << ms << "ms\n";
}
Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22528
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/common/unistr.cpp is different
  • icu4c/source/i18n/smpdtfmt.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great, but I do have a few small things...

icu4c/source/i18n/datefmt.cpp Show resolved Hide resolved
icu4c/source/i18n/datefmt.cpp Show resolved Hide resolved
icu4c/source/i18n/calendar.cpp Show resolved Hide resolved
icu4c/source/i18n/calendar.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/dtfmtsym.cpp Show resolved Hide resolved
icu4c/source/i18n/gregoimp.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/gregoimp.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/smpdtfmt.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/smpdtfmt.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/unicode/formattednumber.h Show resolved Hide resolved
@mohd-akram mohd-akram requested a review from richgillam October 11, 2023 13:53
richgillam
richgillam previously approved these changes Oct 13, 2023
Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I had maybe one more suggestion for you, but I'm happy with this the way it is. Squash your commits (and optionally make the other change), and I'll approve the final version.

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/smpdtfmt.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Contributor

@richgillam richgillam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized I should have asked you about Java. We usually try to keep our C++ and Java code as similar as possible. But these all seem like pretty C++-specific changes, so maybe you don't need to worry about that.

@mohd-akram
Copy link
Contributor Author

Thanks @richgillam for the review!

@richgillam
Copy link
Contributor

Do you have merge privileges, or do you need me to merge for you?

@mohd-akram
Copy link
Contributor Author

If you could merge it that would be great as I don't have merge privileges.

@richgillam richgillam merged commit 3d1dee6 into unicode-org:main Oct 14, 2023
@richgillam
Copy link
Contributor

Done! Thank you for the fix!

@mohd-akram mohd-akram deleted the perf-dateformat branch October 14, 2023 04:47
@FrankYFTang
Copy link
Contributor

we recently got some fuzzer detected overflow from these new code :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants