Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

til::rle<T, S> - a run length encoded storage template #8794

Closed
wants to merge 26 commits into from

Conversation

miniksa
Copy link
Member

@miniksa miniksa commented Jan 15, 2021

Introduces til::rle<T, S>. Stores a chosen type T in a run length encoded (compacted) format. For additional memory savings, the counter size S can be chosen as well.

References

PR Checklist

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

  • Ran cacafire in OpenConsole.exe and it looked beautiful
  • Ran new suite of RunLengthEncodingTests.cpp supporting til::rle<T, S> and derived from the concepts of AttrRowTests.cpp which is obsolete/removed.

@ghost ghost added Area-CodeHealth Issues related to code cleanliness, linting, rules, warnings, errors, static analysis, etc. Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase labels Jan 15, 2021
@miniksa
Copy link
Member Author

miniksa commented Jan 15, 2021

Known TODO list

  • - Deduplicate _at from const and non-const versions
  • - Consider/add validation on til::rle methods and iterators
  • - Finish, defer, or remove non-const iterator (likely requires reference passthrough like std::vector<bool> specialization)
  • - Review hard static_casts and decide between safe/saturating, throwing, debug-only validation, or ignoring the problem. This also goes with the optional bonus feature of inserting an item that is longer than size as it's a similar math problem.
  • - substr() on til::rle to return a subset run.
  • - Discuss further what we need of quick estimation and summing
  • - Audit mode.

@miniksa miniksa requested a review from DHowett January 15, 2021 00:31
@miniksa miniksa self-assigned this Jan 15, 2021
@skyline75489

This comment has been minimized.

src/inc/til/rle.h Outdated Show resolved Hide resolved
src/inc/til/rle.h Outdated Show resolved Hide resolved
src/inc/til/rle.h Outdated Show resolved Hide resolved
src/inc/til/rle.h Outdated Show resolved Hide resolved
src/inc/til/rle.h Outdated Show resolved Hide resolved
@miniksa

This comment has been minimized.

@DHowett
Copy link
Member

DHowett commented Jan 17, 2021

Requests from trying to use with new buffer:

  • replace(off, size, newsize, newthing) (I can construct a new one and assign, but ... will that work properly? I need to replace [one column] with [many columns])
  • resize(newSize, fillValue) let me specify the new fill value instead of extending the final run
  • insert seems more like assign. Can I insert, by pushing subsequent runs forward after the insertion point? Can I insert many? (this could be the basis for a replace where size != newSize)

What error do you get when you remove non-const-_at

@DHowett
Copy link
Member

DHowett commented Jan 17, 2021

Since the new buffer works by counting columns per wchar_t, and sometimes a single character takes up more than one wchar_t, I occasionally need to replace a run that looks like 1 0 0 with 1, or a run that looks like 1 with 2 0. That's a replace+shrink/grow.

I implemented it by replacing the impacted region with 0 0 0 ... (for as many wchar_t as there are) and then replacing the first index with the true column number.

My current backing store grows as necessary when I do that, and it does not have a maximum size (it is a vector).

@github-actions

This comment has been minimized.

src/til/precomp.h Outdated Show resolved Hide resolved
VERIFY_ARE_EQUAL("3|2|1 1 1"sv, rle.slice(2, 7));
// within a run -> within a run
VERIFY_ARE_EQUAL("3|2|1"sv, rle.slice(2, 5));
VERIFY_ARE_EQUAL("3|2|1 1"sv, rle.slice(2, 6));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh man. We have other places in the code where ""sv would be useful...

// NOTE: This can change the size/length of the vector.
void replace(size_type start_index, size_type end_index, const rle_type& new_run)
{
replace(start_index, end_index, { &new_run, 1 });
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this one _check_indicies?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses the non-underscored, non-private version of replace. I could rename _replace to _unchecked_replace to avoid confusion.

@lhecker lhecker force-pushed the dev/miniksa/rle branch from 1822335 to 117e033 Compare May 11, 2021 17:24
@@ -27,8 +27,6 @@ Revision History:
#include "WexTestClass.h"
#endif

#pragma pack(push, 1)
Copy link
Member

@lhecker lhecker May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Without packing this class (and ColorType) it's size increases from just 13 to 14 bytes.
(For some reason the size increase is much larger if you pack ColorType, but I haven't looked into it.)

But this doesn't change the size of til::rle_pair<TextAttribute, UINT>, as the latter UINT length field is 4-byte aligned. This causes padding to be inserted in the rle_pair and it's size to be 20 bytes either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: since the rest of conhost uses SHORT for coordinates, can we consider uint16_t the max size type for an attribute run? Does that help save us a couple bytes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran cacafire for 30s and then ran "cat big.txt" 10 times:

type time cat big.txt private working set
uint32_t 1.54s 36.552MB
uint16_t 1.50s 36.368MB

src/inc/til/rle.h Outdated Show resolved Hide resolved
src/inc/til/rle.h Outdated Show resolved Hide resolved
src/inc/til/rle.h Outdated Show resolved Hide resolved
{
// I couldn't figure out how to attach additional info
// to a failed assertion so I'm doing it this way...
Log::Comment(NoThrowString().Format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yo can use VERIFY_ARE_EQUAL(x, y, NoThrowString().Format(xxxxxxxxxxx)) 😄

@github-actions
Copy link

github-actions bot commented May 14, 2021

@check-spelling-bot Report

Unrecognized words, please review:

  • AAAAA
  • AAAAAAAAAAAAA
  • BBBBB
  • BBBBBBBB
  • Gravell
  • It'd
  • ok'd
  • pws
  • qos
  • RIS
  • sxs
  • sys
  • We'd
  • ZZZZZ
Previously acknowledged words that are now absent AAAAABBBBBBBCCC AAAAABCCCCCCCCC AAAAADCCCCCCCCC actionmap apos bc cang ci conpixels cplinfo crt cw cx df dh dw eb EK ev exeuwp gb gh Gravell's hc hh hk hu Kd KF KJ KU KX lk memcopy mybase nc Nx oa OI Oj Oo oq Ou Ov pb pv pw px py qi QJ qo reserialize rgch ri Rl rmdir ru smalllogo splashscreen storelogo sx sy sz td tf tl tt uk ul Unregistering vf vk wx xa xy Yw yx YZ Zc zd zh ZM zu zy
Some files were were automatically ignored

These sample patterns would exclude them:

^src/host/ft_uia/run\.bat$
^src/host/runft\.bat$
^src/host/runut\.bat$
^src/terminal/adapter/ut_adapter/run\.bat$
^src/terminal/parser/delfuzzpayload\.bat$
^src/terminal/parser/ft_fuzzer/run\.bat$
^src/terminal/parser/ft_fuzzwrapper/run\.bat$
^src/terminal/parser/ut_parser/run\.bat$
^src/tools/integrity/packageuwp/ConsoleUWP\.appxSources$
^src/tools/lnkd/lnkd\.bat$
^src/tools/pixels/pixels\.bat$
^src/tools/texttests/fira\.txt$

You should consider excluding directory paths (e.g. (?:^|/)vendor/), filenames (e.g. (?:^|/)yarn\.lock$), or file extensions (e.g. \.gz$)

You should consider adding them to:

.github/actions/spelling/excludes.txt

File matching is via Perl regular expressions.

To check these files, more of their words need to be in the dictionary than not. You can use patterns.txt to exclude portions, add items to the dictionary (e.g. by adding them to allow.txt), or fix typos.

To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the [email protected]:microsoft/terminal.git repository
on the dev/miniksa/rle branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect/alphabet.txt
.github/actions/spelling/expect/expect.txt
.github/actions/spelling/expect/web.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect/bbe8275f69377c8be14f430cf7a21bc5eada7ace.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
(cat '.github/actions/spelling/excludes.txt' - <<EOF
$should_exclude_patterns
EOF
) |grep .|
sort -f |
uniq > '.github/actions/spelling/excludes.txt.temp' &&
mv '.github/actions/spelling/excludes.txt.temp' '.github/actions/spelling/excludes.txt'
}

comment_json=$(mktemp)
curl -L -s -S \
  --header "Content-Type: application/json" \
  "https://api.github.com/repos/microsoft/terminal/issues/comments/841440426" > "$comment_json"
comment_body=$(mktemp)
jq -r .body < "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
  

patch_add=$(perl -e '$/=undef;
$_=<>;
s{<details>.*}{}s;
s{^#.*}{};
s{\n##.*}{};
s{(?:^|\n)\s*\*}{}g;
s{\s+}{ }g;
print' < "$comment_body")
  

should_exclude_patterns=$(perl -e '$/=undef;
$_=<>;
exit unless s{(?:You should consider excluding directory paths|You should consider adding them to).*}{}s;
s{.*These sample patterns would exclude them:}{}s;
s{.*\`\`\`([^`]*)\`\`\`.*}{$1}m;
print' < "$comment_body" | grep . || true)

update_files
rm $comment_body
git add -u

@lhecker lhecker force-pushed the dev/miniksa/rle branch from 8f50050 to 24f453b Compare May 14, 2021 19:53
@github-actions
Copy link

github-actions bot commented May 14, 2021

@check-spelling-bot Report

Unrecognized words, please review:

  • accidentially
Previously acknowledged words that are now absent AAAAABBBBBBBCCC AAAAABCCCCCCCCC AAAAADCCCCCCCCC cang memcopy mybase rgch
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the [email protected]:microsoft/terminal.git repository
on the dev/miniksa/rle branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect/alphabet.txt
.github/actions/spelling/expect/expect.txt
.github/actions/spelling/expect/web.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect/bbe8275f69377c8be14f430cf7a21bc5eada7ace.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
  --header "Content-Type: application/json" \
  "https://api.github.com/repos/microsoft/terminal/issues/comments/841469442" > "$comment_json"
comment_body=$(mktemp)
jq -r .body < "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
  

patch_add=$(perl -e '$/=undef;
$_=<>;
s{<details>.*}{}s;
s{^#.*}{};
s{\n##.*}{};
s{(?:^|\n)\s*\*}{}g;
s{\s+}{ }g;
print' < "$comment_body")
  
update_files
rm $comment_body
git add -u
✏️ Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

⚠️ The command is written for posix shells. You can copy the contents of each perl command excluding the outer ' marks and dropping any '"/"' quotation mark pairs into a file and then run perl file.pl from the root of the repository to run the code. Alternatively, you can manually insert the items...

If the listed items are:

  • ... misspelled, then please correct them instead of using the command.
  • ... names, please add them to .github/actions/spelling/allow/names.txt.
  • ... APIs, you can add them to a file in .github/actions/spelling/allow/.
  • ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
  • ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.

See the README.md in each directory for more information.

🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉

🗜️ If you see a bunch of garbage

If it relates to a ...

well-formed pattern

See if there's a pattern that would match it.

If not, try writing one and adding it to a patterns/{file}.txt.

Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

Note that patterns can't match multiline strings.

binary-ish string

Please add a file path to the excludes.txt file instead of just accepting the garbage.

File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

@lhecker lhecker force-pushed the dev/miniksa/rle branch from a41297d to 3f1355d Compare May 14, 2021 20:57
@github-actions
Copy link

github-actions bot commented May 14, 2021

@check-spelling-bot Report

Unrecognized words, please review:

  • accidentially
Previously acknowledged words that are now absent AAAAABBBBBBBCCC AAAAABCCCCCCCCC AAAAADCCCCCCCCC cang memcopy mybase rgch
To accept these unrecognized words as correct (and remove the previously acknowledged and now absent words), run the following commands

... in a clone of the [email protected]:microsoft/terminal.git repository
on the dev/miniksa/rle branch:

update_files() {
perl -e '
my @expect_files=qw('".github/actions/spelling/expect/alphabet.txt
.github/actions/spelling/expect/expect.txt
.github/actions/spelling/expect/web.txt"');
@ARGV=@expect_files;
my @stale=qw('"$patch_remove"');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect/bbe8275f69377c8be14f430cf7a21bc5eada7ace.txt";
use File::Path qw(make_path);
use File::Basename qw(dirname);
make_path (dirname($new_expect_file));
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"$patch_add"');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a)."-".$a cmp lc($b)."-".$b} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;
system("git", "add", $new_expect_file);
'
}

comment_json=$(mktemp)
curl -L -s -S \
  --header "Content-Type: application/json" \
  "https://api.github.com/repos/microsoft/terminal/issues/comments/841501032" > "$comment_json"
comment_body=$(mktemp)
jq -r .body < "$comment_json" > $comment_body
rm $comment_json

patch_remove=$(perl -ne 'next unless s{^</summary>(.*)</details>$}{$1}; print' < "$comment_body")
  

patch_add=$(perl -e '$/=undef;
$_=<>;
s{<details>.*}{}s;
s{^#.*}{};
s{\n##.*}{};
s{(?:^|\n)\s*\*}{}g;
s{\s+}{ }g;
print' < "$comment_body")
  
update_files
rm $comment_body
git add -u
✏️ Contributor please read this

By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.

⚠️ The command is written for posix shells. You can copy the contents of each perl command excluding the outer ' marks and dropping any '"/"' quotation mark pairs into a file and then run perl file.pl from the root of the repository to run the code. Alternatively, you can manually insert the items...

If the listed items are:

  • ... misspelled, then please correct them instead of using the command.
  • ... names, please add them to .github/actions/spelling/allow/names.txt.
  • ... APIs, you can add them to a file in .github/actions/spelling/allow/.
  • ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
  • ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.

See the README.md in each directory for more information.

🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉

🗜️ If you see a bunch of garbage

If it relates to a ...

well-formed pattern

See if there's a pattern that would match it.

If not, try writing one and adding it to a patterns/{file}.txt.

Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

Note that patterns can't match multiline strings.

binary-ish string

Please add a file path to the excludes.txt file instead of just accepting the garbage.

File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

@miniksa miniksa closed this May 15, 2021
@lhecker lhecker deleted the dev/miniksa/rle branch August 1, 2021 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-CodeHealth Issues related to code cleanliness, linting, rules, warnings, errors, static analysis, etc. Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abstract AttrRow as til::rle<T>
5 participants