Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auto language direction in reader #9664

Merged
merged 4 commits into from
Dec 7, 2016
Merged

Add auto language direction in reader #9664

merged 4 commits into from
Dec 7, 2016

Conversation

yurynix
Copy link
Contributor

@yurynix yurynix commented Nov 28, 2016

This pull request addresses the issue of people using calypso in LTR interface ( English )
and reading blogs in RTL language ( such as Hebrew )

Before After
screen shot 2016-11-28 at 15 52 52 screen shot 2016-11-28 at 15 18 36
screen shot 2016-11-28 at 15 57 39 screen shot 2016-11-28 at 15 26 01
screen shot 2016-11-28 at 15 59 53 screen shot 2016-11-28 at 16 00 10

Testing instructions

  1. Run git checkout add/auto-direction and start your server, or open a live branch
  2. Open the Home page
  3. Visit reader's following stream and assert you see the posts in the correct directionality
  4. Visit reader's full post view and assert you see the post and comments in the correct directionality
  5. Start typing in the comments textarea, it should change directionality according to what you type ( RTL / LTR lang )

Reviews

  • Code
  • Product

@Automattic/sdev-feed
/cc @yoavf @bgbg @ranh @danieldanilov

@matticbot
Copy link
Contributor

@yurynix yurynix added the [Feature] Reader The reader site on Calypso. label Nov 28, 2016
return null;
};

// Copied from: https://github.com/twitter/RTLtextarea/blob/master/src/RTLText.module.js#L46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blowery
Copy link
Contributor

blowery commented Nov 28, 2016

This is really cool! How does it impact rendering perf?

* @param {Object.children} props must contain some children
* @returns {ReactElement}
*/
const AutoDirection = ( { children } ) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to implement a meaningful shouldComponentUpdate for AutoDirection? Seems like a lot of work to repeat every time a child updates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, because every time a child updates we need to consider whether it has RTL or not ( consider a textarea for example ).

I'm open to suggestions though =)

*/
const getTextMainDirection = ( text ) => {
let rtlCount = 0;
for ( let i = 0; i < text.length; i++ ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're running a regexp match for every character in the content? So for a 100,000 character post body, we'd run 100,000 regexp matches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.
We can avoid the regex here, but it'll be less readable, i'll create a benchmark to check out if it worth it, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like twitter is running the regexp against the entire text, then checking the length of the returned array. We could do something like that, but cap the text length at, say, 1000 characters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about capping because we likely don't need to check the entire text to get a significant sample, and there's no good reason to build up a matches array whose size matches the entirety of the body content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great suggestion, I've implemented a cap and refactored the function to not use regex.

@yurynix
Copy link
Contributor Author

yurynix commented Nov 29, 2016

This is really cool! How does it impact rendering perf?
For components that don't update much like static texts, imo it doesn't have much overhead, unless you put it on a component with a lot of children ( which shouldn't be done ).

For components that update a lot like textarea it does add some overhead, but improves user experience imo =)

BTW, FB has something similar at their status input field, check it out =)

@yurynix yurynix added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. and removed [Status] In Progress labels Nov 29, 2016
Copy link
Contributor

@drewblaisdell drewblaisdell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. It worked well in my tests. I had a couple questions and a minor note about function naming.

As far as I can tell, there is a ~20% performance impact when going from a list of posts in the Reader to a full page post. We can probably get that number down quite a bit by decreasing RTL_THRESHOLD (if that doesn't impact UX too much). Someone with more profiling experience in DevTools might do a better job determining the performance of this.

// Strip tags because we're only interested in the text, not markup
// copied from: http://stackoverflow.com/questions/5002111/javascript-how-to-strip-html-tags-from-string#answer-5002161
return props.dangerouslySetInnerHTML.__html
? props.dangerouslySetInnerHTML.__html.replace( /<\/?[^>]+(>|$)/g, '' )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use stripHTML instead?

];

const RTL_THRESHOLD = 0.5;
const MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding: where did this number come from? It seems like half or even less would be sufficient, and it has a direct impact on performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just what @blowery suggested, but I guess we can get it safely to something like 100, I'll do that now.

const RTL_THRESHOLD = 0.5;
const MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000;

const isRTLChar = ( char ) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor: in general, the whole word seems to be used instead of char:

wp-calypso (add/auto-direction) ☯ spot -i client/ character | wc -l
     262

@yurynix
Copy link
Contributor Author

yurynix commented Nov 29, 2016

I've measured with MAX_LENGTH_OF_TEXT_TO_EXAMINE = 100 using:

window.YuryTime = 0;
const PerformanceMonitorHOC = ( Component ) => {
		return ( props ) => {
			const t0 = performance.now();
			const result = Component( props );
			const t1 = performance.now();

			window.YuryTime += t1 - t0;
			return result;
		};
};
export default PerformanceMonitorHOC( AutoDirection );

After the stream was loaded:
window.YuryTime === 8.919999999992797

with MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000, the result was 12.71999999999025

I've measured with the debug version running locally.

@gziolo
Copy link
Member

gziolo commented Nov 29, 2016

Cool hack day idea ;)

@drewblaisdell
Copy link
Contributor

@yurynix Thanks for benchmarking that. :)

If you're cool with the commit I added (feel free to squash it), the code LGTM.

@ranh
Copy link
Contributor

ranh commented Dec 1, 2016

Great work @yurynix! This would be a big win for the Reader. Product LGTM to me, other than one small issue.

It looks like neutral characters are always treated as LTR. This means that an RTL-only text can be identified as LTR even if it contains no LTR characters.

To reproduce, just type a few characters in Hebrew or another RTL script into a comment field, and start adding spaces.

You can probably just ignore any character that doesn't have "strong" directionality.


Regardless of this specific issue, direction switching in the comment field surprised me a bit. When writing a short comment with mixed directionality, the writing direction can switch rapidly back and forth between LTR and RTL as the balance of the mix changes. This can get confusing and disorienting.

For example:

bidi-input-reader-2

bidi-input-reader

It looks like this could be further tweaked/optimized in a later revision. I wonder though how much of this is in fact related to the neutral character issue.

I tried this in a few other places and noticed that it's a common pattern. Twitter and Google+ work pretty much the same way. Since this is relying on a Twitter library I guess that makes sense :)

Facebook seems to rely only on the first character to determine directionality, and it doesn't shift around while you type. However Facebook determines directionality separately for each paragraph, so it's not really comparable.

Don't have any answers here. Not sure I even have a question actually. Personally I'm not a fan of how this approach makes things jump around. But even with this side effect, it's still a huge improvement over the current situation.

@yurynix yurynix added [Status] Needs Author Reply and removed [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Dec 1, 2016
@blowery blowery force-pushed the add/auto-direction branch from 7e7efed to a06f6e0 Compare December 1, 2016 15:51
@blowery
Copy link
Contributor

blowery commented Dec 1, 2016

rebased this to pick up recent changes on master, and added a06f6e0 which makes it deal with cases where there's only one child of AutoDirection and that child has no children.

if ( typeof props.dangerouslySetInnerHTML === 'object' ) {
// Strip tags because we're only interested in the text, not markup
return props.dangerouslySetInnerHTML.__html
? stripHTML( props.dangerouslySetInnerHTML.__html )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm ...minorly worried .. about this. we have some pretty big contents in the reader and copying the entire thing isn't going to be cheap. maybe we truncate it to our max examine * 2 before stripping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion 👍 , I'll take care of it next week

@blowery
Copy link
Contributor

blowery commented Dec 1, 2016

@ranh could you give this another shot? I added support for some basic neutrals (whitespace, 0-9, dash).

@ranh
Copy link
Contributor

ranh commented Dec 4, 2016

@blowery looks much better now. Ignoring spaces solves the biggest problem I think.

But the same issues would apply to all punctuation marks, symbols, emoji, etc.

@yurynix
Copy link
Contributor Author

yurynix commented Dec 4, 2016

I've updated the code to use a different approach, to look at a strong directionality characters according to unicode spec.

I've generated the characters ranges with this code:

const fetch = require( 'node-fetch' );
const MultiRange = require('multi-integer-range').MultiRange;

const UNICODE_DATA_URL = 'http://www.unicode.org/Public/UNIDATA/UnicodeData.txt';
const CATEGORIES_OF_INTEREST = [ 'L', 'R', 'AL' ];
const bidi_categories = {};


fetch( UNICODE_DATA_URL )
    .then( ( res ) => res.text() )
    .then( body => body.split( '\n' ) )
    .then( ( charcterLines ) => charcterLines.map( ( line ) => {
        const lineParts = line.split( ';' );
        return {
            code: parseInt( lineParts[ 0 ], 16 ),
            name: lineParts[ 1 ],
            bidi_category: lineParts[ 4 ]
        };
    } ) )
    .then( ( charcters ) => {
        charcters.forEach( char => {
            if ( ! Array.isArray( bidi_categories[ char.bidi_category ] ) ) {
                bidi_categories[ char.bidi_category ] = [];
            }

            bidi_categories[ char.bidi_category ].push( char.code );
        } );

        Object.keys( bidi_categories )
        .filter( key => CATEGORIES_OF_INTEREST.indexOf( key ) > -1 )
        .forEach( key => {
            const range = new MultiRange( bidi_categories[ key ] );
            const ranges = range.getRanges();
            console.log( key, ':', bidi_categories[ key ].length, ranges.length );
            ranges.forEach( iteratedRange => console.log( `[ 0x${iteratedRange[ 0 ].toString( 16 )}, 0x${iteratedRange[ 1 ].toString( 16 )} ],` ) );
        } );
    } );

@yurynix yurynix added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. and removed [Status] Needs Author Reply labels Dec 4, 2016
@@ -1428,3 +1428,14 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, I'll remove it.

];

const createLookUpMap = ( charactersRangeArray ) => {
const map = new Map();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just be a Set instead. We don't care about the value.

@drewblaisdell drewblaisdell removed the [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. label Dec 5, 2016
@yurynix yurynix force-pushed the add/auto-direction branch from 9deaf29 to 8f0a228 Compare December 5, 2016 06:31
@yurynix yurynix added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. and removed [Status] Needs Author Reply labels Dec 5, 2016
@bluefuton bluefuton added this to the Reader Refresh Pre-Launch Cleanup milestone Dec 6, 2016
Yury Michurin and others added 4 commits December 7, 2016 10:36
Added auto direction component to comments

Dont fail on empty html in dangerouslySetInnerHTML

Added AutoDirection to full post view

Added AutoDirection component to reader post card

Added AutoDirection component to post excerpt

Wrote comments for AutoDirection

Refactored RTL check to not use regex

Also added a cap to avoid checking whole text

Switched to availible stripHTML impl

Adjusted length of text to examine

Changed function name to conform with convention
truncate the html content before stripping it. cuts down on the amount of memory needed to run this check with large contents.

add in support for neutral characters in the text

call the isRTL function. Helps that.
Use strong directionality characters to determine direction
according unicode specification:
http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types

Switched Map to Set - value doesn't matter
@blowery blowery force-pushed the add/auto-direction branch from 8f0a228 to 09a0fc2 Compare December 7, 2016 15:36
@blowery
Copy link
Contributor

blowery commented Dec 7, 2016

Squashed the commits by author, going to land this

Copy link
Contributor

@blowery blowery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@blowery blowery added [Status] Ready to Merge and removed [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Dec 7, 2016
@blowery blowery merged commit 8da32e5 into master Dec 7, 2016
@blowery blowery deleted the add/auto-direction branch December 7, 2016 15:41
@yurynix
Copy link
Contributor Author

yurynix commented Dec 7, 2016

thank you @blowery !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Reader The reader site on Calypso.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants