Add auto language direction in reader #9664

yurynix · 2016-11-28T14:03:06Z

This pull request addresses the issue of people using calypso in LTR interface ( English )
and reading blogs in RTL language ( such as Hebrew )

Before	After

Testing instructions

Run git checkout add/auto-direction and start your server, or open a live branch
Open the Home page
Visit reader's following stream and assert you see the posts in the correct directionality
Visit reader's full post view and assert you see the post and comments in the correct directionality
Start typing in the comments textarea, it should change directionality according to what you type ( RTL / LTR lang )

Reviews

Code
Product

@Automattic/sdev-feed
/cc @yoavf @bgbg @ranh @danieldanilov

matticbot · 2016-11-28T14:03:07Z

Test live: https://calypso.live/?branch=add/auto-direction

blowery · 2016-11-28T14:18:18Z

client/components/auto-direction/index.jsx

+	return null;
+};
+
+// Copied from: https://github.com/twitter/RTLtextarea/blob/master/src/RTLText.module.js#L46


you'll need to add https://github.com/twitter/RTLtextarea/blob/master/LICENSE to https://github.com/Automattic/wp-calypso/blob/master/CREDITS.md

blowery · 2016-11-28T14:19:16Z

This is really cool! How does it impact rendering perf?

blowery · 2016-11-28T14:21:10Z

client/components/auto-direction/index.jsx

+ * @param {Object.children} props must contain some children
+ * @returns {ReactElement}
+ */
+const AutoDirection = ( { children } ) => {


Is there any way to implement a meaningful shouldComponentUpdate for AutoDirection? Seems like a lot of work to repeat every time a child updates.

I don't think so, because every time a child updates we need to consider whether it has RTL or not ( consider a textarea for example ).

I'm open to suggestions though =)

blowery · 2016-11-28T14:22:59Z

client/components/auto-direction/index.jsx

+ */
+const getTextMainDirection = ( text ) => {
+	let rtlCount = 0;
+	for ( let i = 0; i < text.length; i++ ) {


we're running a regexp match for every character in the content? So for a 100,000 character post body, we'd run 100,000 regexp matches?

Yep.
We can avoid the regex here, but it'll be less readable, i'll create a benchmark to check out if it worth it, thanks!

it looks like twitter is running the regexp against the entire text, then checking the length of the returned array. We could do something like that, but cap the text length at, say, 1000 characters?

I'm thinking about capping because we likely don't need to check the entire text to get a significant sample, and there's no good reason to build up a matches array whose size matches the entirety of the body content.

That's a great suggestion, I've implemented a cap and refactored the function to not use regex.

yurynix · 2016-11-29T08:06:56Z

This is really cool! How does it impact rendering perf?
For components that don't update much like static texts, imo it doesn't have much overhead, unless you put it on a component with a lot of children ( which shouldn't be done ).

For components that update a lot like textarea it does add some overhead, but improves user experience imo =)

BTW, FB has something similar at their status input field, check it out =)

drewblaisdell

This is great. It worked well in my tests. I had a couple questions and a minor note about function naming.

As far as I can tell, there is a ~20% performance impact when going from a list of posts in the Reader to a full page post. We can probably get that number down quite a bit by decreasing RTL_THRESHOLD (if that doesn't impact UX too much). Someone with more profiling experience in DevTools might do a better job determining the performance of this.

drewblaisdell · 2016-11-29T08:58:08Z

client/components/auto-direction/index.jsx

+		// Strip tags because we're only interested in the text, not markup
+		// copied from: http://stackoverflow.com/questions/5002111/javascript-how-to-strip-html-tags-from-string#answer-5002161
+		return props.dangerouslySetInnerHTML.__html
+			? props.dangerouslySetInnerHTML.__html.replace( /<\/?[^>]+(>|$)/g, '' )


Can we use stripHTML instead?

drewblaisdell · 2016-11-29T08:58:24Z

client/components/auto-direction/index.jsx

+];
+
+const RTL_THRESHOLD = 0.5;
+const MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000;


For my own understanding: where did this number come from? It seems like half or even less would be sufficient, and it has a direct impact on performance.

That's just what @blowery suggested, but I guess we can get it safely to something like 100, I'll do that now.

drewblaisdell · 2016-11-29T09:01:18Z

client/components/auto-direction/index.jsx

+const RTL_THRESHOLD = 0.5;
+const MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000;
+
+const isRTLChar = ( char ) => {


Super minor: in general, the whole word seems to be used instead of char:

wp-calypso (add/auto-direction) ☯ spot -i client/ character | wc -l 262

yurynix · 2016-11-29T09:47:19Z

I've measured with MAX_LENGTH_OF_TEXT_TO_EXAMINE = 100 using:

window.YuryTime = 0;
const PerformanceMonitorHOC = ( Component ) => {
		return ( props ) => {
			const t0 = performance.now();
			const result = Component( props );
			const t1 = performance.now();

			window.YuryTime += t1 - t0;
			return result;
		};
};
export default PerformanceMonitorHOC( AutoDirection );

After the stream was loaded:
window.YuryTime === 8.919999999992797

with MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000, the result was 12.71999999999025

I've measured with the debug version running locally.

gziolo · 2016-11-29T12:08:23Z

Cool hack day idea ;)

drewblaisdell · 2016-11-30T03:06:39Z

@yurynix Thanks for benchmarking that. :)

If you're cool with the commit I added (feel free to squash it), the code LGTM.

ranh · 2016-12-01T15:23:31Z

Great work @yurynix! This would be a big win for the Reader. Product LGTM to me, other than one small issue.

It looks like neutral characters are always treated as LTR. This means that an RTL-only text can be identified as LTR even if it contains no LTR characters.

To reproduce, just type a few characters in Hebrew or another RTL script into a comment field, and start adding spaces.

You can probably just ignore any character that doesn't have "strong" directionality.

Regardless of this specific issue, direction switching in the comment field surprised me a bit. When writing a short comment with mixed directionality, the writing direction can switch rapidly back and forth between LTR and RTL as the balance of the mix changes. This can get confusing and disorienting.

For example:

It looks like this could be further tweaked/optimized in a later revision. I wonder though how much of this is in fact related to the neutral character issue.

I tried this in a few other places and noticed that it's a common pattern. Twitter and Google+ work pretty much the same way. Since this is relying on a Twitter library I guess that makes sense :)

Facebook seems to rely only on the first character to determine directionality, and it doesn't shift around while you type. However Facebook determines directionality separately for each paragraph, so it's not really comparable.

Don't have any answers here. Not sure I even have a question actually. Personally I'm not a fan of how this approach makes things jump around. But even with this side effect, it's still a huge improvement over the current situation.

blowery · 2016-12-01T15:52:56Z

rebased this to pick up recent changes on master, and added a06f6e0 which makes it deal with cases where there's only one child of AutoDirection and that child has no children.

blowery · 2016-12-01T15:55:36Z

client/components/auto-direction/index.jsx

+	if ( typeof props.dangerouslySetInnerHTML === 'object' ) {
+		// Strip tags because we're only interested in the text, not markup
+		return props.dangerouslySetInnerHTML.__html
+			? stripHTML( props.dangerouslySetInnerHTML.__html )


i'm ...minorly worried .. about this. we have some pretty big contents in the reader and copying the entire thing isn't going to be cheap. maybe we truncate it to our max examine * 2 before stripping?

Thank you for the suggestion 👍 , I'll take care of it next week

blowery · 2016-12-01T17:35:17Z

@ranh could you give this another shot? I added support for some basic neutrals (whitespace, 0-9, dash).

ranh · 2016-12-04T12:41:57Z

@blowery looks much better now. Ignoring spaces solves the biggest problem I think.

But the same issues would apply to all punctuation marks, symbols, emoji, etc.

yurynix · 2016-12-04T20:21:46Z

I've updated the code to use a different approach, to look at a strong directionality characters according to unicode spec.

I've generated the characters ranges with this code:

const fetch = require( 'node-fetch' );
const MultiRange = require('multi-integer-range').MultiRange;

const UNICODE_DATA_URL = 'http://www.unicode.org/Public/UNIDATA/UnicodeData.txt';
const CATEGORIES_OF_INTEREST = [ 'L', 'R', 'AL' ];
const bidi_categories = {};


fetch( UNICODE_DATA_URL )
    .then( ( res ) => res.text() )
    .then( body => body.split( '\n' ) )
    .then( ( charcterLines ) => charcterLines.map( ( line ) => {
        const lineParts = line.split( ';' );
        return {
            code: parseInt( lineParts[ 0 ], 16 ),
            name: lineParts[ 1 ],
            bidi_category: lineParts[ 4 ]
        };
    } ) )
    .then( ( charcters ) => {
        charcters.forEach( char => {
            if ( ! Array.isArray( bidi_categories[ char.bidi_category ] ) ) {
                bidi_categories[ char.bidi_category ] = [];
            }

            bidi_categories[ char.bidi_category ].push( char.code );
        } );

        Object.keys( bidi_categories )
        .filter( key => CATEGORIES_OF_INTEREST.indexOf( key ) > -1 )
        .forEach( key => {
            const range = new MultiRange( bidi_categories[ key ] );
            const ranges = range.getRanges();
            console.log( key, ':', bidi_categories[ key ].length, ranges.length );
            ranges.forEach( iteratedRange => console.log( `[ 0x${iteratedRange[ 0 ].toString( 16 )}, 0x${iteratedRange[ 1 ].toString( 16 )} ],` ) );
        } );
    } );

blowery · 2016-12-04T20:39:50Z

CREDITS.md

@@ -1428,3 +1428,14 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
 ```
+


do we still need this?

We don't, I'll remove it.

blowery · 2016-12-04T20:40:55Z

client/components/auto-direction/direction.js

+];
+
+const createLookUpMap = ( charactersRangeArray ) => {
+	const map = new Map();


Could just be a Set instead. We don't care about the value.

Added auto direction component to comments Dont fail on empty html in dangerouslySetInnerHTML Added AutoDirection to full post view Added AutoDirection component to reader post card Added AutoDirection component to post excerpt Wrote comments for AutoDirection Refactored RTL check to not use regex Also added a cap to avoid checking whole text Switched to availible stripHTML impl Adjusted length of text to examine Changed function name to conform with convention

truncate the html content before stripping it. cuts down on the amount of memory needed to run this check with large contents. add in support for neutral characters in the text call the isRTL function. Helps that.

Use strong directionality characters to determine direction according unicode specification: http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types Switched Map to Set - value doesn't matter

blowery · 2016-12-07T15:36:39Z

Squashed the commits by author, going to land this

blowery

🚢

yurynix · 2016-12-07T15:45:55Z

thank you @blowery !

yurynix added the [Status] In Progress label Nov 28, 2016

yurynix force-pushed the add/auto-direction branch from bb54516 to 6740c34 Compare November 28, 2016 14:04

yurynix added the [Feature] Reader The reader site on Calypso. label Nov 28, 2016

blowery reviewed Nov 28, 2016

View reviewed changes

yurynix added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. and removed [Status] In Progress labels Nov 29, 2016

drewblaisdell reviewed Nov 29, 2016

View reviewed changes

yurynix added [Status] Needs Author Reply and removed [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Dec 1, 2016

blowery force-pushed the add/auto-direction branch from 7e7efed to a06f6e0 Compare December 1, 2016 15:51

blowery reviewed Dec 1, 2016

View reviewed changes

blowery mentioned this pull request Dec 1, 2016

Reader: handle mixed RTL and LTR posts #268

Closed

yurynix added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. and removed [Status] Needs Author Reply labels Dec 4, 2016

blowery reviewed Dec 4, 2016

View reviewed changes

drewblaisdell added the [Status] Needs Author Reply label Dec 5, 2016

drewblaisdell removed the [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. label Dec 5, 2016

yurynix force-pushed the add/auto-direction branch from 9deaf29 to 8f0a228 Compare December 5, 2016 06:31

yurynix added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. and removed [Status] Needs Author Reply labels Dec 5, 2016

bluefuton added this to the Reader Refresh Pre-Launch Cleanup milestone Dec 6, 2016

Yury Michurin and others added 4 commits December 7, 2016 10:36

Reader: Fix stripHTML import

9cdd457

Deal with components that don't have children

d86f127

truncate the html content before stripping it. cuts down on the amount of memory needed to run this check with large contents. add in support for neutral characters in the text call the isRTL function. Helps that.

Use R+AL, L unicode bidi types to determine dir

09a0fc2

Use strong directionality characters to determine direction according unicode specification: http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types Switched Map to Set - value doesn't matter

blowery force-pushed the add/auto-direction branch from 8f0a228 to 09a0fc2 Compare December 7, 2016 15:36

blowery approved these changes Dec 7, 2016

View reviewed changes

blowery added [Status] Ready to Merge and removed [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. labels Dec 7, 2016

blowery merged commit 8da32e5 into master Dec 7, 2016

blowery deleted the add/auto-direction branch December 7, 2016 15:41

blowery removed the [Status] Ready to Merge label Dec 7, 2016

gziolo mentioned this pull request Oct 25, 2017

Add support for RTL scripts WordPress/gutenberg#138

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add auto language direction in reader #9664

Add auto language direction in reader #9664

yurynix commented Nov 28, 2016

matticbot commented Nov 28, 2016

blowery Nov 28, 2016

blowery commented Nov 28, 2016

blowery Nov 28, 2016

yurynix Nov 29, 2016

blowery Nov 28, 2016

yurynix Nov 28, 2016

blowery Nov 28, 2016

blowery Nov 28, 2016

yurynix Nov 29, 2016

yurynix commented Nov 29, 2016

drewblaisdell left a comment •

edited

Loading

drewblaisdell Nov 29, 2016

drewblaisdell Nov 29, 2016

yurynix Nov 29, 2016

drewblaisdell Nov 29, 2016

yurynix commented Nov 29, 2016

gziolo commented Nov 29, 2016

drewblaisdell commented Nov 30, 2016

ranh commented Dec 1, 2016

blowery commented Dec 1, 2016

blowery Dec 1, 2016

yurynix Dec 1, 2016

blowery commented Dec 1, 2016

ranh commented Dec 4, 2016

yurynix commented Dec 4, 2016

blowery Dec 4, 2016

yurynix Dec 5, 2016

blowery Dec 4, 2016

blowery commented Dec 7, 2016

blowery left a comment

yurynix commented Dec 7, 2016

Add auto language direction in reader #9664

Add auto language direction in reader #9664

Conversation

yurynix commented Nov 28, 2016

Testing instructions

Reviews

matticbot commented Nov 28, 2016

Choose a reason for hiding this comment

blowery commented Nov 28, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurynix commented Nov 29, 2016

drewblaisdell left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurynix commented Nov 29, 2016

gziolo commented Nov 29, 2016

drewblaisdell commented Nov 30, 2016

ranh commented Dec 1, 2016

blowery commented Dec 1, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blowery commented Dec 1, 2016

ranh commented Dec 4, 2016

yurynix commented Dec 4, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blowery commented Dec 7, 2016

blowery left a comment

Choose a reason for hiding this comment

yurynix commented Dec 7, 2016

drewblaisdell left a comment •

edited

Loading