-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add auto language direction in reader #9664
Conversation
bb54516
to
6740c34
Compare
return null; | ||
}; | ||
|
||
// Copied from: https://github.com/twitter/RTLtextarea/blob/master/src/RTLText.module.js#L46 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really cool! How does it impact rendering perf? |
* @param {Object.children} props must contain some children | ||
* @returns {ReactElement} | ||
*/ | ||
const AutoDirection = ( { children } ) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to implement a meaningful shouldComponentUpdate
for AutoDirection
? Seems like a lot of work to repeat every time a child updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because every time a child updates we need to consider whether it has RTL or not ( consider a textarea
for example ).
I'm open to suggestions though =)
*/ | ||
const getTextMainDirection = ( text ) => { | ||
let rtlCount = 0; | ||
for ( let i = 0; i < text.length; i++ ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're running a regexp match for every character in the content? So for a 100,000 character post body, we'd run 100,000 regexp matches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
We can avoid the regex here, but it'll be less readable, i'll create a benchmark to check out if it worth it, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like twitter is running the regexp against the entire text, then checking the length of the returned array. We could do something like that, but cap the text length at, say, 1000 characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about capping because we likely don't need to check the entire text to get a significant sample, and there's no good reason to build up a matches array whose size matches the entirety of the body content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion, I've implemented a cap and refactored the function to not use regex.
For components that update a lot like BTW, FB has something similar at their status input field, check it out =) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. It worked well in my tests. I had a couple questions and a minor note about function naming.
As far as I can tell, there is a ~20% performance impact when going from a list of posts in the Reader to a full page post. We can probably get that number down quite a bit by decreasing RTL_THRESHOLD
(if that doesn't impact UX too much). Someone with more profiling experience in DevTools might do a better job determining the performance of this.
// Strip tags because we're only interested in the text, not markup | ||
// copied from: http://stackoverflow.com/questions/5002111/javascript-how-to-strip-html-tags-from-string#answer-5002161 | ||
return props.dangerouslySetInnerHTML.__html | ||
? props.dangerouslySetInnerHTML.__html.replace( /<\/?[^>]+(>|$)/g, '' ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use stripHTML
instead?
]; | ||
|
||
const RTL_THRESHOLD = 0.5; | ||
const MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own understanding: where did this number come from? It seems like half or even less would be sufficient, and it has a direct impact on performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just what @blowery suggested, but I guess we can get it safely to something like 100, I'll do that now.
const RTL_THRESHOLD = 0.5; | ||
const MAX_LENGTH_OF_TEXT_TO_EXAMINE = 1000; | ||
|
||
const isRTLChar = ( char ) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor: in general, the whole word seems to be used instead of char
:
wp-calypso (add/auto-direction) ☯ spot -i client/ character | wc -l
262
I've measured with window.YuryTime = 0;
const PerformanceMonitorHOC = ( Component ) => {
return ( props ) => {
const t0 = performance.now();
const result = Component( props );
const t1 = performance.now();
window.YuryTime += t1 - t0;
return result;
};
};
export default PerformanceMonitorHOC( AutoDirection ); After the stream was loaded: with I've measured with the debug version running locally. |
Cool hack day idea ;) |
@yurynix Thanks for benchmarking that. :) If you're cool with the commit I added (feel free to squash it), the code LGTM. |
Great work @yurynix! This would be a big win for the Reader. Product LGTM to me, other than one small issue. It looks like neutral characters are always treated as LTR. This means that an RTL-only text can be identified as LTR even if it contains no LTR characters. To reproduce, just type a few characters in Hebrew or another RTL script into a comment field, and start adding spaces. You can probably just ignore any character that doesn't have "strong" directionality. Regardless of this specific issue, direction switching in the comment field surprised me a bit. When writing a short comment with mixed directionality, the writing direction can switch rapidly back and forth between LTR and RTL as the balance of the mix changes. This can get confusing and disorienting. For example: It looks like this could be further tweaked/optimized in a later revision. I wonder though how much of this is in fact related to the neutral character issue. I tried this in a few other places and noticed that it's a common pattern. Twitter and Google+ work pretty much the same way. Since this is relying on a Twitter library I guess that makes sense :) Facebook seems to rely only on the first character to determine directionality, and it doesn't shift around while you type. However Facebook determines directionality separately for each paragraph, so it's not really comparable. Don't have any answers here. Not sure I even have a question actually. Personally I'm not a fan of how this approach makes things jump around. But even with this side effect, it's still a huge improvement over the current situation. |
7e7efed
to
a06f6e0
Compare
rebased this to pick up recent changes on master, and added a06f6e0 which makes it deal with cases where there's only one child of |
if ( typeof props.dangerouslySetInnerHTML === 'object' ) { | ||
// Strip tags because we're only interested in the text, not markup | ||
return props.dangerouslySetInnerHTML.__html | ||
? stripHTML( props.dangerouslySetInnerHTML.__html ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm ...minorly worried .. about this. we have some pretty big contents in the reader and copying the entire thing isn't going to be cheap. maybe we truncate it to our max examine * 2 before stripping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion 👍 , I'll take care of it next week
@ranh could you give this another shot? I added support for some basic neutrals (whitespace, 0-9, dash). |
@blowery looks much better now. Ignoring spaces solves the biggest problem I think. But the same issues would apply to all punctuation marks, symbols, emoji, etc. |
I've updated the code to use a different approach, to look at a strong directionality characters according to unicode spec. I've generated the characters ranges with this code: const fetch = require( 'node-fetch' );
const MultiRange = require('multi-integer-range').MultiRange;
const UNICODE_DATA_URL = 'http://www.unicode.org/Public/UNIDATA/UnicodeData.txt';
const CATEGORIES_OF_INTEREST = [ 'L', 'R', 'AL' ];
const bidi_categories = {};
fetch( UNICODE_DATA_URL )
.then( ( res ) => res.text() )
.then( body => body.split( '\n' ) )
.then( ( charcterLines ) => charcterLines.map( ( line ) => {
const lineParts = line.split( ';' );
return {
code: parseInt( lineParts[ 0 ], 16 ),
name: lineParts[ 1 ],
bidi_category: lineParts[ 4 ]
};
} ) )
.then( ( charcters ) => {
charcters.forEach( char => {
if ( ! Array.isArray( bidi_categories[ char.bidi_category ] ) ) {
bidi_categories[ char.bidi_category ] = [];
}
bidi_categories[ char.bidi_category ].push( char.code );
} );
Object.keys( bidi_categories )
.filter( key => CATEGORIES_OF_INTEREST.indexOf( key ) > -1 )
.forEach( key => {
const range = new MultiRange( bidi_categories[ key ] );
const ranges = range.getRanges();
console.log( key, ':', bidi_categories[ key ].length, ranges.length );
ranges.forEach( iteratedRange => console.log( `[ 0x${iteratedRange[ 0 ].toString( 16 )}, 0x${iteratedRange[ 1 ].toString( 16 )} ],` ) );
} );
} ); |
@@ -1428,3 +1428,14 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | |||
THE SOFTWARE. | |||
``` | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we still need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't, I'll remove it.
]; | ||
|
||
const createLookUpMap = ( charactersRangeArray ) => { | ||
const map = new Map(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could just be a Set
instead. We don't care about the value.
9deaf29
to
8f0a228
Compare
Added auto direction component to comments Dont fail on empty html in dangerouslySetInnerHTML Added AutoDirection to full post view Added AutoDirection component to reader post card Added AutoDirection component to post excerpt Wrote comments for AutoDirection Refactored RTL check to not use regex Also added a cap to avoid checking whole text Switched to availible stripHTML impl Adjusted length of text to examine Changed function name to conform with convention
truncate the html content before stripping it. cuts down on the amount of memory needed to run this check with large contents. add in support for neutral characters in the text call the isRTL function. Helps that.
Use strong directionality characters to determine direction according unicode specification: http://www.unicode.org/reports/tr9/#Bidirectional_Character_Types Switched Map to Set - value doesn't matter
8f0a228
to
09a0fc2
Compare
Squashed the commits by author, going to land this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
thank you @blowery ! |
This pull request addresses the issue of people using calypso in LTR interface ( English )
and reading blogs in RTL language ( such as Hebrew )
Testing instructions
git checkout add/auto-direction
and start your server, or open a live branchHome
pageReviews
@Automattic/sdev-feed
/cc @yoavf @bgbg @ranh @danieldanilov