Speed up how node names are processed on initialization #462

fedarko · 2020-12-11T20:26:48Z

Just making a record of this here so we don't forget it. (@kwcantrell and @ElDeveloper did all the work of finding this!!!)

Relates to the tree shown in this Twitter thread. Basically, when there are a ton of node names, this block of code:

empress/empress/support_files/js/empress.js, lines 463 to 472 in 1b7a8d3:

    var nodeNames = this._tree.getAllNames();
    // Don't include nodes with the name null (i.e. nodes without a
    // specified name in the Newick file) in the auto-complete.
    nodeNames = nodeNames.filter((n) => n !== null);
    // Sort node names case insensitively
    nodeNames.sort(function (a, b) {
        return a.localeCompare(b, "en", { sensitivity: "base" });
    });
    nodeNames = _.uniq(nodeNames);

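... slows things down a lot, because sorting n names calls the comparator O(n log n) times, and each of those calls is a relatively slow, locale-aware `localeCompare` string comparison.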
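For reference, here is a minimal sketch of one cheap mitigation, assuming the goal is to keep the exact same output. MDN recommends creating one `Intl.Collator` and reusing its `compare` function when sorting large arrays, rather than calling `localeCompare` with locale/options on every comparison. The sketch also drops exact duplicates with a native `Set` before sorting, so fewer strings reach the comparator. `prepareNodeNames` is a hypothetical helper, not part of EMPress.

```javascript
// Hypothetical helper (not part of EMPress): returns the non-null node
// names, deduplicated and sorted case-insensitively like the snippet above,
// but with fewer and cheaper comparisons.
function prepareNodeNames(allNames) {
    // Skip unnamed nodes (null) and drop exact duplicates before sorting.
    // A Set keeps the first occurrence of each name, in insertion order.
    const unique = Array.from(new Set(allNames.filter((n) => n !== null)));

    // Create the locale-aware comparator once and reuse it for every
    // comparison, instead of calling localeCompare with options each time.
    const collator = new Intl.Collator("en", { sensitivity: "base" });
    unique.sort(collator.compare);
    return unique;
}

// Example usage:
const names = prepareNodeNames(["b", null, "A", "a", "b", "C"]);
console.log(names); // ["A", "a", "b", "C"]
```

Because `Array.prototype.sort` is stable (ES2019+), names that compare equal under `sensitivity: "base"` (like "A" and "a") keep their first-occurrence order, which should match what the filter/sort/`_.uniq` pipeline produces. This still does O(n log n) comparisons; it only makes each one cheaper.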

There are many possible ways we could handle this (better algorithms / data structures in JS, preprocessing stuff in Python, etc.). It's probably most important right now just to implement something that fixes this, since this is a pretty significant bottleneck.
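As an illustration of the "data structures in JS" route only (not what any PR here necessarily does): if the names are mainly needed for exact-name lookups, a `Map` from name to nodes avoids sorting entirely. `buildNameIndex` is a hypothetical helper, and treating a name's position in the `getAllNames()` array as a node identifier is an assumption. Prefix-style autocomplete would still need something else, such as a sorted list or a trie.

```javascript
// Hypothetical alternative (not the EMPress implementation): index names
// once in a Map so an exact-name lookup is O(1) on average, with no sort
// and no binary search. Assumes a name's position in the input array can
// stand in for a node identifier.
function buildNameIndex(allNames) {
    const index = new Map();
    allNames.forEach((name, position) => {
        // Unnamed nodes (null) are not searchable, so skip them.
        if (name === null) {
            return;
        }
        // Several nodes may share a name, so store every position.
        if (!index.has(name)) {
            index.set(name, []);
        }
        index.get(name).push(position);
    });
    return index;
}

// Example usage:
const index = buildNameIndex(["a", null, "b", "a"]);
console.log(index.get("a")); // [0, 3]
console.log(index.has("c")); // false
```

Building the index is a single linear pass, so its cost grows with the number of nodes rather than with the number of string comparisons a sort would need.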

fedarko added the performance label Dec 11, 2020

fedarko mentioned this issue Dec 11, 2020 in Clean up arc approximations for big trees with polytomies #463 (Open)

kwcantrell mentioned this issue Dec 16, 2020 in remove sort/binarysearch #466 (Merged)

kwcantrell linked a pull request Dec 16, 2020 that will close this issue: remove sort/binarysearch #466 (Merged)

fedarko closed this as completed in #466 Jan 22, 2021
