Improve performance with large numbers of reference sequences by using MST volatiles #2060

Merged
merged 8 commits into from
Jun 22, 2021

Conversation

cmdcolin
Collaborator

This is a proposal to help speed up loading of large assemblies.

This demo has 200,000 contigs

https://s3.amazonaws.com/jbrowse.org/code/jb2/main/index.html?config=test_data%2Fconfig_many_contigs.json

Before this change, the demo can hang the page for as much as 10 seconds on a fast machine, even with a production build.

After this change, the load time is about 3 seconds. The remaining time now comes from refNameAliases rather than assembly.regions, so it can probably be reduced further.

This is an early proof-of-concept PR, but note that types.frozen values can also be given TypeScript types.

Possible help for #1847

@github-actions github-actions bot added the needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) label Jun 19, 2021
@cmdcolin cmdcolin force-pushed the types_frozen_large_assembly branch from 6f12a2a to d017965 Compare June 19, 2021 20:14
@cmdcolin
Collaborator Author

Ref for types.frozens https://mobx-state-tree.js.org/overview/types

@rbuels
Contributor

rbuels commented Jun 21, 2021 via email

@cmdcolin
Collaborator Author

Probably the only downside is that if the regions of the assembly object were mutated in place, the change would not be picked up, but you could reassign the regions object and it would work fine.

Example showing that reassigning the frozen value works fine:
https://codesandbox.io/s/adoring-flower-o1kxf?file=/src/App.js
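
The trade-off described above can also be sketched without MST. The `FrozenRegions` class below is a hypothetical, dependency-free stand-in, not MST's `types.frozen` implementation: it holds one shallowly frozen array, notifies subscribers only when the whole array is replaced, and rejects in-place mutation. The `Region` shape and all names are assumptions for illustration.

```typescript
// Hypothetical region shape; the real assembly region type may differ.
interface Region {
  refName: string
  start: number
  end: number
}

// Stand-in for a frozen slot: holds one shallowly frozen array and notifies
// listeners only when the whole reference is replaced.
class FrozenRegions {
  private value: readonly Region[]
  private listeners: Array<(v: readonly Region[]) => void> = []

  constructor(initial: Region[]) {
    this.value = Object.freeze(initial)
  }

  get(): readonly Region[] {
    return this.value
  }

  // Reassignment: swaps the reference and notifies subscribers.
  set(next: Region[]): void {
    this.value = Object.freeze(next)
    for (const fn of this.listeners) fn(this.value)
  }

  subscribe(fn: (v: readonly Region[]) => void): void {
    this.listeners.push(fn)
  }
}

const box = new FrozenRegions([{ refName: 'ctgA', start: 0, end: 100 }])
let notifications = 0
box.subscribe(() => {
  notifications += 1
})

// In-place mutation is not picked up: the array is frozen, so push throws
// a TypeError and no listener runs.
let mutationRejected = false
try {
  ;(box.get() as Region[]).push({ refName: 'ctgB', start: 0, end: 50 })
} catch {
  mutationRejected = true
}

// Reassigning the whole array works and is observed by subscribers.
box.set([...box.get(), { refName: 'ctgB', start: 0, end: 50 }])
```

In this sketch the only supported way to change the data is to build a new array and hand it over, which matches the "reassign the regions object" workaround.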

There are also some places in the code that currently assume they have to call getSnapshot(assembly.regions) to convert the regions to plain objects. That call would now throw an error saying assembly.regions is not an MST node, but those call sites can be fixed.

@cmdcolin
Collaborator Author

Note that we generally don't reassign an assembly's regions object anyway; I just wanted to show that example.

@cmdcolin
Collaborator Author

In general, types.frozen seems like a good option for storing "big data" objects in the MST tree that we don't really edit.

@cmdcolin
Collaborator Author

Alternatively, it could be stored in a volatile, since we may not rely on serializing these into snapshots.

@rbuels rbuels added enhancement New feature or request and removed needs label triage Needs a label to show in changelog (breaking, enhancement, bug, documentation, or internal) labels Jun 21, 2021
@rbuels rbuels changed the title Use types.frozen to help load large assemblies Improve performance with large numbers of reference sequences by using types.frozen Jun 21, 2021
@rbuels
Contributor

rbuels commented Jun 21, 2021

Sounds reasonable to me, any reason it's still marked as draft?

Is there any live editing of the contents of the frozen that we can think of?

@rbuels rbuels requested a review from garrettjstevens June 21, 2021 17:22
@cmdcolin cmdcolin force-pushed the types_frozen_large_assembly branch from d017965 to e20b7fe Compare June 21, 2021 17:46
@cmdcolin
Collaborator Author

It needed a couple more fixes to stop using getSnapshot on the assembly region objects, but it should be ready now.

@cmdcolin cmdcolin force-pushed the types_frozen_large_assembly branch from 97df17c to 49094a7 Compare June 21, 2021 18:22
@cmdcolin cmdcolin marked this pull request as ready for review June 21, 2021 18:22
@cmdcolin cmdcolin marked this pull request as draft June 21, 2021 18:34
@codecov

codecov bot commented Jun 21, 2021

Codecov Report

Merging #2060 (8ed5e00) into main (f06cc31) will decrease coverage by 0.04%.
The diff coverage is 86.66%.


@@            Coverage Diff             @@
##             main    #2060      +/-   ##
==========================================
- Coverage   61.30%   61.26%   -0.05%     
==========================================
  Files         480      480              
  Lines       22981    22956      -25     
  Branches     5271     5266       -5     
==========================================
- Hits        14088    14063      -25     
  Misses       8618     8618              
  Partials      275      275              
Impacted Files Coverage Δ
...inearGenomeView/components/SearchResultsDialog.tsx 43.75% <0.00%> (ø)
packages/core/assemblyManager/assembly.ts 86.66% <80.00%> (+0.09%) ⬆️
...r-view/src/CircularView/components/CircularView.js 90.90% <100.00%> (ø)
plugins/dotplot-view/src/DotplotView/model.ts 62.50% <100.00%> (ø)
...me-view/src/LinearGenomeView/components/Header.tsx 84.00% <100.00%> (ø)
...iew/src/LinearGenomeView/components/ImportForm.tsx 61.33% <100.00%> (ø)
.../linear-genome-view/src/LinearGenomeView/index.tsx 84.27% <100.00%> (+0.22%) ⬆️
...ctor/src/SvInspectorView/models/SvInspectorView.js 69.85% <100.00%> (+0.06%) ⬆️
...nts/src/SNPCoverageRenderer/SNPCoverageRenderer.ts 80.24% <0.00%> (-7.41%) ⬇️
packages/core/util/index.ts 78.30% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

cmdcolin added 2 commits June 21, 2021 15:00
Return type of exported function has or is using name '$stateTreeNodeType'
@cmdcolin cmdcolin force-pushed the types_frozen_large_assembly branch from f7d01ee to 8ed5e00 Compare June 21, 2021 19:16
@cmdcolin
Collaborator Author

I changed this to use volatiles instead of types.frozen. It would still work with types.frozen, but I think using volatiles here helps the "clarity of thought" for the app. We don't expect or allow serializing the assembly regions, so they don't really need to be part of the app model.
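
The distinction drawn here, that regions live on the model but are not part of what gets serialized, can be sketched without MST. `AssemblySketch` and its fields are hypothetical names, not the code in this PR: `snapshot()` returns only the persisted configuration, while `regions` is runtime-only state that is filled in after loading and stored by reference.

```typescript
// Hypothetical region shape; the real assembly region type may differ.
interface Region {
  refName: string
  start: number
  end: number
}

// Serializable part of the model: small, and safe to save in a session.
interface AssemblySnapshot {
  name: string
}

class AssemblySketch {
  // Runtime-only ("volatile") state: never written into a snapshot.
  regions: readonly Region[] | undefined = undefined

  constructor(readonly name: string) {}

  // Stores the loaded array by reference; no per-region wrapping or copying.
  setRegions(regions: Region[]): void {
    this.regions = regions
  }

  // Only the persisted fields are included here.
  snapshot(): AssemblySnapshot {
    return { name: this.name }
  }
}

// Build a large contig list, the same size as the 200,000-contig demo.
const contigs: Region[] = Array.from({ length: 200000 }, (_, i) => ({
  refName: `ctg${i}`,
  start: 0,
  end: 1000,
}))

const assembly = new AssemblySketch('many_contigs')
assembly.setRegions(contigs)

// The serialized form stays tiny no matter how many regions are loaded.
const serialized = JSON.stringify(assembly.snapshot())
```

Keeping the array out of the snapshot means its size has no effect on what gets serialized, which is the property the volatile approach relies on.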

@cmdcolin cmdcolin changed the title Improve performance with large numbers of reference sequences by using types.frozen Improve performance with large numbers of reference sequences by using volatiles instead of MST state Jun 21, 2021
@cmdcolin cmdcolin marked this pull request as ready for review June 21, 2021 19:25
@cmdcolin cmdcolin changed the title Improve performance with large numbers of reference sequences by using volatiles instead of MST state Improve performance with large numbers of reference sequences by using volatiles Jun 21, 2021
@cmdcolin cmdcolin closed this Jun 22, 2021
@cmdcolin cmdcolin reopened this Jun 22, 2021
@rbuels rbuels changed the title Improve performance with large numbers of reference sequences by using volatiles Improve performance with large numbers of reference sequences by using MST volatiles Jun 22, 2021
@rbuels rbuels merged commit bca05f0 into main Jun 22, 2021
@rbuels rbuels deleted the types_frozen_large_assembly branch June 22, 2021 16:28
Labels
enhancement New feature or request
3 participants