Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get only those block items which have their path to root #11095

Merged
merged 2 commits into from
Jan 22, 2016

Conversation

Qubad786
Copy link
Contributor

SUST-20 Improved

This PR is redo of edx/edx-platform#10994 which was reverted due to its implication on performance. We can use get_items with include_orphans set to False, this will help us getting only those items which are not orphans. Whenever we are not concerned about getting orphans, just don't pass include_orphans kwarg while calling get_items.

Performance Comparison:

With reverted PR , get_items was taking 386 milliseconds from which total 4 calls to load_tagged_classes were consuming 343 milliseconds as shown in profile
reverted_get_items_tracelog_1
reverted_get_items_tracelog_2

With this PR , get_items is taking only 44 milliseconds when called with or without include_orphans=False as shown in profiles below

improved_get_items

with include_orphans false

@Qubad786 Qubad786 force-pushed the mushtaq/improve_get_item branch from 53da483 to 9a86e57 Compare December 30, 2015 12:40
@@ -1197,7 +1205,11 @@ def _block_matches_all(block_data):
settings['children'] = qualifiers.pop('children')
for block_id, value in course.structure['blocks'].iteritems():
if _block_matches_all(value):
items.append(block_id)
if not include_orphans:
if self.has_path_to_root(block_id, course) or block_id.type in _DETACHED_CATEGORIES:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we checked block_id.type in _DETACHED_CATEGORIES first, we'll get a little performance boost, since if that passed, we wouldn't have to check self.has_path_to_root(block_id, course)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@adampalay
Copy link
Contributor

@Qubad786 , can you please post some indication of the performance improvement between the two PRs on this PR?

@Qubad786 Qubad786 force-pushed the mushtaq/improve_get_item branch from 9a86e57 to 0ff0693 Compare December 31, 2015 15:54
@Qubad786
Copy link
Contributor Author

@adampalay I have updated PR with performance comparison.

course_key = test_course.id

# get detached category list
detached_categories = [name for name, __ in XBlock.load_tagged_classes("detached")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Qubad786 You can import _DETACHED_CATEGORIES as you have done in the split.py

@Qubad786 Qubad786 force-pushed the mushtaq/improve_get_item branch 2 times, most recently from 8c84fbe to 16a724b Compare January 4, 2016 13:53
@@ -83,6 +83,7 @@
from .caching_descriptor_system import CachingDescriptorSystem
from xmodule.modulestore.split_mongo.mongo_connection import MongoConnection, DuplicateKeyError
from xmodule.modulestore.split_mongo import BlockKey, CourseEnvelope
from xmodule.modulestore.mongo.base import _DETACHED_CATEGORIES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels weird, for split to import from mongo.base. Can they both just import from xmodule.modulestore? (You can put DETACHED_CATEGORIES into xmodule/modulestore/__init__.py. I'd also remove the leading underscore, since it will no longer be private.

@adampalay
Copy link
Contributor

@Qubad786 , one minor point, otherwise this looks good. Thanks for the performance comparison, that's awesome

@adampalay
Copy link
Contributor

So to be clear, this PR makes get_items 0.044 - 0.034 = 0.01 seconds slower?

@Qubad786
Copy link
Contributor Author

Qubad786 commented Jan 4, 2016

In reverted PR , get_items was taking 0.386 sec (i.e. 386 ms), and in this PR, get_items is taking only 0.044 sec (i.e. 44 ms) which makes get_items 0.342 sec (i.e. 342 ms) faster. And has_path_to_root is taking only 0.001 sec (i.e. 1 ms) to execute when we call get_items with include_orphans=False.

@adampalay
Copy link
Contributor

@Qubad786 , ah, I see.
So the difference in performance on this PR vs. master is 0.001 secs? How big is the course that you're testing on?

@Qubad786
Copy link
Contributor Author

Qubad786 commented Jan 4, 2016

Yes. It's not that big, It has 2 chapters with 2 subsections each. In each subsection, there are 3 problems, other than that there is 1 orphan problem. But I kept the test course same for both PRs while testing.

@Qubad786
Copy link
Contributor Author

Qubad786 commented Jan 5, 2016

With the course provided yesterday, master's get_items is taking 0.055 sec (i.e. 55 ms) to execute.

get_items from master

On this branch get_items is taking 0.108 sec (i.e. 108 ms). Single call to has_path_to_root is taking 0.015 sec (i.e. 15 ms).

get_items with include_orphans

@adampalay
Copy link
Contributor

@benpatterson , @ormsbee , @tobz

This PR gives one the ability to call get_items with an optional parameter that filters out orphan modules from its results. For large courses, this introduces a performance penalty, since each module, in order to determine if its parent is in the course, needs to iterate over each module in the course. For smaller courses, this isn't a big deal (the first one @Qubad786 tested on only added 1 millisecond). But for a larger course (the one Hassan tested had around 1500 items), the penalty was steeper. Each check to see that a module wasn't an orphan took 15 ms.

My question for you three is: is this performance hit too high?

@ormsbee
Copy link
Contributor

ormsbee commented Jan 5, 2016

@adampalay: I guess that really depends on how many things we're calling get_items in our worst case scenarios. Is getting rid of a lot of orphan related bugs worth a 50ms worst case? Sure. As long as you can tell me that 50ms is as bad as it gets. Are there places (e.g. grading, courses with really long sequences) where the penalty would be significantly higher?

Is there any chance we can just kill the orphans when we're doing the save (so that they never make it into the database in the first place)? Maybe post-processing before we save the structure doc? Or do orphans still need to get exported?

@adampalay
Copy link
Contributor

@ormsbee , I don't know that 50 ms is as bad as it gets — I'd have to see what course has the most amount of items, since this should just be correlated with how many items a course has.

You ask if we can kill orphans we we do a save. Killing orphans turned out to be more complicated than we anticipated (see this document), so we took the tack of just working around them.

Orphans don't get exported, so we have a ticket in to make course imports atomic: PLAT-863. Once that is implemented, you'll be able to remove orphans in a split course by exporting and then importing.

@benpatterson
Copy link
Contributor

I know it doesn't cover the large-course use case you're talking about, but I wanted to capture the bok-choy build results here. The reverted PR added about 70 minutes (of test time...10 mins per shard) to the overall bok-choy run, but this current PR doesn't have an impact.

image

I do think a look at how this impacts the larger courses is worth a cycle.

@Qubad786
Copy link
Contributor Author

Qubad786 commented Jan 5, 2016

@benpatterson this does not have considerable impact because get_items is never used with include_orphans=False kwarg anywhere except in only the test which was presented in this PR.

@ormsbee
Copy link
Contributor

ormsbee commented Jan 5, 2016

@adampalay: Got it. I'm definitely not concerned if it's not being used yet, but if we are switching the LMS over to it, I'd like to see it run on some of the very large MITx courses to gauge impact (e.g. 8.MReVx, 14.74)

@benpatterson
Copy link
Contributor

Yes @Qubad786 I understand that. I just wanted to post additional findings :)

@adampalay
Copy link
Contributor

@ormsbee , for now, we wouldn't switch the whole LMS; it would only be used in a couple of views (grabbing content groups, discussion modules to display on the discussion forum). We'll take a look at the performance impact on MReV too and post results here. So that we're clear, what are the acceptance criteria here?

@ormsbee
Copy link
Contributor

ormsbee commented Jan 6, 2016

@adampalay: spitballing it, I'd say a ~5% degradation in overall server side request time on those pages is a reasonable tradeoff for getting rid of these types of bugs. If it's > 10%, let's talk about this more.

@@ -3,6 +3,9 @@
from collections import namedtuple

import uuid
from xblock.core import XBlock

DETACHED_CATEGORIES = [name for name, __ in XBlock.load_tagged_classes("detached")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was looking through the documentation on opaque keys today I noticed this: https://opaque-keys.readthedocs.org/en/stable/opaque_keys.edx.html#opaque_keys.edx.locator.BlockUsageLocator.category

Since category is depreciated we may want to change the name of this to DETACHED_XBLOCK_TYPES

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macdiesel great suggestion

@Qubad786 Qubad786 force-pushed the mushtaq/improve_get_item branch from ba3e316 to 0de049a Compare January 8, 2016 19:39
@Qubad786
Copy link
Contributor Author

Following are further performance testing with the courses mentioned earlier:

With 8.MReVx; there were total 3405 items in course.
On master, get_items is taking 269 ms to execute, can be seen in profile below:

master

On this branch, get_items is taking 363 ms from which has_path_to_root is taking 106 ms to execute, can be seen in profile below:

branch

With 14.74; there were total 2090 items in course.
On master, get_items is taking 122 ms to execute, can be seen in profile below:

screen shot 2016-01-11 at 4 49 10 pm

On this branch, get_items is taking 159 ms from which has_path_to_root is taking 48 ms to execute, can be seen in profile below:

branch

@adampalay
Copy link
Contributor

@ormsbee , looks like this is a bit more than 10%. As @Qubad786 points out, we're not planning to implement this in that many places, but I also think we can improve the way we calculate which items are in a course tree. (for example, we could cache this list, and regenerate this cache on each publish event). Or we can reimplement has_path_to_root where we temporarily store if a module has a path to root so we don't have to walk up the course tree for every module.

@ormsbee
Copy link
Contributor

ormsbee commented Jan 13, 2016

It doesn't bother me so much that this particular method call goes up in time by that much. When I was spitballing 10%, I meant the overall server execution time for, say, the course index page. Given that, and the limited places where this is going to be used, I'm fine with taking this hit. Thank you for doing this investigative work.

@adampalay
Copy link
Contributor

@ormsbee , can you give this another review please?

@Qubad786 , just for a sense of performance, can you post how long get_items takes for a large course with and without include_orphans?

@@ -3,6 +3,9 @@
from collections import namedtuple

import uuid
from xblock.core import XBlock

DETACHED_XBLOCK_TYPES = [name for name, __ in XBlock.load_tagged_classes("detached")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ordering doesn't matter, we might as well make this a set, since the primary use case will be checking for inclusion, and we might one day have many more of these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

@ormsbee
Copy link
Contributor

ormsbee commented Jan 20, 2016

Done with my pass. Just minor comments.

@Qubad786
Copy link
Contributor Author

Course: 8.MReVx
Total Items: 3402

This time, I have tested get_items on getting all 3402 items of the course.

Following is the profile when get_items is called with the old has_path_to_root when include_orphans=False:

old

Following is the profile when get_items is called with the improved has_path_to_root when include_orphans=False:
new

Following is the profile when get_items is called without include_orphans=False:
master

You can see, has_path_to_root has been much improved with path_cache(in terms of reduced recursive calls) but still get_items is around 5 or 6 times slower than than the master's.
FYI @adampalay

@adampalay
Copy link
Contributor

@ormsbee , what do you think of those results?

@ormsbee
Copy link
Contributor

ormsbee commented Jan 20, 2016

5 or 6x is a lot. If _get_parents_from_structure() is the expensive part, can't we make it faster? Looking at that code, it looks like it's doing a full traversal of the structure relationships to see if that item is in any child block. But we could instead do one pass to build a proper mapping of block_key -> parents, and then do hash lookups from that point on.

@ormsbee
Copy link
Contributor

ormsbee commented Jan 20, 2016

By building the reverse lookup, I mean having a method to build something like this ahead of time:

children_to_parents = defaultdict(list)
for parent_key, value in structure['blocks'].iteritems():
    for child_key in value.fields.get('children', []):
        children_to_parents[child_key].append(parent_key)

So that instead of having 3K lookups that all need to do this:

        return [
            parent_block_key
            for parent_block_key, value in structure['blocks'].iteritems()
            if block_key in value.fields.get('children', [])
        ]

You instead just iterate through it once to build the data structure, and each check is just a dict lookup.

@adampalay
Copy link
Contributor

Yeah, that's a great idea; we hadn't thought of that.

We could also make both caches global caches, where the cache prefix is the version hash of the course structure we're using (since they're immutable). So even if it does cause a performance hit, you only take the hit once.

@adampalay
Copy link
Contributor

(Also, this feels like an interview question: You have a list of nodes that has one root node. Each node has pointers to its children. How do you prune out the nodes that aren't descendents of the root?)

if block_key in value.fields.get('children', [])
]
cache_key = u'structure.{structure_id}'.format(structure_id=structure['_id'])
children_to_parents = cache.get(cache_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't want this cache call here, because going out to the cache is going to be a network operation, and you don't want to do that 3K times. You'd want to build the data structure outside of this method altogether and pass it in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also because you'd be introducing a Django dependency into common/lib. Maybe just try it first without the caching and see where we end up in terms of profiling times? It's possible that the network hop and deserialization wouldn't be worth it when compared to a single traversal of a structure we already have in memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, even our largest structures only have a few thousand items; I can't imagine this taking more than a handful of milliseconds if we do it once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee thanks for pointing it out, I am going to move cache and mapping to get_items instead. And yeah, I will update the PR with profiles.

@Qubad786
Copy link
Contributor Author

Course: 8.MReVx
Total Items: 3402

Continued testing get_items on getting all 3402 items for the current commit.

Following is the profile when get_items is called with include_orphans=False:
branch

Following is the profile when get_items is called without include_orphans=False:
master

get_items with include_orphans=False turns out to be 0.387 sec slower than master's get_items this time.

@adampalay , @ormsbee please have a look.

if parents_cache is None
else
parents_cache[block_key]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit awkward to read. Please break this out into a more conventional

if parents_cache is None:
    xblock_parents = ...
else:
    ...

@ormsbee
Copy link
Contributor

ormsbee commented Jan 22, 2016

Congratulations! :-) There's probably more that can be done with micro level optimizations, but I think we're pretty close to the point of diminishing returns. Thank you for taking the time to work through this issue and for all the profiling results. If I had a Performance Team appreciation badge 🏆, I'd put one on this PR. :-P

I had a couple of minor comments I'd like to see addressed before merging, but I'm 👍 on the performance aspect of this.

@@ -1195,31 +1201,84 @@ def _block_matches_all(block_data):
# don't expect caller to know that children are in fields
if 'children' in qualifiers:
settings['children'] = qualifiers.pop('children')

# No need of these caches unless include_orphans is set to False
path_cache, parents_cache = None, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we won't actually even use these variables if include_orphans is set to False. Maybe not even worth setting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just declaration to avoid "variable might be referenced before assignment".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, totally fair. Can you split this out to separate lines though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will split them up.

@adampalay
Copy link
Contributor

@Qubad786 , just a few nits, but otherwise this is looking good :). And thanks @ormsbee :)

@adampalay
Copy link
Contributor

@Qubad786 , this looks great to me! Once you squash your commits, 👍

@Qubad786
Copy link
Contributor Author

@adampalay , @ormsbee may you guys please have a quick look for the last time to see if there is anything missed from being addressed :) Thanks!

@adampalay
Copy link
Contributor

@Qubad786 , looks good to me, just needs a squash (maybe 2 commits — @mushtaqak 's original one and one for your improvements)

@@ -269,7 +268,7 @@ def load_item(self, location, for_parent=None): # pylint: disable=method-hidden
)
if parent_url:
parent = self._convert_reference_to_key(parent_url)
if not parent and category not in _DETACHED_CATEGORIES + ['course']:
if not parent and category not in DETACHED_XBLOCK_TYPES + ['course']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work (adding a list to a set)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, it doesn't. Python tests caught that tho

@ormsbee
Copy link
Contributor

ormsbee commented Jan 22, 2016

Just one comment, otherwise 👍

@Qubad786 Qubad786 force-pushed the mushtaq/improve_get_item branch from fbbed2a to 352e219 Compare January 22, 2016 20:51
@Qubad786
Copy link
Contributor Author

Going to merge.

Qubad786 added a commit that referenced this pull request Jan 22, 2016
Get only those block items which have their path to root
@Qubad786 Qubad786 merged commit 8c26178 into master Jan 22, 2016
@Qubad786 Qubad786 deleted the mushtaq/improve_get_item branch January 22, 2016 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants