Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add all local users to the user_directory and optionally search them #2723

Merged
merged 15 commits into from
Dec 5, 2017

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Nov 29, 2017

Quick and dirty fix to #2720.

…lts to additional users

Initial commit; this doesn't work yet - the LIKE filtering seems too aggressive.
It also needs _do_initial_spam to be aware of prepopulating the whole user_directory_search table with all users...
...and it needs a handle_user_signup() or something to be added so that new signups get incrementally added to the table too.

Committing it here as a WIP
@ara4n ara4n changed the title WIP to add all local users to the user_directory and optionally search them Add all local users to the user_directory and optionally search them Nov 30, 2017
@ara4n
Copy link
Member Author

ara4n commented Nov 30, 2017

@erikjohnston ptal. i'm particularly baffled on how the FTS match could ever have worked on sqlite - I see no evidence that (foo* | foo) is ever valid as a match term?

@ara4n
Copy link
Member Author

ara4n commented Nov 30, 2017

@erikjohnston one design thought on this: i have a feeling that the include_pattern idea is overkill (as it causes constant filtering based on the LIKE pattern) and it should just be a search_all_local_users flag. wdyt?

@erikjohnston
Copy link
Member

@erikjohnston one design thought on this: i have a feeling that the include_pattern idea is overkill (as it causes constant filtering based on the LIKE pattern) and it should just be a search_all_local_users flag. wdyt?

Yeah, I think having a flag is going to be best here, I think using a LIKE expression may tie our hands if we need to change how the lookup works (if e.g. it turns out to be slow)

profile = yield self.store.get_profileinfo(target_user.localpart)
yield self.user_directory_handler.handle_local_profile_change(
target_user.user_id, profile
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the if and fetching of profile into handle_local_profile_change rather than duplicating it everywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, point.

yield self._handle_initial_room(room_id)
num_processed_rooms += 1
yield sleep(self.INITIAL_SLEEP_MS / 1000.)

logger.info("Processed all rooms.")

if self.include_pattern:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I would be tempted to do this all the time and do the checks when we filter. Otherwise we need to keep track of people toggling the option on and off

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i deliberately wasn't doing this (and instead thought the user can blow away the user_dir caches if they change the option), as for bigger servers it's going to make the user_directory and user_directory_search tables enormous, and slow down the FTS for a feature which may not even be being used...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we add users to the table when they are in a room with anyone else, wouldn't the vast majority of users already be in there?

yield self._handle_initial_room(room_id)
num_processed_rooms += 1
yield sleep(self.INITIAL_SLEEP_MS / 1000.)

logger.info("Processed all rooms.")

if self.include_pattern:
num_processed_users = 0
user_ids = yield self.store.get_all_local_users()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include appservice users here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also worried that pulling out millions of users is going to be painful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And its going to take matrix.org more than a day on the default settings to get through all its users

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, point. so your suggestion is to batch? and yes, we probably should be including AS users.

@erikjohnston
Copy link
Member

Also, you broke the unit tests and flake8 is sad

@erikjohnston erikjohnston assigned ara4n and unassigned erikjohnston Nov 30, 2017
@ara4n
Copy link
Member Author

ara4n commented Dec 4, 2017

@erikjohnston PTAL now.

  • changes the config option to be 'search all users', which certainly feels much more sane
  • fixes tests & pep8
  • speeds up the initial spam (but doesn't batch it; i think for the edge case of a server with millions of users, we can probably still temporarily slurp them all up into RAM)
  • spells out that if you change the option you'll need to flush the search tables

@ara4n ara4n assigned erikjohnston and unassigned ara4n Dec 4, 2017
retcols=("displayname", "avatar_url"),
desc="get_profileinfo",
)
except StoreError, e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should really be using StoreError as e syntax

@ara4n ara4n merged commit 33cb7ef into develop Dec 5, 2017
@richvdh
Copy link
Member

richvdh commented Jan 17, 2018

Quick and dirty fix to #2720.

Just dirty, I'd say :/

@ara4n
Copy link
Member Author

ara4n commented Jan 17, 2018

might need slightly more detailed review than that...

@hawkowl hawkowl deleted the matthew/search-all-local-users branch September 20, 2018 14:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants