Search on es #3093

Merged
merged 46 commits into from
May 27, 2022
Commits
8d3e9f8 es: track search (stereosteve, May 13, 2022)
51a813c es: multisearch tracks, users, playlists (stereosteve, May 13, 2022)
d8bb8a3 fix es search with no user_id present (stereosteve, May 13, 2022)
e462a00 track search updates almost complete set (isaacsolo, May 14, 2022)
58b9cc5 more track search fields (isaacsolo, May 15, 2022)
7697950 search hydrates user from ES, does user + playlist search (stereosteve, May 17, 2022)
03c278f search populates followee_reposts and favorites (stereosteve, May 17, 2022)
aed280f tracks: index comma separated tags correctly (stereosteve, May 18, 2022)
c7fa076 bug fix none balance (isaacsolo, May 18, 2022)
f7d60a2 bug fix the balance indexing (isaacsolo, May 18, 2022)
e6ca303 fix search query artist to user and add took (isaacsolo, May 18, 2022)
a970f67 Merge branch 'master' into search-on-es (stereosteve, May 19, 2022)
7115b88 add env var and fallback (isaacsolo, May 19, 2022)
37c022b skip mat view refresh if ES search enabled (stereosteve, May 19, 2022)
500cb95 default all search (isaacsolo, May 19, 2022)
d753f1a add saved entities and albums search, needs refactoring (isaacsolo, May 19, 2022)
d381ce9 Use ES for autocomplete (stereosteve, May 20, 2022)
e95d7e2 deep copy query dsl before changing (stereosteve, May 20, 2022)
6d39fb7 search_as_you_type for autocomplete (stereosteve, May 20, 2022)
0d6d3af add integration tests against ES searches and fix some bugs (isaacsolo, May 21, 2022)
2369a06 fix test fixtures + add index refresh (isaacsolo, May 23, 2022)
f6c8874 shorten timeout (isaacsolo, May 23, 2022)
178fc61 script to print test searches (stereosteve, May 23, 2022)
de21e43 Dedicated suggest field. (stereosteve, May 24, 2022)
d47f8c4 search scoring tweaks, use autocomplete query everywhere (stereosteve, May 24, 2022)
8a0c41b Apply ES search for all APIs (isaacsolo, May 24, 2022)
3e92a70 es scoring tweaks (stereosteve, May 24, 2022)
bd927fd tune scoring + add scores to script (isaacsolo, May 25, 2022)
2713510 add personalization (isaacsolo, May 25, 2022)
093e160 unbreak es search (stereosteve, May 25, 2022)
87bc6fc fix circular import (stereosteve, May 25, 2022)
671abb7 refactor search hydrate code (stereosteve, May 25, 2022)
bed6ea3 filter copycats from search results (stereosteve, May 25, 2022)
bee8bb4 move script (isaacsolo, May 25, 2022)
581e5a9 update playlist query (isaacsolo, May 25, 2022)
8e740bc cleanup comments (isaacsolo, May 25, 2022)
623884a refactor es dsl building code (stereosteve, May 25, 2022)
a7d927d add fuzziness (isaacsolo, May 25, 2022)
6cd2b7e Add note about current_user_followee_follow_count (stereosteve, May 26, 2022)
015487c edit test env (isaacsolo, May 26, 2022)
2132760 small naming fixes (isaacsolo, May 26, 2022)
d3cfe6f lint (isaacsolo, May 26, 2022)
b0a6642 dedupe user type (isaacsolo, May 26, 2022)
b060030 Merge branch 'master' into search-on-es (isaacsolo, May 26, 2022)
1729393 lower + asciifolding for all analyzed fields (stereosteve, May 27, 2022)
169ebf5 Add some over-fetching on user search for drop_copycats (stereosteve, May 27, 2022)
5 changes: 4 additions & 1 deletion discovery-provider/.test.env
@@ -1,4 +1,7 @@
  # TODO: dummy ganache keys for local setup; should wire with dynamically generated keys
  audius_delegate_owner_wallet=0x1D9c77BcfBfa66D37390BF2335f0140979a6122B
  audius_delegate_private_key=0x3873ed01bfb13621f9301487cc61326580614a5b99f3c33cf39c6f9da3a19cad
- audius_solana_rewards_manager_account=8MzNUaBHskteN7poTrZG5wgSNSbXQwieMDB4wk9fgB7f
+ audius_solana_rewards_manager_account=8MzNUaBHskteN7poTrZG5wgSNSbXQwieMDB4wk9fgB7f
+
+ audius_elasticsearch_url=http://localhost:9200
+ audius_db_url=postgresql+psycopg2://postgres:postgres@localhost:5432/test_audius_discovery
3 changes: 3 additions & 0 deletions discovery-provider/compose/docker-compose.backend.yml
@@ -13,6 +13,9 @@ services:
  - audius_redis_url=redis://${COMPOSE_PROJECT_NAME}_redis-server_1:6379/00
  - audius_db_url=postgresql+psycopg2://postgres:postgres@${COMPOSE_PROJECT_NAME}_discovery-provider-db_1:5432/audius_discovery
  - audius_db_url_read_replica=postgresql+psycopg2://postgres:postgres@${COMPOSE_PROJECT_NAME}_discovery-provider-db_1:5432/audius_discovery
+ - audius_elasticsearch_url=http://elasticsearch:9200
+ - audius_elasticsearch_run_indexer=true
+ - audius_elasticsearch_search_enabled=true
  - audius_delegate_owner_wallet=${audius_delegate_owner_wallet}
  - audius_delegate_private_key=${audius_delegate_private_key}
  - audius_ipfs_host=${COMPOSE_PROJECT_NAME}-ipfs-node
19 changes: 17 additions & 2 deletions discovery-provider/es-indexer/README.md
@@ -21,6 +21,22 @@ If you are adding a new denormalization (attaching data from a related model), t
  - For "catchup" mode this is the `checkpointSql` function. See UserIndexer or TrackIndexer for an example
  - For listen / notify mode, this is the handler code in `listen.ts`

+ When working on mapping changes, I might put code like this at top of `main.ts main()` function:
+
+ ```ts
+ await new Promise((r) => setTimeout(r, 100)) // don't ask... will fix haha
+ await indexer.playlists.createIndex({ drop: true })
+ await indexer.playlists.catchup()
+ process.exit(0)
+ ```
+
+ and then:
+
+ ```
+ source .env
+ npm run dev
+ ```
+
  ## How it works

  Program attempts to avoid any gaps by doing a "catchup" on boot... when complete it switches to processing "batches" which are events collected from postgres LISTEN / NOTIFY.
@@ -43,8 +59,7 @@ Check "elasticsearch" health info in `/health_check?verbose=true` endpoint.
  (instructions for sandbox3... subject to change):

  Use Kibana:
- Uncomment the kibana container and restart discovery-provider.
-
+ Uncomment the kibana container and restart discovery-provider.

  List indices:

2 changes: 1 addition & 1 deletion discovery-provider/es-indexer/package.json
@@ -6,7 +6,7 @@
      "dev": "ts-node src/main.ts",
      "nuke": "ts-node nuke.ts",
      "start": "tsc && pm2-runtime build/src/main.js --restart-delay=3000",
-     "test": "echo \"Error: no test specified\" && exit 1"
+     "test": "tsc --noEmit"
    },
    "keywords": [],
    "author": "",
8 changes: 8 additions & 0 deletions discovery-provider/es-indexer/src/conn.ts
@@ -43,6 +43,14 @@ export async function waitForHealthyCluster() {
    )
  }

+ export async function ensureSaneCluterSettings() {
+   return dialEs().cluster.putSettings({
+     persistent: {
+       'action.auto_create_index': false,
+     },
+   })
+ }
+
  /**
   * Gets the max(blocknumber) from elasticsearch indexes
   * Used for incremental indexing to understand "where we were" so we can load new data from postgres
6 changes: 3 additions & 3 deletions discovery-provider/es-indexer/src/indexNames.ts
@@ -1,7 +1,7 @@
  export const indexNames = {
-   playlists: 'playlists2',
+   playlists: 'playlists6',
    reposts: 'reposts2',
    saves: 'saves2',
-   tracks: 'tracks2',
-   users: 'users2',
+   tracks: 'tracks6',
+   users: 'users6',
  }
10 changes: 8 additions & 2 deletions discovery-provider/es-indexer/src/indexers/BaseIndexer.ts
@@ -44,6 +44,12 @@ export abstract class BaseIndexer<RowType> {
      }
    }

+   async refreshIndex() {
+     const { es, logger, indexName } = this
+     logger.info('refreshing index: ' + indexName)
+     await es.indices.refresh({ index: indexName })

Comment on lines +49 to +50

Contributor: does this take a while?
If already logging, might be nice to record the time it takes as well?

Contributor: it's a few ms each index so probably ok without logging. it was added here since we extended the refresh_interval to 5s. an edge case would be when the indexer first catches up we want to refresh before making it available otherwise there could be no results for a few seconds. it's also necessary for test cases since we don't want to wait 5s.

+   }
+
    async cutoverAlias() {
      const { es, logger, indexName, tableName } = this

@@ -173,7 +179,7 @@ export abstract class BaseIndexer<RowType> {
      ])
    }

-   async withBatch(rows: Array<RowType>) {}
+   async withBatch(rows: Array<RowType>) { }

-   withRow(row: RowType) {}
+   withRow(row: RowType) { }
  }
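
The review thread on `refreshIndex` asks whether the refresh duration is worth logging. A minimal sketch of how that could look, assuming a hypothetical `timed` helper (not part of this PR) and a stand-in `fakeRefresh` in place of the real `es.indices.refresh({ index: indexName })` call:

```typescript
// Hypothetical helper, not part of the PR: runs an async step and logs how long it took.
async function timed<T>(
  label: string,
  step: () => Promise<T>,
  log: (msg: string) => void = console.log
): Promise<T> {
  const start = Date.now()
  try {
    return await step()
  } finally {
    // Logged whether the step resolves or throws.
    log(`${label} took ${Date.now() - start}ms`)
  }
}

// Stand-in for `es.indices.refresh({ index: indexName })`; resolves after a short delay.
const fakeRefresh = (): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, 5))

void timed('refreshing index: playlists6', fakeRefresh)
```

Inside `refreshIndex` the same wrapper could take `logger.info` as the `log` argument, so the duration ends up next to the existing "refreshing index" message.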
62 changes: 56 additions & 6 deletions discovery-provider/es-indexer/src/indexers/PlaylistIndexer.ts
@@ -1,10 +1,15 @@
  import { IndicesCreateRequest } from '@elastic/elasticsearch/lib/api/types'
- import { keyBy } from 'lodash'
+ import { keyBy, merge } from 'lodash'
  import { dialPg } from '../conn'
  import { indexNames } from '../indexNames'
  import { BlocknumberCheckpoint } from '../types/blocknumber_checkpoint'
  import { PlaylistDoc } from '../types/docs'
  import { BaseIndexer } from './BaseIndexer'
+ import {
+   sharedIndexSettings,
+   standardSuggest,
+   standardText,
+ } from './sharedIndexSettings'

  export class PlaylistIndexer extends BaseIndexer<PlaylistDoc> {
    tableName = 'playlists'
@@ -13,13 +18,13 @@ export class PlaylistIndexer extends BaseIndexer<PlaylistDoc> {

    mapping: IndicesCreateRequest = {
      index: indexNames.playlists,
-     settings: {
+     settings: merge(sharedIndexSettings, {
        index: {
          number_of_shards: 1,
          number_of_replicas: 0,
          refresh_interval: '5s',
        },
-     },
+     }),
      mappings: {
        dynamic: false,
        properties: {
@@ -30,9 +35,37 @@ export class PlaylistIndexer extends BaseIndexer<PlaylistDoc> {
          is_album: { type: 'boolean' },
          is_private: { type: 'boolean' },
          is_delete: { type: 'boolean' },
-         playlist_name: { type: 'text' },
+         suggest: standardSuggest,
+         playlist_name: {
+           type: 'keyword',
+           fields: {
+             searchable: standardText,
+           },
+         },
          'playlist_contents.track_ids.track': { type: 'keyword' },

+         user: {
+           properties: {
+             handle: {
+               type: 'keyword',
+               fields: {
+                 searchable: standardText,
+               },
+             },
+             name: {
+               type: 'keyword',
+               fields: {
+                 searchable: standardText,
+               },
+             },
+             location: { type: 'keyword' },
+             follower_count: { type: 'integer' },
+             is_verified: { type: 'boolean' },
+             created_at: { type: 'date' },
+             updated_at: { type: 'date' },
+           },
+         },
+
          // saves
          saved_by: { type: 'keyword' },
          save_count: { type: 'integer' },
@@ -58,7 +91,17 @@ export class PlaylistIndexer extends BaseIndexer<PlaylistDoc> {
      return `
      -- etl playlists
      select
-       *,
+       playlists.*,
+
+       json_build_object(
+         'handle', users.handle,
+         'name', users.name,
+         'location', users.location,
+         'follower_count', follower_count,
+         'is_verified', users.is_verified,
+         'created_at', users.created_at,
+         'updated_at', users.updated_at
+       ) as user,

        array(
          select user_id
@@ -83,7 +126,11 @@ export class PlaylistIndexer extends BaseIndexer<PlaylistDoc> {
        ) as saved_by

      from playlists
-     where is_current = true
+       join users on playlist_owner_id = user_id
+       left join aggregate_user on users.user_id = aggregate_user.user_id
+     where
+       playlists.is_current
+       AND users.is_current
      `
    }

@@ -132,6 +179,9 @@ export class PlaylistIndexer extends BaseIndexer<PlaylistDoc> {
    }

    withRow(row: PlaylistDoc) {
+     row.suggest = [row.playlist_name, row.user.handle, row.user.name]
+       .filter((x) => x)
+       .join(' ')
      row.repost_count = row.reposted_by.length
      row.save_count = row.saved_by.length
    }
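
The `suggest` value built in `withRow` is the playlist name, owner handle, and owner name joined with spaces, with empty values dropped. A standalone sketch of that step, assuming a hypothetical `SuggestSource` type that models only the fields read here (the real `PlaylistDoc` is not shown in this diff) and made-up sample values:

```typescript
// Hypothetical, trimmed-down shape of the fields `withRow` reads.
interface SuggestSource {
  playlist_name?: string | null
  user: { handle?: string | null; name?: string | null }
}

// Drops null, undefined, and empty strings, then joins the rest with single spaces,
// mirroring the filter/join chain in `withRow`.
function buildSuggest(row: SuggestSource): string {
  return [row.playlist_name, row.user.handle, row.user.name]
    .filter((x) => x)
    .join(' ')
}

console.log(
  buildSuggest({
    playlist_name: 'Road Trip',
    user: { handle: 'dj_example', name: null },
  })
)
// prints "Road Trip dj_example"
```

Filtering before joining keeps a missing name or handle from leaving stray double spaces in the indexed text.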