Search doesn't seem to work well with numeric values #374

Open
wnojopra opened this issue Jun 17, 2020 · 2 comments
Comments

@wnojopra
Contributor

For UKB ITT's synthetic data, we tested indexing each genotype as a field, using its RSID as the field name. The field names looked like rs123, rs1234, rs12345, etc. Searching for 'rs' returned the list of genotypes, but searching for 'rs123' didn't seem to work. Even after I changed the field names to include underscores (I think the analyzer splits tokens on characters like underscores and spaces, but not on numbers), it didn't seem to help.

This is also visible in 1000 genomes. Searching for 'chr' and 'vcf' returns values like chr_1_vcf, chr_16_vcf, etc, but searching for 'chr 1' returns nothing.
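To make the expected behavior concrete, here is a minimal sketch of the matching I had in mind. It is a toy model, not the project's actual analyzer configuration: it assumes tokens are split on underscores and whitespace only (digits stay inside tokens) and that every query token must be a prefix of some field-name token. The names `tokenize`, `matches`, and `search` are hypothetical and used only for illustration.

```python
import re


def tokenize(text):
    """Lowercase, then split on runs of underscores or whitespace.

    Assumption: digits are NOT split points, so 'rs123' stays one token.
    """
    return [t for t in re.split(r"[_\s]+", text.lower()) if t]


def matches(query, field_name):
    """True if every query token is a prefix of some field-name token."""
    field_tokens = tokenize(field_name)
    return all(
        any(ft.startswith(qt) for ft in field_tokens)
        for qt in tokenize(query)
    )


# Field names taken from the examples in this issue.
FIELDS = ["chr_1_vcf", "chr_16_vcf", "rs123", "rs1234", "rs12345"]


def search(query, fields=FIELDS):
    """Return the field names that match the query, in their original order."""
    return [f for f in fields if matches(query, f)]


if __name__ == "__main__":
    for q in ["rs", "rs123", "chr vcf", "chr 1"]:
        print(q, "->", search(q))
```

Under these assumptions, both 'rs123' and 'chr 1' would return results, which is what I would expect from the search box. If the real analyzer keeps 'chr_1_vcf' as a single token, or does not prefix-match tokens that contain digits, that could explain the difference, but I have not confirmed either.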

@melissachang
Contributor

chr_1 works with 1000 Genomes.

Is there a UKB demo that demonstrates the issue, since it doesn't appear to be an issue with 1000 Genomes?

@wnojopra
Contributor Author

What about chr 1 (a space, not an underscore)? I would expect that to work.

Sorry, I needed to take down the UKB demo explorer. It was fairly expensive to keep up.

But actually, this issue is visible in biobank-explorer. If you click the dropdown, u10004_0_* facets are among the first to populate in the list. But if you do a search for one of these facets, say u10004_0_0, or even u10004, it doesn't show up.
