This repository has been archived by the owner on Jul 26, 2023. It is now read-only.

Search support #230

Closed
wants to merge 18 commits into from

myl7 (Collaborator) commented Jan 1, 2022

Close #216

This PR is not complete yet, and IMO some of the implementation needs discussion.

This PR adds search support:

  • UI: a search box, with a dropdown of results, to the right of the breadcrumb.
  • Utils: getStrSimilarity and useClickAwayListener.
  • API: a new query string field q in index.ts, handled the same way as raw.
  • Config: 2 new options in config/site.json.

Searching is done either by the OneDrive-provided search API or by the builtin string-similarity util, depending on the config options. The default policy, ascii-onedrive-else-builtin, uses the OneDrive API when the query is entirely ASCII and the builtin search otherwise.
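A minimal sketch of how the default policy could pick a backend, assuming it is the query string that gets checked for ASCII. The names SearchPolicy, SearchBackend, chooseSearchBackend and isAscii, and the two single-backend policy values, are hypothetical; only ascii-onedrive-else-builtin comes from this PR.

```typescript
// Hypothetical policy values; only 'ascii-onedrive-else-builtin' is named in the PR.
type SearchPolicy = 'onedrive' | 'builtin' | 'ascii-onedrive-else-builtin'
type SearchBackend = 'onedrive' | 'builtin'

// True when every UTF-16 code unit of the string is in the ASCII range (0x00-0x7F).
function isAscii(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0x7f) return false
  }
  return true
}

// Picks the backend for a query: the mixed policy sends ASCII-only queries to the
// OneDrive search API and everything else (e.g. CJK) to the builtin similarity search.
function chooseSearchBackend(query: string, policy: SearchPolicy): SearchBackend {
  if (policy === 'ascii-onedrive-else-builtin') {
    return isAscii(query) ? 'onedrive' : 'builtin'
  }
  // Narrowed to 'onedrive' | 'builtin', which are valid backends as-is.
  return policy
}
```

For example, chooseSearchBackend('report', 'ascii-onedrive-else-builtin') returns 'onedrive', while a Chinese query such as '报告' falls back to 'builtin'.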

The builtin search relies on traverseFolder in MultiFileDownloader.tsx to get all files under the current folder, then computes string similarity with getStrSimilarity in getStrSimilarity.ts, which contains a description of the algorithm. getStrSimilarity should work for all languages, including English and Chinese.
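The PR's own algorithm is documented in getStrSimilarity.ts and is not reproduced here. As a stand-in, this sketch uses a different, well-known technique, normalized Levenshtein similarity, computed over Unicode code points (via Array.from) so that each Chinese character counts as one symbol. The name levenshteinSimilarity is hypothetical.

```typescript
// Hypothetical helper, NOT the PR's getStrSimilarity.
// Case-insensitive normalized Levenshtein similarity in [0, 1], computed over
// Unicode code points so CJK (and astral-plane) characters count as one symbol each.
function levenshteinSimilarity(a: string, b: string): number {
  const s = Array.from(a.toLowerCase())
  const t = Array.from(b.toLowerCase())
  const maxLen = Math.max(s.length, t.length)
  if (maxLen === 0) return 1 // two empty strings are identical

  // prev[j] holds the edit distance between the first i-1 symbols of s and the first j of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j)
  for (let i = 1; i <= s.length; i++) {
    const curr = [i]
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      )
    }
    prev = curr
  }
  // Turn the distance into a similarity: 1 means identical, 0 means nothing in common.
  return 1 - prev[t.length] / maxLen
}
```

With this measure, 'kitten' vs 'sitting' scores 1 - 3/7 (about 0.571) and '文件搜索' vs '文件' scores 0.5; a builtin search could then sort the traversed names by score.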

What has not been completed in this PR:

  • Converting an item ID to a path by extracting the path from webUrl works for me but not for the dev build (the dev build's webUrl seems to contain an ID rather than a path), so another method is required. (Fixed)
  • The builtin search is extremely slow (literally minutes). At least per-session, per-path caching is required. Alternatively, we could move the traversal to the API, if client-to-server requests are much slower than server-to-OneDrive requests (they should be, though this needs testing).
  • Even with the OneDrive-provided API, search is still slow (about 3 s), so a loading icon is required.
  • Dark mode support
  • Mobile support
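The per-session, per-path caching mentioned in the list above could look roughly like this sketch. It assumes a traversal function with the signature (folderPath) => Promise<FileEntry[]>; FileEntry, Traverse and createTraversalCache are hypothetical names, and the real traverseFolder in MultiFileDownloader.tsx may have a different shape.

```typescript
// Hypothetical shape of one traversed item.
interface FileEntry {
  path: string
  name: string
  isFolder: boolean
}

// Assumed signature of a function that lists everything under a folder.
type Traverse = (folderPath: string) => Promise<FileEntry[]>

// Per-session, per-path cache: the first search under a path starts a traversal,
// and later searches under the same path reuse the same (possibly still pending) promise.
function createTraversalCache(traverse: Traverse) {
  const cache = new Map<string, Promise<FileEntry[]>>()
  return (folderPath: string): Promise<FileEntry[]> => {
    let pending = cache.get(folderPath)
    if (!pending) {
      pending = traverse(folderPath)
      // Forget failed traversals so a later search retries instead of reusing the rejection.
      pending.catch(() => cache.delete(folderPath))
      cache.set(folderPath, pending)
    }
    return pending
  }
}
```

Caching the promise rather than the resolved list means two searches issued while a traversal is still running share it instead of starting a second one. Keeping the Map in module scope would make it live for the browser session.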

vercel bot commented Jan 1, 2022

This pull request is being automatically deployed with Vercel.
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/spencerwoo/onedrive-vercel-index/CLvL8vbdnsBsED1fXjdvDqG8qZeH
✅ Preview: https://onedrive-vercel-index-git-fork-myl7-search-spencerwoo.vercel.app

spencerwooo (Owner)

you can use this for the pop-up ui: https://headlessui.dev/react/dialog (already present in the dependencies)

put the search button in the nav bar instead; putting it beside the breadcrumb is not ideal for mobile layouts.

myl7 (Collaborator, Author) commented Jan 2, 2022

IMO a dropdown (also shipped in headlessui) sits closer to the input box, which shows the relationship between them, and gives an experience consistent with other searches such as search engines. A dialog, on the other hand, interrupts the user's reading immediately, so it is better suited to asking for urgent information such as credentials. A search query is not that urgent, so I used a dropdown. However, a dialog may suit the search better, since a search takes quite a long time even with the OneDrive-provided API. I am fine with either solution, and I think it is better for you to make the decision.

Pending decision: whether to move the search box into a dialog and leave a button in the navbar.
spencerwooo (Owner) commented Jan 2, 2022

the search on the tailwind css documentation site is a popover, and the search on the github documentation site is a slide-over, so...

[screenshot]

oh, and i see you are sending quite a lot of requests when doing a search - basically traversing every subfolder and every file:

[screenshot]

you know that we have persistent storage now, right ;) we could do some pre-indexing and perform the search on the indexed files. just a thought: the idea is basically to set up a cron job that runs every 24 hours and:

  • does a full traversal of the directories inside the onedrive storage
  • produces a serialised, searchable index of the file/directory listing
  • saves the serialised index in redis

and then, on another api endpoint (serverless function route) such as /search?q={query}:

  • performs the search on the index stored in redis
  • returns a list of matching files/directories and their respective links
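A sketch of the search side of such a /search?q={query} endpoint, assuming the index was serialised as a JSON array of { path, name, isFolder } entries and has already been read from redis as a string. The entry shape, the link format and the names searchIndex/toLink are assumptions, not part of the proposal. It uses plain case-insensitive substring matching, which needs no word segmentation and therefore also works for CJK queries.

```typescript
// Assumed shape of each entry in the serialised index.
interface IndexEntry {
  path: string
  name: string
  isFolder: boolean
}

interface SearchResult {
  name: string
  isFolder: boolean
  link: string
}

// Assumption: a result links to the item's path on the site, each segment URL-encoded.
function toLink(path: string): string {
  return path.split('/').map(encodeURIComponent).join('/')
}

// Case-insensitive substring match over item names in the serialised index.
function searchIndex(serialisedIndex: string, query: string, limit = 20): SearchResult[] {
  const q = query.trim().toLowerCase()
  if (q === '') return []
  const entries: IndexEntry[] = JSON.parse(serialisedIndex)
  return entries
    .filter((e) => e.name.toLowerCase().includes(q))
    .slice(0, limit)
    .map((e) => ({ name: e.name, isFolder: e.isFolder, link: toLink(e.path) }))
}
```

Because the whole index is one string, each request costs a single redis read plus an in-memory scan, with no OneDrive requests at query time.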

i think this is the most performant way to search without onedrive's own api while supporting cjk. the only caveat is that indexing may take a long time for large onedrive storages, and long-running jobs (longer than 5 secs) are not ideal, as they may be terminated by vercel.

another thing: i do plan on refactoring the project to use getStaticProps instead of the current useSWR implementation, which means that sooner or later i will definitely implement a function that traverses the entire directory to get the list of static paths to generate (getStaticPaths, from next.js static generation). but that would only run at build time, which is not time-limited; vercel doesn't provide any runtime function for long-running tasks, so... we need to solve this imo

edit - shoot, i think we can use github actions for this lol!!

spencerwooo (Owner) commented Jan 2, 2022

ok, on the github actions thought:

  • cron job: ✅
  • available for long running command: ✅
  • available to all forked repos: ✅
  • connection to redis: i should think so

so, the idea would be:

  • user: deploying to vercel triggers github actions (or a cron job)
  • github actions: fetch a pair of available tokens from redis
  • github actions: do the heavy lifting, i.e. indexing the entire onedrive directory
  • github actions: save the serialised index data back in redis
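The indexing step in the list above could be sketched as a breadth-first traversal that flattens the drive into a JSON string, ready to be stored in redis (e.g. with a SET under some key). listChildren is a hypothetical helper standing in for paging through OneDrive's children listing with one of the stored tokens; the entry shape and all names here are assumptions.

```typescript
// Hypothetical: the few driveItem fields the index keeps for each child.
interface ChildItem {
  name: string
  isFolder: boolean
}

interface IndexEntry {
  path: string // full path from the drive root, e.g. "/Documents/report.pdf"
  name: string
  isFolder: boolean
}

// Hypothetical helper: resolves to the direct children of a folder. In practice it
// would page through OneDrive's children listing using a token fetched from redis.
type ListChildren = (folderPath: string) => Promise<ChildItem[]>

// Breadth-first traversal of the whole drive into a flat, JSON-serialised index.
async function buildIndex(listChildren: ListChildren, root = '/'): Promise<string> {
  const entries: IndexEntry[] = []
  const queue: string[] = [root]
  // Walk the queue by index instead of shift() so each dequeue is O(1).
  for (let i = 0; i < queue.length; i++) {
    const folder = queue[i]
    for (const child of await listChildren(folder)) {
      const path = folder === '/' ? `/${child.name}` : `${folder}/${child.name}`
      entries.push({ path, name: child.name, isFolder: child.isFolder })
      if (child.isFolder) queue.push(path) // visit subfolders later
    }
  }
  return JSON.stringify(entries)
}
```

Requests here are sequential, one folder at a time, which keeps the token's request rate low at the cost of a longer job; that trade-off only makes sense on a runner without a short time limit.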

and so, what we need here:

  • a custom github action for the heavy lifting work
  • serverless function api for the search query

and voila!

spencerwooo added the feature request (New feature or request) label on Jan 2, 2022
myl7 (Collaborator, Author) commented Jan 3, 2022

GitHub Actions ToS says:

Additionally, regardless of whether an Action is using self-hosted runners, Actions should not be used for:

...

- if using GitHub-hosted runners, any other activity unrelated to the production, testing, deployment, or publication of the software project associated with the repository where GitHub Actions are used.

And last year GitHub did take down some repos (even accounts), even though they were just doing check-in tasks. So I worry that indexing via GitHub Actions is a little risky, though I'm not sure.

Anyway, indexing via a cron job is a great idea; only the runner is a problem.

spencerwooo mentioned this pull request on Jan 16, 2022
spencerwooo (Owner)

closing in favor of #283 being merged

Labels: feature request (New feature or request)
Projects: none
Linked issue (may be closed by merging this pull request): File search
Participants: 2