Document API / Usage #24

Open
jkillian opened this issue Aug 24, 2016 · 15 comments

Comments

@jkillian

My apologies if I missed it while scanning over your README, but I didn't notice any sections that document usage of this library or its API. I'm aware the API is similar to https://github.com/atom/fuzzaldrin, but it would be great to have it listed here as well.

@jkillian jkillian changed the title Document Usage / API Document API / Usage Aug 24, 2016
@jeancroy
Owner

Thank you for asking this question; indeed, this is long overdue.
I'll discuss it a bit here, then once it's clear I'll post it to the README.

Basic usage

fz = require("fuzzaldrin-plus")

Filtering

Filtering is the process of finding valid entries among a list of candidates and sorting them by score, given a query.

  • A candidate is valid if the query is a subsequence of it.
    • That is, every character of the query is present in the candidate, in the proper order. (Alternatively, it is possible to produce the query by only deleting characters from the candidate.)
  • The score approximates the meaningfulness of the subsequence.
    • Do the matched characters occur together or scattered?
    • Do they occur at interesting places? (e.g. acronym positions)
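To make the validity rule concrete, here is a minimal sketch of a subsequence test. It is not the library's code: `isSubsequence` is a hypothetical helper, and ignoring case is an assumption made for this illustration.

```javascript
// Returns true when every character of `query` appears in `candidate`
// in the same order (not necessarily adjacent).
// Assumption: the comparison ignores case; the library's exact rules may differ.
function isSubsequence(query, candidate) {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let qi = 0; // index of the next query character still to be found
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) qi++;
  }
  return qi === q.length;
}

console.log(isSubsequence("install", "Settings View: Uninstall Packages")); // true
console.log(isSubsequence("install", "Find And Replace: Select All"));      // true (scattered)
console.log(isSubsequence("install", "Settings"));                          // false
```

The second call shows why a score is needed on top of validity: the query is technically a subsequence, but the matched characters are scattered across unrelated words.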

Filtering Array of strings

  • Input: array of strings
  • Output: sorted & filtered array of strings
fz.filter(candidates, query)

Example:

candidates = [
        'Find And Replace: Select All',
        'Settings View: Uninstall Packages',
        'Settings View: View Installed Themes',
        'Application: Install Update',
        'Install'
      ]

results = fz.filter(candidates, 'install')

Filtering Array of objects

  • Input: array of objects
  • Output: sorted & filtered array of objects; the score is computed by comparing the specified key to the query
fz.filter(candidates, query, {key:"mykey"})
//filter & sort list of objects by obj.mykey

Scoring

Filtering is provided for out-of-the-box usefulness, but most of this library is about finding the proper score between a candidate and a query. (A score of 0 means the entry should be filtered out.)

Outside of debugging, generating a score is mostly useful for building your own filtering algorithm.
For example:

  • control iteration on a special data structure
  • control extraction / computation of the candidate string from the candidate object
  • modify the score based on external information (boost recent files, boost autocomplete entries near the insertion point)

If you have such a need, you can use scoring with the following guideline:

  1. Prepare the query
  2. Iterate over each element
    • Compute the candidate string from the object
    • Compute the match score
    • Adjust the score with external information as needed
    • If the score indicates a match, include <candidate, score> in an intermediate list
  3. Sort the intermediate list by score
  4. Build the output list from the intermediate list
    • keep the best items
    • extract the candidate from <candidate, score>
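The guideline above can be sketched roughly as follows. This is only an illustration: `customFilter`, `getText`, `boost`, and `maxResults` are hypothetical names, and `scoreFn(text, query)` stands in for the library's scoring call (query preparation is assumed to happen inside it). The toy scorer exists only so the example runs without the library.

```javascript
// Sketch of a custom filter built on top of a scoring function.
// scoreFn(text, query) must return 0 for "no match" and a positive number otherwise.
function customFilter(items, query, { scoreFn, getText, boost = () => 1, maxResults = Infinity }) {
  const scored = [];
  for (const item of items) {
    const text = getText(item);                   // compute candidate string from object
    let score = scoreFn(text, query);             // compute the match score
    if (score > 0) score *= boost(item);          // adjust with external information
    if (score > 0) scored.push({ item, score });  // keep matches only
  }
  scored.sort((a, b) => b.score - a.score);       // best score first
  return scored.slice(0, maxResults).map((entry) => entry.item);
}

// Toy scorer for demonstration only: 1 if the text contains the query, else 0.
const toyScore = (text, query) =>
  text.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;

const files = [
  { path: "src/filter.js", recent: false },
  { path: "src/matcher.js", recent: true },
  { path: "README.md", recent: false },
];

const results = customFilter(files, "er", {
  scoreFn: toyScore,
  getText: (f) => f.path,
  boost: (f) => (f.recent ? 2 : 1), // boost recently opened files
});
console.log(results.map((f) => f.path)); // [ 'src/matcher.js', 'src/filter.js' ]
```

Passing the scorer in as a parameter keeps the loop independent of how the score is produced, so the same structure should work with a real scorer in place of `toyScore`.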

Basic scoring

  • Input: string to be scored, query
  • Output: score (double), 0 if no match, positive otherwise.
score = fz.score(candidate_string, query)

There's no variant that takes an object and asks which key to use, because at this point you can probably do it better yourself.

It is not recommended to display the result of scoring to the user. Even if the ordering tries to be intuitive, the score by itself is very hard to interpret, since it mixes together a lot of quality signals, is non-linear, and is sometimes jumpy.

Loop scoring (prepQuery)

The basic idea is to precompute some quantities about the query up front, so we do less work on a candidate-by-candidate basis.

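prepared = fz.prepQuery(query)
// candidate_strings is a placeholder name for your own list of strings
for (candidate_string of candidate_strings) {
    score = fz.score(candidate_string, query, prepared)
}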

Note: a cache was recently added on the fz object that stores the last query and the corresponding prepared query.
So, in a simple for loop with a constant query, this should not be needed anymore.
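As a rough illustration of why such a cache makes explicit preparation unnecessary in a loop, here is a sketch of a single-entry "last query" cache. It is not the library's implementation: `withLastQueryCache` and the shape of the prepared object are assumptions made up for this example.

```javascript
// Wraps a prepare function so that repeated calls with the same query
// reuse the previous result instead of recomputing it.
function withLastQueryCache(prepare) {
  let lastQuery = null;
  let lastPrepared = null;
  return function cachedPrepare(query) {
    if (query !== lastQuery) {
      lastPrepared = prepare(query); // real work happens only when the query changes
      lastQuery = query;
    }
    return lastPrepared;
  };
}

let calls = 0; // counts how often the underlying prepare function actually runs
const prepare = withLastQueryCache((query) => {
  calls++;
  return { query, lower: query.toLowerCase() }; // hypothetical prepared form
});

for (const candidate of ["Install", "Uninstall", "Settings"]) {
  prepare("Inst"); // same query every iteration: only the first call does real work
}
console.log(calls); // 1
```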

Matching

To communicate why the algorithm thinks a result is good or bad, it's often useful to highlight the matched characters.
The function match returns an array of positions where candidate_string matches query.
(If multiple sets of positions are possible, it returns one of the sets that produces the best score.)

fz.match(candidate_string, query)

Note: fz.match(candidate_string, query, prepQuery) is also available.

See also the demo on how to wrap that output with HTML tags:
https://github.com/jeancroy/fuzzaldrin-plus/blob/master/demo/demo.html#L85-L137
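For readers who only want the gist, here is a minimal sketch of wrapping matched positions in tags. It is not the demo's code: `highlight` is a hypothetical helper, `positions` is assumed to be an array of character indexes like the one described above, and HTML escaping of the text is deliberately left out.

```javascript
// Wraps each run of consecutive matched positions in an open/close tag pair.
// Note: `text` is not HTML-escaped here; escape it first in real use.
function highlight(text, positions, open = "<b>", close = "</b>") {
  const matched = new Set(positions);
  let out = "";
  let inside = false; // are we currently inside an open tag?
  for (let i = 0; i < text.length; i++) {
    const hit = matched.has(i);
    if (hit && !inside) out += open;
    if (!hit && inside) out += close;
    out += text[i];
    inside = hit;
  }
  if (inside) out += close; // close a run that reaches the end of the string
  return out;
}

console.log(highlight("Install", [0, 1, 2])); // <b>Ins</b>tall
```

Grouping consecutive positions into one tag pair keeps the generated markup short compared with wrapping every character separately.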

Advanced

All of the methods (filter, score, match, prepQuery) take an option hash.
Some of those settings are common to all (for example, tweaks to how scoring works).
Some settings are specific to one method, for example key in filter.

I may return to document those when I have more time, but most users don't need them.

@jkillian
Author

jkillian commented Aug 25, 2016

Thanks! This was very helpful. I'm working on a set of TypeScript typings for your library, does the following look correct to you?

// Type definitions for fuzzaldrin-plus
// Project: https://github.com/jeancroy/fuzzaldrin-plus/
// Definitions by: Jason Killian <https://github.com/jkillian>
// Definitions: https://github.com/DefinitelyTyped/DefinitelyTyped

export as namespace fuzzaldrin;

export interface IQueryOptions {
    pathSeparator?: string;
    optCharRegEx?: RegExp;
}

export interface IScoringOptions extends IQueryOptions {
    allowErrors?: boolean;
    isPath?: boolean;
    useExtensionBonus?: boolean;
}

export interface IFilterOptions extends IScoringOptions {
    key?: string;
    maxResults?: number;
}

export type PreparedQuery = { __internalAPIBrand: string; };

export function filter<T>(data: T[], query: string, options?: IFilterOptions): T[];
export function score(str: string, query: string, preparedQuery?: PreparedQuery, options?: IScoringOptions): number;
export function match(str: string, query: string, preparedQuery?: PreparedQuery, options?: IScoringOptions): number[];
export function prepQuery(query: string, options?: IQueryOptions): PreparedQuery;

Note that the export as namespace fuzzaldrin; line denotes that the library is published in a UMD format.

@jeancroy
Owner

Currently prepQuery looks more like this:
https://github.com/jeancroy/fuzzaldrin-plus/blob/master/src/scorer.coffee#L66-L73

The other signatures seem OK.

@jkillian
Author

My thinking was that those are private fields that are only meant for use by your library and not by an external user. The way I wrote things above basically only lets users pass a PreparedQuery to score and match but not access its internal data.

Does that seem like the right decision?

@jeancroy
Owner

Yes, thank you, that looks good. I have some plans to add more options; I guess when that settles we'll see how to extend the option hash definition.

@jkillian
Author

jkillian commented Aug 25, 2016

Great! See PR here if you're interested

@mdahamiwal
Collaborator

mdahamiwal commented Oct 9, 2016

@jkillian, @jeancroy I have updated the TS typings for the latest changes. Check PR 11865. @jeancroy, can we also update the npm package with a newly released version?

@jeancroy
Owner

jeancroy commented Oct 9, 2016

Thanks for that. I'll try to keep the interface more stable in the future.

The reason I've demoted the prepared query from its own argument is that the internal cache was giving performance just as good as explicitly passing a prepared query. So not having to care about the prepared query allows simpler usage.

@mdahamiwal
Collaborator

Yes, that is one thing that should be taken care of with every new change.
I was thinking of getting a NuGet package published for this lib, to make it available for .NET projects or other projects that don't depend on Node.js. Currently we are using a copy of this lib (converted to TS). With typings and a NuGet package, we can take a package dependency instead of a converted source. @jeancroy, thoughts?

@jeancroy
Owner

jeancroy commented Oct 9, 2016

I'm open to maintaining a NuGet package, and/or outputting TypeScript as a distribution format on each build. (I may actually be due to cut a real release soon.)

I'm also not that invested in the current CoffeeScript form.
The package was written for the Atom text editor, and CoffeeScript was what they used.
But now they are moving to ES6 and have Babel in their toolchain, I believe, so there might be a natural compromise between ES6 and TypeScript that is closer to actual usage.

@mdahamiwal
Collaborator

mdahamiwal commented Oct 9, 2016

Awesome, moving to ES6 will definitely bring more cohesion with other projects, as most of them are evolving in that direction to get more out-of-the-box functionality and performance.
So, here is how I think we can maintain a NuGet release:

Maintain a separate release branch:

  • Appveyor config to automate package publishing for releases.
  • Travis CI to ensure the latest release is compatible with DefinitelyTyped typings.

The master branch works as the dev branch for regular improvements/updates and is merged into release.
What do you think? I can contribute in that direction as I get time.

@mdahamiwal
Collaborator

Hi @jeancroy, are you ok with the approach? I already have some work in my local repository for this.

@jeancroy
Owner

Yes, I think this is the right path forward. I've added you as a collaborator, as I guess we'll need to set up some things. If you need me to create branches or something, please tell me.

@mdahamiwal
Collaborator

Hi Jean,

May I know your email ID? I will add you as an owner of the NuGet package.

Thanks

On Thu, Oct 20, 2016 at 7:08 PM, Jean Christophe Roy wrote:

Yes I think this is the right path forward. I've added you as collaborator
as I guess we'll needs to setups some things. If you need me to create
branches or something, please tell.



@jeancroy
Owner

Hi, I'm registered with NuGet as jeancroy, with email [email protected]

Jean Christophe Roy

On Tue, Oct 25, 2016 at 2:44 PM, Manish Dahamiwal wrote:

Hi Jean,

May I know your Email ID? I will add you as owner for nuget package.

Thanks

