Linear-progressive text discovery engine in C#, exposing its functionality through simple service APIs. It breaks plain text into a sequence of slices that can be reconstituted as annotated text. It generates meta-rich tokens from a search expression, which are then used to annotate matches in the source text; noise-word detection, tokenization, and matching options are all configurable. Through a common adapter interface with interchangeable DOM libraries (HtmlAgilityPack, AngleSharp, etc.), it can mark search hits in the DOM, create HTML excerpts at a given word count with configurable element-breaking rules, and extract text content with selectively preserved formatting indicators. The engine is highly extensible through dependency injection. Regex can be used in advanced configurations, but it is not required.
Linear-progressive text discovery engine exposing functionality through simple service APIs. Break text into a stream of token/non-token slices. Tokens can be annotated with search term matches. Using adapters for popular DOM libraries (HtmlAgilityPack, AngleSharp), you can highlight HTML, break HTML at a word count, and more.
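To make the slice idea concrete, here is a minimal, self-contained sketch of breaking text into token/non-token slices, flagging tokens that match search terms, and reconstituting the annotated text. It is an illustration only: `Slice`, `SliceSketch`, `Split`, `Annotate`, and `Render` are hypothetical names, not the TextDiscovery API, and the tokenization rule (a token is a run of letters or digits) is an assumption rather than the library's configurable behavior. It assumes C# 9 or later for record types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical type for illustration; NOT part of the TextDiscovery API.
public sealed record Slice(string Text, bool IsToken, bool IsMatch = false);

public static class SliceSketch
{
    // Walk the text once, left to right, emitting alternating token and
    // non-token slices. Assumption: a token is a run of letters or digits.
    public static List<Slice> Split(string text)
    {
        var slices = new List<Slice>();
        int start = 0;
        while (start < text.Length)
        {
            bool isToken = char.IsLetterOrDigit(text[start]);
            int end = start + 1;
            while (end < text.Length && char.IsLetterOrDigit(text[end]) == isToken)
            {
                end++;
            }
            slices.Add(new Slice(text.Substring(start, end - start), isToken));
            start = end;
        }
        return slices;
    }

    // Flag token slices that equal one of the search terms (case-insensitive).
    public static List<Slice> Annotate(IEnumerable<Slice> slices, IEnumerable<string> terms)
    {
        var lookup = new HashSet<string>(terms, StringComparer.OrdinalIgnoreCase);
        return slices
            .Select(s =>
            {
                bool hit = s.IsToken && lookup.Contains(s.Text);
                return hit ? (s with { IsMatch = true }) : s;
            })
            .ToList();
    }

    // Reconstitute the text, wrapping matched tokens in <mark> tags.
    public static string Render(IEnumerable<Slice> slices)
    {
        var sb = new StringBuilder();
        foreach (var s in slices)
        {
            sb.Append(s.IsMatch ? $"<mark>{s.Text}</mark>" : s.Text);
        }
        return sb.ToString();
    }
}
```

With this sketch, `SliceSketch.Render(SliceSketch.Annotate(SliceSketch.Split("The quick brown fox."), new[] { "quick", "fox" }))` yields `The <mark>quick</mark> brown <mark>fox</mark>.`, and concatenating the `Text` of the un-annotated slices reproduces the input unchanged, since non-token slices (spaces, punctuation) are kept rather than discarded.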
davidwest/TextDiscovery