feck

An obscenity detection and filtering library for Erlang, ported from Expletive.

Installation

Add feck to your .app or .app.src file:

{ applications, [ feck ] }

Usage

All feck functions expect a configuration to be passed:

Config = feck:configure( [ { blacklist, [ "very", "bad", "words" ] } ] )

feck:profane( "this is bad!", Config )
%% => true
feck:profane( "perfectly safe", Config )
%% => false

feck:profanities( "this is bad, so BAD!", Config )
%% => [ "bad", "BAD" ]

Sanitization

The library offers a fairly wide variety of profanity replacement strategies which can be defined at configuration time.

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, garbled } ], Config ) )
%% => "This is $#!@%, so %$@!#!"

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, stars } ], Config ) )
%% => "This is ***, so ***!"

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, vowels } ], Config ) )
%% => "This is b*d, so B*D!

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, ":poop:" } ], Config ) )
%% => "This is :poop:, so :poop:!

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, { repeat, $- } } ], Config ) )
%% => "This is ---, so ---!

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, keep_first_letter } ], Config ) )
%% => "This is b**, so B**!

feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, { keep_first_letter, $- } } ], Config ) )
%% => "This is b--, so B--!

Whitelisting

If you wish to allow some words present in the blacklist, you can add exceptions to a whitelist at configuration time:

Config = feck:configure( [ { blacklist, [ "very", "bad", "words" ] }, { whitelist, [ "words" ] } ] )

feck:profane( "words", Config )
%% => false

Built-in blacklists

The library comes with a couple of word lists ready to use that are compiled from priv/dictionary/<name>.txt:

Config = feck:configure( [ { blacklist, english } ] )

feck:profane( "this is batshit crazy!", Config )
%% => true

Config = feck:configure( [ { blacklist, international } ] )

feck:profanities( "ceci n'est pas une pipe", Config )
%% => [ "pipe" ]

MFA word lists

The black and whitelist can also be specified as a { Module, Function, Args } tuple that returns a word list.

Config = feck:configure( [ { blacklist, { string, tokens, [ "very bad words", " " ] } } ] )

feck:profanities( "this is bad, so BAD!", Config )
%% => [ "bad", "BAD" ].

Matching strategy

By default, only exact whole word matches are detected, but matches as substrings of other words can also be found with the { match, any } option:

Config = feck:configure( [ { match, word_boundaries }, { blacklist, [ "very", "bad", "words" ] } ] )
AnyConfig = feck:configure( [ { match, any } ], Config )

feck:profanities( "this is bad!", Config )
%% => [ "bad" ]

feck:profanities( "this is superbadly!", Config )
%% => []

feck:profanities( "this is superbadly!", AnyConfig )
%% => [ "bad" ]

Default config

The config parameter can be omitted from profane/2, profanities/2 & sanitize/2 in order to use the application-level default settings. These settings are read once, the first time the default config is used, after which the compiled config is cached as an environmental variable. The default options can be overridden via your sys.config:

{	feck,
	[
		{ blacklist, english },
		{ whitelist, [] },
		{ replacement, stars },
		{ match, word_boundaries }
	]
}

feck:sanitize( "this is batshit crazy!" )
%% => "this is ******* crazy!"

The default config can also be overwritten at runtime:

NewConfig = feck:configure( [ { replacement, garbled } ], feck:default_config() )
feck:set_default( NewConfig )

feck:sanitize( "this is batshit crazy!" )
%% => "this is $#!@% crazy!"

Known Limitations

Unicode support in Erlang R16 and below

Erlang 17 introduced the ucp option to the re module:

ucp Specifies that Unicode Character Properties should be used when resolving \B, \b, \D, \d, \S, \s, \W and \w. Without this flag, only ISO-Latin-1 properties are used. Using Unicode properties hurts performance, but is semantically correct when working with Unicode characters beyond the ISO-Latin-1 range.

Without this option (R16 and below), when using { match, word_boundaries } finding words will fail for words containing characters with codepoints outside the Latin-1 range.

The length of words being replaced is calculated by length( unicode:characters_to_list( String ) ) which is inconsistent in R16 and below when dealing with characters with codepoints outside the Latin-1 range.

I18n concerns

A couple of replacement strategies (vowels and nonconsonants) are currently limited to the english language.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
priv		priv
src		src
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
rebar.config		rebar.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

feck

Installation

Usage

Sanitization

Whitelisting

Built-in blacklists

MFA word lists

Matching strategy

Default config

Known Limitations

Unicode support in Erlang R16 and below

I18n concerns

About

Releases

Packages

Languages

License

howleysv/feck

Folders and files

Latest commit

History

Repository files navigation

feck

Installation

Usage

Sanitization

Whitelisting

Built-in blacklists

MFA word lists

Matching strategy

Default config

Known Limitations

Unicode support in Erlang R16 and below

I18n concerns

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages