An obscenity detection and filtering library for Erlang, ported from Expletive.
Add feck to your .app or .app.src file:
{ applications, [ feck ] }
All feck functions expect a configuration to be passed:
Config = feck:configure( [ { blacklist, [ "very", "bad", "words" ] } ] )
feck:profane( "this is bad!", Config )
%% => true
feck:profane( "perfectly safe", Config )
%% => false
feck:profanities( "this is bad, so BAD!", Config )
%% => [ "bad", "BAD" ]
The library offers several profanity replacement strategies, which are selected at configuration time:
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, garbled } ], Config ) )
%% => "this is $#!@%, so %$@!#!"
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, stars } ], Config ) )
%% => "this is ***, so ***!"
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, vowels } ], Config ) )
%% => "this is b*d, so B*D!"
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, ":poop:" } ], Config ) )
%% => "this is :poop:, so :poop:!"
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, { repeat, $- } } ], Config ) )
%% => "this is ---, so ---!"
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, keep_first_letter } ], Config ) )
%% => "this is b**, so B**!"
feck:sanitize( "this is bad, so BAD!", feck:configure( [ { replacement, { keep_first_letter, $- } } ], Config ) )
%% => "this is b--, so B--!"
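To make the difference between the strategies concrete, here is a minimal sketch of how three of them could be written as plain list functions. The module name replacement_sketch and its functions are hypothetical and are not feck's actual implementation; the vowel masking assumes English vowels only.

```erlang
-module(replacement_sketch).
-export([repeat/2, keep_first_letter/2, vowels/1]).

%% Hypothetical helpers, written only to illustrate the strategies above.

%% Replace every character of Word with Char.
repeat(Word, Char) ->
    lists:duplicate(length(Word), Char).

%% Keep the first character and mask the rest with Char.
keep_first_letter([First | Rest], Char) ->
    [First | lists:duplicate(length(Rest), Char)];
keep_first_letter([], _Char) ->
    [].

%% Mask only the English vowels, leaving every other character untouched.
vowels(Word) ->
    [mask_vowel(C) || C <- Word].

mask_vowel(C) ->
    case lists:member(C, "aeiouAEIOU") of
        true -> $*;
        false -> C
    end.
```

With these definitions, replacement_sketch:repeat( "bad", $- ) gives "---", replacement_sketch:keep_first_letter( "BAD", $* ) gives "B**", and replacement_sketch:vowels( "bad" ) gives "b*d".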
If you wish to allow some words present in the blacklist, you can add exceptions to a whitelist at configuration time:
Config = feck:configure( [ { blacklist, [ "very", "bad", "words" ] }, { whitelist, [ "words" ] } ] )
feck:profane( "words", Config )
%% => false
The library comes with a couple of ready-to-use word lists, compiled from priv/dictionary/<name>.txt:
Config = feck:configure( [ { blacklist, english } ] )
feck:profane( "this is batshit crazy!", Config )
%% => true
Config = feck:configure( [ { blacklist, international } ] )
feck:profanities( "ceci n'est pas une pipe", Config )
%% => [ "pipe" ]
The blacklist and whitelist can also be specified as a { Module, Function, Args } tuple; the call it describes must return a word list:
Config = feck:configure( [ { blacklist, { string, tokens, [ "very bad words", " " ] } } ] )
feck:profanities( "this is bad, so BAD!", Config )
%% => [ "bad", "BAD" ]
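The same mechanism can point at your own code. The following is a sketch of a word-list provider that reads one word per line from a file; the module word_source, its functions and the file path in the comment are all hypothetical, and the file is assumed to be UTF-8 encoded.

```erlang
-module(word_source).
-export([from_file/1, from_binary/1]).

%% Hypothetical provider, usable as
%%   { blacklist, { word_source, from_file, [ "priv/my_words.txt" ] } }
%% where the path is only an example.

%% Read a UTF-8 file and return its non-empty lines as a list of strings.
from_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    from_binary(Bin).

%% Split on CR and LF; string:tokens/2 skips the empty tokens in between.
from_binary(Bin) ->
    string:tokens(unicode:characters_to_list(Bin), "\r\n").
```

Keeping the parsing in from_binary/1 makes the provider easy to check without touching the file system.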
By default, only exact whole-word matches are detected. Matches inside other words can also be found with the { match, any } option:
Config = feck:configure( [ { match, word_boundaries }, { blacklist, [ "very", "bad", "words" ] } ] )
AnyConfig = feck:configure( [ { match, any } ], Config )
feck:profanities( "this is bad!", Config )
%% => [ "bad" ]
feck:profanities( "this is superbadly!", Config )
%% => []
feck:profanities( "this is superbadly!", AnyConfig )
%% => [ "bad" ]
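The two matching modes can be pictured with the re module. This is an illustrative sketch only, not how feck is implemented: the module match_sketch and the function find/3 are hypothetical, and Word is assumed to contain no regular expression metacharacters.

```erlang
-module(match_sketch).
-export([find/3]).

%% Return every case-insensitive occurrence of Word in Text.
%% Mode is word_boundaries (whole words only) or any (substrings too).
find(Text, Word, Mode) ->
    Opts = [global, caseless, {capture, first, list}],
    case re:run(Text, pattern(Word, Mode), Opts) of
        {match, Matches} -> [M || [M] <- Matches];
        nomatch -> []
    end.

%% \b anchors the word at a word boundary on both sides.
pattern(Word, word_boundaries) -> "\\b" ++ Word ++ "\\b";
pattern(Word, any) -> Word.
```

Here match_sketch:find( "this is superbadly!", "bad", word_boundaries ) returns [], while the same call with any returns [ "bad" ].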
The config parameter can be omitted from profane/2, profanities/2 and sanitize/2 to use the application-level default settings. These settings are read once, the first time the default config is used; the compiled config is then cached in the application environment. The default options can be overridden in your sys.config:
{ feck,
[
{ blacklist, english },
{ whitelist, [] },
{ replacement, stars },
{ match, word_boundaries }
]
}
feck:sanitize( "this is batshit crazy!" )
%% => "this is ******* crazy!"
The default config can also be overwritten at runtime:
NewConfig = feck:configure( [ { replacement, garbled } ], feck:default_config() )
feck:set_default( NewConfig )
feck:sanitize( "this is batshit crazy!" )
%% => "this is $#!@% crazy!"
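The read-once behaviour of the default config can be sketched as a lazy lookup in the application environment. Everything named here is hypothetical (the module default_cache_sketch, the feck_sketch application name and the compiled_config key); it shows the general caching pattern, not feck's internals.

```erlang
-module(default_cache_sketch).
-export([default/1]).

%% Return the cached config, building it with Compile() on first use.
%% Compile is a zero-arity fun standing in for the expensive compile step.
default(Compile) ->
    case application:get_env(feck_sketch, compiled_config) of
        {ok, Cached} ->
            Cached;
        undefined ->
            Config = Compile(),
            ok = application:set_env(feck_sketch, compiled_config, Config),
            Config
    end.
```

After the first call, later calls return the stored value and never run Compile again, which is why changing the defaults at runtime needs an explicit step such as feck:set_default/1.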
Erlang 17 introduced the ucp option to the re module:
ucp Specifies that Unicode Character Properties should be used when resolving \B, \b, \D, \d, \S, \s, \W and \w. Without this flag, only ISO-Latin-1 properties are used. Using Unicode properties hurts performance, but is semantically correct when working with Unicode characters beyond the ISO-Latin-1 range.
Without this option (R16 and below), { match, word_boundaries } fails to find words containing characters with codepoints outside the Latin-1 range.
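The effect of ucp on word boundaries can be observed directly with re:run/3 on Erlang 17 or later. This is a small sketch with a hypothetical module name (ucp_sketch); it assumes Word contains no regular expression metacharacters.

```erlang
-module(ucp_sketch).
-export([whole_word/3]).

%% Look for Word as a whole word in Text, returning match or nomatch.
%% Opts is [] or [ucp]; the unicode option is always set so that
%% codepoints above 255 are accepted in both the subject and the pattern.
whole_word(Text, Word, Opts) ->
    re:run(Text, "\\b" ++ Word ++ "\\b", [unicode, {capture, none}] ++ Opts).
```

For a Cyrillic word such as [ 16#43C, 16#438, 16#440 ] followed by " hello", the call returns nomatch with [] and match with [ ucp ], because without ucp the Cyrillic letters are not treated as word characters and no boundary is found next to them.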
The length of a word being replaced is calculated with length( unicode:characters_to_list( String ) ), which is inconsistent in R16 and below for characters with codepoints outside the Latin-1 range.
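The following sketch shows what that length calculation measures on a current Erlang release: characters rather than bytes. The module name unicode_length_sketch is hypothetical, and the input is assumed to be valid UTF-8 or a list of codepoints.

```erlang
-module(unicode_length_sketch).
-export([char_length/1]).

%% Count characters (codepoints), not bytes, in a UTF-8 binary or charlist.
char_length(String) ->
    length(unicode:characters_to_list(String)).
```

For example, the binary <<"z", 16#142/utf8, "oty">> holds five characters in six bytes, so char_length/1 returns 5 while byte_size/1 returns 6. A replacement such as stars needs the character count, otherwise the mask comes out one character too long.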
A couple of replacement strategies (vowels and nonconsonants) are currently limited to the English language.