A few invalid email addresses #2

Open
pdehaan opened this issue Jan 15, 2015 · 1 comment
pdehaan commented Jan 15, 2015

I have special eyes...

" [email protected]" doesn't look like a valid email address. (1149)
" [email protected]" doesn't look like a valid email address. (1154)
" [email protected]" doesn't look like a valid email address. (1156)
" [email protected]" doesn't look like a valid email address. (1162)
" [email protected]" doesn't look like a valid email address. (1168)
" [email protected]" doesn't look like a valid email address. (1171)
" [email protected]" doesn't look like a valid email address. (1179)
" [email protected]" doesn't look like a valid email address. (1310)
"siham–[email protected]" doesn't look like a valid email address. (1454)
"[email protected][email protected]" doesn't look like a valid email address. (1549)
"m" doesn't look like a valid email address. (2301)
"wahid" doesn't look like a valid email address. (2325)

(The numbers in parentheses are approximate line numbers.)

And here's my magical linting code:

'use strict';

var fs = require('fs');
var path = require('path');

var isEmail = require('isemail');

// Directory holding the spam list text files, one address per line.
var spamListDir = path.join(__dirname, 'spam_lists');

fs.readdir(spamListDir, function (err, files) {
  if (err) {
    throw err;
  }
  files.forEach(function (file) {
    var data = fs.readFileSync(path.join(spamListDir, file), 'utf8');
    var emails = data.split('\n');
    console.log('BEFORE: %d', emails.length);
    // Drop comment lines (starting with # or !) and empty lines.
    emails = emails.filter(function (email) {
      return !(/^(#|!|$)/.test(email));
    });
    console.log('AFTER: %d', emails.length);
    // idx is the position after filtering, so it only approximates the line number.
    emails.forEach(function (email, idx) {
      if (!isEmail(email)) {
        console.log('"%s" doesn\'t look like a valid email address. (%d)', email, idx);
      }
    });
  });
});

Note: You'll need to run npm i isemail -D to install the isemail module.

There are a few interesting results:

  1. Some email addresses have leading/trailing whitespace (easy to fix, just use trim())
  2. Some aren't emails at all.
  3. One is missing a line break.

Obviously, these are all easy to fix locally, but if you're [manually] scraping the list from the remote blogspot site, local fixes may be moot (unless you can get it changed upstream).
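
For what it's worth, here is a rough sketch of how the three cases in the list above could be told apart locally. The classifyLines helper and its bucket names are made up for illustration, and counting "@" characters is only a crude stand-in for a real validator like isemail, so treat this as a starting point rather than a proper fix:

```javascript
'use strict';

// Hypothetical helper (not part of the project): sorts raw list lines into buckets.
// Counting '@' is a rough heuristic, not real email validation.
function classifyLines(lines) {
  var result = { clean: [], notEmail: [], runTogether: [] };
  lines.forEach(function (line) {
    var email = line.trim(); // case 1: leading/trailing whitespace
    if (email === '' || /^(#|!)/.test(email)) {
      return; // skip blank lines and comment lines
    }
    var atCount = email.split('@').length - 1;
    if (atCount === 0) {
      result.notEmail.push(email); // case 2: not an email at all
    } else if (atCount > 1) {
      result.runTogether.push(email); // case 3: likely a missing line break
    } else {
      result.clean.push(email);
    }
  });
  return result;
}
```

Anything that lands in notEmail or runTogether would still need a manual look, since there is no safe way to guess where a run-together line should be split.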

pdehaan commented Jan 15, 2015

I handled the email whitespace trim() problems in #3.
I'm still not sure what to do about the invalid emails in the list. I can fix them, but I'm not sure what your long-term strategy is for keeping your static files and the remote list in sync.

I guess technically you could add the isemail module to package.json and only push the [trimmed] address onto the known-good list if it is actually an email address, ignoring everything that isn't email address-esque:

default:
  email = email.trim();
  if (isEmail(email)) {
    lists.push(email);
  }
  break;

There is still no valid fix for "houria.chaji@[email protected]", since it is already malformed at the source link.

I think you may be safer just blocking all emails using a variant of this regex /@hotmail\.com$/i.
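
A minimal sketch of what that could look like, assuming the check is done on a trimmed address. The isBlocked name is hypothetical and not something that exists in the project:

```javascript
'use strict';

// The regex from the comment above: matches addresses ending in @hotmail.com,
// case-insensitively.
var BLOCKED_DOMAIN = /@hotmail\.com$/i;

// Hypothetical helper: true when the (trimmed) address is on the blocked domain.
function isBlocked(email) {
  return BLOCKED_DOMAIN.test(email.trim());
}
```

Because the pattern is anchored with $, something like user@hotmail.com.example.org would not match, while user@HOTMAIL.com would.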
