Use NLTK to filter feature place names #1

Open
mikedillion opened this issue Feb 6, 2020 · 4 comments
@mikedillion
Member

Disclaimer: this does take about 5 mins to run through the ___ rows

Just ran through the notebook! Thought I'd throw in this suggestion: use NLTK to filter the feature place names without mutating the original name, and add in other match words like 'Valentine'.

Aside: I've been watching your work in maptimelex! Awesome stuff! Keep maptime alive!

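from nltk import Text
from nltk.tokenize import word_tokenize  # needs the NLTK 'punkt' tokenizer data

def filter_words(row):
    # Keep rows whose FEATURE_NAME contains any match word as a whole token.
    match_words = ['Love', 'Valentine', 'Heart']
    feature_name_text = Text(word_tokenize(row['FEATURE_NAME']))
    for word in match_words:
        if feature_name_text.count(word) > 0:
            return True

    return False

# data_in: the notebook's DataFrame (assumed to be loaded already)
love_df = data_in[data_in.apply(filter_words, axis=1)]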

rgdonohue self-assigned this Feb 6, 2020

@rgdonohue
Contributor

Yes, thanks Mike! Glad to have you still "here."

@rgdonohue
Contributor

Python packages still always a treat after all these years ... 😛

https://stackoverflow.com/questions/30822131/nltk-package-errors-punkt-and-pickle

@mikedillion
Member Author

> Python packages still always a treat after all these years ... 😛
>
> https://stackoverflow.com/questions/30822131/nltk-package-errors-punkt-and-pickle

Aw, jeez I always forget about that and take it for granted.

@rgdonohue
Contributor

Worked fine after installing the punkt model. Returned way fewer results ... Looks like it's not returning places like "Loveland" ...
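
A likely reason "Loveland" drops out: the suggested filter counts whole tokens, so 'Love' only matches a token that is exactly 'Love'. Here is a rough sketch of the difference, using plain Python so it runs without NLTK. `exact_token_match` is only a stand-in for the `Text.count` check (it splits on whitespace rather than calling `word_tokenize`), and both function names plus the sample place names are made up for illustration.

```python
import re

MATCH_WORDS = ["Love", "Valentine", "Heart"]

def exact_token_match(name, words=MATCH_WORDS):
    """Stand-in for the whole-token check: case-sensitive, exact tokens only."""
    tokens = name.split()  # assumption: whitespace split approximates the tokenizer here
    return any(word in tokens for word in words)

def substring_match(name, words=MATCH_WORDS):
    """Case-insensitive substring check, so 'Loveland' matches 'Love'."""
    pattern = "|".join(re.escape(word) for word in words)
    return re.search(pattern, name, flags=re.IGNORECASE) is not None

# Hypothetical sample names, not taken from the dataset
for name in ["Love Creek", "Loveland", "Heartbreak Ridge", "Clover Hill"]:
    print(name, exact_token_match(name), substring_match(name))
```

Substring matching brings back "Loveland" but also lets in names like "Clover Hill", so the results probably need a manual pass. In the notebook, something along the lines of `data_in[data_in['FEATURE_NAME'].str.contains('love|valentine|heart', case=False, na=False)]` should give the same behavior, assuming `FEATURE_NAME` is a string column.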

Also thinking about that altitude attribute of the places ... could channel Steve Winwood "Higher Love" and lose the younger audience completely lol.
