Windows update to 5.27.2015 master branch #36

Open
wants to merge 9 commits into
base: master
from
41 changes: 41 additions & 0 deletions CMUTweetTaggerWindows.py
@@ -0,0 +1,41 @@
__author__ = 'KevinZhao'
# You can use this to run the CMU Tweet NLP package(http://www.ark.cs.cmu.edu/TweetNLP/)
# First, download the package at https://github.com/brendano/ark-tweet-nlp/
# Second, put everything in the project directory where you are running the python script

import subprocess
import codecs
import os
import psutil
import tempfile

def runFile(fileName):
    p = subprocess.Popen('java -XX:ParallelGCThreads=2 -Xmx500m -jar ark-tweet-nlp-0.3.2.jar "'+ fileName + '"',stdout=subprocess.PIPE)
    file_name = 'tagged_tweets_%s.txt' % os.getpid()
    o = codecs.open(file_name,'w','utf-8')
    # Read until EOF; polling the process can drop output still buffered after it exits
    for l in iter(p.stdout.readline, b''):
        o.write(l.decode('utf-8'))
    o.close()
    p.wait()

def runString(s):
    file_name = 'temp_file_%s.txt' % os.getpid()
    o = codecs.open(file_name,'w','utf-8')
    uniS = s.decode('utf-8')
    o.write(uniS)
    o.close()
    l = ''
    p = subprocess.Popen('java -XX:ParallelGCThreads=2 -Xmx500m -jar ark-tweet-nlp-0.3.2.jar ' + file_name,stdout=subprocess.PIPE)

    # Read only the first line of tagger output (one tweet per call)
    l = p.stdout.readline()

    p.kill()
    # Wait for the JVM to exit so the temp file is released before it is removed
    p.wait()

    os.remove(file_name)
    # Running one tweet at a time is much slower because the tagger restarts on every call.
    # We recommend putting all sentences into one file and tagging the whole file with runFile above.
    return l
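
The comments at the end of runString recommend putting all sentences into one file and tagging it with runFile. A minimal sketch of that batching step follows. The helper name write_batch_file and the file name tweets_batch_<pid>.txt are hypothetical and not part of this pull request. The sketch assumes the tagger treats each input line as one tweet, so line breaks inside a tweet are replaced with spaces.

```python
import io
import os
import tempfile


def write_batch_file(tweets, path):
    """Write one tweet per line so the tagger can process them in a single run.

    Assumption: the tagger treats each input line as one tweet, so embedded
    line breaks are replaced with spaces to keep output lines aligned with input.
    """
    with io.open(path, 'w', encoding='utf-8') as f:
        for tweet in tweets:
            f.write(u' '.join(tweet.splitlines()) + u'\n')
    return path


if __name__ == '__main__':
    tweets = [u'this is a message', u'two line\ntweet #example']
    path = os.path.join(tempfile.gettempdir(), 'tweets_batch_%s.txt' % os.getpid())
    write_batch_file(tweets, path)
    # With the wrapper from this pull request importable, the whole file
    # can then be tagged in one JVM run:
    #   import CMUTweetTaggerWindows
    #   CMUTweetTaggerWindows.runFile(path)
    os.remove(path)
```

Batching this way pays the JVM start-up cost once per file instead of once per tweet, which is the slowdown the runString comments warn about.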
13 changes: 13 additions & 0 deletions README.txt
@@ -1,3 +1,16 @@
For this branch only:

If you are on a Windows machine, you should use the CMUTweetTaggerWindows.py file to run the tagger.

It is written for Windows and might not work on Mac or Linux.

Please contact Kevin Zhao([email protected]) if you encounter any trouble with the Windows wrapper.

You can find the Python wrapper for Mac/Linux OS at https://github.com/ianozsvald/ark-tweet-nlp-python. It's a pretty awesome
program.

For all branches:

CMU ARK Twitter Part-of-Speech Tagger v0.3.2
http://www.ark.cs.cmu.edu/TweetNLP/
