1. Welcome to the NLTweets Project
As mentioned on the wiki Home page, we're building a platform that lets teams with little machine learning experience run a natural language processing project for user research. Our goal is for any team to be able to follow their intuition about which questions they might answer by analyzing Twitter data. To build the most usable platform for the most teams, we will make some assumptions, test them by interviewing other project teams, and then narrow the scope of potential workflows. The range of possible ML uses is effectively unlimited, but our tool will point in a very focused direction, and we will obsess over removing every friction point for new users so that people can start doing meaningful work straight away. At this point, we think the tool will provide the following features:
- Get the raw data from Twitter and store it in a database
- Prompt the user to optionally upload files with their own domain-specific dictionaries and lists
- Run the raw data through an assembly line of processes that make it most useful for ML analysis
- Provide a platform for crowdsourced data labeling
- Add components to the pipeline for classifying tweets by topic, aggregating statistics on the most popular subjects, finding new popular topics, etc.
- Possibly provide an API endpoint that feeds project applications with updated results for informational content
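To make the "assembly line" idea concrete, here is a minimal sketch of how cleaning steps and a user-supplied domain dictionary might be chained. Everything in it is an assumption for illustration only: the function names (`tokenize`, `drop_stopwords`, `map_domain_terms`, `run_pipeline`), the tiny stopword set, and the sample dictionary are hypothetical, and the real pipeline may be organized quite differently.

```python
import re
from typing import Callable, Dict, Iterable, List

# Hypothetical step type: each step turns a list of tokens into a new list.
Step = Callable[[List[str]], List[str]]

URL_RE = re.compile(r"https?://\S+")


def tokenize(text: str) -> List[str]:
    """Lowercase the tweet, strip URLs, and split into word, @mention, and #hashtag tokens."""
    text = URL_RE.sub(" ", text.lower())
    return re.findall(r"[@#]?\w+", text)


def drop_stopwords(stopwords: Iterable[str]) -> Step:
    """Build a step that removes common words carrying little meaning."""
    stop = set(stopwords)
    return lambda tokens: [t for t in tokens if t not in stop]


def map_domain_terms(domain_terms: Dict[str, str]) -> Step:
    """Build a step that rewrites tokens using a user-supplied domain dictionary."""
    return lambda tokens: [domain_terms.get(t, t) for t in tokens]


def run_pipeline(text: str, steps: List[Step]) -> List[str]:
    """Tokenize one tweet, then pass the tokens through each step in order."""
    tokens = tokenize(text)
    for step in steps:
        tokens = step(tokens)
    return tokens


# Illustrative inputs only; a real project would load these from uploaded files.
steps = [
    drop_stopwords({"the", "is", "a", "of"}),
    map_domain_terms({"ux": "user_experience"}),
]
cleaned = run_pipeline("The UX of #NLTweets is great https://example.com", steps)
print(cleaned)
```

Keeping each step as a small function that takes and returns a token list means new stages (for example, a topic classifier) could be appended to `steps` without changing the rest of the sketch.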
On the Getting Started page, you'll find an overview of how the project is organized across GitHub, Google Drive, and Slack, but here's a breakdown of the technologies and skills involved in building the platform:
- Python
- Jupyter Notebooks
- Twitter API
- API queries
- Transforming data between stores, such as from CSV to DataFrame, or from DataFrame to database table
- NLP basics and best practices
- Labeling for training data
- Advanced NLP techniques and packages
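As an example of the data-store transformations listed above, the following sketch loads CSV text into a pandas DataFrame and then writes it to a database table. It assumes pandas is installed and uses an in-memory SQLite database purely for illustration. The column names (`tweet_id`, `text`, `label`), the table name `tweets`, and the sample rows are all made up, and the project's actual database may not be SQLite.

```python
import io
import sqlite3

import pandas as pd

# Made-up sample data standing in for a CSV export of labeled tweets.
csv_text = io.StringIO(
    "tweet_id,text,label\n"
    "1,Loving the new release,positive\n"
    "2,App keeps crashing,negative\n"
)

# CSV -> DataFrame
df = pd.read_csv(csv_text)

# DataFrame -> database table (in-memory SQLite, chosen only for the example)
conn = sqlite3.connect(":memory:")
df.to_sql("tweets", conn, if_exists="replace", index=False)

# Database table -> DataFrame, to check the round trip with a simple query
negatives = pd.read_sql_query(
    "SELECT tweet_id, text FROM tweets WHERE label = 'negative'", conn
)
print(negatives)

conn.close()
```

In a notebook, the same three calls (`read_csv`, `to_sql`, `read_sql_query`) would typically run against a file path and a persistent database rather than the in-memory stand-ins used here.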