CoP: Data Science: SEIE Survey Analysis #26

Closed
12 of 13 tasks
ExperimentsInHonesty opened this issue Feb 26, 2021 · 23 comments

Comments

@ExperimentsInHonesty
Member

ExperimentsInHonesty commented Feb 26, 2021

Overview

We need to report on our progress for our partner at the Department of Neighborhood Empowerment

Action Items

Phase 1--Completed by Sathwik:

  • NLP engineering/analysis on a large survey featuring free text columns.
  • Run TF-IDF analysis
  • Create word clouds
  • Create a presentation to show data to DONE and neighborhood council members. Tooling could include scikit-learn, spaCy, and pandas. First version here

Phase 2--Picked up by Henry (8.2.2021):

  • Create filtering system on Google Sheets (labels)
  • Revisit preprocess script and apply to dataset (e.g., remove stopwords, punctuation, duplicates)
  • Frequency counts on most common unigrams, bigrams, and trigrams for one question
    - [x] On the entire dataset
    - [x] By Region
  • Create clear view of responses containing key phrases
  • Number of council members who mention the "top" topics
  • Present to Julien
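
For anyone picking up the Phase 2 items above, here is a minimal sketch of the preprocessing and n-gram counting steps. It uses only the Python standard library rather than the project's actual preprocess script, which is not shown in this issue. The stopword list, the `(region, response_text)` row shape, the sample rows, and the function names are all assumptions made for illustration.

```python
import re
import string
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list; a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "we", "our", "more"}

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords; returns a token list."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(rows, n):
    """Count n-grams overall and per region.

    `rows` is an iterable of (region, response_text) pairs. Responses that are
    identical after preprocessing within the same region are counted once
    (an assumption about how duplicates should be handled).
    """
    overall = Counter()
    by_region = defaultdict(Counter)
    seen = set()
    for region, text in rows:
        tokens = preprocess(text)
        key = (region, tuple(tokens))
        if not tokens or key in seen:
            continue
        seen.add(key)
        grams = ngrams(tokens, n)
        overall.update(grams)
        by_region[region].update(grams)
    return overall, dict(by_region)


# Made-up sample rows, for illustration only.
sample_rows = [
    ("Region A", "We need more funding for outreach."),
    ("Region A", "We need more funding for outreach."),  # duplicate, skipped
    ("Region B", "More outreach funding, and training for board members!"),
    ("Region B", "Training for board members is needed."),
]

overall_bigrams, region_bigrams = ngram_counts(sample_rows, 2)
print(overall_bigrams.most_common(3))
```

Calling `ngram_counts(sample_rows, 1)` or `ngram_counts(sample_rows, 3)` gives the unigram and trigram versions. Note that stopwords are removed before the n-grams are built, so a bigram here can span a dropped word; whether that is desirable depends on how the results will be read.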

Phase 3

  • Expand to the rest of the questionnaire
  • Create dashboard with results of quantitative analysis, showing comparisons of themes by region and comparisons of regions grouped by theme. (Dashboard link)
  • Develop presentation/write-up for final delivery to Julien

Resources/Instructions

Currently Underway:
@henrykaplan

@ExperimentsInHonesty
Member Author

@RinkalAgarwal can self assign when she has accepted the github invite.

@ExperimentsInHonesty
Member Author

@sathwikkes Please provide process update

  1. Progress
  2. Blockers
  3. Availability
  4. ETA for total completion (if possible); otherwise, indicate when you will be delivering the next progress report.

@sathwikkes
Member

sathwikkes commented Feb 26, 2021

  1. Progress: loaded survey data and performed basic exploratory data analysis such as bar plots and word clouds, identified word pairings using CountVectorizer
  2. Blockers: need to preprocess and clean text to generate better results with less noise, lowercase all text to stay consistent
  3. Future: standardizing text data, removing irrelevant information, produce keyword/name recognition

Screenshots of sample work are provided below.

[Image: Screen Shot 2021-02-25 at 9 27 30 PM]

[Image: Screen Shot 2021-02-25 at 9 16 34 PM]

@RinkalAgarwal self-assigned this on Mar 3, 2021
@ExperimentsInHonesty
Member Author

@RinkalAgarwal @sathwikkes
Please provide update

  1. Progress
  2. Blockers
  3. Availability
  4. ETA

@sathwikkes
Member

  1. Progress: standardized/normalized text, ran a TF-IDF vectorizer on a sample column, removed irrelevant content, touched on entity recognition, added more visualizations
  2. Blockers/Future: updating word clouds with acquired weights, performing sentiment analysis, documenting results
  3. ETA/Availability: next data science meeting
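
As a rough illustration of the TF-IDF step mentioned above, and of how the resulting weights might feed the word clouds, here is a minimal standard-library sketch. It is not the notebook's code: the thread says a TF-IDF vectorizer was used, while this version computes the weights by hand so the arithmetic is visible. The smoothed IDF form `ln((1 + N) / (1 + df)) + 1` is an assumption chosen for the sketch, and the sample documents and function names are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = term count / document length
    idf = ln((1 + N) / (1 + df)) + 1   (smoothed; an assumption for this sketch)
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def corpus_weights(docs):
    """Sum per-document TF-IDF weights into a single weight per term."""
    totals = Counter()
    for doc_weights in tfidf(docs):
        totals.update(doc_weights)
    return totals


# Made-up, already-tokenized responses, for illustration only.
docs = [
    ["funding", "outreach"],
    ["funding", "training"],
    ["training", "board", "training"],
]

weights = tfidf(docs)
totals = corpus_weights(docs)
print(totals.most_common(3))
```

A term-to-weight mapping like `totals` is the kind of input a word-cloud library can take in place of raw counts, so that rarer but distinctive terms are not drowned out by common ones. Because this sketch normalizes term frequency by document length and skips vector normalization, its numbers will not match a library vectorizer's output exactly.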

@sathwikkes
Member

  1. Progress: created, finished, and workshopped the presentation
  2. Blockers/Future: waiting on client availability
  3. ETA/Availability: next data science meeting

@snooravi

Presented an update to Julien here, based on progress made in the Colab notebook

@snooravi

Affinity Map here

@akhaleghi added the epic: missing, feature: missing, role: missing, and size: missing labels and removed the feature: missing label on Nov 2, 2021
@ExperimentsInHonesty added the project: missing label on Nov 5, 2021
@akhaleghi added the role: data science, project: seie, and epic: empowerla.org labels and removed the role: missing, project: missing, and epic: missing labels on Nov 12, 2021
@akhaleghi
Contributor

@henrykaplan We're adding "size" labels to issues to help volunteers understand the time commitment to complete the task. Please add this based on your estimate of how many hours this should take. Please see #131

@henrykaplan added the size: 8pt (can be done in 31-48 hours) label and removed the size: missing label on Dec 1, 2021
@snooravi

snooravi commented Jan 14, 2022

Next steps:

  • Recruit UI designer for Map visual
  • Validation of region mapping to current NC dataset. Compare with output from empower ArcGIS Dash
  • High-level counts (# of responses, # of unique responses, # of respondents, respondents by region, etc.)
  • Henry to provide a list of top themes by region
  • Gender and Ethnicity aggregation next steps found here
  • Region breakout for themes
  • Re-run analysis for bugs
  • Start write-up
  • Swap/fix "council" with "neighborhood council" in all plots/figures
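
A minimal sketch of how the high-level counts and the per-region theme tallies in the list above could be computed. This is an illustration only: the `(respondent_id, region, response_text)` row shape, the sample rows, the phrase list, and the function names are assumptions, and simple substring matching stands in for whatever theme-matching the analysis actually used.

```python
from collections import Counter, defaultdict


def summarize(rows):
    """High-level counts for (respondent_id, region, response_text) rows."""
    rows = list(rows)
    # Assumes each respondent belongs to a single region.
    respondent_region = {rid: region for rid, region, _ in rows}
    return {
        "responses": len(rows),
        # Responses are compared case-insensitively, ignoring outer whitespace.
        "unique_responses": len({text.strip().lower() for _, _, text in rows}),
        "respondents": len(respondent_region),
        "respondents_by_region": dict(Counter(respondent_region.values())),
    }


def respondents_mentioning(rows, phrases):
    """For each phrase, count distinct respondents per region who mention it."""
    mentions = {phrase: defaultdict(set) for phrase in phrases}
    for rid, region, text in rows:
        lowered = text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:  # plain substring match
                mentions[phrase][region].add(rid)
    return {
        phrase: {region: len(ids) for region, ids in regions.items()}
        for phrase, regions in mentions.items()
    }


# Made-up sample rows, for illustration only.
rows = [
    (1, "Region A", "More funding for outreach"),
    (1, "Region A", "Training for new board members"),
    (2, "Region A", "more funding for outreach"),
    (3, "Region B", "Outreach training would help"),
]

summary = summarize(rows)
theme_counts = respondents_mentioning(rows, ["outreach", "training"])
print(summary)
print(theme_counts)
```

Counting distinct respondents rather than raw matches keeps one person who repeats a theme across several answers from inflating that theme's total.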

@snooravi

Today we aligned with Randy Rios to join as the UI Designer to develop the map. He will begin by developing icons for the 10 themes.

@akhaleghi
Contributor

Hey @snooravi, this issue hasn't had any updates for a few months, so please provide updates to any of the following that are applicable:

Progress: "What is the current status of your project? What have you completed and what is left to do?"
Blockers: "Difficulties or errors encountered."
Availability: "How much time will you have this week to work on this issue?"
ETA: "When do you expect this issue to be completed?"
Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."

@akhaleghi
Contributor

@henrykaplan Is everything completed on this issue and ready to be closed?

@henrykaplan
Member

> @henrykaplan Is everything completed on this issue and ready to be closed?

@akhaleghi I'm going to be doing one more step on my end, creating a dashboard that shows just the results of the quantitative part of this analysis. I've added that step to the summary above.

@akhaleghi
Contributor

@henrykaplan do you have an ETA for the dashboard? It seems like a bigger task so I just want to make sure you don't have any blockers for this.

@henrykaplan
Member

@akhaleghi No blockers on this. The idea is that it will just be a display of the work I’ve done so far, so I won’t need anything from anyone else to complete it. Let’s tentatively say June 2nd as the ETA for a first draft.

@akhaleghi
Contributor

Hey @henrykaplan can we review this first draft at our next community of practice meeting on June 9th?

@henrykaplan
Member

> Hey @henrykaplan can we review this first draft at our next community of practice meeting on June 9th?

@akhaleghi Yes, let's plan on that.

@akhaleghi
Contributor

@henrykaplan are there any recent updates on this issue?

@henrykaplan
Member

A working draft of the dashboard is here: https://datastudio.google.com/u/0/reporting/45c01785-e1ac-447f-8442-d617e0193d7a/page/Y8wrC

I'll be looking at it again to edit the text and tweak the visuals, but the calculations and analysis are complete.

@akhaleghi
Contributor

Hey @henrykaplan, I think there is just one action item left on this issue (developing a presentation). Is that something you wanted to work on, or can we assign it to another volunteer?

@henrykaplan
Member

Hey @akhaleghi,

Last I heard, the project leads were leaning towards a write-up rather than a presentation as a final deliverable, but were still undecided about what it should look like. I'm planning to step off this project and on to something else, so I created the Google Data Studio dashboard as the final write-up and results for everything I've personally worked on or know closely. That's mainly the natural language analysis side of this project.

My goal was that the dashboard could either be the final deliverable, or, should you want to continue changing or adding to the analysis, a way to hand off the work I've done so far to the next volunteer. So I think the project leads should take a look at the draft dashboard and decide where they want to take it from here. And then I'll do whatever remains to be done to package up the analysis I've done, whether for the client or for another volunteer.

@akhaleghi
Contributor

@henrykaplan are there any loose ends that need to be tied up on this issue or is the project complete?

Projects
Status: Filled
Development

No branches or pull requests

6 participants