Scrape Instagram #10

Closed
mattmotoki opened this issue Oct 24, 2018 · 5 comments
Labels: in progress (Work is in progress)

Comments

@mattmotoki
Contributor

mattmotoki commented Oct 24, 2018

Scrape Instagram for images of plants on the following lists:

Upload all data to our shared images directory.

@mattmotoki added the "in progress" label Oct 24, 2018
@mattmotoki changed the title from "Obtain training dataset" to "Obtain common plant dataset" Oct 24, 2018
@mattmotoki assigned joker2600 and unassigned mattmotoki Oct 24, 2018
@mattmotoki changed the title from "Obtain common plant dataset" to "Collect common plant dataset" Oct 25, 2018
@xyl012

xyl012 commented Oct 26, 2018

Scraped Instagram for ~400 pictures per hashtag. The set includes non-plant pictures. I will go through the images depending on how many are needed for each class. I'm running Inception and MobileNet classifiers on the noisy data. Out of the box and with the noisy data, Inception v3 reaches 65-70% validation accuracy using https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py

https://drive.google.com/drive/folders/1TauCtJxGnEOYLX0_iKVXltuS2hNVNf3d?usp=sharing
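Since the cleanup depends on how many images each class needs, a per-class count is a useful first check. The following is a minimal sketch, assuming the scraped images sit in one subfolder per hashtag/class; the function name, folder layout, and the set of file suffixes are illustrative assumptions, not part of the code used in this thread.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image file suffixes; adjust to what the scraper actually saves.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(image_dir):
    """Return {class_name: image_count} for a folder-per-class layout."""
    counts = Counter()
    for class_dir in sorted(Path(image_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files at the top level
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return dict(counts)
```

Calling `count_images_per_class("common_v2")` on a local copy of the folder would show which classes are short on images before or after manual cleaning.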

@xyl012

xyl012 commented Oct 27, 2018

Deleted misclassified images, ~2600 images total in common_v2

@mattmotoki
Contributor Author

Nice, the images in common_v2 look a lot cleaner! Can you share the code you are using too (I'm assuming it's a bit more than just a straight application of the link you posted)? Can we apply whatever you are using to the native and invasive lists too?

@xyl012

xyl012 commented Oct 27, 2018

I went through all the pictures for V2 manually to make sure the ground truth is reliable. The pictures all come from mobile/everyday use, so the settings vary widely, from selfies to coffee shop commercials. To scrape the posts I used this GitHub project, the original instagram scraper, on a free AWS instance (fastest internet). The only changes were setting --maximum 500 posts and adding a command-line loop to cover 7 hashtags. I'm not sure about the native/invasive lists yet, but I'm in the process of testing. The TensorFlow retraining script uses folders as classes, so I don't need to create a CSV, but one should be straightforward to produce with ls >. A simple use of the example is here: retraining as submodule.
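The two steps described above (a loop over hashtags with a 500-post cap, and a CSV built from the folder-per-class layout) can be sketched roughly as follows. This is an illustration under assumptions, not the code used in this thread: it assumes the scraper is the `instagram-scraper` command-line tool with its `--tag` and `--maximum` options, and the function names and CSV column names are hypothetical. The first function only builds the command lines and does not run them.

```python
import csv
from pathlib import Path


def build_scrape_commands(hashtags, maximum=500):
    """Build one scraper command per hashtag (assumed instagram-scraper CLI)."""
    return [
        ["instagram-scraper", tag, "--tag", "--maximum", str(maximum)]
        for tag in hashtags
    ]


def write_manifest(image_dir, csv_path):
    """Write a (path, label) CSV where each subfolder name is the class label.

    Returns the number of image rows written.
    """
    rows = []
    for class_dir in sorted(Path(image_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # only subfolders define classes
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file():
                rows.append((str(image_path), class_dir.name))
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "label"])  # hypothetical column names
        writer.writerows(rows)
    return len(rows)
```

Each command list could be passed to `subprocess.run` once the hashtag names are filled in. The manifest is only needed for tools that expect a CSV, since the retraining script reads the class folders directly.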

@mattmotoki
Contributor Author

Wow very nice! Sounds good. Thanks Chris!

@mattmotoki changed the title from "Collect common plant dataset" to "Scrape Instagram" Oct 28, 2018
3 participants