From 402e59519884ce8c16600c124d43d524e1b9c600 Mon Sep 17 00:00:00 2001
From: "Win Cowger, PhD"
Date: Sat, 22 Jul 2023 15:20:50 -0700
Subject: [PATCH] Review1 (#181)

* my workflow
* example templates for manuscript.
* manuscript info
* update citations
* cleanup text
* cite
* update readme
* contribute
* add website badge
* update web link
* Update README.md
* Update README.md
* add video link
* auto update
* utility files device specific
* move the pdf generator to workflows
* test error in paper.bib
* test citation style
* add citations
* add a few more citations
* try without space
* add all
* no spaces allowed in citation names
* add demo images
* Update paper.md
* give images some space
* image captions
* add mention of wade
* update authors and acknowledgements
* Updated with Dan's Recs
* Mary Comments https://github.com/code4sac/trash-ai/pull/69#issuecomment-1257554556
* SteveO comments https://github.com/code4sac/trash-ai/pull/69#pullrequestreview-1119350215
* Walter's comments https://github.com/code4sac/trash-ai/pull/69#pullrequestreview-1119351559
* Create config.yml
* Create bug.yml
* Create feature.yml
* hyperlink
* Update bug.yml
* Update feature.yml
* Added in Kristiina and Kris's comments
* Update README.md
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1006489065
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1006514485
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1006517045
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1006519572
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1006536610
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1007668618
* add kris's comments
* add WSL2 link
* Update paper.md
* add acknowledgements
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1033145379
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1033146187
* https://github.com/code4sac/trash-ai/pull/69#discussion_r1024726101
* add submitted badge
* update dois
* Update paper.md https://github.com/code4sac/trash-ai/issues/122
* Update paper.md add Elizabeth to ack
* Update paper.md add funder possibility lab
* Update paper.md add funder.
* Update README.md
* add r code for analyzing data
* remove unnecessary code.
* trying to fix the unexpected period issue, not sure where it is coming from.
* revert bib to test
* add dois
* update paper acknowledgments and links.
* add comments.
* remove empty line
* Update about.vue update about
* Update about.vue add tutorial
* Update README.md update video
* Update paper.md update video
---
 README.md                           |  6 +++-
 docs/localdev.md                    |  2 +-
 frontend/src/views/about.vue        |  9 ++++--
 notebooks/data_reader/data_reader.R | 50 +++++++++++++++++++++++++++++
 paper.bib                           | 12 +++----
 paper.md                            | 16 ++++-----
 6 files changed, 76 insertions(+), 19 deletions(-)
 create mode 100644 notebooks/data_reader/data_reader.R

diff --git a/README.md b/README.md
index 6b5ec26..50dfdfe 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
 # Trash AI: Web application for serverless image classification of trash

 [![Website](https://img.shields.io/badge/Web-TrashAI.org-blue)](https://www.trashai.org)
+[![status](https://joss.theoj.org/papers/6ffbb0f89e6c928dad6908a02639789b/status.svg)](https://joss.theoj.org/papers/6ffbb0f89e6c928dad6908a02639789b)

 ### Project Information
@@ -14,7 +15,7 @@ Trash AI is a web application where users can upload photos of litter, which wil
 #### Demo

-[![image](https://user-images.githubusercontent.com/26821843/188515526-33e1196b-6830-4187-8fe4-e68b2bd4019e.png)](https://youtu.be/HHrjUpQynUM)
+[![image](https://user-images.githubusercontent.com/26821843/188515526-33e1196b-6830-4187-8fe4-e68b2bd4019e.png)](https://youtu.be/u0DxGrbPOC0)

 ## Deployment
@@ -71,6 +72,9 @@ docker rm -v $id

 - Runs the complex stuff so you don't have to.

+### Tests
+Instructions for automated and manual tests are [here](https://github.com/code4sac/trash-ai/tree/production/frontend/__tests__).
+
 ## Contribute

 We welcome contributions of all kinds.

diff --git a/docs/localdev.md b/docs/localdev.md
index 05d1f5e..d8f669a 100644
--- a/docs/localdev.md
+++ b/docs/localdev.md
@@ -16,12 +16,12 @@ These values can be adjusted by editing the localdev env file [.env](../localdev
 It's suggested you work in branch `local` by creating your own local branch when developing
 Pushing / merging PR's to any branches with a prefix of `aws/` will trigger deployment actions
 For full functionality you will want to get a Google Maps API key and name it VITE_GOOGLE_MAPS_API_KEY, but it is not required
-=======
 Pushing / merging PR's to any branches with a prefix of `aws/` will trigger deployment actions. When developing locally, create a new branch and submit a pull request to `aws/trashai-staging`
+
 ---
 # Set up

diff --git a/frontend/src/views/about.vue b/frontend/src/views/about.vue
index 91264f2..226b672 100644
--- a/frontend/src/views/about.vue
+++ b/frontend/src/views/about.vue
@@ -23,6 +23,10 @@
 To get started, visit the Upload Tab or click here.

+

Tutorial

+

+ +

What is it?

@@ -54,9 +58,8 @@

Disclaimer about uploaded images

The current version of Trash AI and the model we are using is just a
- start! When you upload an image, we are storing the image and the
- classification in an effort to expand the trash dataset and improve
- the model over time.
+ start! The tool works best for images of individual pieces of trash taken less than 1 meter from the camera.
+ We are looking for collaborators who can help us improve this project.

Reporting issues and improvements

diff --git a/notebooks/data_reader/data_reader.R b/notebooks/data_reader/data_reader.R
new file mode 100644
index 0000000..a8855ba
--- /dev/null
+++ b/notebooks/data_reader/data_reader.R
@@ -0,0 +1,50 @@
+# Working directory ----
+setwd("notebooks/data_reader") # Change this to your working directory
+
+# Libraries ----
+library(rio)
+library(jsonlite)
+library(ggplot2)
+library(data.table)
+
+# Data import ----
+json_list <- import_list("example_data_download2.zip")
+
+# Get path of the summary table.
+summary_metadata <- names(json_list)[grepl("summary.json", names(json_list))]
+
+# Get paths of the per-image metadata files (excluding images, schema, and summary).
+image_metadata <- names(json_list)[!grepl("(.jpg)|(.png)|(.tif)|(schema)|(summary)", names(json_list))][-1]
+
+# Filter the summary data.
+summary_json <- json_list[[summary_metadata]]
+
+# Flatten the summary data.
+flattened_summary <- data.frame(name = summary_json$detected_objects$name,
+                                count = summary_json$detected_objects$count)
+# Filter the image data.
+image_json <- json_list[image_metadata]
+
+# Flatten the image data, one row per detection.
+flattened_images <- lapply(seq_along(image_json), function(i){
+    print(i) # Progress indicator.
+    data.frame(hash = image_json[[i]]$hash,
+               filename = image_json[[i]]$filename,
+               datetime = if(!is.null(image_json[[i]]$exifdata$DateTimeOriginal)){image_json[[i]]$exifdata$DateTimeOriginal} else{NA},
+               latitude = if(!is.null(image_json[[i]]$exifdata$GPSLatitude)){image_json[[i]]$exifdata$GPSLatitude} else{NA},
+               longitude = if(!is.null(image_json[[i]]$exifdata$GPSLongitude)){image_json[[i]]$exifdata$GPSLongitude} else{NA},
+               score = if(!is.null(image_json[[i]]$metadata$score)){image_json[[i]]$metadata$score} else{NA},
+               label = if(!is.null(image_json[[i]]$metadata$label)){image_json[[i]]$metadata$label} else{NA})
+}) |>
+    rbindlist()
+
+# Check that per-image detections match the summary counts.
+nrow(flattened_images[!is.na(flattened_images$label),]) == sum(flattened_summary$count)
+
+# Figure creation ----
+ggplot(flattened_summary, aes(y = reorder(name, count), x = count, fill = name)) +
+    geom_bar(stat = "identity") +
+    theme_classic(base_size = 15) +
+    theme(legend.position = "none") +
+    labs(x = "Count", y = "Type")
+

diff --git a/paper.bib b/paper.bib
index 562d126..36defc9 100644
--- a/paper.bib
+++ b/paper.bib
@@ -88,7 +88,8 @@ @ARTICLE{Hapich:2022
   number = 1,
   pages = "15",
   month = jun,
-  year = 2022
+  year = 2022,
+  doi = "10.1186/s43591-022-00035-1"
 }

 @misc{Waterboards:2018,
@@ -108,11 +109,10 @@ @article{vanLieshout:2020
 number = {8},
 pages = {e2019EA000960},
 keywords = {plastic pollution, object detection, automated monitoring, deep learning, artificial intelligence, river plastic},
- doi = {https://doi.org/10.1029/2019EA000960},
+ doi = {10.1029/2019EA000960},
 url = {https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2019EA000960},
 eprint = {https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2019EA000960},
 note = {e2019EA000960 10.1029/2019EA000960},
- abstract = {Abstract Quantifying plastic pollution on surface water is essential to understand and mitigate the impact of plastic pollution to the environment. Current monitoring methods such as visual counting are labor intensive. This limits the feasibility of scaling to long-term monitoring at multiple locations. We present an automated method for monitoring plastic pollution that overcomes this limitation. Floating macroplastics are detected from images of the water surface using deep learning. We perform an experimental evaluation of our method using images from bridge-mounted cameras at five different river locations across Jakarta, Indonesia. The four main results of the experimental evaluation are as follows. First, we realize a method that obtains a reliable estimate of plastic density (68.7\% precision). Our monitoring method successfully distinguishes plastics from environmental elements, such as water surface reflection and organic waste. Second, when trained on one location, the method generalizes well to new locations with relatively similar conditions without retraining (≈50\% average precision). Third, generalization to new locations with considerably different conditions can be boosted by retraining on only 50 objects of the new location (improving precision from ≈20\% to ≈42\%). Fourth, our method matches visual counting methods and detects ≈35\% more plastics, even more so during periods of plastic transport rates of above 10 items per meter per minute. Taken together, these results demonstrate that our method is a promising way of monitoring plastic pollution. By extending the variety of the data set the monitoring method can be readily applied at a larger scale.},
 year = {2020}
 }
@@ -145,7 +145,8 @@ @ARTICLE{Lynch:2018
   number = 1,
   pages = "6",
   month = jun,
-  year = 2018
+  year = 2018,
+  doi = "10.1186/s40965-018-0050-y"
 }
@@ -176,7 +177,7 @@ @article{Majchrowska:2022
 pages = {274-284},
 year = {2022},
 issn = {0956-053X},
-doi = {https://doi.org/10.1016/j.wasman.2021.12.001},
+doi = {10.1016/j.wasman.2021.12.001},
 url = {https://www.sciencedirect.com/science/article/pii/S0956053X21006474},
 author = {Sylwia Majchrowska and Agnieszka Mikołajczyk and Maria Ferlin and Zuzanna Klawikowska and Marta A. Plantykow and Arkadiusz Kwasigroch and Karol Majek},
 keywords = {Object detection, Semi-supervised learning, Waste classification benchmarks, Waste detection benchmarks, Waste localization, Waste recognition},
@@ -193,4 +194,3 @@ @misc{Proença:2020
 year = {2020},
 copyright = {arXiv.org perpetual, non-exclusive license}
 }
-

diff --git a/paper.md b/paper.md
index 57d97f6..da70152 100644
--- a/paper.md
+++ b/paper.md
@@ -52,10 +52,10 @@ bibliography: paper.bib
 Although computer vision classification routines have been created for trash, they have not been accessible to most researchers due to the challenges in deploying the models. Trash AI is a web GUI (Graphical User Interface) for serverless computer vision classification of individual items of trash within images, hosted at www.trashai.org. With a single batch upload and download, a user can automatically describe the types and quantities of trash in all of their images.

 # Statement of need
-Trash in the environment is a widespread problem that is difficult to measure. Policy makers require high quality data on trash to create effective policies. Classical measurement techniques require surveyors with pen and paper to manually quantify every piece of trash at a site. This method is time-consuming. Scientists are actively trying to address this issue by using imaging to better understand the prevalence and distribution of trash in an `efficient yet effective manner` [@Majchrowska:2022; @Proença:2020; @Moore:2020; @vanLieshout:2020; @WADEAI:2020; @Lynch:2018; @Wuu:2018; @Waterboards:2018]. An app-based reporting of trash using cell phones, laptops, and other devices has been a `valuable solution` [@Lynch:2018]. Applications for AI in detecting trash currently include: images from `bridges` [@vanLieshout:2020], `drone imaging` [@Moore:2020], cameras on `street sweepers` [@Waterboards:2018], and cell phone app based reporting of `trash` [@Lynch:2018]. Although there are many artificial intelligence algorithms developed for trash classification, none are readily accessible to the average litter researcher. The primary limitation is that artificial intelligence (AI) algorithms are primarily run through programming languages (not graphic user interfaces), difficult to deploy without AI expertise, and often live on a server (which costs money to host). New developments in browser-side AI (e.g., tensorflow.js) and serverless architecture (e.g., AWS Lambda) have created the opportunity to have affordable browser-side artificial intelligence in a web GUI, alleviating both obstacles. We present Trash AI, an open source service for making computer vision available to anyone with a web browser and images of trash.
+Trash in the environment is a widespread problem that is difficult to measure. Policy makers require high-quality data on trash to create effective policies. Classical measurement techniques require surveyors with pen and paper to manually quantify every piece of trash at a site. This method is time-consuming. Scientists are actively trying to address this issue by using imaging to better understand the prevalence and distribution of trash in an `efficient yet effective manner` [@Majchrowska:2022; @Proença:2020; @Moore:2020; @vanLieshout:2020; @WADEAI:2020; @Lynch:2018; @Wuu:2018; @Waterboards:2018]. Image-based reporting of trash using cell phones, laptops, and other devices has been a `valuable solution` [@Lynch:2018]. Applications for AI in detecting trash using imagery currently include: cameras mounted on `bridges` [@vanLieshout:2020], `drone imaging` [@Moore:2020], cameras on `street sweepers` [@Waterboards:2018], and cell phone app-based reporting of `trash` [@Lynch:2018]. Although there are many artificial intelligence algorithms developed for trash classification, none are readily accessible to the average litter researcher. The primary limitation is that artificial intelligence (AI) algorithms are primarily run through programming languages (not graphical user interfaces), are difficult to deploy without AI expertise, and often live on a server (which costs money to host). New developments in browser-side AI (e.g., tensorflow.js) and serverless architecture (e.g., AWS Lambda) have created the opportunity to have affordable browser-side artificial intelligence in a web GUI, alleviating these obstacles. We present Trash AI, an open-source service for making computer vision available to anyone with a web browser and images of trash.

 # Demo
-We have a full video tutorial on [Youtube](https://youtu.be/HHrjUpQynUM)
+We have a full video tutorial on [YouTube](https://youtu.be/u0DxGrbPOC0)

 ## Basic workflow:
 ### 1.
@@ -72,7 +72,7 @@ We have a full video tutorial on [Youtube](https://youtu.be/u0DxGrbPOC0)
 ### 4.

-![View results mapped if your images have location stamp.\label{fig:example4}](https://user-images.githubusercontent.com/26821843/188520745-65ef3270-6093-488a-b501-305ecb436bc1.png)
+![View results mapped if the images have a location stamp.\label{fig:example4}](https://user-images.githubusercontent.com/26821843/188520745-65ef3270-6093-488a-b501-305ecb436bc1.png)

 ### 5.
@@ -93,21 +93,21 @@ We have a full video tutorial on [Youtube](https://youtu.be/u0DxGrbPOC0)
 # Method

 ## Workflow Overview
-Trash AI is trained on the [TACO dataset](http://tacodataset.org/) using [YOLO 5](pytorch.org). Trash AI stores images in [IndexDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API) to keep the data primarily browser side and uses [tensorflow.js](https://www.tensorflow.org/js) to keep analysis browser side too. When images are uploaded to the browser, Trash AI provides the prediction of the model as a graphical output. The raw data from the model and labeled images can be downloaded in a batch download to expedite analyses. Any data uploaded to the platform may be automatically saved to an [S3 bucket](https://aws.amazon.com/s3/), which we can use to improve the model over time.
+Trash AI is trained on the [TACO dataset](http://tacodataset.org/) using [YOLO 5](https://pytorch.org/). Trash AI stores images in [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API) to keep the data primarily browser-side and uses [tensorflow.js](https://www.tensorflow.org/js) to keep analysis browser-side too. When images are uploaded to the browser, Trash AI provides the prediction of the model as a graphical output. The raw data from the model and labeled images can be downloaded in batch to expedite analyses.

 ## AI Training
 The AI model was developed starting with the TACO dataset, which was available with a complimentary Jupyter Notebook on [Kaggle](https://www.kaggle.com/datasets/kneroma/tacotrashdataset). An example notebook was referenced, which used the default `YOLO v5 model` [@Jocher:2020] as the basic model to begin transfer learning. Next, transfer learning was completed using the entire TACO dataset to import the image classes and annotations in the YOLO v5 model.

 ## Limitations
-From our experience, the accuracy of the model varies depending on the quality of the images and their context/background. "Trash" is a word people use for an object that lacks purpose, and the purpose of an object is often not obvious in an image. Trash is a nuanced classification because the same object in different settings will not be considered trash (e.g., a drink bottle on someone's desk vs in the forest laying on the ground). This is the main challenge with any image-based trash detection algorithm. Not everything that LOOKS like trash IS trash. This and other complexities to trash classification make a general trash AI a challenging (yet worthwhile) long-term endeavor. The algorithm is primarily trained on single pieces of trash in the image, with the trash laying on the ground. Thus, model class prediction of trash in these kinds of images will generally be better than trash appearing in aerial images, for example. Additionally, user feedback has shows that the distance of trash from the camera is a critical aspect. The model performs ideally with single pieces of trash in an image less than 1 m away. The model performs less accurately on images when trash which is farther away such as when taken from a vehicle. This is likely due to the training data, TACO dataset, which consists primarily of images of trash close to the camera.
+From our experience, the accuracy of the model varies depending on the quality of the images and their context/background. "Trash" is a word people use for an object that lacks purpose, and the purpose of an object is often not obvious in an image. Trash is a nuanced classification because the same object in different settings will not be considered trash (e.g., a drink bottle on someone's desk vs. in the forest lying on the ground). This is the main challenge with any image-based trash detection algorithm. Not everything that LOOKS like trash IS trash. This and other complexities to trash classification make a general trash AI a challenging (yet worthwhile) long-term endeavor. The algorithm is primarily trained on the TACO dataset, which is composed of images of single pieces of trash, with the trash lying on the ground (< 1 m away). Thus, model class prediction of trash in these kinds of images will generally be better than trash appearing in aerial images or imaged from a vehicle, for example.

 # Availability
-Trash AI is hosted on the web at www.trashai.org. The source code is [available on github](https://github.com/code4sac/trash-ai) with an [MIT license](https://mit-license.org/). The source code can be run offline on any machine that can install [Docker and Docker-compose](www.docker.com). [Documentation](https://github.com/code4sac/trash-ai#ai-for-litter-detection-web-application) is maintained by Code for Sacramento and Open Fresno on Github and will be updated with each release. [Nonexhaustive instructions for AWS deployment](https://github.com/code4sac/trash-ai/blob/manuscript/docs/git-aws-account-setup.md) is available for anyone attempting production level deployment. The image datasets shared to the tool are in an S3 Bucket that needs to be reviewed before being shared with others due to security and moderation concerns but can be acquired by [contacting the repo maintaniers](https://github.com/code4sac/trash-ai/graphs/contributors).
+Trash AI is hosted on the web at www.trashai.org. The source code is [available on GitHub](https://github.com/code4sac/trash-ai) with an [MIT license](https://mit-license.org/). The source code can be run offline on any machine that can install [Docker and Docker-compose](https://www.docker.com). [Documentation](https://github.com/code4sac/trash-ai#trash-ai-web-application-for-serverless-image-classification-of-trash) is maintained by Code for Sacramento and Open Fresno on GitHub and will be updated with each release. [Nonexhaustive instructions for AWS deployment](https://github.com/code4sac/trash-ai/blob/manuscript/docs/git-aws-account-setup.md) are available for anyone attempting production-level deployment.

 # Future Goals
-This workflow is likely to be highly useful for a wide variety of computer vision applications and we hope that people reuse the code for applications beyond trash detection. We aim to increase the labeling of images by creating a user interface that allows users to improve the annotations that the model is currently predicting by manually restructuring the bounding boxes and relabeling the classes. We aim to work in collaboration with the TACO development team to improve our workflow integration to get the data that people share to our S3 bucket into the [TACO training dataset](http://tacodataset.org/) and trained model. Future models will expand the annotations to include the `Trash Taxonomy` [@Hapich:2022] classes and add an option to choose between other models besides the current model.
+This workflow is likely to be highly useful for a wide variety of computer vision applications, and we hope that people reuse the code for applications beyond trash detection. We aim to increase image labeling by creating a user interface that lets users refine the model's predicted annotations by adjusting the bounding boxes and relabeling the classes. We aim to work in collaboration with the TACO development team to improve our workflow integration to get additional data into the [TACO training dataset](http://tacodataset.org/) by creating an option for users to share their data. Future models will expand the annotations to include the `Trash Taxonomy` [@Hapich:2022] classes and add an option to choose between other models besides the current model.

 # Acknowledgements
-Code for Sacramento and Open Fresno led the development of the software tool. The Moore Institute advised on priorities and led the drafting of this manuscript. Let's Do It Foundation assisted with original products leading up to trash AI in the development of WADE AI. We acknowledge the work of the Code for Sacramento and Open Fresno team, brigades of Code for America, without whom this project would not have been possible, and acknowledge the input of the California Water Monitoring Council Trash Monitoring Workgroup. In particular, we would like to acknowledge Kevin Fries, J.Z. Zhang, Joseph Falkner, Democracy Lab, Brad Anderson, Jim Ewald, Don Brower, and University of Houston. We acknowledge financial support from McPike Zima Charitable Foundation.
+Code for Sacramento and Open Fresno led the development of the software tool. The Moore Institute for Plastic Pollution Research advised on priorities and led the drafting of this manuscript. Let's Do It Foundation assisted with original products leading up to Trash AI in the development of WADE AI. We acknowledge the work of the Code for Sacramento and Open Fresno team, brigades of Code for America, without whom this project would not have been possible, and acknowledge the input of the California Water Monitoring Council Trash Monitoring Workgroup. In particular, we would like to acknowledge Gary Conley, Tony Hale, Emin Israfil, Tom Novotny, Margaret McCauley, Julian Fulton, Janna Taing, Elizabeth Pierotti, Kevin Fries, J.Z. Zhang, Joseph Falkner, Democracy Lab, Brad Anderson, Jim Ewald, Don Brower, and University of Houston. We acknowledge financial support from McPike Zima Charitable Foundation, the National Renewable Energy Laboratory, and the Possibility Lab.

 # References
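For readers who want to persist the outputs of the new `notebooks/data_reader/data_reader.R` script, here is a minimal usage sketch. It assumes the objects the script creates (`flattened_images` and `flattened_summary`) already exist in the session, and it reuses only `rio` and `ggplot2` calls already present in the script; the output file names are illustrative, not part of the repository.

```r
# Minimal follow-on sketch, run after the code in data_reader.R.
# Assumes flattened_images and flattened_summary exist; file names are illustrative.
library(rio)
library(ggplot2)

# Write the per-detection table (one row per detected object) to CSV.
export(flattened_images, "trashai_detections.csv")

# Rebuild the summary figure as an object so it can be saved explicitly.
summary_plot <- ggplot(flattened_summary,
                       aes(y = reorder(name, count), x = count, fill = name)) +
    geom_bar(stat = "identity") +
    theme_classic(base_size = 15) +
    theme(legend.position = "none") +
    labs(x = "Count", y = "Type")

# Save the figure to disk at a fixed size.
ggsave("trashai_summary_figure.png", plot = summary_plot, width = 8, height = 5)
```

`rio::export()` infers the output format from the file extension, and passing the plot object to `ggsave()` explicitly avoids relying on whichever plot happened to be drawn last.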