Revision #47

Merged: 24 commits, Dec 21, 2023

Commits
aecc093
abstract: mention limitations
miltondp Dec 15, 2023
1044899
introduction: focus more on human-centric approach, add unit testing …
miltondp Dec 16, 2023
b669c3f
introduction: mention the tool's name, Manubot AI Editor
miltondp Dec 16, 2023
4ff28f6
methods: explain human-centric approach and the tool's components
miltondp Dec 16, 2023
62f8103
methods: update some details and URLs
miltondp Dec 16, 2023
fec7b63
methods: mention custom prompts in figure caption
miltondp Dec 18, 2023
f0a2945
results: improve evaluation setup, add unit testing and human assessm…
miltondp Dec 18, 2023
691f36e
results: add sentence in figure captions where single words are not h…
miltondp Dec 18, 2023
92adcf1
results: revise abstract and introduction sections
miltondp Dec 19, 2023
4bce420
conclusions: revise
miltondp Dec 19, 2023
55263de
results: minor change
miltondp Dec 19, 2023
2fd3779
conclusions: fix about open models
miltondp Dec 19, 2023
1a0925f
introduction: attempt to frame it in the context of medical informatics
miltondp Dec 19, 2023
96cf740
freeze references to other articles
miltondp Dec 19, 2023
0745702
results: wording improvements
miltondp Dec 19, 2023
3bff253
abstract: improve text and rewrite to fit 250 words limit
miltondp Dec 19, 2023
6627127
metadata: update Milton's
miltondp Dec 19, 2023
47be183
latex: update manuscript.tex
miltondp Dec 19, 2023
4193df1
latex: add instructions to compile latex and generate diffs
miltondp Dec 20, 2023
468a962
latex: minor fixes in manuscript.tex
miltondp Dec 20, 2023
03105a5
latex/README.md: update instructions
miltondp Dec 20, 2023
16a98a2
latex: add diff tex
miltondp Dec 20, 2023
6c64430
latex/README.md: update instructions
miltondp Dec 20, 2023
87e701a
Update content/02.introduction.md
miltondp Dec 21, 2023
content/01.abstract.md (20 changes: 10 additions & 10 deletions)
@@ -1,12 +1,12 @@
## Abstract {.page_break_before}

- In this work, we investigate how models with advanced natural language processing capabilities can be used to reduce the time-consuming process of writing and revising scholarly manuscripts.
- To this end, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly text.
- Our AI-based revision workflow uses a prompt generator that integrates metadata from the manuscript into prompt templates to generate section-specific instructions for the language model.
- Then, the model generates a revised version of each paragraph that the human author can review.
- We tested our AI-based revision workflow in three case studies of existing manuscripts, including the present one.
- Our results suggest that these models can capture the concepts in the scholarly text and produce high-quality revisions that improve clarity.
- All changes to the manuscript are tracked using a version control system, providing transparency into the human or machine origin of text.
- Given the amount of time that researchers put into crafting prose, incorporating large language models into the scholarly writing process can significantly improve the type of knowledge work performed by academics.
- It can also help scholars to focus on the most important aspects of their work, such as the novelty of their ideas, and automate the most tedious parts such as adhering to a certain writing style.
- Although the use of AI-assisted tools for scientific authoring is controversial, our work focuses on revising text written by humans and provides transparency into the origin of text by tracking changes, which can alleviate concerns about the use of AI in scientific authoring.
+ In this work, we investigate the use of advanced natural language processing models to reduce the time-consuming process of writing and revising scholarly manuscripts.
+ For this purpose, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly texts.
+ Our AI-based revision workflow employs a prompt generator that incorporates manuscript metadata into templates, generating section-specific instructions for the language model.
+ The model then generates revised versions of each paragraph for human authors to review.
+ We evaluated this methodology through three case studies of existing manuscripts, including the revision of this manuscript.
+ Our results indicate that these models, despite some limitations, can grasp complex academic concepts and enhance text quality.
+ All changes to the manuscript are tracked using a version control system, ensuring transparency in distinguishing between human- and machine-generated text.
+ Given the significant time researchers invest in crafting prose, incorporating large language models into the scholarly writing process can significantly improve the type of knowledge work performed by academics.
+ Our approach also enables scholars to concentrate on critical aspects of their work, like the novelty of their ideas, while automating tedious tasks such as adhering to specific writing styles.
+ Although the use of AI-assisted tools in scientific authoring is controversial, our approach, which focuses on revising human-written text and provides change-tracking transparency, can mitigate concerns regarding AI's role in scientific writing.
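
The revised abstract describes a prompt generator that fills section-specific templates with manuscript metadata before each paragraph is sent to the language model. A minimal sketch of that idea in Python is shown below; the template text, the `build_prompt` helper, and the metadata keys are illustrative assumptions, not the Manubot AI Editor's actual implementation.

```python
# Sketch only: templates, function name, and metadata keys are assumptions,
# not the Manubot AI Editor's real API.
SECTION_TEMPLATES = {
    "abstract": (
        "Revise the following abstract of the manuscript titled '{title}' "
        "to improve clarity and keep it under {max_words} words:\n\n{paragraph}"
    ),
    "introduction": (
        "Revise the following paragraph from the introduction of '{title}' "
        "(keywords: {keywords}), preserving citations and meaning:\n\n{paragraph}"
    ),
}

def build_prompt(section: str, paragraph: str, metadata: dict) -> str:
    """Fill a section-specific template with manuscript metadata."""
    template = SECTION_TEMPLATES.get(
        section, "Revise the following paragraph:\n\n{paragraph}"
    )
    return template.format(
        title=metadata.get("title", ""),
        keywords=", ".join(metadata.get("keywords", [])),
        max_words=metadata.get("max_words", 250),
        paragraph=paragraph,
    )

# Example: a section-specific instruction for an abstract paragraph.
prompt = build_prompt(
    "abstract",
    "In this work, we investigate the use of advanced natural language processing models ...",
    {"title": "An example manuscript", "max_words": 250},
)
print(prompt)
```

In a workflow of this shape, each generated instruction would be sent to the model one paragraph at a time, and the returned revision proposed back to the authors for review.
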
content/02.introduction.md (22 changes: 13 additions & 9 deletions)
@@ -1,23 +1,27 @@
## Introduction

- Manuscripts have been around for thousands of years, but scientific journals have only been around for about 350 years [@isbn:0810808447].
+ The tradition of scholarly writing dates back thousands of years, evolving significantly with the advent of scientific journals approximately 350 years ago [@isbn:0810808447].
External peer review, which is used by many journals, is even more recent, having been around for less than 100 years [@doi:10/d26d8b].
Most manuscripts are written by humans or teams of humans working together to describe new advances, summarize existing literature, or argue for changes in the status quo.
However, scholarly writing is a time-consuming process where results of a study are presented using a specific style and format.
Academics can sometimes be long-winded in getting to key points, making writing more impenetrable to their audience [@doi:10.1038/d41586-018-02404-4].

Recent advances in computing capabilities and the widespread availability of text, images, and other data on the internet have laid the foundation for artificial intelligence (AI) models with billions of parameters.
- Large language models, in particular, are opening the floodgates to new technologies with the capability to transform how society operates [@arxiv:2102.02503].
+ Large language models (LLMs), in particular, are opening the floodgates to new technologies with the capability to transform how society operates [@arxiv:2102.02503].
OpenAI's models, for instance, have been trained on vast amounts of data and can generate human-like text [@arxiv:2005.14165].
These models are based on the transformer architecture which uses self-attention mechanisms to model the complexities of language.
- The most well-known of these models is the Generative Pre-trained Transformer 3 (GPT-3), which have been shown to be highly effective for a range of language tasks such as generating text, completing code, and answering questions [@arxiv:2005.14165].
- Scientists are already using these tools to improve scientific writing [@doi:10.1038/d41586-022-03479-w].
+ The most well-known of these models is the Generative Pre-trained Transformer (GPT-3 and, more recently, GPT-4), which have been shown to be highly effective for a range of language tasks such as generating text, completing code, and answering questions [@arxiv:2005.14165].
+ In the realm of medical informatics, scientists are beginning to explore the utility of these tools in optimizing clinical decision support [@doi:10.1093/jamia/ocad072] or assessing its potential to reduce health disparities [@doi:10.1093/jamia/ocad245], while also raising concerns about their impact in medical education [@doi:10.1093/jamia/ocad104] and the importance of keeping the human aspect central in AI development and application [@doi:10.1093/jamia/ocad091].
+ These tools have been also used in enhancing scientific communication [@doi:10.1038/d41586-022-03479-w].
This technology has the potential to revolutionize how scientists write and revise scholarly manuscripts, saving time and effort and enabling researchers to focus on more high-level tasks such as data analysis and interpretation.
+ However, the use of LLMs in research has sparked controversy, primarily due to their propensity to generate plausible yet factually incorrect or misleading information.

- We present a novel AI-assisted revision tool that envisions a future where authors collaborate with large language models in the writing of their manuscripts.
- This workflow builds on the Manubot infrastructure for scholarly publishing [@doi:10.1371/journal.pcbi.1007128], a platform designed to enable both individual and large-scale collaborative projects [@doi:10.1098/rsif.2017.0387; @pmid:34545336].
- Our workflow involves parsing the manuscript, utilizing a large language model with section-specific prompts for revision, and then generating a set of suggested changes to be integrated into the main document.
+ In this work, we present a human-centric approach for the use of AI in manuscript writing where scholarly text, initially created by humans, is revised through edit suggestions from LLMs, and then ultimately reviewed and approved by humans.
+ This approach mitigates the risk of generating misleading information while still providing the benefits of AI-assisted writing.
+ We developed an AI-assisted revision tool that implements this approach and builds on the Manubot infrastructure for scholarly publishing [@doi:10.1371/journal.pcbi.1007128], a platform designed to enable both individual and large-scale collaborative projects [@doi:10.1098/rsif.2017.0387; @pmid:34545336].
+ Our tool, named the Manubot AI Editor, parses the manuscript, utilizes an LLM with section-specific prompts for revision, and then generates a set of suggested changes to be integrated into the main document.
These changes are presented to the user through the GitHub interface for review.
- To evaluate our workflow, we conducted a case study with three Manubot-authored manuscripts that included sections of varying complexity.
+ During prompt engineering, we developed unit tests to ensure that a minimum set of quality measures are met by the AI revisions.
+ For end-to-end evaluation, we manually reviewed the AI revisions on three Manubot-authored manuscripts that included sections of varying complexity.
Our findings indicate that, in most cases, the models were able to maintain the original meaning of text, improve the writing style, and even interpret mathematical expressions.
- Our AI-assisted writing workflow can be incorporated into any Manubot manuscript, and we anticipate it will help authors more effectively communicate their work.
+ Officially part of the Manubot platform, our Manubot AI Editor can be readily incorporated into Manubot-based manuscripts, and we anticipate it will help authors more effectively communicate their work.
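
The revised introduction mentions unit tests that enforce a minimum set of quality measures on the AI revisions during prompt engineering. The checks below are a rough sketch of what such measures could look like; the specific properties and function names are assumptions rather than the project's actual test suite.

```python
import re

# Manubot citation keys look like [@doi:...], [@pmid:...], or [@arxiv:...].
CITATION_PATTERN = re.compile(r"\[@[^\]]+\]")

def citations_preserved(original: str, revised: str) -> bool:
    """Every citation key in the original paragraph must survive the revision."""
    return set(CITATION_PATTERN.findall(original)) <= set(CITATION_PATTERN.findall(revised))

def not_overly_truncated(original: str, revised: str, min_ratio: float = 0.5) -> bool:
    """The revision should not silently drop most of the paragraph's content."""
    return len(revised.split()) >= min_ratio * len(original.split())

# Minimal check, with a hand-written revision standing in for model output.
original = (
    "Manubot enables collaborative writing of manuscripts "
    "[@doi:10.1371/journal.pcbi.1007128]."
)
revised = (
    "Manubot supports the collaborative writing of scholarly manuscripts "
    "[@doi:10.1371/journal.pcbi.1007128]."
)
assert citations_preserved(original, revised)
assert not_overly_truncated(original, revised)
```

Checks of this kind could run automatically on each suggested revision before the changes are presented to the authors for review through the GitHub interface.
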