Automate your scientific research paper (thesis, original article, review). The aim of auto-paper is to give you tips, tricks, and tools to accelerate your publication rate and improve publication quality.
⚠️ 2021-03-20 This repository is under construction. The end-goal is to have a plug-and-play system that links many of the existing (and amazing) tools that are available to help create scientific documents and add some new tools as well. In the meantime, I hope that the "tips, tricks, and tools" covered in this repository can be of use for you. Please open up an Issue if you have feedback, suggestions, or things you're particularly interested in. I'll prioritize these. Pull requests are welcome.
Interested in seeing a template and usage instructions for the LaTeX autopaper.sty package? Please fill out the poll under "Discussions".
- S. G. Baird, M. Liu, T. D. Sparks, (2022), Computational Materials Science, https://dx.doi.org/10.1016/j.commatsci.2022.111505 (arXiv)
- S. G. Baird, J. R. Hall, T. D. Sparks, (2022), ChemRxiv, https://dx.doi.org/10.26434/chemrxiv-2022-nz2w8
- S. G. Baird, M. Liu, H. M. Sayeed, T. D. Sparks, (2022), arXiv, http://arxiv.org/abs/2202.02380
- S. G. Baird, E. R. Homer, D. T. Fullwood, O. K. Johnson, (2022), MethodsX, https://dx.doi.org/10.1016/j.mex.2022.101731
- S. G. Baird, T. Q. Diep, T. D. Sparks, (2022), Digital Discovery, https://dx.doi.org/10.1039/D1DD00028D
- S. G. Baird, E. R. Homer, D. T. Fullwood, O. K. Johnson, (2021), ChemRxiv, https://dx.doi.org/10.26434/chemrxiv-2021-ds0ml
- S. G. Baird, E. R. Homer, D. T. Fullwood, O. K. Johnson, (2021), Computational Materials Science, https://dx.doi.org/10.1016/j.commatsci.2021.110756
Sterling Baird, PhD candidate in Dr. Taylor Sparks' laboratory at the University of Utah's Materials Science & Engineering Department, explains some incredible tips, tricks, and tools for automating your research paper. This presentation includes video tutorials for GitHub, LaTeX, MATLAB, Mathematica, Overleaf, Zotero, Web of Science, SciNote, EndNote Click, Science Direct, table generators, Mathpix Snip, protocols.io, and much more! Sterling has video demonstrations for each software package and explains how they can all work together in one seamless workflow that will make life easier for any graduate student or researcher. Learn how to be a graduate student from the future with these amazing tools that will astonish your PhD advisor!
0:00 survey results of current tools
1:47 automation overview
4:30 LaTeX & GitHub
12:20 summary of automated workflow tools
18:39 literature management
24:10 conversion apps
25:10 LaTeX packages
30:00 code integrations
36:09 electronic lab notebooks
38:31 Q&A
Due to the large file size, the slides are not available in this repository. However, you can download a copy of the slides as saved on 2021-03-18:
Figure: an example of what an automated workflow might look like (left) and my personal workflow (right).
LaTeX and Git are the bread and butter of an automated scientific research paper. While you can use some of these tips and tricks without LaTeX or Git (e.g., Mathpix Snipping Tool), many others are tightly integrated with them. Learning LaTeX and Git might have a startup cost of 10-100 hours (depending on the desired skill level), but could easily save hundreds of hours in the first year or two of use.
LaTeX (pronounced "Lah-tech" or "Lay-tech") allows you to focus more on content rather than formatting.
See 🔗LaTeX Teaching
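To make the "content over formatting" point concrete, here is a minimal sketch of an article where section numbers, equation numbers, and cross-references are all generated by LaTeX. The file name main.tex and the labels sec:methods and eq:volume are illustrative only.

```latex
% main.tex (illustrative file name)
% Section/equation numbers and cross-references are generated by LaTeX, not typed by hand.
\documentclass{article}
\usepackage{amsmath} % provides \eqref

\begin{document}

\section{Methods}\label{sec:methods}
The volume of a cube with edge length $a$ is
\begin{equation}
  V = a^3 \label{eq:volume}
\end{equation}

\section{Results}
Using Equation~\eqref{eq:volume} from Section~\ref{sec:methods}, we obtain ...
% Moving or inserting sections and equations renumbers everything automatically;
% compile twice so the cross-references pick up the new numbers.

\end{document}
```

If you later move the Methods section or add an equation before eq:volume, none of the text needs to change; only a recompile is required.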
You may be interested in using autopaper.sty, which I've been iterating on for years and which is based on hundreds of Stack Exchange posts and extensive troubleshooting. This is probably one of the highest-impact contributions of this repo/methodology. To use it, place the file in your parent folder and load it in your main document with:
\usepackage[refcheck=false,todonotes=false]{autopaper}
or simply:
\usepackage{autopaper}
Template and detailed usage instructions TBD. If you're interested in using this sooner, please consider filling out the poll under "Discussions".
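Until the official template is available, the following is only a minimal sketch of where the package line goes. It assumes autopaper.sty sits in the same folder as main.tex (an illustrative file name) and uses only the two options shown above; it does not assume any commands that autopaper itself may define.

```latex
% main.tex -- assumes autopaper.sty is in the same folder as this file
\documentclass{article}
% Options as shown above; drop the brackets to use the package defaults
\usepackage[refcheck=false,todonotes=false]{autopaper}

\begin{document}
Your content here.
\end{document}
```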
I encourage you to check out PyScaffold. For examples of it in action on repositories I'm developing, see xtal2png (GitHub) and mp-time-split (GitHub).
Git gives you ease, control, power, and peace of mind in version control.
See 🔗Git Teaching
Plotting videos by Dr. Taylor Sparks: using Python to make nice figures.
Highly recommended: pymatviz (Python >= 3.8), a toolkit for visualizations in materials informatics.
pip install pymatviz
Here are the example visualizations taken from the pymatviz README (2022-06-11):
For a nice materials science-focused Python/Jupyter introduction, see the Primer at https://workshop.materialsproject.org/.
See https://github.com/stars/sgbaird/lists/visualization
Wrap built-in functions into custom, versatile functions that suit your research needs. For example, using parityplot.m from mat-fig:
See MATLAB Directory
Search for LaTeX code for common equations
Find an image of the equation you're looking for and use Mathpix Snipping Tool to convert to LaTeX, MathML, etc.
MathLive (https://github.com/arnog/mathlive): open source with LaTeX support, but I haven't tried it out yet.
You can play around with Mathematica code (including TeXForm) and get some quick interactive tutorials via the Wolfram Programming Lab (no sign-in required). If, after a few minutes, you decide you're interested in trying it out, I suggest downloading Mathematica with a 15-day free trial or using your institution's license if applicable. Mathematica licenses are not yet offered at UoU for general download, but you can use Mathematica via UoU CHPC. Also, student pricing (as of 2021-03-20) is as follows:
You can also call Mathematica functions from within Python, but you still need to install Mathematica in the default location or connect to a cloud kernel (the basic Wolfram Cloud plan is free).
Right-click on a selection and choose Copy As > LaTeX.
- Typeset equations in Mathematica
- Can copy LaTeX or MathML into Mathematica (latter generally behaves better in my experience)
To help you learn the shortcuts, consider opening the Math Assistant Palette and hovering over the relevant "boxes" to see the shortcut commands. While the palettes can be useful at first, don't let them become a crutch. Chances are you won't need them most of the time once you're more comfortable. Advanced typesetting is demonstrated for a teaching figure:
\begin{equation}
\begin{array}{cccc}
\overbrace{\left(
\begin{array}{c}
\text{Distances} \\
\text{Angles} \\
\text{Area} \\
\text{Volume} \\
\end{array}
\right)}^{\text{Features}} & \overbrace{\left(
\begin{array}{c}
0.1\text{\AA} \\
0\text{rad} \\
0\text{\AA}^2 \\
0\text{\AA}^3 \\
\end{array}
\right)}^{\text{Lower Bound}} & \overbrace{\left(
\begin{array}{ccccccccccccc}
0 & 0.1 & 0.3 & 0.1 & 0 & 0 & 0.2 & 0.4 & 0.2 & 0 & 0 & \ldots & 0 \\
0 & 0.1 & 0.2 & 0.4 & 0.2 & 0.1 & 0 & 0 & 0 & 0 & 0 & \ldots & 0.2 \\
0 & 0 & 0 & 0.3 & 0.6 & 0.3 & 0 & 0 & 0 & 0.1 & 0.2 & \ldots & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.6 & 0.4 & 0 & 0 & 0 & \ldots & 0.1 \\
\end{array}
\right)}^{\text{Gaussian Encoding}} & \overbrace{\left(
\begin{array}{c}
15\text{\AA} \\
\pi \text{rad} \\
225\text{\AA}^2 \\
3375\text{\AA}^3 \\
\end{array}
\right)}^{\text{Upper Bound}} \\
\end{array}
\end{equation}
Using TeXport:
- Export equations to a .tex file
- Export equations followed by variable definitions (e.g., "where a, b, and c represent apples, bananas, and cantaloupes, respectively.")
- Perform and typeset proofs (equations, variable definitions, sentences, and symbolic solutions)
I've played around with MathType and have been able to get a typesetting experience comparable to Mathematica's (i.e., edit the equation in its "full" form and copy it as LaTeX code). Most shortcuts are fairly complicated for shortcuts (almost all of the standard ones involve pressing Ctrl plus a key, releasing, and then pressing another key). MathType is the only tool I've found so far that gives a typesetting experience similar to Mathematica's. Once the free trial expires, the cost is ~$50/year. Personally, I still prefer Mathematica (MathType only does typesetting), but MathType may suit your needs.
If it was easy, someone has probably already done it. If not, it's probably not as important as you think.
EndNote Click (formerly Kopernio) has given the best results for adding the correct metadata to PDFs.
Between Science Direct and Web of Science, you're likely to find all the research articles you need. I find that searching both is better than searching only one. While Google Scholar and Google are great search engines for certain applications, I discourage relying heavily on them for your literature searches. If you neglect Science Direct and Web of Science, you may miss seminal, cutting-edge, or obscure but important papers.
Keywords deserve their own section because they can make or break your success in finding "gold" articles in your field.
Citation trees (or webs) help you find "gold", "silver", and "bronze" articles in your field by letting you trace which works a paper cites and which works cite it. Find 3-4 highly relevant papers in your field; moving up, down, and side-to-side in the citation tree from these can lead to a treasure trove of other articles.
Here is an example of navigating to "citing" articles within Web of Science:
Use EndNote Click to bring your documents in!
See ref-software
My favorite is Zotero.
I've had issues with Mendeley changing citation keys on me all the time, which has thrown off almost my entire LaTeX bibliography before.
The ability to quick-copy citation commands in Zotero is great.
After downloading with EndNote Click, drag and drop the PDF into Zotero, into a folder structure that you organize, and notice that the metadata is automatically extracted:
In addition to the above resources, packages have their own documentation and examples which are usually quite informative.
Moved a section from the introduction to right before the conclusion and messed up all your glossary definitions? No problem with the glossaries and glossaries-extra packages!
\gls{ml}
\Gls{rf}
\glspl{ann}
I like to call \glsresetall (or \glsreset{<label>} for individual entries) after the abstract and right before the conclusion so that people don't have to go digging through the paper if they're only reading one of those sections. Within a figure or table caption, I generally use \acrfull{} for the first usage and then \acrshort{} afterwards (within the same caption), since I believe captions are supposed to be stand-alone components, similar to the abstract.
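The workflow above can be sketched in a single minimal document. This sketch uses the base glossaries package with its acronym mechanism; the labels ml, rf, and ann mirror the commands above, and the sentences are placeholders.

```latex
% Minimal sketch with the base glossaries package (labels ml, rf, ann are illustrative)
\documentclass{article}
\usepackage[acronym]{glossaries}
\setacronymstyle{long-short} % first use: "long form (SHORT)", later uses: "SHORT"
\makeglossaries

\newacronym{ml}{ML}{machine learning}
\newacronym{rf}{RF}{random forest}
\newacronym{ann}{ANN}{artificial neural network}

\begin{document}

\begin{abstract}
We compare \gls{rf} and \gls{ann} models for \gls{ml}.
\end{abstract}
\glsresetall % forget all first uses so the main text expands the acronyms again

\section{Introduction}
\Gls{ml} is widely used. \Glspl{ann} and \glspl{rf} are two common choices.

\begin{figure}
  \centering
  % stand-alone caption: full form on first mention, short form afterwards
  \caption{Predictions of the \acrfull{rf} model. Each point is one \acrshort{rf} prediction.}
\end{figure}

\glsresetall % expand again so the conclusion can be read on its own
\section{Conclusion}
\Gls{ml} ...

\printglossaries % filled in after running the makeglossaries script
\end{document}
```

If you load glossaries-extra instead, the acronym style is normally set with \setabbreviationstyle[acronym]{long-short} rather than \setacronymstyle, so check the glossaries-extra manual before mixing the two.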
Dealing with numbers and units is a cinch with the siunitx package.
\SI{10.25}{\joule\per\square\meter}
\SIlist{10.25; 5; 6}{\joule}
\SIlist{10.25 +- 2.5; 5 \pm 2.1; 6}{\joule}
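The \SI-style commands above are the siunitx v2 interface. The sketch below assumes siunitx v3 or newer, which keeps those commands for backward compatibility and adds \qty, \qtylist, \qtyrange, \num, and \unit; the values are arbitrary.

```latex
% Assumes siunitx v3 or newer
\documentclass{article}
\usepackage{siunitx}

\begin{document}
% Legacy (v2) syntax, still accepted by v3
\SI{10.25}{\joule\per\square\meter}

% Equivalent v3 syntax
\qty{10.25}{\joule\per\square\metre}

\qty{10.25 +- 2.5}{\joule}         % value with uncertainty

\qtylist{10.25; 5; 6}{\joule}      % list sharing one unit

\qtyrange{300}{500}{\kelvin}       % range

\num{12345.678}                    % number formatting only

\unit{\kilo\gram\per\cubic\metre}  % unit only
\end{document}
```

Writing units as macros (\joule, \per, \square) rather than plain text keeps spacing, upright fonts, and per-mode formatting consistent across the whole paper.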
\ch{ThCr2Si2}
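The \ch command comes from the mhchem package. A minimal sketch, assuming mhchem is loaded with version=4; the reaction and ion are generic examples.

```latex
\documentclass{article}
\usepackage[version=4]{mhchem}

\begin{document}
The \ch{ThCr2Si2} structure type  % digits after element symbols become subscripts

\ch{2 H2 + O2 -> 2 H2O}           % stoichiometric numbers and a reaction arrow

\ch{Fe^{3+}}                      % ion charge typeset as a superscript
\end{document}
```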
\cite{meredigCombinatorialScreeningNew2014}
\citet{meredigCombinatorialScreeningNew2014}
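\citet and \citep come from the natbib package; in author-year mode, natbib's \cite behaves like \citet. A minimal sketch, assuming a file references.bib (illustrative name, e.g. exported from Zotero) that contains the key used above; plainnat is one natbib-compatible style.

```latex
% Assumes references.bib contains the key meredigCombinatorialScreeningNew2014
\documentclass{article}
\usepackage[round]{natbib}

\begin{document}
As described by \citet{meredigCombinatorialScreeningNew2014}, ...   % textual: names in running text

... as reported previously \citep{meredigCombinatorialScreeningNew2014}.  % parenthetical

\bibliographystyle{plainnat}
\bibliography{references}  % reads references.bib
\end{document}
```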
Especially useful for cross-referencing between the main document and the Supplementary/Supporting Information document in the same project. Overleaf tutorial: Cross referencing with the xr package in Overleaf
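A minimal sketch of the main-document side, assuming the supporting file is called si.tex, sits in the same folder, and has been compiled first so that si.aux exists. The file names and the label fig:convergence are hypothetical.

```latex
% main.tex -- assumes si.tex is compiled first so that si.aux exists
\documentclass{article}
\usepackage{xr}
\externaldocument[SI-]{si} % labels from si.tex are available with the prefix SI-

\begin{document}
Convergence tests are shown in Figure~\ref{SI-fig:convergence}.
% In si.tex (hypothetical): \caption{...}\label{fig:convergence}
\end{document}
```

The SI- prefix avoids clashes when both documents use similar label names. If you also load hyperref, the xr-hyper package is the usual companion to xr; the Overleaf tutorial linked above covers any additional setup needed there.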
SciNote vs. Labfolder vs. LabArchives vs. OneNote
Open Citrine Platform and PIF/GEMD data formats
More comprehensive than a lab notebook solution.
For example, LabCollector and openBIS
Jablonka KM, Zasso M, Patiny L, Marzari N, Pizzi G, Smit B, et al. Connecting lab experiments with computer experiments: Making "routine" simulations routine. ChemRxiv. Cambridge: Cambridge Open Engage; 2021; This content is a preprint and has not been peer-reviewed. https://doi.org/10.26434/chemrxiv-2021-h3381-v2