Style guide
Style is important, and we want our code to be readable and look great.
In general, we follow the tidyverse style guide, with a few exceptions noted below.
- Source files use the extension .R (not .r).
- In general we have one function per .R file, although closely related functions (e.g. translation) are grouped in single .R files.
- Use meaningful lowercase names, in snake_case, for source files, with words separated by underscores, e.g. tokens.R, textmodel_wordscores.R, etc.
- Use snake_case for function names and variable names, following the rOpenSci guidelines.
- Do not use dot.separated names for anything except when extending S3 generic methods.
- Use short variable names for very local or temporary variables, and longer explanatory names otherwise.
- Use <-, not =, for assignment.
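The naming rules above can be sketched in a few lines of base R. The function count_unique_words() and the class "word_summary" are hypothetical names invented for this illustration; they are not part of quanteda.

```r
# Hypothetical helper: snake_case for the function name and its argument.
count_unique_words <- function(txt) {
    # a short name is fine for a very local, temporary variable
    w <- unlist(strsplit(tolower(txt), "\\s+"))
    length(unique(w))
}

# A dot appears only because this extends the S3 generic print()
# for the made-up class "word_summary".
print.word_summary <- function(x, ...) {
    cat("Unique words:", x$n_unique, "\n")
    invisible(x)
}

# Assignment uses <-, not =.
word_summary <- structure(
    list(n_unique = count_unique_words("a b b c")),
    class = "word_summary"
)
print(word_summary)
```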
An opening curly brace should never go on its own line and should always be followed by a new line; a closing curly brace should always go on its own line, unless followed by else. Always indent the code inside the curly braces. See the examples in Hadley Wickham's book Advanced R.
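As a minimal sketch of the brace rules, assuming a hypothetical function describe_length() that exists only for this illustration, and using the four-space indentation this guide asks for:

```r
# Opening braces end their line; closing braces sit on their own line
# unless followed by else; code inside braces is indented.
describe_length <- function(x) {
    if (length(x) == 0) {
        label <- "empty"
    } else if (length(x) == 1) {
        label <- "single"
    } else {
        label <- "multiple"
    }
    return(label)
}

describe_length(c("a", "b"))
```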
Pipes (%>%) can be used in examples but not in the package's functions, because they make debugging more difficult and introduce an unnecessary dependency.
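One way to read this rule is sketched below. The function clean_text() is a hypothetical example, not a quanteda function; the piped form is left as a comment because it needs the magrittr package.

```r
# Inside a package function: plain step-by-step calls, no pipe, so each
# intermediate value can be inspected when debugging.
clean_text <- function(txt) {
    txt_lower <- tolower(txt)
    txt_trimmed <- trimws(txt_lower)
    return(txt_trimmed)
}

clean_text("  Hello World ")

# In an example or vignette, the piped equivalent is acceptable:
# library("magrittr")
# "  Hello World " %>% tolower() %>% trimws()
```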
Use return() to return values held in local variables, both in the middle and at the end of functions. However, return() is not needed when a function simply calls another function and returns its output.
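The two cases can be illustrated as follows. Both safe_mean() and safe_mean_rounded() are hypothetical functions written for this sketch.

```r
# The value lives in a local variable, so it is returned explicitly,
# both at the early exit and at the end of the function.
safe_mean <- function(x) {
    if (length(x) == 0) {
        result <- NA_real_
        return(result)
    }
    result <- sum(x) / length(x)
    return(result)
}

# A thin wrapper that only passes on another function's output:
# return() is not needed here.
safe_mean_rounded <- function(x, digits = 1) {
    round(safe_mean(x), digits)
}
```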
Put spaces around operators to aid readability of expressions: see sections 3 and 4 here and the section Syntax: Spacing in Advanced R.
In quanteda, we use 4 spaces (spaces, never tabs) for indentation. Why?
- It aids readability (and space is cheap!).
- It reminds us of Python.
- Mickey Mouse has four fingers, not two.
- Because otherwise you will be water-boarded.
For examples in the quanteda documentation and vignettes, we use the following names for objects of each type:
- coll: collocations
- corp: corpus
- dict: dictionary
- dat: data.frame
- tab: table
- mat: matrix
- dfmat: dfm
- fcmat: fcm
- kw: kwic
- lis: list
- tmod: textmodel
- tstat: textstat
- tplot: textplot
- toks: tokens
- txt: character vector of text(s)
Variations add a numeric or alphabetic suffix to these names, generally without underscores, e.g. dictlg, dfmat1, but underscores are also possible, e.g. corp_orig, corp_reshaped.
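A short sketch of how these object names might look in an example, assuming the quanteda package is installed. The two sample sentences and the "animal" dictionary key are made up for this illustration.

```r
library("quanteda")

# txt: character vector of texts
txt <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
         doc2 = "A lazy dog sleeps all day.")

corp <- corpus(txt)                                  # corp: corpus
toks <- tokens(corp, remove_punct = TRUE)            # toks: tokens
dfmat <- dfm(toks)                                   # dfmat: dfm
kw <- kwic(toks, pattern = "lazy", window = 2)       # kw: kwic
dict <- dictionary(list(animal = c("fox", "dog")))   # dict: dictionary

# A variation takes a numeric suffix without an underscore.
dfmat2 <- dfm_lookup(dfmat, dictionary = dict)
```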
- We use the APA style for references in the documentation.
- If available, the DOI (preferred) or URL (if no DOI is available) of the reference is linked in the "title" field: \href{doi/url}{Title of Reference}.
- The title is printed in italics (\emph{}). For page ranges, en dashes (XX--YY) are used.
- The publication year is surrounded by brackets and followed by a full stop.
- The first letter of words in titles and journal names is capitalized.
Examples
Bondi, M. & Scott, M. (eds) (2010). Keyness in Texts. Amsterdam, Philadelphia: John Benjamins.
Laver, M., Benoit, K.R., & Garry, J. (2003). Extracting Policy Positions From Political Texts Using Words as Data. American Political Science Review, 97(2), 311–331.