Style guide

quanteda style guide

Style is important, and we want our code to be readable and look great.

In general, we follow the tidyverse style guide, with a few exceptions noted below.

Source files

Source files use the extension .R (not .r).
In general we have one function per .R file, although closely related functions (e.g. translation) are grouped in a single .R file.
Give source files meaningful lowercase names in snake_case, with words separated by underscores, e.g. tokens.R, textmodel_wordscores.R.

Function and Variable names

Use snake_case for function names and variable names, following the rOpenSci guidelines.
Do not use dot.separated names for anything except methods for S3 generics (e.g. print.myclass).
Use short variable names for very local or temporary variables, and longer explanatory names otherwise.
Use <-, not =, for assignment.
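
As a minimal sketch of these naming rules, the base R snippet below uses made-up names throughout: count_unique_words(), the class "word_summary", and its print method are hypothetical and not part of quanteda.

```r
# Hypothetical helper: snake_case function name, explanatory argument name.
count_unique_words <- function(txt) {
    # `w` is a short name for a very local, temporary variable
    w <- unlist(strsplit(tolower(txt), "[^a-z]+"))
    w <- w[nzchar(w)]
    return(length(unique(w)))
}

# A dot appears only because this is a method for the S3 generic print(),
# defined here for the hypothetical class "word_summary".
print.word_summary <- function(x, ...) {
    cat("Unique words:", x$n_unique, "\n")
    invisible(x)
}

word_summary <- structure(
    list(n_unique = count_unique_words("The cat and the hat")),
    class = "word_summary"
)
print(word_summary)
```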

Braces

An opening curly brace should never go on its own line and should always be followed by a new line; a closing curly brace should always go on its own line, unless followed by else. Always indent the code inside the curly braces. See the examples in Hadley Wickham's book Advanced R.
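
The brace rules can be illustrated with a short sketch; describe_length() is a hypothetical function written only for this example.

```r
# Opening braces end their line; closing braces sit on their own line
# unless followed by else. Code inside braces is indented.
describe_length <- function(x) {
    if (length(x) == 0) {
        label <- "empty"
    } else if (length(x) == 1) {
        label <- "single"
    } else {
        label <- "multiple"
    }
    return(label)
}
```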

Piping

The pipe %>% can be used in examples but not in the package's functions, because it makes debugging more difficult and introduces an unnecessary dependency.
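
One way to write pipe-free function bodies is to use intermediate assignments, as in the sketch below. count_words() is a hypothetical helper, and the commented-out pipe version assumes the magrittr package is attached.

```r
txt <- c("Style is important.", "Code should be readable.")

# Pipe style, acceptable in examples (needs magrittr):
# txt %>% tolower() %>% strsplit(" ", fixed = TRUE) %>% lengths()

# Inside package functions: intermediate assignments instead of pipes,
# so each step can be inspected in the debugger.
count_words <- function(x) {
    x_lower <- tolower(x)
    words <- strsplit(x_lower, " ", fixed = TRUE)
    return(lengths(words))
}

count_words(txt)
```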

Return values

Use return() to return values held in local variables, both in the middle and at the end of functions. However, return() is not needed when a function simply calls another function and returns its output.
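
A short sketch of both cases, using two hypothetical functions:

```r
# Values held in local variables are returned with return(),
# both for the early exit and at the end of the function.
safe_mean <- function(x) {
    if (length(x) == 0) {
        result <- NA_real_
        return(result)
    }
    result <- sum(x) / length(x)
    return(result)
}

# A thin wrapper that only calls other functions can omit return().
safe_mean_rounded <- function(x, digits = 1) {
    round(safe_mean(x), digits)
}
```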

Formatting

Put spaces around operators to aid the readability of expressions: see sections 3 and 4 here and the section Syntax: Spacing in Advanced R.
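
For instance, the following sketch computes a simple regression slope with spaced operators. The data are made up for illustration, and ^ is left unspaced, as in the tidyverse style guide.

```r
x <- c(1, 2, 3)
y <- c(2, 4, 6)

# Spaces around binary operators and after commas.
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Harder to read:
# slope<-sum((x-mean(x))*(y-mean(y)))/sum((x-mean(x))^2)
```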

In quanteda, we use 4 spaces (spaces, never tabs) for indentation. Why?

It aids readability (and space is cheap!).
It reminds us of Python.
Mickey Mouse has four fingers, not two.
Because otherwise you will be water-boarded.

Example Object Names

For examples in the quanteda documentation and vignettes, we use the following object names:

coll: collocations
corp: corpus
dict: dictionary
dat: data.frame
tab: table
mat: matrix
dfmat: dfm
fcmat: fcm
kw: kwic
lis: list
tmod: textmodel
tstat: textstat
tplot: textplot
toks: tokens
txt: character vector of text(s)

Variants append a number or letters to these names, generally without underscores, e.g. dictlg, dfmat1, although names with underscores such as corp_orig or corp_reshaped are also acceptable.
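
A brief sketch of these names in use, assuming the quanteda package is installed; the texts and the dictionary entries are invented for illustration.

```r
library(quanteda)

txt <- c(doc1 = "Style is important.", doc2 = "Readable code looks great.")
corp <- corpus(txt)
toks <- tokens(corp, remove_punct = TRUE)

# Numeric variants follow the base name without an underscore.
dfmat1 <- dfm(toks)
dfmat2 <- dfm_trim(dfmat1, min_termfreq = 1)

dict <- dictionary(list(quality = c("great", "readable")))
kw <- kwic(toks, pattern = "code")
```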

References

We use the APA style for references in the documentation.
If available, the DOI (preferred) or URL (if no DOI is available) of the reference is linked from the title: \href{doi/url}{Title of Reference}.
The title is printed in italics (\emph{}). Page ranges use en dashes (XX--YY).
The publication year is enclosed in parentheses and followed by a full stop.
The first letter of words in titles and journal names is capitalized.

Examples

Bondi, M. & Scott, M. (eds) (2010). Keyness in Texts. Amsterdam, Philadelphia: John Benjamins.

Laver, M., Benoit, K.R., & Garry, J. (2003). Extracting Policy Positions From Political Texts Using Words as Data. American Political Science Review, 97(2), 311–331.
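
As a sketch, the two references above might be written in a roxygen2 block as follows. The use of roxygen2 and the nesting of \emph{} inside \href{} are assumptions, and doi/url is the placeholder from the rule above, not a real link.

```r
#' @references
#' Bondi, M. & Scott, M. (eds) (2010).
#' \href{doi/url}{\emph{Keyness in Texts}}.
#' Amsterdam, Philadelphia: John Benjamins.
#'
#' Laver, M., Benoit, K.R., & Garry, J. (2003).
#' \href{doi/url}{\emph{Extracting Policy Positions From Political Texts Using Words as Data}}.
#' American Political Science Review, 97(2), 311--331.
NULL
```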
