Kohei Watanabe edited this page Feb 10, 2019 · 12 revisions

quanteda style guide

Style is important, and we want our code to be readable and look great.

In general, we follow the tidyverse style guide, with a few exceptions noted below.

Source files

  • Source files use the extension .R (not .r).
  • In general we have one function per .R file, although closely related functions (e.g. translation) are grouped in single .R files.
  • Use meaningful lowercase file names in snake_case (words separated by underscores), e.g. tokens.R, textmodel_wordscores.R, etc.

Function and Variable names

  • Use snake_case for function names and variable names, following the rOpenSci guidelines.
  • Do not use dot.separated names for anything except when extending S3 generic methods.
  • Use short variable names for very local or temporary variables, and longer explanatory names otherwise.
  • Use <-, not =, for assignment.
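As a minimal sketch of the naming rules above: the function count_words(), the class word_count and all variable names here are hypothetical and exist only for illustration; they are not part of quanteda.

```r
# Hypothetical helper, named in snake_case and assigned with <-.
count_words <- function(txt) {
    # A short name is fine for a very local, temporary variable.
    n <- lengths(strsplit(txt, "\\s+"))
    return(n)
}

# Dots appear only in an S3 method name: generic "summary", class "word_count".
summary.word_count <- function(object, ...) {
    total_words <- sum(unclass(object))
    return(total_words)
}

# A longer, explanatory name for an object that lives beyond a few lines.
word_counts <- structure(count_words(c("a b c", "d e")), class = "word_count")
summary(word_counts)
```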

Braces

An opening curly brace should never go on its own line and should always be followed by a new line; a closing curly brace should always go on its own line, unless followed by else. Always indent the code inside the curly braces. See the examples in Hadley Wickham's book Advanced R.
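A short illustration of the brace placement described above; describe_length() is a hypothetical function written only for this example.

```r
# Opening braces end a line; closing braces sit on their own line,
# except where they are followed by else.
describe_length <- function(x) {
    if (length(x) == 0) {
        label <- "empty"
    } else if (length(x) == 1) {
        label <- "single"
    } else {
        label <- "multiple"
    }
    return(label)
}
```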

Piping

The pipe %>% can be used in examples but not in the package's functions, because it makes debugging more difficult and introduces an unnecessary dependency.
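One way to write package code without the pipe is to use intermediate variables, which can also be inspected one at a time in a debugger. The sketch below uses only base R; split_lower() is a hypothetical function, and the piped form is shown only as a comment.

```r
# In an example or vignette, a pipeline such as
#     txt %>% tolower() %>% strsplit(" ", fixed = TRUE)
# is acceptable (it needs a package that provides %>%, e.g. magrittr).

# Inside a package function, use intermediate variables instead.
split_lower <- function(txt) {
    txt_lower <- tolower(txt)
    result <- strsplit(txt_lower, " ", fixed = TRUE)
    return(result)
}
```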

Return values

Use return() explicitly when returning the value of a local variable, whether in the middle or at the end of a function. You can omit it when a function simply calls another function and returns its output.
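The two cases might look like this; truncate_text() and truncate_short() are hypothetical functions used only to illustrate the rule.

```r
# Values held in local variables are returned explicitly.
truncate_text <- function(txt, max_chars = 10) {
    if (all(nchar(txt) <= max_chars)) {
        return(txt)                        # return in the middle of the function
    }
    result <- substr(txt, 1, max_chars)
    return(result)                         # return at the end of the function
}

# A thin wrapper that only passes on another function's output
# does not need return().
truncate_short <- function(txt) {
    truncate_text(txt, max_chars = 5)
}
```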

Formatting

Put spaces around operators to aid the readability of expressions: see sections 3 and 4 here and the section Syntax: Spacing in Advanced R.

In quanteda, we use 4 spaces (spaces, never tabs) for indentation. Why?

  1. It aids readability (and space is cheap!).
  2. It reminds us of Python.
  3. Mickey Mouse has four fingers, not two.
  4. Because otherwise you will be water-boarded.
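A small sketch of the spacing and indentation rules; relative_frequency() and its arguments are hypothetical and chosen only to show operators and nested indentation.

```r
# Spaces around operators (<-, +, *, /, ==) and four spaces per indentation level.
relative_frequency <- function(counts, smooth = 0) {
    total <- sum(counts) + smooth * length(counts)
    if (total == 0) {
        return(rep(0, length(counts)))
    }
    result <- (counts + smooth) / total
    return(result)
}
```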

Example Object Names

For examples in the quanteda documentation and vignettes, we use the following object names:

  • coll: collocations
  • corp: corpus
  • dict: dictionary
  • dat: data.frame
  • tab: table
  • mat: matrix
  • dfmat: dfm
  • fcmat: fcm
  • kw: kwic
  • lis: list
  • tmod: textmodel
  • tstat: textstat
  • tplot: textplot
  • toks: tokens
  • txt: character vector of text(s)

Variations append a numeric or alphabetic suffix to these names, generally without underscores, e.g. dictlg, dfmat1, although underscored forms such as corp_orig and corp_reshaped are also acceptable.
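To keep the illustration free of package dependencies, the sketch below applies the convention only to the base R types in the list (txt, lis, tab, dat, mat); the data are made up for the example. Objects created with quanteda would take the corresponding names from the list above in the same way.

```r
txt <- c(doc1 = "a b a", doc2 = "b c")                 # character vector of texts
lis <- strsplit(txt, " ", fixed = TRUE)                # list
tab <- table(unlist(lis))                              # table
dat <- data.frame(doc = names(txt),
                  n_words = unname(lengths(lis)))      # data.frame
mat <- matrix(0, nrow = length(txt), ncol = length(tab),
              dimnames = list(names(txt), names(tab))) # matrix

# Variants: a numeric suffix without an underscore, or a descriptive
# suffix with one.
dat1 <- dat[1, ]
dat_orig <- dat
```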

References

  • We use the APA style for references in the documentation.
  • If available, the DOI (preferred) or the URL (if no DOI is available) of the reference is linked from the "title" field: \href{doi/url}{Title of Reference}.
  • The title is printed in italics (\emph{}). Page ranges use en dashes (XX--YY).
  • The publication year is surrounded by brackets and followed by a full stop.
  • The first letter of words in titles and journal names is capitalized.

Examples

Bondi, M. & Scott, M. (eds) (2010). Keyness in Texts. Amsterdam, Philadelphia: John Benjamins.

Laver, M., Benoit, K.R., & Garry, J. (2003). Extracting Policy Positions From Political Texts Using Words as Data. American Political Science Review, 97(2), 311–331.