collogetr performs (i) collocate retrieval (currently from the
sentence-based (Indonesian) Leipzig Corpora) and (ii) computation of
collocation association measures. The function colloc_leipzig()
retrieves window-span collocates for a set of word forms (viz. the node
word(s) or keyword(s)). Two other functions, assoc_prepare() and
assoc_prepare_dca(), process the output of colloc_leipzig() into
tabular/data frame formats, which then become the input data for
computing the association measure between the collocates and the node
(as in Stefanowitsch and Gries’ (2003) collostructional/collocation
analysis) (Stefanowitsch, 2013; Stefanowitsch & Gries, 2009; see also
Gries, 2015). The function assoc_prepare() generates input data for
computing Simple Collexeme/Collocational Analysis (SCA), while
assoc_prepare_dca() uses the output of assoc_prepare() to generate
input data for computing Distinctive Collexeme/Collocates Analysis
(DCA) (Gries & Stefanowitsch, 2004; Hilpert, 2006). Based on the output
of assoc_prepare(), SCA can then be computed using the default
collex_fye(), which is based on the Fisher-Yates Exact test. Other
association measures available for computing SCA include (i)
collex_llr() for Log-Likelihood Ratio, (ii) collex_MI() for Mutual
Information score, (iii) collex_TScore() for T-Score, (iv)
collex_chisq() for Chi-Square-based score, and (v) collex_logOR() for
Log10 Odds Ratio. DCA is computed using collex_fye_dca().
collogetr is built on top of the core packages in the tidyverse.
Install collogetr from GitHub with devtools:
library(devtools)
install_github("gederajeg/collogetr")
library(collogetr)
To cite collogetr in a publication, run:
citation("collogetr")
#>
#> To cite `collogetr` in publication, please use:
#>
#> Rajeg, G. P. W. (2020). collogetr: Collocates retriever and
#> collocational association measure. R package development version
#> 1.1.4. url: https://github.com/gederajeg/collogetr. doi:
#> https://doi.org/10.26180/5b7b9c5e32779
#>
#> Please also cite the following foundational works on the Collexeme
#> Analysis and Distinctive Collexeme Analysis:
#>
#> Stefanowitsch, A., & Gries, S. T. (2003). Collostructions:
#> Investigating the interaction of words and constructions.
#> International Journal of Corpus Linguistics, 8(2), 209–243.
#>
#> Gries, S. T., & Stefanowitsch, A. (2004). Extending collostructional
#> analysis: A corpus-based perspective on ‘alternations’. International
#> Journal of Corpus Linguistics, 9(1), 97–129.
#>
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.
The package has three data sets for demonstration. The most important
one is demo_corpus_leipzig, whose documentation can be accessed via
?demo_corpus_leipzig. Another is a list of Indonesian stopwords (i.e.
stopwords) that can be filtered out when computing collocational
measures. The last one is leipzig_corpus_path, a character vector of
full paths to my Leipzig Corpus files on my computer.
colloc_leipzig() accepts two types of corpus-input data:
- A named list whose elements are character vectors, one per Leipzig
Corpus file, as represented by demo_corpus_leipzig; its format is
shown below:
lapply(demo_corpus_leipzig[1:2], sample, 2)
#> $ind_mixed_2012_1M
#> [1] "282181 Sebab, dari komunikasi yang dilakukan selama ini, warga yang memanfaatkan lahan itu untuk parkir telah menyatakan siap untuk pindah kapan saja."
#> [2] "755026 Mas Dony yang dipikir kok itu dulu."
#>
#> $ind_news_2008_300K
#> [1] "273613 Ia melakukan penelitian dan mengambil sampel gas di pusat semburan lumpur Lapindo."
#> [2] "220694 Namun menurut para wartawan, kirab itu berjalan tanpa insiden besar."
- Full paths to the Leipzig Corpus plain-text files, as in
leipzig_corpus_path.
leipzig_corpus_path[1:2]
#> [1] "/Users/Primahadi/Documents/Corpora/_corpusindo/Leipzig Corpora/ind_mixed_2012_1M-sentences.txt"
#> [2] "/Users/Primahadi/Documents/Corpora/_corpusindo/Leipzig Corpora/ind_news_2008_300K-sentences.txt"
In terms of the input strings for the pattern argument,
colloc_leipzig() accepts three scenarios:
1. Plain string representing a whole word form, such as "memberikan" ‘to give’
2. Regex of a whole word, such as "^memberikan$" ‘to give’
3. Regex of a whole word with word boundary character (\\b), such as "\\bmemberikan\\b".
All three forms are used to match the exact word form of the search
pattern after the corpus file is tokenised into individual words. That
is, input patterns following scenario 1 or 3 are converted into the
exact search pattern of scenario 2 (i.e., with the beginning- and
end-of-line anchors, hence "^...$"). Users can therefore pass the
scenario-2 form directly to the pattern argument. If there is more than
one word to be searched, put them into a character vector (e.g.,
c("^memberi$", "^membawa$")).
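The conversion described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: normalise_pattern() is a hypothetical helper name, and the token vector is made up for the demonstration.

```r
# Hypothetical helper (NOT part of collogetr): turn any of the three
# input styles into an anchored, whole-word regex ("^...$").
normalise_pattern <- function(pattern) {
  # remove literal "\b" markers, a leading "^", and a trailing "$"
  bare <- gsub("\\\\b|^\\^|\\$$", "", pattern, perl = TRUE)
  paste0("^", bare, "$")
}

# the three scenarios end up as the same search pattern
inputs <- c("memberikan", "^memberikan$", "\\bmemberikan\\b")
anchored <- normalise_pattern(inputs)

# after tokenisation, only the exact word form is matched
tokens <- c("memberikan", "memberikannya", "diberikan")
matched <- grepl(anchored[1], tokens)
```

Because the pattern is anchored at both ends, longer forms that merely contain the search string (e.g. memberikannya) are not matched.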
The code below shows how one may retrieve the collocates for the
Indonesian verb mengatakan ‘to say sth.’. The function colloc_leipzig()
prints progress messages for each stage to the console. It generates
warning(s) when a search pattern or node word is not found in a corpus
file or in all loaded corpus files.
out <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                      pattern = "mengatakan",
                      window = "r",
                      span = 1L,
                      save_interim = FALSE)
In the example above, the collocates are restricted to those occurring
one word (i.e. span = 1L) to the right (window = "r") of mengatakan ‘to
say’. The "r" character in window stands for right-side collocates ("l"
for left-side collocates and "b" for both right- and left-side
collocates). The span argument requires an integer (i.e., a whole
number) to indicate the range of words covered in the specified window.
The pattern argument requires one or more exact word forms; if more
than one, put them into a character vector (e.g., c("mengatakan",
"menjanjikan")).
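To make the window and span arguments concrete, here is a minimal base-R sketch of window-span retrieval on a single tokenised sentence. It is an illustration under assumptions, not the implementation of colloc_leipzig(): get_collocates() is a hypothetical function, and the tokenisation (splitting on non-letters and lower-casing) is assumed rather than taken from the package.

```r
# Hypothetical illustration (NOT collogetr's code): collect the words
# found within `span` positions of the node on the chosen side(s).
get_collocates <- function(tokens, node, window = c("r", "l", "b"), span = 1L) {
  window <- match.arg(window)
  hits <- which(tokens == node)            # positions of the node word
  offsets <- switch(window,
                    r = seq_len(span),      # 1, 2, ..., span
                    l = -seq_len(span),     # -1, -2, ..., -span
                    b = c(-rev(seq_len(span)), seq_len(span)))
  out <- lapply(hits, function(i) {
    pos <- i + offsets
    keep <- pos >= 1L & pos <= length(tokens)  # stay inside the sentence
    data.frame(w = tokens[pos[keep]],
               span = paste0(ifelse(offsets[keep] > 0, "r", "l"),
                             abs(offsets[keep])),
               node = rep(node, sum(keep)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# assumed tokenisation: split on non-letters, then lower-case
sentence <- "Ia mengatakan, rupiah makin terpuruk"
tokens <- tolower(unlist(strsplit(sentence, "[^[:alpha:]]+")))

r1 <- get_collocates(tokens, "mengatakan", window = "r", span = 1L)
b1 <- get_collocates(tokens, "mengatakan", window = "b", span = 1L)
```

With window = "r" and span = 1L only rupiah is returned (labelled "r1"); with window = "b" the left neighbour ia is returned as well (labelled "l1").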
Setting save_interim = FALSE means that no output is saved to disk; the
results stay in the R session (i.e., in the out object). If
save_interim = TRUE, the function saves the outputs into files on the
computer. colloc_leipzig() specifies default file names for the outputs
via these arguments: (i) freqlist_output_file, (ii) colloc_output_file,
(iii) corpussize_output_file, and (iv) search_pattern_output_file. It
is recommended that the output filenames are stored as a character
vector. See Examples “(2)” in the documentation of colloc_leipzig() for
a call when save_interim = TRUE.
The output of colloc_leipzig() is a list of 4 elements:
- colloc_df; a table/tibble of raw collocates data with columns for:
  - corpus names
  - sentence id in which the collocates and the node word(s) are found
  - the collocates (column w)
  - the span information (e.g., "r1" for one-word, right-side collocates)
  - the node word
  - the text/sentence match in which the collocates and the node are found
- freqlist_df; a table/tibble of word-frequency list in the loaded corpus
- corpussize_df; a table/tibble of total word-tokens in the loaded corpus
- pattern; a character vector of the search pattern/node
str(out)
#> List of 4
#> $ colloc_df :Classes 'tbl_df', 'tbl' and 'data.frame': 151 obs. of 6 variables:
#> ..$ corpus_names: chr [1:151] "ind_mixed_2012_1M" "ind_mixed_2012_1M" "ind_mixed_2012_1M" "ind_news_2008_300K" ...
#> ..$ sent_id : int [1:151] 185 191 215 1 93 96 122 130 136 158 ...
#> ..$ w : chr [1:151] "kalau" "ia" "bahwa" "rupiah" ...
#> ..$ span : chr [1:151] "r1" "r1" "r1" "r1" ...
#> ..$ node : chr [1:151] "mengatakan" "mengatakan" "mengatakan" "mengatakan" ...
#> ..$ sent_match : chr [1:151] "705166 Beberapa kawan mengatakan kalau voting dilakukan secara tertutup satu orang satu suara dan tidak ada kes"| __truncated__ "870266 Pak haji mengatakan, ia sebenarnya menginginkan seorang menantu yang bisa mengajarkan caranya menggunaka"| __truncated__ "256689 Catatan: sebelum bagian ini Edwin Louis Cole mengatakan bahwa Allah memberikan firman kepada Martin Luth"| __truncated__ "270199 Ia mengatakan, rupiah makin terpuruk sulit dipertahankan, karena faktor negatif internal sangat kuat men"| __truncated__ ...
#> $ freqlist_df :Classes 'tbl_df', 'tbl' and 'data.frame': 30093 obs. of 3 variables:
#> ..$ corpus_names: chr [1:30093] "ind_mixed_2012_1M" "ind_mixed_2012_1M" "ind_mixed_2012_1M" "ind_mixed_2012_1M" ...
#> ..$ w : chr [1:30093] "yang" "dan" "di" "dengan" ...
#> ..$ n : int [1:30093] 128 93 59 53 50 45 37 32 31 28 ...
#> $ corpussize_df:Classes 'tbl_df', 'tbl' and 'data.frame': 15 obs. of 2 variables:
#> ..$ corpus_names: chr [1:15] "ind_mixed_2012_1M" "ind_news_2008_300K" "ind_news_2009_300K" "ind_news_2010_300K" ...
#> ..$ size : int [1:15] 3676 4663 4740 4904 4690 4881 4018 3854 3831 3827 ...
#> $ pattern : chr "mengatakan"
The freqlist_df and corpussize_df elements are important for computing
the collocational strength between the search pattern and its
collocates.
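As a rough picture of what these two tables hold, the sketch below builds a per-corpus frequency list and corpus-size table in base R. Everything in it is assumed for illustration: the two tiny "corpora" are invented, tokenise() is a hypothetical helper, and the actual tokenisation used by colloc_leipzig() may differ.

```r
# Invented toy input in the same shape as demo_corpus_leipzig:
# a named list of character vectors (one sentence per element).
corpus <- list(
  corp_a = c("1 Ia mengatakan bahwa harga naik", "2 Harga naik lagi"),
  corp_b = c("1 Mereka mengatakan harga turun")
)

# Hypothetical tokeniser: split on non-letters (this also drops the
# leading sentence ids), lower-case, and discard empty strings.
tokenise <- function(x) {
  tok <- tolower(unlist(strsplit(x, "[^[:alpha:]]+")))
  tok[nzchar(tok)]
}

# word-frequency list per corpus (columns named as in freqlist_df)
freqlist_df <- do.call(rbind, lapply(names(corpus), function(nm) {
  freq <- sort(table(tokenise(corpus[[nm]])), decreasing = TRUE)
  data.frame(corpus_names = nm,
             w = names(freq),
             n = as.integer(freq),
             stringsAsFactors = FALSE)
}))

# total word tokens per corpus (columns named as in corpussize_df)
corpussize_df <- aggregate(n ~ corpus_names, data = freqlist_df, FUN = sum)
names(corpussize_df)[2] <- "size"
```

The frequency of each collocate and the corpus size are the marginal totals needed later to fill in the 2-by-2 table for each collocate.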
First, we need to call assoc_prepare() to generate the input data for
SCA. The demo illustrates it with the in-console output of
colloc_leipzig(). See Examples “2.2” in the documentation of
assoc_prepare() (?assoc_prepare()) for handling saved outputs.
assoc_tb <- assoc_prepare(colloc_out = out,
                          window_span = "r1",
                          per_corpus = FALSE, # combine all data across corpus
                          stopword_list = collogetr::stopwords,
                          float_digits = 3L)
#> Your colloc_leipzig output is stored as list!
#> You chose to combine the collocational and frequency list data from ALL CORPORA!
#> Tallying frequency list of all words in ALL CORPORA!
#> You chose to remove stopwords!
Inspect the output of assoc_prepare():
head(assoc_tb)
#> # A tibble: 6 x 3
#> # Groups: node, w [6]
#> node w data
#> <chr> <chr> <list>
#> 1 mengatakan pemerintah <tibble [1 × 9]>
#> 2 mengatakan israel <tibble [1 × 9]>
#> 3 mengatakan kasus <tibble [1 × 9]>
#> 4 mengatakan akibat <tibble [1 × 9]>
#> 5 mengatakan alokasi <tibble [1 × 9]>
#> 6 mengatakan angklung <tibble [1 × 9]>
The assoc_prepare() and collex_fye() functions are designed following
the tidy principle so that the association/collocation measure is
computed in a row-wise fashion, benefiting from the combination of a
nested column (cf. Wickham & Grolemund, 2017, p. 409) for the input
data (using tidyr::nest()) and purrr’s map_* functions.
assoc_prepare() includes calculating the expected co-occurrence
frequencies between the collocates/collexemes and the node
word/construction.
The column data in assoc_tb above consists of nested tibbles/tables
stored as a list. Each contains the data required for computing the
association measure for each of the collocates in column w (Gries,
2013, 2015; Stefanowitsch & Gries, 2003, 2009). This nested column can
be inspected as follows (for the first row, namely for the word
pemerintah ‘government’).
# get the tibble in the `data` column for the first row
assoc_tb$data[[1]]
#> # A tibble: 1 x 9
#> a n_w_in_corp corpus_size n_pattern b c d a_exp assoc
#> <int> <int> <int> <int> <int> <int> <int> <dbl> <chr>
#> 1 4 96 41179 152 92 148 40935 0.354 attraction
Column a indicates the co-occurrence frequency between the node word
and the collocates in column w, while a_exp indicates the expected
co-occurrence frequency between them. The n_w_in_corp column represents
the total token/occurrence frequency of a given collocate. The
n_pattern column stores the total token/occurrence frequency of the
node word in the corpus. Columns b, c, and d are required for the
association measure, which is essentially based on a 2-by-2
cross-tabulation table. The assoc column indicates whether the value in
a is higher than that in a_exp, thus indicating attraction or positive
association between the node word and the collocate. The reverse is
repulsion or negative association, when the value in a is lower than
that in a_exp.
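The relation between these columns can be checked by hand. The sketch below recomputes b, c, d, and a_exp from the values printed for the first row above, using the usual 2-by-2 contingency-table arithmetic; the variable names are illustrative (c_cell avoids masking base R's c()), and the formulas are inferred from the printed numbers rather than copied from the package source.

```r
# values printed in the first-row tibble above
a           <- 4L      # node + collocate co-occurrences
n_w_in_corp <- 96L     # total frequency of the collocate
n_pattern   <- 152L    # total frequency of the node word
corpus_size <- 41179L  # total word tokens

# remaining cells of the 2-by-2 table
b      <- n_w_in_corp - a               # collocate without the node
c_cell <- n_pattern - a                 # node without the collocate
d      <- corpus_size - (a + b + c_cell) # everything else

# expected co-occurrence frequency under independence
a_exp <- round(n_pattern * n_w_in_corp / corpus_size, 3)

# direction of the association
assoc <- if (a > a_exp) "attraction" else "repulsion"
```

These reproduce the printed row: b = 92, c = 148, d = 40935, a_exp = 0.354, and assoc = "attraction".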
As in Collostructional Analysis (Stefanowitsch & Gries, 2003),
collex_fye() uses a one-tailed Fisher-Yates Exact test whose
pFisherExact-value is log-transformed to the base of 10 to indicate the
collostruction strength between the collocates and the node word
(Gries, Hampe, & Schönefeld, 2005). collex_fye() simultaneously
computes two uni-directional measures of Delta P (Gries, 2013, 2015,
p. 524). One of these shows the extent to which the presence of the
node word cues the presence of the collocates/collexemes; the other
shows the extent to which the collocates/collexemes cue the presence of
the node word.
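For readers who want to see the arithmetic, the following base-R sketch applies stats::fisher.test() and the textbook Delta P formulas to the 2-by-2 cells of the first-row example. It is a sketch under assumptions, not the body of collex_fye(): the one-tailed direction ("greater", for an attracted collocate), the negative log10 transform, and the variable names are choices made for this illustration, and no mapping onto the package's dP_* output columns is claimed.

```r
# 2-by-2 cells for the first-row example (see the nested tibble above)
a <- 4; b <- 92; c_cell <- 148; d <- 40935

# rows: collocate vs. other words; columns: node vs. other contexts
tab <- matrix(c(a, b, c_cell, d), nrow = 2)

# one-tailed Fisher-Yates Exact test (assumed direction: attraction)
p_fye <- fisher.test(tab, alternative = "greater")$p.value

# collostruction strength as the negative log10 of the p-value
collstr <- -log10(p_fye)

# Delta P, node cues collocate: P(collocate | node) - P(collocate | no node)
dp_node_to_colloc <- a / (a + c_cell) - b / (b + d)

# Delta P, collocate cues node: P(node | collocate) - P(node | other word)
dp_colloc_to_node <- a / (a + b) - c_cell / (c_cell + d)
```

A larger collstr corresponds to a smaller p-value, which is why higher values are read as stronger association.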
Here is the code to perform the SCA using collex_fye():
# perform FYE test for Collexeme Analysis
am_fye <- collex_fye(df = assoc_tb, collstr_digit = 3)
Now we can retrieve the top-10 most strongly attracted collocates to
mengatakan ‘to say sth.’. The association strength is shown in the
collstr column, which stands for collostruction strength. The higher,
the stronger the association.
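# get the top-10 most strongly attracted collocates
# (am_fye is grouped by node and w, so top_n() works within each group
# and returns all 84 rows; call dplyr::ungroup() first for a strict top 10)
dplyr::top_n(am_fye, 10, collstr)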
#> # A tibble: 84 x 9
#> # Groups: node, w [84]
#> node w a a_exp assoc p_fye collstr dP_collex_cue_c…
#> <chr> <chr> <int> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 meng… peme… 4 0.354 attr… 4.55e-4 3.34 0.024
#> 2 meng… isra… 2 0.078 attr… 2.71e-3 2.57 0.013
#> 3 meng… angk… 1 0.004 attr… 3.69e-3 2.43 0.007
#> 4 meng… ayla 1 0.004 attr… 3.69e-3 2.43 0.007
#> 5 meng… defi… 1 0.004 attr… 3.69e-3 2.43 0.007
#> 6 meng… hofos 1 0.004 attr… 3.69e-3 2.43 0.007
#> 7 meng… kawa… 1 0.004 attr… 3.69e-3 2.43 0.007
#> 8 meng… kebe… 1 0.004 attr… 3.69e-3 2.43 0.007
#> 9 meng… kete… 1 0.004 attr… 3.69e-3 2.43 0.007
#> 10 meng… konj… 1 0.004 attr… 3.69e-3 2.43 0.007
#> # … with 74 more rows, and 1 more variable: dP_cxn_cue_collex <dbl>
Column a contains the co-occurrence frequency of the collocates (w)
with the node as its R1 collocates in the demo corpus. p_fye shows the
one-tailed pFisherExact-value.
The idea of distinctive collexemes/collocates is to contrast two
functionally/semantically similar constructions or words in terms of
the collocates that are (significantly) more frequent for one of the
two contrasted constructions/words (Hilpert, 2006; see Gries &
Stefanowitsch, 2004). colloc_leipzig() can be used to retrieve
collocates of two functionally/semantically similar words by passing a
character vector of the two words to the pattern argument.
The following example uses one of the Leipzig corpus files (not
included in the package but available for free from the Leipzig Corpora
webpage), namely "ind_mixed_2012_1M-sentences". The aim is to contrast
the collocational preferences of two deadjectival transitive verbs
based on the root kuat ‘strong’ framed within two causative
morphological schemas: one with per-+ADJ and the other with ADJ+-kan.
Theoretically, the per- schema indicates that the direct object of the
verb is caused to have more of the characteristic indicated by the
adjectival root, while the -kan schema indicates that the direct object
is caused to have the characteristic indicated by the root (which it
did not previously have). The focus here is on the R1 collocates of the
verbs (i.e. one word immediately to the right of the verbs in the
sentences).
my_leipzig_path <- collogetr::leipzig_corpus_path[1]
dca_coll <- collogetr::colloc_leipzig(leipzig_path = my_leipzig_path,
                                      pattern = c("memperkuat", "menguatkan"),
                                      window = "r",
                                      span = 1,
                                      save_interim = FALSE)
Then, we prepare the output into the format required for performing DCA
with collex_fye_dca().
assoc_tb <- assoc_prepare(colloc_out = dca_coll,
                          window_span = "r1",
                          per_corpus = FALSE,
                          stopword_list = collogetr::stopwords,
                          float_digits = 3L)
#> Your colloc_leipzig output is stored as list!
#> You chose to combine the collocational and frequency list data from ALL CORPORA!
#> Tallying frequency list of all words in ALL CORPORA!
#> You chose to remove stopwords!
# prepare the dca input table
dca_tb <- assoc_prepare_dca(assoc_tb)
Compute DCA for the two verbs and view the results snippet.
dca_res <- collex_fye_dca(dca_tb)
head(dca_res, 10)
#> # A tibble: 10 x 6
#> # Groups: w [10]
#> w memperkuat menguatkan p_fye collstr dist_for
#> <chr> <int> <int> <dbl> <dbl> <chr>
#> 1 diagnosis 15 0 0.00908 2.04 memperkuat
#> 2 posisi 15 0 0.00908 2.04 memperkuat
#> 3 daya 14 0 0.0125 1.90 memperkuat
#> 4 sistem 13 0 0.0171 1.77 memperkuat
#> 5 pertahanan 10 0 0.0439 1.36 memperkuat
#> 6 basis 8 0 0.0823 1.08 memperkuat
#> 7 ketahanan 7 0 0.113 0.948 memperkuat
#> 8 pasukan 7 0 0.113 0.948 memperkuat
#> 9 rasa 7 0 0.113 0.948 memperkuat
#> 10 tim 7 0 0.113 0.948 memperkuat
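The logic behind one row of such a table can be sketched with stats::fisher.test(). The function below is a hypothetical illustration, not collex_fye_dca() itself: dca_fye_sketch() and its arguments are invented names, the one-tailed test in the direction of the observed preference is an assumption, and the totals used in the call (500 and 200 collocate tokens) are made-up numbers, so the resulting p-value is not comparable to the output above.

```r
# Hypothetical sketch (NOT collex_fye_dca()): contrast how often one
# collocate occurs with construction/word A versus B.
dca_fye_sketch <- function(freq_a, freq_b, total_a, total_b) {
  # columns: A vs. B; rows: this collocate vs. all other collocates
  tab <- matrix(c(freq_a, total_a - freq_a,
                  freq_b, total_b - freq_b), nrow = 2)
  # expected frequency with A if the collocate had no preference
  exp_a <- (freq_a + freq_b) * total_a / (total_a + total_b)
  prefers_a <- freq_a > exp_a
  # assumed: one-tailed test in the direction of the observed preference
  p <- fisher.test(tab,
                   alternative = if (prefers_a) "greater" else "less")$p.value
  data.frame(p_fye = p,
             collstr = -log10(p),
             dist_for = if (prefers_a) "A" else "B",
             stringsAsFactors = FALSE)
}

# 15 vs. 0 co-occurrences; the totals are invented for illustration
res <- dca_fye_sketch(freq_a = 15, freq_b = 0, total_a = 500, total_b = 200)
```

A collocate is reported as distinctive for whichever word it co-occurs with more often than expected from the two words' overall collocate totals.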
The package also includes a function called dca_top_collex() to
retrieve the top-n distinctive collocates for one of the two contrasted
words. The dist_for argument can be specified either with the name of
one of the contrasted words as a character string, or with the
character ID of the construction/word: dist_for = "a" or
dist_for = "A" for the construction/word appearing in the second column
of the output of collex_fye_dca(); dist_for = "b" or dist_for = "B" for
the one appearing in the third column.
# retrieve distinctive collocates for Construction A (i.e., memperkuat)
dist_for_a <- dca_top_collex(dca_res, dist_for = "memperkuat", top_n = 10)
head(dist_for_a)
#> # A tibble: 6 x 6
#> # Groups: w [6]
#> w memperkuat menguatkan p_fye collstr dist_for
#> <chr> <int> <int> <dbl> <dbl> <chr>
#> 1 diagnosis 15 0 0.00908 2.04 memperkuat
#> 2 posisi 15 0 0.00908 2.04 memperkuat
#> 3 daya 14 0 0.0125 1.90 memperkuat
#> 4 sistem 13 0 0.0171 1.77 memperkuat
#> 5 pertahanan 10 0 0.0439 1.36 memperkuat
#> 6 basis 8 0 0.0823 1.08 memperkuat
The code below retrieves the distinctive collocates for menguatkan ‘to strengthen’ or Construction B.
# retrieve distinctive collocates for Construction B (i.e., menguatkan)
dist_for_b <- dca_top_collex(dca_res, dist_for = "menguatkan", top_n = 10)
head(dist_for_b)
#> # A tibble: 6 x 6
#> # Groups: w [6]
#> w memperkuat menguatkan p_fye collstr dist_for
#> <chr> <int> <int> <dbl> <dbl> <chr>
#> 1 hati 0 12 0.000000109 6.96 menguatkan
#> 2 satu 2 8 0.000636 3.20 menguatkan
#> 3 iman 7 10 0.00487 2.31 menguatkan
#> 4 kebenaran 0 4 0.00501 2.30 menguatkan
#> 5 orang 0 4 0.00501 2.30 menguatkan
#> 6 kepercayaan 0 3 0.0189 1.72 menguatkan
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.3 (2020-02-29)
#> os macOS Catalina 10.15.3
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Asia/Makassar
#> date 2020-03-28
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
#> callr 3.2.0 2019-03-15 [1] CRAN (R 3.6.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
#> collogetr * 1.1.4 2020-03-28 [1] local
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
#> dplyr 0.8.5 2020-03-07 [1] CRAN (R 3.6.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> glue 1.3.2 2020-03-12 [1] CRAN (R 3.6.0)
#> hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.0)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> processx 3.3.1 2019-05-08 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
#> readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
#> tidyr 1.0.2 2020-01-24 [1] CRAN (R 3.6.0)
#> tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.0)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
#> vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Users/Primahadi/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Gries, S. T. (2013). 50-something years of work on collocations: What is or should be next …. International Journal of Corpus Linguistics, 18(1), 137–166. https://doi.org/10.1075/ijcl.18.1.09gri
Gries, S. T. (2015). More (old and new) misunderstandings of collostructional analysis: On Schmid and Küchenhoff (2013). Cognitive Linguistics, 26(3), 505–536. https://doi.org/10.1515/cog-2014-0092
Gries, S. T., Hampe, B., & Schönefeld, D. (2005). Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics, 16(4), 635–676.
Gries, S. T., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics, 9(1), 97–129.
Hilpert, M. (2006). Distinctive collexeme analysis and diachrony. Corpus Linguistics and Linguistic Theory, 2(2), 243–256.
Stefanowitsch, A. (2013). Collostructional analysis. In T. Hoffmann & G. Trousdale (Eds.), The Oxford handbook of Construction Grammar (pp. 290–306). https://doi.org/10.1093/oxfordhb/9780195396683.013.0016
Stefanowitsch, A., & Gries, S. T. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.
Stefanowitsch, A., & Gries, S. T. (2009). Corpora and grammar. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 2, pp. 933–951). Berlin: Mouton de Gruyter.
Wickham, H., & Grolemund, G. (2017). R for Data Science. Retrieved from http://r4ds.had.co.nz/