
Commit 3e209c2

fix typos in docs
AdamSpannbauer committed Dec 4, 2018
1 parent 7bce0a1 commit 3e209c2
Showing 11 changed files with 24 additions and 24 deletions.
4 changes: 2 additions & 2 deletions R/lexRank.R
@@ -5,12 +5,12 @@
#' @param docId A vector of document IDs with length equal to the length of \code{text}. If \code{docId == "create"} then doc IDs will be created as an index from 1 to \code{n}, where \code{n} is the length of \code{text}.
#' @param threshold The minimum simil value a sentence pair must have to be represented in the graph where lexRank is calculated.
#' @param n The number of sentences to return as the extractive summary. The function will return the top \code{n} lexRanked sentences. See \code{returnTies} for handling ties in lexRank.
-#' @param returnTies \code{TRUE} or \code{FALSE} indicating whether or not to return greater than \code{n} sentence IDs if there is a tie in lexRank. If \code{TRUE}, the returned number of sentences will not be limited to \code{n}, but rather will return every sentece with a top 3 score. If \code{FALSE}, the returned number of sentences will be \code{<=n}. Defaults to \code{TRUE}.
+#' @param returnTies \code{TRUE} or \code{FALSE} indicating whether or not to return greater than \code{n} sentence IDs if there is a tie in lexRank. If \code{TRUE}, the returned number of sentences will not be limited to \code{n}, but rather will return every sentence with a top 3 score. If \code{FALSE}, the returned number of sentences will be \code{<=n}. Defaults to \code{TRUE}.
#' @param usePageRank \code{TRUE} or \code{FALSE} indicating whether or not to use the page rank algorithm for ranking sentences. If \code{FALSE}, a sentence's unweighted centrality will be used as the rank. Defaults to \code{TRUE}.
#' @param damping The damping factor to be passed to the page rank algorithm. Ignored if \code{usePageRank} is \code{FALSE}.
#' @param continuous \code{TRUE} or \code{FALSE} indicating whether or not to use continuous LexRank. Only applies if \code{usePageRank==TRUE}. If \code{TRUE}, \code{threshold} will be ignored and lexRank will be computed using a weighted graph representation of the sentences. Defaults to \code{FALSE}.
#' @param sentencesAsDocs \code{TRUE} or \code{FALSE}, indicating whether or not to treat sentences as documents when calculating tfidf scores for similarity. If \code{TRUE}, inverse document frequency will be calculated as inverse sentence frequency (useful for single document extractive summarization).
-#' @param removePunc \code{TRUE} or \code{FALSE} indicating whether or not to remove punctuation from text while tokenizing. If \code{TRUE}, puncuation will be removed. Defaults to \code{TRUE}.
+#' @param removePunc \code{TRUE} or \code{FALSE} indicating whether or not to remove punctuation from text while tokenizing. If \code{TRUE}, punctuation will be removed. Defaults to \code{TRUE}.
#' @param removeNum \code{TRUE} or \code{FALSE} indicating whether or not to remove numbers from text while tokenizing. If \code{TRUE}, numbers will be removed. Defaults to \code{TRUE}.
#' @param toLower \code{TRUE} or \code{FALSE} indicating whether or not to coerce all of text to lowercase while tokenizing. If \code{TRUE}, \code{text} will be coerced to lowercase. Defaults to \code{TRUE}.
#' @param stemWords \code{TRUE} or \code{FALSE} indicating whether or not to stem resulting tokens. If \code{TRUE}, the resulting tokens will be stemmed using \code{SnowballC::wordStem()}. Defaults to \code{TRUE}.
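For reference, a minimal sketch built only from the arguments documented above (the full signature is truncated in this hunk, so anything beyond these names and defaults is an assumption):

```r
library(lexRankr)

docs <- c("The system crashed at noon. A restart fixed the system.",
          "Noon crashes were traced to a memory leak in the system.")

# Rank sentences across the two documents; with returnTies = TRUE,
# more than n rows can come back when scores tie.
top_sentences <- lexRank(text = docs,
                         docId = "create",   # doc IDs created as 1..length(text)
                         n = 2,
                         returnTies = TRUE,
                         continuous = TRUE)  # weighted graph; threshold ignored
```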
8 changes: 4 additions & 4 deletions R/lexRankFromSimil.R
@@ -1,12 +1,12 @@
#' Compute LexRanks from pairwise sentence similarities

#' @description Compute LexRanks from sentence pair similarities using the page rank algorithm or degree centrality. The methods used to compute lexRank are discussed in "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization."
-#' @param s1 A character vector of sentence IDs corresponding to the \code{s2} and \code{simil} arguemants.
-#' @param s2 A character vector of sentence IDs corresponding to the \code{s1} and \code{simil} arguemants.
-#' @param simil A numeric vector of similiarity values that represents the similiarity between the sentences represented by the IDs in \code{s1} and \code{s2}.
+#' @param s1 A character vector of sentence IDs corresponding to the \code{s2} and \code{simil} arguments
+#' @param s2 A character vector of sentence IDs corresponding to the \code{s1} and \code{simil} arguments
+#' @param simil A numeric vector of similarity values that represents the similarity between the sentences represented by the IDs in \code{s1} and \code{s2}.
#' @param threshold The minimum simil value a sentence pair must have to be represented in the graph where lexRank is calculated.
#' @param n The number of sentences to return as the extractive summary. The function will return the top \code{n} lexRanked sentences. See \code{returnTies} for handling ties in lexRank.
-#' @param returnTies \code{TRUE} or \code{FALSE} indicating whether or not to return greater than \code{n} sentence IDs if there is a tie in lexRank. If \code{TRUE}, the returned number of sentences will not be limited to \code{n}, but rather will return every sentece with a top 3 score. If \code{FALSE}, the returned number of sentences will be \code{<=n}. Defaults to \code{TRUE}.
+#' @param returnTies \code{TRUE} or \code{FALSE} indicating whether or not to return greater than \code{n} sentence IDs if there is a tie in lexRank. If \code{TRUE}, the returned number of sentences will not be limited to \code{n}, but rather will return every sentence with a top 3 score. If \code{FALSE}, the returned number of sentences will be \code{<=n}. Defaults to \code{TRUE}.
#' @param usePageRank \code{TRUE} or \code{FALSE} indicating whether or not to use the page rank algorithm for ranking sentences. If \code{FALSE}, a sentence's unweighted centrality will be used as the rank. Defaults to \code{TRUE}.
#' @param damping The damping factor to be passed to the page rank algorithm. Ignored if \code{usePageRank} is \code{FALSE}.
#' @param continuous \code{TRUE} or \code{FALSE} indicating whether or not to use continuous LexRank. Only applies if \code{usePageRank==TRUE}. If \code{TRUE}, \code{threshold} will be ignored and lexRank will be computed using a weighted graph representation of the sentences. Defaults to \code{FALSE}.
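The three parallel vectors documented above feed a single call; a sketch with invented similarity values for illustration:

```r
library(lexRankr)

# Pairwise similarities for three sentences (values are illustrative)
s1    <- c("d1_1", "d1_1", "d1_2")
s2    <- c("d1_2", "d1_3", "d1_3")
simil <- c(0.30,   0.05,   0.45)

# Pairs with simil below threshold are dropped from the graph
# before page rank is applied.
ranks <- lexRankFromSimil(s1, s2, simil,
                          threshold = 0.2,
                          n = 2,
                          usePageRank = TRUE,
                          damping = 0.85)
```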
6 changes: 3 additions & 3 deletions R/sentenceSimil.R
@@ -5,9 +5,9 @@ NULL
#' Compute distance between sentences

#' @description Compute distance between sentences using modified idf cosine distance from "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization". Output can be used as input to \code{\link{lexRankFromSimil}}.
-#' @param sentenceId A character vector of sentence IDs corresponding to the \code{docId} and \code{token} arguemants.
-#' @param token A character vector of tokens corresponding to the \code{docId} and \code{sentenceId} arguemants.
-#' @param docId A character vector of document IDs corresponding to the \code{sentenceId} and \code{token} arguemants. Can be \code{NULL} if \code{sentencesAsDocs} is \code{TRUE}.
+#' @param sentenceId A character vector of sentence IDs corresponding to the \code{docId} and \code{token} arguments
+#' @param token A character vector of tokens corresponding to the \code{docId} and \code{sentenceId} arguments
+#' @param docId A character vector of document IDs corresponding to the \code{sentenceId} and \code{token} arguments. Can be \code{NULL} if \code{sentencesAsDocs} is \code{TRUE}.
#' @param sentencesAsDocs \code{TRUE} or \code{FALSE}, indicating whether or not to treat sentences as documents when calculating tfidf scores. If \code{TRUE}, inverse document frequency will be calculated as inverse sentence frequency (useful for single document extractive summarization).
#' @return A 3 column dataframe of pairwise distances between sentences. Columns: \code{sent1} (sentence id), \code{sent2} (sentence id), & \code{dist} (distance between \code{sent1} and \code{sent2}).
#' @references \url{http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html}
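A sketch of the parallel-vector interface described above, with illustrative tokens (three one-sentence documents, so "system" carries cross-document weight):

```r
library(lexRankr)

# One element per token; sentence and document IDs repeat alongside
docId      <- c("d1", "d1", "d2", "d2", "d3", "d3")
sentenceId <- c("d1_1", "d1_1", "d2_1", "d2_1", "d3_1", "d3_1")
token      <- c("system", "crash", "system", "restart", "memory", "leak")

similDf <- sentenceSimil(sentenceId = sentenceId,
                         token      = token,
                         docId      = docId)
# the pairwise output feeds directly into lexRankFromSimil()
```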
2 changes: 1 addition & 1 deletion R/sentenceTokenParse.R
@@ -3,7 +3,7 @@
#' @description Parse a character vector of documents into both sentences and a clean vector of tokens. The resulting output includes IDs for document and sentence for use in other \code{lexRank} functions.
#' @param text A character vector of documents to be parsed into sentences and tokenized.
#' @param docId A character vector of document IDs the same length as \code{text}. If \code{docId == "create"}, document IDs will be created.
-#' @param removePunc \code{TRUE} or \code{FALSE} indicating whether or not to remove punctuation from \code{text} while tokenizing. If \code{TRUE}, puncuation will be removed. Defaults to \code{TRUE}.
+#' @param removePunc \code{TRUE} or \code{FALSE} indicating whether or not to remove punctuation from \code{text} while tokenizing. If \code{TRUE}, punctuation will be removed. Defaults to \code{TRUE}.
#' @param removeNum \code{TRUE} or \code{FALSE} indicating whether or not to remove numbers from \code{text} while tokenizing. If \code{TRUE}, numbers will be removed. Defaults to \code{TRUE}.
#' @param toLower \code{TRUE} or \code{FALSE} indicating whether or not to coerce all of \code{text} to lowercase while tokenizing. If \code{TRUE}, \code{text} will be coerced to lowercase. Defaults to \code{TRUE}.
#' @param stemWords \code{TRUE} or \code{FALSE} indicating whether or not to stem resulting tokens. If \code{TRUE}, the resulting tokens will be stemmed using \code{SnowballC::wordStem()}. Defaults to \code{TRUE}.
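A sketch of the parse step using only the documented arguments (the shape of the return value is not shown in this diff, so it is only described in comments):

```r
library(lexRankr)

docs <- c("The system crashed at noon. A restart fixed it.",
          "Noon crashes were traced to a memory leak.")

parsed <- sentenceTokenParse(text = docs,
                             docId = "create",
                             removePunc = TRUE,
                             removeNum  = TRUE,
                             toLower    = TRUE,
                             stemWords  = TRUE)
# parsed carries document/sentence IDs alongside sentences and tokens,
# for use in the other lexRank functions (structure assumed, not shown here)
```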
2 changes: 1 addition & 1 deletion R/tokenize.R
@@ -3,7 +3,7 @@ utils::globalVariables(c("smart_stopwords"))

#' Parse the elements of a character vector into a list of cleaned tokens.
#' @param text The character vector to be tokenized
-#' @param removePunc \code{TRUE} or \code{FALSE} indicating whether or not to remove punctuation from \code{text}. If \code{TRUE}, puncuation will be removed. Defaults to \code{TRUE}.
+#' @param removePunc \code{TRUE} or \code{FALSE} indicating whether or not to remove punctuation from \code{text}. If \code{TRUE}, punctuation will be removed. Defaults to \code{TRUE}.
#' @param removeNum \code{TRUE} or \code{FALSE} indicating whether or not to remove numbers from \code{text}. If \code{TRUE}, numbers will be removed. Defaults to \code{TRUE}.
#' @param toLower \code{TRUE} or \code{FALSE} indicating whether or not to coerce all of \code{text} to lowercase. If \code{TRUE}, \code{text} will be coerced to lowercase. Defaults to \code{TRUE}.
#' @param stemWords \code{TRUE} or \code{FALSE} indicating whether or not to stem resulting tokens. If \code{TRUE}, the resulting tokens will be stemmed using \code{SnowballC::wordStem()}. Defaults to \code{TRUE}.
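Per the title above, the result is a list of cleaned token vectors, one per input element; a small sketch of the cleaning switches:

```r
library(lexRankr)

tokenize(c("The 3 systems crashed!", "Restarting fixed them."),
         removePunc = TRUE,   # drops "!"
         removeNum  = TRUE,   # drops "3"
         toLower    = TRUE,
         stemWords  = TRUE)   # e.g. "systems" -> "system" via SnowballC
```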
4 changes: 2 additions & 2 deletions README.md
@@ -13,11 +13,11 @@ devtools::install_github("AdamSpannbauer/lexRankr")
```

## Overview
-lexRankr is an R implementation of the LexRank algorithm discussed by Güneş Erkan & Dragomir R. Radev in [LexRank: Graph-based Lexical Centrality as Salience in Text Summarization](http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html). LexRank is designed to summarize a cluster of documents by proposing which sentences subsume the most information in that particular set of documents. The algorithm may not perform well on a set of unclustered/unrelated set of documents. As the white paper's title suggests, the sentences are ranked based on their centrality in a graph. The graph is built upon the pairwise similarities of the sentences (where similarity is measured with a modified idf cosine similiarity function). The paper describes multiple ways to calculate centrality and these options are available in the R package. The sentences can be ranked according to their degree of centrality or by using the Page Rank algorithm (both of these methods require setting a minimum similarity threshold for a sentence pair to be included in the graph). A third variation is Continuous LexRank which does not require a minimum similarity threshold, but rather uses a weighted graph of sentences as the input to Page Rank.
+lexRankr is an R implementation of the LexRank algorithm discussed by Güneş Erkan & Dragomir R. Radev in [LexRank: Graph-based Lexical Centrality as Salience in Text Summarization](http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html). LexRank is designed to summarize a cluster of documents by proposing which sentences subsume the most information in that particular set of documents. The algorithm may not perform well on a set of unclustered/unrelated set of documents. As the white paper's title suggests, the sentences are ranked based on their centrality in a graph. The graph is built upon the pairwise similarities of the sentences (where similarity is measured with a modified idf cosine similarity function). The paper describes multiple ways to calculate centrality and these options are available in the R package. The sentences can be ranked according to their degree of centrality or by using the Page Rank algorithm (both of these methods require setting a minimum similarity threshold for a sentence pair to be included in the graph). A third variation is Continuous LexRank which does not require a minimum similarity threshold, but rather uses a weighted graph of sentences as the input to Page Rank.

*note: the lexrank algorithm is designed to work on a cluster of documents. LexRank is built on the idea that a cluster of docs will focus on similar topics*

-*note: pairwise sentence similiarity is calculated for the entire set of documents passed to the function. This can be a computationally instensive process (esp with a large set of documents)*
+*note: pairwise sentence similarity is calculated for the entire set of documents passed to the function. This can be a computationally instensive process (esp with a large set of documents)*

## Basic Usage
```r
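# (The original usage example is truncated in this diff view; the call below
#  is a minimal sketch built from the documented lexRank() arguments, not the
#  README's verbatim example.)
library(lexRankr)

docs <- c("Testing the system. Second sentence for you.",
          "System testing the tidy documents.",
          "Documents will be parsed and lexranked.")

# top-ranked sentences across the document cluster, doc IDs auto-created
lexRank(docs, docId = "create", n = 3)
```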
4 changes: 2 additions & 2 deletions man/lexRank.Rd
8 changes: 4 additions & 4 deletions man/lexRankFromSimil.Rd
6 changes: 3 additions & 3 deletions man/sentenceSimil.Rd
2 changes: 1 addition & 1 deletion man/sentenceTokenParse.Rd

(Generated .Rd files; diffs are not rendered by default.)

