docs: add extract_metrics to docs and readme

HLasse · Jan 5, 2023 · 163bee5
1 parent 5cb931f
commit 163bee5

Showing 4 changed files with 61 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -20,12 +20,33 @@ A Python library for calculating a large variety of statistics from text(s) usin
 
 * Version 2.0 out with a new API, a new component, updated documentation, and tutorials! Components are now called by `textdescriptives/{metric_name}`. New `coherence` component for calculating the semantic coherence between sentences. See the [documentation](https://github.com/HLasse/TextDescriptives) for tutorials and more information!  
 
+
+
 # ⚡ Quick Start
 
-Import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are *descriptive_stats*, *readability*, *dependency_distance*, *pos_proportions*, *coherence*, and *quality* prefixed with `textdescriptives/`. 
+Use `extract_metrics` to quickly extract your desired metrics. Available metrics are `["descriptive_stats", "readability", "dependency_distance", "pos_proportions", "coherence", "quality"]`.
 
-If you want to add all components you can use the shorthand `textdescriptives/all`.
+Set the `spacy_model` parameter to specify which spaCy model to use; otherwise, TextDescriptives will auto-download an appropriate one based on `lang`. If `lang` is set, `spacy_model` is not necessary, and vice versa.
+
+Specify which metrics to extract in the `metrics` argument. `None` extracts all metrics. 
+
+```py
+import textdescriptives as td
 
+text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
+# will automatically download `en_core_web_lg` and extract all metrics
+df = td.extract_metrics(text=text, lang="en")
+
+# specify spaCy model and which metrics to extract
+df = td.extract_metrics(text=text, spacy_model="en_core_web_sm", metrics=["readability", "coherence"])
+```
+
+
+## Usage with spaCy
+
+To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are *descriptive_stats*, *readability*, *dependency_distance*, *pos_proportions*, *coherence*, and *quality* prefixed with `textdescriptives/`. 
+
+If you want to add all components you can use the shorthand `textdescriptives/all`.
 
 ```py
 import spacy
@@ -39,7 +60,7 @@ doc._.readability
 doc._.token_length
 ```
 
-TextDescriptives includes convenience functions for extracting metrics to a Pandas DataFrame or a dictionary.
+TextDescriptives includes convenience functions for extracting metrics from a `Doc` to a Pandas DataFrame or a dictionary.
 
 ```py
 td.extract_dict(doc)

diff --git a/docs/extractors.rst b/docs/extractors.rst
@@ -0,0 +1,12 @@
+Extractor
+--------------------
+
+The extractors are used to extract the features from the document. :code:`extract_metrics` is meant to be called on raw texts, whereas :code:`extract_df` and :code:`extract_dict` work on spaCy documents (:code:`Doc`).
+
+
+API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: textdescriptives.extractors.extract_metrics
+.. autofunction:: textdescriptives.extractors.extract_df
+.. autofunction:: textdescriptives.extractors.extract_dict
diff --git a/docs/index.rst b/docs/index.rst
@@ -51,7 +51,7 @@ please use the discussion Forums.
 
 .. toctree::
    :maxdepth: 2
-   :caption: Compoents
+   :caption: Components
    :hidden:
 
    descriptivestats
@@ -60,6 +60,7 @@ please use the discussion Forums.
    posstats
    quality
    coherence
+   extractors
 
 
 .. toctree::

diff --git a/docs/usingthepackage.rst b/docs/usingthepackage.rst
@@ -1,7 +1,29 @@
 Quick Start
 =======================
 
-Import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are :code:`descriptive_stats`, :code:`readability`, :code:`dependency_distance`, :code:`pos_proportions`, :code:`coherence`, and :code:`quality` prefixed with :code:`textdescriptives/`. 
+
+Use :code:`extract_metrics` to quickly extract your desired metrics. Available metrics are :code:`["descriptive_stats", "readability", "dependency_distance", "pos_proportions", "coherence", "quality"]`.
+
+Set the :code:`spacy_model` parameter to specify which spaCy model to use; otherwise, TextDescriptives will auto-download an appropriate one based on :code:`lang`. If :code:`lang` is set, :code:`spacy_model` is not necessary, and vice versa.
+
+Specify which metrics to extract in the :code:`metrics` argument. :code:`None` extracts all metrics.
+
+.. code-block:: python
+
+   import textdescriptives as td
+
+   text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
+   # will automatically download `en_core_web_lg` and extract all metrics
+   df = td.extract_metrics(text=text, lang="en")
+
+   # specify spaCy model and which metrics to extract
+   df = td.extract_metrics(text=text, spacy_model="en_core_web_sm", metrics=["readability", "coherence"])
+
+
+Usage with spaCy
+------------------
+
+To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are :code:`descriptive_stats`, :code:`readability`, :code:`dependency_distance`, :code:`pos_proportions`, :code:`coherence`, and :code:`quality` prefixed with :code:`textdescriptives/`. 
 If you want to add all the components you can use the shorthand :code:`textdescriptives/all`.
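
Note on the new Quick Start text: the relationship between the `metrics` argument, the `textdescriptives/` prefix, and the `spacy_model`/`lang` pair can be sketched in plain Python. This is only an illustration of the behaviour described in the added docs, not the code in `textdescriptives.extractors`. The helper names `metrics_to_components` and `model_source` are hypothetical, and raising an error when neither `spacy_model` nor `lang` is given is an assumption.

```python
from typing import List, Optional

# Metric names as listed in the Quick Start text of this commit.
AVAILABLE_METRICS = [
    "descriptive_stats",
    "readability",
    "dependency_distance",
    "pos_proportions",
    "coherence",
    "quality",
]


def metrics_to_components(metrics: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: map metric names to spaCy component names."""
    if metrics is None:
        # The docs say None extracts all metrics; the shorthand for
        # adding every component is "textdescriptives/all".
        return ["textdescriptives/all"]
    unknown = [m for m in metrics if m not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError(f"Unknown metrics: {unknown}")
    # Components are registered with the "textdescriptives/" prefix.
    return [f"textdescriptives/{m}" for m in metrics]


def model_source(spacy_model: Optional[str] = None, lang: Optional[str] = None) -> str:
    """Hypothetical helper: report which argument decides the spaCy model."""
    if spacy_model is not None:
        # An explicit model name is used as given.
        return "spacy_model"
    if lang is not None:
        # Otherwise a model is chosen (and downloaded) based on the language.
        return "lang"
    # Assumption: with neither argument there is nothing to choose from.
    raise ValueError("Set either `spacy_model` or `lang`.")


print(metrics_to_components(["readability", "coherence"]))
print(metrics_to_components())
print(model_source(lang="en"))
```

Under these assumptions, the second call in the README example (`spacy_model="en_core_web_sm"`, `metrics=["readability", "coherence"]`) would correspond to adding `textdescriptives/readability` and `textdescriptives/coherence` to an `en_core_web_sm` pipeline.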