docs: add extract_metrics to docs and readme
HLasse committed Jan 5, 2023
1 parent 5cb931f commit 163bee5
Showing 4 changed files with 61 additions and 5 deletions.
27 changes: 24 additions & 3 deletions README.md
Expand Up @@ -20,12 +20,33 @@ A Python library for calculating a large variety of statistics from text(s) usin

* Version 2.0 out with a new API, a new component, updated documentation, and tutorials! Components are now called by `textdescriptives/{metric_name}`. New `coherence` component for calculating the semantic coherence between sentences. See the [documentation](https://github.com/HLasse/TextDescriptives) for tutorials and more information!



# ⚡ Quick Start

Import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are *descriptive_stats*, *readability*, *dependency_distance*, *pos_proportions*, *coherence*, and *quality* prefixed with `textdescriptives/`.
Use `extract_metrics` to quickly extract your desired metrics. Available metrics are `["descriptive_stats", "readability", "dependency_distance", "pos_proportions", "coherence", "quality"]`.

If you want to add all components you can use the shorthand `textdescriptives/all`.
Set the `spacy_model` parameter to specify which spaCy model to use; otherwise, TextDescriptives will auto-download an appropriate one based on `lang`. If `lang` is set, `spacy_model` is not necessary, and vice versa.

Specify which metrics to extract in the `metrics` argument. `None` extracts all metrics.

```py
import textdescriptives as td

text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
# will automatically download `en_core_web_lg` and extract all metrics
df = td.extract_metrics(text=text, lang="en")

# specify spaCy model and which metrics to extract
df = td.extract_metrics(text=text, spacy_model="en_core_web_sm", metrics=["readability", "coherence"])
```
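
The argument rules described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `resolve_metrics` and `check_model_args` are hypothetical helpers, and raising an error when neither `spacy_model` nor `lang` is given is an assumption about reasonable behavior.

```python
from typing import List, Optional

# Metric names as listed in the README text above.
AVAILABLE_METRICS = [
    "descriptive_stats",
    "readability",
    "dependency_distance",
    "pos_proportions",
    "coherence",
    "quality",
]


def resolve_metrics(metrics: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: return the metrics to extract; None means all."""
    if metrics is None:
        return list(AVAILABLE_METRICS)
    unknown = [m for m in metrics if m not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError(f"Unknown metrics: {unknown}")
    return list(metrics)


def check_model_args(
    spacy_model: Optional[str] = None, lang: Optional[str] = None
) -> str:
    """Hypothetical helper: report which argument decides the spaCy model."""
    if spacy_model is not None:
        # An explicit model name takes priority in this sketch.
        return "spacy_model"
    if lang is not None:
        # With only a language code, a model would be picked automatically.
        return "lang"
    # Assumption: at least one of the two must be provided.
    raise ValueError("Set either `spacy_model` or `lang`.")


print(resolve_metrics(["readability", "coherence"]))
print(check_model_args(lang="en"))
```

Validating metric names up front, as in this sketch, would surface a typo such as `"readabilty"` before any model is downloaded.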


## Usage with spaCy

To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are *descriptive_stats*, *readability*, *dependency_distance*, *pos_proportions*, *coherence*, and *quality* prefixed with `textdescriptives/`.

If you want to add all components you can use the shorthand `textdescriptives/all`.

```py
import spacy
Expand All @@ -39,7 +60,7 @@ doc._.readability
doc._.token_length
```
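
As a small illustration of the naming scheme, the sketch below builds the prefixed component names from plain metric names. `component_names` is a hypothetical helper introduced here for illustration; it is not part of TextDescriptives.

```python
from typing import List, Optional

# Prefix used for all component names, as described in the README.
PREFIX = "textdescriptives/"


def component_names(metrics: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: build the pipe names for the requested metrics."""
    if metrics is None:
        # The README's shorthand for adding every component at once.
        return [PREFIX + "all"]
    return [PREFIX + name for name in metrics]


names = component_names(["readability", "coherence"])
print(names)
```

Each resulting name could then be passed to `nlp.add_pipe(name)` on a loaded spaCy pipeline.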

TextDescriptives includes convenience functions for extracting metrics to a Pandas DataFrame or a dictionary.
TextDescriptives includes convenience functions for extracting metrics from a `Doc` to a Pandas DataFrame or a dictionary.

```py
td.extract_dict(doc)
12 changes: 12 additions & 0 deletions docs/extractors.rst
@@ -0,0 +1,12 @@
Extractor
--------------------

The extractors are used to extract the features from the document. :code:`extract_metrics` is meant to be called on raw texts, whereas :code:`extract_df` and :code:`extract_dict` work on spaCy documents (:code:`Doc`).


API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autofunction:: textdescriptives.extractors.extract_metrics
.. autofunction:: textdescriptives.extractors.extract_df
.. autofunction:: textdescriptives.extractors.extract_dict
3 changes: 2 additions & 1 deletion docs/index.rst
Expand Up @@ -51,7 +51,7 @@ please use the discussion Forums.

.. toctree::
:maxdepth: 2
:caption: Compoents
:caption: Components
:hidden:

descriptivestats
Expand All @@ -60,6 +60,7 @@ please use the discussion Forums.
posstats
quality
coherence
extractors


.. toctree::
24 changes: 23 additions & 1 deletion docs/usingthepackage.rst
@@ -1,7 +1,29 @@
Quick Start
=======================

Import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are :code:`descriptive_stats`, :code:`readability`, :code:`dependency_distance`, :code:`pos_proportions`, :code:`coherence`, and :code:`quality` prefixed with :code:`textdescriptives/`.

Use :code:`extract_metrics` to quickly extract your desired metrics. Available metrics are :code:`["descriptive_stats", "readability", "dependency_distance", "pos_proportions", "coherence", "quality"]`.

Set the :code:`spacy_model` parameter to specify which spaCy model to use; otherwise, TextDescriptives will auto-download an appropriate one based on :code:`lang`. If :code:`lang` is set, :code:`spacy_model` is not necessary, and vice versa.

Specify which metrics to extract in the :code:`metrics` argument. :code:`None` extracts all metrics.

.. code-block:: python

    import textdescriptives as td

    text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
    # will automatically download `en_core_web_lg` and extract all metrics
    df = td.extract_metrics(text=text, lang="en")

    # specify spaCy model and which metrics to extract
    df = td.extract_metrics(text=text, spacy_model="en_core_web_sm", metrics=["readability", "coherence"])

Usage with spaCy
------------------

To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. Available components are :code:`descriptive_stats`, :code:`readability`, :code:`dependency_distance`, :code:`pos_proportions`, :code:`coherence`, and :code:`quality` prefixed with :code:`textdescriptives/`.
If you want to add all the components you can use the shorthand :code:`textdescriptives/all`.

