From 0d5bd302cb12036311216a1a2eb535cf05780500 Mon Sep 17 00:00:00 2001
From: Lasse
Date: Mon, 9 Aug 2021 11:38:09 +0200
Subject: [PATCH] fix setup typo

---
 README.md | 21 ++++++++++-----------
 setup.py  |  1 -
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 6c619108..ea7f8923 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@
 A Python library for calculating a large variety of statistics from text(s) using spaCy v.3 pipeline components and extensions. TextDescriptives can be used to calculate several descriptive statistics, readability metrics, and metrics related to dependency distance. The components are implemented using getters, which means they will only be calculated when accessed.
 
 # 🔧 Installation
-`python -m pip install git+https://github.com/HLasse/TextDescriptives.git`
+`pip install textdescriptives`
 
 # 📰 News
 
@@ -41,9 +41,9 @@ TextDescriptives includes a convenience function for extracting metrics to a Pan
 ```py
 td.extract_df(doc)
 ```
-| | text | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | percent_unique_tokens | n_characters | n_sentences | flesch_reading_ease | flesch_kincaid_grade | smog | gunning_fog | automated_readability_index | coleman_liau_index | lix | rix | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
-|---:|:------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|------------------------:|---------------:|--------------:|----------------------:|-----------------------:|--------:|--------------:|------------------------------:|---------------------:|--------:|------:|---------------------------:|--------------------------:|-----------------------------------------:|----------------------------------------:|
-| 0 | The world is changed (...) | 3.28571 | 3 | 1.54127 | 7 | 6 | 3.09839 | 1.08571 | 1 | 0.368117 | 35 | 23 | 0.657143 | 121 | 5 | 107.879 | -0.0485714 | 5.68392 | 3.94286 | -2.45429 | -17.6229 | 12.7143 | 0.4 | 1.8019 | 0.599967 | 0.457143 | 0.0722806 |
+| | text | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | proportion_unique_tokens | n_characters | n_sentences | flesch_reading_ease | flesch_kincaid_grade | smog | gunning_fog | automated_readability_index | coleman_liau_index | lix | rix | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
+|---:|:------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|---------------------------:|---------------:|--------------:|----------------------:|-----------------------:|--------:|--------------:|------------------------------:|---------------------:|--------:|------:|---------------------------:|--------------------------:|-----------------------------------------:|----------------------------------------:|
+| 0 | The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it. | 3.28571 | 3 | 1.54127 | 7 | 6 | 3.09839 | 1.08571 | 1 | 0.368117 | 35 | 23 | 0.657143 | 121 | 5 | 107.879 | -0.0485714 | 5.68392 | 3.94286 | -2.45429 | -0.708571 | 12.7143 | 0.4 | 1.69524 | 0.422282 | 0.44381 | 0.0863679 |
 
 Set which group(s) of metrics you want to extract using the `metrics` parameter (one or more of `readability`, `dependency_distance`, `descriptive_stats`, defaults to `all`)
 
@@ -56,8 +56,8 @@ td.extract_df(docs, metrics="dependency_distance")
 ```
 | | text | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
 |---:|:------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------:|--------------------------:|-----------------------------------------:|----------------------------------------:|
-| 0 | The world is changed (...) | 1.8019 | 0.599967 | 0.457143 | 0.0722806 |
-| 1 | He felt that his whole (...) | 2.56 | 0 | 0.44 | 0 |
+| 0 | The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it. | 1.69524 | 0.422282 | 0.44381 | 0.0863679 |
+| 1 | He felt that his whole life was some kind of dream and he sometimes wondered whose it was and whether they were enjoying it. | 2.56 | 0 | 0.44 | 0 |
 
 The `text` column can by exluded by setting `include_text` to `False`.
 
@@ -74,11 +74,10 @@ docs = nlp.pipe(['Da jeg var atten, tog jeg patent på ild. Det skulle senere vi
 
 td.extract_df(docs, include_text = False)
 ```
-| | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | percent_unique_tokens | n_characters | n_sentences |
-|---:|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|------------------------:|---------------:|--------------:|
-| 0 | 4.4 | 3 | 2.59615 | 10 | 10 | 1 | 1.65 | 1 | 0.852936 | 20 | 19 | 0.95 | 90 | 2 |
-| 1 | 4 | 3.5 | 2.44949 | 6 | 6 | 3 | 1.58333 | 1 | 0.862007 | 12 | 12 | 1 | 53 | 2 |
-
+| | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | proportion_unique_tokens | n_characters | n_sentences |
+|---:|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|---------------------------:|---------------:|--------------:|
+| 0 | 4.4 | 3 | 2.59615 | 10 | 10 | 1 | 1.65 | 1 | 0.852936 | 20 | 19 | 0.95 | 90 | 2 |
+| 1 | 4 | 3.5 | 2.44949 | 6 | 6 | 3 | 1.58333 | 1 | 0.862007 | 12 | 12 | 1 | 53 | 2 |
 
 ## Available attributes
 The table below shows the metrics included in TextDescriptives and their attribues on spaCy's `Doc`, `Span`, and `Token` objects. For more information, see the docs.
diff --git a/setup.py b/setup.py
index dbc1fbb5..f406bee1 100644
--- a/setup.py
+++ b/setup.py
@@ -40,7 +40,6 @@
  "Intended Audience :: Science/Research",
  "Topic :: Scientific/Engineering",
  "Topic :: Text Processing",
- "Topic :: NLP",
  # Specify the Python versions you support here. In particular, ensure
  # that you indicate whether you support Python 2, Python 3 or both.
  "Programming Language :: Python :: 3.7",