fix setup typo
HLasse committed Aug 9, 2021
1 parent ce36172 commit 0d5bd30
Showing 2 changed files with 10 additions and 12 deletions.
21 changes: 10 additions & 11 deletions README.md
@@ -13,7 +13,7 @@
A Python library for calculating a large variety of statistics from text(s) using spaCy v.3 pipeline components and extensions. TextDescriptives can be used to calculate several descriptive statistics, readability metrics, and metrics related to dependency distance. The components are implemented using getters, which means they will only be calculated when accessed.

# 🔧 Installation
`python -m pip install git+https://github.com/HLasse/TextDescriptives.git`
`pip install textdescriptives`

# 📰 News

@@ -41,9 +41,9 @@ TextDescriptives includes a convenience function for extracting metrics to a Pan
```py
td.extract_df(doc)
```
| | text | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | percent_unique_tokens | n_characters | n_sentences | flesch_reading_ease | flesch_kincaid_grade | smog | gunning_fog | automated_readability_index | coleman_liau_index | lix | rix | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
|---:|:------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|------------------------:|---------------:|--------------:|----------------------:|-----------------------:|--------:|--------------:|------------------------------:|---------------------:|--------:|------:|---------------------------:|--------------------------:|-----------------------------------------:|----------------------------------------:|
| 0 | The world is changed (...) | 3.28571 | 3 | 1.54127 | 7 | 6 | 3.09839 | 1.08571 | 1 | 0.368117 | 35 | 23 | 0.657143 | 121 | 5 | 107.879 | -0.0485714 | 5.68392 | 3.94286 | -2.45429 | -17.6229 | 12.7143 | 0.4 | 1.8019 | 0.599967 | 0.457143 | 0.0722806 |
| | text | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | proportion_unique_tokens | n_characters | n_sentences | flesch_reading_ease | flesch_kincaid_grade | smog | gunning_fog | automated_readability_index | coleman_liau_index | lix | rix | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
|---:|:------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|---------------------------:|---------------:|--------------:|----------------------:|-----------------------:|--------:|--------------:|------------------------------:|---------------------:|--------:|------:|---------------------------:|--------------------------:|-----------------------------------------:|----------------------------------------:|
| 0 | The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it. | 3.28571 | 3 | 1.54127 | 7 | 6 | 3.09839 | 1.08571 | 1 | 0.368117 | 35 | 23 | 0.657143 | 121 | 5 | 107.879 | -0.0485714 | 5.68392 | 3.94286 | -2.45429 | -0.708571 | 12.7143 | 0.4 | 1.69524 | 0.422282 | 0.44381 | 0.0863679 |

Set which group(s) of metrics you want to extract using the `metrics` parameter (one or more of `readability`, `dependency_distance`, `descriptive_stats`, defaults to `all`)

@@ -56,8 +56,8 @@ td.extract_df(docs, metrics="dependency_distance")
```
| | text | dependency_distance_mean | dependency_distance_std | prop_adjacent_dependency_relation_mean | prop_adjacent_dependency_relation_std |
|---:|:------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------:|--------------------------:|-----------------------------------------:|----------------------------------------:|
| 0 | The world is changed (...) | 1.8019 | 0.599967 | 0.457143 | 0.0722806 |
| 1 | He felt that his whole (...) | 2.56 | 0 | 0.44 | 0 |
| 0 | The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it. | 1.69524 | 0.422282 | 0.44381 | 0.0863679 |
| 1 | He felt that his whole life was some kind of dream and he sometimes wondered whose it was and whether they were enjoying it. | 2.56 | 0 | 0.44 | 0 |

The `text` column can be excluded by setting `include_text` to `False`.

@@ -74,11 +74,10 @@ docs = nlp.pipe(['Da jeg var atten, tog jeg patent på ild. Det skulle senere vi
td.extract_df(docs, include_text = False)
```

| | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | percent_unique_tokens | n_characters | n_sentences |
|---:|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|------------------------:|---------------:|--------------:|
| 0 | 4.4 | 3 | 2.59615 | 10 | 10 | 1 | 1.65 | 1 | 0.852936 | 20 | 19 | 0.95 | 90 | 2 |
| 1 | 4 | 3.5 | 2.44949 | 6 | 6 | 3 | 1.58333 | 1 | 0.862007 | 12 | 12 | 1 | 53 | 2 |

| | token_length_mean | token_length_median | token_length_std | sentence_length_mean | sentence_length_median | sentence_length_std | syllables_per_token_mean | syllables_per_token_median | syllables_per_token_std | n_tokens | n_unique_tokens | proportion_unique_tokens | n_characters | n_sentences |
|---:|--------------------:|----------------------:|-------------------:|-----------------------:|-------------------------:|----------------------:|---------------------------:|-----------------------------:|--------------------------:|-----------:|------------------:|---------------------------:|---------------:|--------------:|
| 0 | 4.4 | 3 | 2.59615 | 10 | 10 | 1 | 1.65 | 1 | 0.852936 | 20 | 19 | 0.95 | 90 | 2 |
| 1 | 4 | 3.5 | 2.44949 | 6 | 6 | 3 | 1.58333 | 1 | 0.862007 | 12 | 12 | 1 | 53 | 2 |

## Available attributes
The table below shows the metrics included in TextDescriptives and their attributes on spaCy's `Doc`, `Span`, and `Token` objects. For more information, see the docs.
1 change: 0 additions & 1 deletion setup.py
@@ -40,7 +40,6 @@
"Intended Audience :: Science/Research",
"Topic :: Scientific/Engineering",
"Topic :: Text Processing",
"Topic :: NLP",
# Specify the Python versions you support here. In particular, ensure
# that you indicate whether you support Python 2, Python 3 or both.
"Programming Language :: Python :: 3.7",

1 comment on commit 0d5bd30

@github-actions

Coverage

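Coverage Report

| File | Stmts | Miss | Cover | Missing |
|:-----|------:|-----:|------:|--------:|
| textdescriptives | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `about.py` | 3 | 0 | 100% | |
| `dataframe_extract.py` | 50 | 0 | 100% | |
| `load_components.py` | 12 | 0 | 100% | |
| textdescriptives/components | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `dependency_distance.py` | 32 | 0 | 100% | |
| `descriptive_stats.py` | 51 | 1 | 98% | 105 |
| `readability.py` | 72 | 0 | 100% | |
| `utils.py` | 16 | 0 | 100% | |
| TOTAL | 243 | 1 | 99% | |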
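
The totals in the report can be cross-checked with a few lines of Python. This is a minimal sketch, not the coverage tool's own logic: the per-file counts are copied from the report above, `percent_covered` is a hypothetical helper, and truncating to a whole number is an assumption that happens to reproduce the displayed figures (the tool's actual rounding rule is not stated here).

```python
# (statements, missed) per file, copied from the coverage report above.
# The keys are only labels for readability.
report = {
    "textdescriptives/__init__.py": (4, 0),
    "textdescriptives/about.py": (3, 0),
    "textdescriptives/dataframe_extract.py": (50, 0),
    "textdescriptives/load_components.py": (12, 0),
    "textdescriptives/components/__init__.py": (3, 0),
    "textdescriptives/components/dependency_distance.py": (32, 0),
    "textdescriptives/components/descriptive_stats.py": (51, 1),
    "textdescriptives/components/readability.py": (72, 0),
    "textdescriptives/components/utils.py": (16, 0),
}


def percent_covered(stmts, miss):
    """Hypothetical helper: share of statements that were executed, in percent."""
    return 100.0 * (stmts - miss) / stmts


total_stmts = sum(stmts for stmts, _ in report.values())
total_miss = sum(miss for _, miss in report.values())

# Assumption: the report shows whole percentages, reproduced here by truncation.
print("statements:", total_stmts)
print("missed:", total_miss)
print("total cover:", int(percent_covered(total_stmts, total_miss)), "%")
```

The one missed statement (line 105 of `descriptive_stats.py`) is enough to pull both that file and the total just below 100%.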
