Improve content text extraction #40

Merged 1 commit on Aug 6, 2024

@freddyheppell (Member) commented Aug 6, 2024

  • Fix a bug where only the first excluded element was removed from the post content
    • Use el.decompose() instead of el.extract(): extract() only detaches the element from the tree and returns it, whereas decompose() also destroys the element and its contents
  • Add tables to the list of elements excluded when extracting post content
  • Extend the test to cover these changes
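
The behaviour described above can be sketched roughly as follows. This is a minimal illustration using Beautiful Soup, not the code from this PR: the function name `extract_text` and the `EXCLUDED_TAGS` list are assumptions made for the example, and the project's actual exclusion list and the cause of the original bug are not shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical exclusion list for illustration; the project's real list
# is not shown in this PR description.
EXCLUDED_TAGS = ["figure", "table", "script", "style"]


def extract_text(html: str) -> str:
    """Return the visible text of `html` with every excluded element removed."""
    soup = BeautifulSoup(html, "html.parser")

    # find_all() returns a list snapshot, so removing elements while
    # looping over it does not disturb the iteration.
    for el in soup.find_all(EXCLUDED_TAGS):
        # An element nested inside another excluded element has already
        # been destroyed together with its parent, so skip it.
        if getattr(el, "decomposed", False):
            continue
        # decompose() detaches the element and destroys it and its contents;
        # extract() would only detach it and hand it back to the caller.
        el.decompose()

    return soup.get_text(separator=" ", strip=True)


sample = (
    "<p>Hello</p>"
    "<table><tr><td>first cell</td></tr></table>"
    "<p>World</p>"
    "<table><tr><td>second cell</td></tr></table>"
)
text = extract_text(sample)
```

With this sample input, both tables are dropped, not just the first, which is the case the improved test is meant to cover.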

@freddyheppell freddyheppell merged commit 8e3260f into dev Aug 6, 2024
6 checks passed
@freddyheppell freddyheppell deleted the text-exclude-tables branch August 6, 2024 15:38
freddyheppell added a commit that referenced this pull request Aug 6, 2024
* Add package version attribute (#36)

* Add version attribute to package

* Revert "Hotfix: remove usage of __version__ in docs (#35)"

This reverts commit 641375a.

* add contributing guidelines (#37)

* Add ref to langcodes docs (#38)

* add manual ref to Language class

* fix footnote in start

* make opening to multilingual docs clearer

* Fix element exclusion in text extraction (#40)

* Prepare 1.0.3 release (#41)

* prepare 1.0.3

* fix changelog sections