Update confluence.py to return spaces between elements #5383

Merged: 3 commits, Jun 3, 2023

gardner (Contributor) commented May 29, 2023

Update confluence.py to return spaces between elements like headers and links.

Please see https://stackoverflow.com/questions/48913975/how-to-return-nicely-formatted-text-in-beautifulsoup4-when-html-text-is-across-m

Given:

<address>
        183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
        MA 01516-113
    </address>

The document loader currently returns:

'183 Main StEast CopperMassachusettsU S A        MA 01516-113'

After this change, the document loader will return:

183 Main St East Copper Massachusetts U S A MA 01516-113

@eyurtsev would you prefer this to be an option that can be passed in?
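
To see the effect in isolation, here is a minimal standard-library sketch of joining text nodes with a separator. It is an approximation for illustration, not the code in confluence.py: `TextCollector` and `extract_text` are hypothetical names, and the `separator` and `strip` parameters are only modeled on BeautifulSoup's `get_text(separator, strip)`. Exact whitespace in the "before" case may differ from what the loader produced.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML fragment and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once for each run of text between tags.
        self.chunks.append(data)


def extract_text(html, separator="", strip=False):
    """Hypothetical helper: join text nodes, optionally stripping each one."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    chunks = parser.chunks
    if strip:
        # Trim every text node and drop the ones that were only whitespace.
        chunks = [chunk.strip() for chunk in chunks]
        chunks = [chunk for chunk in chunks if chunk]
    return separator.join(chunks)


ADDRESS_HTML = """<address>
        183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
        MA 01516-113
    </address>"""

# No separator: text on either side of <br> runs together ("StEast").
before = extract_text(ADDRESS_HTML)

# Space separator plus stripping: one clean, single-spaced line.
after = extract_text(ADDRESS_HTML, separator=" ", strip=True)

print(repr(before))
print(after)  # 183 Main St East Copper Massachusetts U S A MA 01516-113
```

Keeping `separator` and `strip` as parameters also shows what the opt-in variant could look like if the behaviour were exposed as an option instead of being the default.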

hwchase17 (Contributor) left a comment

lgtm! thanks

gardner (Contributor, Author) commented May 29, 2023

I have applied Black formatting in a second commit.

dev2049 added the lgtm label May 30, 2023
dev2049 requested a review from eyurtsev May 31, 2023
eyurtsev (Collaborator) commented Jun 2, 2023

This looks good as is

hwchase17 merged commit b81f98b into langchain-ai:master on Jun 3, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
This was referenced Jun 25, 2023