Evernote Document Loader Concatenates All Notes Together #4493

Closed
MikeMcGarry opened this issue May 11, 2023 · 1 comment · Fixed by #4577


@MikeMcGarry

Feature request

The EverNoteLoader treats an Evernote export as one very large text document: it combines the content of all notes into a single long string.

It also saves only the name of the export file as metadata on this large document.

This isn't very useful when interrogating data from Evernote, where you might export an entire notebook containing many notes. See the example notebook export below, which has two notes. Ideally, each note should be treated as an independent document with its own richer metadata (e.g. created, updated, title) to make retrieval more effective.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export4.dtd">
<en-export export-date="20230611T011239Z" application="Evernote" version="10.56.9">
  <note>
    <title>Test</title>
    <created>20230511T011217Z</created>
    <updated>20230511T011228Z</updated>
    <note-attributes>
      <author>Michael McGarry</author>
    </note-attributes>
    <content>
      <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"><en-note><div>abc</div></en-note>      ]]>
    </content>
  </note>
  <note>
    <title>Summer Training Program</title>
    <created>20221227T015948Z</created>
    <updated>20221227T020423Z</updated>
    <note-attributes>
      <author>Michael McGarry</author>
      <latitude>{redacted}</latitude>
      <longitude>{redacted}</longitude>
      <altitude>{redacted}</altitude>
      <source>mobile.iphone</source>
    </note-attributes>
    <content>
      <![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"><en-note><div><b>Jan - March 2022</b></div></en-note>      ]]>
    </content>
  </note>
</en-export>
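
To make the request concrete, here is a minimal sketch of per-note loading using only the Python standard library. It is not the loader's actual implementation: `parse_notes`, the dict layout (`page_content` / `metadata`), and the trimmed `SAMPLE_EXPORT` string (XML declaration, DOCTYPE, and redacted fields omitted) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the export above (hypothetical test input, not a full ENEX file).
SAMPLE_EXPORT = """
<en-export export-date="20230611T011239Z" application="Evernote" version="10.56.9">
  <note>
    <title>Test</title>
    <created>20230511T011217Z</created>
    <updated>20230511T011228Z</updated>
    <note-attributes>
      <author>Michael McGarry</author>
    </note-attributes>
    <content><![CDATA[<en-note><div>abc</div></en-note>]]></content>
  </note>
  <note>
    <title>Summer Training Program</title>
    <created>20221227T015948Z</created>
    <updated>20221227T020423Z</updated>
    <note-attributes>
      <author>Michael McGarry</author>
      <source>mobile.iphone</source>
    </note-attributes>
    <content><![CDATA[<en-note><div><b>Jan - March 2022</b></div></en-note>]]></content>
  </note>
</en-export>
"""


def parse_notes(xml_text):
    """Return one dict per <note>, each with its own content and metadata."""
    root = ET.fromstring(xml_text)
    docs = []
    for note in root.iter("note"):
        metadata = {}
        for child in note:
            if child.tag == "content":
                continue  # the note body is handled separately below
            if child.tag == "note-attributes":
                # Flatten nested attributes (author, source, ...) into the metadata.
                for attr in child:
                    metadata[attr.tag] = (attr.text or "").strip()
            else:
                metadata[child.tag] = (child.text or "").strip()
        # The CDATA payload is the note body as an ENML (HTML-like) string.
        content = (note.findtext("content") or "").strip()
        docs.append({"page_content": content, "metadata": metadata})
    return docs


docs = parse_notes(SAMPLE_EXPORT)
for doc in docs:
    print(doc["metadata"]["title"], "|", doc["metadata"]["created"])
```

Each note keeps its own title, timestamps, and author, so a retriever can filter or rank on them instead of searching one concatenated string.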

Motivation

I'm looking to add a tool to an agent that can interrogate my Evernote journal entries.

Your contribution

I can put together a PR for this.

@MikeMcGarry

@eyurtsev / @dev2049 I have submitted a PR to make this improvement. Thanks for all your work on LangChain, really enjoying using it. Looking forward to your feedback!

dev2049 added a commit that referenced this issue May 19, 2023
# Improve Evernote Document Loader

When exporting from Evernote you may export more than one note.
Currently the Evernote loader concatenates the content of all notes in
the export into a single document and only attaches the name of the
export file as metadata on the document.

This change ensures that each note is loaded as an independent document
and all available metadata on the note e.g. author, title, created,
updated are added as metadata on each document.

It also uses an existing optional dependency of `html2text` instead of
`pypandoc` to remove the need to download the pandoc application via
`download_pandoc()` to be able to use the `pypandoc` python bindings.

Fixes #4493 

Co-authored-by: Mike McGarry <[email protected]>
Co-authored-by: Dev 2049 <[email protected]>
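
The commit message above describes converting each note's body to text with `html2text` rather than `pypandoc`. As a rough illustration of that step, the sketch below strips ENML markup down to plain text. It deliberately swaps in the standard library's `html.parser` as a stand-in so that it runs without extra dependencies; it is not what the commit does. `enml_to_text` and `_TextExtractor` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the non-blank text nodes of an ENML/HTML string."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def enml_to_text(enml):
    """Return the visible text of a note body, one text node per line."""
    parser = _TextExtractor()
    parser.feed(enml)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.parts)


print(enml_to_text("<en-note><div>abc</div><div>def</div></en-note>"))
```

A dedicated converter such as `html2text` would also preserve formatting (bold text, links, lists) as Markdown, which this stand-in discards.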