Keeps re-creating my embeddings. #496
Comments
@bbecausereasonss are you talking about the "save" button by the API key? There isn't currently a method for manually saving. However, the developer console logs should tell us whether the file is being saved. Look for something like "Saved in XXXXms".
No, I mean the save button for the .smart-connections data.
That button simply saves the setting. It's a manual save because changing the folder requires renaming it, which you don't want to trigger automatically.
I have the same problem. My vault is saved to iCloud rather than Dropbox, and I assume it has something to do with the file dates. I'm wondering if iCloud/Dropbox messes with the dates on files so that Smart Connections thinks the embeddings are out of date. I came here to see if that was the case. I haven't found anything yet, but perhaps this will give people a clue?
Same issue here. I am using Obsidian Sync, but I also generate daily journal entries and have about 8k notes in the vault. I spend much more time with degraded Obsidian performance while it re-embeds notes in the same folder every day than I actually spend using the Smart Connections features. The juice isn't worth the squeeze. It didn't do this before the latest Obsidian update.
@robwheatley @DantesHub @sh4d0wl3ss Keep an eye on the dev console during the embedding process. If there's an error with saving, take a screenshot and I can fix that. If everything saves correctly, then something else is clearing the embeddings. Also check the dev console immediately after startup for loading issues. Otherwise, it might be something specific to your setup, so please also mention any third-party syncing you're using, like @robwheatley did when he mentioned iCloud. Thanks for your help in solving this issue,
Also, knowing which operating system you're using will help narrow down the issue.
Hi @brianpetro I'm new to Obsidian and didn't realise that there was a dev console. I've just taken a look and spotted a few errors being thrown by Smart Connections. Not sure if this helps you, but I can provide more info on request... I'm on macOS 14.3.1, Obsidian 1.5.8 and Smart Connections 1.0.128. Also, as mentioned, I have my vault in iCloud. As a test, I moved my vault out of iCloud and onto my regular drive. When I did that, the dev log did look a little different (I got more entries for "Embedded X inputs..."), but I still got the final undefined 'last_history' error at the end of the log, like below. Also, on restarting Obsidian, all my notes required embedding again, so it doesn't look like an iCloud-specific problem. Here is the log when running on iCloud... Edit: Please only upload screenshots of logs
@robwheatley if you could screenshot the console, that would be much easier for me to look through.
@brianpetro here is the 1st screenshot, taken after start-up with debugging on. The 2nd screenshot is from after I clicked the 'create embeddings' button up to the point it finished.
Does anyone who has had this happen remember hitting the pause button before losing the embeddings? It seems that under some conditions, like pausing/restarting, multiple embedding processes could run at once. This shows up as the denominator in the progress notification switching between multiple values. So far this has only happened once for me during development, so it'll require more testing. There could be other situations where this happens. I'm continuing to investigate.
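One common way to rule out overlapping runs like this is a "single-flight" guard: if a pass is already in progress, a second start (for example, after pause/resume) reuses it instead of launching another. This is a minimal sketch, assuming the whole embedding pass can be wrapped in one async function; `EmbeddingRunner` and `embedAll` are hypothetical names, not Smart Connections code.

```typescript
// Hypothetical single-flight wrapper around one full embedding pass.
class EmbeddingRunner {
  private current: Promise<number> | null = null;
  public startedRuns = 0; // how many passes actually started

  constructor(private readonly embedAll: () => Promise<number>) {}

  // Returns the in-flight pass if there is one, instead of starting a second.
  run(): Promise<number> {
    if (this.current) return this.current;
    this.startedRuns++;
    this.current = this.embedAll().finally(() => {
      this.current = null; // allow a fresh pass once this one settles
    });
    return this.current;
  }
}
```

With a guard like this, only one pass is ever counting progress, so the denominator can't flip between values. A real implementation would also need pause to stop the in-flight pass rather than abandon it.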
I'm 99% sure I didn't pause. OK, 98%.
I hit the pause button once, but that was weeks ago and it's been happening ever since.
@robwheatley @bbecausereasonss thanks for letting me know! 🌴
Thinking out loud here: something else that changed recently was replacing a file hash (because hashing was incompatible with mobile) with a check of both the file's size and its last-modified time. In theory, this should not cause a noticeable difference, because even if the time is modified by some other process, the file size should stay the same. But in practice, maybe the file size is also being altered even without the note changing. I'm going to need to come up with some sort of test for this. 🌴
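To make the trade-off concrete, here is a hypothetical version of such a check. It assumes (this is not confirmed as the plugin's exact rule) that a note is re-embedded only when its modification time has moved and its size has also changed, which matches the reasoning that a process touching only the timestamp should not matter.

```typescript
// What was recorded about a note the last time it was embedded.
interface FileStat {
  size: number; // bytes
  mtime: number; // last-modified time, ms since epoch
}

// Assumed rule, for illustration only: re-embed when the note is new, or when
// its mtime moved AND its size changed. A timestamp-only touch is ignored.
function needsReembed(prev: FileStat | undefined, curr: FileStat): boolean {
  if (prev === undefined) return true; // never embedded
  if (prev.mtime === curr.mtime) return false; // untouched
  return prev.size !== curr.size; // touched: trust the size
}
```

The weak spots are the ones raised in this thread: a sync client that rewrites the file in a way that changes its size would trigger re-embedding, while a real edit that happens to leave the size unchanged would be missed. A content hash avoids both but costs more to compute.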
I've been messing about with a new vault saved directly onto my hard drive: a clean install with only the Smart Connections plugin installed. I've been adding notes to it to see what it does... I think what's happening is that when the embeddings.ajson file is loaded on start-up, the JSON parser doesn't like something and reports the error below. That results in the embeddings.ajson file being deleted, so you have to start the embedding again.
I don't know what the bad character is (I can't see anything obviously wrong with the file, but I can do more testing later), and I don't know what put it there in the first place. I wonder if I could hack something to stop the file from being deleted when the error is discovered, to see what's going on...?
@robwheatley good catch! That error doesn't specifically delete the file, but the embeddings fail to load and then the reprocessing overwrites the existing file. Solving the source of the issue: I'm thinking it's this line
Specifically, I'll get this change shipped in the next update, today if I can fit it in. Another thing that can help in situations like this: saving the file so that records (or batches of records) are separated by newlines. That way an erroneous record/batch can be thrown out while the rest are preserved. This would likely have a negative impact on start-up performance, but it would still probably be worth it to prevent this embedding-rewrite headache. 🌴
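The newline-separated layout proposed above can be sketched like this, assuming one JSON record per line (JSON Lines style); `EmbeddingRecord`, `parseRecords` and `serializeRecords` are illustrative names. A corrupted line costs one record instead of the whole file.

```typescript
// One record per line; field names are illustrative.
interface EmbeddingRecord {
  key: string;
  vec: number[];
}

// Parses newline-separated records, dropping only the lines that fail to parse.
function parseRecords(text: string): { records: EmbeddingRecord[]; skipped: number } {
  const records: EmbeddingRecord[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // ignore blank lines, e.g. a trailing newline
    try {
      records.push(JSON.parse(line) as EmbeddingRecord);
    } catch {
      skipped++; // one bad record no longer invalidates the whole file
    }
  }
  return { records, skipped };
}

// Writing is the mirror image: one JSON.stringify per record, joined by "\n".
function serializeRecords(records: EmbeddingRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}
```

The start-up cost mentioned above comes from calling `JSON.parse` once per line instead of once per file.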
@brianpetro It's great that you are looking into this. I've just spent the last few hours seeing if I could add any more info. I went down a bit of a rabbit hole, TBH! From my clean install, I started adding notes from my 'real' vault, wondering if a particular note was causing the issue. After lots of messing about, I thought I had found something: when I added a specific note, I started to get errors. But it turned out there was nothing special about that note. If I just added 'one more note' of any sort, I would cause the issue. Basically, I got into a situation where I had 236 notes, but adding a 237th would make things fail. I then started to look at other things, because adding the 237th note doesn't reproduce the problem 100% of the time. So I started adding more content to notes while I had just 236. I'm not convinced that this actually got me anywhere though!! I did run into a few odd things along the way. For example, when I added a new note, I got the alert saying that it was being embedded, but the alert never went away, even though I could see in the console that some sort of embedding had been done. Sometimes I saw a time-out on this single file (dunno why; I was using a super simple embedding model and I'm on a speedy machine). Also, on start-up, I sometimes get asked if I want to re-embed all my notes, even though I can see there is a valid embeddings file and there have been no parsing errors. Quitting and restarting sorts that out (on the next run, I'm not asked to re-embed). I'm not sure how you are keeping track of what's been embedded or not. Maybe it's the embeddings file itself, and these issues are being caused by file-on-disk mismatches. No idea, and I realise these ramblings won't help! I'm super keen to get this plugin working though. I've only just moved to Obsidian, and although I'm putting some structure in place for new notes, the old ones I have imported are a mess, so this would be really useful!
@robwheatley thanks for sharing all that! Your rabbit hole can be my gold mine. It's not very often (considering how many people have downloaded Smart Connections) that I get such detailed feedback 😊 You definitely managed to point out some curiosities. Separating metadata from embeddings files is something I've played around with in the past, and it could be a way to thwart some of these issues. If you ever have a note that seems to cause an issue, but you can't figure out why, please do share the note with me. If you need a private channel to do so, I can accommodate. Being able to see some of these issues myself can be invaluable to the debugging process. There is still a lot of legacy code in v2.0, but I'm continuing to modularize the processes and enable useful test processes, so stability will only improve (though things tend to get worse before they get better...). More importantly, these design decisions should also allow community members to contribute long into the future. PS: all these GitHub issues end up in my personal Obsidian vault, and Smart Connections lets me resurface them at the right times. So any notes you make on your experience, much like what you just shared, will be useful even if they aren't addressed right away. Thanks for your help in making Smart Connections better!
@robwheatley @bbecausereasonss @DantesHub @sh4d0wl3ss latest update
@brianpetro No joy, I'm afraid. I updated to the latest version, scrubbed everything and started embedding from scratch. My local vault created the embeddings file, but on restart it wants me to redo them all again (even though there is a valid 2 MB file there that doesn't get overwritten). I can get more from the logs on this later. My iCloud vault created the embeddings (it took a lot longer, as I have 4x more notes in that one), and the console said it had saved the file, but the saved file was zero bytes, and obviously I get asked to redo them on restart. Something else must be causing the issue. I don't think I will have much time to play tonight, but I will over the weekend if you don't work it out before then....
@robwheatley bummer, but thanks for letting me know. When you get a chance, let me know if you're still seeing the same error or if it's something new. 🌴
@robwheatley @bbecausereasonss @DantesHub @sh4d0wl3ss released another update
@brianpetro I tried .30 and it looked like it worked, but I saw a couple of funny things. I wasn't paying much attention as I was waiting in the car for my daughter. I just got back in the house and saw that you have released a .31 version. Happy to report that I've hit no issues so far (local or iCloud). To flex it a bit, I've just switched to a beefier model to see if a larger file size causes any issues. It's chugging away as I type, and I can see that you are saving the file after every few notes are processed and recording the file sizes as you go. At least I can see the size increasing, so that's encouraging! YES! IT WORKED - Nice one!!
@brianpetro Spoke too soon. It creates the note embeddings fine, but it's failing to create block embeddings at the moment. I'm getting this error. Just thought I should let you know.
@robwheatley, thanks for the update! Seems like we're at least making some progress 😊 Please toggle on this option: It makes the logs provide useful line numbers (the other ones are based on a compiled file). I just made an update ( Thanks for your help
I've just deleted my previous comment saying that all is well. My embeddings file just got overwritten with an empty file after a quit and restart. I wasn't paying attention as I was doing other things at the time. I will keep an eye on things and add more info when I can... Sorry to be giving you bad news on a weekend...
@robwheatley you jinxed it! Lol. In the latest version, I added logic so that when new embeddings are being saved, the disk writes happen in a new temporary file. That temporary file should only replace the existing "working" file if it is at least 50% of the size of the "working" file. So it's weird that you would end up with a completely empty file. A few things to check:
Thanks for the update
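Here is a sketch of the temp-file guard described above, written against a hypothetical `SimpleFs` interface (with an in-memory `MemFs` standing in for the vault adapter) so the logic can be shown on its own. The `.tmp` suffix and all names are assumptions; only the 50% rule comes from this thread.

```typescript
// Hypothetical minimal file-system interface (not Obsidian's adapter API).
interface SimpleFs {
  size(path: string): number | undefined; // undefined if the file is missing
  write(path: string, data: string): void;
  rename(from: string, to: string): void; // overwrites `to` if it exists
  remove(path: string): void;
}

// In-memory stand-in; string length is used as a proxy for byte size.
class MemFs implements SimpleFs {
  files = new Map<string, string>();
  size(path: string): number | undefined {
    return this.files.get(path)?.length;
  }
  write(path: string, data: string): void {
    this.files.set(path, data);
  }
  rename(from: string, to: string): void {
    const data = this.files.get(from);
    if (data === undefined) throw new Error(`missing ${from}`);
    this.files.set(to, data);
    this.files.delete(from);
  }
  remove(path: string): void {
    this.files.delete(path);
  }
}

// Write to a temp file, then promote it only if it is at least 50% of the
// current working file's size. Returns true if the working file was replaced.
function saveWithGuard(fs: SimpleFs, workingPath: string, data: string): boolean {
  const tmpPath = `${workingPath}.tmp`;
  fs.write(tmpPath, data);
  const newSize = fs.size(tmpPath) ?? 0;
  const oldSize = fs.size(workingPath) ?? 0;
  if (newSize < oldSize * 0.5) {
    fs.remove(tmpPath); // suspiciously small: keep the existing file
    return false;
  }
  fs.rename(tmpPath, workingPath);
  return true;
}
```

One consequence of the 50% rule: a legitimate save that shrinks the file by more than half (say, after deleting many notes) is also rejected, so the threshold is a trade-off rather than a guarantee.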
@bbecausereasonss it might be worth it for you to try this too #528 (reply in thread)
Thanks for sharing. I'm trying it right now.
@bbecausereasonss wow, that's annoying! I hope we manage to get to the bottom of this soon. It's very frustrating. 🌴
I often notice 0 KB files in the folder. Is this normal?
It would not let me create the embeddings today: it kept asking me to delete them and went into a loop of failing to save. I nuked the folder and started fresh; now there are no more 0 KB files and the embeddings file is much larger. I have done this twice already this past week, though, so I feel like something is going to mess it up again.
@bbecausereasonss all embedding models have a maximum content length, including the OpenAI embeddings; there just isn't a log associated with the truncation. This local-model truncation log will be turned off in the next version. The EBUSY error is interesting. It likely indicates that some external software is blocking access to the file, possibly while syncing. I won't be able to make any changes today, but in the future I might be able to add some logic to catch that error and retry. 🌴
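Catching EBUSY and retrying could look roughly like this. It's a sketch, assuming writes are wrapped in async functions; `withBusyRetry` is a hypothetical helper, and the attempt count and exponential backoff delays are arbitrary choices. Node reports the condition as an error whose `code` is `"EBUSY"`.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry an async file operation while it fails with EBUSY.
async function withBusyRetry<T>(
  op: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 100,
): Promise<T> {
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return await op();
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code;
      // Give up on non-EBUSY errors or once the attempt budget is spent.
      if (code !== "EBUSY" || attempt >= attempts) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Anything other than EBUSY is rethrown immediately, so genuine failures still surface in the console.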
@bbecausereasonss @Hopsakee I just shipped an experimental feature in v2.1.23 that you might want to try. Instead of one large file, the experimental feature creates a file per note. Note: switching will require re-embedding. If we still have the same issue with the many files, that would at least narrow the possible causes down to a much smaller range. 🌴
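As an illustration of the file-per-note idea, each note's embeddings could live in a file whose name is derived from the note path. The folder name and the `encodeURIComponent` naming scheme below are assumptions for the sketch, not the layout the plugin actually uses.

```typescript
// Hypothetical layout for "one embeddings file per note".
const EMBED_DIR = ".smart-connections/multi";
const EXT = ".ajson";

// encodeURIComponent escapes "/" (and spaces), so nested note paths map to
// flat file names inside a single folder that can be decoded back later.
function embeddingFileFor(notePath: string): string {
  return `${EMBED_DIR}/${encodeURIComponent(notePath)}${EXT}`;
}

// Inverse mapping, used when loading: file name back to note path.
function notePathFrom(embeddingFile: string): string {
  const encoded = embeddingFile.slice(EMBED_DIR.length + 1, -EXT.length);
  return decodeURIComponent(encoded);
}
```

Because each file is parsed separately, a corrupted or zero-byte file only affects one note, which is what makes this layout useful for narrowing the problem down.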
Awesome. Thanks for pushing this. I'll give this a shot.
Sounds good - I’ll give it a go later (super silly week this week in the day job).
That said, I’ve not had an issue for some time now...
Hi @quicly Thanks for letting me know about that! That tells me the issue is happening during the "loading" process rather than the "saving" process. What should happen is that, when loading, if several lines share the same first part (the file name in quotes), the last of those lines replaces all the others. An issue in the loading process also explains why re-embedding keeps happening, since the previous embeddings were never loaded in the first place. It would be helpful if you could open the developer console, disable and re-enable Smart Connections, and screenshot the logs, which should show us some errors. Before doing this, please turn on the "Debug at startup time" setting in the Obsidian Community plugin settings. This will help make sure the logs are as detailed as possible. Thanks for your help in figuring this out,
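The loading rule described above (the last line for a given file name wins) can be sketched as follows, assuming each line looks like `"<key>": <json>,`; `loadAjson` and `compactAjson` are hypothetical names. The compaction step rewrites the data with one line per key, which would stop duplicate lines from piling up on every save.

```typescript
// Hypothetical loader for an append-only file whose lines look like
//   "<key>": <json value>,
// Later lines for the same key overwrite earlier ones ("last one wins").
function loadAjson(text: string): Map<string, unknown> {
  const entries = new Map<string, unknown>();
  for (const raw of text.split("\n")) {
    const line = raw.trim().replace(/,$/, ""); // drop the trailing comma
    if (line === "") continue;
    try {
      const obj = JSON.parse(`{${line}}`) as Record<string, unknown>;
      for (const [key, value] of Object.entries(obj)) entries.set(key, value);
    } catch {
      // Skip a malformed line instead of discarding the whole file.
    }
  }
  return entries;
}

// Rewrites the data with exactly one line per key.
function compactAjson(entries: Map<string, unknown>): string {
  return Array.from(entries)
    .map(([key, value]) => `${JSON.stringify(key)}: ${JSON.stringify(value)},`)
    .join("\n");
}
```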
I'll do it after my current re-embedding finishes. Right now I've disabled the block model, because last time the note model finished its work while the block model just kept creating files again and again. I'm watching in real time as an already embedded note is saved again after each iteration with duplicate content. For example, it added new lines to the file just a minute ago.
@quicly thanks for the screenshots. I'll have to review them further, see if I can find anything that might be causing the issue, and get back to you 🌴
@quicly After reviewing the logs, I made some updates that might help solve this in version If the latest version doesn't automatically clear things up, I recommend manually deleting the If you're still encountering issues, please screenshot the new errors/logs so I can further investigate the cause. Thanks for your help in figuring this out,
@quicly interesting, thanks for sharing those. The errors after disabling the plugin can be attributed to the main process being discontinued before other processes finished. Besides that, the other errors are OK and shouldn't cause issues. If you're still on the "Embedding file per note (EXPERIMENTAL)" setting, I would try turning that off now that I have made some updates, which may have solved the reason you turned it on in the first place. Sorry that you're still having trouble with this, I know it's frustrating! 🌴
@brianpetro I turned "Embedding file per note" off. At least for now, everything's working fine. Moreover, saving time is greatly reduced and there is no freezing while saving, which was a problem even before the re-embedding problem started. UPD: although I'm not sure why, the block model doesn't start embedding for now. That happened before and then it just started working, so I'll wait.
This is interesting, because for me, after my embeddings file got rather large, turning that feature "ON" is what saved me.
@bbecausereasonss it seems to depend on a lot of factors. I've mostly been keeping the file-per-note feature ON, because I believe it will eventually be the default, so I want to make sure it's working well. But it isn't quite ready for mainstream use yet and also lacks some features, so I'm not surprised if some people have issues with it. 🌴
@brianpetro
@brianpetro Thought I would share the good news.
🤞😊🌴
Everything worked fine for 2 months, but a new update required me to recreate all my embeddings. As far as I can see right now, it saves everything in separate files, but these files are not stored and are missing after creation. UPD: I tested it more. It is very similar to what @vanishrap describes.
Update to Unfortunately, I screwed up a line of code that prevented the new embeddings from saving. You will need to re-embed again; however, this time they will save using the improved embeddings file system. Thanks for bringing this to my attention
Thanks for jumping on this so quickly @brianpetro!
I'm using Obsidian on Desktop/Mac and syncing with Dropbox. My embeddings keep getting re-created, seemingly every day, sometimes fully. Not sure why; this never used to happen before. Also, when I click 'save' now, nothing happens, whereas previous versions used to save an embeddings file.