Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed zsync content within AppImage to be used in AppImageUpdate #525

Closed
pamarcos opened this issue Nov 14, 2017 · 25 comments
Closed

Embed zsync content within AppImage to be used in AppImageUpdate #525

pamarcos opened this issue Nov 14, 2017 · 25 comments

Comments

@pamarcos
Copy link
Contributor

Please consider this as an idea. My understanding after reading #238, is that we can actually embed anything in the header of the AppImage without affecting the content of what the AppImage holds (the image itself).

Depending on how big your AppImage and how powerful your machine is, calculating the checksums for the zsync file in order to use AppImageUpdate might not be that trivial. This is probably not a big issue when using the appimageupdate tool manually against your AppImage. However, when creating a self-updateable AppImage, checking regularly for new versions may waste resources for something that you already did a while ago.

For such cases, I think it might be worth to store directly the zsync information of the content of the AppImage so that it can be reused every single time the application checks for an update. Take into account that this would be an option and should never be enforced when creating an AppImage. After all, not every AppImage wants to use AppImageUpdate, right?

PROs:

  • Checking just for updates becomes even more trivial. You could even do something as straightforward as checking directly if remote_zsync == local_zsync without having to calculate any checksum at all. Not the safest option, but still better than comparing MTime

CONs:

  • It increases the size of the AppImage itself. Whether this is worth it or not, depends on the kind of AppImage and I think it's up to the developer to decide

What do you think, @probonopd, @TheAssassin?

@pamarcos pamarcos changed the title Embed zsync content in AppImage to be used in AppImageUpdate Embed zsync content within AppImage to be used in AppImageUpdate Nov 14, 2017
@probonopd
Copy link
Member

probonopd commented Nov 14, 2017

We had briefly thought about even embedding the entire zsync file (which contains the checksums of all blocks) but put that idea on hold because

  • zsync files can be "large" (in the order of hundreds of kilobytes) and (even worse) are variable in size (meaning we can't just reserve X amount of size for it but would need to grow dynamically)
  • it would complicate the format and most likely necessitate an extension of the zsync spec, rendering the original zsnyc tools such as zsyncmake unusable (since we would have to skip an area when doing the zsync checksumming, as we would have to inject the zsync file into the very file it was created on)

Your suggestion solves 1 but not 2. But I definitely can see (and do like) the benefit, especially when optional, e.g., for large files. So I think we should consider this as an optional future addition (although not at a high priority).

Hence, let's discuss here if someone volunteers to implement it (and the needed tools).

@probonopd
Copy link
Member

It increases the size of the AppImage itself.

Not significantly. A SHA-1 looks like this: c6510c2f6bd942fd5227b34ef47c78740eaab62b and always has the same length. So I think this would be more than acceptable a price to pay in terms of file size increase, even when you add the overhead for an ELF section to it.

@TheAssassin
Copy link
Member

As explained in detail on IRC when you suggested this the first time, I am strongly opposed to this idea.

Embedding any of the contents of the .zsync file adds an unnecessary amount of data to the file. If you don't have the .zsync file on the server any more as a second file, one would have to start downloading the actual AppImage file until the .zsync file section is found (i.e., having the ELF header and any section as overhead that has to be downloaded every time). To me, that doesn't make any sense.

For what you really want to achieve, i.e., more efficient update checks, there must be easier and more efficient approaches than embedding that data. (Especially since you don't even need the majority of the header fields, or any checksum block data).

@TheAssassin
Copy link
Member

I don't think that checking the files to be equal is a good idea, but embedding data doesn't do the job either.

Not significantly. A SHA-1 looks like this: c6510c2f6bd942fd5227b34ef47c78740eaab62b and always has the same length. So I think this would be more than acceptable a price to pay in terms of file size increase, even when you add the overhead for an ELF section to it.

There's a lot of problems with this approach, too. First of all, you can't hash a file, then embed the hash, and expect that hash to remain consistent. What we would have to do is skip this section like with the signature section whenever hashing the file. While this would work, it also means that this very specific use case needs to be supported right within libzsync2, which would have to skip that part every time a file is downloaded and composed from several other files. It really makes the algorithms there a lot more complex, and I don't think it's really worth the effort or even better than the MTime based approach. Either you trust the developers to be able to provide a good .zsync file with the correct time stamp, or you should anyway use the safe option.

You always claim that the MTime: header was unsafe. Well, there's 100s of other use cases that use UTC timestamps and rely on the system clock to be set up properly, in which things work perfectly fine. The MTime approach does avoid having to patch any original file, is very fast, easy to calculate, and the only real problem it has is that if the timestamp is tampered with locally (or the system clock of the computer creating the .zsync file initially is incorrect). I see a lot more problems with the "let's embed something" approach, being hard to calculate, needing special support in the tooling, making the composition of the resulting file more complicated etc.

@TheAssassin
Copy link
Member

That said, please don't take this as being opposed to suggestions on making update checks more efficient. I just don't think that your specific suggestion has a sufficient cost benefit factor, and neither does @probonopd's here.

@TheAssassin
Copy link
Member

TheAssassin commented Nov 14, 2017

We (@probonopd and I, that is) have come to the conclusion that embedding some string that could be extracted from the local AppImage and compared with a custom header in the .zsync file might fulfill all constraints we all set up here, while avoiding the "value changes after embedding" problem a hash has.

In theory, it won't matter what exactly is embedded. In contrary to the MTime based approach, the comparison is purely value based (bitwise) instead of trying to compare the order. Therefore, we could just embed any fixed-size string and add a custom header to the .zsync file with this value (something like X-AppImage-...) with this value. Then, you just fetch the header from the .zsync file (without too much overhead, at least way less than skipping >50% of the ELF header), compare it to the local value and trigger an update accordingly.

Here's a few options what could be embedded:

  • UUID: easy to calculate, there's a lot of implementations that we can use
    • type 1 might be unsuited, as it takes the system time into account, and it doesn't really matter whether the machine generating the AppImage is identifiable either
    • type 2 is just random data, and as we don't care about being compatible to some other standard, I don't think we need to make our identifier look like a UUID4
  • just random data
    • could be encoded to 62 characters (ASCII character set a-z, A-Z and 0-9) for human readability

This approach still requires a second file, but as discussed a few times, that's really not a problem.

Feedback welcome.

@pamarcos
Copy link
Contributor Author

pamarcos commented Nov 14, 2017

Not significantly. A SHA-1 looks like this: c6510c2f6bd942fd5227b34ef47c78740eaab62b and always has the same length. So I think this would be more than acceptable a price to pay in terms of file size increase, even when you add the overhead for an ELF section to it.

Well, for my original approach where we embed the zsync content (which is just a checksum per block of data), it directly increases the size to embed as the size of the AppImage gets bigger (more blocks of data).

If you don't have the .zsync file on the server any more as a second file, one would have to start downloading the actual AppImage file until the .zsync file section is found (i.e., having the ELF header and any section as overhead that has to be downloaded every time). To me, that doesn't make any sense.

My idea was to also keep the zsync file on the server separately to avoid doing that craziness, of course.

Feedback welcome.

I'm perfectly fine with that idea as well. We still need to do a few changes to libzsync2 (add new metadata to the header, which we needed to extend in the future anyway) and also to the AppImage format. However, since it's embedding a fixed-size content, it makes the implementation simpler.

@pamarcos
Copy link
Contributor Author

pamarcos commented Nov 14, 2017

Wait, why don't we simply embed the SHA-1 that's already in the zsync file header? That would avoid making any changes into libzsync2 and we would only need to embed a fixed-size checksum into the AppImage header.

We don't need to actually check that the checksum matches (meaning, the hard part of skiping the header bytes, calculating the checksum for real etc). We just need to check that the value in the zsync header is the same as the one embedded into the AppImage. Sold?

@TheAssassin
Copy link
Member

@pamarcos using the SHA-1 hash has a big issue: When you embed it, the resulting file's hash differs, hence the hash in the file and the hash of the file won't ever be equal (except when hitting a hash collision, and I mean, that is unlike). Using the SHA-1 hash has no other advantage except for being available as a header field, it only brings disadvantages with it.

@pamarcos
Copy link
Contributor Author

Ah, of course now we were talking about the SHA1 of the whole AppImage, not only the image itself. Ok, forget about it. So basically we'll introduce a random fixed-size string that will pass to zsyncmake so that it's added into the zsync header file.

@TheAssassin
Copy link
Member

That's a feature for zsyncmake2, actually. When I should get to rewriting it, I'll add such a feature.

@TheAssassin
Copy link
Member

TheAssassin commented Nov 20, 2017

I probably have a day this week to create a zsyncmake2 prototype. As soon as this is ready to use and has an option to add custom headers (maybe we even hardcode that one header, as soon as we decide on a header name), AppImageUpdate will be able to make use of it.

@TheAssassin
Copy link
Member

Update: A development version of zsyncmake2 has been released, which features a custom header functionality.

@pamarcos
Copy link
Contributor Author

Great! Thanks for letting me know. I'll take a look whenever I have some spare time.

@hideout
Copy link

hideout commented Dec 21, 2017 via email

@probonopd
Copy link
Member

Thank you @azubieta and @TheAssassin.

Update: A development version of zsyncmake2 has been released, which features a custom header functionality.

How does it work/how can we test?

@TheAssassin
Copy link
Member

You have tested it in the past, @probonopd. Refer to https://github.com/AppImage/zsync2/releases, download the AppImage, then call it with -h. You'll see the option and how it works in the description.

@probonopd
Copy link
Member

Ah, I was misreading "custom header functionality" as "we are adding a custom header that includes a uuid or something like that".

@azubieta
Copy link
Contributor

I guess that we already include an elf section with update information. @TheAssassin can you please take a look at this?

@TheAssassin
Copy link
Member

I think we can close this issue. I'm not sure where the discussions led, and there's no concrete TODOs in here.

@probonopd
Copy link
Member

As far as I understood, at some point we were talking about checking some GUID that, if matched, would skip time-costly update checks. Or something along those lines. What happened in this area?

@TheAssassin
Copy link
Member

That's been dealt with in the AppImageUpdate repository. We now simply calculate the SHA-1 hash of the local file and compare it to the data in the .zsync file.

@probonopd
Copy link
Member

I think that results in similar advantages to that @pamarcos describes - @pamarcos do you agree?

@pamarcos
Copy link
Contributor Author

Sorry for the late reply. Yes, I think comparing the SHA-1 is better than doing the whole zsync algorithm over all chunks and it's the de facto standard. Since depending on the AppImage size this may be non-negligible, it's up to developers to choose how often to run this update check.

@probonopd
Copy link
Member

Thanks @pamarcos, do you think we can close this ticket then - or else, what is missing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants