Write a .zip #1442

Open
swankjesse opened this issue Feb 23, 2024 · 21 comments

Comments

@swankjesse
Collaborator

We should design an API to create a .zip file.

See also:
#1408

@swankjesse
Collaborator Author

Maybe something like this?

fun BufferedSink.writeZip(
  sourceFileSystem: FileSystem,
  baseDirectory: Path,
)

You’d create a real or fake FileSystem, populate a directory with content, then create a .zip from that content.
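For comparison, here is a rough JVM-only sketch of what this first proposal does. It uses `java.util.zip.ZipOutputStream` and `java.nio.file` as stand-ins for Okio's `BufferedSink` and `FileSystem`. The name `writeZipOf` is hypothetical and is not an Okio API. The function zips every regular file under a base directory and names each entry by its `/`-separated path relative to that directory.

```kotlin
import java.io.OutputStream
import java.nio.file.Files
import java.nio.file.Path
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical JVM analogue of `BufferedSink.writeZip(sourceFileSystem, baseDirectory)`:
// java.nio.file stands in for Okio's FileSystem, OutputStream for BufferedSink.
fun OutputStream.writeZipOf(baseDirectory: Path) {
    // Collect regular files only; parent directories are implied by the entry names.
    val files = Files.walk(baseDirectory).use { paths ->
        paths.iterator().asSequence().filter { Files.isRegularFile(it) }.toList()
    }.sortedBy { baseDirectory.relativize(it).joinToString("/") } // stable entry order

    val zip = ZipOutputStream(this)
    for (file in files) {
        // Entry names are relative to baseDirectory and use '/' separators.
        val entry = ZipEntry(baseDirectory.relativize(file).joinToString("/"))
        // Take the timestamp from the originating file (one of the options in this thread).
        entry.setLastModifiedTime(Files.getLastModifiedTime(file))
        zip.putNextEntry(entry)
        Files.copy(file, zip)
        zip.closeEntry()
    }
    // finish() writes the central directory without closing the caller's stream.
    zip.finish()
}
```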

One drawback of this API is that it’s awkward to create entries from a stream, like an HTTP response.

@swankjesse
Collaborator Author

Another option:

fun BufferedSink.writeZip(
  writeContents: FileSystem.() -> Unit,
)

@swankjesse
Collaborator Author

swankjesse commented Feb 24, 2024

A couple more considerations:

  • A new ZIP-writing API should allow the caller to supply timestamps. These could come from the originating file, or from the clock, or they could be constant 0 values. What’s the convention for zeroing out timestamps in zips? We should do that.

  • A new ZIP-writing API should allow the caller to configure either COMPRESSION_METHOD_DEFLATED or COMPRESSION_METHOD_STORED for each file.

  • For directory entries, we could always include them, always exclude them, let the user choose, or let the user choose on a case-by-case basis.

I suspect these are a deal-breaker for the APIs that use a FileSystem as the input or builder.
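The following minimal JVM sketch shows why the per-file compression choice shapes the API. It assumes `java.util.zip` as the backend. A DEFLATED entry can be written as its data arrives. `ZipOutputStream` rejects a STORED entry, however, unless its size and CRC-32 are set before any bytes are written, so a STORED writer has to buffer or pre-scan the content. The helper name `writeEntry` is hypothetical.

```kotlin
import java.util.zip.CRC32
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Writes one entry, choosing STORED or DEFLATED per file, as a `compress` flag would.
fun ZipOutputStream.writeEntry(name: String, bytes: ByteArray, compress: Boolean) {
    val entry = ZipEntry(name)
    if (compress) {
        // DEFLATED: CRC-32 and sizes are computed while streaming and written afterwards.
        entry.method = ZipEntry.DEFLATED
    } else {
        // STORED: the JDK needs the size and CRC-32 before any data is written.
        entry.method = ZipEntry.STORED
        entry.size = bytes.size.toLong()
        entry.compressedSize = bytes.size.toLong()
        entry.crc = CRC32().apply { update(bytes) }.value
    }
    putNextEntry(entry)
    write(bytes)
    closeEntry()
}
```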

Here’s another API proposal. It ends up looking a lot like Moshi’s JsonUtf8Writer in name & usage.

class ZipWriter(sink: BufferedSink) : Closeable {
  inline fun <T> file(
    file: Path,
    compress: Boolean = true,
    lastModifiedAtMillis: Long? = null,
    lastAccessedAtMillis: Long? = null,
    createdAtMillis: Long? = null,
    writerAction: BufferedSink.() -> T,
  ): T

  fun directory(
    dir: Path,
    lastModifiedAtMillis: Long? = null,
    lastAccessedAtMillis: Long? = null,
    createdAtMillis: Long? = null,
  )
}

inline fun <T> BufferedSink.writeZip(writerAction: ZipWriter.() -> T): T

And a usage example of the above:

FileSystem.SYSTEM.write("greetings.zip".toPath()) {
  writeZip {
    file("hello.txt".toPath()) {
      writeUtf8("Hello World")
    }

    directory("directory".toPath())

    directory("directory/subdirectory".toPath())

    file(
      file = "directory/subdirectory/child.txt".toPath(),
      compress = false,
      lastModifiedAtMillis = Clock.System.now().toEpochMilliseconds(),
    ) {
      writeUtf8("Another file!")
    }
  }
}
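Here is a rough JVM-only approximation of the proposal, built on `java.util.zip`. `JdkZipWriter` is a hypothetical name and is not Okio's API: it takes an `OutputStream` instead of a `BufferedSink`, and `String` entry names instead of `okio.Path`. The sketch assumes Java 9+ for `ZipEntry.setTimeLocal`. It buffers each file so that STORED entries can declare their size and CRC-32 up front, and it pins an early constant timestamp whenever the caller supplies none.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.Closeable
import java.io.OutputStream
import java.nio.file.attribute.FileTime
import java.time.LocalDateTime
import java.util.zip.CRC32
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical JVM stand-in for the proposed ZipWriter (not Okio's API).
class JdkZipWriter(out: OutputStream) : Closeable {
    private val zip = ZipOutputStream(out)

    fun <T> file(
        file: String,
        compress: Boolean = true,
        lastModifiedAtMillis: Long? = null,
        lastAccessedAtMillis: Long? = null,
        createdAtMillis: Long? = null,
        writerAction: OutputStream.() -> T,
    ): T {
        // Buffer the content so STORED entries can declare size and CRC-32 up front.
        val buffer = ByteArrayOutputStream()
        val result = buffer.writerAction()
        val bytes = buffer.toByteArray()
        val entry = newEntry(file, lastModifiedAtMillis, lastAccessedAtMillis, createdAtMillis)
        if (compress) {
            entry.method = ZipEntry.DEFLATED
        } else {
            entry.method = ZipEntry.STORED
            entry.size = bytes.size.toLong()
            entry.crc = CRC32().apply { update(bytes) }.value
        }
        zip.putNextEntry(entry)
        zip.write(bytes)
        zip.closeEntry()
        return result
    }

    fun directory(
        dir: String,
        lastModifiedAtMillis: Long? = null,
        lastAccessedAtMillis: Long? = null,
        createdAtMillis: Long? = null,
    ) {
        // A trailing '/' is what marks a ZIP entry as a directory.
        val name = "${dir.trimEnd('/')}/"
        zip.putNextEntry(newEntry(name, lastModifiedAtMillis, lastAccessedAtMillis, createdAtMillis))
        zip.closeEntry()
    }

    private fun newEntry(name: String, modified: Long?, accessed: Long?, created: Long?): ZipEntry {
        // Strip leading '/' so the archive never contains absolute paths.
        val entry = ZipEntry(name.trimStart('/'))
        if (modified != null) {
            entry.setLastModifiedTime(FileTime.fromMillis(modified))
        } else {
            // Pin an arbitrary early DOS-range date; otherwise the JDK uses the current time.
            entry.setTimeLocal(LocalDateTime.of(1980, 2, 1, 0, 0))
        }
        if (accessed != null) entry.setLastAccessTime(FileTime.fromMillis(accessed))
        if (created != null) entry.setCreationTime(FileTime.fromMillis(created))
        return entry
    }

    override fun close() = zip.close()
}

inline fun <T> OutputStream.writeZip(writerAction: JdkZipWriter.() -> T): T =
    JdkZipWriter(this).use { it.writerAction() }
```

Buffering every file in memory is the simplest way to support `compress = false`. An implementation could instead stream DEFLATED entries directly and buffer only the STORED ones.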

@swankjesse
Collaborator Author

I think I’d canonicalize input paths by stripping a leading / if present. I think that’s more user-friendly than either crashing or creating a .zip file that includes an absolute path.
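Here is a tiny sketch of that canonicalization, assuming entry names are handled as `/`-separated strings. `canonicalEntryName` is a hypothetical helper. It also appends the trailing `/` that marks a ZIP directory entry.

```kotlin
// Hypothetical helper: turns a caller-supplied path into a ZIP entry name.
fun canonicalEntryName(path: String, isDirectory: Boolean = false): String {
    // Strip any leading '/' instead of crashing or writing an absolute path.
    val name = path.trimStart('/')
    require(name.isNotEmpty()) { "empty entry name: '$path'" }
    // Directory entries are identified by a trailing '/'.
    return if (isDirectory && !name.endsWith("/")) "$name/" else name
}
```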

@swankjesse
Collaborator Author

I think I’d default timestamps to null/absent/0 rather than grabbing the host machine’s time and jamming that in there. Too many tools that produce .zip archives end up with non-deterministic outcomes because their libraries inserted data in the output that the author never really asked for.
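The following small JVM sketch illustrates the determinism concern. It assumes `java.util.zip`, with Java 9+ for `setTimeLocal`. `ZipOutputStream` stamps the current time on any entry whose time wasn't set, so pinning a constant is what makes repeated runs byte-identical. `zipBytes` and `FIXED_ZIP_TIME` are hypothetical names. The date 1980-02-01 is an arbitrary early value inside the DOS date range, not an established convention.

```kotlin
import java.io.ByteArrayOutputStream
import java.time.LocalDateTime
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// DOS date fields count years from 1980, so the constant must be 1980 or later.
val FIXED_ZIP_TIME: LocalDateTime = LocalDateTime.of(1980, 2, 1, 0, 0)

fun zipBytes(entries: Map<String, String>, pinTimestamps: Boolean): ByteArray {
    val out = ByteArrayOutputStream()
    ZipOutputStream(out).use { zip ->
        for ((name, text) in entries) {
            val entry = ZipEntry(name)
            // Without this, ZipOutputStream fills in the current time.
            if (pinTimestamps) entry.setTimeLocal(FIXED_ZIP_TIME)
            zip.putNextEntry(entry)
            zip.write(text.toByteArray())
            zip.closeEntry()
        }
    }
    return out.toByteArray()
}
```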

@swankjesse
Collaborator Author

I think I’d produce .zip files that don’t include directory entries at all by default. I’d only add ’em if the user explicitly asked for them. This creates an escape hatch for developers that want empty directories in their .zip files, without creating a bunch of redundant data otherwise.
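The following sketch shows why directory entries are usually redundant, assuming `/`-separated entry names. Every parent directory of a file is already implied by that file's name, so an explicit entry only adds information for an empty directory. Both function names are hypothetical.

```kotlin
// Parent directories implied by file entry names, e.g. "a/b/c.txt" implies "a/" and "a/b/".
fun impliedDirectories(fileNames: List<String>): Set<String> =
    fileNames.flatMap { name ->
        val parts = name.split('/').dropLast(1)
        (1..parts.size).map { parts.take(it).joinToString("/") + "/" }
    }.toSortedSet()

// An explicit directory entry is redundant when some file already lives under it;
// it only matters when the caller wants an empty directory to survive extraction.
fun isRedundantDirectoryEntry(dir: String, fileNames: List<String>): Boolean =
    "${dir.trimEnd('/')}/" in impliedDirectories(fileNames)
```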

@swankjesse
Collaborator Author

I think I’d stream output to a BufferedSink, which should make it straightforward to create .zip files on-demand in web services or clients.
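The following JVM sketch demonstrates the streaming property, assuming `java.util.zip`. With DEFLATED entries, `ZipOutputStream` never needs to seek backwards. It records each entry's CRC-32 and sizes in a data descriptor after the data (general-purpose flag bit 3), and it writes the central directory last. That is what lets a writer target a forward-only stream such as an HTTP response body. `streamZip` is a hypothetical name.

```kotlin
import java.io.OutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Writes entries produced lazily by a Sequence straight into a forward-only OutputStream.
fun streamZip(out: OutputStream, entries: Sequence<Pair<String, ByteArray>>) {
    val zip = ZipOutputStream(out)
    for ((name, bytes) in entries) {
        // DEFLATED is ZipOutputStream's default method, so no size is needed up front.
        zip.putNextEntry(ZipEntry(name))
        zip.write(bytes)
        zip.closeEntry()
    }
    // The central directory goes last; `out` stays open for the caller.
    zip.finish()
}
```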

@vanniktech
Contributor

That API in #1442 (comment) looks really good and would suit most of my needs. I have a bunch of apps from which you can export your data. Every table in my SQLite databases gets a corresponding JSON file where I dump all the data. Media files such as videos and images are stored so that they preserve their path relative to Context.filesDir, so, for instance, the zip file would contain attachments/image_1664623103090.jpg. It would be really amazing if ZipWriter also let you stream files into the zip via a Source, maybe something like this:

class ZipWriter(sink: BufferedSink) : Closeable {
  fun copy(
    source: Source,
    compress: Boolean = true,
  )
}

Or is this already achievable with something like this?

file("attachments/image_1664623103090.jpg".toPath()) {
  writeAll(fileSystem.source("attachments/image_1664623103090.jpg".toPath()))
}

@swankjesse
Collaborator Author

swankjesse commented Mar 28, 2024

@vanniktech We could include all kinds of helpers, possibly as extensions.

fun <T> ZipWriter.copy(
    file: Path,
    compress: Boolean = true,
    lastModifiedAtMillis: Long? = null,
    lastAccessedAtMillis: Long? = null,
    createdAtMillis: Long? = null,
    openSource: () -> Source,
): T

copy("attachments/image_1664623103090.jpg".toPath()) {
  fileSystem.source("attachments/image_1664623103090.jpg".toPath())
}
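On the JVM, the same lazily opened `copy` helper can be sketched as an extension on `java.util.zip.ZipOutputStream`. Several details are assumptions made for illustration: the name, the reduced parameter list, and returning the number of bytes copied instead of a generic `T`. The point of the sketch is that the source is opened only when its entry is written, and is always closed afterwards.

```kotlin
import java.io.InputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Hypothetical JVM analogue of the proposed `ZipWriter.copy(file, ..., openSource)` helper.
fun ZipOutputStream.copy(name: String, openSource: () -> InputStream): Long {
    putNextEntry(ZipEntry(name))
    // Open the source only now, and close it even if copying fails.
    val copied = openSource().use { it.copyTo(this) }
    closeEntry()
    return copied
}
```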

@mipastgt

Is there already some functionality to create a simple ZIP file of a directory or any ZIP file at all for native targets (iOS in my case)?

@swankjesse
Collaborator Author

@mipastgt not yet!

@kagg886

kagg886 commented Feb 11, 2025

Is there any progress on this feature?

@vitorhugods

I was this close to giving it a shot, but my main issue is that I also wanted to have it working for JS/browser, and I've seen that zlib is only included in jvmMain and nativeMain.

I wonder what Okio's view on this is, @swankjesse.

I'd love to read/write zips on JS/Browser, Android, and mac/iOS. Sure, JS/Browser would need to have it working with an in-memory file system, like FakeFileSystem.

I recently had to write some file-related code for all of these targets. Apart from the inputs and outputs (paths for Android and native, Uint8Array for JS), the only place where I had to write platform-specific code was the zipping part.

Since mobile development is my mother tongue, it does feel weird to me to keep it all in memory, but for JS/Browser it is just another day under the sun.

@swankjesse
Collaborator Author

swankjesse commented Feb 12, 2025

I don’t think it’s appropriate for Okio to ship a zlib dependency for Kotlin/JS.

@mipastgt

If it is OK for native to depend on zlib, why shouldn't it be OK for JS/Wasm to depend on Pako or something similar (e.g., zlib-rs)?

@JakeWharton
Collaborator

Mostly because the delivery mechanisms are entirely different. One is orders of magnitude less sensitive to an additional megabyte than the other.

@swankjesse
Copy link
Collaborator Author

Most practically, Kotlin/Native and Kotlin/JVM already include zlib. Kotlin/JS doesn’t.

@mipastgt

I find it disturbing that a few extra bytes are used as an excuse to exclude important functionality that almost everybody will need sooner or later. You could also make this an optional download. One could also rethink web delivery mechanisms as a whole. I had assumed the build process strips any unused code before packaging, so that the size of the library doesn't matter. Otherwise, the whole web platform doesn't make much sense to me if you can't build any sophisticated software just because everybody wants to save some bytes somewhere.

@mipastgt

Maybe CompressionStream (https://developer.mozilla.org/en-US/docs/Web/API/CompressionStream/CompressionStream) could also be an option for implementing this feature. It's supported in all major browsers.

@swankjesse
Collaborator Author

@mipastgt an optional download is a good option. Unfortunately CompressionStream doesn’t work for Okio because Okio is built as a blocking API, and CompressionStream is a non-blocking API.

@mipastgt

mipastgt commented Feb 17, 2025

Then I'd opt for the optional download.

By the way, compressed pako is just 58kB.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants