Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basics: support multiple formats with a single archiver #6369

Merged
merged 16 commits into from
Apr 5, 2023

Conversation

MaxDesiatov
Copy link
Contributor

@MaxDesiatov MaxDesiatov commented Apr 3, 2023

Motivation:

In certain function signatures only a single archiver instance can be taken as an argument. We need to support multiple archive formats, such as .zip and .tar.gz, for example in cross-compilation destination bundles. Instead of updating those functions to take multiple archivers, creating a single type that can support multiple formats would be a better solution.

Modifications:

Added MultiFormatArchiver that can handle multiple formats by delegating to other archivers it was initialized with. Since most uses of Archiver are either generic or take an existential, this allows us to pass MultiFormatArchiver to functions that previously supported only a single archive format.

rdar://107485048

Results

This is technically NFC as MultiFormatArchiver is not used anywhere yet.

`swift experimental-destination install` subcommand fails when pointed to a destination bundle compressed with `.tar.gz` extension, compressed with tar and gzip. This archival format is vastly superior to zip in the context of cross-compilation destination bundles. In our typical scenario we see that `.tar.gz` archives are at least 4x better than `.zip`.
@MaxDesiatov MaxDesiatov self-assigned this Apr 3, 2023
@MaxDesiatov MaxDesiatov requested a review from tomerd as a code owner April 3, 2023 13:36

var description: String {
switch self {
case .unknownFormat(let ext, let path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not consider using magic? https://github.com/file/file/tree/ec41083645689a787cdd00cb3b5bf578aa79e46c should be possible to use on Windows even.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume it's installed on Windows? It doesn't seem to be installed on swift Docker images for Ubuntu, so adding a dependency on it will require updating Dockerfiles in the swift-docker repository. This isn't supposed to be an incredibly resilient algorithm, but something that we can still land in 5.9 I hope. So distinguishing by extension should be good enough for now, and we can consider adding support for magic format detection later.

@@ -14,7 +14,7 @@ import TSCBasic
import Dispatch

/// An `Archiver` that handles ZIP archives using the command-line `zip` and `unzip` tools.
public struct ZipArchiver: Archiver, Cancellable {
public final class ZipArchiver: Archiver, Cancellable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally this remains a struct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the base TarArchiver PR, both FileSystem and Cancellator are reference types, and Cancellator in particular holds mutable state, which transitively makes ZipArchiver stateful.


private let formatMapping: [String: any Archiver]

public init(_ archivers: [any Archiver]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not convinced about this API. the client should not care to initialize the "multiplexing" archiver with concrete archivers. Instead, an API that takes the desired "extensions" or "archiveTypes" (enum) and then creates the concrete instances seems a more intent based API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the archivers used by this new archiver are an implementation detail, wouldn't an enum for archive types be a premature generalization? I've changed it to take only any FileSystem and Cancellator to initialize instances of ZipArchiver and TarArchiver, like you suggested. Would that be ok for this initial implementation?

throw Error.noFileNameExtension(archivePath)
}

var extensions = String(filename[firstDot ..< filename.endIndex])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont we have an API to extract the extension from a path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in a comment above on the line 53, that AbsolutePath API doesn't support stacked extensions, it returns just gz for .tar.gz archives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll expand the comment to mention the behavior explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add an API that can return multiple extensions?

Copy link
Contributor

@tomerd tomerd Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly ^^ we can add that to basics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, added in the latest commit

@MaxDesiatov MaxDesiatov force-pushed the maxd/multi-format-archiver branch from 0592ba4 to cc88808 Compare April 4, 2023 15:18
@MaxDesiatov MaxDesiatov requested a review from tomerd April 4, 2023 15:29

/// An `Archiver` that handles multiple formats by delegating to other existing archivers each dedicated to its own
/// format.
public final class UniversalArchiver: Archiver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a value type

self.supportedExtensions = supportedExtensions
}

private func archiver(for archivePath: AbsolutePath) throws -> any Archiver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@@ -14,7 +14,7 @@ import TSCBasic
import Dispatch

/// An `Archiver` that handles ZIP archives using the command-line `zip` and `unzip` tools.
public struct ZipArchiver: Archiver, Cancellable {
public final class ZipArchiver: Archiver, Cancellable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this change is necessary

@MaxDesiatov MaxDesiatov force-pushed the maxd/multi-format-archiver branch from 844994c to d34bb91 Compare April 4, 2023 17:04
Added `MultiFormatArchiver` that can handle multiple formats by delegating to other archivers it was initialized with. Since most uses of `Archiver` are either generic or take an existential, this allows us to pass `MultiFormatArchiver` that previously supported only a single archive format.

rdar://107485048

# Conflicts:
#	Sources/Basics/MultiFormatArchiver.swift
@MaxDesiatov MaxDesiatov force-pushed the maxd/multi-format-archiver branch from d34bb91 to 00fb477 Compare April 4, 2023 17:05
…xd/multi-format-archiver

# Conflicts:
#	Sources/Basics/CMakeLists.txt
#	Tests/BasicsTests/Archiver/Inputs/archive.tar.gz
#	Tests/BasicsTests/Archiver/Inputs/invalid_archive.tar.gz
@MaxDesiatov MaxDesiatov changed the base branch from maxd/tar-archives to main April 4, 2023 20:55
@MaxDesiatov
Copy link
Contributor Author

@swift-ci smoke test

@MaxDesiatov MaxDesiatov enabled auto-merge (squash) April 4, 2023 20:58
@MaxDesiatov
Copy link
Contributor Author

@swift-ci smoke test

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test windows

@MaxDesiatov
Copy link
Contributor Author

@swift-ci smoke test macos

@MaxDesiatov
Copy link
Contributor Author

@swift-ci test windows

@MaxDesiatov
Copy link
Contributor Author

@swift-ci smoke test macos

@MaxDesiatov MaxDesiatov merged commit e21df87 into main Apr 5, 2023
MaxDesiatov added a commit that referenced this pull request Apr 5, 2023
In certain function signatures only a single archiver instance can be taken as an argument. We need to support multiple archive formats, such as `.zip` and `.tar.gz`, for example in cross-compilation destination bundles. Instead of updating those functions to take multiple archivers, creating a single type that can support multiple formats would be a better solution.

Added `MultiFormatArchiver` that can handle multiple formats by delegating to other archivers it was initialized with. Since most uses of `Archiver` are either generic or take an existential, this allows us to pass `MultiFormatArchiver` to functions that previously supported only a single archive format.

This is technically NFC as `MultiFormatArchiver` is not used anywhere yet.

rdar://107485048
@MaxDesiatov MaxDesiatov deleted the maxd/multi-format-archiver branch April 5, 2023 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants