A key requirement for the success of any developer platform is a way to use automated testing to identify software defects. Better APIs and tools for testing can greatly improve a platform’s quality. Below, we propose a new direction for testing in Swift.
We start by defining our basic principles and describe specific features that embody those principles. We then discuss several design considerations in-depth. Finally, we present specific ideas for delivering an all-new testing solution for Swift, and weigh them against alternatives considered.
Testing in Swift should be approachable by both new programmers and seasoned engineers. There should be few APIs to learn, and they should feel ergonomic and modern. It should be easy to incrementally add new tests alongside legacy ones. Testing should integrate seamlessly into the tools and workflows that people know and use every day.
A good test should be expressive and automatically include actionable information when it fails. It should have a clear name and purpose, and there should be facilities to customize a test’s representation and metadata. Test details should be specified on the test, in code, whenever possible.
A testing library should be flexible and capable of accommodating many needs. It should allow grouping related tests when beneficial, or letting them be standalone. There should be ways to customize test behaviors when necessary, while having sensible defaults. Storing data temporarily during a test should be possible and safe.
A modern testing system should have scalability in mind and gracefully handle large test suites. It should run tests in parallel by default, but allow some tests to opt-out. It should be effortless to repeat a test with different inputs and see granular results. The library should be lightweight and efficient, imposing minimal overhead on the code being tested.
Guided by these principles, there are many specific features we believe are important to consider when designing a new testing system.
- Be easy to learn and use: There should be few individual APIs to memorize, they should have thorough documentation, and using them to write a new test should be fast and seamless. The APIs should be ergonomic and adhere to Swift’s API design guidelines.
- Validate expected behaviors or outcomes: The most important job of any testing library is checking that code meets specific expectations—for example, by confirming that a function returns an expected result or that two values are equal. There are many interesting variations on this, such as comparing whole collections or checking for errors. A robust testing system should cover all these needs, while using progressive disclosure to remain simple for common cases.
- Enable incremental adoption: It should gracefully coexist with projects that use XCTest or other testing libraries and allow incremental adoption so that users can transition at their own pace. This is especially important because this new system may take time to achieve feature parity.
- Integrate with tools, IDEs, and CI systems: A useful testing library requires supporting tools for functionality such as listing and selecting tests to run, launching runner processes, and collecting results. These features should integrate seamlessly with common IDEs, SwiftPM’s `swift test` command, and continuous integration (CI) systems.
- Include actionable failure details: Tests provide the most value when they fail and catch bugs, but for a failure to be actionable it needs to be sufficiently detailed. When a test fails, it should collect and show as much relevant information as reasonably possible, especially since it may not reproduce reliably.
- Offer flexible naming, comments, and metadata: Test authors should be able to customize the way tests are presented by giving them an informative name, adding comments, or assigning metadata (such as labels) to tests which have something in common.
- Allow customizing behaviors: Some tests share common set-up or tear-down logic, which needs to be performed once for each test or group. Other times, a test may begin failing for an irrelevant reason and must be temporarily disabled. Some tests only make sense to run under certain conditions, such as on specific device types or when an external resource is available. A modern testing system should be flexible enough to satisfy all these needs, without complicating simpler use cases.
- Allow organizing tests into groups (or not): Oftentimes a component will have several related tests that would make sense to group together. It should be possible to group tests into hierarchies, while allowing simpler tests to remain standalone.
- Support per-test storage: Tests often need to store data while they are running and local variables are not always sufficient. For example, set up logic for a test may create a value the test needs to access, but these are in different scopes. There must be a way to carefully store per-test data, to ensure it is isolated to a single test and initialized deterministically to avoid unexpected dependencies or failures.
- Allow observing test events: Some use cases require an ability to observe test events—for example, to perform custom reporting or analysis of results. A testing library should offer hooks for event handling.
- Parallelize execution: Many tests can be run in parallel to improve execution time, either using multiple threads in a single process or multiple runner processes. A testing library should offer flexible parallelization options for eligible tests, encourage parallelizing whenever possible, and offer granular control over this behavior. It should also leverage Swift’s data race safety features (such as `Sendable` enforcement) to the fullest extent possible to avoid concurrency bugs.
- Repeat a test multiple times with different arguments: Many tests consist of a template with minor variations—for example, invoking a function multiple times with different arguments each time and validating the result of each invocation. A testing library should make this pattern easy to apply, and include detailed reporting so that a failure for a single argument is represented clearly.
- Behave consistently across platforms: Any new testing solution should be cross-platform from its inception and support every platform Swift supports. Its observable behaviors should be as consistent as possible across those platforms, especially for core responsibilities such as discovering and executing tests.
Several areas deserve close examination when designing a new testing API: some because they may benefit from language or compiler toolchain enhancements to deliver the ideal experience, and others because they involve non-obvious reasoning or requirements.
Testing libraries typically offer APIs to compare values—for example, to confirm that a function returns an expected result—and report a test failure if a comparison does not succeed. Depending on the library, these APIs may be called “assertions”, “expectations”, “checks”, “requirements”, “matchers”, or other names. In this document we refer to them as expectations.
For test failures to be actionable, they need to include enough details to understand the problem, ideally without a human manually reproducing the failure and debugging. The most important details relevant to expectation failures are the values being compared or checked and the kind of expectation being performed (e.g. equal, not-equal, less-than, is-not-nil, etc.). Also, if any error was caught while evaluating an expression passed to an expectation, that should be included.
Beyond the values of evaluated expressions, there are other pieces of information that may be useful to capture and include in expectations:
- The source code location of the expectation, typically using the format `#fileID:#line:#column`. This helps test authors jump quickly to the line of code to view context, and lets IDEs present the failure in their UI at that location.
- The source code text of expression(s) passed to the expectation. In an example expectation API call `myAssertEqual(subject.label == "abc")`, the source code text would be the string `"subject.label == \"abc\""`. Even though source code text may not be necessary when viewing failures in an IDE since the code is present, it can still be helpful to confirm the expected source code was evaluated in case it changed recently. It’s even more useful when the failure is shown on a CI website or anywhere without source, since a subexpression (such as `subject.label` in this example) may give helpful clues about the failure.
- Custom user-specified comments. Comments can be helpful to allow test authors to add context or information only needed if there was a failure. They are typically short and included in the textual log output from the test library.
- Custom data or file attachments. Some tests involve files or data processing and may benefit from allowing expectations to save arbitrary data or files in the results for later analysis.
Since the most important details to include in expectation failure messages are the expression(s) being compared and the kind of expectation, some testing libraries offer a large number of specialized APIs for detailed reporting. Here are some examples from other prominent testing libraries:
|  | Java (JUnit) | Ruby (RSpec) | XCTest |
|---|---|---|---|
| Equal | `assertEquals(result, 3);` | `expect(result).to eq(3)` | `XCTAssertEqual(result, 3)` |
| Identical | `assertSame(result, expected);` | `expect(result).to be(expected)` | `XCTAssertIdentical(result, expected)` |
| Less than or equal | N/A | `expect(result).to be <= 5` | `XCTAssertLessThanOrEqual(result, 5)` |
| Is null/nil | `assertNull(actual);` | `expect(actual).to be_nil` | `XCTAssertNil(actual)` |
| Throws | `assertThrows(E.class, () -> { ... });` | `expect {...}.to raise_error(E)` | `XCTAssertThrowsError(...) { XCTAssert($0 is E) }` |
Offering a large number of specialized expectation APIs is a common practice among testing libraries: XCTest has 40+ functions in its `XCTAssert` family; JUnit has several dozen; RSpec has a large DSL of test matchers.
Although this approach allows straightforward reporting, it is not scalable:
- It increases the learning curve for new users by requiring them to learn many new APIs and remember to use the correct one in each circumstance, or risk having unclear test results.
- More complex use cases may not be supported—for example, if there is no expectation for testing that a `Sequence` starts with some prefix using `starts(with:)`, the user may need a workaround such as adding a custom comment which includes the sequence for the results to be actionable.
- It requires that testing library maintainers add bespoke APIs supporting many use cases, which creates a maintenance burden.
- Depending on the exact function signatures, it may require additional overloads that complicate type checking.
We believe expectations should strive to be as simple as possible and involve few distinct APIs, but be powerful enough to include detailed results for every expression. Instead of offering a large number of specialized expectations, there should only be a few basic expectations and they should rely on ordinary expressions, built-in language operators, and the standard library to cover all use cases.
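For a rough illustration of this principle, consider how the specialized cases from the comparison table above could collapse into ordinary expressions passed to a single expectation API (here, the `#expect` macro proposed later in this document; the values are illustrative):

```swift
import Testing

let result = 3
let values = [1, 2, 3]

#expect(result == 3)                  // "equal", via the == operator
#expect(result <= 5)                  // "less than or equal", via <=
#expect(values.contains(2))           // arbitrary standard library APIs...
#expect(values.starts(with: [1, 2]))  // ...including starts(with:)
```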
Expectations have certain rules which must be followed carefully when handling arguments:
- The primary expression(s) being checked should be evaluated exactly once. In particular, if the expectation failed, showing the value of any evaluated expression should not cause the expression to be evaluated a second time. This is to avoid any undesirable or unexpected side effects of multiple evaluations.
- Custom comments or messages should only be evaluated if the expectation failed, and at most once, to similarly avoid undesirable side effects and prevent unnecessary work.
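A minimal sketch of how an expectation API could honor both rules, assuming a hypothetical free function named `expect` (the design proposed later in this document uses macros instead):

```swift
func expect(
  _ condition: @autoclosure () -> Bool,        // the primary expression
  _ comment: @autoclosure () -> String? = nil  // the custom comment
) {
  // Rule 1: evaluate the primary expression exactly once.
  let passed = condition()
  if !passed {
    // Rule 2: evaluate the comment at most once, and only on failure.
    let message = comment()
    print("Expectation failed" + (message.map { ": \($0)" } ?? ""))
  }
}

// The comment's string interpolation below is never evaluated unless the
// expectation fails.
expect([1, 2].count == 2, "unexpected count: \([1, 2].count)")
```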
A single test may include multiple expectations, and a testing library must decide whether to continue executing a test after one of its expectations fails. Some tests benefit from always running to completion, even if an earlier expectation failed, since they validate different things and early expectations are unrelated to later ones. Other tests are structured such that later logic depends heavily on the results of earlier expectations, so terminating the test after any expectation fails may save time. Still other tests take a hybrid approach, where only certain expectations are required and should terminate test execution upon failure.
This is a policy decision, and is something a testing library could allow users to control on a global, per-test, or per-expectation basis.
Often, expectation APIs do not preserve raw expression values when reporting a failure, and instead generate a string representation of those values for reporting purposes. Although a string representation is often sufficient, failure presentation could be improved if an expectation were able to keep values of certain known data types.

As an example, imagine a hypothetical expectation API call `ExpectEqual(image.height, 100)`, where `image` is a value of some well-known graphical image type `UILibrary.Image`. Since this uses a known data type, the expectation could potentially keep `image` upon failure and include it in test results, and an IDE or other tool could then present the image graphically for easier diagnosis. This capability could be extensible and cross-platform by using a protocol to describe how to convert arbitrary values into one of the testing library’s known data types, delivering much richer presentation of expectation results for commonly-used types.
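A sketch of what such a protocol might look like; the names here (`KnownTestValue`, `CustomTestValueConvertible`) are purely illustrative, not proposed API:

```swift
import Foundation

/// A payload in one of the testing library's known formats.
enum KnownTestValue {
  case image(pngData: Data)
  case text(String)
}

/// A hypothetical protocol describing how to convert arbitrary values into
/// one of the known formats for richer presentation in test results.
protocol CustomTestValueConvertible {
  var testValue: KnownTestValue { get }
}
```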
A recurring theme in several of the features discussed above is a need to express additional information or options about individual tests or groups of tests. A few examples:
- Describing test requirements or marking a test disabled.
- Assigning a tag or label to a test, to locate or run those which have something in common.
- Declaring argument values for a parameterized or “data-driven” test.
- Performing common logic before or after a test.
Collectively, these are referred to in this document as traits. The traits for an individual test could be stored in a standalone file, separate from the test definition, but relying on a separate file has known downsides: it can get out of sync if a test name changes, and it’s easy to overlook important details—such as whether a test is disabled or has specific requirements—when they’re stored separately.
We believe that the traits for a single test should preferably be declared in code, placed as close to the test they describe as possible, to avoid these problems. However, global settings may still benefit from being configured via external files, as there may not be a canonical location in code to place them.
When grouping related tests together, if a test trait is specified both for an individual test and one of its containing groups, it may be ambiguous which option takes precedence. The testing library must establish policies for how to resolve this.
Test traits may fall into different categories in terms of their inheritance behavior. Some semantically represent multiple values that a user would reasonably expect to be added together. One example is test requirements: if a group specifies one requirement, while one of its test functions specifies another, the test function should only run if both requirements are satisfied. The order in which these requirements are evaluated is worth considering and formally specifying, so that a user can be assured that requirements are always evaluated “outermost-to-innermost” or vice-versa.
Another example is test tags: they are also considered multi-value, but items with tags are typically expected to have `Set` rather than `Array` semantics and ignore duplicates, so for this type of trait the evaluation order is insignificant.
Other test traits semantically represent a single value, and conflicts between them may be more challenging to resolve. As a hypothetical example, imagine a test trait spelled `.enabled(Bool)` which includes a `Bool` that determines whether a test should run. If a group specifies `.enabled(false)` but one of its test functions specifies `.enabled(true)`, which value should be honored? Arguments could be made for either policy.

When possible, it may be easier to avoid ambiguity: in the previous example, this may be solved by only offering a `.disabled` option and not its opposite. But the inheritance semantics of each option should be considered, and when ambiguity is unavoidable, a policy for resolving it should be established and documented.
A flexible test library should allow certain behaviors to be extended by test authors. A common example is running logic before or after a test: if every test in a certain group requires the same steps beforehand, those steps could be placed in a single method in that group rather than expressed as an option on a particular test. However, if only a few tests within a group require those steps, it may make sense to leverage a test trait to mark those tests individually.
Test traits should provide the ability to extend behaviors to support this workflow. For example, it should be possible to define a custom test trait, and implement hooks that allow it to run custom code before or after a test or group.
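A minimal sketch of what such a hook could look like, assuming a hypothetical protocol that traits adopt to wrap test execution (none of these names are proposed API):

```swift
/// A hypothetical protocol for traits that run custom code around a test.
protocol CustomExecutionTrait {
  /// Wraps the body of any test or group this trait is applied to.
  func execute(_ body: () async throws -> Void) async throws
}

/// An illustrative trait performing set-up and tear-down around one test.
struct LoggingTrait: CustomExecutionTrait {
  func execute(_ body: () async throws -> Void) async throws {
    print("setting up")              // runs before the test body
    defer { print("tearing down") }  // runs after, even if the body throws
    try await body()
  }
}
```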
Some features require the ability to uniquely identify a test, such as selecting individual tests to run or serializing results. It may also be useful to access the name of a test inside its own body or for an entity observing test events to query test names.
A testing library should include a robust mechanism to uniquely identify tests and identifiers should be stable across test runs. If it is possible to customize a test’s display name, the testing library should decide which name is authoritative and included in the unique identifier. Also, function overloading could make certain test function names ambiguous without additional type information.
A frequent challenge for testing libraries in all languages is the need to locate tests in order to run them. Users typically expect tests to be discovered automatically, without needing to provide a comprehensive list since that would be a maintenance burden.
There are three types of test discovery worth considering in particular, since they serve different purposes:
- At runtime: When a test runner process is launched, the testing library needs to locate tests so it can execute them.
- After a build: After compilation of all test code has completed successfully, but before a test runner process has been launched, it may be useful for a tool to introspect the test build products and print the list of tests or extract other metadata about them without running them.
- While authoring: After tests have been written or edited, but before a build has completed, it is common for an IDE or other tool to statically analyze or index the code and locate tests so it can list them in a UI and allow running them.
Each of these are important to support, and may require different solutions.
Two of the above test discovery types—After a build and While authoring—require the ability to discover tests without launching a runner process, and thus without using the testing library’s runtime logic and models to represent tests. In addition to the IDE use case mentioned above, another reason discovering tests statically may be useful is so CI systems can extract information about tests and use it to optimize test execution scheduling on physical devices. It is common for CI systems to run a different host OS than the platform they are targeting—for example, an Intel Mac building tests for an iOS device—and in those situations it may be impractical or expensive for the CI system to launch a runner process to gather this information.
Note that not all test details are eligible to extract statically: those that enable runtime test behaviors may not be, but trivial metadata (such as a test’s name or whether it is disabled) should be extractable, especially with further advances in Swift’s support for Build-Time Constant Values. While designing a new testing API, it is important to consider which test metadata should be statically extractable to support these non-runtime discovery use cases.
Repeating a test multiple times with different arguments—formally referred to as Parameterized or Data-Driven Testing—can expand test coverage to more scenarios with minimal code repetition. Although a user could approximate this using a simple loop such as `for...in` in the body of a test, it’s often better to let testing libraries handle this task. A testing library can automatically keep track of the argument(s) for each invocation of a test and record them in the results. It can also provide a way to selectively re-run individual argument combinations for fine-grained debugging in case only one instance failed.
Note that recording individual parameterized tests’ arguments in results and re-running them requires some way to uniquely represent those arguments, which overlaps with some of the considerations discussed in Test identity.
A modern testing system should make efficient use of the machine it runs on. Many tests can safely run in parallel, and the testing system should encourage this by enabling per-test parallelization by default. In addition to faster results and shorter iteration time, running tests in parallel can help identify bugs due to hidden dependencies between tests and encourage better state isolation.
However, some tests may need to disable parallelization and run one at a time. It should be possible to opt-out, and this may be especially useful while migrating from older testing systems which don't support parallelization. Although opting-out of this behavior should be possible, it should be narrowly scoped to not sacrifice other tests' ability to run in parallel.
In addition to running tests in parallel relative to each other, tests themselves should seamlessly support Swift's concurrency features. In particular, this means:

- Tests should be able to use async/await whenever necessary.
- Tests should support isolation to a global actor such as `@MainActor`, but be nonisolated by default. (Isolation by default would undermine the goal of running tests in parallel by default.)
- Values passed as arguments to parameterized tests should be `Sendable`, since they may cross between isolation domains within the testing system's execution machinery.
- Types containing test functions, and their stored properties, need not be `Sendable`, since they are only used from a single isolation domain while each test function is run.
A well-rounded testing library should be integrated with popular tools used by the community. This integration should include some essential functionality such as:
- Building tests into products which can be executed.
- Running all built tests.
- Showing per-test results, including details of each individual failure during a test.
- Showing an aggregate summary of a test run, including failure statistics.
Beyond the essentials, tools may offer other useful features, such as:
- Filtering tests by name, specific traits (e.g. custom tags), or other criteria.
- Outputting results to a standard format such as JUnit XML for importing into other tools.
- Controlling runtime options such as whether parallel execution is enabled, whether failed tests should be reattempted, etc.
- Relaunching a test executable after an unexpected crash.
- Reporting "live" progress as each test finishes.
In order to deliver the functionality above, a testing system needs to track significant events which occur during execution, such as when each test starts and finishes or encounters a failure. There needs to be a structured, machine-readable representation of each event, and to provide granular progress updates, there should be an option for tools to observe a stream of such events. Any serialization formats involved should be versioned and provide some level of stability over time, to ensure compatibility with tools which may evolve on differing release schedules.
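As a sketch of what a structured event might contain (the `TestEvent` type and its fields are illustrative assumptions, not a proposed format):

```swift
import Foundation

/// A machine-readable record of a significant event during a test run.
struct TestEvent: Codable {
  enum Kind: String, Codable {
    case runStarted, testStarted, issueRecorded, testEnded, runEnded
  }

  var formatVersion: Int  // versioned for compatibility with evolving tools
  var kind: Kind
  var testID: String?     // stable identifier of the affected test, if any
  var timestamp: Date
}
```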
Some of the features outlined above are much easier to achieve if there are at least two processes involved: One which executes the tests themselves (which may be unstable and terminate unexpectedly), and another which launches and monitors the first process. In this document, we'll refer to this second process as the harness. A two-process architecture involving a supervisory harness helps enable functionality such as relaunching after a crash, live progress reporting, or outputting results to a standard format. Some cases effectively require a harness, such as when launching tests on a remotely-connected device. A modern testing library should provide any necessary APIs and runtime support to facilitate integration with a harness.
XCTest has historically served as the de facto standard testing library in Swift. It was originally written in 1998 in Objective-C, and heavily embraced that language’s idioms throughout its APIs. It relies on subclassing (reference semantics), dynamic message passing, NSInvocation, and the Objective-C runtime for things like test discovery and execution. In the 2010s, it was integrated deeply into Xcode and given many more capabilities and APIs which have helped Apple platform developers deliver industry-leading software.
When Swift was introduced, XCTest was extended further to support the new language while maintaining its core APIs and overall approach. This allowed developers familiar with using XCTest in Objective-C to quickly get up to speed, but certain aspects of its design no longer embody modern best practices in Swift, and some have become problematic and prevented enhancements. Examples include its dependence on the Objective-C runtime for test discovery; its reliance on APIs like NSInvocation which are unavailable in Swift; the frequent need for implicitly-unwrapped optional (IUO) properties in test subclasses; and its difficulty integrating seamlessly with Swift Concurrency.
It is time to chart a new course for testing in Swift. The direction proposed below ultimately represents a successor to XCTest. This transition will likely span several years, and we aim to thoughtfully design and deliver a solution that will be even more powerful, bearing in mind the many lessons learned from maintaining XCTest over the years.
**Note:** The approach described below is not meant to include a solution for every consideration or feature discussed in this document. It describes a starting point for this new direction, and covers many of the topics, but leaves some to be pursued as part of follow-on work.
The new direction includes three major components exposed via a new module named `Testing`:

- `@Test` and `@Suite` attached macros: These declare test functions and suite types, respectively.
- Traits: Values passed to `@Test` or `@Suite` which customize the behavior of test functions or suite types.
- Expectations `#expect` and `#require`: expression macros which validate expected conditions and report failures.
To declare test functions and suites (types containing tests), we will leverage Attached Macros (SE-0389). At a high level, this will consist of several attached macros which may be placed on a test type or test function, defined in a new module named `Testing`:

```swift
/// Declare a test function.
@attached(peer)
public macro Test(
  // ...Parameters described later
)

/// Declare a test suite.
@attached(member) @attached(peer)
public macro Suite(
  // ...Parameters described later
)
```
Then, test authors may attach these macros to functions or types in a test target. Here are some usage examples:
```swift
import Testing

// A test implemented as a global function
@Test func example1() {
  // ...
}

@Suite struct BeginnerTests {
  // A test implemented as an instance method
  @Test func example2() { ... }
}

// Implicitly treated as a suite type, due to containing @Test functions.
actor IntermediateTests {
  private var count: Int
  private var delta = 0  // mutated by example3() and read in deinit below

  init() async throws {
    // Runs before every @Test instance method in this type
    self.count = try await fetchInitialCount()
  }

  deinit {
    // Runs after every @Test instance method in this type
    print("count: \(count), delta: \(delta)")
  }

  // A test implemented as an async and throws instance method
  @Test func example3() async throws {
    delta = try await computeDelta()
    count += delta
    // ...
  }
}
```
Test functions may be defined as global functions or as either instance or static methods in a type. They must always be explicitly annotated as `@Test`, they need not follow any naming convention (such as beginning with “test”), and they may include `async`, `throws`, or `mutating`.

Suite types, or simply “suites”, are types containing `@Test` functions or other nested suite types. Suite types may include the `@Suite` attribute explicitly, although it is optional and only required when specifying traits (described below). A suite type must have a zero-parameter `init()` if it contains instance `@Test` methods.
**Per-test storage:** The `IntermediateTests` example demonstrates per-test set-up and tear-down as well as per-test storage: A unique instance of `IntermediateTests` is created for every `@Test`-annotated instance method it contains, which means that its `init` and `deinit` are run once before and after each, respectively, and they may contain set-up or tear-down logic. Since `count` is an instance stored property, it acts as per-test storage, and since `example3()` is isolated to its enclosing actor type, it is allowed to mutate `count`.
**Sendability:** Note that the test functions and suite types in these examples are not required to be `Sendable`. At runtime, if the `@Test` function is an instance method, the testing library creates a thunk which instantiates the suite type and invokes the `@Test` function on that instance. The suite type instance is only accessed from a single `Task`.
**Actor isolation:** `@Test` functions or types may be annotated with a global actor (such as `@MainActor`), in accordance with standard language and type system rules. This allows tests to match the global actor of their subject and reduces the need for suspension points.
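For example, a test whose subject is bound to the main actor could itself adopt `@MainActor` (the function name here is illustrative):

```swift
@Test @MainActor func example6() {
  // Runs on the main actor, so @MainActor-isolated subjects can be used
  // without await suspension points.
}
```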
To facilitate test discovery, the attached macros above will eventually use a feature such as `@linkage`, an attribute for controlling low-level symbol linkage (see pitch). It will allow placing variables and functions in a special section of the binary, where they can be retrieved at runtime or inspected statically at build time.
However, before that feature lands, the testing library will use a temporary approach of iterating through types conforming to a known protocol and gathering their tests by calling a static property. The attached macros will emit code which generates the types to be discovered using this mechanism. Once more permanent support lands, the attached macros will be adjusted to adopt it instead.
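To make the temporary mechanism concrete, here is a rough sketch of the pattern the emitted code might follow; every name below is illustrative, not the actual generated code:

```swift
/// A minimal stand-in for the testing library's runtime test model.
struct Test {
  var name: String
  var body: () async throws -> Void
}

/// The known protocol whose conforming types the runtime iterates through.
protocol __TestContainer {
  static var __tests: [Test] { get }
}

func example1() { /* the user's @Test function */ }

/// The kind of type the @Test macro could emit alongside a test function.
struct __TestContainer_example1: __TestContainer {
  static var __tests: [Test] {
    [Test(name: "example1()", body: example1)]
  }
}
```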
Regardless of which technique the attached macros above use to facilitate test discovery, the APIs called by their expanded code need not be considered public or stable. In particular, code emitted by these macros may call underscore-prefixed APIs declared in the `Testing` module, which are marked `public` to facilitate use by these macros but are generally considered a private implementation detail.
As discussed earlier, it is important to support specifying traits for a test. SE-0389 allows including parameters in an attached macro declaration, and this allows users to pass arguments to a `@Test` attribute on a test function or type.

The `Testing` module will offer an extensible mechanism for specifying per-test traits via types conforming to protocols such as `TestTrait` and `SuiteTrait`:

```swift
/// A protocol describing traits that can be added to a test function or
/// to a test suite.
public protocol Trait: Sendable { ... }

/// A protocol describing traits that can be added to a test function.
public protocol TestTrait: Trait { ... }

/// A protocol describing traits that can be added to a test suite.
public protocol SuiteTrait: Trait { ... }
```
Using these protocols, the attached macros `@Test` and `@Suite` shown earlier will gain parameters accepting traits:
```swift
/// Declare a test function.
///
/// - Parameter traits: Zero or more traits to apply to this test.
@attached(peer)
public macro Test(
  _ traits: any TestTrait...
)

/// Declare a test function.
///
/// - Parameters:
///   - displayName: The customized display name of this test.
///   - traits: Zero or more traits to apply to this test.
@attached(peer)
public macro Test(
  _ displayName: _const String,
  _ traits: any TestTrait...
)

/// Declare a test suite.
///
/// - Parameter traits: Zero or more traits to apply to this test suite.
@attached(member) @attached(peer)
public macro Suite(
  _ traits: any SuiteTrait...
)

/// Declare a test suite.
///
/// - Parameters:
///   - displayName: The customized display name of this test suite.
///   - traits: Zero or more traits to apply to this test suite.
@attached(member) @attached(peer)
public macro Suite(
  _ displayName: _const String,
  _ traits: any SuiteTrait...
)
```
The specifics of the `Trait` protocols and the built-in types conforming to them will be left to subsequent proposals. But to illustrate the general pattern they will follow, here is an example showing how a hypothetical option for marking a test disabled could be structured:
```swift
/// A test trait which marks a test as disabled.
public struct DisabledTrait: TestTrait {
  /// An optional comment related to this option.
  public var comment: String?
}

extension TestTrait where Self == DisabledTrait {
  /// Construct a test trait which marks a test disabled,
  /// with an optional comment.
  public static func disabled(_ comment: String? = nil) -> Self
}

// Usage example:
@Test(.disabled("Currently causing a crash: see #12345"))
func example4() {
  // ...
}
```
Earlier examples showed how related tests may be grouped together by placing them within a type. This technique also allows forming sub-groups by nesting one type containing tests inside another:
```swift
struct OuterTests {
  @Test func outerExample() { /* ... */ }

  @Suite(.tags("edge-case"))
  struct InnerTests {
    @Test func innerExample1() { /* ... */ }
    @Test func innerExample2() { /* ... */ }
  }
}
```
When using this technique, test traits may be specified on nested types and inherited by all tests they contain. For example, the `.tags("edge-case")` trait shown here on `InnerTests` would have the effect of adding the tag `edge-case` to both `innerExample1()` and `innerExample2()`, as well as to `InnerTests`.
Parameterized testing is easy to support using this API: The `@Test` functions shown earlier do not accept any parameters, making them non-parameterized, but if a `@Test` function includes a parameter, then a different overload of the `@Test` macro can be used which accepts a `Collection` whose associated `Element` type matches the type of the parameter:
```swift
/// Declare a test function parameterized over a collection of values.
///
/// - Parameters:
///   - traits: Zero or more traits to apply to this test.
///   - collection: A collection of values to pass to the associated test
///     function.
///
/// During testing, the associated test function is called once for each
/// element in `collection`.
@attached(peer)
public macro Test<C>(
  _ traits: any TestTrait...,
  arguments collection: C
) where C: Collection & Sendable, C.Element: Sendable

// Usage example:
@Test(arguments: ["a", "b", "c"])
func example5(letter: String) {
  // ...
}
```
Once Swift’s support for Variadic Generics gains more functionality, the signature of these `@Test` macros may be revised to accept more than one collection of arguments. This will expand the feature by allowing a test function with arity N to be repeated once for each combination of elements from N collections.
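For instance, a two-parameter test function might then be written like this (a hypothetical future signature, not part of the initial API):

```swift
// The test would run once per combination of elements:
// ("a", 1), ("a", 2), ("a", 3), ("b", 1), ... ("c", 3)
@Test(arguments: ["a", "b", "c"], [1, 2, 3])
func example7(letter: String, number: Int) {
  // ...
}
```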
In existing test solutions available to Swift developers, there is limited diagnostic information available for a failed expectation such as `assert(2 < 1)`. The expression is reduced at runtime to a simple boolean value with no context (such as the original source code) available to include in a test’s output.
By adopting Expression Macros (SE-0382), we can give developers implicitly expressive test expectations. The expectation shown below, upon failure, can capture not just the boolean value `false`, but also the left-hand and right-hand operands and the operator itself (that is, `x`, `1`, and `<`, respectively) and expand any sub-expressions to their evaluated values, such as `x → 2`:
```swift
let x = 2
#expect(x < 1)  // failed: (x → 2) < 1
```
Some expectations must pass for a test to proceed—these would be expressed with a separate macro, `#require()`. Because `#require()` must pass, we can infer additional behaviors based on its argument that we cannot do with `#expect()`. For example, if an optional value is passed to `#require()`, we can infer that `#require()` should return the optional value or fail if it is `nil`:
```swift
let x: Int? = 10
let y: String? = nil

let z = try #require(x)  // passes, z == 10
let w = try #require(y)  // fails, test ends early with a thrown error
```
We can also extract the components of an expression like `a.contains(b)` and, on failure, report the value of `a` and `b`:
```swift
let a = [1, 2, 3]
let b = 4

#expect(a.contains(b))  // failed: (a → [1, 2, 3]).contains(b → 4)
```
We can also leverage built-in language features for yet more expressiveness. Consider the following test logic:
```swift
let a = [1, 2, 3, 4, 5]
let b = [1, 2, 3, 3, 4, 5]

#expect(a == b)
```
This expectation will fail because of the extra element `3` in `b`. We can leverage Ordered Collection Diffing (SE-0240) to capture exactly how these arrays differ and present that information to the developer as part of the test output or in the IDE.
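A sketch of the kind of diff the standard library API from SE-0240 can produce for the arrays above:

```swift
let a = [1, 2, 3, 4, 5]
let b = [1, 2, 3, 3, 4, 5]

// difference(from:) computes the minimal edits that turn `a` into `b`.
for change in b.difference(from: a) {
  switch change {
  case .insert(let offset, let element, _):
    print("unexpected extra element \(element) at offset \(offset)")
  case .remove(let offset, let element, _):
    print("missing element \(element) at offset \(offset)")
  }
}
// Prints: unexpected extra element 3 at offset 3
```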
Once this project reaches its first major release, its API will be considered stable and will follow the same general rules as the Swift standard library. Source-breaking API changes will be considered a last resort and be preceded by a generous deprecation period.
However, unlike the standard library, this project will generally not guarantee ABI stability. Tests are typically only run during the development cycle and aren't distributed to end users, so the runtime testing library will not be included in OS distributions and thus its interfaces do not need to maintain ABI stability.
For this testing solution to stand the test of time, it needs to be well maintained and any new features added to it should be thoughtfully designed and undergo community review.
The codebase for this testing system will be open source. Any significant additions to its feature set or changes to its API or behavior will follow a process inspired by, but separate from, Swift Evolution. Changes will be described in writing using a standard proposal template and discussed in the swift-testing category of the Swift Forums. The process for pitching and submitting proposals for review will be formalized separately and documented in the project repo.
A new group—tentatively named the Swift Testing Workgroup—will be formed to act as the primary governing body for this project. This group will be considered a sub-group of the planned Ecosystem Steering Group once it has been formed. The group will retroactively assume responsibility for the swift-corelibs-xctest project as well. The responsibilities of members of this workgroup will include:
- defining and approving roadmaps for the project;
- scheduling proposal reviews;
- guiding community discussion;
- making decisions about proposals; and
- working with members of related workgroups, such as the Platform, Server, or Documentation workgroups, on topics which intersect with their areas of focus.
The membership of this workgroup, its charter, and more specifics about its role in the project will be formalized separately.
As mentioned in Approachability above, testing should be easy to use. The easier it is to write a test, the more tests will be written, and software quality generally improves with more automated tests.
Most open source Swift software is distributed as source code and built by clients using tools such as Swift Package Manager. Although this is very common, if we took this approach, every client would need to download the source for this project and its dependencies. A test target is included in SwiftPM's standard New Package template, so if this project became the default testing library used by that template, a newly-created package would require internet access to build its tests. In addition to downloading source, clients would also need to build this project along with its dependencies. Since this project relies on Swift macros, it depends on swift-syntax, a large project known to have lengthy build times as of this writing.
Due to these practical concerns, this project will be distributed as part of the Swift toolchain, at least initially. Being in the toolchain means clients will not need to download or build this project or its dependencies, and this project's macros can use the copy of swift-syntax in the toolchain. Longer-term, if the practical concerns described above are resolved, the library could be removed from the toolchain, and doing so could yield other benefits such as more explicit dependency tracking and more portable toolchains.
Although the Swift toolchain will be the primary method of distribution, the project will support building as a Swift package. This will facilitate development of the package itself by its maintainers and outside contributors.
Clients will also have the option of declaring an explicit package dependency on this project rather than relying on the built-in copy in the toolchain. If a client does this, it will be possible for tools to integrate with the client's specified copy of the project, as long as it is a version of the package the tool supports.
Testing should be broadly available across platforms. The long-term goal is for this project to be available on all platforms Swift itself supports.
Whenever the Swift Platform Steering Group declares intention to support a new platform, this project will be considered one of the highest-priority components to get working on the new platform (after dependencies such as the standard library) since this project will enable qualification of many other components in the stack. The maintainers of this project will work with other Swift workgroups or steering groups to help enable support on new platforms.
One reason why broad platform support is important is so that this project can eventually support testing the Swift standard library. The standard library currently uses a custom library for testing (StdlibUnittest), but many of this project's benefits would be useful in that context as well, so eventually we would like to rebase StdlibUnittest on this project.

While the goal is for this project to work on every platform Swift does, it currently does not work on Embedded Swift due to its reliance on existentials. It is possible this limitation may be overcome through project changes, but if it cannot be, the project maintainers will work with other Swift workgroups or steering groups to identify a solution.
While exploring new testing directions, we considered and thoroughly prototyped an approach relying heavily on Result Builders. At a high level, the idea involved a few pieces:
- Types like `TestCase` and `TestSuite` representing individual tests and groups, respectively.
- A `@resultBuilder` type `TestBuilder` allowing declarative creation of test hierarchies.
- A protocol named e.g. `TestProvider` with a requirement `@TestBuilder static var tests: TestSuite<Self>` which suite types would implement in order to define their tests.
- Tests defined as closures in the `static var tests` result builder above, accepting an instance of a type named `TestContext` which allowed accessing per-test instance storage (see the sketch below).
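Here is a rough reconstruction of the shape this prototype took, with minimal stub definitions so the pattern is concrete; all names are hypothetical:

```swift
/// A single test; its body receives a context for per-test storage.
struct TestCase<Subject> {
  var name: String
  var body: (TestContext<Subject>) async throws -> Void
}

/// A group of tests belonging to one suite type.
struct TestSuite<Subject> {
  var testCases: [TestCase<Subject>]
}

/// A wrapper providing synchronized access to per-test instance storage.
struct TestContext<Subject> { /* ... */ }

@resultBuilder
enum TestBuilder {
  static func buildBlock<Subject>(
    _ testCases: TestCase<Subject>...
  ) -> TestSuite<Subject> {
    TestSuite(testCases: testCases)
  }
}

protocol TestProvider {
  @TestBuilder static var tests: TestSuite<Self> { get }
}

struct FoodTruckTests: TestProvider {
  static var tests: TestSuite<Self> {
    TestCase<FoodTruckTests>(name: "menu is not empty") { context in
      // Per-test state is accessed indirectly, via `context`.
    }
  }
}
```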
This approach seemed promising at first and satisfied many of the goals described in the beginning of this document. But we discovered several significant drawbacks:
- Type-checking performance: Certain Result Builder usage patterns are known to lead to poor type-checking performance, especially when the expression is long. When describing an entire suite of tests, which may be nested arbitrarily, the work can become exponential and lead to a noticeable increase in build time or, in the extreme case, compiler timeouts.
- Accessing test state: Because tests are defined in a `static` context, per-test state must be accessed indirectly, via a `TestContext` wrapper type. This made accessing per-test storage more verbose than necessary, and introduced the need for synchronization on the wrapper type.
- Global actor isolation: It is difficult, or perhaps impossible, to use a global actor (most often `@MainActor`) in both the test body and the type enclosing the test which stored its per-test state, and ensure they match. In practice, this means that tests whose subjects include global actors are challenging to write without lots of `await` suspension points.
- Build-time test discovery: It is difficult, or perhaps impossible, to discover tests comprehensively at build time, since the definition of tests happens in result builder functions.
- Discovery: Using Result Builders did not fully solve the problem of runtime test discovery; it is still necessary to locate types conforming to the `TestProvider` protocol, even though the tests within each conforming type are trivial to gather by calling its static `tests` property.
Another approach for defining tests is using a builder pattern with an imperative-style API. A good example of this is Swift’s own StdlibUnittest library, which is used to test the standard library. To define tests, a user first creates a `TestSuite` and then calls `.test("Some name") { /* body */ }` one or more times to add a closure containing each test.
One problem with generalizing this approach is that it doesn’t have a way to deterministically discover tests either after a build or while authoring tests in an IDE (see Test discovery). Because tests are defined using imperative code, that code may contain arbitrary control flow logic that static analysis may not be able to reason about. As a contrived example, imagine the following:
```swift
import StdlibUnittest

var myTestSuite = TestSuite("My tests")
if Bool.random() {
  myTestSuite.test("Foo") { /* ... */ }
}
```
There may be arbitrary logic (such as `if Bool.random()` here) which influences the construction of the test suite, and this makes important features like IDE discovery impossible in the general case.