Regex-validated string types (feedback reset) #41160

Open
RyanCavanaugh opened this issue Oct 19, 2020 · 113 comments
Labels
Awaiting More Feedback This means we'd like to hear from more people who would be helped by this feature Suggestion An idea for TypeScript

Comments

@RyanCavanaugh
Member

RyanCavanaugh commented Oct 19, 2020

This is a pickup of #6579. With the addition of #40336, a large number of those use cases have been addressed, but possibly some still remain.

Update 2023-04-11: Reviewed use cases and posted a write-up of our current evaluation

Search Terms

regex string types

Suggestion

Open question: For people who had upvoted #6579, what use cases still need addressing?

Note: Please keep discussion on-topic; moderation will be a bit heavier to avoid off-topic tangents

Examples

(please help)

Checklist

My suggestion meets these guidelines:

  • [?] This wouldn't be a breaking change in existing TypeScript/JavaScript code
  • [?] This wouldn't change the runtime behavior of existing JavaScript code
  • [?] This could be implemented without emitting different JS based on the types of the expressions
  • [?] This isn't a runtime feature (e.g. library functionality, non-ECMAScript syntax with JavaScript output, etc.)
  • [?] This feature would agree with the rest of TypeScript's Design Goals.
@RyanCavanaugh RyanCavanaugh added Awaiting More Feedback This means we'd like to hear from more people who would be helped by this feature Suggestion An idea for TypeScript labels Oct 19, 2020
@AnyhowStep
Contributor

AnyhowStep commented Oct 19, 2020

Use case 1: URL path-building libraries.

/*snip*/
createTestCard : f.route()
    .append("/platform")
    .appendParam(s.platform.platformId, /\d+/)
    .append("/stripe")
    .append("/test-card")
/*snip*/

These are the constraints for .append():

  • ✔️ Must start with leading forward slash (/)
  • ❌ Must not end with trailing forward slash (/)
  • ❌ Must not contain colon character (:); it is reserved for parameters
  • ❌ Must not contain two, or more, forward slashes consecutively (//)
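Some of these constraints can already be approximated for literal arguments with conditional template literal types. The sketch below is hypothetical: ValidSegment, append, and isValidSegment are illustrative names, not the library's API. It rejects trailing slashes, colons, and "//" at compile time, and pairs that with a runtime regex for strings that are only known at runtime.

```typescript
// Hypothetical compile-time check for a literal path segment:
// must start with "/", must not end with "/", no ":" and no "//".
type ValidSegment<S extends string> = S extends `/${string}`
  ? S extends `${string}/`
    ? never
    : S extends `${string}:${string}`
      ? never
      : S extends `${string}//${string}`
        ? never
        : S
  : never;

// Intersecting with ValidSegment<S> collapses the parameter type to `never`
// when the literal is invalid, so the call fails to type-check.
function append<S extends string>(segment: S & ValidSegment<S>): S {
  return segment;
}

// Runtime equivalent for dynamic strings: the lookahead rejects "//" anywhere,
// and the last character may be neither "/" nor ":".
const SEGMENT_PATTERN = /^(?!.*\/\/)\/[^:]*[^/:]$/;
function isValidSegment(segment: string): boolean {
  return SEGMENT_PATTERN.test(segment);
}

append("/platform"); // OK
append("/test-card"); // OK
// @ts-expect-error trailing slash is rejected
append("/stripe/");
```

Note that a non-literal string argument (type string) is not checked by ValidSegment; that gap is exactly what a regex-validated type would close.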

Use case 2:

  • ❌ Hexadecimal/binary/decimal/etc. strings of non-trivial length (explosion of union types)

Use case 3: a safer RegExp constructor (and similar functions?)

new(pattern: string, flags?: PatternOf</^[gimsuy]*$/>): RegExp
  • flags should only contain the characters g,i,m,s,u,y
  • ❌ Each character should only be used once (To be fair, this condition would be hard for regexes, too, requiring negative lookahead or many states)
  • ❌ Characters can be specified in any order
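For comparison, a recursive conditional type can already enforce all three flag rules for literal arguments, including "each character only once", by carrying a union of the flags seen so far. This is a sketch under the assumption that only the six flags listed above are allowed; FlagChar, ValidFlags, makeRegExp, and isValidFlags are hypothetical names.

```typescript
// Hypothetical compile-time check for RegExp flag strings: every character
// must be one of g, i, m, s, u, y, and no character may repeat.
type FlagChar = "g" | "i" | "m" | "s" | "u" | "y";

type ValidFlags<S extends string, Seen extends string = never> = S extends ""
  ? unknown // valid: intersecting with `unknown` leaves the parameter unchanged
  : S extends `${infer C}${infer Rest}`
    ? C extends FlagChar
      ? [C] extends [Seen]
        ? never // duplicate flag
        : ValidFlags<Rest, Seen | C>
      : never // unknown flag character
    : never;

function makeRegExp<F extends string = "">(
  pattern: string,
  flags?: F & ValidFlags<F>,
): RegExp {
  return new RegExp(pattern, flags);
}

// Runtime version: the negative lookahead rejects any character that appears twice.
const FLAGS_PATTERN = /^(?!.*(.).*\1)[gimsuy]*$/;
function isValidFlags(flags: string): boolean {
  return FLAGS_PATTERN.test(flags);
}

makeRegExp("a+", "gi"); // OK
makeRegExp("a+", "ig"); // OK, any order
// makeRegExp("a+", "gg"); // compile error: "g" is used twice
```

The invalid call is left commented out because it would also throw a SyntaxError at runtime.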

@yume-chan
Contributor

Template literal types can only validate a string through a conditional type, so they're really a "type validator", not a "type" in themselves. They also focus more on manipulating strings; I think that's a different design goal from regex-validated types.

It's doable to use conditional types to constrain parameters; for example, taken from #6579 (comment):

declare function takesOnlyHex<StrT extends string> (
    hexString : Accepts<HexStringLen6, StrT> extends true ? StrT : {__err : `${StrT} is not a hex-string of length 6`}
) : void;

However, I think this pattern has several issues:

  1. It's not a common pattern, and cumbersome to repeat every time.
  2. The type parameter should be inferred, but it is used in a condition before it "can" be inferred, which is unintuitive.
  3. TypeScript still doesn't support partial generic inference (Implement partial type argument inference using the _ sigil #26349), so it may be hard to use this pattern with more generic parameters.
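For reference, here is a self-contained version of that pattern. HexStringLen6 and Accepts are not defined in the snippet above, so the recursive IsHexOfLength check below is a hypothetical stand-in that walks the literal one character at a time and counts with a tuple.

```typescript
type HexChar =
  | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  | "a" | "b" | "c" | "d" | "e" | "f"
  | "A" | "B" | "C" | "D" | "E" | "F";

// true if S consists of exactly N hex characters; Acc counts characters seen.
type IsHexOfLength<S extends string, N extends number, Acc extends unknown[] = []> =
  S extends ""
    ? Acc["length"] extends N ? true : false
    : S extends `${infer C}${infer Rest}`
      ? C extends HexChar
        ? IsHexOfLength<Rest, N, [...Acc, unknown]>
        : false
      : false;

// StrT is inferred from the `true` branch, then the condition is evaluated.
function takesOnlyHex<StrT extends string>(
  hexString: IsHexOfLength<StrT, 6> extends true
    ? StrT
    : { __err: `${StrT} is not a hex-string of length 6` },
): string {
  return String(hexString);
}

takesOnlyHex("c0ffee"); // OK
// @ts-expect-error the message mentions "xyz is not a hex-string of length 6"
takesOnlyHex("xyz");
```

This illustrates issue 2 above: StrT appears inside its own condition, and every function that wants the check has to repeat this signature.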

@bmix

bmix commented Oct 21, 2020

Would this allow me to define type constraints for String to match the XML specification's Name constructs (short summary) and QNames by expressing them as regular expressions? If so, I am all for it :-)

@ksabry

ksabry commented Oct 21, 2020

@AnyhowStep It isn't the cleanest, but with conditional types now allowing recursion, it seems we can accomplish these cases with template literal types: playground link

@AnyhowStep
Contributor

AnyhowStep commented Oct 22, 2020

We can have compile-time regular expressions now.
But anything requiring conditional types and a generic type param to check is a non-feature to me.

(Well, non-feature when I'm trying to use TypeScript for work. All personal projects have --noEmit enabled because real TS programmers execute in compile-time)

@arcanis

arcanis commented Dec 12, 2020

Open question: For people who had upvoted #6579, what use cases still need addressing?

We have a strongly-typed filesystem library, where the user is expected to manipulate "clean types" like Filename or PortablePath versus literal strings (they currently obtain those types by using the as operator on literals, or calling a validator for user-provided strings):

export interface PathUtils {
  cwd(): PortablePath;

  normalize(p: PortablePath): PortablePath;
  join(...paths: Array<PortablePath | Filename>): PortablePath;
  resolve(...pathSegments: Array<PortablePath | Filename>): PortablePath;
  isAbsolute(path: PortablePath): boolean;
  relative(from: PortablePath, to: PortablePath): PortablePath;
  dirname(p: PortablePath): PortablePath;
  basename(p: PortablePath, ext?: string): Filename;
  extname(p: PortablePath): string;

  readonly sep: PortablePath;
  readonly delimiter: string;

  parse(pathString: PortablePath): ParsedPath<PortablePath>;
  format(pathObject: FormatInputPathObject<PortablePath>): PortablePath;

  contains(from: PortablePath, to: PortablePath): PortablePath | null;
}

I'm investigating template literals to remove the as syntax, but I'm not sure we'll be able to use them after all:

  • They don't raise errors very well
  • Interfaces are a pain to type (both declaration and implementation would have to be generics)
  • More generally, we would have to migrate all our existing functions to become generics, and so would our users

The overhead sounds overwhelming, and makes it likely that there are side effects that would cause problems down the road - causing further pain if we need to revert. Ideally, the solution we're looking for would leave the code above intact, we'd just declare PortablePath differently.

@RyanCavanaugh
Member Author

RyanCavanaugh commented Dec 14, 2020

@arcanis it really sounds like you want nominal types (#202), since even if regex types existed, you'd still want the library consumer to go through the validator functions?
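A minimal sketch of the branded ("nominal-ish") approach being suggested, which keeps signatures like dirname(p: PortablePath) unchanged and moves validation into a single function. The brand property, toPortablePath, and the rule that a portable path never contains backslashes are assumptions for illustration, not the library's actual definitions.

```typescript
// Branding: a PortablePath is a string carrying a phantom marker property.
type PortablePath = string & { readonly __brand: "PortablePath" };

// Hypothetical validator; assumes portable paths never contain backslashes.
function toPortablePath(input: string): PortablePath {
  if (input.includes("\\")) {
    throw new Error(`Not a portable path: ${input}`);
  }
  return input as PortablePath;
}

// Simplified dirname that only accepts already-validated paths.
function dirname(p: PortablePath): PortablePath {
  const index = p.lastIndexOf("/");
  if (index < 0) return "." as PortablePath;
  if (index === 0) return "/" as PortablePath;
  return p.slice(0, index) as PortablePath;
}

const home = toPortablePath("/home/user/project");
dirname(home); // OK
// @ts-expect-error a plain string literal is not a PortablePath
dirname("/home/user");
```

The trade-off is the one described above: literals still need an explicit validator call (or an as cast), because the brand cannot be checked from the string's contents.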

@hanneswidrig

I have a strong use case for Regex-validated string types. AWS Lambda function names have a maximum length of 64 characters. This can be manually checked in a character counter but it's unnecessarily cumbersome given that the function name is usually composed with identifying substrings.

As an example, this function name can be partially composed with the new work done in 4.1/4.2. However, there is no easy way to produce a compiler error in TypeScript, even though the function name below is longer than 64 characters.

type LambdaServicePrefix = 'my-application-service';
type LambdaFunctionIdentifier = 'dark-matter-upgrader-super-duper-test-function';
type LambdaFunctionName = `${LambdaServicePrefix}-${LambdaFunctionIdentifier}`;
const lambdaFunctionName: LambdaFunctionName  = 'my-application-service-dark-matter-upgrader-super-duper-test-function';

This StackOverflow Post I created was asking this very same question.

With the continued rise of TypeScript in back-end code, statically defined data is likely a strong use case for validating the length or format of a string.
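As a sketch of what is possible today, a recursive conditional type can count the characters of a literal type and report whether it fits within 64. LengthAtMost is a hypothetical helper; it only works when the full name is known as a literal type, and very long strings can run into the compiler's recursion limits.

```typescript
type LambdaServicePrefix = "my-application-service";
type LambdaFunctionIdentifier = "dark-matter-upgrader-super-duper-test-function";
type LambdaFunctionName = `${LambdaServicePrefix}-${LambdaFunctionIdentifier}`;

// true if S has at most Max characters; Acc counts characters consumed so far.
type LengthAtMost<S extends string, Max extends number, Acc extends unknown[] = []> =
  S extends ""
    ? true
    : Acc["length"] extends Max
      ? false
      : S extends `${infer _First}${infer Rest}`
        ? LengthAtMost<Rest, Max, [...Acc, unknown]>
        : false;

// 22 + 1 + 46 = 69 characters, so this type resolves to `false`;
// writing `= true` here would be a compile error.
const fitsLambdaLimit: LengthAtMost<LambdaFunctionName, 64> = false;

const lambdaFunctionName: LambdaFunctionName =
  "my-application-service-dark-matter-upgrader-super-duper-test-function";
```

Turning that false into an error at the assignment site still requires the generic-parameter workaround discussed elsewhere in this thread.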

@johnbillion

johnbillion commented Apr 29, 2021

TypeScript supports literal types, template literal types, and enums. I think a string pattern type is a natural extension that allows for non-finite value restrictions to be expressed.

I'm writing type definitions for an existing codebase. Many arguments and properties accept strings of a specific format:

  • ❌ Formatted representation of a date, e.g. "2021-04-29T12:34:56"
  • ❌ Comma-separated list of integers, e.g. "1,2,3,4,5000"
  • ❌ Valid MIME type, eg "image/jpeg"
  • ❌ Valid hex colour code, already mentioned several times
  • ❌ Valid IPv4 or IPv6 address

@fabiospampinato

fabiospampinato commented May 4, 2021

I'd like to argue against @RyanCavanaugh's claim in the first post saying that:

a large number of those use cases have been addressed, but possibly some still remain.

As it stands presently TypeScript can't even work with the following type literal:

type Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9;

type Just5Digits = `${Digit}${Digit}${Digit}${Digit}${Digit}`;

It throws an "Expression produces a union type that is too complex to represent.(2590)" error.

That's the equivalent of the following regex:

/^\d{5}$/

Just 5 digits in a row.

Almost all useful regexes are more complicated than that, and TypeScript already gives up with that, hence I'd argue the opposite of that claim is true: a small number of use cases have been addressed and the progress with template literals has been mostly orthogonal really.
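For comparison, the usual workaround validates the literal recursively instead of enumerating every combination, so the 10^5-member union is never built. This is a sketch with hypothetical names (IsDigits, fiveDigits), and like the other workarounds it only checks literal types through a generic parameter.

```typescript
type Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";

// Checks one character per step instead of materializing every combination.
type IsDigits<S extends string, N extends number, Acc extends unknown[] = []> =
  S extends ""
    ? Acc["length"] extends N ? true : false
    : S extends `${infer C}${infer Rest}`
      ? C extends Digit
        ? IsDigits<Rest, N, [...Acc, unknown]>
        : false
      : false;

// Roughly /^\d{5}$/ for literal arguments.
function fiveDigits<S extends string>(
  value: S & (IsDigits<S, 5> extends true ? unknown : never),
): S {
  return value;
}

fiveDigits("12345"); // OK
// @ts-expect-error only four digits
fiveDigits("1234");
```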

@ghost

ghost commented May 29, 2021

What about validation of JSON schema's patternProperties regex in TypeScript interfaces for the parsed object? This is a PERFECT application of the regex-validated string feature.

Possible syntax using a matchof keyword:

import { IJSONSchema, IJSONSchemaMap } from 'vs/base/common/jsonSchema';

export const UnscopedKeyPtn: string = '^[^\\[\\]]*$';

export type UnscopedKey = string & matchof RegExp(UnscopedKeyPtn);

export const tokenColorSchema: IJSONSchema = {
    properties: {},
    patternProperties: { [UnscopedKeyPtn]: { type: 'object' } }
};

export interface ITokenColors {
    [colorId: UnscopedKey]: string;
}

@sushruth

sushruth commented Jun 1, 2021

I just want to add to the need for this, because template literal types don't behave the way we might expect:

type UnionType = {
    kind: `kind_${string}`,
    one: boolean;
} | {
    kind: `kind_${string}_again`,
    two: string;
}

const union: UnionType = {
//     ~~~~~ > Error here -
/**
Type '{ kind: "type1_123"; }' is not assignable to type 'UnionType'.
  Property 'two' is missing in type '{ kind: "type1_123"; }' but required in type '{ kind: `type1_${string}_again`; two: string; }'.ts(2322)
*/
    kind: 'type1_123',
}

This shows that template literal types are not mutually exclusive: one can be a subset of another even when that is not the intention. A regex would let us use $ to anchor the end of the string, which would clearly discriminate between the constituent types of this union.
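The overlap is easy to reproduce in isolation (the type names below are illustrative): one literal satisfies both the general and the specific template, so kind alone cannot discriminate. Today the closest approximation of an end-of-string anchor is a conditional type that excludes the suffix explicitly, as in the hypothetical PlainKind helper.

```typescript
type AnyKind = `kind_${string}`;
type AgainKind = `kind_${string}_again`;

// The same literal is assignable to both template types.
const a: AnyKind = "kind_x_again";
const b: AgainKind = "kind_x_again";

// Hypothetical workaround: accept "kind_..." literals unless they end in "_again".
type PlainKind<S extends string> = S extends AgainKind
  ? never
  : S extends AnyKind
    ? S
    : never;

function plainKind<S extends string>(kind: S & PlainKind<S>): S {
  return kind;
}

plainKind("kind_123"); // OK
// @ts-expect-error ends with "_again"
plainKind("kind_123_again");
```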

@ghost

ghost commented Jun 2, 2021

(CC @Igmat) It occurs to me that there's a leaning towards using regex tests as type literals in #6579, i.e.

type CssColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK

It seems that regexes are usually interpreted as values by the TS compiler, and using one as a type usually throws an error, which keeps types and values as distinct as possible. What do you think of:

  • using a *of keyword to cast regex values into a regex-validated type (maybe matchof)
  • having a keyword check for conditional types (maybe matches)

type CssColor = matchof /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: CssColor = '#000000'; // OK

Editing this to note something: the RegExp.prototype.test method can accept numbers and other non-string primitives. I think that's a neat feature. If people want to strictly validate strings, they can use an intersection type with string. 😄

TL;DR: regex literal types aren't intuitively and visibly types without an explicit regex->type cast; can we propose that?

@Etheryte

Etheryte commented Jun 2, 2021

I'm not sure what the benefit of a separate keyword is here. There doesn't seem to be a case where it could be ambiguous whether the regex is used as a type or as a value, unless I'm missing something? I think #6579 (comment) and the replies below it already sketch out a syntax that hits the sweet spot of being both succinct and addressing all the use cases.

Regarding the intersection, the input to RegExp.prototype.test is always converted to a string first, so that seems superfluous.

@ghost

ghost commented Jun 2, 2021

Good to know about RegExp.prototype.test.

The ambiguity seems straightforward to me. As we know, TypeScript is a JS superset & regex values can be used as variables.

To me, a regex literal is just not an intuitive type - it doesn't imply "string that matches this regexp restriction". It's common convention to camelCase regex literals and add a "Regex" suffix, but that naming convention looks really ugly as a type:

export const cssColorRegex: RegExp = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$/i;
const color: cssColorRegex = '#000000'; // OK
//           ^ lc 👎 ^ two options:
//                   - A. use Regex for value clarity but type confusion or 
//                   - B. ditch Regex for unclear value name but clear type name

The original proposal does suggest JSON schemas, which would use the regex as both a type and a value (if implemented).

@Etheryte

Etheryte commented Jun 2, 2021

Perhaps I wasn't very clear: there doesn't seem to be a case where it would be ambiguous for the compiler whether a regex is a type or a value. Just as you can use string literals both as values and as types:

const foo = "literal"; // Used as a value
const bar: "literal" = foo; // Used as a type

The exact same approach can be applied for regex types without ambiguity.

@ghost

ghost commented Jun 2, 2021

My concern is that the regex means two different things in the two contexts: a literal vs. "returns true from the RegExp.test method". The latter seems like a type-system feature exclusively; it wouldn't be intuitive unless there's syntax to cast the regex into a type.

@ghost

ghost commented Jun 5, 2021

There is also the issue of regex literals and regex types possibly being used as superclasses:

If all regex literals and type variables are implicitly cast into validators without a keyword, how do we use RegExp interfaces and regex literals with optional methods as an object type?

To me, context loss in #41160 (comment) is enough reason to add a keyword, but this is another reason. I'm unsure of the name I suggested but I do prefer the use of an explicit type cast.

@edazpotato

edazpotato commented Jul 10, 2021

I would love this! I've had tons of issues that could be easily solved with RegEx types.

For example, a very basic IETF language tag type that accepts strings like "en-GB" or "en-US" but rejects strings that don't match the casing correctly.
Using template literals (doesn't work; the original screenshot is not available).
How it could be done easily with RegEx types:

export type CountryCode = /^[a-z]{2}-[A-Z]{2}$/;

(I know that technically you can represent this sort of type, but it's just a simple example)

@saltman424

Another thing to add: this isn't just helpful for validation, but also for extracting information, e.g.:

type Id<
  TVersion extends Id.Version = Id.Version,
  TPartialId extends Id.PartialId = Id.PartialId,
  TContext extends Id.Context | undefined = Id.Context | undefined
> = TContext extends undefined ? `${TVersion}:${TPartialId}` : `${TVersion}:${TContext}:${TPartialId}`
namespace Id {
  export type Version = /v\d+/
  export namespace Version {
    export type Of<TId extends Id> = TId extends Id<infer TVersion> ? TVersion : never
  }

  export type PartialId = /\w+/
  export namespace PartialId {
    export type Of<TId extends Id> = TId extends Id<any, infer TPartialId> ? TPartialId : never
  }

  export type Context = /\w+/
  export namespace Context {
    export type Of<TId extends Id> = TId extends Id<any, any, infer TContext> ? TContext : never
  }
}

type MyId = Id<'v1', 'myPartialId', 'myContext'> // 'v1:myContext:myPartialId'
type MyPartialId = Id.PartialId.Of<MyId> // 'myPartialId'

This can be done with just string instead of a regular expression, but that leads to ambiguity. In the above example, 'myContext:myPartial' could be interpreted as a single Id.PartialId.
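The ambiguity can be reproduced with today's infer: when both parts are plain string placeholders, the first infer stops at the first colon and the second captures everything after it, context included. PartialIdOf and partialIdOf are hypothetical helpers for a two-part version:partialId format.

```typescript
// Parses an id as if it only had two parts: `${version}:${partialId}`.
type PartialIdOf<TId extends string> =
  TId extends `${infer _Version}:${infer TPartialId}` ? TPartialId : never;

// A three-part id is silently accepted, and the context leaks into the "partial id".
const leaked: PartialIdOf<"v1:myContext:myPartialId"> = "myContext:myPartialId";

// Runtime analogue of the same split: everything after the first ":".
function partialIdOf(id: string): string {
  const index = id.indexOf(":");
  return index === -1 ? "" : id.slice(index + 1);
}
```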

@tsujp

tsujp commented Nov 5, 2023

This constructs a literal string type containing only the allowed characters. If you attempt to pass invalid characters, you get back never. This is fine for my use case (albeit a lot more TypeScript than I'd like for something simple); maybe it will help others until this becomes a smoother experience in TypeScript.

type HexDigit =
   | 0
   | 1
   | 2
   | 3
   | 4
   | 5
   | 6
   | 7
   | 8
   | 9
   | 'a'
   | 'b'
   | 'c'
   | 'd'
   | 'e'
   | 'f'

// Construct a string type with all characters not in union `HexDigit` removed.
export type OnlyHexDigits<Str, Acc extends string = ''> =
   Str extends `${infer D extends HexDigit}${infer Rest}`
      ? OnlyHexDigits<Rest, `${Acc}${D}`>
      : Acc

// Return given type `Hex` IFF it was unchanged (and thus valid) by `OnlyHexDigits`.
export type HexIntLiteral<
   Hex,
   FilteredHex = OnlyHexDigits<Hex>
> =
   Hex extends FilteredHex
      ? Hex
      : never

// Effectively an alias of `HexIntLiteral<'123'>`.
function hexInt<Hex extends string> (n: Hex & HexIntLiteral<Hex>) {
   return n as HexIntLiteral<Hex>
}

// Without the 'alias' form.
declare const t1: HexIntLiteral<'123'> // '123'
declare const t2: HexIntLiteral<'cafebabe'> // 'cafebabe'

// Using the 'alias' form.
const t3 = hexInt('zzzz') // never
const t4 = hexInt('a_b_c_d') // never
const t5 = hexInt('9287319283712ababababdefffababa12312') // <-- that

// Remember, the type is a string literal, so even a `let` binding can only
//   hold that exact value as far as TypeScript is concerned.
let t6 = hexInt('cafe123')

t6 = '123' // We (humans) know '123' is valid, but `t6` has the string literal type 'cafe123',
           //   so this is an error (2322): Type '"123"' is not assignable to type '"cafe123"',
           //   because we construct a _string literal_ type.

This can likely be simplified but I waste a lot of time code golfing TypeScript types so I abstain this time.

@mauriziocescon

mauriziocescon commented Apr 26, 2024

My case:

const obj = {
  _test1: '1', 
  test2: '2',
  _test3: '3',
  test4: '4',
};

function removeKeysStartingWith_(obj: Record<string, unknown>): Record<string, unknown> {
  const x: Record<string, unknown> = {};

  Object.keys(obj)
    .filter(key => !/^_/i.test(key))
    .forEach(key => x[key] = obj[key]);

  return x;
}

// {"test2":"2", "test4":"4"} 

I cannot express the fact that the return object of a function cannot have keys starting with "_". I cannot define the precise keyof set without a RegExp (to be used in combination with conditional types).

@RyanCavanaugh
Member Author

@mauriziocescon template literal strings work fine for this; you don't need regexes

const obj1 = {
  _test1: '1', 
  test2: '2',
  _test3: '3'
};
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = {
    [K in keyof T as RemoveUnderscore<K>]: T[K];
}
declare function removeKeysStartingWith_<T extends object>(obj: T): NoUnderscores<T>; 
const p1 = removeKeysStartingWith_(obj1);
p1.test2; // ok
p1._test1; // not ok

@mauriziocescon

Thanks a lot for the instantaneous feedback! I missed that part... 😅

@Peeja
Contributor

Peeja commented Apr 26, 2024

@mauriziocescon Be careful, though: that type means that you don't know whether there are any keys beginning with _, not that you know there aren't. Without exact types, TypeScript can't express the latter. But the former is usually good enough.
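The caveat can be demonstrated with the types from the example above: object types are open, so a value that does carry an underscore key is still assignable to NoUnderscores<T> as long as it is not a fresh object literal (excess-property checks only apply to fresh literals). The withExtra object is illustrative.

```typescript
type RemoveUnderscore<K> = K extends `_${string}` ? never : K;
type NoUnderscores<T> = {
  [K in keyof T as RemoveUnderscore<K>]: T[K];
};

const obj1 = { _test1: "1", test2: "2", _test3: "3" };

// Not a fresh object literal, so no excess-property error is reported.
const withExtra = { test2: "2", _hidden: "still here" };
const p: NoUnderscores<typeof obj1> = withExtra;

// The type hides `_hidden`, but the key is still present at runtime.
const keys = Object.keys(p); // ["test2", "_hidden"]
```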

@saltman424

saltman424 commented Apr 26, 2024

@RyanCavanaugh

Use case

I would like to use this type:

type Word = /^\w+$/

I use this as a building block for many template strings. E.g.:

// I mainly don't want `TPartialId` to contain ':',
// as that would interfere with my ability to parse this string
type Id<
  TType extends Type,
  TPartialId extends Word
> = `${TType}:${TPartialId}`

Answers to some of your questions

I use this in a mix of static and dynamic use cases. E.g.

const validId: Id = 'sometype:valid'
// this should not be allowed
const invalidId: Id = 'sometype:invalid:'

declare function createId<TType extends Type, TPartialId extends Word>(
  type: TType,
  partialId: TPartialId
):  Id<TType, TPartialId>
declare function getPartialId<TId extends Id>(
  id: TId
): TId extends Id<any, infer TPartialId> ? TPartialId : Word

declare function generateWord(): Word

I absolutely want to use regular expression types in template literals (as seen in above examples). However, while it would be nice to have, I don't need to be able to use anything within my regular expression types. (e.g. I don't really need type X = /${Y}+/; type Y = 'abc')

I would appreciate the ability to do something like this:

const WORD_REGEXP = /^\w+$/
export type Word = Regex<typeof WORD_REGEXP>
export function isWord(val: unknown): val is Word {
  return typeof val === 'string' && WORD_REGEXP.test(val)
}

However, if I had to write the same regular expression twice, it would still be better than the current state.

I don't think the above part approaches nominal typing. At a high level, a regular expression is basically a structural type for a string. You can determine if a string matches the regular expression solely based on the string's contents, ignoring any metadata about the string. That being said, I do acknowledge that it is harder to determine if a type for a string matches a regular expression, which is where things get kind of nominal. Specifically, to your point:

There's also a problem of the implicit subtyping behavior you'd want here -- what if you tested for /^\d\d\d$/ instead of /^\d+$/? Programmers are very particular about what they think the "right" way to write a regex are, so the feature implies either implementing regex subtyping so that the subset behavior can be validated, or enduring endless flamewars in places like DT as people argue about which regex is the correct one for a given problem.

If you are within one project, you should create one type with whatever the "right" regex for that project is and reference it everywhere. If you are working with a library, you should use the type from that library. Either way, you shouldn't have to recreate a regular expression type in the way that you think is "right". And if you want to add additional restrictions, just use an intersection. I do recognize that without subtyping, things get pretty nominal when determining whether types match a regular expression. However, we already deal with that kind of problem with deferred evaluation of type parameters in functions/classes, so semi-nominal types in certain contexts don't seem to be a deal-breaker. Although, I do acknowledge deferred type parameters are never fun to deal with.

Most functions with implicit data formats aren't also publishing a canonical regex for their data format.

To be fair, the canonical regex doesn't generally matter externally at the moment. If it did matter externally (e.g. it was used in a type), they would be more likely to publish it.

Alternative: template string enhancements

I do agree that enhancements to template strings could work. In my use case, these would be sufficient:

  1. Some way to repeat 0+ or 1+ times (maybe circular references - see below)
  2. Preferably, built in utility types for \w, \d, \s, and other similar RegExp features. (e.g. type Digit = '0' | '1' | '2' | ...)

With these, I could do something like:

type WordCharacter = 'a' | 'b' | ... (preferably this is built into TypeScript)
type Word = `${WordCharacter}${Word | ''}` // === /^\w+$/
type WordOrEmpty = Word | '' // === /^\w*$/

However, these would not work if I wanted to do this through negation, which I had thought about. E.g.:

type PartialId = /^[^:]+$/

If you like these enhancements, I can put them in proposals in one or more separate issues
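For what it's worth, the negated example can already be approximated for literal types, because a conditional type can reject a pattern instead of enumerating the allowed characters. PartialId and createPartialId are hypothetical names; note that a non-literal string argument passes unchecked.

```typescript
// Accepts non-empty literals that contain no ":" (roughly /^[^:]+$/).
type PartialId<S extends string> = S extends ""
  ? never
  : S extends `${string}:${string}`
    ? never
    : S;

function createPartialId<S extends string>(id: S & PartialId<S>): S {
  return id;
}

createPartialId("valid"); // OK
// @ts-expect-error contains a colon
createPartialId("invalid:");
// @ts-expect-error empty string
createPartialId("");
```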

@samueldcorbin

samueldcorbin commented Jun 30, 2024

To add a very straightforward use case to this: custom element names.

Custom element names must begin with a lowercase letter, must have a dash, and are frequently defined as string literals, not dynamically. This seems like something that TypeScript should absolutely be able to handle, it's easy for people to carelessly forget that the elements have to have a dash or must be lowercased, and it's annoying to only get it at runtime.

Sometimes people define custom element names dynamically, but they define them as literals often too. It would be nice if we could at least check the literals, even if we can't check the dynamic ones.
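A rough sketch of the literal-only check. It is deliberately simplified (the HTML rules for valid custom element names also allow other characters and reserve a few hyphenated names), and ValidCustomElementName and defineName are hypothetical helpers, not part of any DOM typings.

```typescript
type LowerAlpha =
  | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
  | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";

// Starts with a lowercase ASCII letter, contains a "-", and has no uppercase letters.
type ValidCustomElementName<S extends string> =
  S extends `${LowerAlpha}${string}-${string}`
    ? S extends Lowercase<S>
      ? S
      : never
    : never;

function defineName<S extends string>(name: S & ValidCustomElementName<S>): S {
  return name;
}

defineName("my-element"); // OK
// @ts-expect-error no dash
defineName("myelement");
// @ts-expect-error uppercase letter
defineName("my-Element");
```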

On the whole, the discussion of this proposal is extremely frustrating to read. The evaluation begins with "Checking string literals is easy and straightforward". Great. So why is adding an easy and straightforward thing being held up for literal years by discussion about maybe adding much less easy and much less straightforward things?

I understand the general sentiment that you want to be careful about making a simple syntax for the easy case that accidentally blocks future extension of functionality when you get to the hard cases, but that doesn't look like an issue here. Maybe capture groups would be useful, maybe dynamic strings would be useful. But adding support for string literals and regex without capture groups is easy and doesn't block adding support for dynamic strings and capture groups later.

@Oblarg

Oblarg commented Aug 18, 2024

Another use-case: dynamically-typed reducers for event-based programming:


With current template literals, it's a bit cumbersome to do this even for a simple prefix search, and generally unreliable/impossible to do anything much more complicated than that. It turns out this is not so difficult to do for arbitrary-depth substitution of a single wildcard character (see above), thanks to recursive types - but regex-validated string types would make this way more powerful, especially when topic lists are old and not ideally systematic.

(an aside: the fact that typescript can infer the reduction of the declared union for the tooltip here is pretty darn impressive, though it falls back to the flat payload union if you change some of the intermediate types to use captures rather than explicit generic parameters)
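For illustration, the simple prefix case can be sketched with a distributive conditional type over a union of topics. The topic names and the TopicsUnder helper are hypothetical; anything beyond a fixed prefix (alternation, repetition, character classes) quickly needs recursive types.

```typescript
type Topic = "user/login" | "user/logout" | "order/created" | "order/shipped";

// Distributes over the union and keeps only topics under `Prefix/`.
type TopicsUnder<T extends string, Prefix extends string> =
  T extends `${Prefix}/${string}` ? T : never;

type UserTopic = TopicsUnder<Topic, "user">; // "user/login" | "user/logout"

const login: UserTopic = "user/login";
// @ts-expect-error "order/created" is not under "user/"
const wrong: UserTopic = "order/created";

const allTopics: Topic[] = ["user/login", "user/logout", "order/created", "order/shipped"];
const userTopics = allTopics.filter((t): t is UserTopic => t.startsWith("user/"));
```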

@HansBrende

HansBrende commented Dec 6, 2024

Nearly every use-case mentioned can already be implemented via built-in string template matching & extraction (see, for example, the "wouter" routing library for how they validate and extract route parameters from paths using this method).

The only problem is that the solution in all of these cases requires something like:

type ParseSomething<T extends string> = T extends `...${infer Something}...` ? T : never

function validateSomething<T extends string>(input: ParseSomething<T>) {
    // input has now been validated, but we required this useless function to do it.
    // also very painful to create "arrays of valid somethings"
}

What we really want to be able to do is throw away the T parameter after validation, as this would allow us to build data structures that don't care which hex code (for example), only that the hex code is valid.

We want to be able to say (for example):

type HexCode = <exists S extends string> ParseHex<S>

const hexCodes: HexCode[] = ['000000', 'FFFFFF']
// etc.

So, if I'm not mistaken, it seems like this issue can mostly be reduced to the introduction of existential types issue. Please upvote that one!
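Until something like that exists, the usual workaround for the array case is a generic helper that validates every element at once, which is exactly the kind of otherwise useless function described above. hexCodes, IsHex6, and ValidHex6 are hypothetical names in this sketch; ValidHex6 distributes explicitly so that each member of the inferred union is checked on its own.

```typescript
type HexChar =
  | "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  | "a" | "b" | "c" | "d" | "e" | "f"
  | "A" | "B" | "C" | "D" | "E" | "F";

type IsHex6<S extends string, Acc extends unknown[] = []> = S extends ""
  ? Acc["length"] extends 6 ? true : false
  : S extends `${infer C}${infer Rest}`
    ? C extends HexChar ? IsHex6<Rest, [...Acc, unknown]> : false
    : false;

// Distribute over the inferred union so each element is checked separately.
type ValidHex6<S extends string> = S extends string
  ? IsHex6<S> extends true ? S : never
  : never;

function hexCodes<T extends string>(codes: readonly (T & ValidHex6<T>)[]): T[] {
  return codes.slice();
}

const palette = hexCodes(["000000", "FFFFFF"]); // OK
// @ts-expect-error "12345" has only five digits
hexCodes(["000000", "12345"]);
```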

(The length-specific use-cases would probably also do well to upvote this length-specific issue.)

@saltman424

@HansBrende how would you use existential types for the below use case?

type Word = /^\w+$/

@HansBrende

@saltman424

type WordChar = 'A'|'B'|'C'| ... |'Z'|'a'|'b'|'c'| ... |'z'|'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'_'
type IsWord<S extends string> = S extends `${WordChar}${infer R}` ? R extends '' ? unknown : IsWord<R> : never

type Word = <exists S extends string> IsWord<S> & S

(Note: WordChar implementation would become a whole lot less verbose if the second issue I mentioned gets accepted, as then we could just do: type WordChar = 'ABC...Zabc...z0123456789_'[number].)

@hbiede

hbiede commented Dec 6, 2024

@saltman424

type WordChar = 'A'|'B'|'C'| ... |'Z'|'a'|'b'|'c'| ... |'z'|'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'_'

A) This doesn’t account for Unicode characters in a concise way
B) This is likely too complex for the type parser, no? I’ve had that issue when trying to write just a hex color type, which is only 9 characters at most, let alone an indefinite string

@HansBrende

HansBrende commented Dec 6, 2024

@hbiede

This doesn’t account for Unicode characters in a concise way

True, but the original regexp doesn't either, since it did not include the i and u/v flags.
If we wanted /^\w+$/iu instead, then we'd need to add 2 extra characters to WordChar: ſ (U+017F) and K (U+212A, the Kelvin sign).

Unicode-aware expressions in general would obviously be a bit more complicated, but still potentially doable in many cases, including this one (but in many cases not). That's why I stated existential types would cover most of the use-cases, but not all.

This is likely too complex for the type parser, no?

Nope. Try it and see! The compiler can handle something like 10K types in a union (not sure what the exact number is), but this is only 26 + 26 + 10 + 1 = 63 types. Your Hex color type was probably defined differently, something like ${Hex}${Hex}${Hex}${Hex}${Hex}${Hex}. That is a union of 16^6 = 16777216 possibilities, which is much larger. You must use recursion for these sorts of validations to work (and the TypeScript compiler evaluates tail-recursive conditional types efficiently).

If both of the issues I linked to (existential types and string literal length--again, please upvote) were implemented, then you could implement HexColor very simply as follows:

type HexDigit = `${0|1|2|3|4|5|6|7|8|9}`|'a'|'b'|'c'|'d'|'e'|'f'
type IsHexString<S extends string> = S extends '' ? unknown : S extends `${HexDigit}${infer R}` ? IsHexString<R> : never
type IsHexColor<S extends string> = S extends `#${infer R}` & {length: 7 | 9} ? IsHexString<R> : never

// TA-DA!
type HexColor = <exists S extends string> IsHexColor<S> & S

@michaelinva

This would be extremely useful for completely ditching ORMs and just using pure typed SQL queries. Most devs use ORMs for the type safety, but they introduce a ton of method-chaining overhead, and then you end up writing SQL anyway, just with methods.

@RyanCavanaugh
Member Author

I'm not clear on how regex is useful for SQL queries; can you clarify?

@ljharb
Contributor

ljharb commented Dec 13, 2024

(One of the primary motivating use cases for tagged template literals was for being able to construct a context-aware DSL, including SQL queries and regular expressions)

@shaedrich

shaedrich commented Dec 14, 2024

I can see both sides here:

  • Parsing actual RegExp types would add a significant amount of complexity to the TS language and compiler, so MS wants to avoid that as much as possible
  • Devs are somewhat satisfied, yet not entirely
    • Current solutions cover many use cases, but not all of them
    • Of those that are covered, some are possible yet have terrible UX (meaning it's a workaround at best); let's showcase that by implementing a UUID type:
Three approaches compared (solution, code, pros and cons):

Okay, let's build a repeated type using string literals:
```ts
type Numeric = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
type Alphabetic = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
type Alphanumeric = Alphabetic | Numeric

type Repeat<
  Char extends string,
  Count extends number,
  Joined extends string = ``,
  Acc extends 0[] = []
> = Acc['length'] extends Count ? Joined : Repeat<Char, Count, `${Joined}${Char}`, [0, ...Acc]>

type UUIDV4 = `${Repeat<Alphanumeric, 8>}-${Repeat<Alphanumeric, 4>}-${Repeat<Alphanumeric, 4>}-${Repeat<Alphanumeric, 4>}-${Repeat<Alphanumeric, 12>}`
```

(Source: https://overflow.freedit.eu/questions/68724603/how-to-create-a-uuid-template-literal-type-in-typescript)
✅ Human-readable
✅ checked at compile time (in theory, see cons)
❌ compiler error (too complex)
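A cheaper compromise that does compile today is a pattern template literal type with `${string}` placeholders. It enforces only the dash layout, not the character set or the group lengths, so it catches some mix-ups (such as a human-readable code with too few dashes) while letting many malformed strings through. `UUIDShape` and `fetchUserPath` are hypothetical names for illustration.

```typescript
// Pattern template literal type: five dash-separated parts of arbitrary content.
// `${string}` also matches the empty string, so '----' is accepted too.
type UUIDShape = `${string}-${string}-${string}-${string}-${string}`;

// Hypothetical consumer that only accepts strings with the UUID dash layout.
function fetchUserPath(id: UUIDShape): string {
  return `/users/${id}`;
}

const path = fetchUserPath('123e4567-e89b-12d3-a456-426614174000');
// fetchUserPath('user-42'); // compile error: only one dash
```

Because this is a single pattern type rather than a union, it never hits the "too complex" limit, but it is a shape check, not validation.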
Well, then just use a branded string with type guards instead ¯\_(ツ)_/¯
```ts
type UUID = string & { __uuidBrand: never };

function isUUID(uuid: string): uuid is UUID {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i.test(uuid);
}
```

(Source: https://www.webdevtutor.net/blog/typescript-define-uuid-type)
✅ compiles
✅ type is narrowed down
❌ checked at runtime
❌ no real meaning at compile time
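A usage sketch for the branded approach, reusing the same `UUID` brand and `isUUID` guard: the check runs once at the boundary, after which the brand is carried through function signatures. Note that this particular regex accepts only versions 1 to 5 with the RFC 4122 variant, so the nil UUID fails it. `loadUser` and `parseId` are hypothetical helpers.

```typescript
type UUID = string & { __uuidBrand: never };

function isUUID(uuid: string): uuid is UUID {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i.test(uuid);
}

// Hypothetical consumer: only branded values are accepted.
function loadUser(id: UUID): string {
  return `loading ${id}`;
}

// Validate once at the boundary; undefined signals bad input.
function parseId(raw: string): UUID | undefined {
  return isUUID(raw) ? raw : undefined;
}

const id = parseId('123e4567-e89b-12d3-a456-426614174000');
const message = id !== undefined ? loadUser(id) : 'invalid id';
// loadUser('123e4567-e89b-12d3-a456-426614174000'); // compile error: string is not UUID
```

This is where the "no real meaning at compile time" con shows: the compiler only knows that `isUUID` returned true, not what the string looks like.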
Okay, last try: recursion
```ts
type VersionChar =
  | '1' | '2' | '3' | '4' | '5';

type Char =
  | '0' | '1' | '2' | '3'
  | '4' | '5' | '6' | '7'
  | '8' | '9' | 'a' | 'b'
  | 'c' | 'd' | 'e' | 'f';

type Prev<X extends number> =
  [never, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...never[]][X];

type HasLength<S extends string, Len extends number> = [Len] extends [0]
  ? (S extends '' ? true : never)
  : (S extends `${infer C}${infer Rest}`
    ? (Lowercase<C> extends Char ? HasLength<Rest, Prev<Len>> : never)
    : never);

type Char4<S extends string> = true extends HasLength<S, 4> ? S : never;
type Char8<S extends string> = true extends HasLength<S, 8> ? S : never;
type Char12<S extends string> = true extends HasLength<S, 12> ? S : never;

type VersionGroup<S extends string> = S extends `${infer Version}${infer Rest}`
  ? (Version extends VersionChar
    ? (true extends HasLength<Rest, 3> ? S : never)
    : never)
  : never;

type NilUUID = '00000000-0000-0000-0000-000000000000';

type UUID<S extends string> = S extends NilUUID
  ? S
  : (S extends `${infer S8}-${infer S4_1}-${infer S4_2}-${infer S4_3}-${infer S12}`
    ? (S8 extends Char8<S8>
      ? (S4_1 extends Char4<S4_1>
        ? (S4_2 extends VersionGroup<S4_2>
          ? (S4_3 extends Char4<S4_3>
            ? (S12 extends Char12<S12>
              ? S
              : never)
            : never)
          : never)
        : never)
      : never)
    : never);
```

(Source: https://ybogomolov.me/type-level-uuid)
✅ checked at compile time
❌ quite a mouthful → terrible DX

    Together with the constant string length from #34692 mentioned by @HansBrende, built-in types that work like shortcuts for common RegExp operations, in the spirit of the intrinsic string manipulation types (Uppercase, Lowercase, Capitalize and Uncapitalize) and similar to what has been suggested in #34692 (comment), could be mere syntactic sugar for the last example above (and could then be optimized under the hood). That would look much cleaner and be far less cumbersome to write:

```ts
type UuidSegmentQuartett = `{(Alpha|number[1]){4}}`
type UUID = `{UuidSegmentQuartett[2]}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett[3]}`
```

    Or strings could become generic, similar to arrays (a string without a type argument would default to string<any>):

```ts
type UuidSegmentQuartett = `{(string<Alpha>|number[1]){4}}` // or `string<AlphaNum>[4]` // or even just `string<Hex>[4]`
type UUID = `{UuidSegmentQuartett[2]}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett}-{UuidSegmentQuartett[3]}`
```

@HansBrende

HansBrende commented Dec 14, 2024

@shaedrich Note: with existential types and string literal length (please upvote both of these issues), all the compiler problems in your first example go away:

```ts
type IsRepeated<Char extends string, S extends string> =
  S extends '' ? unknown :
  S extends `${Char}${infer R}` ? IsRepeated<Char, R> : never;

// Note that here, an *existential type parameter* replaces the need for a large union:
type Repeat<Char extends string> = <exists S extends string> IsRepeated<Char, S> & S

type HexDigit = `${0|1|2|3|4|5|6|7|8|9}`|'a'|'b'|'c'|'d'|'e'|'f'

type XXXX = Repeat<HexDigit> & {length: 4}
type XXXXXXXX = Repeat<HexDigit> & {length: 8}
type XXXXXXXXXXXX = Repeat<HexDigit> & {length: 12}

// Should work fine now since large union has been replaced with existential types:
type UUID = `${XXXXXXXX}-${XXXX}-${XXXX}-${XXXX}-${XXXXXXXXXXXX}`
```

@shaedrich

I forgot to mention that my example is a little oversimplified, since

  • the first character of the third group (the version number) can only have the values 0–8
  • there are version-specific constraints of what values other places within the rest of the UUID can take (especially the NIL and MAX UUIDs)
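To make the version constraint concrete: in the canonical 36-character form, the version is the first hex digit of the third group (string index 14), while the nil and max UUIDs are special values. A runtime sketch under those assumptions; `uuidVersion` is a hypothetical helper, and it deliberately does not check the variant bits or reject undefined version numbers.

```typescript
const NIL_UUID = '00000000-0000-0000-0000-000000000000';
const MAX_UUID = 'ffffffff-ffff-ffff-ffff-ffffffffffff';

// Canonical 8-4-4-4-12 layout of hex digits (case-insensitive).
const CANONICAL = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns the version digit, 'nil' / 'max' for the special values,
// or undefined if the string is not in canonical form.
function uuidVersion(value: string): number | 'nil' | 'max' | undefined {
  if (!CANONICAL.test(value)) return undefined;
  const lower = value.toLowerCase();
  if (lower === NIL_UUID) return 'nil';
  if (lower === MAX_UUID) return 'max';
  // Index 14 follows 8 digits, '-', 4 digits, '-': the first character of the third group.
  return parseInt(lower[14], 16);
}
```

Expressing the same rules at the type level means every group check becomes conditional on the version digit, which is part of why the recursive type grows so quickly.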

@RyanCavanaugh
Member Author

By far the best feature for UUID is #43335

But again, how do you even get a malformed UUID in the first place? You can only ever copy-paste them. If you miss a digit from selecting wrong, you should get a more-or-less immediate exception. Why is this happening to people so often?

@HansBrende

HansBrende commented Dec 16, 2024

@RyanCavanaugh to answer your question for my own use-cases, the ability to correctly type a UUID would be helpful mainly to ensure that UUIDs round-trip to the server without accidentally putting some other string identifier (whether that be some human-readable identifier, "code", or stringified serial ID) in the "id" field (which aligns with the whole point of using a typed language in the first place: fail at compile time instead of runtime).

Of course, that problem is easily solved by using opaque symbol tags as well, though it feels kind of hacky, especially when certain "special" uuids are hardcoded and must be cast to fit the opaque type.
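A sketch of the opaque-symbol workaround described above, assuming a declared-only `unique symbol` brand (the names `uuidTag`, `Uuid`, `toUuid`, `ROOT_ID` and `labelFor` are hypothetical). It shows both halves of the trade-off: ordinary values go through one checked constructor, while a hardcoded special UUID has to be forced in with a cast.

```typescript
// A declared-only unique symbol: it exists purely at the type level.
declare const uuidTag: unique symbol;
type Uuid = string & { readonly [uuidTag]: true };

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// The single checked entry point for untrusted strings.
function toUuid(value: string): Uuid {
  if (!UUID_PATTERN.test(value)) {
    throw new Error(`not a UUID: ${value}`);
  }
  return value as Uuid;
}

// The "hacky" part: a hardcoded special value has to be cast into the opaque type.
const ROOT_ID = '00000000-0000-0000-0000-000000000000' as Uuid;

function labelFor(id: Uuid): string {
  return id === ROOT_ID ? 'root' : `item ${id}`;
}
```

The cast on `ROOT_ID` is unchecked: a typo in that literal would compile fine, which is exactly the gap a regex-validated string type would close.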

A more compelling use-case in my mind is hex codes of a certain length, such as RGB or RGBA codes, especially when you are doing math on them and want to avoid writing extra code for error handling of non-valid inputs (or even worse: trying to support all possible formats to avoid error-handling).
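For the RGB/RGBA case, a sketch of moving validation to one place so the arithmetic needs no error handling: a branded `HexColor` can only be produced by `parseHexColor`, and `toRgba` then assumes well-formed input. All three names are hypothetical. This relies on runtime validation plus a brand, not a regex-validated type, so it is a workaround for the missing feature rather than a replacement.

```typescript
type HexColor = string & { readonly __hexColor: true };

// Accepts only #RRGGBB or #RRGGBBAA (case-insensitive).
function parseHexColor(value: string): HexColor | undefined {
  return /^#(?:[0-9a-f]{6}|[0-9a-f]{8})$/i.test(value) ? (value as HexColor) : undefined;
}

// No error handling here: the brand guarantees the format.
// Returns [r, g, b, alpha] with alpha scaled to the 0..1 range.
function toRgba(color: HexColor): [number, number, number, number] {
  const channel = (start: number) => parseInt(color.slice(start, start + 2), 16);
  const alpha = color.length === 9 ? channel(7) / 255 : 1;
  return [channel(1), channel(3), channel(5), alpha];
}

const parsed = parseHexColor('#008080');
const rgba = parsed ? toRgba(parsed) : undefined;
```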
