Large branches on repeat fall into infinite loop in proc macro #70

maciejhirsz · 2019-02-20T18:51:39Z

Related to #68. Using regex = r"\w+" makes the derive macro fall into an infinite loop. A single \w on its own produces this tree:

[
    '[0-9A-Z_a-z]' -> TOKEN "Label",
    'C2[AAB5BA]' -> TOKEN "Label",
    'C3[80-9698-B6B8-BF]' -> TOKEN "Label",
    'CB[80-8186-91A0-A4ACAE]' -> TOKEN "Label",
    '[C4-CACCD0-D1D3DA]80-BF' -> TOKEN "Label",
    'CD[80-B4B6-B7BA-BDBF]' -> TOKEN "Label",
    'CE[8688-8A8C8E-A1A3-BF]' -> TOKEN "Label",
    'CF[80-B5B7-BF]' -> TOKEN "Label",
    'D2[80-8183-BF]' -> TOKEN "Label",
    'D4[80-AFB1-BF]' -> TOKEN "Label",
    'D5[80-9699A0-BF]' -> TOKEN "Label",
    'D6[80-8891-BDBF]' -> TOKEN "Label",
    'D7[81-8284-858790-AAAF-B2]' -> TOKEN "Label",
    'D8[90-9AA0-BF]' -> TOKEN "Label",
    'D9[80-A9AE-BF]' -> TOKEN "Label",
    'DB[80-9395-9C9F-A8AA-BCBF]' -> TOKEN "Label",
    'DC90-BF' -> TOKEN "Label",
    'DD[80-8A8D-BF]' -> TOKEN "Label",
    'DE80-B1' -> TOKEN "Label",
    'DF[80-B5BABD]' -> TOKEN "Label",
    'E0' -> [
        'A080-AD' -> TOKEN "Label",
        'A1[80-9BA0-AA]' -> TOKEN "Label",
        'A2[A0-B4B6-BD]' -> TOKEN "Label",
        'A3[93-A1A3-BF]' -> TOKEN "Label",
        'A480-BF' -> TOKEN "Label",
        'A5[80-A3A6-AFB1-BF]' -> TOKEN "Label",
        'A6[80-8385-8C8F-9093-A8AA-B0B2B6-B9BC-BF]' -> TOKEN "Label",
        'A7[80-8487-888B-8E979C-9D9F-A3A6-B1BCBE]' -> TOKEN "Label",
        'A8[81-8385-8A8F-9093-A8AA-B0B2-B3B5-B6B8-B9BCBE-BF]' -> TOKEN "Label",
        'A9[80-8287-888B-8D9199-9C9EA6-B5]' -> TOKEN "Label",
        'AA[81-8385-8D8F-9193-A8AA-B0B2-B3B5-B9BC-BF]' -> TOKEN "Label",
        'AB[80-8587-898B-8D90A0-A3A6-AFB9-BF]' -> TOKEN "Label",
        'AC[81-8385-8C8F-9093-A8AA-B0B2-B3B5-B9BC-BF]' -> TOKEN "Label",
        'AD[80-8487-888B-8D96-979C-9D9F-A3A6-AFB1]' -> TOKEN "Label",
        'AE[82-8385-8A8E-9092-9599-9A9C9E-9FA3-A4A8-AAAE-B9BE-BF]' -> TOKEN "Label",
        'AF[80-8286-888A-8D9097A6-AF]' -> TOKEN "Label",
        'B0[80-8C8E-9092-A8AA-B9BD-BF]' -> TOKEN "Label",
        'B1[80-8486-888A-8D95-9698-9AA0-A3A6-AF]' -> TOKEN "Label",
        'B2[80-8385-8C8E-9092-A8AA-B3B5-B9BC-BF]' -> TOKEN "Label",
        'B3[80-8486-888A-8D95-969EA0-A3A6-AFB1-B2]' -> TOKEN "Label",
        'B4[80-8385-8C8E-9092-BF]' -> TOKEN "Label",
        'B5[80-8486-888A-8E94-979F-A3A6-AFBA-BF]' -> TOKEN "Label",
        'B6[82-8385-969A-B1B3-BBBD]' -> TOKEN "Label",
        'B7[80-868A8F-949698-9FA6-AFB2-B3]' -> TOKEN "Label",
        'B881-BA' -> TOKEN "Label",
        'B9[80-8E90-99]' -> TOKEN "Label",
        'BA[81-828487-888A8D94-9799-9FA1-A3A5A7AA-ABAD-B9BB-BD]' -> TOKEN "Label",
        'BB[80-848688-8D90-999C-9F]' -> TOKEN "Label",
        'BC[8098-99A0-A9B5B7B9BE-BF]' -> TOKEN "Label",
        'BD[80-8789-ACB1-BF]' -> TOKEN "Label",
        'BE[80-8486-9799-BC]' -> TOKEN "Label",
        'BF86' -> TOKEN "Label"
    ],
    'E1' -> [
        '[8084-8891-989EACAEB4-B6B8-BB]80-BF' -> TOKEN "Label",
        '81[80-8990-BF]' -> TOKEN "Label",
        '82[80-9DA0-BF]' -> TOKEN "Label",
        '83[80-85878D90-BABC-BF]' -> TOKEN "Label",
        '89[80-888A-8D90-96989A-9DA0-BF]' -> TOKEN "Label",
        '8A[80-888A-8D90-B0B2-B5B8-BE]' -> TOKEN "Label",
        '8B[8082-8588-9698-BF]' -> TOKEN "Label",
        '8C[80-9092-9598-BF]' -> TOKEN "Label",
        '8D[80-9A9D-9F]' -> TOKEN "Label",
        '8E[80-8FA0-BF]' -> TOKEN "Label",
        '8F[80-B5B8-BD]' -> TOKEN "Label",
        '9081-BF' -> TOKEN "Label",
        '99[80-ACAF-BF]' -> TOKEN "Label",
        '9A[81-9AA0-BF]' -> TOKEN "Label",
        '9B[80-AAAE-B8]' -> TOKEN "Label",
        '9C[80-8C8E-94A0-B4]' -> TOKEN "Label",
        '9D[80-93A0-ACAE-B0B2-B3]' -> TOKEN "Label",
        '9F[80-93979C-9DA0-A9]' -> TOKEN "Label",
        'A0[8B-8D90-99A0-BF]' -> TOKEN "Label",
        'A180-B8' -> TOKEN "Label",
        'A2[80-AAB0-BF]' -> TOKEN "Label",
        'A380-B5' -> TOKEN "Label",
        'A4[80-9EA0-ABB0-BB]' -> TOKEN "Label",
        'A5[86-ADB0-B4]' -> TOKEN "Label",
        'A6[80-ABB0-BF]' -> TOKEN "Label",
        'A7[80-8990-99]' -> TOKEN "Label",
        'A8[80-9BA0-BF]' -> TOKEN "Label",
        'A9[80-9EA0-BCBF]' -> TOKEN "Label",
        'AA[80-8990-99A7B0-BE]' -> TOKEN "Label",
        'AD[80-8B90-99AB-B3]' -> TOKEN "Label",
        'AF80-B3' -> TOKEN "Label",
        'B080-B7' -> TOKEN "Label",
        'B1[80-898D-BD]' -> TOKEN "Label",
        'B2[80-8890-BABD-BF]' -> TOKEN "Label",
        'B3[90-9294-B9]' -> TOKEN "Label",
        'B7[80-B9BB-BF]' -> TOKEN "Label",
        'BC[80-9598-9DA0-BF]' -> TOKEN "Label",
        'BD[80-8588-8D90-97999B9D9F-BD]' -> TOKEN "Label",
        'BE[80-B4B6-BCBE]' -> TOKEN "Label",
        'BF[82-8486-8C90-9396-9BA0-ACB2-B4B6-BC]' -> TOKEN "Label"
    ],
    'E2' -> [
        '80[8C-8DBF]' -> TOKEN "Label",
        '81[8094B1BF]' -> TOKEN "Label",
        '8290-9C' -> TOKEN "Label",
        '8390-B0' -> TOKEN "Label",
        '84[82878A-939599-9DA4A6A8AA-ADAF-B9BC-BF]' -> TOKEN "Label",
        '85[85-898EA0-BF]' -> TOKEN "Label",
        '8680-88' -> TOKEN "Label",
        '92B6-BF' -> TOKEN "Label",
        '9380-A9' -> TOKEN "Label",
        'B0[80-AEB0-BF]' -> TOKEN "Label",
        'B1[80-9EA0-BF]' -> TOKEN "Label",
        'B280-BF' -> TOKEN "Label",
        'B3[80-A4AB-B3]' -> TOKEN "Label",
        'B4[80-A5A7ADB0-BF]' -> TOKEN "Label",
        'B5[80-A7AFBF]' -> TOKEN "Label",
        'B6[80-96A0-A6A8-AEB0-B6B8-BE]' -> TOKEN "Label",
        'B7[80-8688-8E90-9698-9EA0-BF]' -> TOKEN "Label",
        'B8AF' -> TOKEN "Label"
    ],
    'E3' -> [
        '80[85-87A1-AFB1-B5B8-BC]' -> TOKEN "Label",
        '8181-BF' -> TOKEN "Label",
        '82[80-9699-9A9D-9FA1-BF]' -> TOKEN "Label",
        '83[80-BABC-BF]' -> TOKEN "Label",
        '84[85-AFB1-BF]' -> TOKEN "Label",
        '[8590-BF]80-BF' -> TOKEN "Label",
        '86[80-8EA0-BA]' -> TOKEN "Label",
        '87B0-BF' -> TOKEN "Label"
    ],
    'E4' -> [
        'B680-B5' -> TOKEN "Label",
        '[80-B5B8-BF]80-BF' -> TOKEN "Label"
    ],
    'E9' -> [
        'BF80-AF' -> TOKEN "Label",
        '80-BE80-BF' -> TOKEN "Label"
    ],
    'EA' -> [
        '9280-8C' -> TOKEN "Label",
        '9390-BD' -> TOKEN "Label",
        '98[80-8C90-AB]' -> TOKEN "Label",
        '99[80-B2B4-BDBF]' -> TOKEN "Label",
        '[80-9194-979A9DA2A6AAAEB0-BF]80-BF' -> TOKEN "Label",
        '9B80-B1' -> TOKEN "Label",
        '9C[97-9FA2-BF]' -> TOKEN "Label",
        '9E[80-888B-B9]' -> TOKEN "Label",
        '9FB7-BF' -> TOKEN "Label",
        'A080-A7' -> TOKEN "Label",
        'A180-B3' -> TOKEN "Label",
        'A3[80-8590-99A0-B7BBBD-BF]' -> TOKEN "Label",
        'A4[80-ADB0-BF]' -> TOKEN "Label",
        'A5[80-93A0-BC]' -> TOKEN "Label",
        'A7[808F-99A0-BE]' -> TOKEN "Label",
        'A880-B6' -> TOKEN "Label",
        'A9[80-8D90-99A0-B6BA-BF]' -> TOKEN "Label",
        'AB[80-829B-9DA0-AFB2-B6]' -> TOKEN "Label",
        'AC[81-8689-8E91-96A0-A6A8-AEB0-BF]' -> TOKEN "Label",
        'AD[80-9A9C-A5B0-BF]' -> TOKEN "Label",
        'AF[80-AAAC-ADB0-B9]' -> TOKEN "Label"
    ],
    'ED' -> [
        '9E[80-A3B0-BF]' -> TOKEN "Label",
        '9F[80-868B-BB]' -> TOKEN "Label",
        '80-9D80-BF' -> TOKEN "Label"
    ],
    'EF' -> [
        'A9[80-ADB0-BF]' -> TOKEN "Label",
        '[A4-A8AAB0-B3BA]80-BF' -> TOKEN "Label",
        'AB80-99' -> TOKEN "Label",
        'AC[80-8693-979D-A8AA-B6B8-BCBE]' -> TOKEN "Label",
        'AD[80-8183-8486-BF]' -> TOKEN "Label",
        'AE80-B1' -> TOKEN "Label",
        'AF93-BF' -> TOKEN "Label",
        'B480-BD' -> TOKEN "Label",
        'B590-BF' -> TOKEN "Label",
        'B6[80-8F92-BF]' -> TOKEN "Label",
        'B7[80-87B0-BB]' -> TOKEN "Label",
        'B8[80-8FA0-AFB3-B4]' -> TOKEN "Label",
        'B9[8D-8FB0-B4B6-BF]' -> TOKEN "Label",
        'BB80-BC' -> TOKEN "Label",
        'BC[90-99A1-BABF]' -> TOKEN "Label",
        'BD[81-9AA6-BF]' -> TOKEN "Label",
        'BE80-BE' -> TOKEN "Label",
        'BF[82-878A-8F92-979A-9C]' -> TOKEN "Label"
    ],
    'F0' -> [
        '90' -> [
            '80[80-8B8D-A6A8-BABC-BDBF]' -> TOKEN "Label",
            '81[80-8D90-9D]' -> TOKEN "Label",
            '[8290-9198-9BB0]80-BF' -> TOKEN "Label",
            '8380-BA' -> TOKEN "Label",
            '8580-B4' -> TOKEN "Label",
            '87BD' -> TOKEN "Label",
            '8A[80-9CA0-BF]' -> TOKEN "Label",
            '8B[80-90A0]' -> TOKEN "Label",
            '8C[80-9FAD-BF]' -> TOKEN "Label",
            '8D[80-8A90-BA]' -> TOKEN "Label",
            '8E[80-9DA0-BF]' -> TOKEN "Label",
            '8F[80-8388-8F91-95]' -> TOKEN "Label",
            '92[80-9DA0-A9B0-BF]' -> TOKEN "Label",
            '93[80-9398-BB]' -> TOKEN "Label",
            '94[80-A7B0-BF]' -> TOKEN "Label",
            '9580-A3' -> TOKEN "Label",
            '9C80-B6' -> TOKEN "Label",
            '9D[80-95A0-A7]' -> TOKEN "Label",
            'A0[80-85888A-B5B7-B8BCBF]' -> TOKEN "Label",
            'A1[80-95A0-B6]' -> TOKEN "Label",
            'A280-9E' -> TOKEN "Label",
            'A3[A0-B2B4-B5]' -> TOKEN "Label",
            'A4[80-95A0-B9]' -> TOKEN "Label",
            'A6[80-B7BE-BF]' -> TOKEN "Label",
            'A8[80-8385-868C-9395-9799-B5B8-BABF]' -> TOKEN "Label",
            'A9A0-BC' -> TOKEN "Label",
            'AA80-9C' -> TOKEN "Label",
            'AB[80-8789-A6]' -> TOKEN "Label",
            'AC80-B5' -> TOKEN "Label",
            'AD[80-95A0-B2]' -> TOKEN "Label",
            'AE80-91' -> TOKEN "Label",
            'B180-88' -> TOKEN "Label",
            'B2-B380-B2' -> TOKEN "Label",
            'B4[80-A7B0-B9]' -> TOKEN "Label",
            'BC[80-9CA7B0-BF]' -> TOKEN "Label",
            'BD80-90' -> TOKEN "Label"
        ],
        '91' -> [
            '[8086909298]80-BF' -> TOKEN "Label",
            '81[80-86A6-AFBF]' -> TOKEN "Label",
            '[82A0]80-BA' -> TOKEN "Label",
            '83[90-A8B0-B9]' -> TOKEN "Label",
            '84[80-B4B6-BF]' -> TOKEN "Label",
            '85[84-8690-B3B6]' -> TOKEN "Label",
            '87[80-8489-8C90-9A9C]' -> TOKEN "Label",
            '88[80-9193-B7BE]' -> TOKEN "Label",
            '8A[80-86888A-8D8F-9D9F-A8B0-BF]' -> TOKEN "Label",
            '8B[80-AAB0-B9]' -> TOKEN "Label",
            '8C[80-8385-8C8F-9093-A8AA-B0B2-B3B5-B9BB-BF]' -> TOKEN "Label",
            '8D[80-8487-888B-8D90979D-A3A6-ACB0-B4]' -> TOKEN "Label",
            '91[80-8A90-999E]' -> TOKEN "Label",
            '93[80-858790-99]' -> TOKEN "Label",
            '96[80-B5B8-BF]' -> TOKEN "Label",
            '97[8098-9D]' -> TOKEN "Label",
            '99[808490-99]' -> TOKEN "Label",
            '9A80-B7' -> TOKEN "Label",
            '9B80-89' -> TOKEN "Label",
            '9C[80-9A9D-ABB0-B9]' -> TOKEN "Label",
            'A2A0-BF' -> TOKEN "Label",
            'A3[80-A9BF]' -> TOKEN "Label",
            'A880-BE' -> TOKEN "Label",
            'A9[8790-BF]' -> TOKEN "Label",
            'AA[80-8386-999D]' -> TOKEN "Label",
            'AB80-B8' -> TOKEN "Label",
            'B0[80-888A-B6B8-BF]' -> TOKEN "Label",
            'B1[8090-99B2-BF]' -> TOKEN "Label",
            'B2[80-8F92-A7A9-B6]' -> TOKEN "Label",
            'B4[80-8688-898B-B6BABC-BDBF]' -> TOKEN "Label",
            'B5[80-8790-99A0-A5A7-A8AA-BF]' -> TOKEN "Label",
            'B6[80-8E90-9193-98A0-A9]' -> TOKEN "Label",
            'BBA0-B6' -> TOKEN "Label"
        ],
        '92' -> [
            '8E80-99' -> TOKEN "Label",
            '[80-8D9092-94]80-BF' -> TOKEN "Label",
            '9180-AE' -> TOKEN "Label",
            '9580-83' -> TOKEN "Label"
        ],
        '93' -> [
            '9080-AE' -> TOKEN "Label",
            '80-8F80-BF' -> TOKEN "Label"
        ],
        '94' -> [
            '9980-86' -> TOKEN "Label",
            '90-9880-BF' -> TOKEN "Label"
        ],
        '96' -> [
            'A880-B8' -> TOKEN "Label",
            'A9[80-9EA0-A9]' -> TOKEN "Label",
            'AB[90-ADB0-B4]' -> TOKEN "Label",
            'AC80-B6' -> TOKEN "Label",
            'AD[80-8390-99A3-B7BD-BF]' -> TOKEN "Label",
            'AE80-8F' -> TOKEN "Label",
            '[A0-A7B9BC]80-BF' -> TOKEN "Label",
            'BD[80-8490-BE]' -> TOKEN "Label",
            'BE8F-9F' -> TOKEN "Label",
            'BFA0-A1' -> TOKEN "Label"
        ],
        '[97A0-A9AD]80-BF80-BF' -> TOKEN "Label",
        '98' -> [
            '9F80-B1' -> TOKEN "Label",
            'AB80-B2' -> TOKEN "Label",
            '[80-9EA0-AA]80-BF' -> TOKEN "Label"
        ],
        '9B' -> [
            '8480-9E' -> TOKEN "Label",
            '85B0-BF' -> TOKEN "Label",
            '8B80-BB' -> TOKEN "Label",
            '[80-8386-8AB0]80-BF' -> TOKEN "Label",
            'B1[80-AAB0-BC]' -> TOKEN "Label",
            'B2[80-8890-999D-9E]' -> TOKEN "Label"
        ],
        '9D' -> [
            '85[A5-A9AD-B2BB-BF]' -> TOKEN "Label",
            '86[80-8285-8BAA-AD]' -> TOKEN "Label",
            '8982-84' -> TOKEN "Label",
            '[9096-99]80-BF' -> TOKEN "Label",
            '91[80-9496-BF]' -> TOKEN "Label",
            '92[80-9C9E-9FA2A5-A6A9-ACAE-B9BBBD-BF]' -> TOKEN "Label",
            '93[80-8385-BF]' -> TOKEN "Label",
            '94[80-8587-8A8D-9496-9C9E-B9BB-BE]' -> TOKEN "Label",
            '95[80-84868A-9092-BF]' -> TOKEN "Label",
            '9A[80-A5A8-BF]' -> TOKEN "Label",
            '9B[8082-9A9C-BABC-BF]' -> TOKEN "Label",
            '9C[80-9496-B4B6-BF]' -> TOKEN "Label",
            '9D[80-8E90-AEB0-BF]' -> TOKEN "Label",
            '9E[80-888A-A8AA-BF]' -> TOKEN "Label",
            '9F[80-8284-8B8E-BF]' -> TOKEN "Label",
            'A8[80-B6BB-BF]' -> TOKEN "Label",
            'A9[80-ACB5]' -> TOKEN "Label",
            'AA[849B-9FA1-AF]' -> TOKEN "Label"
        ],
        '9E' -> [
            '80[80-8688-989B-A1A3-A4A6-AA]' -> TOKEN "Label",
            'A3[80-8490-96]' -> TOKEN "Label",
            '[A0-A2A4]80-BF' -> TOKEN "Label",
            'A5[80-8A90-99]' -> TOKEN "Label",
            'B8[80-8385-9FA1-A2A4A7A9-B2B4-B7B9BB]' -> TOKEN "Label",
            'B9[8287898B8D-8F91-929497999B9D9FA1-A2A4A7-AAAC-B2B4-B7B9-BCBE]' -> TOKEN "Label",
            'BA[80-898B-9BA1-A3A5-A9AB-BB]' -> TOKEN "Label"
        ],
        '9F' -> [
            '84B0-BF' -> TOKEN "Label",
            '85[80-8990-A9B0-BF]' -> TOKEN "Label",
            '8680-89' -> TOKEN "Label"
        ],
        'AA' -> [
            '9B80-96' -> TOKEN "Label",
            '[80-9A9C-BF]80-BF' -> TOKEN "Label"
        ],
        'AB' -> [
            '9C80-B4' -> TOKEN "Label",
            'A0[80-9DA0-BF]' -> TOKEN "Label",
            '[80-9B9D-9FA1-BF]80-BF' -> TOKEN "Label"
        ],
        'AC' -> [
            'BA[80-A1B0-BF]' -> TOKEN "Label",
            '[80-B9BB-BF]80-BF' -> TOKEN "Label"
        ],
        'AE' -> [
            'AF80-A0' -> TOKEN "Label",
            '80-AE80-BF' -> TOKEN "Label"
        ],
        'AF' -> [
            'A880-9D' -> TOKEN "Label",
            'A0-A780-BF' -> TOKEN "Label"
        ]
    ],
    'F3A0' -> [
        '8780-AF' -> TOKEN "Label",
        '84-8680-BF' -> TOKEN "Label"
    ],
    '[E5-E8EB-EC]80-BF80-BF' -> TOKEN "Label"
]

This is working largely as intended: \w covers every Unicode word-character range, and each of those code point ranges is converted into UTF-8 byte ranges. Adding + after it in the regex requires appending another copy of this tree to the end of every leaf, which can be simulated with the pattern r"\w\w". This is bound to be slow, but it should still compile in finite time.
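To illustrate how a code point range becomes byte ranges like the ones in the tree: the entry 'C2[AAB5BA]' is the lead byte C2 followed by AA, B5 or BA, which are the UTF-8 encodings of ª (U+00AA), µ (U+00B5) and º (U+00BA). The sketch below splits an inclusive code point range into such byte-range sequences. It follows the general approach of regex-syntax's `utf8::Utf8Sequences` and is only an illustration, not the code logos-derive uses; `ByteSeq`, `utf8_sequences` and `split` are hypothetical names.

```rust
/// One UTF-8 sequence pattern: byte `i` must fall in the inclusive range `seq[i]`.
/// (Hypothetical type; the splitting approach mirrors regex-syntax's Utf8Sequences.)
pub type ByteSeq = Vec<(u8, u8)>;

/// Split the inclusive code point range `lo..=hi` into UTF-8 byte-range sequences.
pub fn utf8_sequences(lo: char, hi: char) -> Vec<ByteSeq> {
    let mut out = Vec::new();
    split(lo as u32, hi as u32, &mut out);
    out
}

fn encode(cp: u32) -> Vec<u8> {
    // Only called with valid scalar values: surrogates are carved out first.
    let c = char::from_u32(cp).expect("valid Unicode scalar value");
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn split(start: u32, end: u32, out: &mut Vec<ByteSeq>) {
    if start > end {
        return;
    }
    // Surrogates (U+D800..=U+DFFF) have no UTF-8 encoding: skip them.
    if start <= 0xDFFF && end >= 0xD800 {
        if start < 0xD800 {
            split(start, 0xD7FF, out);
        }
        if end > 0xDFFF {
            split(0xE000, end, out);
        }
        return;
    }
    // Keep each piece within a single encoded length (1, 2, 3 or 4 bytes).
    for limit in [0x7F, 0x7FF, 0xFFFF] {
        if start <= limit && limit < end {
            split(start, limit, out);
            split(limit + 1, end, out);
            return;
        }
    }
    // ASCII: a single byte range.
    if end <= 0x7F {
        out.push(vec![(start as u8, end as u8)]);
        return;
    }
    // Split until, at every 6-bit continuation boundary, the piece either shares
    // its prefix or covers whole blocks, so `start` and `end` can be paired byte by byte.
    for i in 1..4 {
        let m: u32 = (1 << (6 * i)) - 1;
        if (start & !m) != (end & !m) {
            if (start & m) != 0 {
                split(start, start | m, out);
                split((start | m) + 1, end, out);
                return;
            }
            if (end & m) != m {
                split(start, (end & !m) - 1, out);
                split(end & !m, end, out);
                return;
            }
        }
    }
    let (s, e) = (encode(start), encode(end));
    out.push(s.into_iter().zip(e).collect());
}
```

For example, utf8_sequences('\u{C0}', '\u{D6}') (À through Ö) yields the single sequence [(0xC3, 0xC3), (0x80, 0x96)], which matches the 80-96 part of the C3 entry in the tree.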

Given that this pattern compiled in 0.9 (as mentioned in #68), this is a regression in 0.10.


maciejhirsz · 2019-02-20T20:06:20Z

The loop seems to happen inside the code generator. It might not even be infinite, but my machine runs out of RAM trying to handle it (with 32GB, that really shouldn't happen).

My current hunch is that it's due to hashing and attempting to de-duplicate branches or patterns...

maciejhirsz · 2019-03-06T20:46:47Z

Update: this is actually not an infinite loop. The code generated for \w\w is roughly 228k lines, and rustc runs out of memory trying to handle that...
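A toy model of why the generated code grows so fast: when the matcher is a tree, concatenating \w with another \w means every leaf of the first tree gets its own full copy of the second, so the size multiplies. A graph (logos-derive was later refactored from a tree to a graph in #94) can point all of those leaves at one shared copy instead. The `Tree` type, `append_to_leaves` and `graph_size` below are hypothetical and far simpler than logos-derive's real data structures.

```rust
/// Hypothetical, simplified model (not logos-derive's actual types): a node
/// either accepts (leaf) or branches on inclusive byte ranges to child nodes.
#[derive(Clone)]
enum Tree {
    Leaf,
    Branch(Vec<((u8, u8), Tree)>),
}

impl Tree {
    /// Total number of nodes, leaves included.
    fn size(&self) -> usize {
        match self {
            Tree::Leaf => 1,
            Tree::Branch(arms) => 1 + arms.iter().map(|(_, t)| t.size()).sum::<usize>(),
        }
    }

    /// Number of leaves, i.e. places where the next pattern must be attached.
    fn leaves(&self) -> usize {
        match self {
            Tree::Leaf => 1,
            Tree::Branch(arms) => arms.iter().map(|(_, t)| t.leaves()).sum(),
        }
    }

    /// Tree-style concatenation: every leaf is replaced by a full copy of `next`.
    fn append_to_leaves(&self, next: &Tree) -> Tree {
        match self {
            Tree::Leaf => next.clone(),
            Tree::Branch(arms) => Tree::Branch(
                arms.iter()
                    .map(|(range, child)| (*range, child.append_to_leaves(next)))
                    .collect(),
            ),
        }
    }
}

/// Graph-style concatenation keeps a single copy of `b`: the leaves of `a`
/// become edges into `b`'s shared root, so the node counts simply add up.
fn graph_size(a: &Tree, b: &Tree) -> usize {
    a.size() - a.leaves() + b.size()
}
```

For a tree `a` with L leaves, `a.append_to_leaves(&b)` has size(a) - L + L × size(b) nodes, while the shared-graph version stays at size(a) - L + size(b). With a three-arm branch standing in for \w, the tree version of \w\w already has 13 nodes against 5 for the graph; with the hundreds of leaves in the real \w tree, the gap becomes enormous. In a graph, a repetition such as \w+ can also loop back to an existing node instead of copying anything.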

maciejhirsz · 2020-03-30T21:57:20Z

Fixed with #94.

maciejhirsz mentioned this issue Feb 20, 2019:
Fix r"\w+" infinite loop #71 (closed)

maciejhirsz added the bug (Something isn't working) label Feb 20, 2019

This was referenced Mar 22, 2020:
Refactor logos-derive from tree to graph #94 (merged)
Eats up all my RAM during building then dies. #78 (closed)
logos_derive melts the CPU for XID, then gets a SIGKILL #79 (closed)
logos_derive stack overflow in regex macro #80 (closed)

maciejhirsz closed this as completed Mar 30, 2020
