[FEA] Support repetition of group containing in regular expression #6469

Open
viadea opened this issue Aug 31, 2022 · 1 comment
Labels
feature request New feature or request

Comments


viadea commented Aug 31, 2022

I wish we supported repetition of a group that itself contains a quantified expression (for example, ([0-9]{2}p){5}) in regular expressions on the GPU.
For example:

from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import expr

# Assumes an existing SparkSession named `spark` (e.g. the pyspark shell or a notebook).
data2 = [("11p22p33p44p55p.abc", ""),
    ("xyz[123", "Rose")
  ]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("middlename", StringType(), True)
  ])

df = spark.createDataFrame(data=data2, schema=schema)
df.printSchema()
df.show(truncate=False)

# The doubled backslash is kept by the raw Python string; the SQL string literal then unescapes it to "\.".
individual_match_regex = r"((([0-9]{2}p){5}))\\.abc"

# Extract capture group 1 (the five repeated "NNp" chunks) into a new column.
df.withColumn('newcol', expr(f"""   regexp_extract(firstname, "{individual_match_regex}", 1)    """)).show()

Not-supported message:

      !Expression <RegExpExtract> regexp_extract(firstname#827, ((([0-9]{2}p){5}))\.abc, 1) cannot run on GPU because cuDF does not support repetition of group containing: [0-9]{2} near index 3; regex group count is 0, but the specified group index is 1
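
For reference, the pattern itself is well formed, so the expected result of the extraction can be checked outside Spark. The following is a minimal sketch using Python's re module, not Spark or cuDF; extract_group is a hypothetical stand-in for regexp_extract that returns an empty string when nothing matches, and the pattern is written as it reaches the regex engine (with "\." as shown in the message above).

```python
import re

# Pattern as the regex engine sees it, after SQL-level unescaping of "\\.".
PATTERN = r"((([0-9]{2}p){5}))\.abc"


def extract_group(value: str, pattern: str, idx: int) -> str:
    """Hypothetical stand-in for regexp_extract.

    Returns capture group `idx` of the first match, or "" if there is no match.
    """
    m = re.search(pattern, value)
    if m is None:
        return ""
    return m.group(idx) or ""


# The two "firstname" values from the reproduction above.
rows = ["11p22p33p44p55p.abc", "xyz[123"]
results = [extract_group(v, PATTERN, 1) for v in rows]
print(results)
```

Under these assumptions, group 1 covers the five two-digit chunks of the first row, and the second row yields an empty string because it does not match.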
viadea added the "feature request" (New feature or request) and "? - Needs Triage" (Need team to review and classify) labels on Aug 31, 2022

viadea commented Aug 31, 2022

I found one workaround, though: write the repeated group out explicitly, once per repetition, instead of using {5}.

individual_match_regex = r"((([0-9]{2}p)([0-9]{2}p)([0-9]{2}p)([0-9]{2}p)([0-9]{2}p)))\\.abc"
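
The manual expansion can also be generated rather than typed. This is a minimal sketch in plain Python, assuming the repetition count is fixed and known; expand_repetition is a hypothetical helper and not part of Spark, the RAPIDS plugin, or cuDF. It also checks with Python's re module that group 1 is the same for both forms on the sample value.

```python
import re


def expand_repetition(group: str, count: int) -> str:
    """Hypothetical helper: write (group){count} as `count` literal copies."""
    return group * count


# Regex-level patterns (single backslash); double it when embedding in a SQL string literal.
original = r"((([0-9]{2}p){5}))\.abc"
expanded = "((" + expand_repetition("([0-9]{2}p)", 5) + "))" + r"\.abc"

sample = "11p22p33p44p55p.abc"
m_original = re.search(original, sample)
m_expanded = re.search(expanded, sample)

# Group 1 (the whole run of chunks) is identical for both patterns.
print(m_original.group(1), m_expanded.group(1))

# Inner groups differ: the repeated group keeps only its last iteration,
# while the expanded form has one capture group per chunk.
print(m_original.group(3), m_expanded.group(3))
```

One caveat with this rewrite: it changes the number of capture groups, so it is only a drop-in replacement when the group index being extracted (1 in the example above) refers to a group that encloses the whole repetition.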

sameerz removed the "? - Needs Triage" (Need team to review and classify) label on Sep 7, 2022