-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add name validation for metrics #5841
Conversation
b200b4c
to
a429cff
Compare
core/dbt/contracts/graph/unparsed.py
Outdated
if "name" in data: | ||
errors = [] | ||
if " " in data["name"]: | ||
errors.append("cannot contain spaces") | ||
if len(data["name"]) > 126: | ||
errors.append("cannot contain more than 126 characters") | ||
if not (re.match(r"^[A-Za-z]", data["name"])): | ||
errors.append("must begin with a letter") | ||
if not (re.match(r"[\w-]+$", data["name"])): | ||
errors.append("must contain only letters, numbers and underscores") | ||
|
||
if errors: | ||
raise ParsingException( | ||
f"Metrics name '{data['name']}' is invalid. It {', '.join(e for e in errors)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@callum-mcdata would love some help with error messaging here! I went with concatenating all the errors together but open to other solutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the framework here! I'd only tweak it slightly to The metric name
as opposed to Metrics name
. Apart from that I think it is does a great job of describing the error and I LOVE that it provides all the errors in one failed compile instead of one at a time like I might have novicely implemented it
Ah - I should have updated this issue after I did some more testing but I unfortunately put the information I found in dbt-labs/dbt_metrics#41. The TLDR is that it appears that Postgres/Redshift appear to be handling column name truncation under the hood and so failing queries due to too long metric names only appeared to occur in BigQuery and Snowflake. Should we consider changing the length limit to 250 (lower of BQ/Snowflake) ? |
Sounds reasonable to me @callum-mcdata. That's a simple change. I'll defer to @jtcohen6 for the final decision on length. |
250 chars feels fine — If you can tweet it, you can (probably) make it into a metric One question for you @callum-mcdata: Given that Postgres/Redshift truncate any column/CTE name that's too long, is there any risk for queries containing multiple metrics, where those metrics have the same first 126 characters? (You could also handle that in |
@jtcohen6 huh. Well .... I hadn't thought about that. Is it a fair answer to punt on that and say we'll solve that problem when we get to it? 😅 to me figuring out the pattern to account for that feels like we're optimizing for an edge case. If that problem did arise then I think your idea around fixing it in the package makes sense - it's a usage pattern more than it is a definition pattern so allowing that kind of naming convention should be supported in core. |
resolves #5456
Description
Adds name validation rules:
Adding exposure name validation in #5844
Checklist
changie new
to create a changelog entry