-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs blocks (#810) #888
Docs blocks (#810) #888
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really love this. Minimal comments after my first pass through the code -- everything in here looks super good. I'll have some more questions I think after playing around with this in concert with schema yml v2
def find_docs_by_name(self, name, package=None): | ||
for unique_id, doc in self.docs.items(): | ||
parts = unique_id.split('.') | ||
if len(parts) != 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i crazy or should parts
have 3 items here? Something like docs.project_name.docs_name
?
dbt/parser/docs.py
Outdated
class DocumentationParser(BaseParser): | ||
@classmethod | ||
def load_file(cls, package_name, root_dir, relative_dirs): | ||
"""Load and parse documentation in a lsit of projects. Returns a list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lil typo here
|
||
# because docs are in their own graph namespace, node type doesn't | ||
# need to be part of the unique ID. | ||
unique_id = '{}.{}'.format(docfile.package_name, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! I see. Is there any benefit to keeping the format of these unique ids consistent with the rest of dbt's unique ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. In the nodes
and macros
dicts, there are potentially multiple node types so the .
-separated namespacing by type makes a lot of sense. In the docs
dict (and, I hope, all future manifest members) there's only one type. I would have removed resource_type from the docs entirely if I could have. Docs can only be referenced via doc()
and created via {% docs ... %}
, and nothing else can be referenced that way, so you never have any ambiguity (like you might with seeds vs models and ref
).
That said, if there are strong feelings about consistency, it's a pretty easy change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I buy that
return manifest.find_docs_by_name(target_doc_name, | ||
target_doc_package) | ||
|
||
candidate_targets = [current_project, node_package, None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is phenomenal
dbt/loader.py
Outdated
root_project=root_project, | ||
all_projects=all_projects, | ||
root_dir=project.get('project-root'), | ||
relative_dirs=project.get('source-paths', [])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beckjake this code looks for the docs files in the source-paths
directory. I think that's a pretty reasonable default. Do you think there's any merit to making an (optional) docs-paths
dir as well? I can imagine this would be useful for docs that don't pertain to a specific model, eg. docs for widely-used columns, or the "overview" markdown block, for instance.
cc @cmcarthur
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't even consider that. I just immediately put it next to the schema.yml. But it does make some sense.
If we did so maybe we should consider something like: relative_dirs=project.get('source-paths', []) + project.get('docs-paths', [])
I actually also wouldn't be against also searching from the root directory, if that's feasible with dbt (maybe relative_dirs + ['.']
would work?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beckjake I was thinking the same thing around searching the root directory! I'm a little concerned thought that a random .md file could contain code that would break dbt.
Just as an example, a styleguide.md
file (like this might contain {{ ref(...) }}
in example sql, and I think the docs parser would probably choke on that, right?
After some testing, we do a recursive search through the dirs, so including .
in the list of paths will also snag the target/
directory, which is probably not what we want here.
I think the right way to handle this is to just use docs-paths
, where the default value is:
"docs-paths": ["models"]
This will let users override the dirs in their dbt_project.yml
file, but will still be sort of a sane default value. You buy that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly agree, but I think we should default docs-paths
to the contents of source-paths
, rather than ["models"]
, if we can. It's a subtle difference, but the idea of "by default we search your source paths" seems really compelling to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, I buy that. Let's make sure to document that functionality in the docs for the docs :)
dbt/parser/schemas.py
Outdated
@@ -403,11 +407,16 @@ def parse_model(cls, model, package_name, root_dir, path, root_project, | |||
all_projects, macros) | |||
yield 'test', node | |||
|
|||
context = {'doc': dbt.context.parser.docs(model, docrefs)} | |||
description = model.get('description') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if the model doesn't have a description
, we pass None
into a ParsedNodePatch
below, resulting in a validation error. Can we use an empty string default here?
File "/Users/drew/fishtown/dbt/dbt/parser/schemas.py", line 448, in load_and_parse
for result_type, node in v2_results:
File "/Users/drew/fishtown/dbt/dbt/parser/schemas.py", line 364, in parse_v2_yml
for node_type, node in iterator:
File "/Users/drew/fishtown/dbt/dbt/parser/schemas.py", line 416, in parse_model
patch = ParsedNodePatch(
File "/Users/drew/fishtown/dbt/dbt/api/object.py", line 37, in __init__
self.validate()
File "/Users/drew/fishtown/dbt/dbt/api/object.py", line 85, in validate
raise ValidationException(msg)
dbt.exceptions.ValidationException: Runtime Error
Invalid arguments passed to "ParsedNodePatch" instance: description.None is not of type 'string'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes - I thought I got all of these, I'll see if I left any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it!
Add support for
docs
blocks and support for adoc()
Jinja macro that references them.