Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt builtin profiler #1020

Merged
merged 6 commits into from
Nov 20, 2018
Merged

dbt builtin profiler #1020

merged 6 commits into from
Nov 20, 2018

Conversation

cmcarthur
Copy link
Member

@cmcarthur cmcarthur commented Sep 20, 2018

I built a very blunt profiling tool to use with dbt. You can use it with the --record-timing-info or -r flags. So, dbt -r output.profile compile will run that command, writing profiler output to output.profile.

Fixes #1001

@cmcarthur
Copy link
Member Author

Here's an example output, visualized in snakeviz:

screen shot 2018-09-20 at 8 46 01 am

The most obvious low-hanging fruit here is that threading.py:263(wait) -- it turns out that we lock around actually acquiring a connection to the database. This seems unnecessary, we should only have to lock around acquiring a connection out of the pool, not necessarily opening it and performing other I/O on the database.

The more complex one (and also a bigger win if we can untangle it) is the loader -- it takes around 60% of the total run time. Half of that is spent in Jinja parsing

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! And this is great data to have, I think we could make a lot of great perf decisions with the help of this tool.

@elexisvenator
Copy link
Contributor

I pulled this branch locally, keen to see what is happening in my own projects.. and I get an error running any dbt command :( This is the output of dbt deps. The repo in question is https://github.com/elexisvenator/dbt-performance-issue

2018-09-20 23:26:25,252 (MainThread): Tracking: tracking
2018-09-20 23:26:25,252 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f346f4f1be0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f346f4f1e80>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f346f4f1898>]}
2018-09-20 23:26:26,225 (MainThread): Executing "git clone --depth 1 https://github.com/fishtown-analytics/dbt-utils.git 859fd0fa21176efee375a447937f7a46"
2018-09-20 23:26:26,229 (MainThread): STDOUT: "b''"
2018-09-20 23:26:26,229 (MainThread): STDERR: "b"fatal: destination path '859fd0fa21176efee375a447937f7a46' already exists and is not an empty directory.\n""
2018-09-20 23:26:26,229 (MainThread): Updating existing dependency 859fd0fa21176efee375a447937f7a46.
2018-09-20 23:26:26,230 (MainThread): Executing "git rev-parse HEAD"
2018-09-20 23:26:26,234 (MainThread): STDOUT: "b'769794490d24b9efa1bede0a7c36451a630f9166\n'"
2018-09-20 23:26:26,234 (MainThread): STDERR: "b''"
2018-09-20 23:26:26,234 (MainThread):   Checking out branch 0.1.8.
2018-09-20 23:26:26,234 (MainThread): Executing "git remote set-branches origin 0.1.8"
2018-09-20 23:26:26,240 (MainThread): STDOUT: "b''"
2018-09-20 23:26:26,240 (MainThread): STDERR: "b''"
2018-09-20 23:26:26,241 (MainThread): Executing "git fetch --tags --depth 1 origin 0.1.8"
2018-09-20 23:26:27,767 (MainThread): STDOUT: "b''"
2018-09-20 23:26:27,767 (MainThread): STDERR: "b'From https://github.com/fishtown-analytics/dbt-utils\n * tag               0.1.8      -> FETCH_HEAD\n'"
2018-09-20 23:26:27,767 (MainThread): Executing "git tag --list"
2018-09-20 23:26:27,771 (MainThread): STDOUT: "b'0.0.1\n0.1.0\n0.1.1\n0.1.10\n0.1.11\n0.1.12\n0.1.13\n0.1.14\n0.1.15\n0.1.2\n0.1.3\n0.1.4\n0.1.5\n0.1.6\n0.1.7\n0.1.8\n0.1.9\n'"
2018-09-20 23:26:27,772 (MainThread): STDERR: "b''"
2018-09-20 23:26:27,772 (MainThread): Executing "git reset --hard tags/0.1.8"
2018-09-20 23:26:27,776 (MainThread): STDOUT: "b'HEAD is now at 7697944 Update README.md\n'"
2018-09-20 23:26:27,777 (MainThread): STDERR: "b''"
2018-09-20 23:26:27,777 (MainThread): Executing "git rev-parse HEAD"
2018-09-20 23:26:27,780 (MainThread): STDOUT: "b'769794490d24b9efa1bede0a7c36451a630f9166\n'"
2018-09-20 23:26:27,780 (MainThread): STDERR: "b''"
2018-09-20 23:26:27,781 (MainThread):   Already at 7697944, nothing to do.
2018-09-20 23:26:27,783 (MainThread): Executing "git clone --depth 1 https://github.com/PageUpPeopleOrg/pageup-dbt-utils.git a07af741d8a5793c88f4b3a3828050ff"
2018-09-20 23:26:27,787 (MainThread): STDOUT: "b''"
2018-09-20 23:26:27,787 (MainThread): STDERR: "b"fatal: destination path 'a07af741d8a5793c88f4b3a3828050ff' already exists and is not an empty directory.\n""
2018-09-20 23:26:27,787 (MainThread): Updating existing dependency a07af741d8a5793c88f4b3a3828050ff.
2018-09-20 23:26:27,787 (MainThread): Executing "git rev-parse HEAD"
2018-09-20 23:26:27,791 (MainThread): STDOUT: "b'1c9d8d36ae3518a6d2bf0e9df0fc08f5f1d7fd59\n'"
2018-09-20 23:26:27,791 (MainThread): STDERR: "b''"
2018-09-20 23:26:27,791 (MainThread):   Checking out branch 1.2.0.
2018-09-20 23:26:27,791 (MainThread): Executing "git remote set-branches origin 1.2.0"
2018-09-20 23:26:27,795 (MainThread): STDOUT: "b''"
2018-09-20 23:26:27,795 (MainThread): STDERR: "b''"
2018-09-20 23:26:27,795 (MainThread): Executing "git fetch --tags --depth 1 origin 1.2.0"
2018-09-20 23:26:29,247 (MainThread): STDOUT: "b''"
2018-09-20 23:26:29,247 (MainThread): STDERR: "b'From https://github.com/PageUpPeopleOrg/pageup-dbt-utils\n * tag               1.2.0      -> FETCH_HEAD\n'"
2018-09-20 23:26:29,247 (MainThread): Executing "git tag --list"
2018-09-20 23:26:29,252 (MainThread): STDOUT: "b'0.1.0\n0.1.1\n0.2.0\n0.2.1\n1.0.0\n1.1.0\n1.2.0\n2.0.0\n'"
2018-09-20 23:26:29,252 (MainThread): STDERR: "b''"
2018-09-20 23:26:29,252 (MainThread): Executing "git reset --hard tags/1.2.0"
2018-09-20 23:26:29,258 (MainThread): STDOUT: "b'HEAD is now at 1c9d8d3 Update equality_for_range to support filtering out timestamp columns. (#3)\n'"
2018-09-20 23:26:29,258 (MainThread): STDERR: "b''"
2018-09-20 23:26:29,258 (MainThread): Executing "git rev-parse HEAD"
2018-09-20 23:26:29,263 (MainThread): STDOUT: "b'1c9d8d36ae3518a6d2bf0e9df0fc08f5f1d7fd59\n'"
2018-09-20 23:26:29,263 (MainThread): STDERR: "b''"
2018-09-20 23:26:29,264 (MainThread):   Already at 1c9d8d3, nothing to do.
2018-09-20 23:26:29,268 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f346f4cbc88>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f346f4f17b8>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f346f4f1f60>]}
2018-09-20 23:26:30,263 (MainThread): Flushing usage events
2018-09-20 23:26:30,264 (MainThread): Encountered an error:
2018-09-20 23:26:30,265 (MainThread): Runtime Error
  Invalid arguments passed to "Project" instance: models.None is not of type 'object'
2018-09-20 23:26:30,271 (MainThread): Traceback (most recent call last):
  File "/src/dbt/dbt/config.py", line 183, in from_project_config
    ProjectContract(**project_dict)
  File "/src/dbt/dbt/api/object.py", line 37, in __init__
    self.validate()
  File "/src/dbt/dbt/api/object.py", line 88, in validate
    raise JSONValidationException(type(self).__name__, errors)
dbt.exceptions.JSONValidationException: Runtime Error
  Invalid arguments passed to "Project" instance: models.None is not of type 'object'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/dbt/dbt/main.py", line 76, in main
    results, succeeded = handle_and_check(args)
  File "/src/dbt/dbt/main.py", line 133, in handle_and_check
    task, res = run_from_args(parsed)
  File "/src/dbt/dbt/main.py", line 188, in run_from_args
    results = run_from_task(task, cfg, parsed)
  File "/src/dbt/dbt/main.py", line 196, in run_from_task
    result = task.run()
  File "/src/dbt/dbt/task/deps.py", line 439, in run
    target_config = final_deps[name].fetch_metadata(self.config)
  File "/src/dbt/dbt/task/deps.py", line 83, in fetch_metadata
    self._cached_metadata = self._fetch_metadata(project)
  File "/src/dbt/dbt/task/deps.py", line 240, in _fetch_metadata
    return project.from_project_root(path)
  File "/src/dbt/dbt/config.py", line 319, in from_project_root
    return cls.from_project_config(project_dict, packages_dict)
  File "/src/dbt/dbt/config.py", line 185, in from_project_config
    raise DbtProjectError(str(e))
dbt.config.DbtProjectError: Runtime Error
  Invalid arguments passed to "Project" instance: models.None is not of type 'object'

It looks like it's not liking something in the pageup_dbt_utils project. Removing that project and all references fixes the issue, not sure exactly what the issue is though.

@cmcarthur
Copy link
Member Author

@elexisvenator do you also get this error on the dev/guion-bluford branch?

@beckjake
Copy link
Contributor

It's caused by the format of the models entry in that project's dbt_project.yml. Yaml interprets

models:

with no data inside it as None instead of []. We should fix that.

@drewbanin
Copy link
Contributor

@beckjake I think the desired default would be {}, but otherwise, yeah, i'm right there with you. Not germane to this particular PR though -- can you open an issue?

from pstats import Stats


@contextmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work on py2 and py3? And does it have any impact on error handling?

@drewbanin drewbanin changed the base branch from dev/guion-bluford to dev/grace-kelly November 20, 2018 02:08
@beckjake beckjake merged commit 029ef17 into dev/grace-kelly Nov 20, 2018
@beckjake beckjake deleted the profiler branch November 20, 2018 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants