Auto-Scheduler for Soufflé #2237

Merged
merged 42 commits into from
Apr 1, 2022

SamArch27
Collaborator

@SamArch27 SamArch27 commented Mar 25, 2022

This PR merges our new auto-scheduler for Soufflé.

Currently, users manually find good join orders for rules (also called schedules) and annotate the rules with plan statements. Our auto-scheduler aims to derive good schedules for the user automatically.
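
To illustrate why the join order of a rule matters, here is a small Python sketch (not Soufflé code). It evaluates a hypothetical rule `r(x, z) :- a(x, y), b(y, z), c(z).` with plain nested loops in two different atom orders and counts how many tuples each order scans. The rule, the relation contents, and the helper `run_schedule` are made up for illustration; Soufflé itself uses indexed lookups, so the absolute numbers would differ there.

```python
# Hypothetical rule: r(x, z) :- a(x, y), b(y, z), c(z).
ATOMS = {"a": ("x", "y"), "b": ("y", "z"), "c": ("z",)}


def run_schedule(order, relations):
    """Evaluate the rule with nested loops in the given atom order.

    Returns the derived (x, z) tuples and the number of tuples scanned.
    """
    results = set()
    scanned = 0

    def extend(depth, binding):
        nonlocal scanned
        if depth == len(order):
            results.add((binding["x"], binding["z"]))
            return
        name = order[depth]
        variables = ATOMS[name]
        for tup in relations[name]:
            scanned += 1
            # Keep the tuple only if it agrees with the variables bound so far.
            if all(binding.get(v, val) == val for v, val in zip(variables, tup)):
                extend(depth + 1, {**binding, **dict(zip(variables, tup))})

    extend(0, {})
    return results, scanned


relations = {
    "a": {(i, i % 10) for i in range(100)},  # 100 tuples
    "b": {(y, y + 1) for y in range(10)},    # 10 tuples
    "c": {(1,)},                             # 1 tuple, very selective
}

out_abc, cost_abc = run_schedule(["a", "b", "c"], relations)
out_cba, cost_cba = run_schedule(["c", "b", "a"], relations)

print(out_abc == out_cba)   # True: both orders derive the same tuples
print(cost_abc, cost_cba)   # 1200 111
```

Both orders produce the same result, but starting from the most selective atom scans roughly a tenth of the tuples. Choosing such an order is what a manual plan statement does, and what the auto-scheduler tries to do automatically.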

Usage:

souffle <program> -p auto_profile --index-stats
souffle -c <program> --auto-schedule=auto_profile

  1. In the first run, given a representative input, Soufflé collects selectivity statistics via its indexes and writes them to a profile (auto_profile in the commands above).
  2. In the second run, Soufflé reads the statistics from this profile and passes them to an embedded query optimizer, which finds good schedules using a cost model.
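
The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: the rule, the data, and the functions `collect_stats`, `estimate_cost`, and `best_schedule` are hypothetical, and the cost model is a textbook one (expected matches per lookup estimated as relation size divided by the number of unique keys on the bound columns, with the cost being the sum of estimated intermediate sizes). The actual statistics format and cost model used by Soufflé are not described in this conversation.

```python
from itertools import combinations, permutations

# Hypothetical rule: r(x, z) :- a(x, y), b(y, z), c(z).
ATOMS = {"a": ("x", "y"), "b": ("y", "z"), "c": ("z",)}


def collect_stats(relations):
    """Step 1 (sketch): for every relation and every subset of its columns,
    record the relation size and the number of unique keys on those columns."""
    stats = {}
    for name, tuples in relations.items():
        arity = len(ATOMS[name])
        for k in range(arity + 1):
            for cols in combinations(range(arity), k):
                keys = {tuple(t[c] for c in cols) for t in tuples}
                stats[(name, cols)] = (len(tuples), len(keys))
    return stats


def estimate_cost(order, stats):
    """Step 2 (sketch): sum of estimated intermediate result sizes for one order."""
    bound = set()   # variables bound by atoms placed earlier in the order
    rows = 1.0      # estimated number of partial bindings so far
    cost = 0.0
    for name in order:
        variables = ATOMS[name]
        cols = tuple(i for i, v in enumerate(variables) if v in bound)
        size, unique = stats[(name, cols)]
        rows *= size / max(unique, 1)  # expected matches per lookup
        cost += rows
        bound.update(variables)
    return cost


def best_schedule(stats):
    """Exhaustively pick the atom order with the lowest estimated cost."""
    return min(permutations(ATOMS), key=lambda order: estimate_cost(order, stats))


relations = {
    "a": {(i, i % 10) for i in range(100)},  # 100 tuples
    "b": {(y, y + 1) for y in range(10)},    # 10 tuples
    "c": {(1,)},                             # 1 tuple
}

stats = collect_stats(relations)
best = best_schedule(stats)
print(best)                        # ('c', 'b', 'a')
print(estimate_cost(best, stats))  # 12.0
```

Exhaustive enumeration is only practical for rules with few atoms; it is used here to keep the sketch short. Note also that this estimate assumes every lookup finds a match, which is one reason such a model can misjudge rules that exit early.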

Fundamental Limitations:

  1. It only finds single-threaded schedules; parallelism is not considered
  2. It doesn't work well for rules that exit early and produce 0 tuples
  3. Subtle interactions between indexing and scheduling can lead to sub-optimal performance (work on buffering and sorting is under way to help alleviate this problem)
  4. It doesn't perform well for very complex recursive workloads

Engineering Limitations:

  1. We only collect stats in interpreter mode for now
  2. We haven't considered the interactions with provenance yet

@codecov

codecov bot commented Mar 26, 2022

Codecov Report

Merging #2237 (84401ed) into master (d98a78a) will decrease coverage by 0.76%.
The diff coverage is 67.20%.

@@            Coverage Diff             @@
##           master    #2237      +/-   ##
==========================================
- Coverage   75.83%   75.06%   -0.77%     
==========================================
  Files         454      455       +1     
  Lines       27588    28462     +874     
==========================================
+ Hits        20920    21364     +444     
- Misses       6668     7098     +430     

Impacted Files                           Coverage Δ
src/ast/analysis/ProfileUse.h            100.00% <ø> (+100.00%) ⬆️
src/ast2ram/seminaive/UnitTranslator.h   100.00% <ø> (ø)
src/ast2ram/utility/TranslatorContext.h  100.00% <ø> (ø)
src/interpreter/Engine.h                 100.00% <ø> (ø)
src/ram/TupleElement.h                   76.92% <ø> (ø)
src/ram/analysis/Index.h                 98.66% <ø> (ø)
src/synthesiser/Synthesiser.cpp          82.74% <0.00%> (-4.31%) ⬇️
src/ram/CountUniqueKeys.h                48.38% <48.38%> (ø)
src/ram/utility/Visitor.h                54.54% <50.00%> (+1.46%) ⬆️
src/ast/utility/SipsMetric.cpp           49.14% <65.39%> (+27.81%) ⬆️
... and 56 more

@b-scholz
Member

We need a test case for auto-scheduling. It should be a functional test rather than a performance test, since a performance test is hard to implement with our current infrastructure.

@b-scholz b-scholz left a comment

  • Could you implement the count statements in the synthesiser as well?
  • Could you consider using RAM expressions for constants in the counting statements? This would permit packing/unpacking of constant terms and complex constant arithmetic expressions.


/************************************************************************
*
* @file UniqueKeys.cpp

Perhaps rename the file to ProfilingCostCover (or something similar)?

@SamArch27
Collaborator Author

The scheduler test keeps failing.

@b-scholz @XiaowenHu96

@b-scholz b-scholz left a comment

Thanks for the contribution!

@b-scholz b-scholz merged commit 37b159d into souffle-lang:master Apr 1, 2022