Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize RAGE queries and fix some decoding #96

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

edwintorok
Copy link
Member

I've been testing this for a while, and the Quebec performance report was created using this.

Initially I just parallelized each row in the brief report, but then parallelizing all other queries wasn't that difficult, just had to work through the monadic conversion until it compiled :)

This also includes the update to OCaml 4.08.1 and the latest Core and Async version, and the Dune build system. Plus fixes for some other bugs found while testing this (e.g. regarding the encoding for -- seen in some new Phoronix tests, etc.).

There are some more improvements coming, but that is a topic for another PR (in fact this one should also probably be split into multiple PRs).

mg12ctx and others added 25 commits November 29, 2019 16:32
It wants an `&` at the end, otherwise the javascript interprets
`#brief_report_analysis` as part of whatever ends up as the last parameter.

Signed-off-by: Edwin Török <[email protected]>
XenRT suite files can contain '<include filename="perf.inc" />' to include variable definitions
from another file.
The filename itself can be a variable, usually ${PRODUCT_VERSION}.

Read PRODUCT_VERSION default from the configuration file (in our case /etc/rage_passwd).
Fetch and parse the include file for any parameters, and make upper-case parameters available for substitution.

The suite files can also contain RAGE's own lisp-like language which has lower-case variables,
so keep lower-case variables as is for expansion later by the lisp-like interpreter.

Once we have the substitutions apply them to each row we read from the suite file.

RAGE still runs on OCaml 4.01, and OMake, so this is using the Str module for regexes.
Should replace this some day with newer OCaml version and 're' module.

Tested by copying to /var/www/rage-test-edvint on rage, and opening
http://rage.uk.xensource.com/test-edvint.cgi?p=brief&id=https://info.citrite.net/display/~svcacct_ragebot/RAGE+report+test

Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Polymorphic comparison is not visible by default now, the default
comparison operators are for int.
Have to locally open the module for the appropriate type, or use
Polymorphic_compare.(a = b) if none is available.

Similarly all List lookup functions take an ~equal parameter now.

List.sort and List.dedup_and_sort take a ~compare parameter
(there is no dedup without sorting anymore).

Be careful with comparing NULL values when deduping and sorting
(NULL) fails the int_of_string which could cause List.dedup_and_sort to loose 1 value
as it considered it identical to NULL if the `cmp` function was used for
comparison. Make sure to use the String.compare function as the original
could would implicitly use.

Signed-off-by: Edwin Török <[email protected]>
-- is not allowed inside XML tags.

Some phoronix tests have -- in tc_arguments now.

Signed-off-by: Edwin Török <[email protected]>
This is BIOS, UEFI, etc. which could matter when comparing results.

Signed-off-by: Edwin Török <[email protected]>
Again, some Phoronix tests use this.

Signed-off-by: Edwin Török <[email protected]>
patches_applied is a simplified form of what hotfixes have been applied,
e.g XS80E{001-006}, and most importantly the empty string when testing
an RTM release. This allows to create performance comparisons with RTM
releases easily.
(RTM releases have the 'all_hotfixes' build_tag with 0 patches applied,
 so previously we couldn't filter them)

build_is_release tries to determine whether a hotfix was a dev-signed
test, or a release-signed hotfix test. Release-signed doesn't
necessarily mean we released it, but if we remove all the dev-signed
ones we'll have less noise in our results.
Also unreleased hotfixes can (and in fact) have performance regressions
that were caught and fixed before a release. We do not want the same
performance regressions to go unnoticed in master.

Once a hotfix is released a cronjob could go through previous tests that
were imported and blacklist them, but for now this is a good
approximation.

Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
… query thousands of times

This reduces brief report load time to ~6s from ~10s for 90 rows (the full report contains thousands)

Signed-off-by: Edwin Török <[email protected]>
Use a Deferred monad for performing queries.
This is similar to Lwt: instead of getting a response we get a
promise/future that we can evaluate later when the answer is available.

This'll allow us to do things like List.map and run bunch of queries in
parallel.

I simply run the postgresql query in a different thread on its own
connection (limited to number of cores/machine).

It would be possible to avoid using multiple threads and use
multiplexing via epoll and non-blocking reads/writes but libpq is
difficult to use: it will reconnect and change the file descriptor
behing your back which will cause epoll to fail because it is no longer
registered for epoll.

Signed-off-by: Edwin Török <[email protected]>
Using the ppx syntax extension:
`let%map y = f x in ...` is equivalent to `f x >>| fun y -> ..`
`let%bind y = f x in ...` is equivalent to `f x >>= fun y -> ...`

Except these also allow you to do parallel binds, which is tedious to
accomplish "manually".

Signed-off-by: Edwin Török <[email protected]>
Also use Deferred.List.map instead of List.map

Signed-off-by: Edwin Török <[email protected]>
RAGE is not a long running daemon, it is a CGI, so stop the async
scheduler when done.

Signed-off-by: Edwin Török <[email protected]>
a suite can include quebec.inc, which includes perf_release.inc.
The latter defines the debian/centos distro that we use, so we must
expand nested includes.

Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Especially with nested includes we need to fetch quite a few files,
so parallelize where possible.

OpenSSL is not thread safe by default, so we have to initialize
threading mode before using it.
(We don't use it directly, but through curl)

Signed-off-by: Edwin Török <[email protected]>
The median is only the n/2 indexed element when
there are an odd number of elements, otherwise have to interpolate.

Also calculate the Tukey Fences, defined as Q1-1.5*IQR, Q3+1.5*IQR with IQR=Q3-Q1.
Useful for spotting outliers, or non-normal distributions.

We usually have a small number of data points, and no guarantees that
the data is normally distributed, so using median and quantile
statistics is probably the right choice since they are more robust to
outliers.
(e.g. due to caching effects the 1st data point in a VM boot time
 could be significantly larger than the rest, although still valid)

Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Signed-off-by: Edwin Török <[email protected]>
Copy link
Member

@jjd27 jjd27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the net effect of these changes looks good to me!

The patches themselves could be a little better organised, e.g. 51ee3c8 ("async wip") could be merged into the previous commit. And the introduction of postgresql_async is spread over multiple commits (e.g. the wrap_sql function is introduced in 4dcb9a9, and the calls to connect_pool and destroy_pool are added relatively late). Also src/OMakefile is deleted in 85d690b which feels like it belongs in an earlier commit.

| Some x -> x

let rage_username = get_config "rage_username"
let rage_password = get_config "rage_password"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already have config fields called "rage_user" and "rage_pass"? See OMakefile. Is there a reason why these need to be different from those?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are for the database, and this one is for a service account used to access confluence/bitbucket (they no longer work without authentication).
Agreed that it is confusing if both are called rage_user*, I'll rename to something more obvious.

RUN_CMD=OCAMLRUNPARAM='b1' ./$(PROGRAM) "$(SETTINGS)"
SETTINGS=host=$(RAGE_HOST) user=$(RAGE_USER) password=$(RAGE_PASS) dbname=$(RAGE_DB)
PROGRAM=rage
RUN_CMD=OCAMLRUNPARAM='b1' ./$(PROGRAM) "$(SETTINGS)" /etc/rage_passwd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this file /etc/rage_passwd? It doesn't look like this belongs in this commit...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, should probably be called like svcacct_passwd

with _ -> 0
in
List.sort ~cmp (List.dedup (aux [] nRows))
List.sort ~compare (List.dedup_and_sort ~compare (aux [] nRows))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably the List.sort is redundant if we're doing a List.dedup_and_sort.

@@ -70,6 +70,7 @@ let t ~args = object (self)
("%2F","/");("%3F","?" ); ("%3D","="); ("%26","&");
("%25","%");("+"," "); ("%3E",">"); ("%3C","<");
("%3A",":");("&amp;","&");("&quot;","\"");
("%2d","-");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since most of the other matches assume upper-case hex digits, do we need to worry about "%2D"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add both.

@edwintorok
Copy link
Member Author

Thanks for the review, and Happy New Year!
I'll clean up the commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants