update the benchmark results with Julia 1.8 #48

Since we have made Julia 1.6 the new LTS version, it might make sense to update the benchmark results in https://julialang.org/benchmarks/.
Comments
We just need someone willing and able to get all the programs running on a single benchmark system.
Could this instead be set up using GitHub Actions for continuous benchmarking? See https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ for a discussion. It might be a good option, as the performance on a GitHub Actions VM is pretty consistent: if you run all benchmarks within a single action, you are guaranteed the same VM each time and can measure the performance ratios. You could also see the performance comparison across a variety of versions of each language. (This would exclude proprietary languages such as Mathematica.)
A bunch of the benchmarked programs are proprietary, such as Matlab and Mathematica.
It's only those two, right? I think it's very reasonable to exclude proprietary software in a reproducible benchmark; e.g., https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html excludes any proprietary languages. Yes, you would lose a couple of data points, but you would have an always up-to-date benchmark, which I see as significantly more important. In my opinion, the most important comparisons are against C, Rust, and Fortran (+ maybe NumPy), since users of those languages are the ones who would look up speed comparisons - not so much Mathematica users. As long as those are included, we are good. As an alternative option, it seems there are some free versions of MATLAB and Mathematica which are available for GitHub Actions.
I disagree. First, it's going to be boring: any language that doesn't get in your way will be pretty fast. Second, the point of Julia is to be good at two things: easier to use than C and faster to run than Python. That's a point the benchmarks have to make, which, between their source code and raw numbers, they do.
I think that's what benchmarks against fast languages show, no?
Anyway, this is a second-order effect. The more important point is updating the out-of-date benchmarks. Basically, I am saying that I don't think Mathematica/MATLAB should be roadblocks to getting updated results against C/Rust/Fortran/Python.
See #51, which drafts a GitHub workflow for running the suite.
My feeling is that @MilesCranmer is right: having up-to-date benchmarks against just the open languages is better than having them held up, going back multiple Julia versions, in order to include a couple of proprietary languages. I really enjoyed doing the benchmarks back for Julia 1.0, but I haven't been able to keep it up, due to the investment of time to update each language environment (many with their own peculiar set-up and build system), and also COVID, which has kept me working at home without access to campus-locked proprietary licenses. I was hoping to return this past fall, but delta and omicron have kept me from that. So I'm supportive of your effort to do this via a GitHub workflow.
Thanks @johnfgibson. So, with the loss of my sanity, I finally got the workflow running in #51 - it correctly generates the various CSV files. This is in spite of most languages being easy to set up, since there are already GitHub Actions available which stack on the same VM. I therefore greatly empathize with @johnfgibson, knowing that you had to set these up manually each time... The workflow runs for the following languages:
The following benchmarks are not part of the current workflow, for the reasons given below:
I think these excluded languages are lower priority, so I would vote for simply displaying the up-to-date benchmarks with the other languages. Then, if/when the broken ones are fixed, we can turn them back on. Thoughts?
Here are the actual updated benchmarks, copied from the workflow's output. Could these automatically update the website after #51 is merged? It seems like parse_integers got a massive improvement compared to the currently displayed results, which is awesome. matrix_multiply also seems to have put Julia clearly in the lead now:
[benchmark results table from the workflow output]
#51 is merged now 🎉 How do we update the benchmarks webpage?
@MilesCranmer thanks for the amazing work on the CI!
I have taken @sbinet's #27 and added a commit to enable the Go benchmark in #55. I'll try to work on getting Lua working if I get some time later. Edit: got Lua working as well.
Also, do you know of a way to get the system hardware specifications from the CI machine? While it doesn't necessarily matter for the comparison itself, knowing the actual hardware might help interpret the numbers between CI runs.
Nice work! I don't know how to get the hardware specs for a particular workflow. According to the docs, Linux runs always use a 2-core CPU, 7 GB of RAM, and 14 GB of SSD disk space (in a virtual machine), but they don't specify whether the CPU model changes. According to the article at https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/, the times are noisy, so they should only be interpreted relative to C rather than as absolute times.
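One way to at least record what hardware a given run used would be to add a small Julia step that logs it. This is only a minimal sketch using standard-library calls; the exact step and its placement in the workflow are assumptions:

```julia
# Log the runner's hardware from a Julia step so each CI run records the
# CPU/RAM it actually got; standard-library calls only.
using InteractiveUtils  # provides versioninfo()

versioninfo(verbose = true)  # Julia version, OS, CPU details, word size, ...
println("CPU model:   ", Sys.cpu_info()[1].model)
println("CPU threads: ", Sys.CPU_THREADS)
println("Total RAM:   ", round(Sys.total_memory() / 2^30; digits = 1), " GiB")
```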
I see, no worries. Thanks for digging that info up.
The code for the benchmark webpage is located here: https://github.com/JuliaLang/www.julialang.org/blob/main/benchmarks.md. The code used to create the graph, along with the other assets, is located here: https://github.com/JuliaLang/www.julialang.org/tree/main/_assets/benchmarks
I went ahead and updated the plotting code to work with newer package versions and Julia v1.7.2 and, using the benchmark data output from the CI, plotted the following graph. A couple of notes:
- The data used was from the following CI run (#57): https://github.com/JuliaLang/Microbenchmarks/runs/5531819551?check_suite_focus=true
- For some Fortran benchmarks (see #58), the values were interpolated: the ratio of the old Fortran/C benchmark was multiplied by the newer C time in the CI benchmarks.csv file. The old ratio is computed from the data (located here) used to create the current plot on the benchmarks webpage.
- Similarly, the Matlab/Mathematica values were interpolated based on their ratios from the same older benchmark data.
- Since Go and Lua are not included in the CI at the moment, the Go values were interpolated based on the CI run at #55.

Edit: Here is the actual interpolated CSV file I used to plot with:
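For reference, the interpolation described above amounts to something like the following sketch (the file names and the list of interpolated languages are placeholders, not necessarily the exact ones used):

```julia
# For a language missing from the new CI run, keep its old time/C-time ratio
# and scale it by the new C timing:  t_new(lang) = t_old(lang) / t_old(C) * t_new(C)
using CSV, DataFrames

cols = ["language", "benchmark", "time"]
old_df = CSV.read("old_benchmarks.csv", DataFrame; header = cols)  # data behind the old plot
new_df = CSV.read("benchmarks.csv", DataFrame; header = cols)      # output of the CI run

# Old and new C timings, keyed by benchmark name
old_c = Dict(r.benchmark => r.time for r in eachrow(old_df) if r.language == "c")
new_c = Dict(r.benchmark => r.time for r in eachrow(new_df) if r.language == "c")

# Rebuild the missing languages (placeholder list) at their old ratios to C
interp = DataFrame(language = String[], benchmark = String[], time = Float64[])
for r in eachrow(old_df)
    if r.language in ("fortran", "matlab", "mathematica")
        push!(interp, (String(r.language), String(r.benchmark),
                       r.time / old_c[r.benchmark] * new_c[r.benchmark]))
    end
end

CSV.write("interp_benchmarks.csv", vcat(new_df, interp); writeheader = false)
```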
Here is some code to plot the benchmarks with PlotlyJS instead of Gadfly. It allows for interactivity, such as automatically sorting languages based on the selected benchmarks.

# Producing the Julia Microbenchmarks plot
using CSV
using DataFrames
using PlotlyJS
using StatsBase
benchmarks =
    CSV.read("interp_benchmarks.csv", DataFrame; header = ["language", "benchmark", "time"])
# Capitalize and decorate language names from datafile
dict = Dict(
    "c" => "C",
    "fortran" => "Fortran",
    "go" => "Go",
    "java" => "Java",
    "javascript" => "JavaScript",
    "julia" => "Julia",
    "lua" => "LuaJIT",
    "mathematica" => "Mathematica",
    "matlab" => "Matlab",
    "octave" => "Octave",
    "python" => "Python",
    "r" => "R",
    "rust" => "Rust",
);
benchmarks[!, :language] = [dict[lang] for lang in benchmarks[!, :language]]
# Normalize benchmark times by C times
ctime = benchmarks[benchmarks[!, :language] .== "C", :]
benchmarks = innerjoin(benchmarks, ctime, on = :benchmark, makeunique = true)
select!(benchmarks, Not(:language_1))
rename!(benchmarks, :time_1 => :ctime)
benchmarks[!, :normtime] = benchmarks[!, :time] ./ benchmarks[!, :ctime];
plot(
    benchmarks,
    x = :language,
    y = :normtime,
    color = :benchmark,
    mode = "markers",
    Layout(
        xaxis_type = "category",  # plotly's axis-type value is "category"
        xaxis_categoryorder = "mean ascending",
        yaxis_type = "log",
        xaxis_title = "",
        yaxis_title = "",
    ),
)

Plotly doesn't have support for sorting by geometric mean: see https://plotly.com/julia/reference/layout/xaxis/#layout-xaxis-categoryarray and the feature request. This makes it a bit rough for log scales, as the sorting is based on the arithmetic mean. I've been thinking about it, and I believe the plotting code should probably reside in this repo instead of the Julia website codebase; only the final benchmark SVG file should be pushed to the website repo. In terms of the website benchmark page, though, it might be pretty cool to have an embedded Plotly instance for the benchmark graph, similar to what the Plotly docs do. This would allow users to see and sort languages based on whichever benchmark they are most interested in - some extra, nonessential interactivity. Just throwing out some ideas.
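One possible workaround (just a sketch, reusing the `benchmarks` DataFrame and the StatsBase import from the snippet above): compute the geometric-mean ordering yourself and pass it in via `categoryorder = "array"` with an explicit `categoryarray`:

```julia
# Order languages by the geometric mean of their C-normalized times, then tell
# plotly to use exactly that ordering instead of its built-in arithmetic mean.
lang_order = sort(combine(groupby(benchmarks, :language), :normtime => geomean => :gm), :gm)

plot(
    benchmarks,
    x = :language,
    y = :normtime,
    color = :benchmark,
    mode = "markers",
    Layout(
        xaxis_type = "category",
        xaxis_categoryorder = "array",
        xaxis_categoryarray = lang_order.language,  # geometric-mean order computed above
        yaxis_type = "log",
        xaxis_title = "",
        yaxis_title = "",
    ),
)
```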
From #62 (comment)
So basically like on every commit, get the
For sure, GitHub Actions has been a boon.
Yeah... How I'm currently handling this (to get the graph as shown here) is to interpolate the actual timings, based on the ratios from the last known timing data, for those languages we don't have timings for. I'm not sure whether publishing that kind of interpolated data on the JuliaLang website is honest (even with appropriate disclaimers), but I do think that our graph should contain data for those languages, as no other benchmarks do. (I'm personally okay with this myself, though; interpolated data is better than no data.) There are options for CI, as discussed here, and if it comes down to it, I am still a student and have licenses for these commercial languages, so I can try to run the tests myself on local hardware once I fix up tooling PRs such as this one.
While the interpolated data is old, the information in the new graph is very similar to the previous graph. Rust and Julia both overtake Lua, but that's the only significant trend change besides overall improvements in individual benchmarks. Let's try to either 1) use interpolated data or 2) get the commercial software working (in CI or locally). I'm totally fine making PRs to the JuliaLang website with option 1) as a stopgap until we get updated data with 2).
I think it's fine if we do a single manual update to the csv/svg on the website, before automating benchmark updates (which might take a while longer to set up).
For now, it's probably best to leave those languages out by simply not plotting their points. My subjective view is that showing updated but narrower benchmarks is (probably) more useful to users than showing out-of-date but broader benchmarks. We could state: "Languages X, Y, and Z are not included in the latest benchmarks due to licensing issues, but you may view historical benchmarks comparing these languages to an older version of Julia at https://web.archive.org/web/20200807141540/https://julialang.org/benchmarks/". What do you think?
The other approach would be to provide the out-of-date benchmarks for them. I think either would be acceptable.
I disagree with that, because after doing the interpolation the only change (trend-wise) is Rust/Julia vs. Lua. I don't think that is enough to justify dropping many languages. Remember, the interpolation also includes Fortran, not just the closed-source languages. As a user, I don't want to click another link to find the data I want.
Agreed, which is what we are currently doing. Basically, what I'm trying to get at is that we should not update if we're going to do it partially. If we update, we should do it properly.
Wait, by interpolation, you just mean copying the data points from the old graph, right? I think that (keeping the old performance ratios in the plot) is perfectly fine, so long as it is described in the text. I was more thinking about excluding Mathematica/MATLAB from the new plot if their entire benchmark is out of date (but even then, I don't think it's a big deal to copy the old benchmarks). And for languages where there is an issue (like the Fortran compilation issue), not updating them and instead interpolating those specific benchmarks sounds pretty reasonable to me.
I would still prefer not to do this, since comparisons with these languages are rarely seen elsewhere. It helps new Julia users coming from those closed ecosystems see the light ;)
The reason I don't like this is that we would essentially be taking a shortcut in displaying the data, especially since getting this to work is in our hands (compared to the Mathematica/Matlab issue). The onus is on us to fix our benchmarks, instead of sweeping the actual issue under the rug and using older results. In any case, for a decision like this I would like @StefanKarpinski and @ViralBShah to provide the final say.
SGTM! I suppose I agree: having the Mathematica/MATLAB results is really useful for users from those domains of scientific computing. I think as long as the exclusions/interpolations are described in the text, we are fine. I guess the question is: what is the purpose of these benchmarks? Is it a quantitative comparison table for attracting new users, or is it a scientific dataset of performance across languages? If it is the former, including these proprietary languages (even if the numbers are old) is really important to help demonstrate Julia's advantage against all other languages. If it is the latter, then having up-to-date and accurate numbers is most important, even if it means excluding some languages. In reality it's probably a combination, in which case this question is difficult to answer...
Pinging this, as I just noticed the benchmarks page is still showing Julia 1.0.0. Can we put this up soon? I'm linking to the benchmarks in a paper (coming out in two days) as evidence of Julia being as fast as C++/Rust; it would be great if the measurements were up-to-date 🚀
Someone needs to set up all these environments and run the benchmarks. I don't have any of the proprietary software licenses, for example.
Maybe we could just have a second panel of benchmarks on https://julialang.org/benchmarks/?
It has been over 3 years since the last full-scale benchmark, so I don't have high hopes that anybody will get around to doing it soon. But it would be great to display Julia 1.8 benchmarks for all to see, at least somewhere we can link to.
We do have large GitHub Actions runners available in this org, which will help whenever we set that sort of thing up. @MilesCranmer, would it help if you had commit access to the Microbenchmarks repo and this repo so that you can directly edit to your liking? EDIT: Sent you an invite.
Is there a way to see these results: https://github.com/JuliaLang/Microbenchmarks/actions/runs/5567263800/workflow#L103? Did I understand correctly that there would have been a CSV file generated that has since been deleted (because the logs have expired)?
Yes, I believe the logs get deleted, but perhaps we can run it again.