[benchmark] [async] Add more statistics for async benchmark #1747

Merged
merged 3 commits into from
Aug 22, 2020

Conversation

Contributor

@xumingkuan xumingkuan commented Aug 22, 2020

Related issue = #742 #1744

Changes:

  • Record instructions, offloaded_tasks, and launched_kernels in both sync and async modes when running the time benchmark.
  • Unify the time unit to seconds (s), the standard unit. A value such as 0.002 s is harder to read than 2 ms, but we can convert it to a more readable unit when plotting.
  • Rename stat_write_yaml to stat_write.
  • Use ti.benchmark() instead of calling ti.stat_write() directly in mpm2d.py.
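
To illustrate the point about keeping seconds in the stored data and converting only for display, here is a minimal sketch of a formatting helper. The name `format_seconds` is hypothetical and not part of this PR or of Taichi; it only shows one way a plotting script could pick a readable unit.

```python
def format_seconds(seconds: float) -> str:
    """Format a duration stored in seconds using a readable unit.

    Hypothetical helper for plotting scripts; the stored data stays in seconds.
    """
    # Largest unit first, so the first match gives a value in [1, 1000).
    units = [(1.0, "s"), (1e-3, "ms"), (1e-6, "us"), (1e-9, "ns")]
    magnitude = abs(seconds)
    for scale, name in units:
        if magnitude >= scale:
            return f"{seconds / scale:.4g} {name}"
    # Zero, or anything below one nanosecond: report in the smallest unit.
    return f"{seconds / 1e-9:.4g} ns"


if __name__ == "__main__":
    # Sample values taken from the benchmark result in this PR.
    for value in (1.8350934982299805, 0.30919861793518066, 8.978605270385742e-06):
        print(value, "->", format_seconds(value))
```

The raw numbers remain comparable across benchmarks because only the label changes, not the stored value.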

Questions:

  • I'm not sure whether this approach measures the compilation time of sync mode and async mode accurately. Will whichever mode runs later have warmed-up caches and enjoy a performance boost?
  • I need something like ti.core.print_stat() that does not print directly to the screen. I use ti.core.stat() for now, but better names are welcome!

Result (this takes 5 minutes on my laptop because it now runs in async mode as well as sync mode):

compilation_time:
  fill_scalar:
    async:
      x64: 0.0
    sync:
      opengl: 0.2922186851501465
      x64: 0.30919861793518066
  flat_range:
    async:
      x64: 0.02692723274230957
    sync:
      opengl: 0.3640279769897461
      x64: 1.8350934982299805
  flat_struct:
    async:
      x64: 0.008976221084594727
    sync:
      opengl: 0.36504411697387695
      x64: 0.3700106143951416
  memcpy:
    async:
      x64: 0.16655611991882324
    sync:
      x64: 0.6672163009643555
  memset:
    async:
      x64: 0.16455960273742676
    sync:
      x64: 0.5754601955413818
  nested_range:
    async:
      x64: 0.012970209121704102
    sync:
      opengl: 0.35205984115600586
      x64: 0.3839755058288574
  nested_range_blocked:
    async:
      x64: 0.009973287582397461
    sync:
      opengl: 0.3440995216369629
      x64: 0.3630058765411377
  nested_struct:
    async:
      x64: 0.042885541915893555
    sync:
      opengl: 0.3560490608215332
      x64: 0.045877695083618164
  nested_struct_fill_and_clear:
    async:
      x64: 0.09474706649780273
    sync:
      x64: 1.8191373348236084
  nested_struct_listgen_16x16:
    async:
      x64: 0.011969327926635742
    sync:
      opengl: 0.3460886478424072
      x64: 0.35505008697509766
  nested_struct_listgen_8x8:
    async:
      x64: 0.01396322250366211
    sync:
      opengl: 0.34707188606262207
      x64: 0.3620309829711914
  range:
    async:
      x64: 0.000997304916381836
    sync:
      opengl: 1.2267189025878906
      x64: 1.3992376327514648
  root_listgen:
    async:
      x64: 0.01496124267578125
    sync:
      opengl: 0.5096497535705566
      x64: 0.36003661155700684
  saxpy:
    async:
      x64: 0.20445513725280762
    sync:
      x64: 0.8936097621917725
  sscal:
    async:
      x64: 0.16456007957458496
    sync:
      x64: 0.6442744731903076
  struct:
    async:
      x64: 0.0009980201721191406
    sync:
      x64: 1.3274266719818115
instructions:
  fill_scalar:
    async:
      x64: 10
    sync:
      x64: 10
  flat_range:
    async:
      x64: 31
    sync:
      x64: 31
  flat_struct:
    async:
      x64: 17
    sync:
      x64: 17
  memcpy:
    async:
      x64: 15
    sync:
      x64: 15
  memset:
    async:
      x64: 12
    sync:
      x64: 12
  nested_range:
    async:
      x64: 28
    sync:
      x64: 28
  nested_range_blocked:
    async:
      x64: 38
    sync:
      x64: 38
  nested_struct:
    async:
      x64: 39
    sync:
      x64: 39
  nested_struct_fill_and_clear:
    async:
      x64: 59
    sync:
      x64: 59
  nested_struct_listgen_16x16:
    async:
      x64: 32
    sync:
      x64: 32
  nested_struct_listgen_8x8:
    async:
      x64: 32
    sync:
      x64: 32
  range:
    async:
      x64: 2143
    sync:
      x64: 2143
  root_listgen:
    async:
      x64: 32
    sync:
      x64: 32
  saxpy:
    async:
      x64: 24
    sync:
      x64: 24
  sscal:
    async:
      x64: 14
    sync:
      x64: 14
  struct:
    async:
      x64: 2121
    sync:
      x64: 2121
offloaded_tasks:
  fill_scalar:
    async:
      x64: 1
    sync:
      x64: 1
  flat_range:
    async:
      x64: 1
    sync:
      x64: 1
  flat_struct:
    async:
      x64: 1
    sync:
      x64: 1
  memcpy:
    async:
      x64: 1
    sync:
      x64: 1
  memset:
    async:
      x64: 1
    sync:
      x64: 1
  nested_range:
    async:
      x64: 1
    sync:
      x64: 1
  nested_range_blocked:
    async:
      x64: 1
    sync:
      x64: 1
  nested_struct:
    async:
      x64: 1
    sync:
      x64: 1
  nested_struct_fill_and_clear:
    async:
      x64: 7
    sync:
      x64: 7
  nested_struct_listgen_16x16:
    async:
      x64: 1
    sync:
      x64: 1
  nested_struct_listgen_8x8:
    async:
      x64: 1
    sync:
      x64: 1
  range:
    async:
      x64: 16
    sync:
      x64: 16
  root_listgen:
    async:
      x64: 1
    sync:
      x64: 1
  saxpy:
    async:
      x64: 2
    sync:
      x64: 2
  sscal:
    async:
      x64: 1
    sync:
      x64: 1
  struct:
    async:
      x64: 16
    sync:
      x64: 16
running_time:
  fill_scalar:
    async:
      x64: 8.978605270385742e-06
    sync:
      opengl: 3.9891481399536135e-05
      x64: 8.973836898803712e-06
  flat_range:
    async:
      x64: 0.029679227556501115
    sync:
      opengl: 0.001077120644705636
      x64: 0.029739067895071847
  flat_struct:
    async:
      x64: 0.010793146133422851
    sync:
      opengl: 0.0009813337326049804
      x64: 0.012237285137176514
  memcpy:
    async:
      x64: 0.16884851455688477
    sync:
      x64: 0.16864922046661376
  memset:
    async:
      x64: 0.16844961643218995
    sync:
      x64: 0.17154133319854736
  nested_range:
    async:
      x64: 0.013238614320755005
    sync:
      opengl: 0.001284564971923828
      x64: 0.013256561279296876
  nested_range_blocked:
    async:
      x64: 0.01028251051902771
    sync:
      opengl: 0.009497085809707642
      x64: 0.010276278257369995
  nested_struct:
    async:
      x64: 0.04715726613998413
    sync:
      opengl: 0.001052863597869873
      x64: 0.04707415421803792
  nested_struct_fill_and_clear:
    async:
      x64: 0.09620947043100993
    sync:
      x64: 0.09634244441986084
  nested_struct_listgen_16x16:
    async:
      x64: 0.012693208966936384
    sync:
      opengl: 0.0010600205830165318
      x64: 0.012566406045641219
  nested_struct_listgen_8x8:
    async:
      x64: 0.015094646453857422
    sync:
      opengl: 0.0010491974353790284
      x64: 0.015016855001449584
  range:
    async:
      x64: 0.001197049915790558
    sync:
      opengl: 2.718234062194824e-05
      x64: 0.0011509231925010681
  root_listgen:
    async:
      x64: 0.015114593207836152
    sync:
      opengl: 0.0011319747567176818
      x64: 0.014946293234825134
  saxpy:
    async:
      x64: 0.20924067497253418
    sync:
      x64: 0.20894131660461426
  sscal:
    async:
      x64: 0.1740346908569336
    sync:
      x64: 0.1723392963409424
  struct:
    async:
      x64: 0.0011940555572509766
    sync:
      x64: 0.0011746039986610412
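
As a rough way to read the running_time section above, the sketch below computes the async/sync ratio per benchmark on x64. The dictionary copies three entries from the result verbatim; the function name `async_over_sync` is hypothetical and only assumes the nested benchmark/mode/arch layout shown above.

```python
# Three running_time entries copied from the result above (seconds).
running_time = {
    "memset": {
        "async": {"x64": 0.16844961643218995},
        "sync": {"x64": 0.17154133319854736},
    },
    "saxpy": {
        "async": {"x64": 0.20924067497253418},
        "sync": {"x64": 0.20894131660461426},
    },
    "range": {
        "async": {"x64": 0.001197049915790558},
        "sync": {"x64": 0.0011509231925010681},
    },
}


def async_over_sync(stats, arch="x64"):
    """Return {benchmark: async_time / sync_time} for one architecture.

    Benchmarks missing either mode for `arch`, or with a zero time,
    are skipped rather than reported.
    """
    ratios = {}
    for case, modes in stats.items():
        sync_t = modes.get("sync", {}).get(arch)
        async_t = modes.get("async", {}).get(arch)
        if sync_t and async_t:
            ratios[case] = async_t / sync_t
    return ratios


if __name__ == "__main__":
    for case, ratio in sorted(async_over_sync(running_time).items()):
        print(f"{case}: {ratio:.3f}")
```

A ratio below 1 means async was faster for that benchmark; for these three entries the ratios stay within a few percent of 1.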




codecov bot commented Aug 22, 2020

Codecov Report

Merging #1747 into master will increase coverage by 0.80%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #1747      +/-   ##
==========================================
+ Coverage   43.35%   44.16%   +0.80%     
==========================================
  Files          44       44              
  Lines        6041     5858     -183     
  Branches     1044     1049       +5     
==========================================
- Hits         2619     2587      -32     
+ Misses       3270     3120     -150     
+ Partials      152      151       -1     
Impacted Files Coverage Δ
python/taichi/lang/__init__.py 48.33% <0.00%> (-3.16%) ⬇️
python/taichi/lang/ast_checker.py 70.58% <0.00%> (-1.64%) ⬇️
python/taichi/testing.py 77.41% <0.00%> (-0.71%) ⬇️
python/taichi/lang/linalg.py 89.33% <0.00%> (-0.67%) ⬇️
python/taichi/lang/meta.py 62.31% <0.00%> (-0.54%) ⬇️
python/taichi/misc/util.py 17.17% <0.00%> (-0.30%) ⬇️
python/taichi/misc/gui.py 0.00% <0.00%> (ø)
python/taichi/misc/task.py 0.00% <0.00%> (ø)
python/taichi/lang/shell.py 0.00% <0.00%> (ø)
... and 10 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4cb7a4d...5e23b00.

Member

@yuanming-hu yuanming-hu left a comment


Cool! LGTM.

@yuanming-hu
Member

  • I'm not sure whether this approach measures the compilation time of sync mode and async mode accurately. Will whichever mode runs later have warmed-up caches and enjoy a performance boost?

I guess measuring compilation time will be hard, given that async mode compiles in parallel. A better metric might be something like "the time for the first substep to finish". In any case, the benchmarks can focus on run time for now.
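
A minimal sketch of the "time for the first substep to finish" idea, in plain Python. Everything here is hypothetical: `time_first_and_steady` and `FakeSubstep` are illustrative names, and the fake substep simulates a one-off compilation cost with `time.sleep` rather than calling Taichi. With an asynchronous backend, `step` would also need to block until the work has actually finished for the numbers to mean anything.

```python
import time


def time_first_and_steady(step, repeats=10):
    """Return (first_call_seconds, mean_later_call_seconds) for `step`.

    The first call includes any one-off cost such as compilation;
    later calls approximate the steady-state run time.
    Assumes `step()` returns only after its work is complete.
    """
    start = time.perf_counter()
    step()
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        step()
    steady = (time.perf_counter() - start) / repeats
    return first, steady


class FakeSubstep:
    """Stand-in for a real substep: slow on the first call only."""

    def __init__(self):
        self.compiled = False

    def __call__(self):
        if not self.compiled:
            time.sleep(0.05)  # simulated one-off compilation cost
            self.compiled = True
        time.sleep(0.001)  # simulated per-call work


if __name__ == "__main__":
    first, steady = time_first_and_steady(FakeSubstep(), repeats=5)
    print(f"first call: {first:.4f} s, steady state: {steady:.4f} s")
```

Running each mode in a fresh process would be one way to keep a later run from benefiting from warmed-up caches, though that is an assumption about the setup and not something this PR does.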

  • I need something like ti.core.print_stat() that does not print directly to the screen. I use ti.core.stat() for now, but better names are welcome!

stat sounds good. We can stick with it for now and refactor systematically later.
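
For context on the stat versus print_stat distinction, here is a small sketch of the general pattern: one method returns the collected counters as data, and printing is layered on top of it. `StatCollector` and its methods are hypothetical and do not reflect how ti.core implements either function.

```python
from collections import Counter


class StatCollector:
    """Hypothetical counter store separating data access from printing."""

    def __init__(self):
        self._counters = Counter()

    def add(self, key, value=1):
        """Accumulate `value` under `key`."""
        self._counters[key] += value

    def stat(self):
        """Return a copy of the counters so callers can post-process them."""
        return dict(self._counters)

    def print_stat(self):
        """Print the same data that stat() returns, one counter per line."""
        for key, value in sorted(self.stat().items()):
            print(f"{key}: {value}")


if __name__ == "__main__":
    collector = StatCollector()
    collector.add("launched_kernels")
    collector.add("offloaded_tasks", 7)
    collector.print_stat()
```

Returning a plain dictionary lets a benchmark script write the values to a file or compare modes without parsing printed output.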


3 participants