HighPerformance_Python_Julia.txt (1251 lines, 41.2 KB)
# High Performance (Multiprocess/Multithreading) for Python and Julia
## No.:
## Title:
## https links:
[Julia Lang part]
1.
What scientists must know about hardware to write fast code
https://github.com/jakobnissen/hardware_introduction
https://viralinstruction.com/posts/hardware/
2.
Julia for Economists Bootcamp, 2022
https://github.com/cpfiffer/julia-bootcamp-2022
https://github.com/cpfiffer/julia-bootcamp-2022#session-2-parallelization
https://www.youtube.com/watch?v=trhsvOAH0YI
https://github.com/cpfiffer/julia-bootcamp-2022/blob/main/session-2/parallelization-lecture.ipynb
https://github.com/cpfiffer/julia-bootcamp-2022#session-4-high-performance-julia
https://youtu.be/i35LlZWZl1g
https://github.com/cpfiffer/julia-bootcamp-2022/blob/main/session-4/speed-lecture.ipynb
3.
Hands-On Design Patterns and Best Practices with Julia
https://github.com/PacktPublishing/Hands-on-Design-Patterns-and-Best-Practices-with-Julia
4.
A quick introduction to data parallelism in Julia
https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/
5.
Parallelization
https://enccs.github.io/Julia-for-HPC/parallelization/
GPU programming
https://enccs.github.io/Julia-for-HPC/GPU/
6.
[ANN] Folds.jl: threaded, distributed, and GPU-based high-level data-parallel interface for Julia
https://discourse.julialang.org/t/ann-folds-jl-threaded-distributed-and-gpu-based-high-level-data-parallel-interface-for-julia/54701/3
https://github.com/JuliaFolds/ParallelMagics.jl
https://github.com/JuliaFolds
7.
Announcing composable multi-threaded parallelism in Julia
https://julialang.org/blog/2019/07/multithreading/
8.
Using Julia
https://www.carc.usc.edu/user-information/user-guides/software-and-programming/julia
# Parallel programming with Julia
Package: Purpose
Base.Threads: for explicit multi-threading
Distributed: for explicit multi-processing
MPI.jl: for interfacing to MPI libraries
DistributedArrays.jl: for working with distributed arrays
Elemental.jl: for distributed linear algebra
ClusterManagers.jl: for launching jobs via cluster job schedulers (e.g., Slurm)
Dagger.jl: for asynchronous evaluations and workflows
CUDA.jl: for interfacing to Nvidia CUDA GPUs
9.
ML ⇌ Science Colaboratory's workshop Introduction to Machine Learning.
Supervised learning: One step at a time
https://github.com/mlcolab/IntroML.jl/blob/main/notebooks/supervised_learning.jl
https://mlcolab.github.io/IntroML.jl/dev/supervised_learning.html
10.
The Enzyme High-Performance Automatic Differentiator of LLVM
https://github.com/EnzymeAD/Enzyme
High-performance automatic differentiation of LLVM.
https://github.com/EnzymeAD/Enzyme.jl
Julia bindings for the Enzyme automatic differentiator
https://github.com/EnzymeAD/oxide-enzyme
Enzyme integration into Rust. Experimental, do not use.
11.
Julia v1.7 Release Notes
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#julia-v17-release-notes
Multidimensional Array Literals
https://julialang.org/blog/2021/11/julia-1.7-highlights/#multidimensional_array_literals
https://github.com/JuliaLang/julia/issues/39285
https://github.com/JuliaLang/julia/issues/45461
https://docs.julialang.org/en/v1/base/arrays/#Base.hvncat
https://github.com/JuliaLang/julia/blob/7e54f9a069df2b382f765d5574787293c816fe26/base/abstractarray.jl#L2119
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#language-changes-1
Multiple successive semicolons in an array expression were previously ignored (e.g., [1 ;; 2] == [1 ; 2]). This syntax is now used to separate dimensions (see New language features).
v1 = [1, 2] # 2-element Vector{Int64}:
v2 = [3, 4] # 2-element Vector{Int64}:
[v1, v2] # 2-element Vector{Vector{Int64}}:
# [1, 2]
# [3, 4]
[v1; v2] # 4-element Vector{Int64}:
# 1
# 2
# 3
# 4
[v1;; v2] # 2×2 Matrix{Int64}:
# 1 3
# 2 4
[1,2,3 ;; 4,5,6] # syntax: unexpected semicolon in array expression
[1;2;3 ;;4;5;6] #3×2 Matrix{Int64}:
# 1 4
# 2 5
# 3 6
Meta.@lower [1 2;;; 3 4]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Core.tuple(1, 2, 2)
│ %2 = Base.hvncat(%1, true, 1, 2, 3, 4)
└── return %2
))))
Meta.@lower [v1;;v2]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.hvncat(2, v1, v2)
└── return %1
))))
Meta.@lower [v1;v2]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.vcat(v1, v2)
└── return %1
))))
Meta.@lower [v1, v2]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.vect(v1, v2)
└── return %1
))))
Meta.@lower[1 2;;
3 4]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.hcat(1, 2, 3, 4)
└── return %1
))))
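Since these notes cover both languages, here is a rough Python analogue of the vcat/hcat distinction above, using plain lists (illustrative only; for arrays, NumPy's vstack/column_stack play the same role):

```python
# Rough Python analogue of Julia's [v1; v2] (vcat) vs [v1;; v2] (hcat).
v1 = [1, 2]
v2 = [3, 4]

vcat = v1 + v2                              # like [v1; v2] -> [1, 2, 3, 4]
hcat = [list(row) for row in zip(v1, v2)]   # like [v1;; v2] -> 2x2, vectors as columns

assert vcat == [1, 2, 3, 4]
assert hcat == [[1, 3], [2, 4]]
```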
Property Destructuring
https://julialang.org/blog/2021/11/julia-1.7-highlights/#property_destructuring
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#new-language-features-1
(; a, b) = x can now be used to destructure properties a and b of x. This syntax is equivalent to a = getproperty(x, :a); b = getproperty(x, :b)
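There is no direct Python syntax for this; as a sketch for the Python side of these notes, the closest stdlib idiom is operator.attrgetter (the Point type below is purely illustrative):

```python
# Rough Python analogue of Julia's property destructuring `(; a, b) = x`.
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Point:
    a: int
    b: int

x = Point(a=1, b=2)
a, b = attrgetter("a", "b")(x)  # equivalent to: a = x.a; b = x.b
```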
New features coming in Julia 1.7
https://lwn.net/Articles/871486/
https://julialang.org/blog/2021/11/julia-1.7-highlights/
12.
Concurrency in Julia
https://lwn.net/Articles/875367/
The Julia programming language has its roots in high-performance scientific computing,
so it is no surprise that it has facilities for concurrent processing.
Those features are not well-known outside of the Julia community,
though, so it is interesting to see the different types of parallel and concurrent computation that the language supports.
In addition, the upcoming release of Julia version 1.7 brings an improvement to the language's concurrent-computation palette,
in the form of "task migration".
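For the Python half of these notes, a minimal sketch of the corresponding thread/process distinction (workload and pool size are illustrative): in CPython the GIL serializes bytecode execution, so threads mainly help I/O-bound tasks, while CPU-bound parallelism needs separate processes.

```python
# CPython sketch: the same map API works for threads and processes,
# but only processes sidestep the GIL for CPU-bound work.
from concurrent.futures import ThreadPoolExecutor

def task(n: int) -> int:
    # CPU-bound toy workload; the GIL prevents threads from speeding
    # this up, but the pool interface is identical either way.
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, [10_000] * 4))

# For CPU-bound speedups, swap in ProcessPoolExecutor (same interface),
# which uses separate worker processes instead of threads.
```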
13.
libblastrampoline + MKL.jl
https://julialang.org/blog/2021/11/julia-1.7-highlights/#libblastrampoline_mkljl
Julia v1.7 introduces a new BLAS demuxing library called libblastrampoline (LBT),
that provides a flexible and efficient way to switch the backing BLAS library at runtime.
Because the BLAS/LAPACK API is "pure" (e.g. each BLAS/LAPACK invocation is separate from any other;
there is no carryover state from one API call to another) it is possible to switch
which BLAS backend actually services a particular client API call, such as a DGEMM call for
a Float64 Matrix-Matrix multiplication. This statelessness enables us to easily switch from one BLAS backend
to another without needing to modify client code, and combining this with a flexible wrapper implementation,
we are able to provide a single, coherent API that automatically adjusts for a variety of BLAS/LAPACK providers
across all the platforms that Julia itself supports.
14.
https://runebook.dev/en/docs/julia/-index-
https://runebook.dev
15.
Asymptotic theory seen through Julia (convergence of non-random sequences) + Real Analysis (Convergence & Bounded Series)
https://zenn.dev/hessihan/articles/26144a5ffb932a
INTRODUCTION TO THE CONVERGENCE OF SEQUENCES
https://math.uchicago.edu/~may/REU2015/REUPapers/Lytle.pdf
Convergent Sequences
https://users.math.msu.edu/users/zhan/Notes1.pdf
Math 320, Section 4: Analysis I
https://users.math.msu.edu/users/zhan/MTH320.html
Real Analysis Oral Exam study notes
http://www.math.toronto.edu/mnica/oral/real_notes.pdf
16.
Julia lang Garbage Collection
How much do collections of allocated objects cost?
https://bkamins.github.io/julialang/2021/06/11/vecvec.html
Julia gc.c
https://github.com/JuliaLang/julia/blob/master/src/gc.c
On the garbage collection
https://discourse.julialang.org/t/on-the-garbage-collection/35695
https://discourse.julialang.org/t/on-the-garbage-collection/35695/8
Details about Julia’s Garbage Collector, Reference Counting?
https://discourse.julialang.org/t/details-about-julias-garbage-collector-reference-counting/18021/3
17.
Julia Learning Circle: JIT and Method Invalidations
https://wesselb.github.io/2020/11/07/julia-learning-circle-meeting-1.html
18.
Solutions For High-Dimensional Statistics
https://wesselb.github.io/2020/08/21/high-dimensional-statistics.html
High-Dimensional Statistics A Non-Asymptotic Viewpoint - Martin J. Wainwright, University of California, Berkeley
https://www.cambridge.org/core/books/highdimensional-statistics/8A91ECEEC38F46DAB53E9FF8757C7A4E
DOI:https://doi.org/10.1017/9781108627771
https://high-dimensional-statistics.github.io
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/30AF7B572184787F4C99715838549721/9781108498029c2_21-57.pdf/basic_tail_and_concentration_bounds.pdf
18-1.
High-Dimensional Probability - An Introduction with Applications in Data Science
Roman Vershynin
https://www.math.uci.edu/~rvershyn/
https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html#
https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf
19.
Julia Learning Circle: Memory Allocations and Garbage Collection
https://wesselb.github.io/2020/11/23/julia-learning-circle-meeting-2.html
Julia Learning Circle: Generated Functions
https://wesselb.github.io/2020/12/13/julia-learning-circle-meeting-3.html
20.
JuliaNotes.jl
https://m3g.github.io/JuliaNotes.jl/stable/memory/
Vector{Int} <: Vector{Real} is false??
https://m3g.github.io/JuliaNotes.jl/stable/typevariance/
Assignment and mutation
https://m3g.github.io/JuliaNotes.jl/stable/assignment/
Workflows for developing effectively in Julia
https://m3g.github.io/JuliaNotes.jl/stable/workflow/
21.
Julia v1.8 Release Notes
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#julia-v18-release-notes
Compiler/Runtime improvements
21-1. libjulia-codegen
The LLVM-based compiler has been separated from the run-time library into a new library, libjulia-codegen. It is loaded by default, so normal usage should see no changes. In deployments that do not need the compiler (e.g. system images where all needed code is precompiled), this library (and its LLVM dependency) can simply be excluded (#41936).
https://github.com/JuliaLang/julia/issues/41936
Unreasonably large executable size from create_app #660
https://github.com/JuliaLang/PackageCompiler.jl/issues/660
This is expected. Julia currently has no good way to run code without these supporting libraries. But there is work in progress of trying to improve this.
21-2. Base.@assume_effects macro
Inference now tracks various effects such as side-effectful-ness and nothrow-ness on a per-specialization basis. Code heavily dependent on constant propagation should see significant compile-time performance improvements and certain cases (e.g. calls to uninlinable functions that are nevertheless effect free) should see runtime performance improvements. Effects may be overwritten manually with the Base.@assume_effects macro (#43852).
https://github.com/JuliaLang/julia/issues/43852
https://github.com/JuliaLang/julia/commit/ef4220533d4a9a887b199362e37de0e056c1a458
improve concrete-foldability of core math functions #45613
https://github.com/JuliaLang/julia/pull/45613
Matrix{Int} \ Vector{Float32} is type-unstable #45696
https://github.com/JuliaLang/julia/issues/45696
Linear system solve promotes Float32 to Float64 #1041
https://github.com/JuliaArrays/StaticArrays.jl/issues/1041
21-3.
Bootstrapping time has been improved by about 25% (#41794).
https://github.com/JuliaLang/julia/issues/41794
22.
ENGR108: Introduction to Matrix Methods (Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares)
https://stanford.edu/class/engr108/
I just want to know how to get the inverse of a matrix using Julia
https://stanford.edu/class/engr108/lectures/julia_inverses_slides.pdf
https://stanford.edu/class/engr108/lectures/
https://stanford.edu/class/engr108/lectures/julia_least_squares_slides.pdf
https://stanford.edu/class/engr108/lectures/julia_vectors_slides.pdf
https://stanford.edu/class/engr108/lectures/julia_matrices_slides.pdf
23.
Language introspection
https://juliateachingctu.github.io/Scientific-Programming-in-Julia/dev/lecture_06/lecture/
JuliaTeachingCTU/ Scientific-Programming-in-Julia
https://github.com/JuliaTeachingCTU/Scientific-Programming-in-Julia/blob/master/docs/src/index.md#
This repository contains all the course materials for the master course Scientific Programming in Julia
taught at the Czech Technical University in Prague.
You can find more information on the official course website.
https://juliateachingctu.github.io/Scientific-Programming-in-Julia/stable/
Stages of compilation
Julia (like any modern compiler) uses several stages to convert source code to native code. Let's recap them:
parsing the source code into an abstract syntax tree (AST)
lowering the abstract syntax tree to static single assignment form (SSA), see wiki
assigning types to variables and performing type inference on called functions
lowering the typed code to the LLVM intermediate representation (LLVM IR)
using the LLVM compiler to produce native code.
function nextfib(n)
a, b = one(n), one(n)
while b < n
a, b = b, a + b
end
return b
end
Meta.parse(
""" function nextfib(n)
a, b = one(n), one(n)
while b < n
a, b = b, a + b
end
return b
end
""")
# For inserted debugging information, there is an option to pass keyword argument debuginfo=:source.
@code_lowered debuginfo=:source nextfib(3)
@code_lowered nextfib(3)
@code_typed nextfib(3)
@code_warntype nextfib(3)
@code_llvm debuginfo=:source nextfib(3)
@code_llvm nextfib(3)
@code_native debuginfo=:source nextfib(3)
@code_native nextfib(3)
@time
@allocated
@which
using BenchmarkTools
@btime
24.
JuliaTeachingCTU/ Julia-for-Optimization-and-Learning
https://github.com/JuliaTeachingCTU/Julia-for-Optimization-and-Learning
https://github.com/JuliaTeachingCTU/Julia-for-Optimization-and-Learning/tree/master/docs/src
https://juliateachingctu.github.io/Julia-for-Optimization-and-Learning/stable/
What will we emphasize?
The main goals of the course are the following:
You will learn the connections between theory and coding. There are many lectures which teach either only theory or only coding. We will show you both.
You will learn how to code efficiently. We will teach you to split the code into small parts which are simpler to debug or optimize. We will often show you several writing possibilities and comment on the differences.
You will learn about machine learning and neural networks. You will understand neural networks by writing a simple one from scratch. Then you will learn how to use packages to write simple code for complicated networks.
You will learn independence. The problem formulation of many exercises is deliberately general, which simulates real situations where no step-by-step procedure is provided.
Type system and generic programming
https://juliateachingctu.github.io/Julia-for-Optimization-and-Learning/stable/lecture_06/compositetypes/
Optimization
https://juliateachingctu.github.io/Julia-for-Optimization-and-Learning/stable/lecture_08/theory/
25.
Julia Parallel, Multithreading, Multiprocess
A quick introduction to data parallelism in Julia
https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/
Parallelization
https://enccs.github.io/Julia-for-HPC/parallelization/
Julia for Economists - Parallelization for Fun and Profit - Cameron Pfiffer ([email protected])
https://github.com/cpfiffer/julia-bootcamp-2022/blob/main/session-2/parallelization-lecture.ipynb
https://github.com/JuliaParallel/Dagger.jl
A framework for out-of-core and parallel computing
At the core of Dagger.jl is a scheduler heavily inspired by Dask.
It can run computations represented as directed-acyclic-graphs (DAGs) efficiently on many Julia worker processes and threads,
as well as GPUs via DaggerGPU.jl.
https://github.com/JuliaParallel/DTables.jl
DTable – an early performance assessment of a new distributed table implementation
https://julialang.org/blog/2021/12/dtable-performance-assessment/
https://juliaparallel.org/Dagger.jl/stable/dtable/
The DTable, or "distributed table", is an abstraction layer on top of Dagger that allows loading table-like structures into a distributed environment.
The main idea is that a Tables.jl-compatible source provided by the user gets partitioned into several parts and stored as Chunks.
These can then be distributed across worker processes by the scheduler as operations are performed on the containing DTable.
https://github.com/JuliaParallel/DistributedArrays.jl
Distributed arrays for Julia.
DistributedArrays.jl uses the stdlib Distributed to implement a Global Array interface.
A DArray is distributed across a set of workers.
Each worker can read and write from its local portion of the array and each worker has read-only access to
the portions of the array held by other workers.
https://github.com/JuliaParallel/MPI.jl
This provides a Julia interface to the Message Passing Interface (MPI), roughly inspired by mpi4py.
"Julia at Scale" topic on the Julia Discourse
https://discourse.julialang.org/c/domain/parallel/34
https://github.com/JuliaParallel/Elemental.jl
A package for dense and sparse distributed linear algebra and optimization.
The underlying functionality is provided by the C++ library Elemental written originally by Jack Poulson and now maintained by LLNL.
A Julia interface to Apache Spark™
http://dfdx.github.io/Spark.jl/dev/
https://github.com/dfdx/Spark.jl
https://spark.apache.org/downloads.html
https://github.com/JuliaFolds/Folds.jl
Folds.jl provides a unified interface for sequential, threaded, and distributed folds.
[ANN] Folds.jl: threaded, distributed, and GPU-based high-level data-parallel interface for Julia
https://discourse.julialang.org/t/ann-folds-jl-threaded-distributed-and-gpu-based-high-level-data-parallel-interface-for-julia/54701
https://github.com/JuliaFolds/FLoops.jl
FLoops.jl provides a macro @floop. It can be used to generate a fast generic sequential and parallel iteration over complex collections.
https://github.com/JuliaFolds/ParallelMagics.jl
ParallelMagics.jl aims to provide safe parallelism to Julia programmers:
"no-brainer" parallelism using compiler analysis; i.e., the code is parallelized only if the compiler can guarantee its safety.
https://github.com/JuliaFolds/FoldsCUDA.jl
FoldsCUDA.jl provides Transducers.jl-compatible fold (reduce) implemented using CUDA.jl.
This brings the transducers and reducing function combinators implemented in Transducers.jl to GPU.
Furthermore, using FLoops.jl, you can write parallel for loops that run on GPU.
https://github.com/JuliaFolds/Transducers.jl
Transducers.jl provides composable algorithms on "sequence" of inputs. They are called transducers, first introduced in Clojure language by Rich Hickey.
A quick introduction to data parallelism in Julia
https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/
https://juliafolds.github.io/Transducers.jl/dev/
26.
educational materials for MIT math courses
https://github.com/mitmath
MIT IAP short course: Matrix Calculus for Machine Learning and Beyond
https://github.com/mitmath/matrixcalc
18.330 Introduction to Numerical Analysis
https://github.com/mitmath/18330
18.335 - Introduction to Numerical Methods course
https://github.com/mitmath/18335
18.337J/6.338J: Parallel Computing and Scientific Machine Learning
https://github.com/mitmath/18337
18.S096 Special Subject in Mathematics: Applications of Scientific Machine Learning
https://github.com/mitmath/18S096SciML
27.
Julia Transpose
https://github.com/mitmath/1806/blob/master/notes/Matrix-mult-perspectives.ipynb
To get a row vector we must transpose the slice A[1,:]. In linear algebra, the transpose of a vector x is usually denoted xᵀ. In Julia, the transpose is transpose(x) (pre-1.0 Julia used the x.' syntax, which has since been removed).
If we just write x', it is the complex-conjugate of the transpose, sometimes called the adjoint, often denoted xᴴ (in matrix textbooks), x* (in pure math), or x† (in physics). For real-valued vectors (no complex numbers), the conjugate transpose is the same as the transpose, and correspondingly we usually just write x' for real vectors.
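For the Python side, the same transpose-vs-adjoint distinction can be sketched in pure Python (NumPy spells it x.T vs x.conj().T); the helper functions below are illustrative, not from any linked material:

```python
# Transpose vs conjugate-transpose (adjoint) on nested-list "matrices".
def transpose(m):
    # Swap rows and columns.
    return [list(col) for col in zip(*m)]

def adjoint(m):
    # Swap rows and columns AND conjugate each entry.
    return [[z.conjugate() for z in col] for col in zip(*m)]

z = [[1 + 2j, 3 - 1j]]                      # a 1x2 complex row vector
assert transpose(z) == [[1 + 2j], [3 - 1j]]
assert adjoint(z) == [[1 - 2j], [3 + 1j]]

r = [[1.0, 2.0]]                            # real entries: the two coincide
assert transpose(r) == adjoint(r)
```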
Revert "add 'ᵀ postfix operator for transpose (#38062)" #40075
https://github.com/JuliaLang/julia/pull/40075
This reverts commit 665279a.
There has been some discussion about whether #38062 was such a good idea in hindsight (#40070, #38062 (comment)).
It might make sense to go back on this feature to give us some more time to think about it, before locking it in into 1.6.
Fixes #40070
28.
Julia With Calculus
https://github.com/jverzani/CalculusWithJulia.jl
https://jverzani.github.io/CalculusWithJuliaNotes.jl/dev/
http://mth229.github.io
http://mth229.github.io
https://www.math.csi.cuny.edu/Computing/matlab/Projects/MTH229/Mth229_Julia_Projects.pdf
https://github.com/mth229
https://www.math.csi.cuny.edu/Computing/matlab/Projects/MTH229/
29.
Julia with QuantEcon
https://julia.quantecon.org/intro.html
9.) Solvers, Optimizers, and Automatic Differentiation
Tools and Techniques
14.) Geometric Series for Elementary Economics
15.) Linear Algebra
16.) Orthogonal Projections and Their Applications
17.) LLN and CLT
18.) Linear State Space Models
19.) Finite Markov Chains
20.) Continuous State Markov Chains
21.) A First Look at the Kalman Filter
22.) Numerical Linear Algebra and Factorizations
23.) Krylov Methods and Matrix Conditioning
30.
Julia Arrays, Stack & Heap, slice, copy, shallow copy, deepcopy
Julialang functions: ismutable, isbits, objectid, eachindex, axes, eachrow, \xor,
Python functions: id, obj.copy(), import copy as cp cp.copy(), cp.deepcopy()
What scientists must know about hardware to write fast code - Jakob Nybo Nissen
https://biojulia.net/post/hardware/
https://docs.julialang.org/en/v1/devdocs/offset-arrays/
replace many uses of size with axes
replace 1:length(A) with eachindex(A), or in some cases LinearIndices(A)
replace explicit allocations like Array{Int}(undef, size(B)) with similar(Array{Int}, axes(B))
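These eachindex/axes tips have a loose Python parallel: prefer iterating the container itself over manual range(len(...)) index arithmetic. A small illustrative sketch:

```python
# Python parallel of "replace 1:length(A) with eachindex(A)":
# iterate the container directly instead of computing indices by hand.
a = [10, 20, 30]

manual = [a[i] * 2 for i in range(len(a))]   # index arithmetic, like 1:length(A)
idiomatic = [v * 2 for v in a]               # direct iteration, like eachindex

assert manual == idiomatic == [20, 40, 60]
```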
Julia arrays
https://danmackinlay.name/notebook/julia_arrays.html
Multi-dimensional Arrays
https://docs.julialang.org/en/v1/manual/arrays/
Copying Arrays in Julia
http://www.cristinagreen.com/copying-arrays-in-julia.html
What is the difference between copy() and deepcopy()?
https://discourse.julialang.org/t/what-is-the-difference-between-copy-and-deepcopy/3918/2
b = deepcopy(a) keeps unwrapping any mutables inside of ‘a’ until it reaches all the immutables at all the levels, and copies all the data and structure of the old object to a new object.
b = a copies ‘a’ by reference, so ‘b’ and ‘a’ refer to the same object. Therefore b.field1 = 2 makes a.field1 == 2 true.
Explanation of Deep and Shallow Copying
https://www.cs.utexas.edu/~scottm/cs307/handouts/deepCopying.htm
A shallow copy can be made by simply copying the reference.
A deep copy means actually creating a new array and copying over the values.
Python - shallow copy vs. deep copy
https://ithelp.ithome.com.tw/articles/10221255
A shallow copy copies only the addresses of the elements inside the container.
A deep copy makes a complete duplicate: the container and the elements it holds all get new addresses.
Plain copying
Three ways (all shallow copies):
b = list(a)
b = a[:]
b = a.copy() (PS: shallow copy)
The difference between shallow copy and deep copy
The key difference is whether the copied variable contains mutable types.
#%% Shallow copy and deep copy
import copy
a = [1, [2,3]]
a_ref = a
a_shallowcopy = copy.copy(a)
a_deepcopy = copy.deepcopy(a)
a[0] = 4
Here a[0] is a number, an immutable type, so shallow and deep copies behave the same.
a[1][1] = 5
Here a[1][1] sits inside a list, a mutable type; the shallow copy is changed while the deep copy is not. From this we learn:
at the first level, both shallow and deep copies already point to different memory than the original.
BUT!!!
at the second level, a shallow copy still points to the same memory as the original variable,
while a deep copy points to different memory at the second level as well.
A deep copy (deepcopy) creates a completely independent copy.
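The snippets above can be assembled into one runnable check of these claims (variable names are illustrative):

```python
# Runnable check of the shallow- vs deep-copy behavior described above.
import copy

a = [1, [2, 3]]
a_ref = a                      # reference: same object
a_shallow = copy.copy(a)       # new outer list, shared inner list
a_deep = copy.deepcopy(a)      # fully independent duplicate

a[0] = 4                       # immutable element: neither copy is affected
a[1][1] = 5                    # mutable inner list: the shallow copy sees it

assert a_ref is a                  # a_ref is just another name for a
assert a_shallow == [1, [2, 5]]    # first level independent, second level shared
assert a_deep == [1, [2, 3]]       # completely independent
```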
31.
Julia Performance Tips
https://docs.julialang.org/en/v1/manual/performance-tips/
More dots: Fuse vectorized operations
Julia has a special dot syntax that converts any scalar function into a "vectorized" function call, and any operator into a "vectorized" operator, with the special property that nested "dot calls" are fusing: they are combined at the syntax level into a single loop, without allocating temporary arrays. If you use .= and similar assignment operators, the result can also be stored in-place in a pre-allocated array (see above).
In a linear-algebra context, this means that even though operations like vector + vector and vector * scalar are defined, it can be advantageous to instead use vector .+ vector and vector .* scalar because the resulting loops can be fused with surrounding computations. For example, consider the two functions:
julia> f(x) = 3x.^2 + 4x + 7x.^3;
julia> fdot(x) = @. 3x^2 + 4x + 7x^3 # equivalent to 3 .* x.^2 .+ 4 .* x .+ 7 .* x.^3;
Both f and fdot compute the same thing. However, fdot (defined with the help of the @. macro) is significantly faster when applied to an array:
julia> x = rand(10^6);
julia> @time f(x);
0.019049 seconds (16 allocations: 45.777 MiB, 18.59% gc time)
julia> @time fdot(x);
0.002790 seconds (6 allocations: 7.630 MiB)
julia> @time f.(x);
0.002626 seconds (8 allocations: 7.630 MiB)
That is, fdot(x) is ten times faster and allocates 1/6 the memory of f(x), because each * and + operation in f(x) allocates a new temporary array and executes in a separate loop. (Of course, if you just do f.(x) then it is as fast as fdot(x) in this example, but in many contexts it is more convenient to just sprinkle some dots in your expressions rather than defining a separate function for each vectorized operation.)
https://github.com/JuliaLang/julia/commit/51bb96857d26f67e62f0edc4fc4682a156cb3d08
a new temporary array and executes in a separate loop. In this example
`f.(x)` is as fast as `fdot(x)` but in many contexts it is more
convenient to sprinkle some dots in your expressions than to
define a separate function for each vectorized operation.
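The same temporary-array issue exists on the Python side: composing whole-array operations allocates intermediates, while a single loop (or NumPy combined with numexpr/Numba fusion) computes each element in one pass. A pure-Python sketch of the idea (function names are illustrative):

```python
# Fusion idea in pure Python: whole-array ops allocate intermediate
# lists (like unfused vectorized code), while one comprehension
# (like Julia's @. fused broadcast) does a single pass.
def scale(xs, c):
    # Whole-array op: allocates a new list on every call.
    return [c * v for v in xs]

def add(xs, ys):
    # Whole-array op: yet another intermediate list.
    return [a + b for a, b in zip(xs, ys)]

def f_unfused(x):
    # Builds several intermediate lists before the final result.
    return add(add(scale([v**2 for v in x], 3), scale(x, 4)),
               scale([v**3 for v in x], 7))

def f_fused(x):
    # One pass, no intermediates: the "@."-style version.
    return [3*v**2 + 4*v + 7*v**3 for v in x]

x = [0.5, 1.0, 2.0]
assert f_unfused(x) == f_fused(x)
```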
32.
Why is this broadcast operation slower than a nested for-loop
https://stackoverflow.com/questions/72753803/why-is-this-broadcast-operation-slower-than-a-nested-for-loop
using BenchmarkTools
function bounds_error(x, xl)
num_x_rows = size(x,1)
num_dim = size(xl, 1)
for i in 1:num_x_rows
for j in 1:num_dim
if (x[i, j] < xl[j,1] || x[i,j] > xl[j,2])
return true
end
end
end
return false
end
function bounds_error2(x, xl)
for row in eachrow(x)
xlt = transpose(xl)
if any(row .< xlt[1, :]) == true || any(row .> xlt[2, :])
return true
end
end
return false
end
#number of rows in xl (or xlimits) will always be equal to number of columns in x
xl = [ -5.0 5.0
-5.0 5.0
-5.0 5.0]
x = [1.0 2.0 3.0;
4.0 5.0 6.0]
The main reason for this difference is memory allocations (0 vs. 12 here).
# 20.645 ns (0 allocations: 0 bytes)
# 347.870 ns (12 allocations: 704 bytes)
Currently, slices in Julia create a copy, so xlt[1,:] and xlt[2,:] allocates memory. To remedy this problem you should use @views. The second issue is the element-wise comparisons row .< xlt[1,:] and row .> xlt[2,:] create a temporary Boolean array. To avoid allocation of a temporary array, you should map any(t->t[1]<t[2], zip(row,xl1)) so that the comparison is done one element at a time like a loop.
After applying these tips, the performance difference on my machine is now about 2ns only, which accounts for the convenience of eachrow, zip, etc. instead of manual loops.
Note, for the first function, you can use axes() to loop over first or second dimension conveniently. And when benchmarking any Julia code with BenchmarkTools.jl, don't forget to interpolate ($) all variable names of a function to avoid working on global variables.
function bounds_error(x, xl)
for i in axes(x,1)
for j in axes(xl, 1)
if (x[i, j] < xl[j,1] || x[i,j] > xl[j,2])
return true
end
end
end
return false
end
@views function bounds_error2(x, xl)
xl1, xl2 = xl[:,1], xl[:,2]
for row in eachrow(x)
if any(t->t[1]<t[2], zip(row,xl1)) || any(t->t[1]>t[2], zip(row,xl2))
return true
end
end
return false
end
@btime bounds_error($x, $xl) # 8.100 ns (0 allocations: 0 bytes)
@btime bounds_error2($x, $xl) # 10.800 ns (0 allocations: 0 bytes)
While allocations make the difference for these particular inputs, in general the loop will be faster because it bails out immediately when it finds a value that is outside the limits, while the broadcasted version checks both entire arrays. This will be much more important than the allocations. –
DNF - Jun 25 at 14:09
That's true, and is closely related to allocations as you said, any will wait for a whole row comparison to start its check. –
AboAmmar Jun 25 at 14:13
33.
Broadcasting is much slower than a for loop #28126
https://github.com/JuliaLang/julia/issues/28126
julia> using BenchmarkTools
julia> function foo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
@. a = b + 0.1 * (0.2c + 0.3d + 0.4e)
nothing
end
foo (generic function with 1 method)
julia> function goo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
@assert length(a) == length(b) == length(c) == length(d) == length(e)
@inbounds for i in eachindex(a)
a[i] = b[i] + 0.1 * (0.2c[i] + 0.3d[i] + 0.4e[i])
end
nothing
end
goo (generic function with 1 method)
julia> a,b,c,d,e=(rand(1000) for i in 1:5)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##9#10"))}(getfield(Main, Symbol("##9#10"))(), 1:5)
julia> @btime foo($a,$b,$c,$d,$e)
1.277 μs (0 allocations: 0 bytes)
julia> @btime goo($a,$b,$c,$d,$e)
345.568 ns (0 allocations: 0 bytes)
Workaround #28126, support SIMDing broadcast in more cases #30973
https://github.com/JuliaLang/julia/pull/30973
34.
Difference between Base and Core
https://discourse.julialang.org/t/difference-between-base-and-core/37426
julia> Base.Int
Int64
julia> Core.Int
Int64
julia> Base.Int == Core.Int
true
julia> Int.name.module
Core
I didn’t know either until I watched this last year starting around the 9:00 mark. https://youtu.be/TPuJsgyu87U?t=542
It’s because some parts have to be duplicated so the necessary compiler internals can work, or something to that effect.
Core is what’s defined in C as the very core of the language.
There is very little there.
It’s used to bootstrap the rest of the language by gradually defining more and more of Base in terms of
what was defined before.
Core is kind of an implementation detail that users should never need to interact with.
Core.AbstractArray
https://docs.julialang.org/en/v1/base/arrays/#Core.AbstractArray
Core.Array
https://docs.julialang.org/en/v1/base/arrays/#Core.Array
35.
Graph computing benchmarks: comparing the scalability of Dask, Dagger.jl, Tensorflow and Julius
https://discourse.julialang.org/t/graph-computing-benchmarks-comparing-the-scalability-of-dask-dagger-jl-tensorflow-and-julius/80745
https://juliustechco.github.io/JuliusGraph/dev/pages/t007_benchmark.html#Tutorial-7:-Graph-Computing-Benchmark-1
https://gist.github.com/jpsamaroo/95c78b3361ae454a51916183f2cf346f
https://github.com/JuliusTechCo/JuliusGraph
36.
UQ MATH2504 - Programming of Simulation, Analysis, and Learning Systems - (Semester 2 2022)
https://courses.smp.uq.edu.au/MATH2504/2022/lectures_html/lecture-unit-2.html
where the O(⋅) terms follow Big-O notation.
Priority queues, heaps, and back to sorting
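The "priority queues, heaps, and back to sorting" topic can be sketched with Python's stdlib heapq: heapify is O(n), each pop is O(log n), so heap-based sorting is O(n log n) overall.

```python
# Heap-based sorting via the stdlib priority-queue module.
import heapq

def heapsort(values):
    heap = list(values)
    heapq.heapify(heap)     # O(n) bottom-up heap construction
    # n pops, each O(log n) -> O(n log n) total.
    return [heapq.heappop(heap) for _ in range(len(heap))]

assert heapsort([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]
```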
37.
Julia v1.8 @assume_effects
Quick intro to the new effect analysis of Julia compiler
https://aviatesk.github.io/posts/effects-analysis/index.html
Background:
The Julia compiler is powered by abstract interpretation, which is powered by constant propagation.
constant propagation == injecting constant information into abstract interpretation
But constant propagation can be slow!
Idea: replace abstract interpretation under constant propagation with actual execution (i.e. concrete evaluation) instead!
The effect analysis is a technique to check when it is valid to perform concrete evaluation.