HighPerformance_Python_Julia.txt (1251 lines, 41.2 KB)
# High Performance (Multiprocess/Multithreading) for Python and Julia
## No.:
## Title:
## https links:
[Julia Lang part]
1.
What scientists must know about hardware to write fast code
https://github.com/jakobnissen/hardware_introduction
https://viralinstruction.com/posts/hardware/
2.
Julia for Economists Bootcamp, 2022
https://github.com/cpfiffer/julia-bootcamp-2022
https://github.com/cpfiffer/julia-bootcamp-2022#session-2-parallelization
https://www.youtube.com/watch?v=trhsvOAH0YI
https://github.com/cpfiffer/julia-bootcamp-2022/blob/main/session-2/parallelization-lecture.ipynb
https://github.com/cpfiffer/julia-bootcamp-2022#session-4-high-performance-julia
https://youtu.be/i35LlZWZl1g
https://github.com/cpfiffer/julia-bootcamp-2022/blob/main/session-4/speed-lecture.ipynb
3.
Hands-On Design Patterns and Best Practices with Julia
https://github.com/PacktPublishing/Hands-on-Design-Patterns-and-Best-Practices-with-Julia
4.
A quick introduction to data parallelism in Julia
https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/
5.
Parallelization
https://enccs.github.io/Julia-for-HPC/parallelization/
GPU programming
https://enccs.github.io/Julia-for-HPC/GPU/
6.
[ANN] Folds.jl: threaded, distributed, and GPU-based high-level data-parallel interface for Julia
https://discourse.julialang.org/t/ann-folds-jl-threaded-distributed-and-gpu-based-high-level-data-parallel-interface-for-julia/54701/3
https://github.com/JuliaFolds/ParallelMagics.jl
https://github.com/JuliaFolds
7.
Announcing composable multi-threaded parallelism in Julia
https://julialang.org/blog/2019/07/multithreading/
8.
Using Julia
https://www.carc.usc.edu/user-information/user-guides/software-and-programming/julia
# Parallel programming with Julia
Package: Purpose
Base.Threads: for explicit multi-threading
Distributed: for explicit multi-processing
MPI.jl: for interfacing to MPI libraries
DistributedArrays.jl: for working with distributed arrays
Elemental.jl: for distributed linear algebra
ClusterManagers.jl: for launching jobs via cluster job schedulers (e.g., Slurm)
Dagger.jl: for asynchronous evaluations and workflows
CUDA.jl: for interfacing to Nvidia CUDA GPUs
9.
ML ⇌ Science Colaboratory's workshop Introduction to Machine Learning.
Supervised learning: One step at a time
https://github.com/mlcolab/IntroML.jl/blob/main/notebooks/supervised_learning.jl
https://mlcolab.github.io/IntroML.jl/dev/supervised_learning.html
10.
The Enzyme High-Performance Automatic Differentiator of LLVM
https://github.com/EnzymeAD/Enzyme
High-performance automatic differentiation of LLVM.
https://github.com/EnzymeAD/Enzyme.jl
Julia bindings for the Enzyme automatic differentiator
https://github.com/EnzymeAD/oxide-enzyme
Enzyme integration into Rust. Experimental, do not use.
11.
Julia v1.7 Release Notes
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#julia-v17-release-notes
Multidimensional Array Literals
https://julialang.org/blog/2021/11/julia-1.7-highlights/#multidimensional_array_literals
https://github.com/JuliaLang/julia/issues/39285
https://github.com/JuliaLang/julia/issues/45461
https://docs.julialang.org/en/v1/base/arrays/#Base.hvncat
https://github.com/JuliaLang/julia/blob/7e54f9a069df2b382f765d5574787293c816fe26/base/abstractarray.jl#L2119
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#language-changes-1
Multiple successive semicolons in an array expression were previously ignored (e.g., [1 ;; 2] == [1 ; 2]). This syntax is now used to separate dimensions (see New language features).
v1 = [1, 2] # 2-element Vector{Int64}:
v2 = [3, 4] # 2-element Vector{Int64}:
[v1, v2] # 2-element Vector{Vector{Int64}}:
# [1, 2]
# [3, 4]
[v1; v2] # 4-element Vector{Int64}:
# 1
# 2
# 3
# 4
[v1;; v2] # 2×2 Matrix{Int64}:
# 1 3
# 2 4
[1,2,3 ;; 4,5,6] # syntax: unexpected semicolon in array expression
[1;2;3 ;;4;5;6] #3×2 Matrix{Int64}:
# 1 4
# 2 5
# 3 6
Meta.@lower [1 2;;; 3 4]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Core.tuple(1, 2, 2)
│ %2 = Base.hvncat(%1, true, 1, 2, 3, 4)
└── return %2
))))
Meta.@lower [v1;;v2]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.hvncat(2, v1, v2)
└── return %1
))))
Meta.@lower [v1;v2]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.vcat(v1, v2)
└── return %1
))))
Meta.@lower [v1, v2]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.vect(v1, v2)
└── return %1
))))
Meta.@lower[1 2;;
3 4]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Base.hcat(1, 2, 3, 4)
└── return %1
))))
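Since these notes cover both languages, here is a rough Python analogue of the vcat/hcat distinction above, using plain lists (illustrative only; for arrays, NumPy's vstack/column_stack play the same role):

```python
# Rough Python analogue of Julia's [v1; v2] (vcat) vs [v1;; v2] (hcat).
v1 = [1, 2]
v2 = [3, 4]

vcat = v1 + v2                              # like [v1; v2] -> [1, 2, 3, 4]
hcat = [list(row) for row in zip(v1, v2)]   # like [v1;; v2] -> 2x2, vectors as columns

assert vcat == [1, 2, 3, 4]
assert hcat == [[1, 3], [2, 4]]
```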
Property Destructuring
https://julialang.org/blog/2021/11/julia-1.7-highlights/#property_destructuring
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#new-language-features-1
(; a, b) = x can now be used to destructure properties a and b of x. This syntax is equivalent to a = getproperty(x, :a); b = getproperty(x, :b)
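There is no direct Python syntax for this; as a sketch for the Python side of these notes, the closest stdlib idiom is operator.attrgetter (the Point type below is purely illustrative):

```python
# Rough Python analogue of Julia's property destructuring `(; a, b) = x`.
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Point:
    a: int
    b: int

x = Point(a=1, b=2)
a, b = attrgetter("a", "b")(x)  # equivalent to: a = x.a; b = x.b
```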
New features coming in Julia 1.7
https://lwn.net/Articles/871486/
https://julialang.org/blog/2021/11/julia-1.7-highlights/
12.
Concurrency in Julia
https://lwn.net/Articles/875367/
The Julia programming language has its roots in high-performance scientific computing,
so it is no surprise that it has facilities for concurrent processing.
Those features are not well-known outside of the Julia community,
though, so it is interesting to see the different types of parallel and concurrent computation that the language supports.
In addition, the upcoming release of Julia version 1.7 brings an improvement to the language's concurrent-computation palette,
in the form of "task migration".
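For the Python half of these notes, a minimal sketch of the corresponding thread/process distinction (workload and pool size are illustrative): in CPython the GIL serializes bytecode execution, so threads mainly help I/O-bound tasks, while CPU-bound parallelism needs separate processes.

```python
# CPython sketch: the same map API works for threads and processes,
# but only processes sidestep the GIL for CPU-bound work.
from concurrent.futures import ThreadPoolExecutor

def task(n: int) -> int:
    # CPU-bound toy workload; the GIL prevents threads from speeding
    # this up, but the pool interface is identical either way.
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, [10_000] * 4))

# For CPU-bound speedups, swap in ProcessPoolExecutor (same interface),
# which uses separate worker processes instead of threads.
```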
13.
libblastrampoline + MKL.jl
https://julialang.org/blog/2021/11/julia-1.7-highlights/#libblastrampoline_mkljl
Julia v1.7 introduces a new BLAS demuxing library called libblastrampoline (LBT),
that provides a flexible and efficient way to switch the backing BLAS library at runtime.
Because the BLAS/LAPACK API is "pure" (e.g. each BLAS/LAPACK invocation is separate from any other;
there is no carryover state from one API call to another) it is possible to switch
which BLAS backend actually services a particular client API call, such as a DGEMM call for
a Float64 Matrix-Matrix multiplication. This statelessness enables us to easily switch from one BLAS backend
to another without needing to modify client code, and combining this with a flexible wrapper implementation,
we are able to provide a single, coherent API that automatically adjusts for a variety of BLAS/LAPACK providers
across all the platforms that Julia itself supports.
14.
https://runebook.dev/en/docs/julia/-index-
https://runebook.dev
15.
Asymptotic theory seen through Julia (convergence of non-random sequences) + Real Analysis (Convergence & Bounded Series)
https://zenn.dev/hessihan/articles/26144a5ffb932a
INTRODUCTION TO THE CONVERGENCE OF SEQUENCES
https://math.uchicago.edu/~may/REU2015/REUPapers/Lytle.pdf
Convergent Sequences
https://users.math.msu.edu/users/zhan/Notes1.pdf
Math 320, Section 4: Analysis I
https://users.math.msu.edu/users/zhan/MTH320.html
Real Analysis Oral Exam study notes
http://www.math.toronto.edu/mnica/oral/real_notes.pdf
16.
Julia lang Garbage Collection
How much do collections of allocated objects cost?
https://bkamins.github.io/julialang/2021/06/11/vecvec.html
Julia gc.c
https://github.com/JuliaLang/julia/blob/master/src/gc.c
On the garbage collection
https://discourse.julialang.org/t/on-the-garbage-collection/35695
https://discourse.julialang.org/t/on-the-garbage-collection/35695/8
Details about Julia’s Garbage Collector, Reference Counting?
https://discourse.julialang.org/t/details-about-julias-garbage-collector-reference-counting/18021/3
17.
Julia Learning Circle: JIT and Method Invalidations
https://wesselb.github.io/2020/11/07/julia-learning-circle-meeting-1.html
18.
Solutions For High-Dimensional Statistics
https://wesselb.github.io/2020/08/21/high-dimensional-statistics.html
High-Dimensional Statistics A Non-Asymptotic Viewpoint - Martin J. Wainwright, University of California, Berkeley
https://www.cambridge.org/core/books/highdimensional-statistics/8A91ECEEC38F46DAB53E9FF8757C7A4E
DOI:https://doi.org/10.1017/9781108627771
https://high-dimensional-statistics.github.io
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/30AF7B572184787F4C99715838549721/9781108498029c2_21-57.pdf/basic_tail_and_concentration_bounds.pdf
18-1.
High-Dimensional Probability - An Introduction with Applications in Data Science
Roman Vershynin
https://www.math.uci.edu/~rvershyn/
https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html#
https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf
19.
Julia Learning Circle: Memory Allocations and Garbage Collection
https://wesselb.github.io/2020/11/23/julia-learning-circle-meeting-2.html
Julia Learning Circle: Generated Functions
https://wesselb.github.io/2020/12/13/julia-learning-circle-meeting-3.html
20.
JuliaNotes.jl
https://m3g.github.io/JuliaNotes.jl/stable/memory/
Vector{Int} <: Vector{Real} is false??
https://m3g.github.io/JuliaNotes.jl/stable/typevariance/
Assignment and mutation
https://m3g.github.io/JuliaNotes.jl/stable/assignment/
Workflows for developing effectively in Julia
https://m3g.github.io/JuliaNotes.jl/stable/workflow/
21.
Julia v1.8 Release Notes
https://github.com/JuliaLang/julia/blob/master/HISTORY.md#julia-v18-release-notes
Compiler/Runtime improvements
21-1. libjulia-codegen
The LLVM-based compiler has been separated from the run-time library into a new library, libjulia-codegen. It is loaded by default, so normal usage should see no changes. In deployments that do not need the compiler (e.g. system images where all needed code is precompiled), this library (and its LLVM dependency) can simply be excluded (#41936).
https://github.com/JuliaLang/julia/issues/41936
Unreasonably large executable size from create_app #660
https://github.com/JuliaLang/PackageCompiler.jl/issues/660
This is expected. Julia currently has no good way to run code without these supporting libraries. But there is work in progress of trying to improve this.
21-2. Base.@assume_effects macro
Inference now tracks various effects such as side-effectful-ness and nothrow-ness on a per-specialization basis. Code heavily dependent on constant propagation should see significant compile-time performance improvements and certain cases (e.g. calls to uninlinable functions that are nevertheless effect free) should see runtime performance improvements. Effects may be overwritten manually with the Base.@assume_effects macro (#43852).
https://github.com/JuliaLang/julia/issues/43852
https://github.com/JuliaLang/julia/commit/ef4220533d4a9a887b199362e37de0e056c1a458
improve concrete-foldability of core math functions #45613
https://github.com/JuliaLang/julia/pull/45613
Matrix{Int} \ Vector{Float32} is type-unstable #45696
https://github.com/JuliaLang/julia/issues/45696
Linear system solve promotes Float32 to Float64 #1041
https://github.com/JuliaArrays/StaticArrays.jl/issues/1041
21-3.
Bootstrapping time has been improved by about 25% (#41794).
https://github.com/JuliaLang/julia/issues/41794
22.
ENGR108: Introduction to Matrix Methods (Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares)
https://stanford.edu/class/engr108/
I just want to know how to get the inverse of a matrix using Julia
https://stanford.edu/class/engr108/lectures/julia_inverses_slides.pdf
https://stanford.edu/class/engr108/lectures/
https://stanford.edu/class/engr108/lectures/julia_least_squares_slides.pdf
https://stanford.edu/class/engr108/lectures/julia_vectors_slides.pdf
https://stanford.edu/class/engr108/lectures/julia_matrices_slides.pdf
23.
Language introspection
https://juliateachingctu.github.io/Scientific-Programming-in-Julia/dev/lecture_06/lecture/
JuliaTeachingCTU/ Scientific-Programming-in-Julia
https://github.com/JuliaTeachingCTU/Scientific-Programming-in-Julia/blob/master/docs/src/index.md#
This repository contains all the course materials for the master course Scientific Programming in Julia
taught at the Czech Technical University in Prague.
You can find more information on the official course website.
https://juliateachingctu.github.io/Scientific-Programming-in-Julia/stable/
Stages of compilation
Julia (like any modern compiler) uses several stages to convert source code to native code. Let's recap them:
parsing the source code into an abstract syntax tree (AST)
lowering the abstract syntax tree to static single assignment form (SSA), see wiki
assigning types to variables and performing type inference on called functions
lowering the typed code to the LLVM intermediate representation (LLVM IR)
using the LLVM compiler to produce native code.
function nextfib(n)
a, b = one(n), one(n)
while b < n
a, b = b, a + b
end
return b
end
Meta.parse(
""" function nextfib(n)
a, b = one(n), one(n)
while b < n
a, b = b, a + b
end
return b
end
""")
# For inserted debugging information, there is an option to pass keyword argument debuginfo=:source.
@code_lowered debuginfo=:source nextfib(3)
@code_lowered nextfib(3)
@code_typed nextfib(3)
@code_warntype nextfib(3)
@code_llvm debuginfo=:source nextfib(3)
@code_llvm nextfib(3)
@code_native debuginfo=:source nextfib(3)
@code_native nextfib(3)
@time
@allocated
@which
using BenchmarkTools
@btime
24.
JuliaTeachingCTU/ Julia-for-Optimization-and-Learning
https://github.com/JuliaTeachingCTU/Julia-for-Optimization-and-Learning
https://github.com/JuliaTeachingCTU/Julia-for-Optimization-and-Learning/tree/master/docs/src
https://juliateachingctu.github.io/Julia-for-Optimization-and-Learning/stable/
What will we emphasize?
The main goals of the course are the following:
You will learn the connections between theory and coding. There are many lectures which teach either only theory or only coding. We will show you both.
You will learn how to code efficiently. We will teach you to split the code into small parts which are simpler to debug or optimize. We will often show you several writing possibilities and comment on the differences.
You will learn about machine learning and neural networks. You will understand neural networks by writing a simple one from scratch. Then you will learn how to use packages to write simple code for complicated networks.
You will learn independence. The problem formulation of many exercises is deliberately general, which simulates real situations where no step-by-step procedure is provided.
Type system and generic programming
https://juliateachingctu.github.io/Julia-for-Optimization-and-Learning/stable/lecture_06/compositetypes/
Optimization
https://juliateachingctu.github.io/Julia-for-Optimization-and-Learning/stable/lecture_08/theory/
25.
Julia Parallel, Multithreading, Multiprocess
A quick introduction to data parallelism in Julia
https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/
Parallelization
https://enccs.github.io/Julia-for-HPC/parallelization/
Julia for Economists - Parallelization for Fun and Profit - Cameron Pfiffer ([email protected])
https://github.com/cpfiffer/julia-bootcamp-2022/blob/main/session-2/parallelization-lecture.ipynb
https://github.com/JuliaParallel/Dagger.jl
A framework for out-of-core and parallel computing
At the core of Dagger.jl is a scheduler heavily inspired by Dask.
It can run computations represented as directed-acyclic-graphs (DAGs) efficiently on many Julia worker processes and threads,
as well as GPUs via DaggerGPU.jl.
https://github.com/JuliaParallel/DTables.jl
DTable – an early performance assessment of a new distributed table implementation
https://julialang.org/blog/2021/12/dtable-performance-assessment/
https://juliaparallel.org/Dagger.jl/stable/dtable/
The DTable, or "distributed table", is an abstraction layer on top of Dagger that allows loading table-like structures into a distributed environment.
The main idea is that a Tables.jl-compatible source provided by the user gets partitioned into several parts and stored as Chunks.
These can then be distributed across worker processes by the scheduler as operations are performed on the containing DTable.
https://github.com/JuliaParallel/DistributedArrays.jl
Distributed arrays for Julia.
DistributedArrays.jl uses the stdlib Distributed to implement a Global Array interface.
A DArray is distributed across a set of workers.
Each worker can read and write from its local portion of the array and each worker has read-only access to
the portions of the array held by other workers.
https://github.com/JuliaParallel/MPI.jl
This provides a Julia interface to the Message Passing Interface (MPI), roughly inspired by mpi4py.
"Julia at Scale" topic on the Julia Discourse
https://discourse.julialang.org/c/domain/parallel/34
https://github.com/JuliaParallel/Elemental.jl
A package for dense and sparse distributed linear algebra and optimization.
The underlying functionality is provided by the C++ library Elemental written originally by Jack Poulson and now maintained by LLNL.
A Julia interface to Apache Spark™
http://dfdx.github.io/Spark.jl/dev/
https://github.com/dfdx/Spark.jl
https://spark.apache.org/downloads.html
https://github.com/JuliaFolds/Folds.jl
Folds.jl provides a unified interface for sequential, threaded, and distributed folds.
[ANN] Folds.jl: threaded, distributed, and GPU-based high-level data-parallel interface for Julia
https://discourse.julialang.org/t/ann-folds-jl-threaded-distributed-and-gpu-based-high-level-data-parallel-interface-for-julia/54701
https://github.com/JuliaFolds/FLoops.jl
FLoops.jl provides a macro @floop. It can be used to generate a fast generic sequential and parallel iteration over complex collections.
https://github.com/JuliaFolds/ParallelMagics.jl
ParallelMagics.jl aims to provide safe parallelism to Julia programmers:
"no-brainer" parallelism using compiler analysis; i.e., the code is parallelized only if the compiler can guarantee its safety.
https://github.com/JuliaFolds/FoldsCUDA.jl
FoldsCUDA.jl provides Transducers.jl-compatible fold (reduce) implemented using CUDA.jl.
This brings the transducers and reducing function combinators implemented in Transducers.jl to GPU.
Furthermore, using FLoops.jl, you can write parallel for loops that run on GPU.
https://github.com/JuliaFolds/Transducers.jl
Transducers.jl provides composable algorithms on "sequence" of inputs. They are called transducers, first introduced in Clojure language by Rich Hickey.
A quick introduction to data parallelism in Julia
https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/
https://juliafolds.github.io/Transducers.jl/dev/
26.
educational materials for MIT math courses
https://github.com/mitmath
MIT IAP short course: Matrix Calculus for Machine Learning and Beyond
https://github.com/mitmath/matrixcalc
18.330 Introduction to Numerical Analysis
https://github.com/mitmath/18330
18.335 - Introduction to Numerical Methods course
https://github.com/mitmath/18335
18.337J/6.338J: Parallel Computing and Scientific Machine Learning
https://github.com/mitmath/18337
18.S096 Special Subject in Mathematics: Applications of Scientific Machine Learning
https://github.com/mitmath/18S096SciML
27.
Julia Transpose
https://github.com/mitmath/1806/blob/master/notes/Matrix-mult-perspectives.ipynb
To get a row vector we must transpose the slice A[1,:]. In linear algebra, the transpose of a vector x is usually denoted xᵀ. In Julia, the transpose is transpose(x) (pre-1.0 Julia used the x.' syntax, which has since been removed).
If we just write x', it is the complex-conjugate of the transpose, sometimes called the adjoint, often denoted xᴴ (in matrix textbooks), x* (in pure math), or x† (in physics). For real-valued vectors (no complex numbers), the conjugate transpose is the same as the transpose, and correspondingly we usually just write x' for real vectors.
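For the Python side, the same transpose-vs-adjoint distinction can be sketched in pure Python (NumPy spells it x.T vs x.conj().T); the helper functions below are illustrative, not from any linked material:

```python
# Transpose vs conjugate-transpose (adjoint) on nested-list "matrices".
def transpose(m):
    # Swap rows and columns.
    return [list(col) for col in zip(*m)]

def adjoint(m):
    # Swap rows and columns AND conjugate each entry.
    return [[z.conjugate() for z in col] for col in zip(*m)]

z = [[1 + 2j, 3 - 1j]]                      # a 1x2 complex row vector
assert transpose(z) == [[1 + 2j], [3 - 1j]]
assert adjoint(z) == [[1 - 2j], [3 + 1j]]

r = [[1.0, 2.0]]                            # real entries: the two coincide
assert transpose(r) == adjoint(r)
```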
Revert "add 'ᵀ postfix operator for transpose (#38062)" #40075
https://github.com/JuliaLang/julia/pull/40075
This reverts commit 665279a.
There has been some discussion about whether #38062 was such a good idea in hindsight (#40070, #38062 (comment)).
It might make sense to go back on this feature to give us some more time to think about it, before locking it in into 1.6.
Fixes #40070
28.
Julia With Calculus
https://github.com/jverzani/CalculusWithJulia.jl
https://jverzani.github.io/CalculusWithJuliaNotes.jl/dev/
http://mth229.github.io
http://mth229.github.io
https://www.math.csi.cuny.edu/Computing/matlab/Projects/MTH229/Mth229_Julia_Projects.pdf
https://github.com/mth229
https://www.math.csi.cuny.edu/Computing/matlab/Projects/MTH229/
29.
Julia with QuantEcon
https://julia.quantecon.org/intro.html
9.) Solvers, Optimizers, and Automatic Differentiation
Tools and Techniques
14.) Geometric Series for Elementary Economics
15.) Linear Algebra
16.) Orthogonal Projections and Their Applications
17.) LLN and CLT
18.) Linear State Space Models
19.) Finite Markov Chains
20.) Continuous State Markov Chains
21.) A First Look at the Kalman Filter
22.) Numerical Linear Algebra and Factorizations
23.) Krylov Methods and Matrix Conditioning
30.
Julia Arrays, Stack & Heap, slice, copy, shallow copy, deepcopy
Julialang functions: ismutable, isbits, objectid, eachindex, axes, eachrow, \xor,
Python functions: id, obj.copy(), import copy as cp cp.copy(), cp.deepcopy()
What scientists must know about hardware to write fast code - Jakob Nybo Nissen
https://biojulia.net/post/hardware/
https://docs.julialang.org/en/v1/devdocs/offset-arrays/
replace many uses of size with axes
replace 1:length(A) with eachindex(A), or in some cases LinearIndices(A)
replace explicit allocations like Array{Int}(undef, size(B)) with similar(Array{Int}, axes(B))
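These eachindex/axes tips have a loose Python parallel: prefer iterating the container itself over manual range(len(...)) index arithmetic. A small illustrative sketch:

```python
# Python parallel of "replace 1:length(A) with eachindex(A)":
# iterate the container directly instead of computing indices by hand.
a = [10, 20, 30]

manual = [a[i] * 2 for i in range(len(a))]   # index arithmetic, like 1:length(A)
idiomatic = [v * 2 for v in a]               # direct iteration, like eachindex

assert manual == idiomatic == [20, 40, 60]
```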
Julia arrays
https://danmackinlay.name/notebook/julia_arrays.html
Multi-dimensional Arrays
https://docs.julialang.org/en/v1/manual/arrays/
Copying Arrays in Julia
http://www.cristinagreen.com/copying-arrays-in-julia.html
What is the difference between copy() and deepcopy()?
https://discourse.julialang.org/t/what-is-the-difference-between-copy-and-deepcopy/3918/2
b = deepcopy(a) keeps unwrapping any mutables inside of ‘a’ until it reaches all the immutables at all the levels, and copies all the data and structure of the old object to a new object.
b = a copies ‘a’ by reference, so ‘b’ and ‘a’ refer to the same object. Therefore b.field1 = 2 makes a.field1 == 2 true.
Explanation of Deep and Shallow Copying
https://www.cs.utexas.edu/~scottm/cs307/handouts/deepCopying.htm
A shallow copy can be made by simply copying the reference.
A deep copy means actually creating a new array and copying over the values.
Python - shallow copy vs. deep copy
https://ithelp.ithome.com.tw/articles/10221255
A shallow copy copies only the addresses of the elements inside the container.
A deep copy makes a complete duplicate: the container and the elements it holds all get new addresses.
Plain copying
Three ways (all shallow copies):
b = list(a)
b = a[:]
b = a.copy() (PS: shallow copy)
The difference between shallow copy and deep copy
The key difference is whether the copied variable contains mutable types.
#%% Shallow copy and deep copy
import copy
a = [1, [2,3]]
a_ref = a
a_shallowcopy = copy.copy(a)
a_deepcopy = copy.deepcopy(a)
a[0] = 4
Here a[0] is a number, an immutable type, so shallow and deep copies behave the same.
a[1][1] = 5
Here a[1][1] sits inside a list, a mutable type; the shallow copy is changed while the deep copy is not. From this we learn:
at the first level, both shallow and deep copies already point to different memory than the original.
BUT!!!
at the second level, a shallow copy still points to the same memory as the original variable,
while a deep copy points to different memory at the second level as well.
A deep copy (deepcopy) creates a completely independent copy.
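The snippets above can be assembled into one runnable check of these claims (variable names are illustrative):

```python
# Runnable check of the shallow- vs deep-copy behavior described above.
import copy

a = [1, [2, 3]]
a_ref = a                      # reference: same object
a_shallow = copy.copy(a)       # new outer list, shared inner list
a_deep = copy.deepcopy(a)      # fully independent duplicate

a[0] = 4                       # immutable element: neither copy is affected
a[1][1] = 5                    # mutable inner list: the shallow copy sees it

assert a_ref is a                  # a_ref is just another name for a
assert a_shallow == [1, [2, 5]]    # first level independent, second level shared
assert a_deep == [1, [2, 3]]       # completely independent
```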
31.
Julia Performance Tips
https://docs.julialang.org/en/v1/manual/performance-tips/
More dots: Fuse vectorized operations
Julia has a special dot syntax that converts any scalar function into a "vectorized" function call, and any operator into a "vectorized" operator, with the special property that nested "dot calls" are fusing: they are combined at the syntax level into a single loop, without allocating temporary arrays. If you use .= and similar assignment operators, the result can also be stored in-place in a pre-allocated array (see above).
In a linear-algebra context, this means that even though operations like vector + vector and vector * scalar are defined, it can be advantageous to instead use vector .+ vector and vector .* scalar because the resulting loops can be fused with surrounding computations. For example, consider the two functions:
julia> f(x) = 3x.^2 + 4x + 7x.^3;
julia> fdot(x) = @. 3x^2 + 4x + 7x^3 # equivalent to 3 .* x.^2 .+ 4 .* x .+ 7 .* x.^3;
Both f and fdot compute the same thing. However, fdot (defined with the help of the @. macro) is significantly faster when applied to an array:
julia> x = rand(10^6);
julia> @time f(x);
0.019049 seconds (16 allocations: 45.777 MiB, 18.59% gc time)
julia> @time fdot(x);
0.002790 seconds (6 allocations: 7.630 MiB)
julia> @time f.(x);
0.002626 seconds (8 allocations: 7.630 MiB)
That is, fdot(x) is ten times faster and allocates 1/6 the memory of f(x), because each * and + operation in f(x) allocates a new temporary array and executes in a separate loop. (Of course, if you just do f.(x) then it is as fast as fdot(x) in this example, but in many contexts it is more convenient to just sprinkle some dots in your expressions rather than defining a separate function for each vectorized operation.)
https://github.com/JuliaLang/julia/commit/51bb96857d26f67e62f0edc4fc4682a156cb3d08
a new temporary array and executes in a separate loop. In this example
`f.(x)` is as fast as `fdot(x)` but in many contexts it is more
convenient to sprinkle some dots in your expressions than to
define a separate function for each vectorized operation.
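The same temporary-array issue exists on the Python side: composing whole-array operations allocates intermediates, while a single loop (or NumPy combined with numexpr/Numba fusion) computes each element in one pass. A pure-Python sketch of the idea (function names are illustrative):

```python
# Fusion idea in pure Python: whole-array ops allocate intermediate
# lists (like unfused vectorized code), while one comprehension
# (like Julia's @. fused broadcast) does a single pass.
def scale(xs, c):
    # Whole-array op: allocates a new list on every call.
    return [c * v for v in xs]

def add(xs, ys):
    # Whole-array op: yet another intermediate list.
    return [a + b for a, b in zip(xs, ys)]

def f_unfused(x):
    # Builds several intermediate lists before the final result.
    return add(add(scale([v**2 for v in x], 3), scale(x, 4)),
               scale([v**3 for v in x], 7))

def f_fused(x):
    # One pass, no intermediates: the "@."-style version.
    return [3*v**2 + 4*v + 7*v**3 for v in x]

x = [0.5, 1.0, 2.0]
assert f_unfused(x) == f_fused(x)
```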
32.
Why is this broadcast operation slower than a nested for-loop
https://stackoverflow.com/questions/72753803/why-is-this-broadcast-operation-slower-than-a-nested-for-loop
using BenchmarkTools
function bounds_error(x, xl)
num_x_rows = size(x,1)
num_dim = size(xl, 1)
for i in 1:num_x_rows
for j in 1:num_dim
if (x[i, j] < xl[j,1] || x[i,j] > xl[j,2])
return true
end
end
end
return false
end
function bounds_error2(x, xl)
for row in eachrow(x)
xlt = transpose(xl)
if any(row .< xlt[1, :]) == true || any(row .> xlt[2, :])
return true
end
end
return false
end
#number of rows in xl (or xlimits) will always be equal to number of columns in x
xl = [ -5.0 5.0
-5.0 5.0
-5.0 5.0]
x = [1.0 2.0 3.0;
4.0 5.0 6.0]
The main reason for this difference is memory allocations (0 vs. 12 here).
# 20.645 ns (0 allocations: 0 bytes)
# 347.870 ns (12 allocations: 704 bytes)
Currently, slices in Julia create a copy, so xlt[1,:] and xlt[2,:] allocates memory. To remedy this problem you should use @views. The second issue is the element-wise comparisons row .< xlt[1,:] and row .> xlt[2,:] create a temporary Boolean array. To avoid allocation of a temporary array, you should map any(t->t[1]<t[2], zip(row,xl1)) so that the comparison is done one element at a time like a loop.
After applying these tips, the performance difference on my machine is now about 2ns only, which accounts for the convenience of eachrow, zip, etc. instead of manual loops.
Note, for the first function, you can use axes() to loop over first or second dimension conveniently. And when benchmarking any Julia code with BenchmarkTools.jl, don't forget to interpolate ($) all variable names of a function to avoid working on global variables.
function bounds_error(x, xl)
for i in axes(x,1)
for j in axes(xl, 1)
if (x[i, j] < xl[j,1] || x[i,j] > xl[j,2])
return true
end
end
end
return false
end
@views function bounds_error2(x, xl)
xl1, xl2 = xl[:,1], xl[:,2]
for row in eachrow(x)
if any(t->t[1]<t[2], zip(row,xl1)) || any(t->t[1]>t[2], zip(row,xl2))
return true
end
end
return false
end
@btime bounds_error($x, $xl) # 8.100 ns (0 allocations: 0 bytes)
@btime bounds_error2($x, $xl) # 10.800 ns (0 allocations: 0 bytes)
While allocations make the difference for these particular inputs, in general the loop will be faster because it bails out immediately when it finds a value that is outside the limits, while the broadcasted version checks both entire arrays. This will be much more important than the allocations. –
DNF - Jun 25 at 14:09
That's true, and is closely related to allocations as you said, any will wait for a whole row comparison to start its check. –
AboAmmar Jun 25 at 14:13
33.
Broadcasting is much slower than a for loop #28126
https://github.com/JuliaLang/julia/issues/28126
julia> using BenchmarkTools
julia> function foo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
@. a = b + 0.1 * (0.2c + 0.3d + 0.4e)
nothing
end
foo (generic function with 1 method)
julia> function goo(a::Vector{T}, b::Vector{T}, c::Vector{T}, d::Vector{T}, e::Vector{T}) where T
@assert length(a) == length(b) == length(c) == length(d) == length(e)
@inbounds for i in eachindex(a)
a[i] = b[i] + 0.1 * (0.2c[i] + 0.3d[i] + 0.4e[i])
end
nothing
end
goo (generic function with 1 method)
julia> a,b,c,d,e=(rand(1000) for i in 1:5)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##9#10"))}(getfield(Main, Symbol("##9#10"))(), 1:5)
julia> @btime foo($a,$b,$c,$d,$e)
1.277 μs (0 allocations: 0 bytes)
julia> @btime goo($a,$b,$c,$d,$e)
345.568 ns (0 allocations: 0 bytes)
Workaround #28126, support SIMDing broadcast in more cases #30973
https://github.com/JuliaLang/julia/pull/30973
34.
Difference between Base and Core
https://discourse.julialang.org/t/difference-between-base-and-core/37426
julia> Base.Int
Int64
julia> Core.Int
Int64
julia> Base.Int == Core.Int
true
julia> Int.name.module
Core
I didn’t know either until I watched this last year starting around the 9:00 mark. https://youtu.be/TPuJsgyu87U?t=542
It’s because some parts have to be duplicated so the necessary compiler internals can work, or something to that effect.
Core is what’s defined in C as the very core of the language.
There is very little there.
It’s used to bootstrap the rest of the language by gradually defining more and more of Base in terms of
what was defined before.
Core is kind of an implementation detail that users should never need to interact with.
Core.AbstractArray
https://docs.julialang.org/en/v1/base/arrays/#Core.AbstractArray
Core.Array
https://docs.julialang.org/en/v1/base/arrays/#Core.Array
35.
Graph computing benchmarks: comparing the scalability of Dask, Dagger.jl, Tensorflow and Julius
https://discourse.julialang.org/t/graph-computing-benchmarks-comparing-the-scalability-of-dask-dagger-jl-tensorflow-and-julius/80745
https://juliustechco.github.io/JuliusGraph/dev/pages/t007_benchmark.html#Tutorial-7:-Graph-Computing-Benchmark-1
https://gist.github.com/jpsamaroo/95c78b3361ae454a51916183f2cf346f
https://github.com/JuliusTechCo/JuliusGraph
36.
UQ MATH2504 - Programming of Simulation, Analysis, and Learning Systems - (Semester 2 2022)
https://courses.smp.uq.edu.au/MATH2504/2022/lectures_html/lecture-unit-2.html
where the O(⋅) terms follow Big-O notation.
Priority queues, heaps, and back to sorting
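The "priority queues, heaps, and back to sorting" topic can be sketched with Python's stdlib heapq: heapify is O(n), each pop is O(log n), so heap-based sorting is O(n log n) overall.

```python
# Heap-based sorting via the stdlib priority-queue module.
import heapq

def heapsort(values):
    heap = list(values)
    heapq.heapify(heap)     # O(n) bottom-up heap construction
    # n pops, each O(log n) -> O(n log n) total.
    return [heapq.heappop(heap) for _ in range(len(heap))]

assert heapsort([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]
```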
37.
Julia v1.8 @assume_effects
Quick intro to the new effect analysis of Julia compiler
https://aviatesk.github.io/posts/effects-analysis/index.html
Background:
The Julia compiler is powered by abstract interpretation, which is powered by constant propagation.
constant propagation == injecting constant information into abstract interpretation
But constant propagation can be slow!
Idea: replace abstract interpretation under constant propagation with actual execution (i.e. concrete evaluation) instead!
The effect analysis is a technique to check when it is valid to perform concrete evaluation.