
Multiprocess utilization #8

Open
mirth opened this issue Jan 15, 2019 · 7 comments

Comments

@mirth

mirth commented Jan 15, 2019

It seems that adding more cores doesn't improve speed proportionally. Why is that?

@xgfs
Owner

xgfs commented Jan 15, 2019

Are you sure you are adding cores rather than threads? What is the size of the graph you are trying to process? Most likely it's a memory issue.

@mirth
Author

mirth commented Jan 16, 2019

I increase the -threads parameter while running on a 64-core machine.

@xgfs
Owner

xgfs commented Jan 17, 2019

If the graph is really large, there might still be memory issues, especially on NUMA architectures. Can you (roughly) specify the size of the graph and the memory system used?

@mirth
Author

mirth commented Jan 18, 2019

Sorry for not starting with numbers first.

./deepwalk -input in.bcsr -output out.model -threads 64 -dim 300 -nwalks 1000 -walklen 5 -window 3 -seed 4242 -verbose 2
nv: 747556, ne: 861549483
PR estimate complete
Using vectorized operations
Constructing HSM tree...
Done! Average code size: 20.7494
lr 0.000002, Progress 100.00%
Calculations took 8225.28 s to run

./deepwalk -input in.bcsr -output out.model -threads 8 -dim 300 -nwalks 1000 -walklen 5 -window 3 -seed 4242 -verbose 2
nv: 747556, ne: 861549483
PR estimate complete
Using vectorized operations
Constructing HSM tree...
Done! Average code size: 20.7604
lr 0.000002, Progress 100.00%
Calculations took 12681.46 s to run

I ran it on a 64-core, 256 GB memory EC2 instance. The graph bcsr file is 3.3 GB. The process consumes 5.5 GB of memory while running.
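[Editor's note: the two timings above can be turned into a rough scaling number. A minimal sketch in plain Python, using only the figures reported in the logs:]

```python
# Rough parallel-scaling check from the two runs reported above.
t_64 = 8225.28    # seconds with -threads 64
t_8 = 12681.46    # seconds with -threads 8

speedup = t_8 / t_64        # how much faster the 64-thread run is
thread_ratio = 64 / 8       # 8x more threads
efficiency = speedup / thread_ratio

print(f"speedup: {speedup:.2f}x")                # ~1.54x
print(f"scaling efficiency: {efficiency:.0%}")   # ~19%
```

An 8x increase in threads yields only a ~1.54x speedup, i.e. roughly 19% scaling efficiency, which is consistent with the memory-bound explanation in the next comment.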

@xgfs
Owner

xgfs commented Jan 20, 2019

From this limited information, the most probable cause is memory access time. A single DRAM controller fetches graph (and embedding) parts from memory, causing a bottleneck. This is probably not solvable unless you change the algorithm. If you are okay with higher memory consumption, you can consider an implementation that caches random walks in memory.
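[Editor's note: a toy sketch of what "caching random walks" could mean here. This is hypothetical illustration code, not from this repository; `adj`, `random_walk`, and `cache_walks` are made-up names.]

```python
import random

def random_walk(adj, start, length, rng):
    """One uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:          # dead end: stop early
            break
        walk.append(rng.choice(nbrs))
    return walk

def cache_walks(adj, nwalks, walklen, seed=4242):
    """Precompute all walks once. Training epochs can then stream the cached
    walks with sequential memory access instead of random graph lookups,
    trading memory (about nv * nwalks * walklen node ids) for locality."""
    rng = random.Random(seed)
    walks = []
    for v in adj:
        for _ in range(nwalks):
            walks.append(random_walk(adj, v, walklen, rng))
    return walks

# Toy usage on a 3-node graph.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
walks = cache_walks(adj, nwalks=2, walklen=5)
```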

On a completely different note, the parameters you use seem quite a bit off. Why are you changing the defaults?

@mirth
Author

mirth commented Jan 21, 2019

You mean -nwalks 1000 -walklen 5 -window 3? I am just playing with the parameters to see how it behaves.

@xgfs
Owner

xgfs commented Jan 23, 2019

If you want to see where the problem comes from, I would recommend running the process under Linux perf (tutorial here: http://www.brendangregg.com/perf.html). I would expect the running time to be dominated by memory access.
