
MKL-DNN #157
Closed · MikeInnes opened this issue Jan 25, 2018 · 15 comments

@MikeInnes
Member

We may be able to take advantage of Intel's high-performance deep learning primitives on CPUs – for example, for optimised convolutions.

Worth looking at DAAL, which we already have a wrapper for, and MKL-DNN.

@Shivanshmundra

Hey @MikeInnes, I am new to this but I am seriously willing to work on it.
Can you please suggest some starting points or resources to begin with?

@Shivanshmundra

?

@MikeInnes
Member Author

I'm not sure what I can provide aside from the links above; you'll have to do a little research to figure out how to get MKL-DNN running. It may be helpful to see how the wrapping works for CUDNN.

@mbrookhart

Hi @MikeInnes,

How would you feel about an integration with https://github.com/NervanaSystems/ngraph instead of mkldnn?

I'm a developer on that project at Intel, but this isn't really an official offer from Intel (all opinions expressed are my own). I'm interested in working more on the Julia ML ecosystem, especially how to extend Julia ML with existing C++ code. I've got experience with Julia frontend work and C++ backend work, but not the intermediaries :)

Matthew

@ViralBShah
Member

Does ngraph give good performance on Xeon CPUs too, or do you still need MKL-DNN in addition?

@jekbradbury
Contributor

jekbradbury commented Apr 13, 2018

Flux is more similar to PyTorch than to graph-based deep learning frameworks like TensorFlow, so an ngraph integration would probably look a lot like FluxJS, i.e. a combination of tracing and compilation. It may or may not be easier to target ONNX as an intermediate representation; I’m not familiar enough with ngraph to know what the tradeoffs there would be.
But while building exporters on top of Flux is fantastic, it's probably still worth trying to wrap MKL-DNN at some point as an alternate implementation for CPU convolutions in NNlib, so that native Flux runs as fast as it can on CPU.
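
To make that concrete, here's a very rough sketch of what such an alternate CPU conv could look like: a plain-array method backed by a ccall into a small C shim. `libmkldnn_shim` and `mkldnn_conv2d_f32` are made-up names, not MKL-DNN's real C API.

```julia
# Very rough sketch (not real code): `libmkldnn_shim` and `mkldnn_conv2d_f32`
# are hypothetical names for a thin C wrapper around MKL-DNN, not its real API.
# x is WHCN (width, height, channels, batch) and w is WHCinCout, as in NNlib.
function mkldnn_conv(x::Array{Float32,4}, w::Array{Float32,4}; stride = 1, pad = 0)
    out_w = (size(x, 1) + 2pad - size(w, 1)) ÷ stride + 1
    out_h = (size(x, 2) + 2pad - size(w, 2)) ÷ stride + 1
    y = Array{Float32}(undef, out_w, out_h, size(w, 4), size(x, 4))
    ccall((:mkldnn_conv2d_f32, "libmkldnn_shim"), Cvoid,
          (Ptr{Float32}, Ptr{Float32}, Ptr{Float32},
           Ref{NTuple{4,Cint}}, Ref{NTuple{4,Cint}}, Cint, Cint),
          y, x, w, Cint.(size(x)), Cint.(size(w)), stride, pad)
    return y
end
```

The actual hook would be adding a method like this to NNlib's conv for plain Arrays, which Flux would then pick up without any changes on its end.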

@mbrookhart

mbrookhart commented Apr 13, 2018

nGraph is positioned as a multi-hardware graph compiler and an intermediate layer of abstraction between the framework and the hardware. The goal is to make it easier to integrate new hardware accelerators as they come online. We're using mkldnn in combination with a few other libraries to get good performance on CPU, often better than what we can achieve with direct integration of mkldnn, because we can perform more complex optimisation/fusion.

That being said, it is a C++ abstraction layer, and may not fit in with the pure Julia stack you guys are going for. I'm simultaneously looking at ngraph and mkldnn bridges and the Flux code to see if one or the other makes more sense.

@ViralBShah
Member

The idea, in my mind, is not so much to have a pure Julia stack as to have a fast-moving, flexible set of tools. So all the functionality should be pure Julia, but adding acceleration through other libraries should be ok - so long as they are not mandatory.

@mbrookhart

I'm in the initial investigation stage, but with the beauty of multiple dispatch, I think I can extend the Flux API with nGraph in a minimally invasive way; mkldnn may be harder due to its memory primitives. Either way, it's a side project, so I'll let you know when I have a proof of concept.
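
To illustrate what I mean by minimally invasive (this is only a sketch — `NGraphTensor` and `ngraph_matmul` are placeholder names, not anything that exists yet):

```julia
# Sketch only: a wrapper array type whose operations route to the nGraph
# backend; Flux code that just calls `*` wouldn't need to change.
struct NGraphTensor{T,N} <: AbstractArray{T,N}
    data::Array{T,N}   # a real wrapper would hold an nGraph tensor handle instead
end

Base.size(t::NGraphTensor) = size(t.data)
Base.getindex(t::NGraphTensor, i...) = getindex(t.data, i...)

# Dispatch sends matmul on wrapped tensors to the backend rather than to
# Julia's generic fallback.
Base.:*(a::NGraphTensor{T,2}, b::NGraphTensor{T,2}) where {T} =
    NGraphTensor(ngraph_matmul(a.data, b.data))  # ngraph_matmul is hypothetical
```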

@mbrookhart

Also, I really appreciate the ideas you guys put forth in https://julialang.org/blog/2017/12/ml&pl. I definitely don't want to add something that would detract from that vision.

@ViralBShah
Member

Thank you. We want to grow the group and the thinking around it. You nailed why multiple dispatch makes this all clean.

@MikeInnes
Member Author

Hey @mbrookhart, this is great, I'm excited to see nGraph integration. As I see it, there are several stages of integration, which naturally build on each other:

  1. Exposing the core nGraph API and functionality as a Julia package (so that it's possible to build and run computations by hand).
  2. Supporting conversion from Flux layers -> nGraph for a little more automation (I can help figure out the right approach for this)
  3. Exposing compiled graphs as Flux layers (so models can mix nGraph and Julia code, reuse Flux's optimisers etc)
  4. Having Flux find models or sub-graphs that can be compiled to nGraph as a transparent optimisation

Steps 1-3 at least can be done without touching Flux's core code at all, and the nGraph package might well find other uses across the Julia ecosystem. Hope that makes sense; I'm happy to clarify, discuss other approaches, or help with any other questions.
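
For step 3, the shape I have in mind is roughly this — a sketch assuming some hypothetical `compile` that yields a callable executable (none of these names exist yet):

```julia
using Flux

# Sketch of step 3: wrap a compiled nGraph executable as a callable struct.
# `compile` and the executable itself are hypothetical placeholders.
struct NGraphLayer{F}
    exec::F   # compiled nGraph function
end

# Being callable is all that's needed for it to sit in a Chain alongside
# ordinary Julia layers.
(l::NGraphLayer)(x) = l.exec(x)

# e.g. model = Chain(Dense(10, 32, relu), NGraphLayer(compile(graph)), softmax)
```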

@mbrookhart

Hi @MikeInnes,

That mirrors the plan in my head pretty closely. Step 1 is a work in progress with some design thrashing at the moment, steps 2 and 3 should be relatively easy, and step 4 may or may not be a larger challenge. I think it's about the same amount of work as integrating mkldnn directly, but we get Intel NNP support for free when that's more widely available. :)

I'm working on an nGraph.jl package in my spare time right now (still a local repo, haven't pushed to github yet). I have a simple forward pass of A * B working from Julia->nGraph. I'm finding that wrapping nGraph with Cxx.jl is pretty rough (the template and shared pointer design of nGraph isn't handled very well), so I might take a break to rework my interface with CxxWrap.jl or a C-API around nGraph.
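
For reference, the C-API route I'm imagining looks roughly like this (every library and symbol name below is a placeholder — nGraph doesn't ship such a shim):

```julia
# Placeholder sketch of calling a hand-written C shim around nGraph via ccall;
# `libngraph_c_shim` and `ngraph_c_dot_f32` do not actually exist.
const libngraph_c = "libngraph_c_shim"

function ngraph_dot(A::Matrix{Float32}, B::Matrix{Float32})
    C = Array{Float32}(undef, size(A, 1), size(B, 2))
    ccall((:ngraph_c_dot_f32, libngraph_c), Cvoid,
          (Ptr{Float32}, Ptr{Float32}, Ptr{Float32}, Cint, Cint, Cint),
          C, A, B, size(A, 1), size(A, 2), size(B, 2))
    return C
end
```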

As soon as I have a relatively Julian interface that can run a forward pass of an MLP, I'll push the repo and we can start talking about how to integrate nGraph's autodiff with Julia's autograd (that's probably the biggest open question in my head).

Thanks for the support, and I'm excited to see the expansion of the Julia ML community as we go through this!

Matthew

@MikeInnes
Member Author

Great, feel free to open issues on Cxx.jl if there are things that might be improved there.

Hooking the AD should be reasonably straightforward, I think; it's similar to, e.g., hooking CUDNN's LSTM implementations. That code may be more confusing than helpful thanks to the CUDNN business, but it's on my todo list to write more docs on this stuff.
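
Roughly, the pattern is something like this — a sketch assuming Flux's current Tracker AD, with `ngraph_conv` / `ngraph_conv_backward` standing in for whatever the nGraph wrapper ends up exposing:

```julia
using Flux.Tracker: TrackedArray, track, data, @grad

# Intercept the op when it sees tracked (differentiable) arguments...
ngraph_conv(x::TrackedArray, w::TrackedArray) = track(ngraph_conv, x, w)

# ...and tell Tracker how to run the forward pass and pull gradients back,
# so nGraph's own autodiff can supply the backward computation.
@grad function ngraph_conv(x, w)
    y = ngraph_conv(data(x), data(w))
    y, Δ -> ngraph_conv_backward(data(x), data(w), Δ)  # returns (∇x, ∇w)
end
```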

@MikeInnes
Member Author

Our more recent thinking is that MKL-DNN is pretty heavy for this, but it'd actually be really neat (and fairly transparent) to do the same blocking optimisations as a custom array type (FluxML/NNlib.jl#97). We also have NNPACK on the way (FluxML/NNlib.jl#67) for faster non-blocked convs.
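
The shape of that idea, very loosely (nothing here is implemented — just a placeholder wrapper type):

```julia
# Loose sketch of the custom-array idea from NNlib.jl#97: store the data in an
# MKL-DNN-style blocked layout once, then let blocked kernels dispatch on the
# wrapper type and skip repeated layout conversions.
struct BlockedArray{T,N} <: AbstractArray{T,N}
    blocked::Vector{T}     # data reordered into a blocked memory format
    dims::NTuple{N,Int}    # logical (unblocked) size
end

Base.size(a::BlockedArray) = a.dims
# getindex would translate logical indices into the blocked layout; blocked
# conv kernels would dispatch on BlockedArray and work on `blocked` directly.
```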
