
MKL-DNN #157
Closed · MikeInnes opened this issue Jan 25, 2018 · 15 comments

@MikeInnes
Member

We may be able to take advantage of Intel's high-performance deep learning primitives on CPUs – for example, for optimised convolutions.

Worth looking at DAAL, which we already have a wrapper for, and MKL-DNN.

@Shivanshmundra

Hey @MikeInnes, I am new to this but I am seriously willing to work on it.
Can you please suggest some starting points or resources to begin with?

@Shivanshmundra

?

@MikeInnes
Member Author

I'm not sure what I can provide aside from the links above; you'll have to do a little research to figure out how to get MKL-DNN running. It may be helpful to see how the wrapping works for CUDNN.

@mbrookhart

Hi @MikeInnes,

How would you feel about an integration with https://github.com/NervanaSystems/ngraph instead of mkldnn?

I'm a developer on that project at Intel, but this isn't really an official offer from Intel (all opinions expressed are my own). I'm interested in working more on the Julia ML ecosystem, especially how to extend Julia ML with existing C++ code. I've got experience with Julia frontend work and C++ backend work, but not the intermediaries :)

Matthew

@ViralBShah
Member

Does ngraph give good performance on Xeon CPUs too, or do you still need MKL-DNN in addition?

@jekbradbury
Contributor

jekbradbury commented Apr 13, 2018

Flux is more similar to PyTorch than to graph-based deep learning frameworks like TensorFlow, so an ngraph integration would probably look a lot like FluxJS, i.e. a combination of tracing and compilation. It may or may not be easier to target ONNX as an intermediate representation; I’m not familiar enough with ngraph to know what the tradeoffs there would be.
But while building exporters on top of Flux is fantastic, it's probably still worth trying to wrap MKL-DNN at some point as an alternate implementation for CPU convolutions in NNlib, so that native Flux runs as fast as it can on CPU.
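
To make that concrete, here's a very rough sketch of what such an alternate CPU conv could look like: a plain-array method backed by a ccall into a small C shim. `libmkldnn_shim` and `mkldnn_conv2d_f32` are made-up names, not MKL-DNN's real C API.

```julia
# Very rough sketch (not real code): `libmkldnn_shim` and `mkldnn_conv2d_f32`
# are hypothetical names for a thin C wrapper around MKL-DNN, not its real API.
# x is WHCN (width, height, channels, batch) and w is WHCinCout, as in NNlib.
function mkldnn_conv(x::Array{Float32,4}, w::Array{Float32,4}; stride = 1, pad = 0)
    out_w = (size(x, 1) + 2pad - size(w, 1)) ÷ stride + 1
    out_h = (size(x, 2) + 2pad - size(w, 2)) ÷ stride + 1
    y = Array{Float32}(undef, out_w, out_h, size(w, 4), size(x, 4))
    ccall((:mkldnn_conv2d_f32, "libmkldnn_shim"), Cvoid,
          (Ptr{Float32}, Ptr{Float32}, Ptr{Float32},
           Ref{NTuple{4,Cint}}, Ref{NTuple{4,Cint}}, Cint, Cint),
          y, x, w, Cint.(size(x)), Cint.(size(w)), stride, pad)
    return y
end
```

The actual hook would be adding a method like this to NNlib's conv for plain Arrays, which Flux would then pick up without any changes on its end.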

@mbrookhart

mbrookhart commented Apr 13, 2018

nGraph is positioned as a multi-hardware graph compiler and an intermediate layer of abstraction between the framework and the hardware. The goal is to make it easier to integrate new hardware accelerators as they come online. We're using mkldnn in combination with a few other libraries to get good performance on CPU, often better than what we can achieve with direct integration of mkldnn, because we can perform more complex optimisation/fusion.

That being said, it is a C++ abstraction layer, and may not fit in with the pure Julia stack you guys are going for. I'm simultaneously looking at ngraph and mkldnn bridges and the Flux code to see if one or the other makes more sense.

@ViralBShah
Member

The idea, in my mind, is not so much to have a pure Julia stack as to have a fast-moving, flexible set of tools. So all the functionality should be pure Julia, but adding acceleration through other libraries should be ok - so long as they are not mandatory.

@mbrookhart

I'm in the initial investigation stage, but with the beauty of multiple dispatch, I think I can extend the Flux API with nGraph in a minimally invasive way; mkldnn may be harder due to its memory primitives. Either way, it's a side project, so I'll let you know when I have a proof of concept.
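
To illustrate what I mean by minimally invasive (this is only a sketch — `NGraphTensor` and `ngraph_matmul` are placeholder names, not anything that exists yet):

```julia
# Sketch only: a wrapper array type whose operations route to the nGraph
# backend; Flux code that just calls `*` wouldn't need to change.
struct NGraphTensor{T,N} <: AbstractArray{T,N}
    data::Array{T,N}   # a real wrapper would hold an nGraph tensor handle instead
end

Base.size(t::NGraphTensor) = size(t.data)
Base.getindex(t::NGraphTensor, i...) = getindex(t.data, i...)

# Dispatch sends matmul on wrapped tensors to the backend rather than to
# Julia's generic fallback.
Base.:*(a::NGraphTensor{T,2}, b::NGraphTensor{T,2}) where {T} =
    NGraphTensor(ngraph_matmul(a.data, b.data))  # ngraph_matmul is hypothetical
```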

@mbrookhart

Also, I really appreciate the ideas you guys put forth in https://julialang.org/blog/2017/12/ml&pl. I definitely don't want to add something that would detract from that vision.

@ViralBShah
Member

Thank you. We want to grow the group and the thinking around it. You nailed why multiple dispatch makes this all clean.

@MikeInnes
Member Author

Hey @mbrookhart, this is great, I'm excited to see nGraph integration. As I see it, there are several stages of integration, which naturally build on each other:

  1. Exposing the core nGraph API and functionality as a Julia package (so that it's possible to build and run computations by hand).
  2. Supporting conversion from Flux layers -> nGraph for a little more automation (I can help figure out the right approach for this)
  3. Exposing compiled graphs as Flux layers (so models can mix nGraph and Julia code, reuse Flux's optimisers etc)
  4. Having Flux find models or sub-graphs that can be compiled to nGraph as a transparent optimisation

Steps 1-3 at least can be done without touching Flux's core code at all, and the nGraph package might well find other uses across the Julia ecosystem. Hope that makes sense; I'm happy to clarify, discuss other approaches, or help with any other questions.
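
For step 3, the shape I have in mind is roughly this — a sketch assuming some hypothetical `compile` that yields a callable executable (none of these names exist yet):

```julia
using Flux

# Sketch of step 3: wrap a compiled nGraph executable as a callable struct.
# `compile` and the executable itself are hypothetical placeholders.
struct NGraphLayer{F}
    exec::F   # compiled nGraph function
end

# Being callable is all that's needed for it to sit in a Chain alongside
# ordinary Julia layers.
(l::NGraphLayer)(x) = l.exec(x)

# e.g. model = Chain(Dense(10, 32, relu), NGraphLayer(compile(graph)), softmax)
```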

@mbrookhart

Hi @MikeInnes,

That mirrors the plan in my head pretty closely. Step 1 is a work in progress with some design thrashing at the moment, steps 2 and 3 should be relatively easy, and step 4 may or may not be a larger challenge. I think it's about the same amount of work as integrating mkldnn directly, but we get Intel NNP support for free when that's more widely available. :)

I'm working on an nGraph.jl package in my spare time right now (still a local repo, haven't pushed to github yet). I have a simple forward pass of A * B working from Julia->nGraph. I'm finding that wrapping nGraph with Cxx.jl is pretty rough (the template and shared pointer design of nGraph isn't handled very well), so I might take a break to rework my interface with CxxWrap.jl or a C-API around nGraph.
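
For reference, the C-API route I'm imagining looks roughly like this (every library and symbol name below is a placeholder — nGraph doesn't ship such a shim):

```julia
# Placeholder sketch of calling a hand-written C shim around nGraph via ccall;
# `libngraph_c_shim` and `ngraph_c_dot_f32` do not actually exist.
const libngraph_c = "libngraph_c_shim"

function ngraph_dot(A::Matrix{Float32}, B::Matrix{Float32})
    C = Array{Float32}(undef, size(A, 1), size(B, 2))
    ccall((:ngraph_c_dot_f32, libngraph_c), Cvoid,
          (Ptr{Float32}, Ptr{Float32}, Ptr{Float32}, Cint, Cint, Cint),
          C, A, B, size(A, 1), size(A, 2), size(B, 2))
    return C
end
```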

As soon as I have a relatively Julian interface that can run a forward pass of an MLP, I'll push the repo and we can start talking about how to integrate nGraph's autodiff with Julia's autograd (that's probably the biggest open question in my head).

Thanks for the support, and I'm excited to see the expansion of the Julia ML community as we go through this!

Matthew

@MikeInnes
Member Author

Great, feel free to open issues on Cxx.jl if there are things that might be improved there.

Hooking the AD should be reasonably straightforward, I think; it's similar to, e.g., hooking CUDNN's LSTM implementations. That code may be more confusing than helpful thanks to the CUDNN business, but it's on my todo list to write more docs on this stuff.
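
Roughly, the pattern is something like this — a sketch assuming Flux's current Tracker AD, with `ngraph_conv` / `ngraph_conv_backward` standing in for whatever the nGraph wrapper ends up exposing:

```julia
using Flux.Tracker: TrackedArray, track, data, @grad

# Intercept the op when it sees tracked (differentiable) arguments...
ngraph_conv(x::TrackedArray, w::TrackedArray) = track(ngraph_conv, x, w)

# ...and tell Tracker how to run the forward pass and pull gradients back,
# so nGraph's own autodiff can supply the backward computation.
@grad function ngraph_conv(x, w)
    y = ngraph_conv(data(x), data(w))
    y, Δ -> ngraph_conv_backward(data(x), data(w), Δ)  # returns (∇x, ∇w)
end
```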

@MikeInnes
Member Author

Our more recent thinking is that MKL-DNN is pretty heavy for this, but it'd actually be really neat (and fairly transparent) to do the same blocking optimisations as a custom array type (FluxML/NNlib.jl#97). We also have NNPACK on the way (FluxML/NNlib.jl#67) for faster non-blocked convs.
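
The shape of that idea, very loosely (nothing here is implemented — just a placeholder wrapper type):

```julia
# Loose sketch of the custom-array idea from NNlib.jl#97: store the data in an
# MKL-DNN-style blocked layout once, then let blocked kernels dispatch on the
# wrapper type and skip repeated layout conversions.
struct BlockedArray{T,N} <: AbstractArray{T,N}
    blocked::Vector{T}     # data reordered into a blocked memory format
    dims::NTuple{N,Int}    # logical (unblocked) size
end

Base.size(a::BlockedArray) = a.dims
# getindex would translate logical indices into the blocked layout; blocked
# conv kernels would dispatch on BlockedArray and work on `blocked` directly.
```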
