merge with torch #1

elikosan · 2016-12-12T22:46:36Z

No description provided.

reduce and BLAS work

Pass CMAKE_CXX_COMPILER to new select_compute_arch.cmake

This allows THC to be compiled without cutorch.

Move CMake files to THC

scatter / gather to new types

Fix sub and div for integer types

Index* and sort for new types

Use TH_INDEX_BASE in THC

Fixing bug in regex not accepting 2.1(2.0) notation
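To illustrate the notation this fix concerns, here is a minimal C++ sketch that accepts an arch-list entry of the form NUM.NUM or NUM.NUM(NUM.NUM). It is not the regex from select_compute_arch.cmake. It assumes that the parenthesized value names the virtual (PTX) architecture used to build code for the real one, so 2.1(2.0) maps to compute_20/sm_21. The helper name gencodeFor is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not from select_compute_arch.cmake): turns one
// arch-list entry into an nvcc -gencode flag, assuming the (NUM.NUM)
// suffix names the virtual architecture.
//   "3.5"      -> "-gencode arch=compute_35,code=sm_35"
//   "2.1(2.0)" -> "-gencode arch=compute_20,code=sm_21"
// Returns "" for entries in neither form.
std::string gencodeFor(const std::string& entry) {
  // NUM.NUM, optionally followed by (NUM.NUM).
  static const std::regex re(R"(([0-9])\.([0-9])(\(([0-9])\.([0-9])\))?)");
  std::smatch m;
  if (!std::regex_match(entry, m, re)) return "";
  const std::string bin = m[1].str() + m[2].str();
  // Without parentheses, the virtual arch defaults to the real one.
  const std::string ptx = m[3].matched ? m[4].str() + m[5].str() : bin;
  return "-gencode arch=compute_" + ptx + ",code=sm_" + bin;
}
```

If you drop the optional `(...)` group from this pattern, 2.1(2.0) no longer matches. That is the kind of rejection the commit title describes.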

adding multiple types to squeeze

Magma functions to generic

Implement fmod, remainder, equal in Cutorch

fix memory leak in (equal)

Add half support for addmv and addr.

guard random functions for half

Previously, cutorch initialized every CUDA device and enabled P2P access between all pairs of devices. This slowed down start-up, especially on machines with 8 devices. Now, THCudaInit does not initialize any devices, and P2P access is enabled lazily. Setting the random number generator seed also no longer initializes the device; that is deferred until random numbers are actually used.
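The lazy scheme can be sketched in C++ as follows. This is a hypothetical model, not cutorch's THCState code. LazyDeviceManager, useDevice, copyBetween, setSeed and generateRandom are illustrative names, and no CUDA calls are made. A device is marked initialized only on first use. A peer pair is recorded only when a cross-device copy needs it. Setting the seed touches no device.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of lazy CUDA device setup (not cutorch's THCState code).
// No CUDA calls are made: "initializing" a device flips a flag, and
// "enabling P2P" records the (src, dst) pair.
class LazyDeviceManager {
 public:
  // Like the lazy THCudaInit: constructing the manager initializes nothing.
  explicit LazyDeviceManager(int numDevices)
      : initialized_(static_cast<std::size_t>(numDevices), false) {}

  // A device is set up the first time it is actually used.
  void useDevice(int dev) {
    std::lock_guard<std::mutex> lock(mu_);
    initLocked(dev);
  }

  // Peer access is enabled only for pairs that actually exchange data.
  void copyBetween(int src, int dst) {
    std::lock_guard<std::mutex> lock(mu_);
    initLocked(src);
    initLocked(dst);
    if (src != dst) {
      peerEnabled_.insert({src, dst});  // real code: cudaDeviceEnablePeerAccess
    }
  }

  // Setting the seed only stores it; no device is touched yet.
  void setSeed(unsigned long long seed) {
    std::lock_guard<std::mutex> lock(mu_);
    seed_ = seed;
  }

  // The first random-number request on a device initializes it.
  void generateRandom(int dev) {
    std::lock_guard<std::mutex> lock(mu_);
    initLocked(dev);
  }

  int initializedCount() const {
    std::lock_guard<std::mutex> lock(mu_);
    int n = 0;
    for (bool b : initialized_) n += b ? 1 : 0;
    return n;
  }

  std::size_t peerPairCount() const {
    std::lock_guard<std::mutex> lock(mu_);
    return peerEnabled_.size();
  }

  unsigned long long seed() const {
    std::lock_guard<std::mutex> lock(mu_);
    return seed_;
  }

 private:
  void initLocked(int dev) {
    // Real code would create the context, streams and RNG state here.
    initialized_[static_cast<std::size_t>(dev)] = true;
  }

  mutable std::mutex mu_;
  std::vector<bool> initialized_;
  std::set<std::pair<int, int>> peerEnabled_;
  unsigned long long seed_ = 0;
};
```

In this model, a program on an 8-device machine that only touches device 0 initializes one device instead of eight. It also enables no P2P pairs.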

* Implemented cudaMemGetInfo for caching allocator

Lazily initialize CUDA devices

Revert "Lazily initialize CUDA devices"


Lazily initialize CUDA devices (take 2)

use local modified select_compute_arch.cmake for msvc

Adds a CUDA "sleep" kernel which spins for the given number of iterations. This is useful for testing correct synchronization with streams.

Adds a caching allocator for CUDA pinned (page-locked) memory. This avoids the synchronization caused by cudaFreeHost or cudaHostUnregister, at the expense of potentially higher host memory usage. Correctness is preserved by recording a CUDA event after each cudaMemcpyAsync involving the pinned memory: a pinned allocation is not reused until all events associated with it have completed.
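Here is a minimal C++ sketch of the reuse rule. It assumes std::malloc/std::free in place of cudaHostAlloc/cudaFreeHost, and a FakeEvent flag in place of cudaEventRecord/cudaEventQuery. CachingHostAllocator and its methods are hypothetical names. The sketch uses a first-fit scan for brevity, which may differ from how cutorch picks blocks. In this version, completed events are only processed when a new allocation is requested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a CUDA event: real code would call cudaEventRecord after the
// copy and cudaEventQuery to see whether it has completed.
struct FakeEvent {
  bool done = false;
};

// Hypothetical caching allocator for pinned memory (illustrative only).
// std::malloc/std::free stand in for cudaHostAlloc/cudaFreeHost.
class CachingHostAllocator {
 public:
  CachingHostAllocator() = default;
  CachingHostAllocator(const CachingHostAllocator&) = delete;
  CachingHostAllocator& operator=(const CachingHostAllocator&) = delete;

  ~CachingHostAllocator() {
    // Memory is released only when the cache itself is destroyed.
    for (auto& kv : blocks_) std::free(kv.first);
  }

  void* allocate(std::size_t size) {
    processEvents();  // retire events whose copies have finished
    // Reuse a cached block that is free, large enough, and has no pending events.
    for (auto& kv : blocks_) {
      Block& b = kv.second;
      if (!b.allocated && b.events.empty() && b.size >= size) {
        b.allocated = true;
        return kv.first;
      }
    }
    void* p = std::malloc(size);
    if (p == nullptr) return nullptr;
    blocks_.emplace(p, Block{size, true, {}});
    return p;
  }

  // Return the block to the cache instead of freeing it, avoiding the
  // synchronization a cudaFreeHost call would cause.
  void deallocate(void* p) { blocks_.at(p).allocated = false; }

  // Called after each async copy that reads or writes the block.
  void recordEvent(void* p, std::shared_ptr<FakeEvent> ev) {
    blocks_.at(p).events.push_back(std::move(ev));
  }

  std::size_t numBlocks() const { return blocks_.size(); }

 private:
  struct Block {
    std::size_t size;
    bool allocated;
    std::vector<std::shared_ptr<FakeEvent>> events;  // copies still using it
  };

  // Drop every event whose copy has completed.
  void processEvents() {
    for (auto& kv : blocks_) {
      auto& evs = kv.second.events;
      evs.erase(std::remove_if(evs.begin(), evs.end(),
                               [](const std::shared_ptr<FakeEvent>& e) { return e->done; }),
                evs.end());
    }
  }

  std::map<void*, Block> blocks_;
};
```

A freed block whose copy is still in flight stays unavailable, so the next request gets a fresh block. Once the event completes, the old block is handed out again without any free/alloc round trip.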

Add caching allocator for pinned (page-locked) memory

Without this, cuda_events could grow continuously from calls to cudaMemcpyAsync, but the events would never be processed if no new pinned memory allocations were made. For example:

    t1 = cutorch.createCudaHostTensor(10)
    t2 = torch.CudaTensor(10)
    while true do
      t2:copyAsync(t1)
    end
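The fix can be modelled with the hypothetical C++ sketch below. FakeEvent stands in for a CUDA event, and PinnedEventQueue is an illustrative name. A deque holds events in the order they were recorded. The sketch assumes events are checked in FIFO order, so processing stops at the first event that is still pending. recordEvent drains completed events before appending a new one. As a result, a copyAsync loop like the one above leaves at most one event outstanding instead of growing the queue without bound.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>

// Stand-in for a CUDA event (real code would use cudaEventQuery).
struct FakeEvent {
  bool done = false;
};

// Hypothetical sketch of only the event bookkeeping of a pinned-memory cache.
class PinnedEventQueue {
 public:
  // Called after every async copy on pinned memory. Completed events are
  // drained here as well, not only on allocation, so a copy loop that never
  // allocates cannot grow the queue without bound.
  void recordEvent(std::shared_ptr<FakeEvent> ev) {
    processEvents();
    events_.push_back(std::move(ev));
  }

  // Pop completed events from the front; stop at the first pending one.
  void processEvents() {
    while (!events_.empty() && events_.front()->done) {
      // Real code would destroy the event here and, once a block has no
      // outstanding events, make it available for reuse.
      events_.pop_front();
    }
  }

  std::size_t pending() const { return events_.size(); }

 private:
  std::deque<std::shared_ptr<FakeEvent>> events_;
};
```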

Process outstanding CUDA events in recordEvent

TensorInfo related code documentation

soumith and others added 30 commits July 29, 2016 01:09

add versioning script

77ed5b6

Merge pull request #456 from torch/more-cutorch-template-types

c2e2047

reduce and BLAS work

new select_compute_arch.cmake file from @BorisFM

3e06381

pass compiler to arch autodetect

28dfdd2

Merge pull request #457 from szagoruyko/cmake-compiler-autodetect

acb7a2e

Pass CMAKE_CXX_COMPILER to new select_compute_arch.cmake

adding expand etc. to other cuda types as well

28e97ee

Move CMake files to THC

8737f8e

This allows THC to be compiled without cutorch.

Merge pull request #458 from apaszke/cmake_move

4c858ff

Move CMake files to THC

scatter / gather to new types

ea1cdf4

Merge pull request #459 from torch/scattergatherish

a7cecfe

scatter / gather to new types

more tests and cmake fix

8550fda

more tests for types

f48dae1

easy: change templated argument to capitalized

109f313

adding indexing for types

7b4dc42

making changes to sort and TopK for the changed index* API

e9a349b

fixing sort to use long indices

aaad09a

Fix sub and div for integer types

da4ccc8

Merge pull request #462 from apaszke/sub_div_fix

5511782

Fix sub and div for integer types

added multiple types for sort

3cd2b3f

fix a small bug

9d5b1b1

fixing backward compatibility for __index__ and __new_index__

b163edb

fixing CudaHalfTensor tests

b528d67

Merge pull request #461 from torch/indextype

e21121a

Index* and sort for new types

fix typo from #462

19310cc

Use TH_INDEX_BASE in THC

2c22745

Merge pull request #467 from apaszke/master

6d4e3c2

Use TH_INDEX_BASE in THC

Fixing bug in regex not accepting 2.1(2.0) notation

c35de2e

Merge pull request #473 from borisfom/cuda_arch_fix

6cc2356

Fixing bug in regex not accepting 2.1(2.0) notation

adding multiple types to squeeze

dfc8c4b

Merge pull request #474 from torch/squeezetype

fe27368

adding multiple types to squeeze

killeent and others added 29 commits November 15, 2016 13:31

[cutorch mag2gen] more cleanup

7914f6e

add support for remainder in cutorch

c65c1da

add support for fmod in cutorch

5652762

Merge pull request #602 from killeent/magma

6d6fe20

Magma functions to generic

add support for equal in cutorch

d708c93

Merge pull request #603 from killeent/remainder

857f6d2

Implement fmod, remainder, equal in Cutorch

fix memory leak in (equal)

45488e2

Merge pull request #604 from killeent/memleak

0afffe1

fix memory leak in (equal)

Add half support for addmv and addr.

cd8e209

Merge pull request #605 from gchanan/halfAddrAddmv

dd86d97

Add half support for addmv and addr.

guard random functions for half

6f40334

Merge pull request #607 from killeent/half-guard

2d75d41

guard random functions for half

Implemented cudaMemGetInfo for caching allocator (#600)

f593224

* Implemented cudaMemGetInfo for caching allocator

Merge pull request #610 from colesbury/lazy

f46ca39

Lazily initialize CUDA devices

remove spurious prints in tests

f8d05d2

Revert "Lazily initialize CUDA devices"

e9e131e

Merge pull request #611 from torch/revert-610-lazy

deff050

Revert "Lazily initialize CUDA devices"

Merge pull request #613 from colesbury/lazy

e2051b6

Lazily initialize CUDA devices (take 2)

use local modified select_compute_arch.cmake for msvc

8d8bbc3

Merge pull request #614 from BTNC/win

ce43bc5

use local modified select_compute_arch.cmake for msvc

Adds a CUDA "sleep" kernel

9d8e13d

Adds a CUDA "sleep" kernel which spins for the given number of iterations. This is useful for testing correct synchronization with streams.

Merge pull request #618 from colesbury/cached_pinned_memory

0267dae

Add caching allocator for pinned (page-locked) memory

Merge pull request #619 from colesbury/cached_pinned_memory_fix

bcbb427

Process outstanding CUDA events in recordEvent

TensorInfo related code documentation

01d7a63

Merge pull request #628 from killeent/more-documentation

e00f7d4

TensorInfo related code documentation

elikosan merged commit ffc32fa into elikosan:master Dec 12, 2016
