Throwing segfaults like rice at a wedding #41

Closed
odellus opened this issue May 26, 2018 · 10 comments

odellus commented May 26, 2018

So I'm running https://github.com/NLPLearn/QANet with tensorflow-upstream and I've had to cut my batch_size down to nothing to fit the model onto the GPU.

This is what happens when I try to train the model:

Building model...
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Total number of trainable parameters: 788673
2018-05-25 22:40:06.553366: W tensorflow/stream_executor/rocm/rocm_driver.cc:404] creating context when one is currently active; existing: 0x7f3818e3d580
2018-05-25 22:40:06.553526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties: 
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.34
pciBusID 0000:09:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2018-05-25 22:40:06.553537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-05-25 22:40:06.553548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-25 22:40:06.553567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995]      0 
2018-05-25 22:40:06.553575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0:   N 
2018-05-25 22:40:06.553612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:09:00.0)
  0%|                                                 | 0/60000 [00:00<?, ?it/s]2018-05-25 22:40:41.253877: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:41.253878: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:42.476336: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:43.035086: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:45.046033: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:45.047533: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:46.301007: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:46.983404: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:47.838168: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:48.067349: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:49.404750: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:49.955002: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
Memory access fault by GPU node-1 on address 0x58a404000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

The authors of the project are using a similarly sized GPU (though with twice as much system RAM; not sure whether that's the problem) and don't have to drop their batch size down to around 4 to fit the model on their GPU.
From localminimum/QANet#2
"""
Hi @kamalkraj I uploaded the most recent model pretrained weights (EM/F1 = 70.0/79.4) and you can download it here.

The specification of the system I used is:
CPU: i7-3930K CPU @ 3.20GHz
GPU: GTX1080 (8GB)
RAM: 16GB
Training takes about 5~8 hours depending on your gpu/cpu spec. The model takes about 8 GB gpu memory so if you're using anything bigger than 96 as your hidden unit size then you'll get an OOM error. Or if you are using a preoccupied GPU it will also cause an OOM error.

NOTE: If you are using your desktop GPU, try running it in terminal mode (alt + ctrl + F1) and close all applications that require gpu memory (e.g. Xorg)
sudo service lightdm stop
python config.py --mode train
after training,
sudo service lightdm start
"""
I followed the advice to shut down all other GPU-using applications and run from a bare terminal too. It still won't fit. Any idea why this is happening? My RX 580 is supposed to have the same amount of memory. Curious as to what's going on 😕
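To get a rough sense of why batch size is the main memory knob here, a back-of-the-envelope activation-memory estimate can be sketched in plain Python. The sequence length, hidden size, and layer count below are illustrative assumptions, not values read from QANet's config:

```python
def activation_bytes(batch_size, seq_len=400, hidden=96, n_layers=16,
                     bytes_per_float=4):
    """Very rough activation-memory estimate for one training step.

    Assumes one float32 tensor of shape (batch_size, seq_len, hidden)
    is kept per layer for backprop; ignores weights, optimizer state,
    and convolution workspace buffers.
    """
    per_layer = batch_size * seq_len * hidden * bytes_per_float
    return per_layer * n_layers

# Activation memory scales linearly with batch size, and the graph is
# static, so if one batch fits, every later identical batch should too.
for bs in (4, 8, 32):
    print(f"batch_size={bs}: ~{activation_bytes(bs) / 2**20:.1f} MiB")
```

Even generous versions of this estimate stay far below 8 GiB at batch size 8, which supports the suspicion that this is not a plain capacity problem.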


odellus commented May 26, 2018

This is with batch_size 8.

python3 config.py --mode train
Building model...
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Total number of trainable parameters: 788673
2018-05-25 22:46:33.509859: W tensorflow/stream_executor/rocm/rocm_driver.cc:404] creating context when one is currently active; existing: 0x7f1ea390d8b0
2018-05-25 22:46:33.510010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties: 
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.34
pciBusID 0000:09:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2018-05-25 22:46:33.510022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-05-25 22:46:33.510033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-25 22:46:33.510039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995]      0 
2018-05-25 22:46:33.510044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0:   N 
2018-05-25 22:46:33.510074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:09:00.0)
  0%|                                                 | 0/60000 [00:00<?, ?it/s]2018-05-25 22:47:11.700225: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:12.282363: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:12.955994: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:12.957020: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:13.599275: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:13.825643: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:15.644044: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:16.185436: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:17.826536: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:18.565328: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:19.299351: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:19.857104: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:20.471238: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:21.041978: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:21.698886: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:21.916486: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:22.196821: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:22.753152: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:23.459215: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:23.549742: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:23.628027: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:24.360281: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:25.088926: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:25.229095: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
  0%|                                     | 1/60000 [00:45<753:04:20, 45.19s/it]2018-05-25 22:47:29.739454: I .
.
.
.
  0%|                                     | 7/60000 [02:08<306:56:10, 18.42s/it]2018-05-25 22:48:53.226017: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:53.227340: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:53.853328: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:55.293085: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:56.867052: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:57.400217: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:58.034644: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:58.261886: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:59.258544: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:59.912278: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
Memory access fault by GPU node-1 on address 0x54be18000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Any idea what's going on? All of MIOpen's tests passed. Am I in the right place? Is this a HIP issue? Any help or a pointer in the right direction would be much appreciated.


odellus commented May 26, 2018

Background:
My rocminfo:

 rocminfo
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (number of timestamp)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 3 1300X Quad-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0                                  
  Queue Min Size:          0                                  
  Queue Max Size:          0                                  
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768KB                            
  Chip ID:                 0                                  
  Cacheline Size:          64                                 
  Max Clock Frequency (MHz):3500                               
  BDFID:                   0                                  
  Compute Unit:            4                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    8176312KB                          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8176312KB                          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128                                
  Queue Min Size:          4096                               
  Queue Max Size:          131072                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16KB                               
  Chip ID:                 26591                              
  Cacheline Size:          64                                 
  Max Clock Frequency (MHz):1340                               
  BDFID:                   2304                               
  Compute Unit:            36                                 
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64                                 
  Workgroup Max Size:      1024                               
  Workgroup Max Size Per Dimension:
    Dim[0]:                  67109888                           
    Dim[1]:                  150995968                          
    Dim[2]:                  0                                  
  Grid Max Size:           4294967295                         
  Waves Per CU:            40                                 
  Max Work-item Per CU:    2560                               
  Grid Max Size per Dimension:
    Dim[0]:                  4294967295                         
    Dim[1]:                  4294967295                         
    Dim[2]:                  4294967295                         
  Max number Of fbarriers Per Workgroup:32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608KB                          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64KB                               
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    AMD:AMDGPU:8:0:3                   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Dimension: 
        Dim[0]:                  67109888                           
        Dim[1]:                  1024                               
        Dim[2]:                  16777217                           
      Workgroup Max Size:      1024                               
      Grid Max Dimension:      
        x                        4294967295                         
        y                        4294967295                         
        z                        4294967295                         
      Grid Max Size:           4294967295                         
      FBarrier Max Size:       32                                 
*** Done ***

I installed ROCm from binaries a few weeks ago.

I've built MIOpen from source after configuring with:

CXX=/opt/rocm/hcc/bin/hcc cmake -DMIOPEN_BACKEND=HIP \
  -DMIOPEN_MAKE_BOOST_PUBLIC=ON \
  -DCMAKE_PREFIX_PATH="/opt/rocm/hcc;/opt/rocm/hip" \
  -DCMAKE_CXX_FLAGS="-isystem /usr/include/x86_64-linux-gnu/" \
  -DHALF_INCLUDE_DIR=/home/thomas/code/half/include ..

I would think this shouldn't be happening: it's a static computational graph, the same model for every batch, so if you can allocate the memory needed for one batch, the same allocation should suffice for every subsequent batch. It shouldn't become a problem only on later batches, which is why I'm confused.

odellus changed the title from "OOM when there should be space" to "OOM when there should be space?" May 26, 2018

daniellowell commented May 26, 2018

@odellus

Memory access fault by GPU node-1 on address 0x54be18000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

It is not that your GPU is running out of memory. That is the GPU equivalent of a segmentation fault. Let us take a look at this network and see which layer is causing it.

@daniellowell

@odellus What version of TensorFlow are you using, and which branch did you pull from?


odellus commented May 26, 2018 via email

@parallelo

Here are some environment variables that might help point us to the culprit. Just set these and collect all of the verbose output from running your workload.

export HCC_SERIALIZE_KERNEL=0x3
export HCC_SERIALIZE_COPY=0x3
export HIP_TRACE_API=0x3
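
As a sketch of how these might be applied in practice (the log filename is arbitrary, and the training command is the one used earlier in this issue):

```shell
# Serialize kernel launches and copies, and trace HIP API calls, so the
# last kernel logged before the fault is the likely culprit.
export HCC_SERIALIZE_KERNEL=0x3
export HCC_SERIALIZE_COPY=0x3
export HIP_TRACE_API=0x3

# Rerun the workload, capturing stdout and stderr for triage
# (uncomment to run; the log filename is just an example):
# python3 config.py --mode train 2>&1 | tee hip_trace.log
```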

Also, are you able to reproduce this via one of our pre-built Docker containers using TF 1.3? With rock-dkms 1.7.2 on the host, you can try something like this:

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm1.7.2

@parallelo

Btw, this ticket appears to be better suited to one of these projects:


odellus commented May 27, 2018 via email

odellus changed the title from "OOM when there should be space?" to "Throwing segfaults like rice at a wedding" May 27, 2018
odellus closed this as completed May 27, 2018

odellus commented May 27, 2018

Here's truncated output from setting those debugging environment variables and running QANet:
https://gist.github.com/odellus/87e2a382c5d6967b3b48ea5fbdf566a6


parallelo commented May 28, 2018

Just to clarify, I was trying to connect you with the folks who most often deal with framework-level triage. At this point, we don't know whether this particular issue is an MIOpen library problem, or if it is somewhere else in the stack. In this circumstance, the frameworks team often does initial triage.

Edit: Thanks for opening a ticket with the TensorFlow repo; we'll take a look.

cderb added a commit that referenced this issue May 16, 2022
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
REVERT: a30a51bc6 remove unused header
REVERT: 7d2fd834c reduce scope of variable
REVERT: f6e9abe79 clang format
REVERT: 834e9a397 remove comment
REVERT: c8d6eb1a0 workspace rename
REVERT: aa7d2ea24 Merge remote-tracking branch 'origin/develop' into cderb/miopen_perf
REVERT: aaf13fb12 add to print for debug
REVERT: 34e11fa70 Merge remote-tracking branch 'origin/develop' into cderb/miopen_perf
REVERT: cb6c19d13 add search+update directives to execution context, add json examples for perf eval
REVERT: 85029077b connecting new fin functions for perf eval
REVERT: 4d1e031fd add outputs and definitions
REVERT: 952538cb8 adding perf eval function, in progress
REVERT: 617dccd9c rename
REVERT: 5c35ae886 fixes for collecting kernel blobs
REVERT: 5cfea7c43 syntax fixes
REVERT: 2f2a4ed9f add test file
REVERT: 7175019f5 first rendition of perf_compile

git-subtree-dir: fin
git-subtree-split: 722feea660e2e3d7f8e1edcc520a938be4885a44
cderb added a commit that referenced this issue Aug 3, 2022
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 30d699b9edc014c6076a9649f849bd3c4588d4ab
averinevg pushed a commit that referenced this issue Aug 19, 2022
* add perf cfg validity test to TestSysDbRecord

* remove debug prints

* removing invalid entries from all perf dbs

* VACUUM sqlite

* Squashed 'fin/' changes from 53d2563fe..30d699b9e

30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 30d699b9edc014c6076a9649f849bd3c4588d4ab

* Squashed 'fin/' changes from 30d699b9e..ea5c844af

ea5c844af fix direction test
3aa412ee1 Update to use revised testSysDbRecord miopen function

git-subtree-dir: fin
git-subtree-split: ea5c844aff8b5d46537aa59034a596fd15cd9e1e

* rename pipe step

* Squashed 'fin/' changes from ea5c844af..c702cb968

c702cb968 format

git-subtree-dir: fin
git-subtree-split: c702cb96800a03b17ee17d03a015dfa38e3883b9

* Squashed 'fin/' changes from c702cb968..d5397abd3

d5397abd3 rename targets

git-subtree-dir: fin
git-subtree-split: d5397abd37b6908bcd96ef750ea5a3ace04cdf3c

* rename archive

Co-authored-by: Jun Liu <[email protected]>
cderb added a commit that referenced this issue Oct 5, 2022
e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: e05dcb42187f05fe0d0d1b05b822dc4b750f199e
junliume added a commit that referenced this issue Oct 6, 2022
* remove datatype 0,1 from perf_db

* rm invalid fp16 entries from pdb

* Squashed 'fin/' changes from 53d2563fe..e05dcb421

e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: e05dcb42187f05fe0d0d1b05b822dc4b750f199e

* fix clang-format issue

Co-authored-by: Jun Liu <[email protected]>
cderb added a commit that referenced this issue Nov 21, 2022
49e3e3a62 clang format
db80b1777 update to using TestPerfCfgParams for pdb validity checks
e48a4fd3a format
a4f85842c exception for non-tunable solvers in params check
d58c42bbd Check params at end of perf tuning (#70)
1a3b47c7b Return status for failed compile commands (#69)
d59962752 out_layout -> in_layout
6ba7a8f3f Rename conv_mode to mode (#64)
513a3da1b [bg/LWPTUNA-173] (#65)
e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 49e3e3a62a7cc54adacbeea95680d35f9a4685de