Throwing segfaults like rice at a wedding #41

Closed
odellus opened this issue May 26, 2018 · 10 comments

odellus commented May 26, 2018

So I'm running https://github.com/NLPLearn/QANet with tensorflow-upstream and I've had to cut my batch_size down to nothing to fit the model onto the GPU.

This is what happens when I try to train the model:

Building model...
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Total number of trainable parameters: 788673
2018-05-25 22:40:06.553366: W tensorflow/stream_executor/rocm/rocm_driver.cc:404] creating context when one is currently active; existing: 0x7f3818e3d580
2018-05-25 22:40:06.553526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties: 
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.34
pciBusID 0000:09:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2018-05-25 22:40:06.553537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-05-25 22:40:06.553548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-25 22:40:06.553567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995]      0 
2018-05-25 22:40:06.553575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0:   N 
2018-05-25 22:40:06.553612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:09:00.0)
  0%|                                                 | 0/60000 [00:00<?, ?it/s]2018-05-25 22:40:41.253877: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:41.253878: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:42.476336: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:43.035086: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:45.046033: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:45.047533: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:46.301007: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:46.983404: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:47.838168: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:48.067349: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:40:49.404750: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:40:49.955002: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
Memory access fault by GPU node-1 on address 0x58a404000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

The authors of the project are using a similarly sized GPU (though with twice as much system RAM; not sure whether that's the problem) and don't have to drop their batch size down to around 4 to fit the model on their GPU.
From localminimum/QANet#2
"""
Hi @kamalkraj I uploaded the most recent model pretrained weights (EM/F1 = 70.0/79.4) and you can download it here.

The specification of the system I used is:
CPU: i7-3930K CPU @ 3.20GHz
GPU: GTX1080 (8GB)
RAM: 16GB
Training takes about 5~8 hours depending on your gpu/cpu spec. The model takes about 8 GB gpu memory so if you're using anything bigger than 96 as your hidden unit size then you'll get an OOM error. Or if you are using a preoccupied GPU it will also cause an OOM error.

NOTE: If you are using your desktop GPU, try running it in terminal mode (alt + ctrl + F1) and close all applications that require gpu memory (e.g. Xorg)
sudo service lightdm stop
python config.py --mode train
after training,
sudo service lightdm start
"""
I followed the advice to shut down all other GPU-using applications and run from a bare terminal too. It still won't fit. Any idea why this is happening? My RX 580 is supposed to have the same amount of memory. Curious as to what's going on 😕
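To get a rough sense of why batch size is the main memory knob here, a back-of-the-envelope activation-memory estimate can be sketched in plain Python. The sequence length, hidden size, and layer count below are illustrative assumptions, not values read from QANet's config:

```python
def activation_bytes(batch_size, seq_len=400, hidden=96, n_layers=16,
                     bytes_per_float=4):
    """Very rough activation-memory estimate for one training step.

    Assumes one float32 tensor of shape (batch_size, seq_len, hidden)
    is kept per layer for backprop; ignores weights, optimizer state,
    and convolution workspace buffers.
    """
    per_layer = batch_size * seq_len * hidden * bytes_per_float
    return per_layer * n_layers

# Activation memory scales linearly with batch size, and the graph is
# static, so if one batch fits, every later identical batch should too.
for bs in (4, 8, 32):
    print(f"batch_size={bs}: ~{activation_bytes(bs) / 2**20:.1f} MiB")
```

Even generous versions of this estimate stay far below 8 GiB at batch size 8, which supports the suspicion that this is not a plain capacity problem.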


odellus commented May 26, 2018

This is with batch_size 8.

python3 config.py --mode train
Building model...
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/thomas/projects/qas/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

Total number of trainable parameters: 788673
2018-05-25 22:46:33.509859: W tensorflow/stream_executor/rocm/rocm_driver.cc:404] creating context when one is currently active; existing: 0x7f1ea390d8b0
2018-05-25 22:46:33.510010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1451] Found device 0 with properties: 
name: Device 67df
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.34
pciBusID 0000:09:00.0
Total memory: 8.00GiB
Free memory: 7.75GiB
2018-05-25 22:46:33.510022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1562] Adding visible gpu devices: 0
2018-05-25 22:46:33.510033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-25 22:46:33.510039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:995]      0 
2018-05-25 22:46:33.510044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1008] 0:   N 
2018-05-25 22:46:33.510074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1124] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7539 MB memory) -> physical GPU (device: 0, name: Device 67df, pci bus id: 0000:09:00.0)
  0%|                                                 | 0/60000 [00:00<?, ?it/s]2018-05-25 22:47:11.700225: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:12.282363: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:12.955994: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:12.957020: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:13.599275: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:13.825643: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:15.644044: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:16.185436: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:17.826536: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:18.565328: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:19.299351: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:19.857104: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:20.471238: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:21.041978: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:21.698886: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:21.916486: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:22.196821: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:22.753152: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:23.459215: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:23.549742: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:23.628027: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:24.360281: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:47:25.088926: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:47:25.229095: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
  0%|                                     | 1/60000 [00:45<753:04:20, 45.19s/it]2018-05-25 22:47:29.739454: I .
.
.
.
  0%|                                     | 7/60000 [02:08<306:56:10, 18.42s/it]2018-05-25 22:48:53.226017: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:53.227340: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:53.853328: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:55.293085: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:56.867052: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:57.400217: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:58.034644: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
2018-05-25 22:48:58.261886: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:59.258544: I tensorflow/core/kernels/conv_grad_filter_ops.cc:959] running auto-tune for Backward-Filter
2018-05-25 22:48:59.912278: I tensorflow/core/kernels/conv_grad_input_ops.cc:1007] running auto-tune for Backward-Data
Memory access fault by GPU node-1 on address 0x54be18000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Any idea what's going on? All of MIOpen's tests passed. Am I in the right place? Is this a HIP issue? Any help or a pointer in the right direction would be much appreciated.


odellus commented May 26, 2018

Background:
My rocminfo:

 rocminfo
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (number of timestamp)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 3 1300X Quad-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0                                  
  Queue Min Size:          0                                  
  Queue Max Size:          0                                  
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768KB                            
  Chip ID:                 0                                  
  Cacheline Size:          64                                 
  Max Clock Frequency (MHz):3500                               
  BDFID:                   0                                  
  Compute Unit:            4                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    8176312KB                          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8176312KB                          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128                                
  Queue Min Size:          4096                               
  Queue Max Size:          131072                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16KB                               
  Chip ID:                 26591                              
  Cacheline Size:          64                                 
  Max Clock Frequency (MHz):1340                               
  BDFID:                   2304                               
  Compute Unit:            36                                 
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64                                 
  Workgroup Max Size:      1024                               
  Workgroup Max Size Per Dimension:
    Dim[0]:                  67109888                           
    Dim[1]:                  150995968                          
    Dim[2]:                  0                                  
  Grid Max Size:           4294967295                         
  Waves Per CU:            40                                 
  Max Work-item Per CU:    2560                               
  Grid Max Size per Dimension:
    Dim[0]:                  4294967295                         
    Dim[1]:                  4294967295                         
    Dim[2]:                  4294967295                         
  Max number Of fbarriers Per Workgroup:32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8388608KB                          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64KB                               
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    AMD:AMDGPU:8:0:3                   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Dimension: 
        Dim[0]:                  67109888                           
        Dim[1]:                  1024                               
        Dim[2]:                  16777217                           
      Workgroup Max Size:      1024                               
      Grid Max Dimension:      
        x                        4294967295                         
        y                        4294967295                         
        z                        4294967295                         
      Grid Max Size:           4294967295                         
      FBarrier Max Size:       32                                 
*** Done ***

I installed ROCm from binaries a few weeks ago.

I've built MIOpen from source after configuring with:

CXX=/opt/rocm/hcc/bin/hcc cmake -DMIOPEN_BACKEND=HIP \
  -DMIOPEN_MAKE_BOOST_PUBLIC=ON \
  -DCMAKE_PREFIX_PATH="/opt/rocm/hcc;/opt/rocm/hip" \
  -DCMAKE_CXX_FLAGS="-isystem /usr/include/x86_64-linux-gnu/" \
  -DHALF_INCLUDE_DIR=/home/thomas/code/half/include ..

I would think this shouldn't be happening: it's a static computational graph, the same model for every batch, so if you can allocate the memory needed for one batch, the same allocation should suffice for every subsequent batch. It shouldn't become a problem only on later batches, which is why I'm confused.

odellus changed the title from "OOM when there should be space" to "OOM when there should be space?" May 26, 2018

daniellowell commented May 26, 2018

@odellus

Memory access fault by GPU node-1 on address 0x54be18000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

It is not that your GPU is running out of memory. That is the GPU equivalent of a segmentation fault. Let us take a look at this network and see which layer is causing it.

@daniellowell

@odellus What version of TensorFlow are you using, and which branch did you pull from?


odellus commented May 26, 2018 via email

@parallelo

Here are some environment variables that might help point us to the culprit. Just set these and collect all of the verbose output from running your workload.

export HCC_SERIALIZE_KERNEL=0x3
export HCC_SERIALIZE_COPY=0x3
export HIP_TRACE_API=0x3
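
As a sketch of how these might be applied in practice (the log filename is arbitrary, and the training command is the one used earlier in this issue):

```shell
# Serialize kernel launches and copies, and trace HIP API calls, so the
# last kernel logged before the fault is the likely culprit.
export HCC_SERIALIZE_KERNEL=0x3
export HCC_SERIALIZE_COPY=0x3
export HIP_TRACE_API=0x3

# Rerun the workload, capturing stdout and stderr for triage
# (uncomment to run; the log filename is just an example):
# python3 config.py --mode train 2>&1 | tee hip_trace.log
```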

Also, are you able to reproduce this via one of our pre-built Docker containers using TF 1.3? With rock-dkms 1.7.2 on the host, you can try something like this:

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:rocm1.7.2

@parallelo

Btw, this ticket appears to be better suited to one of these projects:


odellus commented May 27, 2018 via email

odellus changed the title from "OOM when there should be space?" to "Throwing segfaults like rice at a wedding" May 27, 2018
odellus closed this as completed May 27, 2018

odellus commented May 27, 2018

Here's truncated output from setting those debugging environment variables and running QANet:
https://gist.github.com/odellus/87e2a382c5d6967b3b48ea5fbdf566a6


parallelo commented May 28, 2018

Just to clarify, I was trying to connect you with the folks who most often deal with framework-level triage. At this point, we don't know whether this particular issue is an MIOpen library problem, or if it is somewhere else in the stack. In this circumstance, the frameworks team often does initial triage.

Edit: Thanks for opening a ticket with the TensorFlow repo; we'll take a look.

cderb added a commit that referenced this issue May 16, 2022
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
REVERT: a30a51bc6 remove unused header
REVERT: 7d2fd834c reduce scope of variable
REVERT: f6e9abe79 clang format
REVERT: 834e9a397 remove comment
REVERT: c8d6eb1a0 workspace rename
REVERT: aa7d2ea24 Merge remote-tracking branch 'origin/develop' into cderb/miopen_perf
REVERT: aaf13fb12 add to print for debug
REVERT: 34e11fa70 Merge remote-tracking branch 'origin/develop' into cderb/miopen_perf
REVERT: cb6c19d13 add search+update directives to execution context, add json examples for perf eval
REVERT: 85029077b connecting new fin functions for perf eval
REVERT: 4d1e031fd add outputs and definitions
REVERT: 952538cb8 adding perf eval function, in progress
REVERT: 617dccd9c rename
REVERT: 5c35ae886 fixes for collecting kernel blobs
REVERT: 5cfea7c43 syntax fixes
REVERT: 2f2a4ed9f add test file
REVERT: 7175019f5 first rendition of perf_compile

git-subtree-dir: fin
git-subtree-split: 722feea660e2e3d7f8e1edcc520a938be4885a44
cderb added a commit that referenced this issue Aug 3, 2022
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 30d699b9edc014c6076a9649f849bd3c4588d4ab
averinevg pushed a commit that referenced this issue Aug 19, 2022
* add perf cfg validity test to TestSysDbRecord

* remove debug prints

* removing invalid entries from all perf dbs

* VACUUM sqlite

* Squashed 'fin/' changes from 53d2563fe..30d699b9e

30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 30d699b9edc014c6076a9649f849bd3c4588d4ab

* Squashed 'fin/' changes from 30d699b9e..ea5c844af

ea5c844af fix direction test
3aa412ee1 Update to use revised testSysDbRecord miopen function

git-subtree-dir: fin
git-subtree-split: ea5c844aff8b5d46537aa59034a596fd15cd9e1e

* rename pipe step

* Squashed 'fin/' changes from ea5c844af..c702cb968

c702cb968 format

git-subtree-dir: fin
git-subtree-split: c702cb96800a03b17ee17d03a015dfa38e3883b9

* Squashed 'fin/' changes from c702cb968..d5397abd3

d5397abd3 rename targets

git-subtree-dir: fin
git-subtree-split: d5397abd37b6908bcd96ef750ea5a3ace04cdf3c

* rename archive

Co-authored-by: Jun Liu <[email protected]>
cderb added a commit that referenced this issue Oct 5, 2022
e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: e05dcb42187f05fe0d0d1b05b822dc4b750f199e
junliume added a commit that referenced this issue Oct 6, 2022
* remove datatype 0,1 from perf_db

* rm invalid fp16 entries from pdb

* Squashed 'fin/' changes from 53d2563fe..e05dcb421

e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: e05dcb42187f05fe0d0d1b05b822dc4b750f199e

* fix clang-format issue

Co-authored-by: Jun Liu <[email protected]>
cderb added a commit that referenced this issue Nov 21, 2022
49e3e3a62 clang format
db80b1777 update to using TestPerfCfgParams for pdb validity checks
e48a4fd3a format
a4f85842c exception for non-tunable solvers in params check
d58c42bbd Check params at end of perf tuning (#70)
1a3b47c7b Return status for failed compile commands (#69)
d59962752 out_layout -> in_layout
6ba7a8f3f Rename conv_mode to mode (#64)
513a3da1b [bg/LWPTUNA-173] (#65)
e05dcb421 perf db validation fix (#68)
260d9465d Add INT8 as a data_type v2 (#67)
b6a5b2a77 sync with fin folder in miopen (#62)
0e03399ec prep for Palamida scan (#63)
e6bd05c33 Performance db testing (#61)
30d699b9e Perf Eval Update (#60)
3535b948c PerfCompile and PerfEval changes (#59)
de79468d2 remove unneccessary solution check, add check for previously modified kernel names (#56)
6924286a2 miopen hash update (#55)
530399575 Refactor googletest infra to align with MIOpen (#53)
71c50d146 Datatype fix for BN (#57)
8abe2f5c6 Perf Eval updates, Add find info (#51)
e1c1ef0f5 filter find compile by solver input (#54)
722feea66 sp/chk precomp kernel 264 (#41)
b9aba2034 Batch norm find compile (#50)
359f3da80 Fix missing link directives in fin binary (#48)
a4020c1ba Cache Miss Fixes (#46)
2ec7ef44d Enable google test and compiling fin in the CI (#47)
8b6b453bc Applicability support for batch norm (#45)
44323aae9 Perf compile/eval for fin (#42)
ebd9aa6bd update member name (#43)
d6d798efe add cu count (#39)
8e1989a9f Add find option for selecting only dynamic solvers (#38)
0e164bf66 setting json version (#37)
f3f7fed18 Remove function redefinition (#36)
e1de51a58 Performance DB de-serialize test (#34)
043cdcdaa Layout support in Fin (#33)
3a1d58236 Hotfix (#32)
ee3f0d543 4.4 Tuning Bugfixes (#31)
832dbe234 Tunability Reporting (#27)
a564a229f include gfx90a_110 (#28)

git-subtree-dir: fin
git-subtree-split: 49e3e3a62a7cc54adacbeea95680d35f9a4685de