













**Bank Addressing Examples** 

[Figure: No Bank Conflicts, Random 1:1 Permutation. Threads 0 to 15 each access a different bank (banks 0 to 15).]
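The bank mapping behind these examples can be modeled in a few lines of C. This is a minimal sketch, assuming G80-style shared memory with 16 banks of 32-bit words and conflicts resolved per half-warp of 16 threads, which is consistent with banks 0 to 15 in the figure. `bank_of` and `conflict_degree` are hypothetical helpers, and the broadcast case (several threads reading the same word) is deliberately not modeled.

```c
/* Assumption: 16 banks of 32-bit words; conflicts are checked
   separately for each half-warp of 16 threads (G80-style hardware). */
#define NUM_BANKS 16
#define HALF_WARP 16

/* Successive 32-bit words live in successive banks, wrapping after bank 15. */
static int bank_of(int word_index) {
    return word_index % NUM_BANKS;
}

/* Hypothetical helper: the largest number of threads in one half-warp whose
   word addresses fall into the same bank. 1 means conflict-free; k means a
   k-way conflict that the hardware serializes. Assumes all 16 addresses are
   distinct, non-negative word indices (the broadcast case is not modeled). */
static int conflict_degree(const int word_index[HALF_WARP]) {
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    for (int t = 0; t < HALF_WARP; t++) {
        int k = ++hits[bank_of(word_index[t])];
        if (k > worst)
            worst = k;
    }
    return worst;
}
```

With `word_index[t] = t` (linear addressing) or any 1:1 permutation of 0 to 15, `conflict_degree` returns 1, matching the figure. With `word_index[t] = 2 * t` it returns 2, because threads t and t + 8 land in the same bank.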







NVIDIA and Wen-mei W. Hwu, 2007, University of Illinois, Urbana-Champaign. L7: Memory Hierarchy IV






| How to Map Jacobi to GPU (Tiling) |
|---|
| `for (i=1; i<n; i++)`<br>`for (j=1; j<n; j++)`<br>`b[i][j] = 0.5*(a[i+1][j] + a[i-1][j] + a[i][j+1] + a[i][j-1]);` |
| TILED SEQUENTIAL CODE<br>`// For clarity, assume n is evenly divisible by TX and TY`<br>`for (i=1; i<(n/TX); i++) // MAP TO blockIdx.x`<br>`for (j=1; j<(n/TY); j++) // MAP TO blockIdx.y`<br>`for (x=0; x<TX; x++) // MAP TO threadIdx.x`<br>`for (y=0; y<TY; y++) // possibly, MAP TO threadIdx.y`<br>`b[TX*i+x][TY*j+y] = 0.5*(a[TX*i+x+2][TY*j+y+1] + a[TX*i+x][TY*j+y+1] +`<br>`a[TX*i+x+1][TY*j+y+2] + a[TX*i+x+1][TY*j+y]);` |
| CS6963, L7: Memory Hierarchy IV |
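The block/thread mapping in the tiled loop can be checked sequentially before writing a kernel. The sketch below is not the slide's code. It assumes hypothetical tile sizes `TX` and `TY`, stores each (n+1) x (n+1) grid row-major in a flat array through a hypothetical `AT` macro, and starts tile indices at 0 with a +1 offset on every point index, so that the tiled nest visits exactly the (i, j) pairs of the untiled nest. A bounds check stands in for the guard a CUDA kernel would use, so n - 1 need not be a multiple of the tile size.

```c
/* Hypothetical tile sizes; on a GPU these would be the thread-block dimensions. */
#define TX 4
#define TY 4

/* Hypothetical accessor: element (i, j) of an (n+1) x (n+1) grid stored
   row-major in a flat array. */
#define AT(arr, n, i, j) ((arr)[(i) * ((n) + 1) + (j)])

/* The slide's untiled loop: i and j run over [1, n). */
static void jacobi_untiled(int n, const double *a, double *b) {
    for (int i = 1; i < n; i++)
        for (int j = 1; j < n; j++)
            AT(b, n, i, j) = 0.5 * (AT(a, n, i + 1, j) + AT(a, n, i - 1, j) +
                                    AT(a, n, i, j + 1) + AT(a, n, i, j - 1));
}

/* The same iterations, split into TX x TY tiles the way a 2-D grid of
   2-D thread blocks would split them: bx/by stand in for blockIdx.x/.y
   and tx/ty for threadIdx.x/.y. */
static void jacobi_tiled(int n, const double *a, double *b) {
    int blocks_x = (n - 1 + TX - 1) / TX;   /* ceil((n - 1) / TX) tiles */
    int blocks_y = (n - 1 + TY - 1) / TY;   /* ceil((n - 1) / TY) tiles */
    for (int bx = 0; bx < blocks_x; bx++)             /* blockIdx.x  */
        for (int by = 0; by < blocks_y; by++)         /* blockIdx.y  */
            for (int tx = 0; tx < TX; tx++)           /* threadIdx.x */
                for (int ty = 0; ty < TY; ty++) {     /* threadIdx.y */
                    int i = TX * bx + tx + 1;  /* +1: first interior row    */
                    int j = TY * by + ty + 1;  /* +1: first interior column */
                    if (i < n && j < n)        /* guard for partial tiles   */
                        AT(b, n, i, j) = 0.5 * (AT(a, n, i + 1, j) + AT(a, n, i - 1, j) +
                                                AT(a, n, i, j + 1) + AT(a, n, i, j - 1));
                }
}
```

Because both functions evaluate the same expression in the same order, running them on the same input should give bit-identical results, so an exact element-by-element comparison is an appropriate check. In a CUDA version, the four outer loops would disappear and each thread would compute `i` and `j` from its own block and thread indices.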

















| How to Get Compiler Feedback |
|---|
| How many registers and how much shared memory does my code use? |
| `$ nvcc --ptxas-options=-v \` |
| `-I/Developer/CUDA/common/inc \` |
| `-L/Developer/CUDA/lib mmul.cu -lcutil` |
| Returns: |
| `ptxas info : Compiling entry function '__globfunc__Z12mmul_computePfS_S_i'` |
| `ptxas info : Used 9 registers, 2080+1056 bytes smem, 8 bytes cmem[1]` |
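Since the `-v` report is plain text, the register and shared-memory figures can be pulled out of a build log automatically. This is a minimal sketch, assuming the `ptxas info : Used ...` line format shown above; the exact wording of ptxas output differs between CUDA releases, so treat the format string as an assumption. `struct ptxas_usage` and `parse_ptxas_usage` are hypothetical names, and the two smem figures are returned separately without interpreting what each one counts.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "ptxas info : Used ..." line. */
struct ptxas_usage {
    int registers;     /* N in "Used N registers" */
    int smem_first;    /* first shared-memory figure, e.g. 2080 */
    int smem_second;   /* figure after '+', e.g. 1056; 0 if absent */
};

/* Parses a line such as
   "ptxas info : Used 9 registers, 2080+1056 bytes smem, 8 bytes cmem[1]".
   Returns 1 on success, 0 if the line does not match this assumed format. */
static int parse_ptxas_usage(const char *line, struct ptxas_usage *out) {
    int pos = 0;
    out->smem_second = 0;
    /* Spaces in the format match any run of whitespace; %n records how far
       parsing got and is not counted in sscanf's return value. */
    if (sscanf(line, " ptxas info : Used %d registers, %d%n",
               &out->registers, &out->smem_first, &pos) != 2)
        return 0;
    const char *rest = line + pos;
    if (*rest == '+') {                 /* optional "+Y" part */
        int more = 0;
        if (sscanf(rest, "+%d%n", &out->smem_second, &more) != 1)
            return 0;
        rest += more;
    }
    /* Accept the line only if the figures just read are the smem ones. */
    return strncmp(rest, " bytes smem", 11) == 0;
}
```

Scanning every line of the compiler output with this function and keeping the ones that return 1 gives per-kernel register and shared-memory counts, which can then be compared against the per-multiprocessor limits when choosing a block size.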






