Version 2, jun chen, 08/14/2025 04:50 PM

# Chip Design

| Aspect | Goal | Non-goal |
|--|--|--|
| GPGPU targets | High-performance parallel computing, rendering, ray tracing | |
| Standards | OpenCL 1.2, CUDA, Vulkan, Direct3D, OpenGL, OpenCL 3.0, SPIR-V | RISC-V, OpenCL 2 |
| Performance | Performance | Area |

## data model

A dem (data element) is the smallest unit of data the GPGPU processes. A dem can be 8, 16, 32, or 64 bits wide.

dem vector: consists of 4 × dem8, 2 × dem16, 1 × dem32, or 1 × dem64.
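
This minimal Python sketch shows how dems could pack into one dem vector. It makes two assumptions. First, registers are 32 bits wide, which is implied by dem32 and narrower dems fitting in one register. Second, dem 0 occupies the least significant bits. `dems_per_vector`, `pack_vector`, and `unpack_vector` are hypothetical helpers, not part of any defined API.

```python
# Minimal sketch of dem-vector packing.
# Assumptions (not from the spec): 32-bit registers, dem 0 in the low-order bits.
REG_BITS = 32
DEM_WIDTHS = (8, 16, 32, 64)


def dems_per_vector(dem_bits: int) -> int:
    """Number of dems in one dem vector for the given dem width."""
    if dem_bits not in DEM_WIDTHS:
        raise ValueError("a dem must be 8, 16, 32 or 64 bits wide")
    # dem64 is wider than one register, so its vector holds a single dem.
    return max(1, REG_BITS // dem_bits)


def pack_vector(dems: list[int], dem_bits: int) -> int:
    """Pack the dems of one vector into a single integer word."""
    n = dems_per_vector(dem_bits)
    if len(dems) != n:
        raise ValueError(f"a dem{dem_bits} vector holds exactly {n} dem(s)")
    mask = (1 << dem_bits) - 1
    word = 0
    for i, dem in enumerate(dems):
        word |= (dem & mask) << (i * dem_bits)
    return word


def unpack_vector(word: int, dem_bits: int) -> list[int]:
    """Inverse of pack_vector: split a word back into its dems."""
    mask = (1 << dem_bits) - 1
    return [(word >> (i * dem_bits)) & mask for i in range(dems_per_vector(dem_bits))]


if __name__ == "__main__":
    word = pack_vector([0x11, 0x22, 0x33, 0x44], 8)
    print(hex(word))               # 0x44332211
    print(unpack_vector(word, 8))  # [17, 34, 51, 68]
```

The lane order here is only a convention. A layout with dem 0 in the high bits would simply reverse the shift amounts.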
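
dem array: an array of m dem vectors, 8 to 256 bits in total, laid out as m rows by n columns. Here n is the number of dems per vector and m is the number of vectors.

dem fiber: a slice of a dem array taken by column (to be confirmed). For example, in a two-column dem16 array, the low 16 bits of each row form the low dem and the high 16 bits form the high dem.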
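
The row and column view of a dem array can be sketched the same way. The sketch assumes each row is one dem vector packed into a 32-bit word. It also assumes column 0 is the low-order dem, which matches the two-column dem16 example. `array_shape` and `fiber` are hypothetical names.

```python
# Minimal sketch of a dem array and its fibers.
# Assumptions (not from the spec): each row is one dem vector packed into a
# 32-bit word, and column 0 is the low-order dem (matching the dem16 example).
REG_BITS = 32


def array_shape(rows: list[int], dem_bits: int) -> tuple[int, int]:
    """Return (m, n): m rows (dem vectors), n dems per row."""
    return len(rows), max(1, REG_BITS // dem_bits)


def fiber(rows: list[int], col: int, dem_bits: int) -> list[int]:
    """Return column `col` of the array, one dem taken from each row."""
    _, n = array_shape(rows, dem_bits)
    if not 0 <= col < n:
        raise IndexError(f"a dem{dem_bits} array has only {n} column(s)")
    mask = (1 << dem_bits) - 1
    return [(word >> (col * dem_bits)) & mask for word in rows]


if __name__ == "__main__":
    # Two-column dem16 array with m = 3 rows.
    rows = [0xCAFEBEEF, 0x12345678, 0x0000FFFF]
    print(array_shape(rows, 16))                 # (3, 2)
    print([hex(d) for d in fiber(rows, 0, 16)])  # ['0xbeef', '0x5678', '0xffff']
    print([hex(d) for d in fiber(rows, 1, 16)])  # ['0xcafe', '0x1234', '0x0']
```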
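
A single thread processes one dem vector, and a dem vector contains one or more dems.

During GPGPU processing, data (matrices and vectors) is first loaded into TLRs (thread-local registers) and then computed on. A dem of 32 bits or less (dem8, dem16, dem32) occupies one register, while a dem64 occupies two, so each register is 32 bits wide.

Therefore, in an m-row, n-column dem array, each row fills exactly one register, whether the dems are dem8, dem16, or dem32.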
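
The register-occupancy rule can be expressed as a small helper. This sketch assumes 32-bit TLRs. It also assumes a dem64 keeps its low 32 bits in the first of its two registers; the notes do not say which half goes where. `registers_for_array` and `split_dem64` are hypothetical.

```python
# Minimal sketch of TLR occupancy for a dem array.
# Assumptions (not from the spec): 32-bit registers, and a dem64 stores its
# low 32 bits in the first of its two registers.
REG_BITS = 32
REG_MASK = (1 << REG_BITS) - 1


def registers_for_array(m: int, dem_bits: int) -> int:
    """TLRs needed for an m-row dem array.

    dem8/dem16/dem32 rows each fill exactly one register; a dem64 needs two.
    """
    if dem_bits in (8, 16, 32):
        return m
    if dem_bits == 64:
        return 2 * m
    raise ValueError("a dem must be 8, 16, 32 or 64 bits wide")


def split_dem64(value: int) -> tuple[int, int]:
    """Split one dem64 into its (low, high) 32-bit register words."""
    return value & REG_MASK, (value >> REG_BITS) & REG_MASK


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(f"4-row dem{width} array -> {registers_for_array(4, width)} TLRs")
    lo, hi = split_dem64(0x1122334455667788)
    print(hex(lo), hex(hi))  # 0x55667788 0x11223344
```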

## data processing hierarchy

```
A GPU system contains multiple compute nodes -> node-wide thread group (NTG), virtual node
Each compute node contains multiple devices -> device-wide thread group (DTG), virtual device
Each device contains multiple very wide compute units (VCUs) -> VCU-wide thread group (VTG)
Each VCU contains multiple wide compute units (WCUs) -> wide thread group (WTG)
Each WCU contains multiple basic compute units (CUs) -> thread group (TG)
Each CU contains multiple execution units (EUs) -> one warp
Each EU contains multiple processing lanes -> one thread, which processes one dem vector
```
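
One way to read this hierarchy is as a mixed-radix index, where each lane's global thread id is built from its index at every level. The sketch follows that reading, which is an assumption. The per-level fan-out counts are hypothetical because the notes do not give them, and `flat_thread_id` and `unflatten_thread_id` are illustrative names.

```python
# Minimal sketch: flatten a lane's position in the hierarchy into one global
# thread id. The fan-out counts are hypothetical; the notes do not give them.
LEVELS = [
    ("node", "NTG"),
    ("device", "DTG"),
    ("VCU", "VTG"),
    ("WCU", "WTG"),
    ("CU", "TG"),
    ("EU", "warp"),
    ("lane", "thread"),
]


def flat_thread_id(indices: list[int], fanout: list[int]) -> int:
    """Mixed-radix flattening.

    indices[k] is the unit index at level k; fanout[k] is the number of
    level-k units per parent unit (fanout[0] = nodes in the system).
    """
    if len(indices) != len(LEVELS) or len(fanout) != len(LEVELS):
        raise ValueError("need one index and one fan-out per level")
    tid = 0
    for (name, _), idx, size in zip(LEVELS, indices, fanout):
        if not 0 <= idx < size:
            raise IndexError(f"{name} index {idx} out of range 0..{size - 1}")
        tid = tid * size + idx
    return tid


def unflatten_thread_id(tid: int, fanout: list[int]) -> list[int]:
    """Inverse of flat_thread_id."""
    indices = []
    for size in reversed(fanout):
        indices.append(tid % size)
        tid //= size
    if tid:
        raise IndexError("thread id exceeds the total number of lanes")
    return indices[::-1]


if __name__ == "__main__":
    fanout = [2, 2, 2, 2, 2, 2, 4]  # hypothetical: 4 lanes per EU, 2 of everything else
    tid = flat_thread_id([0, 0, 0, 0, 0, 1, 2], fanout)
    print(tid)                               # 6
    print(unflatten_thread_id(tid, fanout))  # [0, 0, 0, 0, 0, 1, 2]
```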

## data memory model

register file level:

```
Thread-local registers (TLR)
Closed segment registers (x0)
Staging registers (g0-g15)
Warp scalar registers (WSR)
Constant scalar registers (CSR)
Group shared memory (GSM)
GEMM input buffer (GIB)
GEMM main buffer (GMB)
GEMM reduction buffer (GRB)
```

cache level:

```
L1 cache
L2 cache
```

HBM level:

```
buffer memory objects
image memory objects
matrix memory objects
global memory buffers (GLM)
constant buffers (CB)
Thread-local memory (TLM)
```
65 | 1 | jun chen | ``` |