

IBM Systems & Technology Group Cell/Quasar Ecosystem & Solutions Enablement

# **SPU Dynamic Profiling**

#### Cell Programming Workshop

# **Course Objectives**

- To familiarize you with the SPU dynamic profiling capability provided by the IBM Full-System Simulator

**Trademarks** - Cell Broadband Engine and Cell Broadband Engine Architecture are trademarks of Sony Computer Entertainment, Inc.

# **Course Agenda**

- SPU performance model
- SPU pipeline statistics
- SPE visualizer
- Local Store stats

#### SPU Dynamic Performance Profile Checkpoints

```
#include "profile.h"   // C header to enable profiling

int main(void)
{
    prof_clear();      // clear performance info
    prof_start();      // start recording performance info

    /* < something interesting > */

    prof_stop();       // stop recording performance info
    return 0;
}
```



# **SPU Performance Model**

- Referred to in the simulator as "pipeline mode"
- Models salient SPU microarchitectural behavior
  - In-order issue
  - No register renaming
  - Software-controlled branch hints (no HW predict)
  - LS arbitration rules
  - Destructive ILB prefetch load
  - Instruction run-out



# **SPU Instruction Mappings by Class**

| Inst<br>Class | Exec<br>Pipe | Exec<br>Cycles | Issue Stall<br>Cycles | Instruction Types                                                                                                              |
|---------------|--------------|----------------|-----------------------|--------------------------------------------------------------------------------------------------------------------------------|
| BR            | odd (1)      | 4              | 0                     | Branch                                                                                                                         |
| FP6           | even (0)     | 6              | 0                     | Single precision floating point                                                                                                |
| FP7           | even (0)     | 7              | 0                     | Integer multiply, integer/float conversion, interpolate                                                                        |
| FPD           | even (0)     | 7              | 6                     | Double precision floating point                                                                                                |
| FX2           | even (0)     | 2              | 0                     | Load immediate, logical operations, integer add/subtract, sign extend, count leading zeros, select bits, carry/borrow generate |
| FX3           | even (0)     | 4              | 0                     | Element rotate/shift                                                                                                           |
| FXB           | even (0)     | 4              | 0                     | Special byte operations                                                                                                        |
| LNOP          | odd (1)      | 0              | 0                     | No-op                                                                                                                          |
| LS            | odd (1)      | 6              | 0                     | Loads/stores, branch hints                                                                                                     |
| NOP           | even (0)     | 0              | 0                     | No-op                                                                                                                          |
| SHUF          | odd (1)      | 4              | 0                     | Shuffle bytes, quadword rotate/shift, estimate, gather, form select mask, generate insertion control                           |
| SPR           | odd (1)      | 6              | 0                     | Channel operations, move to/from SPR                                                                                           |


# **Obtaining SPU Pipeline Statistics**

#### Via TCL Commands

- `$sim spu n stats print`
- `array set s [$sim spu n stats export]`
- `$sim spu n display statistics ...`

#### Via GUI Controls



#### TBM

# SPU Stats Summary: mysim spu n stats print

SPU DD3.0

Stats totaled for the entire execution of the SPU, unaffected by `prof_CP*()`:

| Statistic               | Value    |
|-------------------------|----------|
| Total cycle count       | 22082564 |
| Total instruction count | 2583628  |
| Total CPI               | 8.55     |

Stats totaled in regions bracketed by `prof_CP30()` and `prof_CP31()`:

| Statistic                                        | Value             |
|--------------------------------------------------|-------------------|
| Performance cycle count                          | 6153081           |
| Performance instruction count                    | 2573522 (2572961) |
| Performance CPI                                  | 2.39 (2.39)       |
| Branch instructions                              | 143160            |
| Branch taken                                     | 142880            |
| Branch not taken                                 | 280               |
| Hint instructions                                | 140               |
| Hint hit                                         | 142740            |
| Contention at LS between Load/Store and Prefetch | 142880            |
| Single cycle                                     | 1715121 (27.9%)   |
| Dual cycle                                       | 428920 (7.0%)     |
| Nop cycle                                        | 0 (0.0%)          |
| Stall due to branch miss                         | 7560 (0.1%)       |
| Stall due to prefetch miss                       | 0 (0.0%)          |
| Stall due to dependency                          | 4001060 (65.0%)   |
| Stall due to fp resource conflict                | 0 (0.0%)          |
| Stall due to waiting for hint target             | 420 (0.0%)        |
| Issue stalls due to pipe hazards                 | 0 (0.0%)          |
| Channel stall cycle                              | 0 (0.0%)          |
| SPU Initialization cycle                         | 0 (0.0%)          |
| Total cycle                                      | 6153081 (100.0%)  |

Stall cycles due to dependency on each pipeline:

| Pipeline | Stall cycles                           |
|----------|----------------------------------------|
| FX2      | 143020 (3.6% of all dependency stalls) |
| …        | …                                      |
                                                                                                                                                                                   | Stall due to prefetch miss            |                         | 0       | (0.0%)   |                                      |
| Stall due to fp resource conflict       0       0       0.0%         Stall due to vaiting for hint target       420       0.0%         Stall due to pipe hazards       0       0.0%         Channel stall cycle       0       0       0.0%         SPU Initialization cycle       0       0       0.0%         SPU Initialization cycle       0       0       0.0%         Stall cycles due to dependency on each pipelines       6153081       00.0%         SHUP       857560       (21.4% of all dependency stalls)       5         SSP       0       0.0% of all dependency stalls)       5         SPR       0       0.0% of all dependency stalls)       5         NOP       0       0.0% of all dependency stalls)       5         FPS       0       0.0% of all dependency stalls)       5         FP6       2143200       (5.3.6% of all dependency stalls)       5         FP7       0       0.0% of all dependency stalls)       5         FP0       0.0% of a                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Stall due to 
dependency               |                         | 4001060 | ( 65.0%) |                                      |
| Stall due to waiting for hint target     420 ( 0.0%)       Issue stalls due to pipe hazards     0 ( 0.0%)       SPU Initialization cycle     0 ( 0.0%)       SPU Initialization cycle     0 ( 0.0%)       Total cycle     6153081 (100.0%)       Stall cycles due to dependency on each pipelines     57250 ( 21.4% of all dependency stalls)       SFW     857560 ( 21.4% of all dependency stalls)       SFX     0 ( 0.0% of all dependency stalls)       SFR     0 ( 0.0% of all dependency stalls)       SFR     0 ( 0.0% of all dependency stalls)       SPR     0 ( 0.0% of all dependency stalls)       NOP     0 ( 0.0% of all dependency stalls)       FXB     0 ( 0.0% of all dependency stalls)       FYB     0 ( 0.0% of all dependency stalls)       FP6     2143200 ( 53.6% of all dependency stalls)       FP7     0 ( 0.0% of all dependency stalls)       FP7     0 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                   | Stall due to fp resource confli       | .ct                     | 0       | ( 0.0%)  |                                      |
| Issue stalls due to pipe hazards       0 ( 0.0%)         Channel stall cycle       0 ( 0.0%)         SPU Initialization cycle       0 ( 0.0%)         Total cycle       6153081 (100.0%)         Stall cycles due to dependency on each pipelines       6153081 (100.0%)         Stall cycles due to dependency on each pipelines       57260 ( 21.4% of all dependency stalls)         Stall cycles due to dependency stalls)       57560 ( 21.4% of all dependency stalls)         LS       857260 ( 21.4% of all dependency stalls)         SPR       0 ( 0.0% of all dependency stalls)         SPR       0 ( 0.0% of all dependency stalls)         SPR       0 ( 0.0% of all dependency stalls)         LNOP       0 ( 0.0% of all dependency stalls)         FXB       0 ( 0.0% of all dependency stalls)         FYB       0 ( 0.0% of all dependency stalls)         FP6       2143200 ( 53.6% of all dependency stalls)         FP7       0 ( 0.0% of all dependency stalls)         FP7       0 ( 0.0% of all dependency stalls)         FP7       0 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                   | Stall due to waiting for hint t       | arget                   | 420     | ( 0.0%)  |                                      |
| Shamel stall cycle     0 ( 0.0%)       SPU Initialization cycle     0 ( 0.0%)       Sput Initialization cycle     6153081 (100.0%)       Total cycle     6153081 (100.0%)       Stall cycles due to dependency on each pipelines     5       FX2     143020 ( 3.6% of all dependency stalls)       SHUF     857560 ( 21.4% of all dependency stalls)       FX3     0 ( 0.0% of all dependency stalls)       ER     0 ( 0.0% of all dependency stalls)       SPR     0 ( 0.0% of all dependency stalls)       SPR     0 ( 0.0% of all dependency stalls)       SPR     0 ( 0.0% of all dependency stalls)       NOP     0 ( 0.0% of all dependency stalls)       FXB     0 ( 0.0% of all dependency stalls)       FYB     0 ( 0.0% of all dependency stalls)       FYB     0 ( 0.0% of all dependency stalls)       FP6     2143200 ( 53.6% of all dependency stalls)       FP7     0 ( 0.0% of all dependency stalls)       FP7     0 ( 0.0% of all dependency stalls)       FP9     0 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                                                                                                                                   | Issue stalls due to pipe hazard       | is                      | 0       | ( 0.0%)  |                                      |
| 3PD Initialization cycle       0 ( 0.04)         Total cycle       6153081 (100.04)         Stall cycles due to dependency on each pipelines       52         FX2       143020 ( 3.64 of all dependency stalls)         SHUF       857560 ( 21.44 of all dependency stalls)         FX3       0 ( 0.04 of all dependency stalls)         FX3       0 ( 0.04 of all dependency stalls)         SPR       0 ( 0.04 of all dependency stalls)         SPR       0 ( 0.04 of all dependency stalls)         SPR       0 ( 0.04 of all dependency stalls)         NOP       0 ( 0.04 of all dependency stalls)         FXB       0 ( 0.04 of all dependency stalls)         FYB       0 ( 0.04 of all dependency stalls)         FYB       0 ( 0.04 of all dependency stalls)         FYP6       2143200 ( 53.64 of all dependency stalls)         FP7       0 ( 0.04 of all dependency stalls)         FP7       0 ( 0.04 of all dependency stalls)         FP0       0 ( 0.04 of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                   | Channel stall cycle                   |                         | 0       | ( 0.0%)  |                                      |
| Total cycle6153081 (100.0%)Stall cycles due to dependency on each pipelinesFX2143020 ( 3.6% of all dependency stalls)SHUF857560 ( 21.4% of all dependency stalls)FX30 ( 0.0% of all dependency stalls)LS857280 ( 21.4% of all dependency stalls)BR0 ( 0.0% of all dependency stalls)SPR0 ( 0.0% of all dependency stalls)INOP0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FYB0 ( 0.0% of all dependency stalls)FP62143200 ( 53.6% of all dependency stalls)FP70 ( 0.0% of all dependency stalls)FP90 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                   | SPU Initialization cycle              |                         | U<br>   | ( 0.0%)  |                                      |
| Stall cycles due to dependency on each pipelinesFX2143020 ( 3.6% of all dependency stalls)SHUF857560 ( 21.4% of all dependency stalls)FX30 ( 0.0% of all dependency stalls)LS857260 ( 21.4% of all dependency stalls)BR0 ( 0.0% of all dependency stalls)SPR0 ( 0.0% of all dependency stalls)LNOP0 ( 0.0% of all dependency stalls)NOP0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FYB0 ( 0.0% of all dependency stalls)FYF0 ( 0.0% of all dependency stalls)FYF0 ( 0.0% of all dependency stalls)FYF0 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                   | Total cycle                           |                         | 6153081 | (100.0%) |                                      |
| FX2143020 ( 3.6% of all dependency stalls)SHUF857560 ( 21.4% of all dependency stalls)FX30 ( 0.0% of all dependency stalls)LS857280 ( 21.4% of all dependency stalls)BR0 ( 0.0% of all dependency stalls)SPR0 ( 0.0% of all dependency stalls)LNOP0 ( 0.0% of all dependency stalls)LNOP0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FYB0 ( 0.0% of all dependency stalls)FYP2143200 ( 53.6% of all dependency stalls)FP70 ( 0.0% of all dependency stalls)FP70 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                   | Stall cycles due to dependency        | on each pipelines       |         |          |                                      |
| SHUF857560 ( 21.4% of all dependency stalls)FX30 ( 0.0% of all dependency stalls)LS857280 ( 21.4% of all dependency stalls)BR0 ( 0.0% of all dependency stalls)SPR0 ( 0.0% of all dependency stalls)LNOP0 ( 0.0% of all dependency stalls)LNOP0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FYP2143200 ( 53.6% of all dependency stalls)FPP0 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                   | FX2 143020 ( 3.6% of a                | all dependency stalls)  |         |          |                                      |
| FX30 ( 0.0% of all dependency stalls)LS857280 ( 21.4% of all dependency stalls)BR0 ( 0.0% of all dependency stalls)SPR0 ( 0.0% of all dependency stalls)LNOP0 ( 0.0% of all dependency stalls)NOP0 ( 0.0% of all dependency stalls)FXB0 ( 0.0% of all dependency stalls)FYB0 ( 0.0% of all dependency stalls)FP62143200 ( 53.6% of all dependency stalls)FP70 ( 0.0% of all dependency stalls)FP90 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                   | SHUF 857560 (21.4% of a               | all dependency stalls)  |         |          |                                      |
| LS 65/260 (21.4% of all dependency stalls)<br>ER 0 (0.0% of all dependency stalls)<br>SPR 0 (0.0% of all dependency stalls)<br>LNOP 0 (0.0% of all dependency stalls)<br>NOP 0 (0.0% of all dependency stalls)<br>FXB 0 (0.0% of all dependency stalls)<br>FY6 2143200 (53.6% of all dependency stalls)<br>FP7 0 (0.0% of all dependency stalls)<br>FPD 0 (0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                   | FX3 0 ( 0.0% of all de                | pendency stalls)        |         |          |                                      |
| bk       0 ( 0.0% of all dependency stalls)         SPR       0 ( 0.0% of all dependency stalls)         LNOP       0 ( 0.0% of all dependency stalls)         NOP       0 ( 0.0% of all dependency stalls)         FXB       0 ( 0.0% of all dependency stalls)         FY6       2143200 ( 53.6% of all dependency stalls)         FP7       0 ( 0.0% of all dependency stalls)         FPD       0 ( 0.0% of all dependency stalls)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                   | LD 857280 (21.4% of a                 | (11 dependency stalls)  |         |          |                                      |
| LNOP | 0 ( 0.0% of all dependency stalls) |
| NOP | 0 ( 0.0% of all dependency stalls) |
| FXB | 0 ( 0.0% of all dependency stalls) |
| FP6 | 2143200 ( 53.6% of all dependency stalls) |
| FP7 | 0 ( 0.0% of all dependency stalls) |
| FPD | 0 ( 0.0% of all dependency stalls) |

The number of used registers are 15, the used ratio is 11.72
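Both ratios can be checked by hand. This worked example assumes that the used ratio is taken over the SPU's 128 registers, and that each per-class percentage is relative to the total dependency stall count of 4001060 cycles listed under Efficiency Stats:

$$
\text{used ratio} = \frac{15}{128} \times 100\% \approx 11.72\%,
\qquad
\text{FP6 share} = \frac{2143200}{4001060} \times 100\% \approx 53.6\%
$$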



mysim/SPE7: Statistics (SPU DD3.0)

| Statistic | Value |
|---|---|
| Total Cycle count | 478434 |
| Total Instruction count | 133990 |
| Total CPI | 3.57 |
| Performance Cycle count (new in SDK 3.0) | 378304 |
| Performance Instruction count (new in SDK 3.0) | 131456 (131264) |
| Performance CPI (new in SDK 3.0) | 2.88 (2.88) |
| Branch instructions | 16384 |
| Branch taken | 16320 |
| Branch not taken | 64 |
| Hint instructions | 64 |
| Pipeline flushes | 64 |
| SP operations (MADDs=2) | 0 |
| DP operations (MADDs=2) | 65536 |
| Contention at LS between Load/Store and Prefetch | 16384 |

| Cycle category | Cycles |
|---|---|
| Single cycle | 98368 (26.0%) |
| Dual cycle | 16448 (4.3%) |
| Nop cycle | 0 (0.0%) |
| Stall due to branch miss | 1152 (0.3%) |
| Stall due to prefetch miss | 0 (0.0%) |
| Stall due to dependency | 163904 (43.3%) |
| Stall due to fp resource conflict | 0 (0.0%) |
| Stall due to waiting for hint target | 128 (0.0%) |
| Issue stalls due to pipe hazards | 98304 (26.0%) |
| Channel stall cycle | 0 (0.0%) |
| SPU Initialization cycle | 0 (0.0%) |
| Total cycle | 378304 (100.0%) |

Pipeline stats: The number of used registers are 8, the used ratio is 6.25

Selected rows of the per-class execution table:

| Instruction Class | Insts Issued | Insts Exec | Exec Cycles | Cycles/Inst |
|---|---|---|---|---|
| FX2 (EVEN): Logical and integer arithmetic | 49344 | 49344 | 82304 | 1.67 |
| SHUF (ODD): Shuffle, quad rotate/shift, mask | 0 | 0 | 0 | 0.00 |
| LS (ODD): Load/store, hint | 16384 | 16384 | 65536 | 4.00 |
| FPD (EVEN): DP floating point | 16384 | 16384 | 114688 | 7.00 |
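The derived columns in this dump can be recomputed from the raw counts. The sketch below is a minimal illustration: the struct and function names are hypothetical, not part of the simulator, and `dp_ops_all_madd` assumes (as the 65536 / 16384 ratio suggests) that every DP instruction in the profiled loop is a multiply-add on two doubles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record for one row of the per-class table
 * (illustrative names, not a simulator API). */
struct class_stats {
    const char *name;     /* e.g. "FX2" */
    uint64_t insts_exec;  /* "Insts Exec" column */
    uint64_t exec_cycles; /* "Exec Cycles" column */
};

/* "Cycles/Inst" column: execution cycles per executed instruction. */
static double cycles_per_inst(const struct class_stats *s)
{
    if (s->insts_exec == 0)
        return 0.0; /* the dump prints 0.00 for classes that never executed */
    return (double)s->exec_cycles / (double)s->insts_exec;
}

/* Run-level CPI: cycle count divided by instruction count. */
static double cpi(uint64_t cycles, uint64_t insts)
{
    assert(insts != 0);
    return (double)cycles / (double)insts;
}

/* DP operations with MADDs counted as 2, assuming every DP instruction is a
 * multiply-add on a 128-bit vector of two doubles: 2 lanes x 2 ops each. */
static uint64_t dp_ops_all_madd(uint64_t dp_insts)
{
    return dp_insts * 2u * 2u;
}
```

For example, `cpi(378304, 131456)` gives about 2.88, matching Performance CPI, and `dp_ops_all_madd(16384)` gives 65536, matching DP operations.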



# **Cycle and Instruction Counts**

SPU DD3.0

| Counter                       | Value    |
|-------------------------------|----------|
| Total cycle count             | 22082564 |
| Total instruction count       | 2583628  |
| Total CPI                     | 8.55     |
| Performance cycle count       | 6153081  |
| Performance instruction count | 2573522  |
| Performance CPI               | 2.39     |

# **Branch and Hint Stats**

| Counter                                          | Value  |
|--------------------------------------------------|--------|
| Branch instructions                              | 143160 |
| Branch taken                                     | 142880 |
| Branch not taken                                 | 280    |
| Hint instructions                                | 140    |
| Hint hit                                         | 142740 |
| Contention at LS between Load/Store and Prefetch | 142880 |

**Branch Instructions**: Total branch-type instructions executed (excl. stop, syncs, iret)

**Branch Taken**: Total "satisfied" branches (regardless of PC address change)

**Branch Not Taken**: Branch Instructions minus Branch Taken

**Hint Instructions**: Count of HBR-type instructions executed (excl. hbrp)

**Hint Hits**: Count of executed instructions which were loaded from the hint target prefetch buffer

**LS Contention**: Count of cycles in which LS arbitration prevented instruction prefetch in favor of register load/store operations
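The derived values in the cycle, instruction and branch counters above are simple arithmetic on the raw counts: CPI is cycles divided by instructions, and Branch Not Taken is Branch Instructions minus Branch Taken. The sketch below recomputes them from the sample run. The type and function names (`spu_window_t`, `spu_cpi`, `spu_branch_not_taken`, ...) are hypothetical helpers for illustration, not part of the simulator or of `profile.h`; the "performance" window is assumed here to be the region recorded between `prof_start()` and `prof_stop()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical holder for one cycle/instruction window
 * (the whole run, or the profiled "performance" region). */
typedef struct {
    uint64_t cycles;
    uint64_t instructions;
} spu_window_t;

/* Hypothetical holder for the branch counters. */
typedef struct {
    uint64_t branch_instructions;
    uint64_t branch_taken;
} spu_branch_t;

/* Cycles per instruction: cycles divided by instructions executed. */
double spu_cpi(spu_window_t w)
{
    assert(w.instructions > 0);
    return (double)w.cycles / (double)w.instructions;
}

/* Branch not taken = branch instructions - branches taken. */
uint64_t spu_branch_not_taken(spu_branch_t b)
{
    assert(b.branch_taken <= b.branch_instructions);
    return b.branch_instructions - b.branch_taken;
}

/* Print the derived values, rounded as in the simulator listing. */
void spu_print_derived(spu_window_t total, spu_window_t perf, spu_branch_t br)
{
    printf("Total CPI        %.2f\n", spu_cpi(total));
    printf("Performance CPI  %.2f\n", spu_cpi(perf));
    printf("Branch not taken %llu\n",
           (unsigned long long)spu_branch_not_taken(br));
}
```

Called with the sample counts (22082564 cycles / 2583628 instructions in total, 6153081 / 2573522 for the performance region, 143160 branches of which 142880 were taken), this prints a total CPI of 8.55, a performance CPI of 2.39 and 280 branches not taken, matching the listing.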

# **Efficiency Stats**

| Category                             | Cycles (% of total) |
|--------------------------------------|---------------------|
| Single cycle                         | 1715121 ( 27.9%)    |
| Dual cycle                           | 428920 ( 7.0%)      |
| Nop cycle                            | 0 ( 0.0%)           |
| Stall due to branch miss             | 7560 ( 0.1%)        |
| Stall due to prefetch miss           | 0 ( 0.0%)           |
| Stall due to dependency              | 4001060 ( 65.0%)    |
| Stall due to fp resource conflict    | 0 ( 0.0%)           |
| Stall due to waiting for hint target | 420 ( 0.0%)         |
| Issue stalls due to pipe hazards     | 0 ( 0.0%)           |
| Channel stall cycle                  | 0 ( 0.0%)           |
| SPU Initialization cycle             | 0 ( 0.0%)           |
| Total cycle                          | 6153081 (100.0%)    |

Stall cycles due to dependency, by pipeline:

| Pipe | Stall cycles                              |
|------|-------------------------------------------|
| FX2  | 143020 ( 3.6% of all dependency stalls)   |
| SHUF | 857560 ( 21.4% of all dependency stalls)  |
| FX3  | 0 ( 0.0% of all dependency stalls)        |
| LS   | 857280 ( 21.4% of all dependency stalls)  |
| BR   | 0 ( 0.0% of all dependency stalls)        |
| SPR  | 0 ( 0.0% of all dependency stalls)        |
| LNOP | 0 ( 0.0% of all dependency stalls)        |
| NOP  | 0 ( 0.0% of all dependency stalls)        |
| FXB  | 0 ( 0.0% of all dependency stalls)        |
| FP6  | 2143200 ( 53.6% of all dependency stalls) |
| FP7  | 0 ( 0.0% of all dependency stalls)        |
| FPD  | 0 ( 0.0% of all dependency stalls)        |

The number of used registers is 15; the used ratio is 11.72% (15 of the SPU's 128 registers).

**Single Cycle**: Cycles in which only 1 non-NOP instruction was executed

**Dual Cycle**: Cycles in which 2 non-NOP instructions were executed

**NOP Cycle**: Cycles in which only NOP instructions were executed

**Branch Miss Stalls**: Cycles in which branch mispredict prevented any instruction from executing

**Prefetch Miss Stalls**: Cycles in which instruction run-out occurred

**Dependency Stalls**: Cycles in which source/target operand dependencies prevented any instruction from being issued

**FP Resource Stalls**: Cycles in which shared use of FPU stages prevented any instruction from being issued (e.g. FXB, FP6, FP7, FPD)
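In the sample run the efficiency categories add up exactly to the total cycle count (6153081), and each percentage is a category's cycles divided by that total. Likewise, the per-pipeline dependency stalls add up to the 4001060 dependency-stall cycles. The sketch below checks both relations and, following the Single/Dual Cycle definitions, counts the non-NOP instructions executed in those cycles. `cycle_bucket_t`, `bucket_total`, `percent_of` and `non_nop_issued` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One labelled cycle counter (hypothetical representation). */
typedef struct {
    const char *name;
    uint64_t cycles;
} cycle_bucket_t;

/* Sum of all buckets in the array. */
uint64_t bucket_total(const cycle_bucket_t *b, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += b[i].cycles;
    return sum;
}

/* part as a percentage of whole, as printed next to each counter. */
double percent_of(uint64_t part, uint64_t whole)
{
    assert(whole > 0);
    return 100.0 * (double)part / (double)whole;
}

/* Per the definitions: a single cycle executes 1 non-NOP instruction and
 * a dual cycle executes 2, so this is the non-NOP work done in those cycles. */
uint64_t non_nop_issued(uint64_t single_cycles, uint64_t dual_cycles)
{
    return single_cycles + 2 * dual_cycles;
}
```

Only the non-zero pipes (FX2, SHUF, LS, FP6) need to be listed for the dependency check. FP6 alone accounts for about 53.6% of the dependency stalls in this sample, which suggests the code mostly waits on single-precision floating-point results.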

# **Dependency Statistics**






#### Instruction Histogram: mysim spu n display statistics hist

| Mnemonic | Count  |
|----------|--------|
| lnop     | 281    |
| hbra     | 140    |
| a        | 142880 |
| and      | 141    |
| ai       | 428640 |
| brz      | 143020 |
| stqx     | 285760 |
| lqa      | 142880 |
| br       | 140    |
| fsmbi    | 140    |
| lqx      | 571520 |
| rotqby   | 142880 |
| nop      | 140    |
| il       | 280    |
| ila      | 140    |
| cgti     | 140    |
| fm       | 142880 |
| ceq      | 142880 |
| shufb    | 142880 |
| fma      | 285760 |  |  |  |  |  |
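The histogram counts add up to 2573522, the same value as the performance instruction count, so the histogram shows where the executed instructions went; in this sample the indexed quadword loads (`lqx`) dominate. The sketch below totals the histogram, looks up a single mnemonic, excludes the two no-op forms, and finds the most frequent instruction. `hist_entry_t` and the `hist_*` functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row of the instruction histogram (hypothetical representation). */
typedef struct {
    const char *mnemonic;
    uint64_t count;
} hist_entry_t;

/* Sum of all executed instructions in the histogram. */
uint64_t hist_total(const hist_entry_t *h, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += h[i].count;
    return sum;
}

/* Count for one mnemonic, or 0 if it does not appear. */
uint64_t hist_count(const hist_entry_t *h, size_t n, const char *mnemonic)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(h[i].mnemonic, mnemonic) == 0)
            return h[i].count;
    return 0;
}

/* Instructions other than nop (even pipe) and lnop (odd pipe). */
uint64_t hist_non_nop(const hist_entry_t *h, size_t n)
{
    uint64_t total = hist_total(h, n);
    uint64_t nops = hist_count(h, n, "nop") + hist_count(h, n, "lnop");
    assert(nops <= total);
    return total - nops;
}

/* Entry with the largest count (first one wins on ties); NULL if n == 0. */
const hist_entry_t *hist_max(const hist_entry_t *h, size_t n)
{
    const hist_entry_t *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || h[i].count > best->count)
            best = &h[i];
    return best;
}
```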



#### Branch History: mysim spu n display statistics branch

**Hint**: Number of executed hints referencing this branch

**Hit**: Count of executed instructions which were loaded from the hint target prefetch buffer for this branch

Branch histories:

| INST  | ADDRESS  | count  | taken  | not_taken | hint | hit    |
|-------|----------|--------|--------|-----------|------|--------|
| brsl  | 0x00014  | 1      | 1      | 0         | 0    | 0      |
| brnz  | 0x0017c  | 15     | 14     | 1         | 1    | 14     |
| brnz  | 0x001c4  | 1      | 0      | 1         | 0    | 0      |
| brnz  | 0x0026c  | 1      | 0      | 1         | 0    | 0      |
| bi    | 0x00274  | 1      | 1      | 0         | 0    | 0      |
| br    | 0x003c4  | 142881 | 142741 | 140       | 141  | 142741 |
| sync  | 0x004e0  | 1      | 0      | 1         | 0    | 0      |
| bi    | 0x004ec  | 1      | 1      | 0         | 0    | 0      |
| brnz  | 0x0004c  | 28     | 27     | 1         | 0    | 0      |
| stop  | 0x00090  | 1      | 0      | 1         | 0    | 0      |
| br    | 0x0009c  | 32     | 32     | 0         | 0    | 0      |
| bi    | 0x3fe14  | 1      | 1      | 0         | 0    | 0      |
| brnz  | 0x000d4  | 1      | 1      | 0         | 0    | 0      |
| brsl  | 0x000fc  | 1      | 1      | 0         | 0    | 0      |
| brz   | 0x00364  | 140    | 0      | 140       | 0    | 0      |
| br    | 0x003cc  | 140    | 140    | 0         | 0    | 0      |
| Total                   |                                        | 143246        | 142960                          | 286                                                      | 142                                        | 142755 |
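Each row of the branch history satisfies count = taken + not_taken, and the Total row is the column-wise sum of the per-branch rows (143246 executions, 142960 taken, 286 not taken, 142 hints, 142755 hits). The sketch below reproduces the Total row and checks the per-row relation. `branch_row_t`, `branch_row_consistent` and `branch_totals` are hypothetical names for illustration, not simulator structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of the branch history (hypothetical representation). */
typedef struct {
    const char *inst;   /* branch mnemonic, e.g. "brnz" */
    uint32_t address;   /* local store address of the branch */
    uint64_t count;     /* times executed */
    uint64_t taken;
    uint64_t not_taken;
    uint64_t hint;      /* executed hints referencing this branch */
    uint64_t hit;       /* instructions loaded from the hint target buffer */
} branch_row_t;

/* Every row must satisfy count == taken + not_taken. */
int branch_row_consistent(const branch_row_t *r)
{
    return r->taken + r->not_taken == r->count;
}

/* Column-wise sum of all rows, i.e. the "Total" line. */
branch_row_t branch_totals(const branch_row_t *rows, size_t n)
{
    branch_row_t t = {"Total", 0, 0, 0, 0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        assert(branch_row_consistent(&rows[i]));
        t.count += rows[i].count;
        t.taken += rows[i].taken;
        t.not_taken += rows[i].not_taken;
        t.hint += rows[i].hint;
        t.hit += rows[i].hit;
    }
    return t;
}
```

Nearly all hint hits come from the single hinted `br` at 0x003c4, the loop branch that executes 142881 times.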



#### Hint History: mysim spu n display statistics hint





#### Register Utilization: mysim spu n display statistics reguse

| Register use: | READS( %tot)   | WRITES( %tot)  | R+W( %tot)     |
|---------------|----------------|----------------|----------------|
| 2:            | 1000300(14.00) | 857420(12.00)  | 1857720(26.00) |
| 3:            | 428640( 6.00)  | 428640( 6.00)  | 857280(12.00)  |
| 4:            | 142880( 2.00)  | 142880( 2.00)  | 285760( 4.00)  |
| 5:            | 142880( 2.00)  | 142880( 2.00)  | 285760( 4.00)  |
| 6:            | 142880( 2.00)  | 142880( 2.00)  | 285760( 4.00)  |
| 7:            | 857280(12.00)  | 143020( 2.00)  | 1000300(14.00) |
| 8:            | 428640( 6.00)  | 143020( 2.00)  | 571660( 8.00)  |
| 9:            | 143020( 2.00)  | 0( 0.00)       | 143020( 2.00)  |
| 10:           | 285760( 4.00)  | 143020( 2.00)  | 428780( 6.00)  |
| 12:           | 428640( 6.00)  | 0( 0.00)       | 428640( 6.00)  |
| 13:           | 142880( 2.00)  | 140( 0.00)     | 143020( 2.00)  |
| 14:           | 285760( 4.00)  | 0( 0.00)       | 285760( 4.00)  |
| 16:           | 285760( 4.00)  | 0( 0.00)       | 285760( 4.00)  |
| 17:           | 285760( 4.00)  | 0( 0.00)       | 285760( 4.00)  |
| 31:           | 282( 0.00)     | 141( 0.00)     | 423( 0.01)     |
| TOTAL use:    | 5001362(69.99) | 2144041(30.01) | 7145403        |

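The used-register line in the efficiency statistics (15 registers, ratio 11.72) follows from this table: 15 registers have any recorded access, and 15 / 128 (the size of the SPU register file) is 11.72%. The %tot columns are relative to all register accesses, 5001362 reads plus 2144041 writes = 7145403. The sketch below derives these figures from the rows. `reg_use_t`, `regs_used`, `regs_used_ratio`, `reg_totals` and `reg_share` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPU_REGISTER_COUNT 128 /* the SPU has 128 general-purpose registers */

/* One row of the register utilization listing (hypothetical representation). */
typedef struct {
    unsigned reg;     /* register number */
    uint64_t reads;
    uint64_t writes;
} reg_use_t;

/* Number of registers with at least one read or write. */
unsigned regs_used(const reg_use_t *r, size_t n)
{
    unsigned used = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].reads + r[i].writes > 0)
            used++;
    return used;
}

/* Used registers as a percentage of the SPU register file. */
double regs_used_ratio(unsigned used)
{
    assert(used <= SPU_REGISTER_COUNT);
    return 100.0 * (double)used / SPU_REGISTER_COUNT;
}

/* Total reads and writes over all rows. */
void reg_totals(const reg_use_t *r, size_t n, uint64_t *reads, uint64_t *writes)
{
    *reads = 0;
    *writes = 0;
    for (size_t i = 0; i < n; i++) {
        *reads += r[i].reads;
        *writes += r[i].writes;
    }
}

/* One register's reads+writes as a percentage of all accesses (%tot column). */
double reg_share(const reg_use_t *r, uint64_t total_accesses)
{
    assert(total_accesses > 0);
    return 100.0 * (double)(r->reads + r->writes) / (double)total_accesses;
}
```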
3/2/2008




# **Event Log Sample**

Figure: eventlog_mysim Activity Chart for process SPU7, with EXEC, DMA, RUN, CHANNEL_STALL and DMA_LIST rows plotted along a time axis running from about 590M to 601M.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                             | <u> </u>                                                          |                                         |                                                       | — <sup></sup> |
|                                                                                                                                                                       | DMAQ_SUSPEND_XLATE                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                             |                                                                   |                                         |                                                       | χ             |
|                                                                                                                                                                       | SPE_XLATE_FAULT                                                                                                                                                                                                                                                                                                    | 11111111                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | · · · · · · · · · · · · · · · · · · ·                                                                                                                       |                                                                   | 11.2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1. | ن از از از این از | ······        |
| SPU6                                                                                                                                                                  | DMA<br>RUN<br>CHANNEL_STALL<br>DMA_LIST                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                             |                                                                   |                                         |                                                       |               |
| 4                                                                                                                                                                     |                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                             |                                                                   |                                         |                                                       |               |
|                                                                                                                                                                       | Marker C                                                                                                                                                                                                                                                                                                           | cycle: 612756050                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | Clip Start at Marker                                                                                                                                        | Unclip Start                                                      | Clip End at Marker                      | Unclip End                                            |               |
| 413233390<br>415238953<br>416730501<br>422431398<br>450424734<br>450242734<br>453275627<br>590663202<br>590691164<br>590691164<br>590691164<br>590691762<br>590691762 | PROCESS EXEC CPU#0<br>PROCESS EXEC CPU#0<br>SPE_DMA_START #7 A<br>SPE_DMA_START #7 A<br>SPE_DMA_START #7 A<br>CHANNEL_STALL_STARD<br>SPE_DMA_START #7 A<br>CHANNEL_STALL_STARD | 1 EXECHAME /bin/grep<br>1 EXECHAME /bin/grep<br>1 EXECHAME /bin/grep<br>1 EXECHAME /bir/bin/<br>1 EXECHAME /bir/bin/<br>1 EXECHAME /bir/chan<br>1 EXECHAME /bir/chan | Log Trace<br>id<br>id<br>callthru<br>dd<br>callthru<br>nulti_spe<br>0000000220000 LS:0x0000000<br>000000320000 LS:0x00000500<br>0000000320000 LS:0x00000500 | :<br>0 SIZE:3584<br>1 SIZE:3584 (DIFF<br>SIZE:2304<br>00 SIZE:128 | ': 1364 cycles)                         |                                                       | ł             |

# **SPE Visualizer Detail Display**




# **SPE Visualizer Summary Display**



# **SPE Visualizer Controls**

#### Via TCL Commands

- mysim pviz run/stop
- mysim pviz set spu n
- mysim pviz set delta x
- mysim pviz set scroll/compress

#### Via "apu\_callthru.h"

- MamboPVIZRun()
- MamboPVIZStop()
- MamboPVIZSetDelta(x)
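
The callthru functions above can restrict visualizer recording to a region of interest. The following is a minimal C sketch under stated assumptions: it assumes `apu_callthru.h` declares `MamboPVIZRun()`, `MamboPVIZStop()` and `MamboPVIZSetDelta(x)` as listed, with `x` taken to be an integer that has the same meaning as in `mysim pviz set delta x`. The `HAVE_SIM_CALLTHRU` build flag, the no-op stand-in macros, the delta value 100, and the `saxpy_region` kernel are hypothetical, added only so the sketch compiles and runs outside the simulator.

```c
#ifdef HAVE_SIM_CALLTHRU           /* hypothetical flag set for simulator builds */
#include "apu_callthru.h"
#else
/* Hypothetical no-op stand-ins so the sketch also builds off-simulator. */
#define MamboPVIZRun()       ((void)0)
#define MamboPVIZStop()      ((void)0)
#define MamboPVIZSetDelta(x) ((void)(x))
#endif

/* Hypothetical kernel: y[i] = a * x[i] + y[i]. Only this loop runs with
 * the visualizer recording; setup and result checking happen outside it. */
void saxpy_region(float a, const float *x, float *y, int n)
{
    MamboPVIZSetDelta(100);   /* assumed to match "mysim pviz set delta x" */
    MamboPVIZRun();           /* start visualizer recording */
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
    MamboPVIZStop();          /* stop visualizer recording */
}
```

Off-simulator the macros expand to no-ops, so `saxpy_region` runs as ordinary C; in a simulator build, defining `HAVE_SIM_CALLTHRU` would pull in the real header instead.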

#### Via GUI Controls





# **Enabling SPU Pipe Trace**



# **Pipe Trace Output**

| Line | Trace output |
|------|--------------|
| 1 | `CYCLE: 605035779, SPU-7 CYCLE: 14373254 (Inst: 683872)` |
| 2 | `mispred 0 -> 003a8` |
| 3 | `hint 003c4->00380*` |
| 4 | `pre-fetch 2-00480 1-00400 pre-fetch 1s` |
| 5 | `ilbh1 = 00380, ilbh2 = 003c0` |
| 6 | `ilb11 x 00000, ilb12 x 00000, ilb21 x 00000, ilb22 x 00000` |
| 7 | `pred = 00380 (8)` |
| 8 | `g = 0x00380 a $5,$8,$16 = 0x00384 lqx $3,$7,$12` |
| 9 | `h = 0x003c0 ai $7,$7,16 = 0x003c4 brz $6,68` |
| 10 | `i = 0x003b8 fma $2,$2,$4,$3 = 0x003bc stqx $2,$7,$12` |
| 11 | `j = 0x003b0 shufb $2,$2,$2,$13 = 0x003b4 fm $2,$17,$2` |
| 12 | `* X 0x00000 stop = 0x003ac rotqby $2,$2,$5 (-1) (2)` |
| 13 | `k X 0x00000 stop R 0x00000 stop (-1) (-1)` |
| 14 | `l X 0x00000 stop R 0x00000 stop (-1) (-1)` |
| 15 | `m X 0x00000 stop R 0x00000 stop (-1) (-1)` |
| 16 | `n X 0x00000 stop = 0x003a8 lqx $3,$7,$12 (-1) (3)` |
| 17 | `o = 0x003a0 ai $8,$8,4 = 0x003a4 lqa $4,39200 (128) (4)` |
| 18 | `p X 0x00000 stop = 0x0039c lqx $2,$8,$16 (-1) (128)` |
| 19 | `q X 0x00000 stop = 0x00398 stqx $3,$7,$14 (-1) (128)` |
| 20 | `r R 0x00000 stop X 0x00000 stop (-1) (-1)` |
| 21 | `s R 0x00000 stop X 0x00000 stop (-1) (-1)` |
| 22 | `t R 0x00000 stop X 0x00000 stop (-1) (-1)` |
| 23 | `u R 0x00000 stop X 0x00000 stop (-1) (-1)` |
| 24 | `v R 0x00000 stop X 0x00000 stop (-1) (-1)` |
| 25 | `CYCLE: 605035780, SPU-7 CYCLE: 14373255 (Inst: 683873)` |
| 26 | `mispred 0 -> 003ac` |
| 27 | `hint 003c4->00380*` |
| 28 | … |
| 29 | `ilbh1 …` |
| 30 | `ilb11 …` |
| 32 | `g = 0x00380 a $5,$8,$16 = 0x00384 lqx $3,$7,$12` |
| 33 | `h = 0x003c0 ai $7,$7,16 = 0x003c4 brz $6,68` |
| 34 | `i = 0x003b8 fma $2,$2,$4,$3 = 0x003bc stqx $2,$7,$12` |
| 35 | `j = 0x003b0 shufb $2,$2,$2,$13 = 0x003b4 fm $2,$17,$2` |
| 36 | `* R 0x00000 stop X 0x00000 stop (-1) (-1)` |
| 37 | `k X 0x00000 stop = 0x003ac rotqby $2,$2,$5 (-1) (2)` |

http://www-128.ibm.com/developerworks/power/library/pa-cellspu/
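
The `CYCLE:` header lines in the trace carry enough information to estimate cycles per instruction over a stretch of trace. This is a minimal C sketch that assumes the header fields are, in order, a global simulator cycle count, the SPU number, that SPU's own cycle count, and a running instruction count. The `trace_header` type and both functions are hypothetical helpers, not part of the simulator.

```c
#include <stdio.h>

/* One pipe-trace header line, e.g.
 * "CYCLE: 605035779, SPU-7 CYCLE: 14373254 (Inst: 683872)".
 * Field meanings are assumptions, not documented simulator semantics. */
typedef struct {
    unsigned long long sys_cycle;  /* assumed: global simulator cycle   */
    int spu;                       /* SPU number, e.g. 7                */
    unsigned long long spu_cycle;  /* assumed: per-SPU cycle count      */
    unsigned long long inst;       /* assumed: running instruction count */
} trace_header;

/* Returns 1 if `line` matches the header format, 0 otherwise.
 * The leading space in the format skips any leading whitespace. */
int parse_trace_header(const char *line, trace_header *h)
{
    return sscanf(line, " CYCLE: %llu, SPU-%d CYCLE: %llu (Inst: %llu)",
                  &h->sys_cycle, &h->spu, &h->spu_cycle, &h->inst) == 4;
}

/* Cycles per instruction between two headers from the same SPU, where
 * `b` is assumed to come later than `a`. Returns -1.0 if the instruction
 * count did not advance. */
double cpi_between(const trace_header *a, const trace_header *b)
{
    unsigned long long dc = b->spu_cycle - a->spu_cycle;
    unsigned long long di = b->inst - a->inst;
    return di ? (double)dc / (double)di : -1.0;
}
```

For the two headers in the excerpt, the SPU-7 cycle count and the instruction count each advance by one, so `cpi_between` yields 1.0.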

# **Local Store Stats**





## **Special Notices -- Trademarks**

This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.

All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.

IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.

IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.

All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

Many of the features described in this document are operating system dependent and may not be available on Linux. For more information, please check: http://www.ibm.com/systems/p/software/whitepapers/linux\_overview.html

Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.

Revised January 19, 2006

#### Special Notices (Cont.) -- Trademarks

The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: alphaWorks, BladeCenter, Blue Gene, ClusterProven, developerWorks, e business(logo), e(logo)business, e(logo)server, IBM, IBM(logo), ibm.com, IBM Business Partner (logo), IntelliStation, MediaStreamer, Micro Channel, NUMA-Q, PartnerWorld, PowerPC, PowerPC(logo), pSeries, TotalStorage, xSeries; Advanced Micro-Partitioning, eServer, Micro-Partitioning, NUMACenter, On Demand Business logo, OpenPower, POWER, Power Architecture, Power Everywhere, Power Family, Power PC, PowerPC Architecture, POWER5, POWER5+, POWER6, POWER6+, Redbooks, System p, System p5, System Storage, VideoCharger, Virtualization Engine.

A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.

Cell Broadband Engine and Cell Broadband Engine Architecture are trademarks of Sony Computer Entertainment, Inc. in the United States, other countries, or both.

Rambus is a registered trademark of Rambus, Inc.

XDR and FlexIO are trademarks of Rambus, Inc.

UNIX is a registered trademark in the United States, other countries or both.

Linux is a trademark of Linus Torvalds in the United States, other countries or both.

Fedora is a trademark of Red Hat, Inc.


Microsoft, Windows, Windows NT and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries or both.

Intel, Intel Xeon, Itanium and Pentium are trademarks or registered trademarks of Intel Corporation in the United States and/or other countries.

AMD Opteron is a trademark of Advanced Micro Devices, Inc.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.

TPC-C and TPC-H are trademarks of the Transaction Processing Performance Council (TPC).

SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).

AltiVec is a trademark of Freescale Semiconductor, Inc.

PCI-X and PCI Express are registered trademarks of PCI SIG.

InfiniBand™ is a trademark of the InfiniBand® Trade Association.

Other company, product and service names may be trademarks or service marks of others.



# **Special Notices - Copyrights**

(c) Copyright International Business Machines Corporation 2005. All Rights Reserved. Printed in the United States September 2005.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM, IBM Logo, Power Architecture.

Other company, product and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

IBM Microelectronics Division 1580 Route 52, Bldg. 504 Hopewell Junction, NY 12533-6351 The IBM home page is http://www.ibm.com The IBM Microelectronics Division home page is http://www.chips.ibm.com