Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages

A. Ferrerón<sup>1</sup>, D. Suárez-Gracia<sup>2</sup>, J. Alastruey-Benedé<sup>1</sup>, T. Monreal<sup>3</sup>, V. Viñals<sup>1</sup>

<sup>1</sup>Universidad de Zaragoza, Spain

<sup>2</sup>Qualcomm Research Silicon Valley, USA

<sup>3</sup>Universidad Politécnica de Cataluña, Spain

SBAC-PAD Oct-2014



A. Ferreron {ferreron@unizar.es} - SBAC-PAD'14



# Operation near the threshold voltage $(V_{th})$

#### $V_{dd}$ and $V_{th}$ scaling has stopped

Power density no longer stays constant among technology generations and dark silicon appears

#### Operation at ultra-low $V_{dd}$

- Reduce the power and energy consumption
- Switch on more cores to exploit parallelism




# Operation near the threshold voltage $(V_{th})$ : Challenges

Delay increases: lower voltage  $\rightarrow$  lower frequency

 Compensate with parallelism: more active cores with the same power budget

Increasing sensitivity to process variation (deviation of device parameters from their nominal values)

- Memory structures are especially sensitive to variation
  - Conventional 6T cells: read, write, access, and hold failures
  - Lower voltages → stability margins decrease → cell failure rate increases
  - Memory blocks need a minimum voltage ($V_{ddmin}$) to guarantee reliable operation


#### Objective

Lower  $V_{dd}$  to near-threshold voltages  $\rightarrow$  energy efficient operation

#### Problem

High sensitivity of SRAM structures to variation at ultra-low  $V_{dd}$ 

#### Our proposal

Mitigate the impact of SRAM cell failures at ultra-low $V_{dd}$ using low-complexity techniques:

- Block Disabling with Operational Tags (BDOT)
- Block Disabling with Operational Tags and Cache-to-cache Transfers (BDOT-C2C)


## Example of Probability of Failure of SRAM Cells at 22nm




## Bit Probability of Failure Affects Yield
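A rough way to see why the bit failure probability matters for yield: if cell failures are assumed to be independent (an assumption of this sketch, not something stated on the slide), a structure with `n_bits` cells is fault-free with probability $(1 - p)^{n}$. The function name and the probabilities below are hypothetical and chosen only for illustration.

```python
import math


def fault_free_probability(p_bit_fail: float, n_bits: int) -> float:
    """Probability that all n_bits cells work, assuming independent failures."""
    # log1p keeps precision when p_bit_fail is very small.
    return math.exp(n_bits * math.log1p(-p_bit_fail))


# Hypothetical per-bit failure probabilities (not values from the slides).
BITS_IN_1MB = 8 * 1024 * 1024
for p in (1e-9, 1e-6, 1e-3):
    print(f"p={p:g}: P(fault-free 1 MB array) = {fault_free_probability(p, BITS_IN_1MB):.3e}")
```

Even a very small per-bit probability compounds over millions of cells, which is why fault tolerance at a finer granularity than the whole array becomes attractive.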




#### Traditional Cache Hierarchy




## Block Disabling Fundamentals

- SRAM cell failure detected: Block Disabling (BD) deactivates the entry (tag and data)
- Simple implementation and low overhead: 1 bit per cache entry
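The idea can be sketched as a cache set in which every way carries one `disabled` bit. This is a minimal illustration, not the authors' implementation: the class names are hypothetical, and the replacement policy (reuse the first usable way) is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Way:
    disabled: bool = False      # the single extra bit per cache entry
    valid: bool = False
    tag: Optional[int] = None


class CacheSet:
    def __init__(self, disabled_flags: List[bool]):
        self.ways = [Way(disabled=d) for d in disabled_flags]

    def lookup(self, tag: int) -> bool:
        """Hit only on valid, enabled ways; disabled ways are never consulted."""
        return any(w.valid and not w.disabled and w.tag == tag for w in self.ways)

    def allocate(self, tag: int) -> bool:
        """Place the block in an enabled way; return False if none exists."""
        usable = [w for w in self.ways if not w.disabled]
        if not usable:
            return False
        # Prefer an empty way; otherwise evict the first usable way (placeholder policy).
        victim = next((w for w in usable if not w.valid), usable[0])
        victim.valid, victim.tag = True, tag
        return True
```

A set built as `CacheSet([False, True, False, True])` behaves like a 2-way set even though it has four physical ways, which is the associativity loss discussed in the following slides.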






## Block Disabling at Ultra-low Voltages

At lower voltages capacity and associativity degrade very fast

Available capacity for 16-way, 1MB cache bank with block disabling (block size is 64 bytes):

| $V_{dd}$ | Available capacity |
|----------|--------------------|
| 0.55 V   | 887 KB (86%)       |
| 0.50 V   | 408 KB (40%)       |
| 0.45 V   | 138 KB (13%)       |
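The steep drop in the table follows from a simple relation: a block stays usable only if none of its cells fail. The sketch below assumes independent bit failures and counts only the 512 data bits of a 64-byte block (tag bits are ignored), both of which are simplifications introduced here. The "implied" probabilities it prints are a back-of-the-envelope inversion of the capacities in the table, not figures reported by the authors.

```python
BLOCK_BITS = 64 * 8  # 64-byte block; tag bits ignored (simplification)


def usable_fraction(p_bit_fail: float, block_bits: int = BLOCK_BITS) -> float:
    """Expected fraction of blocks with no faulty cell."""
    return (1.0 - p_bit_fail) ** block_bits


def implied_bit_fail(fraction: float, block_bits: int = BLOCK_BITS) -> float:
    """Per-bit failure probability that would yield the given usable fraction."""
    return 1.0 - fraction ** (1.0 / block_bits)


# Capacities from the table above, for a 1 MB (1024 KB) bank.
for vdd, usable_kb in ((0.55, 887), (0.50, 408), (0.45, 138)):
    frac = usable_kb / 1024
    print(f"Vdd={vdd:.2f} V: usable={frac:.1%}, implied p_bit={implied_bit_fail(frac):.2e}")
```

Because the exponent is the number of bits per block, a modest increase in the per-bit failure probability removes a large share of the blocks.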




Figure: associativity degradation for a 16-way, 1 MB cache bank with block disabling (block size is 64 bytes).
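A hedged way to reason about associativity loss: if each block is usable independently with probability `q` (an assumption of this sketch), the number of usable ways in a 16-way set follows a binomial distribution. The value of `q` below is taken from the 0.50 V row of the capacity table (408 KB out of 1024 KB); the function name is hypothetical.

```python
from math import comb
from typing import List


def ways_distribution(q_block_ok: float, ways: int = 16) -> List[float]:
    """P(k usable ways) for k = 0..ways, assuming independent block faults."""
    return [
        comb(ways, k) * q_block_ok**k * (1.0 - q_block_ok) ** (ways - k)
        for k in range(ways + 1)
    ]


q = 408 / 1024  # usable fraction at 0.50 V, from the capacity table
dist = ways_distribution(q)
expected_ways = sum(k * p for k, p in enumerate(dist))
print(f"expected usable ways: {expected_ways:.2f} of 16")
print(f"P(at most 4 usable ways): {sum(dist[:5]):.3f}")
```

Under this model the expected associativity is simply `16 * q`, but the spread matters too, since some sets end up with far fewer usable ways than the average.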




#### Inclusive Hierarchies

Figure: shared LLC bank (tag array and data array).




#### Inclusive Hierarchies and Block Disabling Interaction

Figure: shared LLC bank (tag array and data array).




## BD with operational tags: BDOT

Allow blocks to be allocated as just tags: entries with faulty bits can still be used to allocate tag-only blocks in LLC
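A minimal sketch of tag-only allocation, assuming the tag array is fault-free and only data entries can be faulty. The class names and the allocation preference (fault-free entries first) are assumptions made for illustration, and replacement is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LLCEntry:
    data_faulty: bool              # data entry contains at least one faulty cell
    tag: Optional[int] = None
    data: Optional[bytes] = None

    @property
    def tag_only(self) -> bool:
        return self.tag is not None and self.data_faulty


class BDOTSet:
    def __init__(self, faulty_flags: List[bool]):
        self.entries = [LLCEntry(data_faulty=f) for f in faulty_flags]

    def allocate(self, tag: int, data: bytes) -> Optional[LLCEntry]:
        """Use a free fault-free entry if possible, else a faulty one as tag-only."""
        free = [e for e in self.entries if e.tag is None]
        if not free:
            return None  # replacement is omitted in this sketch
        entry = next((e for e in free if not e.data_faulty), free[0])
        entry.tag = tag
        entry.data = None if entry.data_faulty else data
        return entry

    def read(self, tag: int) -> Tuple[str, Optional[bytes]]:
        for e in self.entries:
            if e.tag == tag:
                return ("tag_only", None) if e.tag_only else ("llc_data", e.data)
        return ("miss", None)
```

With plain block disabling the faulty entry would be unusable; here it still tracks the block, so a copy held in a private cache does not have to be invalidated to preserve inclusion.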




#### Protect the tag array

- Bigger/robust cells: bigger transistors or more transistors per cell (assist circuitry)
- More complex error correction codes (ECC)
- Why not protect the whole cache structure?
  - Area and power increase when using bigger/robust cells
  - Complex ECC requires extra storage and checking hardware, which might increase access latency
  - The tag array is roughly 10% of the cache area (LLC)


## BDOT with cache-to-cache transfers: BDOT-C2C

- Problem: requests to tag-only blocks $\rightarrow$ off-chip transactions
- Observation: shared blocks are already on-chip (private levels)

Provide cache-to-cache transfers of clean blocks: leverage the coherence protocol

- The protocol already does cache-to-cache transfers of exclusively owned blocks
- Slight change in the coherence protocol behavior, but no hardware overhead
- Potential gain depends on the application's sharing degree
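The decision taken at the LLC can be sketched as follows. This is an illustrative model only: the function, the `Source` enum, and the representation of sharers as a set of private-cache identifiers are assumptions, and real coherence protocol states are not modeled.

```python
from enum import Enum
from typing import Set


class Source(Enum):
    LLC_DATA = "llc_data"
    CACHE_TO_CACHE = "cache_to_cache"
    OFF_CHIP = "off_chip"


def service_request(tag_hit: bool, tag_only: bool, sharers: Set[int],
                    requester: int, c2c_enabled: bool = True) -> Source:
    """Pick where the data for an LLC request comes from."""
    if not tag_hit:
        return Source.OFF_CHIP           # ordinary LLC miss
    if not tag_only:
        return Source.LLC_DATA           # fault-free entry holds the data
    other_sharers = sharers - {requester}
    if c2c_enabled and other_sharers:
        return Source.CACHE_TO_CACHE     # a private cache supplies the clean block
    return Source.OFF_CHIP               # tag-only block with no on-chip copy
```

For example, `service_request(True, True, {0, 3}, requester=1)` is served on-chip, whereas the same request with `c2c_enabled=False` (plain BDOT) goes off-chip. The benefit therefore grows with the number of requests that find another sharer.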


# Methodology



- Experimental set-up: Simics + GEMS + GARNET + DRAMSim2 + McPAT
- PARSEC benchmark suite
- Random faults + Monte Carlo simulations
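The random-fault step can be illustrated with a small Monte Carlo sketch. It is not the authors' tool flow: it assumes independent bit failures, draws one fault decision per block (equivalent to per-bit sampling under that assumption), and uses a hypothetical failure probability.

```python
import random


def sample_usable_blocks(n_blocks: int, block_bits: int,
                         p_bit_fail: float, rng: random.Random) -> int:
    """Draw one random fault map and count the blocks without any faulty cell."""
    p_block_ok = (1.0 - p_bit_fail) ** block_bits
    return sum(rng.random() < p_block_ok for _ in range(n_blocks))


def monte_carlo_capacity(trials: int, n_blocks: int, block_bits: int,
                         p_bit_fail: float, seed: int = 0) -> float:
    """Average usable fraction over several independent random fault maps."""
    rng = random.Random(seed)
    fractions = [
        sample_usable_blocks(n_blocks, block_bits, p_bit_fail, rng) / n_blocks
        for _ in range(trials)
    ]
    return sum(fractions) / trials


# 1 MB bank with 64-byte blocks: 16384 blocks of 512 bits.
# p_bit_fail = 1e-3 is a hypothetical value, not taken from the slides.
print(monte_carlo_capacity(trials=10, n_blocks=16384, block_bits=512, p_bit_fail=1e-3))
```

Each trial corresponds to one randomly generated faulty cache, and averaging over trials reduces the dependence of the results on any single fault map.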


# **On-chip Energy Consumption**






## Total Energy Consumption






Minimum system energy: off-chip memory becomes the main source of energy consumption, so the minimum is reached at higher voltage values (0.55-0.6 V)


## Conclusions

Operation near V<sub>th</sub> for energy-efficient operation

- Switch on inactive cores
- Reduce the overall energy consumption

SRAM structures fail when lowering V<sub>dd</sub>

- BD: simple, low overhead, but not effective at ultra-low V<sub>dd</sub>
- Inclusive hierarchies: BD increases inclusion victims
  - BDOT: allow blocks to be allocated as tag-only $\rightarrow$ protect inclusion
  - BDOT-C2C: provide cache-to-cache transfers of shared blocks $\rightarrow$ reduce off-chip transactions

BDOT & BDOT-C2C: substantial reduction in on-chip power and energy consumption
