Network Processing Forum Hardware Working Group



# Look-Aside (LA-1) Interface Implementation Agreement

April 15, 2004 Revision 1.1

**Harmeet Bhugra**, LA-1 Editor (Revision 1.1)

IDT, Inc., 2975 Stender Way, Santa Clara, CA 95054 USA. Phone: 408-330-1838. Work Email: harmeet.bhugra@idt.com. Base Email: harmeet@ieee.org

**David Chapman**, LA-1 Editor (Revision 1.0)

GSI Technology, 4131 Spicewood Springs Rd., F-2, Austin, TX 78759 USA. Phone: 512-345-6435. Work Email: dchapman@gsitechnology.com

Copyright © 2002 The Network Processing Forum

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction other than the following, (1) the above copyright notice and this paragraph must be included on all such copies and derivative works, and (2) this document itself may not be modified in any way, such as by removing the copyright notice or references to the NPF, except as needed for the purpose of developing NPF Implementation Agreements.

By downloading, copying, or using this document in any manner, the user consents to the terms and conditions of this notice. Unless the terms and conditions of this notice are breached by the user, the limited permissions granted above are perpetual and will not be revoked by the NPF or its successors or assigns.

THIS DOCUMENT AND THE INFORMATION CONTAINED HEREIN IS PROVIDED ON AN "AS IS" BASIS WITHOUT ANY WARRANTY OF ANY KIND. THE INFORMATION, CONCLUSIONS AND OPINIONS CONTAINED IN THE DOCUMENT ARE THOSE OF THE AUTHORS, AND NOT THOSE OF NPF. THE NPF DOES NOT WARRANT THE INFORMATION IN THIS DOCUMENT IS ACCURATE OR CORRECT. THE NPF DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED THE IMPLIED LIMITED WARRANTIES OF MERCHANTABILITY, TITLE OR FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS.

For additional information contact:

The Network Processing Forum, 39355 California Street, Suite 307, Fremont, CA 94538. Phone: +1 510 608-5990. Email: info@npforum.org

# **Table of Contents**

| Section |
|---------|
| 1. Scope and Purpose |
| 2. Normative References |
| 3. Acronyms and Abbreviations |
| 4. LA-1 Interface Overview |
| 5. LA-1 Signal Descriptions |
| 6. Port Operation Overview |
| 6.1. Data Alignment and Organization |
| 6.2. Output Register Control (Slave Device Feature) |
| 6.3. Echo Clocks (Optional on Slave Devices) |
| 7. Port Specifications |
| 7.1. Write Operations |
| 7.2. Read Operations |
| 7.3. Byte Write Control |
| 7.4. AC Electrical Characteristics |
| 7.5. AC Test Conditions |
| 7.6. DC Electrical Characteristics |
| 8. Logical Interface |
| Appendix: Informative Annexes |
| Annex 1: Examples of Logical Layer Interfaces |
| A1.1 Search Structure for CAMs/Classifiers |
| A1.2 Control Structure |
| Annex 2: LA-1 Port Depth Expansion |
| Annex 3: Relationships Between C Clock Referenced Timing and Echo Clock Referenced Timing |
| A3.1 Overview |
| A3.2 C Clock Referenced Timing |
| A3.4 Explanation of Three SRAM Modes |

# Table of Figures

| Figure |
|--------|
| Figure 1: System Block Diagram |
| Figure 2: LA-1 Interface Bus Signals |
| Figure 3: Simplified Timing Diagram for LA-1 Port Operation |
| Figure 4: LA-1 Port Read and Write Timing Diagram |
| Figure 5: Output Load Equivalent |
| Figure 6: Depth Expansion with Programmable Port Enable Inputs |
| Figure 7: Depth Expansion without Enable Inputs |
| Figure 8: Depth Expansion without Enable Inputs Using Only Last Echo Clock |
| Figure 9: Output Timing Control Details |
| Figure 10: Timing Relationship Between Type B Memory Model and Types A,C Memory Models |
| Figure 11: Example of Echo Clock Timings for 167 MHz Speed Bin |

# 1. Scope and Purpose

This document describes the Look-Aside (LA-1) Interface Implementation Agreement. The look-aside interface is intended for devices that are located adjacent to a network processing device (NPU) and offload certain tasks from the network processor. The Streaming Interface (NPF2001.121.xx) addresses processing in the data path and is complementary to the look-aside interface. LA-1 is a memory-mapped interface.





The LA-1 interface is targeted to support the lookup requirements for OC-48 through OC-192 line rates. The minimum performance specification for lookup-based coprocessors is four lookup operations at OC-48 or one lookup operation at OC-192. Packet count assumptions are for line rate performance using 40 byte packets and 144-bit search keys.
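As a rough check of these targets, the following sketch converts the stated packet assumptions into lookup rates. It assumes nominal SONET line rates of 2488.32 Mb/s (OC-48) and 9953.28 Mb/s (OC-192) and ignores framing overhead; those rates and all names in the code are illustrative assumptions, not part of this agreement.

```python
# Nominal SONET line rates in bits per second (assumed; not stated in this agreement).
OC48_BPS = 2_488_320_000
OC192_BPS = 9_953_280_000

PACKET_BITS = 40 * 8  # 40-byte packets, as assumed in the text above


def lookups_per_second(line_rate_bps: int, lookups_per_packet: int) -> float:
    """Lookup rate needed to keep up with back-to-back 40-byte packets."""
    packets_per_second = line_rate_bps / PACKET_BITS
    return packets_per_second * lookups_per_packet


oc48_rate = lookups_per_second(OC48_BPS, 4)    # four lookups per packet at OC-48
oc192_rate = lookups_per_second(OC192_BPS, 1)  # one lookup per packet at OC-192

print(f"OC-48,  4 lookups/packet: {oc48_rate / 1e6:.1f} M lookups/s")
print(f"OC-192, 1 lookup/packet:  {oc192_rate / 1e6:.1f} M lookups/s")
```

Under these assumptions the two cases require the same lookup rate, which is consistent with the text treating them as equivalent minimum targets.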

# 2. Normative References

The following documents contain provisions, which through reference in this text constitute provisions of this specification. At the time of publication, the editions indicated were valid. All referenced documents are subject to revision, and parties to agreements based on this specification are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below.

• JESD8-6 – High Speed Transceiver Logic (HSTL Class II). A 1.5V Output Buffer Supply Voltage Based Interface Standard for Digital Integrated Circuits. See http://www.jedec.org/

# 3. Acronyms and Abbreviations

The following acronyms and abbreviations are used in this specification:

- DDR Double Data Rate
- HSTL High Speed Transceiver Logic
- LA-1 Network Processing Forum 1<sup>st</sup> generation look-aside interface
- # Denotes an active low signal (e.g. W# for Write-bar)

# 4. LA-1 Interface Overview

The LA-1 interface is based on a separate I/O DDR SRAM style interface. Although the LA-1 interface is modeled after an SRAM interface and imposes particular requirements upon SRAMs that implement an LA-1 compliant interface port, certain allowances are present to accommodate the needs of coprocessors and other devices implementing the LA-1 interface (generally referred to hereafter as "coprocessors"). When used on a coprocessor, the LA-1 interface allows the minimum latency between a write and a subsequent read of the same address to differ from that of an SRAM. Although a coprocessor is required to respond to a read command (at the pin interface) as described in the following timing diagrams, the interface may be used with coprocessors that can return out-of-order results. A coprocessor is not required to guarantee that the results of any write operation are available within 1 cycle.

The LA-1 interface features include:

- Concurrent read and write operation
- Unidirectional read and write interfaces
- Single address bus
- 18-pin DDR data output path transfers 32 bits + 4 bits of even byte-parity per read
- 18-pin DDR data input path transfers 32 bits + 4 bits of even byte-parity per write
- Byte write control
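To put the data path widths in perspective, the sketch below estimates peak payload bandwidth per direction. It assumes a new two-beat transfer can be issued on every clock cycle and uses the minimum cycle times listed for the three speed bins in Table 9; the function and constant names are illustrative only.

```python
# Minimum clock cycle time per speed bin in ns (tKHKH Min from Table 9).
MIN_PERIOD_NS = {"133 MHz": 7.5, "167 MHz": 6.0, "200 MHz": 5.0}

DATA_PINS = 16        # D[15:0] or Q[15:0]; the 2 parity pins are not counted as payload
BEATS_PER_CYCLE = 2   # DDR: one beat per K edge and one per K# edge


def payload_gbps(period_ns: float) -> float:
    """Peak payload bits per nanosecond (numerically equal to Gb/s), one direction."""
    return DATA_PINS * BEATS_PER_CYCLE / period_ns


for name, period in MIN_PERIOD_NS.items():
    print(f"{name}: {payload_gbps(period):.2f} Gb/s per direction")
```

Because the read and write paths are separate and unidirectional, the same peak figure applies to each direction independently under this assumption.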

# 5. LA-1 Signal Descriptions

The LA-1 interface transfers information between an NPU and memory or coprocessor. One LA-1 port includes clock, address and control pins plus 16 data + 2 parity pins for write operations and 16 data + 2 parity pins for reads.

Figure 2 shows the bus signals for the LA-1 Interface.





\* = Optional on Slave † = Optional on Host

Table 1 describes the LA-1 interface signals. Table 1 describes the functionality of each signal, not its timing. The timing information for each signal is described elsewhere in the document.

#### Table 1 LA-1 Interface Signals

| Signal | Host Std | Host I/O | Slave Std | Slave I/O | Description |
|--------|----------|----------|-----------|-----------|-------------|
| K, K# | Req | Out | Req | In | Clock inputs for LA-1 interface inputs. Rising-edge active. The rising edge of K is used to latch address and control inputs. The rising edges of K and K# are used to latch data. K# is ideally 180 degrees out of phase with K. |
| C, C# | Opt | Out | Req | In | Clock inputs for LA-1 interface outputs. Rising-edge active. They provide a controlled means of tuning device output data. The rising edge of C is used as the output timing reference for first output data. The rising edge of C# is used as the output reference for second output data. C# is ideally 180 degrees out of phase with C. |
| A [n:2] | Req | Out | Opt | In | Byte address pins. User has no access to A0 and A1. Slave devices may implement zero or more address inputs. For host devices, 23 ≤ n ≤ 29. |
| E [1:0] | Req | Out | Opt | In | Port enable. For host devices, functionally equivalent to address outputs. Slave devices shall implement both or none. Enable inputs are sampled at the same times as address inputs. |
| EP [1:0] | N/A | N/A | Opt | In | Port enable programming inputs. EP pins determine the sense of the corresponding E pins (e.g. EP1 tied high results in E1 being active high). Required if port enable pins are implemented. |
| R# | Req | Out | Req | In | Active-low read select input. When low, this input causes the address input to be registered and a read cycle to be initiated. |
| W# | Req | Out | Req | In | Active-low write select input. When low, this input causes the address input to be registered and a write cycle to be initiated. |
| BW# [0:1] | Req | Out | Opt | In | Active-low byte-write inputs. Used to enable or block the write of a specific byte in a write cycle initiated with W#. BW0# controls D [0:7] and DP 0, while BW1# controls D [8:15] and DP 1. |
| D [0:15] | Req | Out | Req | In | Synchronous data inputs. This bus operates in response to W# commands. |
| DP [0:1] | Req | Out | Req | In | Synchronous even parity inputs. Shall be stored and checked for the slave and retrieved and generated for the host. Correct when Exclusive-OR of D [7:0] = DP 0, and Exclusive-OR of D [15:8] = DP 1. Error handling of incorrect parity is beyond the scope of this document. |
| Q [0:15] | Req | In | Req | Out | Synchronous data outputs. Output data is synchronized to the respective C and C#. This bus operates in response to R# commands. |
| QP [0:1] | Req | In | Req | Out | Synchronous even parity outputs. Shall be stored and checked for the host and retrieved and generated for the slave. Correct when Exclusive-OR of Q [7:0] = QP 0, and Exclusive-OR of Q [15:8] = QP 1. The ability to send correct parity is mandatory. |
| CQ, CQ# | Req | In | Opt | Out | Echo clock and echo clock-bar outputs. Edge aligned with data out transitions. CQ tracks data changes caused by rising edges of C or K. CQ# tracks data changes caused by rising edges of C# or K#. |
| V<sub>REF</sub> | Req | In | Req | In | HSTL input reference voltage. It provides a reference voltage for the HSTL input buffer trip point. It is nominally VDD/2 but it can be adjusted to improve system noise margin. |

Notes:

Std (Standard) column entries: Req = Required, Opt = Optional, N/A = Not applicable
 I/O (Signal direction) column entries: In = Input, Out = Output
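The parity rule given for DP and QP in Table 1 (each parity bit equals the Exclusive-OR of the eight data bits in its byte) can be sketched as follows. The function names are illustrative and not defined by this agreement.

```python
def byte_parity(byte: int) -> int:
    """Exclusive-OR of the eight bits of `byte`; returns 0 or 1."""
    return bin(byte & 0xFF).count("1") & 1


def beat_parity(d: int) -> tuple:
    """Return (DP0, DP1) for one 16-bit beat: DP0 covers D[7:0], DP1 covers D[15:8]."""
    dp0 = byte_parity(d & 0xFF)
    dp1 = byte_parity((d >> 8) & 0xFF)
    return dp0, dp1


def parity_ok(d: int, dp0: int, dp1: int) -> bool:
    """Check received parity bits against the received 16-bit beat."""
    return beat_parity(d) == (dp0, dp1)


# Example: low byte 0x03 has two ones (parity 0), high byte 0x01 has one (parity 1).
print(beat_parity(0x0103))
```

The same functions apply to Q and QP on the read path, since both directions use the same rule.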

# 6. Port Operation Overview

The LA-1 interface ports follow a few simple rules:

- Control inputs are always captured on the rising edge of K clock.
- Address and data are captured on the rising edges of K and K# clocks.
- Read or Write data transfers in progress may not be interrupted and re-started.

## 6.1. Data Alignment and Organization

Table 2: Control and Data Pin Names Vs. 32 Bit Write Data Alignment

| Byte Enable       | BW 1#                                  | BW 0#                                  |
|-------------------|----------------------------------------|----------------------------------------|
| Data Input pins   | D [15:8]                               | D[7:0]                                 |
| Parity Input pins | DP 1                                   | DP 0                                   |
| K↑                | Byte 0<br>Bits [31:24]<br>Parity Bit 0 | Byte 1<br>Bits [23:16]<br>Parity Bit 1 |
| K#↑               | Byte 2<br>Bits [15:8]<br>Parity Bit 2  | Byte 3<br>Bits [7:0]<br>Parity Bit 3   |

Table 3: Data Output Pin Names Vs. 32 Bit Read Data Alignment

| Data Output pins   | Q [15:8]                               | Q [7:0]                                |
|--------------------|----------------------------------------|----------------------------------------|
| Parity Output pins | QP 1                                   | QP 0                                   |
| K↑                 | Byte 0<br>Bits [31:24]<br>Parity Bit 0 | Byte 1<br>Bits [23:16]<br>Parity Bit 1 |
| K#↑                | Byte 2<br>Bits [15:8]<br>Parity Bit 2  | Byte 3<br>Bits [7:0]<br>Parity Bit 3   |
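The alignment in Tables 2 and 3 amounts to sending the upper half of the 32-bit word on the first beat and the lower half on the second. A minimal sketch, with illustrative function names:

```python
def split_word(word: int) -> tuple:
    """Split a 32-bit word into (first beat, second beat) of 16 bits each.

    First beat carries bits [31:16] (Byte 0 on pins [15:8], Byte 1 on pins [7:0]);
    second beat carries bits [15:0] (Byte 2 on pins [15:8], Byte 3 on pins [7:0]).
    """
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def join_beats(first: int, second: int) -> int:
    """Reassemble the 32-bit word from the two 16-bit beats."""
    return ((first & 0xFFFF) << 16) | (second & 0xFFFF)


first, second = split_word(0x11223344)
print(hex(first), hex(second))
```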

## 6.2. Output Register Control (Slave Device Feature)

LA-1 ports offer two mechanisms for controlling the output data registers. Typically control is handled by the output register clock inputs C and C#. The output register clock inputs can be used to make small phase adjustments in the clocking of the output registers by allowing the user to delay driving data out as much as a few nanoseconds beyond the next rising edges of the K and K# clocks. If the C and C# clock inputs are tied high, the device reverts to K and K# control of the outputs, allowing the device to function as a conventional pipelined read device.

## 6.3. Echo Clocks (Optional on Slave Devices)

LA-1 ports may feature echo clocks, CQ and CQ#, that track the performance of the output drivers. The echo clocks are delayed copies of the output register clocks, C and C#. Echo clocks are designed to track changes in output driver delays due to variance in die temperature and supply voltage. The echo clocks are designed to transition with the rest of the data output drivers. Slave device echo clock output CQ tracks C (or K if C and C# are tied high) and echo clock output CQ# tracks C# (or K# if C and C# are tied high).

Echo clocks are always active, even if the read port, the write port, or both are deselected.

# 7. Port Specifications



Figure 3: Simplified Timing Diagram for LA-1 Port Operation

## 7.1. Write Operations

A write cycle is initiated by asserting W# low at the rising edge of K. The address for the write cycle is provided at the following rising edge of K#. In the same cycle, data is expected at the rising edges of K and K#. See Figure 3 for details.

In the case of an SRAM, a read can immediately follow a write even if they are to the same address. In the case of a coprocessor, there is a minimum latency between a write and a read to the same address that is dependent on the architecture of the coprocessor. Latency bounds on this operation are out of scope of this specification.

#### Table 4: LA-1 Port Write Truth Table

| W#<br>K↑ (t<sub>n</sub>) | E<br>K#↑ (t<sub>n + ½</sub>) | Input next state<br>(t<sub>n + ½</sub>) | D<br>(t<sub>n</sub>) | D<br>(t<sub>n + ½</sub>) |
|--------------------------|------------------------------|-----------------------------------------|----------------------|--------------------------|
| X                        | F                            | Deselect                                | X                    | X                        |
| 1                        | T                            | Deselect                                | X                    | X                        |
| 0                        | T                            | Write                                   | D0                   | D1                       |

Notes:

1. X = Don't care, 1 = High, 0 = Low. E = T (true) if E1 and E0 are evaluated true on the rising edge of K#.

2. W# is evaluated on the rising edge of K.

3. D0 and D1 are the first and second data input transfers in a write.
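The write truth table above can be expressed as a small decision function. This is a behavioral sketch only; the function name and return convention are assumptions made for illustration.

```python
def write_cycle(w_n: int, e_true: bool, d0: int, d1: int):
    """Model one write decision.

    w_n    : level of W# sampled on the rising edge of K (0 = asserted).
    e_true : True if E1 and E0 evaluate true on the following rising edge of K#.
    d0, d1 : data presented on the rising edges of K and K#.
    Returns (next state, captured beats or None).
    """
    if not e_true:
        return "Deselect", None      # W# is a don't-care when the port is not enabled
    if w_n == 1:
        return "Deselect", None      # enabled, but no write requested
    return "Write", (d0, d1)         # both beats are captured


print(write_cycle(0, True, 0x1122, 0x3344))
```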

## 7.2. Read Operations

A read cycle is initiated by asserting R# low at the rising edge of K while the read address is presented on A. Data is delivered after the next rising edge of K using C and C# as the output timing references. See Figure 3 for details.

#### Table 5: LA-1 Port Read Truth Table

| E<br>K↑ (t<sub>n</sub>) | R#<br>K↑ (t<sub>n</sub>) | Output next state<br>(t<sub>n</sub>) | Q<br>(t<sub>n + 1</sub>) | Q<br>(t<sub>n + 1½</sub>) |
|-------------------------|--------------------------|--------------------------------------|--------------------------|---------------------------|
| F                       | X                        | Deselect                             | Hi-Z                     | Hi-Z                      |
| T                       | 1                        | Deselect                             | Hi-Z                     | Hi-Z                      |
| T                       | 0                        | Read                                 | Q0                       | Q1                        |

Notes:

1. X = Don't care, 1 = High, 0 = Low. E = T (true) if E1 and E0 are evaluated true on the rising edge of K.

2. R# is evaluated on the rising edge of K.

3. Q0 and Q1 are the first and second data output transfers in a read.
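To illustrate the read latency implied by the truth table, the sketch below lists when output beats appear for a sequence of commands sampled on successive rising edges of K. Times are in clock cycles; the function name and label format are illustrative assumptions.

```python
def read_schedule(commands):
    """commands: list of (e_true, r_n) pairs, one per rising edge of K (cycle n = index).

    Returns a list of (time in cycles, label) for every output beat.
    A read accepted at t_n drives Q0 at t_(n+1) and Q1 at t_(n+1.5);
    otherwise the outputs stay Hi-Z and nothing is listed.
    """
    beats = []
    for n, (e_true, r_n) in enumerate(commands):
        if e_true and r_n == 0:
            beats.append((n + 1.0, f"Q0 of read {n}"))
            beats.append((n + 1.5, f"Q1 of read {n}"))
    return beats


# Read at cycle 0, idle at cycle 1, port disabled at cycle 2, read at cycle 3.
for time, label in read_schedule([(True, 0), (True, 1), (False, 0), (True, 0)]):
    print(time, label)
```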

## 7.3. Byte Write Control

Each write command and write address provides the base address for a two-beat data transfer, so 32 data bits plus four even byte-parity bits are written for each address loaded. If the device supports byte granularity, byte writes behave as shown in Table 6.

#### Table 6: LA-1 Port Byte Write Truth Table<sup>1</sup>

| Operation                                  | K   | K#  | BW1# | BW0# |
|--------------------------------------------|-----|-----|------|------|
| Write D [15:0], DP [1:0] at K rising edge  | L→H |     | 0    | 0    |
| Write D [15:0], DP [1:0] at K# rising edge |     | L→H | 0    | 0    |
| Write D [15:8], DP [1] at K rising edge    | L→H |     | 0    | 1    |
| Write D [15:8], DP [1] at K# rising edge   |     | L→H | 0    | 1    |
| Write D [7:0], DP [0] at K rising edge     | L→H |     | 1    | 0    |
| Write D [7:0], DP [0] at K# rising edge    |     | L→H | 1    | 0    |
| Write nothing at K rising edge             | L→H |     | 1    | 1    |
| Write nothing at K# rising edge            |     | L→H | 1    | 1    |

Note:

1. Assumes a write cycle was initiated via W# low. BW0# and BW1# are sampled at data in times and can be altered for any portion of the burst write operation provided that input setup and hold requirements are satisfied.

#### Table 7: Example Write Sequence Using Byte Write Enables

| Data In Sample Time | BW1# | BW0# | D [15:8],<br>DP [1] | D [7:0],<br>DP [0] |
|---------------------|------|------|---------------------|--------------------|
| K↑                  | 0    | 1    | Data in             | Don't care         |
| K# ↑                | 1    | 0    | Don't care          | Data in            |

#### Table 8: Resulting Write Operation

| Byte 0            | Byte 1           | Byte 2            | Byte 3           |
|-------------------|------------------|-------------------|------------------|
| D [15:8], DP [1]  | D [7:0], DP [0]  | D [15:8], DP [1]  | D [7:0], DP [0]  |
| Written           | Unchanged        | Unchanged         | Written          |
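The example in Tables 7 and 8 can be reproduced with a short behavioral model. Parity bits are omitted for brevity, and the function name and data values are illustrative assumptions.

```python
def apply_byte_write(stored, beats):
    """Apply a two-beat write with byte enables to a stored 32-bit location.

    stored : list of four bytes [Byte 0, Byte 1, Byte 2, Byte 3].
    beats  : two tuples (d, bw1_n, bw0_n) sampled on the rising edges of K and K#.
             bw1_n = 0 writes D[15:8]; bw0_n = 0 writes D[7:0] (both active low).
    """
    result = list(stored)
    for i, (d, bw1_n, bw0_n) in enumerate(beats):
        if bw1_n == 0:
            result[2 * i] = (d >> 8) & 0xFF      # Byte 0 (first beat) or Byte 2 (second)
        if bw0_n == 0:
            result[2 * i + 1] = d & 0xFF         # Byte 1 (first beat) or Byte 3 (second)
    return result


old = [0x11, 0x22, 0x33, 0x44]
# Table 7: first beat BW1#=0, BW0#=1; second beat BW1#=1, BW0#=0.
new = apply_byte_write(old, [(0xAA00, 0, 1), (0x00BB, 1, 0)])
print([hex(b) for b in new])
```

Bytes 0 and 3 take the new values while bytes 1 and 2 keep their previous contents, matching Table 8.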

## 7.4. AC Electrical Characteristics

#### Table 9: LA-1 Slave Port AC Electrical Characteristics (Note 8)

| Sym | Description | 133 MHz Min | 133 MHz Max | 167 MHz Min | 167 MHz Max | 200 MHz Min | 200 MHz Max | Units | Notes |
|-----|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------|-------|
| t<sub>KHKL</sub> | Clock high time | 3 | | 2.4 | | 2 | | ns | |
| t<sub>KLKH</sub> | Clock low time | 3 | | 2.4 | | 2 | | ns | |
| t<sub>KHK#H</sub> | K, C to K#, C# | 3.4 | | 2.7 | | 2.3 | | ns | |
| t<sub>K#HKH</sub> | K#, C# to K, C | 3.4 | | 2.7 | | 2.3 | | ns | |
| t<sub>KHCH</sub> | Clock to data clock | 0 | 2.5 | 0 | 2.0 | 0 | 1.6 | ns | |
| t<sub>KHKH</sub> | Clock cycle time | 7.5 | 8 | 6 | 7.5 | 5.0 | 6 | ns | |
| t<sub>AVKH</sub> | Address input valid to clock high | 0.8 | | 0.7 | | 0.6 | | ns | 2 |
| t<sub>DVKH</sub> | Data input valid to clock high | 0.8 | | 0.7 | | 0.6 | | ns | 2 |
| t<sub>IVKH</sub> | Control input valid to clock high | 0.8 | | 0.7 | | 0.6 | | ns | 2, 7 |
| t<sub>KHAX</sub> | Clock high to address hold | 0.8 | | 0.7 | | 0.6 | | ns | 2 |
| t<sub>KHDX</sub> | Clock high to data input hold | 0.8 | | 0.7 | | 0.6 | | ns | 2 |
| t<sub>KHIX</sub> | Clock high to control input hold | 0.8 | | 0.7 | | 0.6 | | ns | 2, 7 |
| t<sub>CHQX1</sub> | C high to output data low-z | 1.2 | | 1.2 | | 1.2 | | ns | 3, 4 |
| t<sub>CHQZ</sub> | C high to output data high-z | | 3.0 | | 2.5 | | 2.3 | ns | 3, 4 |
| **Timing parameters for C clock referenced (non-echo clock referenced) read operations** | | | | | | | | | |
| t<sub>CHQV</sub> | C, C# high to data output valid | | 3.0 | | 2.5 | | 2.3 | ns | 3, 5 |
| t<sub>CHQX</sub> | C, C# high to data output hold | 1.2 | | 1.2 | | 1.2 | | ns | 3, 5 |
| **Timing parameters for echo clock referenced read operations** | | | | | | | | | |
| t<sub>CHCQ#H</sub> | C high to CQ# high | 1.2 | | 1.2 | | 1.2 | | ns | 3, 6 |
| t<sub>C#HCQH</sub> | C# high to CQ high | 1.2 | | 1.2 | | 1.2 | | ns | 3, 6 |
| t<sub>CQHQV</sub> | Echo clock high to data output valid | | 0.4 | | 0.4 | | 0.38 | ns | 3, 6 |
| t<sub>CQHQX</sub> | Echo clock high to data output invalid | -0.4 | | -0.4 | | -0.38 | | ns | 3, 6 |
| t<sub>CQ#HCH</sub> | CQ# high to C high | 3 | | 2.3 | | 1.9 | | ns | 3, 6 |
| t<sub>CQHC#H</sub> | CQ high to C# high | 3 | | 2.3 | | 1.9 | | ns | 3, 6 |

Notes:

1. Test conditions as specified with the output loading as shown in the AC test load diagram unless otherwise noted.
2. Control input signals may not be operated with pulse widths less than t<sub>KHKL</sub> (Min).
3. If C, C# are tied high, K and K# become the references for C and C# timing parameters.
4. Transition is measured ±100 mV from steady state voltage.
5. These parameters take precedence for slave ports implemented without echo clocks.
6. These parameters take precedence for slave ports implemented with echo clocks.
7. Applies to R#, W#, BW# and E (if E is implemented).
8. Compliant products must support one or more of these speed bin specifications.
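As an illustration of how the speed bins in Table 9 might be used, the sketch below finds which bins admit a given clock period and computes the minimum input valid window (setup plus hold) for each bin. The dictionary layout and function names are assumptions made for this example; the numbers are copied from Table 9.

```python
# Per speed bin: (tKHKH min, tKHKH max, input setup min, input hold min), all in ns.
SPEED_BINS = {
    "133 MHz": (7.5, 8.0, 0.8, 0.8),
    "167 MHz": (6.0, 7.5, 0.7, 0.7),
    "200 MHz": (5.0, 6.0, 0.6, 0.6),
}


def bins_for_period(period_ns: float) -> list:
    """Names of the speed bins whose cycle-time range includes `period_ns`."""
    return [name for name, (t_min, t_max, _, _) in SPEED_BINS.items()
            if t_min <= period_ns <= t_max]


def input_valid_window(bin_name: str) -> float:
    """Minimum time an input must be stable around a clock edge (setup + hold)."""
    _, _, setup, hold = SPEED_BINS[bin_name]
    return setup + hold


print(bins_for_period(5.5))             # only the 200 MHz bin covers a 5.5 ns period
print(input_valid_window("200 MHz"))    # setup 0.6 ns + hold 0.6 ns
```

Note that adjacent bins share their boundary cycle times, so a period of exactly 6 ns or 7.5 ns falls within two bins.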



#### Figure 4: LA-1 Port Read and Write Timing Diagram

## 7.5. AC Test Conditions

#### Figure 5: Output Load Equivalent

| Parameter                     | Value                  |
|-------------------------------|------------------------|
| Input Pulse Level             | 0.25 to 1.25 V         |
| Input Rise/Fall Times         | 0.3 ns                 |
| Input Timing Reference Level  | 0.75 V                 |
| Output Timing Reference Level | V<sub>DDQ</sub>/2      |



## 7.6. DC Electrical Characteristics

#### Table 10: DC Electrical Characteristics

| Symbol           | Description                  | Min.                   | Max.                   | Units | Notes |
|------------------|------------------------------|------------------------|------------------------|-------|-------|
| V<sub>IH</sub>   | Input high (logic 1) voltage | V<sub>REF</sub> + 0.1  | V<sub>DDQ</sub> + 0.3  | V     |       |
| V<sub>IL</sub>   | Input low (logic 0) voltage  | -0.3                   | V<sub>REF</sub> - 0.1  | V     |       |
| V<sub>OH</sub>   | Output high voltage          | V<sub>DDQ</sub> - 0.2  | V<sub>DDQ</sub>        | V     | 1     |
| V<sub>OL</sub>   | Output low voltage           | V<sub>SS</sub>         | 0.2                    | V     | 2     |
| V<sub>DDQ</sub>  | Output buffer supply voltage | 1.4                    | 1.9                    | V     |       |
| V<sub>REF</sub>  | Input reference voltage      | 0.68                   | 0.95                   | V     |       |
# 8. Logical Interface

The LA-1 interface uses an SRAM-style memory-mapped structure. Address pins, as needed, are used to address logical registers on the device. The NPU uses register-style read and write operations to initiate coprocessor actions, retrieve results, and optionally provide in-band management. A memory-mapped logical layer provides a flexible and lightweight interface for coprocessor applications. Because the logical layer does not excessively limit the designer, coprocessor architectures may provide differentiation and innovation.

Further definition of the LA-1 logical interface beyond the above memory architecture is out of scope for this implementation agreement.

# Appendix: Informative Annexes

# Annex 1: Examples of Logical Layer Interfaces

The LA-1 logical interface is fundamentally an SRAM memory mapped interface structure. The LA-1 interface uses an address bus to provide the NPU with the ability to control coprocessor functions. The LA-1 interface definition provides a minimal logical layer definition. This choice is intentional so that coprocessor architectures are not unduly restricted from providing innovative solutions.

### A1.1 Search Structure for CAMs/Classifiers

The LA-1 interface associates search key and results as a logical register set called a context. Note that addressing assignment and utilization is specific to the coprocessor implementation.

A search may be initiated when the NPU writes the key into a logical search register. Alternatively, a search may be initiated by writing to a special control address or location. Valid search results are returned to the register(s) associated with the search context. One or more result valid bits allow the NPU to verify successful completion of the search.
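The write-key, read-result sequence described above can be sketched in software. The following is a minimal illustrative model, not part of the LA-1 definition: the register offsets, the position of the result valid bit, and the class and function names are all hypothetical, and the search is assumed to complete immediately.

```python
class SearchCoprocessor:
    """Toy model of a memory-mapped coprocessor with one search context."""

    # Hypothetical register map; real devices define their own addressing.
    KEY_REG = 0x0
    RESULT_REG = 0x1
    RESULT_VALID = 1 << 31  # hypothetical position of the result valid bit

    def __init__(self, table):
        self._table = dict(table)  # search key -> result value
        self._regs = {self.KEY_REG: 0, self.RESULT_REG: 0}

    def write(self, addr, value):
        """Register-style write; writing the key register starts a search."""
        self._regs[addr] = value
        if addr == self.KEY_REG:
            hit = self._table.get(value)
            if hit is None:
                self._regs[self.RESULT_REG] = 0  # valid bit stays clear
            else:
                self._regs[self.RESULT_REG] = self.RESULT_VALID | hit

    def read(self, addr):
        """Register-style read."""
        return self._regs[addr]


def search(dev, key):
    """Write the key, read the context result, and check the valid bit."""
    dev.write(dev.KEY_REG, key)
    result = dev.read(dev.RESULT_REG)
    if result & dev.RESULT_VALID:
        return result & ~dev.RESULT_VALID
    return None


dev = SearchCoprocessor({0xABCD: 7})
print(search(dev, 0xABCD))
print(search(dev, 0x1234))
```

A design that starts the search from a separate control address would differ only in which write triggers the lookup.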

The NPU associates threads with coprocessor context registers. Valid thread to context mappings can include one-to-one, one-to-many, and many-to-one. Coprocessor vendors may choose any mapping structure provided the interface remains memory mapped. The bullets below provide a few examples of the many valid and reasonable design choices.

- A coprocessor could map a single thread to provide a range of search or configuration options.
- A coprocessor could map four threads to a single address space for a FIFO oriented design. Search results could be tagged with control information.

### A1.2 Control Structure

LA-1 control operations should be accessible in band via the LA-1 interface. In band control must be provided via memory mapped registers. Specific designs may support control registers per context, globally, or both as a means to provide control functions. Control registers may also be accessed via other out of band interfaces.

# Annex 2: LA-1 Port Depth Expansion

LA-1 Separate I/O Ports may implement chip enable inputs, E0 and E1. The sense of the inputs, whether they function as active low or active high inputs, may be determined by the state of the programming inputs, EP0 and EP1. For example, if EP1 is held at $V_{DD}$, E1 functions as an active high chip enable input. If EP1 is held at $V_{SS}$, E1 functions as an active low chip enable input.

Programmability of the two enable inputs (E0 and E1) allows four banks of depth expansion to be accomplished with no additional logic. By programming the enable inputs of four LA-1 ports in binary sequence (00, 01, 10, 11) and driving the enable inputs with two address outputs, four LA-1 ports can be made to look like one port with a larger address space to the system.



#### Figure 6: Depth Expansion with Programmable Port Enable Inputs

Note: For simplicity BW#, K#, C# and CQ# are not shown.

#### Example Bank Enable Truth Table

|        | EP1 | EP0 | E1          | E0          |
|--------|-----|-----|-------------|-------------|
| Bank 0 | VSS | VSS | Active low  | Active low  |
| Bank 1 | VSS | VDD | Active low  | Active high |
| Bank 2 | VDD | VSS | Active high | Active low  |
| Bank 3 | VDD | VDD | Active high | Active high |
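The bank decode implied by the programmable enables can be checked with a short sketch. It assumes the rule stated in the text (a programming input held at VDD makes the matching enable active high, VSS makes it active low) and models logic levels as 0 and 1. The function names are illustrative only.

```python
def enable_asserted(program_level, input_level):
    """EP at VDD (1) gives an active-high enable; EP at VSS (0) gives active-low."""
    if program_level == 1:
        return input_level == 1
    return input_level == 0


def bank_selected(ep1, ep0, e1, e0):
    """A port responds only when both of its enable inputs are asserted."""
    return enable_asserted(ep1, e1) and enable_asserted(ep0, e0)


# Four ports programmed in binary sequence: bank number -> (EP1, EP0).
BANKS = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}


def selected_banks(e1, e0):
    """Return the banks that respond when two address outputs drive E1 and E0."""
    return [bank for bank, (ep1, ep0) in BANKS.items()
            if bank_selected(ep1, ep0, e1, e0)]


for addr in range(4):
    e1, e0 = (addr >> 1) & 1, addr & 1
    print(f"E1={e1} E0={e0} -> banks {selected_banks(e1, e0)}")
```

Each combination of the two address outputs selects exactly one bank, which is why no external decode logic is needed.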





Note: For simplicity BW#, K#, C# and CQ# are not shown.



#### Figure 8: Depth Expansion without Enable Inputs Using Only Last Echo Clock

Note: For simplicity BW#, K#, C# and CQ# are not shown.

# Annex 3: Relationships Between C Clock Referenced Timing and Echo Clock Referenced Timing

### A3.1 Overview

The LA-1 specification provides for three different possible implementations for the data output control on slave devices. First, a slave device may or may not provide echo clock outputs. If it does not implement echo clock outputs, it is presumed that output data is controlled by and referenced to the C and C# clocks (or if the C clocks are both tied high, by the K clocks). The timing requirements for this implementation are found in the LA-1 specification labelled "C clock referenced read operations".

On the other hand, if the slave implements echo clocks, data valid times are referenced to echo clock edges and the specifications are described in the "echo clock referenced read timing" section of the specification. This case represents two of the three output timing control options. Slave device vendors who implement echo clocks may or may not elect to use a DLL to lock the echo clocks to the C clocks. Whether or not they do bears on where the echo clocks transition with respect to the C clocks. The LA-1 specification is written to accommodate either approach.

### A3.2 C Clock Referenced Timing

The C (and C#) clock referenced timing set is shown in *italic* in Figure 9. The specification ties the data output valid event at the beginning of a two-beat data transfer to the rising edge of the C clock. Data output hold time is referenced to the next rising edge of the C# clock and the data valid of the second beat is referenced to the same C# edge. Finally, the data output hold time of the second beat of the transfer is referenced to the next rising edge of the C clock.

### A3.3 Echo Clock (CQ) Referenced Timing

The CQ (and CQ#) clock referenced timing set is shown in **bold** in Figure 9. The LA-1 interface specification is written with enough latitude to allow output control implementations that utilize a DLL and implementations that do not utilize a DLL. The LA-1 specification requires that a slave device produce CQ# some minimum delay after the rising edge of a C clock and some minimum time before the next rising edge of the C clock. The inverse is also true, meaning the CQ rising edge must be produced between a C# rising edge and the next C# rising edge. In a DLL based implementation, one would expect the CQ# clock to be locked to the rising edge of C#. In a non-DLL implementation, one would expect the CQ# clock to be produced in response to the rising edge of the C clock, just as the data outputs, Q, are produced in response to a rising edge of C. Inversely, the second beat of data out and its associated echo clock, CQ, are produced by the rising edge of C#. It should be noted that CQ referenced specs add a minimum operating frequency limit to accommodate the needs of a DLL.

The tCHCQ#H and tCQ#HCH specifications, taken together, bracket the allowable window within which the CQ# clock must rise. Inversely, the tC#HCQH and tCQHC#H specifications, taken together, bracket the allowable window within which the CQ clock must rise. A host device intended to use echo clock referenced timing must accommodate echo clock variation over this range.
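The bracketing described above can be expressed as a simple window check. This is a sketch under stated assumptions: times are measured in nanoseconds from a rising edge of C, the next rising edge of C occurs one cycle later, and the numeric values in the usage lines are made up for illustration rather than taken from the specification. The function and parameter names are hypothetical.

```python
def cq_bar_rise_window(t_cycle_ns, t_chcq_bar_h_min_ns, t_cq_bar_hch_min_ns):
    """Return (earliest, latest) allowed CQ# rise time after a C rising edge.

    earliest: the tCHCQ#H minimum after the C rising edge.
    latest:   the tCQ#HCH minimum before the next C rising edge.
    """
    earliest = t_chcq_bar_h_min_ns
    latest = t_cycle_ns - t_cq_bar_hch_min_ns
    return earliest, latest


def cq_bar_rise_ok(t_rise_ns, t_cycle_ns, t_chcq_bar_h_min_ns, t_cq_bar_hch_min_ns):
    """True if a CQ# rising edge at t_rise_ns falls inside the allowed window."""
    earliest, latest = cq_bar_rise_window(
        t_cycle_ns, t_chcq_bar_h_min_ns, t_cq_bar_hch_min_ns
    )
    return earliest <= t_rise_ns <= latest


# Illustrative numbers only: 10 ns cycle, 2 ns and 3 ns minimums.
print(cq_bar_rise_window(10.0, 2.0, 3.0))
print(cq_bar_rise_ok(4.5, 10.0, 2.0, 3.0))
print(cq_bar_rise_ok(8.0, 10.0, 2.0, 3.0))
```

The same check applies to the CQ rising edge by substituting the tC#HCQH and tCQHC#H minimums and measuring from a rising edge of C#.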



#### Figure 9: Output Timing Control Details

### A3.4 Explanation of Three SRAM Modes

In defining the AC timings of the LA-1 Interface, the specification accommodates three types of SRAM models:

- A) An SRAM without echo clocks (for example QDR<sup>™</sup>-I),
- B) An SRAM with DLL based echo clocks (for example QDR<sup>™</sup>-II), and
- C) An SRAM with non-DLL based echo clocks (for example SigmaRAM<sup>™</sup>).

When a DLL is used to lock the echo clocks to the input clocks C, C# or K, K#, the output data $Q_B$ (for memory model type B) is tightly coupled to the input clocks. This results in an extra $\frac{1}{2}$ clock cycle of input clock to data output latency for memory model B compared with memory models A and C. Figure 10 illustrates this difference.

The read parameters for memory model A are covered by section "Timing parameters for C clock referenced (non-echo clock referenced) read operations" in Table 10 on page 13.





QDR<sup>™</sup> – Quad Data Rate (Trademark of Cypress, IDT, Micron, NEC and Samsung)

SigmaRAM<sup>™</sup> – Trademark of GSI Technologies, Sony, Toshiba, Mitsubishi, ISSI, Alliance Semiconductor

The read parameters for memory models B and C are covered by section "Timing parameters for echo clock referenced read operations" in Table 10 on page 13. As described in Annex A3.3, the new parameters $t_{CHCQ\#H}$ and $t_{CQ\#HCH}$ allowed the read timings for memory models B and C to be combined.

Memory model B defines the minimum values for the parameters $t_{CQ\#HCH}$ and $t_{CQHC\#H}$, whereas memory model C sets the minimum values for the parameters $t_{CHCQ\#H}$ and $t_{C\#HCQH}$. Figure 11 illustrates how the values for these new parameters were derived.



#### Figure 11: Example of Echo Clock Timings for 167 MHz Speed Bin

Referring to Table 10 on page 13 and Figure 11, one can see that the $t_{CHCQ\#H}$ value set by memory model C is 1.2 ns minimum. For memory model B, the value works out to 2.3 ns. Because the specification defines only a minimum for this parameter and no maximum bound, memory model B also meets it. The same logic applies to parameter $t_{C\#HCQH}$.

Similarly, the $t_{CQ\#HCH}$ value set by memory model B is a minimum of 2.3 ns. For memory model C, the value works out to 3.5 ns. Because the specification defines only a minimum for this parameter, memory model C also meets it. The same logic applies to parameter $t_{CQHC\#H}$.
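The argument that both memory models satisfy the combined minimums can be checked numerically. The sketch below uses only the 167 MHz values quoted in the text; the dictionary layout and function name are illustrative assumptions.

```python
# Combined specification minimums (ns) for the 167 MHz speed bin, as quoted
# in the text: tCHCQ#H is set by model C, tCQ#HCH is set by model B.
SPEC_MIN_NS = {"tCHCQ#H": 1.2, "tCQ#HCH": 2.3}

# Values each memory model works out to (ns), as quoted in the text.
MODEL_NS = {
    "B": {"tCHCQ#H": 2.3, "tCQ#HCH": 2.3},
    "C": {"tCHCQ#H": 1.2, "tCQ#HCH": 3.5},
}


def meets_spec(model):
    """True if every parameter of the model is at or above the spec minimum."""
    values = MODEL_NS[model]
    return all(values[name] >= minimum for name, minimum in SPEC_MIN_NS.items())


for model in ("B", "C"):
    print(model, meets_spec(model))
```

Because only minimums are specified, each model sets one bound exactly and clears the other with margin.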

In practice, designers are unlikely to implement a device whose echo clocks vary across the full window allowed relative to the input clocks. Instead, they may choose to implement either memory model B or memory model C.

It should also be noted that in memory model C, the echo clocks have an inverse relationship with the input clocks, i.e. $CQ_C$ is derived from input clock C# and $CQ\#_C$ is derived from input clock C.