8.28. tx_scheduler_rr

8.28.1. Operation

The default transmit scheduler used in the Corundum NIC is a simple round-robin scheduler implemented in the tx_scheduler_rr module. The scheduler sends commands to the transmit engine to initiate transmit operations out of the NIC transmit queues. The round-robin scheduler contains basic queue state for all queues, a FIFO to store currently-active queues and enforce the round-robin schedule, and an operation table to track in-process transmit operations.

Similar to the queue management logic, the round-robin transmit scheduler also stores queue state information in BRAM or URAM on the FPGA so that it can scale to support a large number of queues. The transmit scheduler also uses a processing pipeline to hide the memory access latency.

The transmit scheduler module has four main interfaces: an AXI lite register interface and three streaming interfaces. The AXI lite interface permits the driver to change scheduler parameters and enable/disable queues. The first streaming interface provides doorbell events from the queue management logic when the driver enqueues packets for transmission. The second streaming interface carries transmit commands generated by the scheduler to the transmit engine. Each command consists of a queue index to transmit from, along with a tag for tracking in-process operations; the port list below also shows a destination field (port and TC). The final streaming interface returns transmit operation status information to the scheduler. The status information tells the scheduler the length of the transmitted packet, or whether the transmit operation failed due to an empty or disabled queue. In addition, the s_axis_sched_ctrl input allows user logic, such as the scheduler control module, to enable and disable queues dynamically.

The transmit scheduler module can be extended or replaced to implement arbitrary scheduling algorithms. This enables Corundum to be used as a platform to evaluate experimental scheduling algorithms. It is also possible to provide additional inputs to the transmit scheduler module, including feedback from the receive path, which can be used to implement new protocols and congestion control techniques. Connecting the scheduler to the PTP hardware clock makes it possible to support TDMA, which can in turn be used to implement circuit-switched architectures.

The structure of the transmit scheduler logic is similar to the queue management logic in that it stores queue state in BRAM or URAM and uses a processing pipeline. However, there are a number of significant differences. First, the scheduler logic is designed so that the scheduler does not stall when a queue is empty and a dequeue operation on it fails. Second, the scheduler contains a FIFO to enforce the round-robin schedule. The use of this FIFO requires an explicit reset routine to make the internal state (namely the scheduled flag bits) consistent after a reset. Third, the scheduler also contains logic to track the active state of each queue based on incoming doorbell requests and dequeue failures.
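The need for the explicit reset routine can be illustrated with a small Python model. It assumes, as is typical for FPGA block RAM, that the per-queue state RAM is not cleared by the reset signal, while the FIFO (whose pointers are registers) is emptied. The class and method names are hypothetical and do not correspond to signals in the RTL.

```python
from collections import deque


class SchedState:
    """Toy model of the scheduled flags and the scheduler FIFO (hypothetical names)."""

    def __init__(self, num_queues: int):
        self.scheduled = [False] * num_queues  # stands in for bits held in queue state RAM
        self.fifo = deque()                    # scheduled queue FIFO

    def schedule(self, q: int) -> None:
        # Insert a queue only if it is not already marked as scheduled,
        # so that each queue appears in the FIFO at most once.
        if not self.scheduled[q]:
            self.scheduled[q] = True
            self.fifo.append(q)

    def reset(self) -> None:
        # Assumption: reset empties the FIFO but leaves the RAM contents untouched.
        self.fifo.clear()

    def initialize(self) -> None:
        # Initialize operation: walk every queue index and clear its scheduled bit.
        for q in range(len(self.scheduled)):
            self.scheduled[q] = False


s = SchedState(4)
s.schedule(2)
s.reset()
s.schedule(2)             # ignored: the stale scheduled bit leaves queue 2 stuck
stuck = list(s.fifo)      # []
s.initialize()
s.schedule(2)
recovered = list(s.fifo)  # [2]
```

Without the initialize pass, a queue whose scheduled bit survived the reset can never be inserted into the FIFO again, because the scheduled flag is what prevents duplicate insertion.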


Fig. 8.4 Block diagram of the transmit scheduler module, showing queue state RAM and operation table. Ind = index, En = queue enable, GE = global enable, SE = schedule enable, Act = active, Sch = scheduled, QI = queue index, DB = doorbell, H = head, N = next, P = previous

A block diagram of the transmit scheduler module is shown in Fig. 8.4. The transmit scheduler is built around a scheduled queue FIFO, which stores the indices of the currently-scheduled queues. An active queue is one that is presumed to have at least one packet available for transmission, an enabled queue is one that has been enabled for transmission, and a scheduled queue is one that has an entry in the scheduler FIFO. A queue will be scheduled (marked as scheduled and inserted into the FIFO) if it is both active and enabled. A queue will be descheduled when it reaches the front of the scheduler FIFO but is either not enabled or not active. Queue enable states are controlled via three enable bits per queue: queue enable, global enable, and schedule enable. The queue enable and global enable bits are writable via AXI lite, while the schedule enable bit is controlled by the scheduler control module via an internal interface. A queue is enabled when its queue enable bit is set and at least one of its global enable and schedule enable bits is also set. Queues become active when doorbell events are received, and become inactive when a transmit request fails due to an empty queue.
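The enable and scheduling conditions above reduce to a few boolean expressions. The sketch below writes them as Python functions; the function names are hypothetical, and the "not already scheduled" check in should_insert is our reading of how the scheduled flag prevents a queue from being inserted into the FIFO twice.

```python
def queue_enabled(queue_enable: bool, global_enable: bool, sched_enable: bool) -> bool:
    # Queue enable is always required, plus either the global or the schedule enable.
    return queue_enable and (global_enable or sched_enable)


def should_insert(active: bool, enabled: bool, scheduled: bool) -> bool:
    # Schedule a queue (mark it and push it into the FIFO) when it is active and
    # enabled and not already present in the FIFO.
    return active and enabled and not scheduled


def stays_scheduled(active: bool, enabled: bool) -> bool:
    # Evaluated when the queue index reaches the front of the FIFO:
    # True -> re-queue it and issue a transmit request, False -> deschedule it.
    return active and enabled


print(queue_enabled(True, False, True))   # True: schedule enable substitutes for global enable
print(queue_enabled(False, True, True))   # False: queue enable is always required
```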

Tracking the queue active states must be done carefully for several reasons. First, the driver can update the producer pointer after enqueuing more than one packet, so the number of generated doorbell events does not necessarily correspond to the number of packets that were enqueued. Second, because the queues are shared among all ports on the same interface, multiple ports can attempt to send packets from the same queue, and the port transmit schedulers have no visibility into what the other schedulers are doing. Therefore, the most reliable method for determining that a queue is empty is to try sending from it and flag the failure. Note that the two possible errors have very different costs: wrongly treating an empty queue as active is cheap, while wrongly treating an active queue as empty is expensive. Attempting to send from an empty queue costs a few clock cycles and temporarily occupies a few slots in the corresponding operation tables. However, assuming a queue is empty when it is not will result in packets getting stuck in the queue, and the stuck queue will not recover until the OS sends another packet on that queue and triggers another doorbell. Therefore, it is imperative to properly track doorbell events during transmit operations, as it is possible for a doorbell event to arrive after a dequeue attempt has failed, but before the failed transmit status arrives at the transmit scheduler module.
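The race described above can be modeled in a few lines of Python. This is a hypothetical behavioral sketch, not the RTL: for simplicity, a doorbell flags every in-flight operation on the queue, whereas the operation table described later in this section tracks the flag per table entry. A failed completion only marks the queue inactive if no doorbell arrived while that operation was in flight.

```python
from dataclasses import dataclass


@dataclass
class InFlightOp:
    queue: int
    doorbell: bool = False  # set if a doorbell arrives while the op is in flight


class ActiveTracker:
    """Hypothetical model of active-state tracking across a dequeue race."""

    def __init__(self):
        self.active = {}     # queue index -> presumed to hold at least one packet
        self.in_flight = {}  # tag -> InFlightOp

    def issue(self, tag: int, queue: int) -> None:
        self.in_flight[tag] = InFlightOp(queue)

    def doorbell(self, queue: int) -> None:
        self.active[queue] = True
        # Simplification: flag every in-flight op on this queue.
        for op in self.in_flight.values():
            if op.queue == queue:
                op.doorbell = True

    def complete(self, tag: int, success: bool) -> bool:
        op = self.in_flight.pop(tag)
        if not success and not op.doorbell:
            # Dequeue failed and nothing was enqueued since it was issued: queue is empty.
            self.active[op.queue] = False
        return self.active.get(op.queue, False)


t = ActiveTracker()
t.doorbell(3)            # queue 3 becomes active
t.issue(tag=0, queue=3)  # dequeue attempt; suppose another port already drained the queue
t.doorbell(3)            # new packet enqueued before the failure status returns
still_active = t.complete(tag=0, success=False)  # True: the new packet is not stranded
t.issue(tag=1, queue=3)
final = t.complete(tag=1, success=False)         # False: no doorbell in between
```

Dropping the doorbell check in complete would mark queue 3 inactive after the first failure and leave the newly enqueued packet stuck until the next doorbell.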

The pipeline in the transmit scheduler supports seven different operations: initialize, register read, register write, handle doorbell, transmit complete, scheduler control, and transmit request. The initialize operation is used to ensure the scheduler state is consistent after a reset. Register access operations over an AXI lite interface enable the driver to read all of the per-queue state and set the queue enable and global enable bits. The pipeline also handles incoming doorbell requests from the transmit queue manager module as well as queue enable/disable requests from the scheduler control module. Finally, the transmit request and transmit complete operations are used to generate transmit requests and handle the necessary queue state updates when the transmit operations complete.

Queues can become scheduled based on a register write that enables an active queue, a doorbell that activates an enabled queue, a scheduler operation that enables an active queue, and a transmit completion on an enabled queue that is either successful or has the doorbell bit set in the operation table. Queues can only be descheduled when the queue index advances to the front of the scheduler FIFO. If this occurs when the queue is both active and enabled, then the queue can be rescheduled and a transmit request generated. When the transmit operation completes, the transmit status response will be temporarily stored in a small FIFO and then processed by the pipeline to update the corresponding operation table entry and, if necessary, reschedule the queue.
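The scheduling and descheduling rules above can be exercised with a small Python behavioral model. It is a sketch under simplifying assumptions: the three enable bits are collapsed into a single enable flag, completions are delivered by explicit calls, and the operation table (including its doorbell flag) is omitted. The class and method names are hypothetical.

```python
from collections import deque


class RoundRobinModel:
    """Hypothetical behavioral model of the round-robin schedule (no op table)."""

    def __init__(self, num_queues: int):
        self.active = [False] * num_queues
        self.enabled = [False] * num_queues
        self.scheduled = [False] * num_queues
        self.fifo = deque()

    def _try_schedule(self, q: int) -> None:
        if self.active[q] and self.enabled[q] and not self.scheduled[q]:
            self.scheduled[q] = True
            self.fifo.append(q)

    def set_enable(self, q: int, enabled: bool) -> None:
        # Stands in for a register write or a scheduler control operation.
        self.enabled[q] = enabled
        self._try_schedule(q)

    def doorbell(self, q: int) -> None:
        self.active[q] = True
        self._try_schedule(q)

    def transmit_request(self):
        # Pop FIFO entries until one is active and enabled; deschedule the others.
        while self.fifo:
            q = self.fifo.popleft()
            if self.active[q] and self.enabled[q]:
                self.fifo.append(q)  # reschedule at the back: round-robin order
                return q
            self.scheduled[q] = False  # deschedule
        return None

    def transmit_complete(self, q: int, success: bool) -> None:
        if success:
            self._try_schedule(q)  # no-op while the queue is still in the FIFO
        else:
            self.active[q] = False  # descheduled once it reaches the FIFO front


rr = RoundRobinModel(4)
for q in (0, 1):
    rr.set_enable(q, True)
    rr.doorbell(q)
order = [rr.transmit_request() for _ in range(3)]   # [0, 1, 0]
rr.transmit_complete(1, success=False)              # queue 1 found empty
order += [rr.transmit_request() for _ in range(2)]  # [0, 0]
```

Note that a queue is only removed from the schedule when it reaches the front of the FIFO, so a failed completion simply clears the active bit and leaves the removal to the next pass.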

The operation table tracks in-process transmit operations. Entries in the table consist of an active flag, the queue index, a doorbell flag, a head flag, a next pointer, and a previous pointer. The next and previous pointers form a linked list, enabling entries to be removed in any order while preserving the doorbell flag in the table. This prevents doorbells from getting ‘lost’ and the queue being mistakenly marked as inactive. A separate linked list is formed for each queue with active transmit operations. The operation table is organized so that it fits in distributed RAM.
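A minimal Python sketch of such a table follows. Several details are assumptions rather than facts taken from the RTL: the entry index doubles as the request tag, a per-queue tail pointer is kept alongside the table, a doorbell flags the newest outstanding entry for its queue, and removing an entry ORs its doorbell flag into the previous (older) entry. The head flag is represented implicitly as prev being None.

```python
class OpTable:
    """Hypothetical model of the operation table with per-queue linked lists."""

    def __init__(self, size: int):
        self.active = [False] * size
        self.queue = [0] * size
        self.doorbell = [False] * size
        self.next = [None] * size
        self.prev = [None] * size
        self.tail = {}  # queue index -> newest outstanding entry (assumption)

    def alloc(self, q: int) -> int:
        idx = self.active.index(False)  # raises ValueError when the table is full
        self.active[idx] = True
        self.queue[idx] = q
        self.doorbell[idx] = False
        self.next[idx] = None
        self.prev[idx] = self.tail.get(q)
        if self.prev[idx] is not None:
            self.next[self.prev[idx]] = idx
        self.tail[q] = idx
        return idx  # entry index doubles as the request tag (assumption)

    def ring(self, q: int) -> None:
        # Doorbell while ops are outstanding: flag the newest op on that queue.
        if q in self.tail:
            self.doorbell[self.tail[q]] = True

    def release(self, idx: int) -> bool:
        # Unlink idx and return its doorbell flag. The flag is also handed to the
        # previous (older) entry, so a later failure on that entry is not misread
        # as "queue empty".
        q, db = self.queue[idx], self.doorbell[idx]
        p, n = self.prev[idx], self.next[idx]
        if p is not None:
            self.next[p] = n
            self.doorbell[p] = self.doorbell[p] or db
        if n is not None:
            self.prev[n] = p
        elif p is not None:
            self.tail[q] = p   # the previous entry becomes the newest
        else:
            del self.tail[q]   # no outstanding ops remain for this queue
        self.active[idx] = False
        return db


ot = OpTable(16)
a = ot.alloc(5)       # older op on queue 5 (tag 0)
b = ot.alloc(5)       # newer op on queue 5 (tag 1)
ot.ring(5)            # doorbell lands on b
b_db = ot.release(b)  # True; the flag moves to a
a_db = ot.release(a)  # True: a's failure must not deactivate queue 5
```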

8.28.2. Parameters

AXIL_DATA_WIDTH

Width of AXI lite data bus in bits, default 32.

AXIL_ADDR_WIDTH

Width of AXI lite address bus in bits, default 16.

AXIL_STRB_WIDTH

Width of AXI lite wstrb (width of data bus in bytes), must be set to AXIL_DATA_WIDTH/8.

LEN_WIDTH

Length field width, default 16.

REQ_TAG_WIDTH

Transmit request tag field width, default 8.

OP_TABLE_SIZE

Number of outstanding operations, default 16.

QUEUE_INDEX_WIDTH

Queue index width, default 6.

PIPELINE

Pipeline setting, default 3.
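The values that depend on these parameters can be checked with simple arithmetic. The sketch below assumes that the request tag indexes the operation table, so REQ_TAG_WIDTH must be at least ceil(log2(OP_TABLE_SIZE)). That constraint and the helper name derived_params are illustrative assumptions, not checks taken from the module source.

```python
def derived_params(axil_data_width=32, queue_index_width=6,
                   op_table_size=16, req_tag_width=8):
    """Derive dependent values from the module parameters (illustrative sketch)."""
    assert axil_data_width % 8 == 0, "AXI lite data width must be a whole number of bytes"
    axil_strb_width = axil_data_width // 8               # one strobe bit per data byte
    num_queues = 2 ** queue_index_width                  # addressable queue indices
    tag_bits_needed = (op_table_size - 1).bit_length()   # ceil(log2(size)) for size >= 1
    return {
        "AXIL_STRB_WIDTH": axil_strb_width,
        "num_queues": num_queues,
        "tag_bits_needed": tag_bits_needed,
        "tag_width_ok": req_tag_width >= tag_bits_needed,
    }


print(derived_params())
# {'AXIL_STRB_WIDTH': 4, 'num_queues': 64, 'tag_bits_needed': 4, 'tag_width_ok': True}
```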

8.28.3. Ports

clk

Logic clock. Most interfaces are synchronous to this clock.

======  ===  =====  ===========
Signal  Dir  Width  Description
======  ===  =====  ===========
clk     in   1      Logic clock
======  ===  =====  ===========

rst

Logic reset, active high

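======  ===  =====  ========================
Signal  Dir  Width  Description
======  ===  =====  ========================
rst     in   1      Logic reset, active high
======  ===  =====  ========================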

m_axis_tx_req

Transmit request output, for transmit requests to the transmit engine.

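===================  ===  ==================  =======================
Signal               Dir  Width               Description
===================  ===  ==================  =======================
m_axis_tx_req_queue  out  QUEUE_INDEX_WIDTH   Queue index
m_axis_tx_req_tag    out  REQ_TAG_WIDTH       Tag
m_axis_tx_req_dest   out  AXIS_TX_DEST_WIDTH  Destination port and TC
m_axis_tx_req_valid  out  1                   Valid
m_axis_tx_req_ready  in   1                   Ready
===================  ===  ==================  =======================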

s_axis_tx_req_status

Transmit request status input, for responses from the transmit engine.

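==========================  ===  =============  =============
Signal                      Dir  Width          Description
==========================  ===  =============  =============
s_axis_tx_req_status_len    in   LEN_WIDTH      Packet length
s_axis_tx_req_status_tag    in   REQ_TAG_WIDTH  Tag
s_axis_tx_req_status_valid  in   1              Valid
==========================  ===  =============  =============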

s_axis_doorbell

Doorbell input, for enqueue notifications from the transmit queue manager.

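=====================  ===  =================  ===========
Signal                 Dir  Width              Description
=====================  ===  =================  ===========
s_axis_doorbell_queue  in   QUEUE_INDEX_WIDTH  Queue index
s_axis_doorbell_valid  in   1                  Valid
=====================  ===  =================  ===========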

s_axis_sched_ctrl

Scheduler control input, to permit user logic to dynamically enable/disable queues.

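========================  ===  =================  ============
Signal                    Dir  Width              Description
========================  ===  =================  ============
s_axis_sched_ctrl_queue   in   QUEUE_INDEX_WIDTH  Queue index
s_axis_sched_ctrl_enable  in   1                  Queue enable
s_axis_sched_ctrl_valid   in   1                  Valid
s_axis_sched_ctrl_ready   out  1                  Ready
========================  ===  =================  ============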

s_axil

AXI-Lite slave interface. This interface provides access to memory-mapped per-queue control registers.

==============  ===  ===============  =====================
Signal          Dir  Width            Description
==============  ===  ===============  =====================
s_axil_awaddr   in   AXIL_ADDR_WIDTH  Write address
s_axil_awprot   in   3                Write protect
s_axil_awvalid  in   1                Write address valid
s_axil_awready  out  1                Write address ready
s_axil_wdata    in   AXIL_DATA_WIDTH  Write data
s_axil_wstrb    in   AXIL_STRB_WIDTH  Write data strobe
s_axil_wvalid   in   1                Write data valid
s_axil_wready   out  1                Write data ready
s_axil_bresp    out  2                Write response status
s_axil_bvalid   out  1                Write response valid
s_axil_bready   in   1                Write response ready
s_axil_araddr   in   AXIL_ADDR_WIDTH  Read address
s_axil_arprot   in   3                Read protect
s_axil_arvalid  in   1                Read address valid
s_axil_arready  out  1                Read address ready
s_axil_rdata    out  AXIL_DATA_WIDTH  Read response data
s_axil_rresp    out  2                Read response status
s_axil_rvalid   out  1                Read response valid
s_axil_rready   in   1                Read response ready
==============  ===  ===============  =====================

control

Control and status signals

======  ===  =====  ===========
Signal  Dir  Width  Description
======  ===  =====  ===========
enable  in   1      Enable
active  out  1      Active
======  ===  =====  ===========