Computer Engineering Seminar

Enabling dedicated single-cycle connections over a shared Network-on-Chip

Tushar Krishna

Wednesday, November 19, 2014
12:30pm - 2:00pm
3725 BBB

About the Event

In the multicore era, scaling to hundreds or thousands of cores will be possible only if the interconnect between the cores does not become a performance or power bottleneck. Typical on-chip network designs, including commercial research prototypes, use complex multi-stage router pipelines at each hop, adding delay and energy to every message. Conventional wisdom thus says that communication is expensive, and that scalability is possible only if on-chip traversals are kept to a minimum. In this talk, I will challenge this conventional wisdom. I will present network-on-chip (NoC) designs that achieve near single-cycle traversals across the chip for both unicast and collective (1-to-Many and Many-to-1) communication flows, approaching the performance of an "ideal" but impractical all-to-all connected network. This reverses the trade-offs one typically associates with local vs. remote cache access latencies, or broadcast vs. directory-based coherence protocols.

The focus of my talk will be on SMART*: a technique that enables messages to traverse multiple hops, potentially all the way from the source to the destination, within a single cycle over a NoC with shared links. SMART leverages repeated wires in the datapath, which can traverse 10+ mm at a GHz frequency. I will present a network flow-control technique that allows messages to dynamically reserve multiple links (with turns) within one cycle and traverse them in the next cycle. On a 64-core chip, SMART reduces average network latency by 5-8X across traffic patterns compared to a state-of-the-art network with single-cycle routers at every hop. This translates to full-system runtime reductions of 27% for a Private L2 design and 52% for a Shared L2 design, and is within 12% of the reduction achieved by an ideal contention-free all-to-all single-cycle network.

If time permits, I will also present SMART FanOut (SFO) and SMART FanIn (SFI), which demonstrate near single-cycle traversals over multicast (1-to-Many) and reduction (Many-to-1) trees, respectively, over a NoC. Going forward, the ideas in SMART can pave the way for locality-oblivious shared-memory design.

*SMART: Single-cycle Multi-hop Asynchronous Repeated Traversal
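
To make the "reserve links in one cycle, traverse them in the next" idea concrete, the following is a minimal, contention-free latency sketch in Python. It is an illustration under stated assumptions, not the model used in the talk: the function names and the parameters `router_cycles`, `link_cycles`, `max_hops_per_cycle`, `setup_cycles`, and `traverse_cycles` are all hypothetical, and the sketch ignores contention, so it is not expected to reproduce the 5-8X figure quoted above.

```python
import math


def baseline_latency(hops, router_cycles=1, link_cycles=1):
    """Cycles for a message that stops at a router on every hop.

    Assumption: each hop costs one router cycle plus one link cycle.
    """
    return hops * (router_cycles + link_cycles)


def smart_latency(hops, max_hops_per_cycle=8, setup_cycles=1, traverse_cycles=1):
    """Cycles when a run of links is reserved in one cycle and crossed in the next.

    Assumption: at most `max_hops_per_cycle` links can be reserved and
    traversed as one segment; longer paths need several segments.
    """
    if hops == 0:
        return 0
    segments = math.ceil(hops / max_hops_per_cycle)
    return segments * (setup_cycles + traverse_cycles)


if __name__ == "__main__":
    # Compare the two models for a few path lengths (in hops).
    for hops in (1, 4, 7, 14):
        print(hops, baseline_latency(hops), smart_latency(hops))
```

In this toy model the baseline cost grows linearly with hop count, while the SMART-style cost grows only with the number of multi-hop segments, which is the intuition behind the near single-cycle traversals described above.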


Tushar Krishna received a PhD in Electrical Engineering and Computer Science from MIT in 2014, where he worked with Prof. Li-Shiuan Peh. He has an MSE from Princeton University and a BTech from IIT Delhi, both in Electrical Engineering. His research interests are in networks-on-chip for many-core systems, heterogeneous architectures, and reconfigurable computing. He is currently a researcher in the VSSAD group at Intel, Massachusetts.

Additional Information

Sponsor(s): CSE

Open to: Public