On-chip multiprocessors (CMPs) can remove some horizontal waste, and coarse- and fine-grained multithreading can remove some (or all) vertical waste. Ideally, though, a processor would have its cake and eat it too, attacking both kinds of waste at once. That is the philosophy behind a more advanced form of multithreading called Simultaneous Multithreading (SMT).
The core idea of SMT is that instructions from different threads can issue in the same cycle, to any functional unit. By rotating through threads, an SMT chip behaves like an FMT processor, and by executing instructions from several threads at the same time, it behaves like a CMP. Because issue slots that one thread cannot fill can be taken by another, SMT lets architects design wider cores with much less worry about diminishing returns from a single thread's limited instruction-level parallelism.
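One rough way to see the difference between the approaches is to count filled issue slots. The sketch below is a toy model under stated assumptions, not a description of any real pipeline: each thread is reduced to a hypothetical per-cycle count of ready instructions (`READY`), dependences, latencies and functional-unit types are ignored, FMT is modeled as round-robin that skips stalled threads, and SMT uses a simple fixed-priority fill. All names and numbers are made up for illustration.

```python
"""Toy issue-slot model: single-threaded superscalar vs. FMT vs. SMT.

Hypothetical assumptions: a thread is summarized by how many instructions it
could issue each cycle; dependences, latencies and unit types are ignored.
"""

WIDTH = 4  # issue slots per cycle (a 4-issue core)

# Made-up ready counts: READY[t][c] = instructions thread t could issue in
# cycle c if it had the whole core to itself.
READY = [
    [2, 0, 1, 3, 0, 1],  # thread 0
    [1, 2, 0, 0, 4, 1],  # thread 1
    [0, 1, 2, 1, 1, 0],  # thread 2
    [3, 0, 0, 2, 0, 2],  # thread 3
]


def superscalar(ready, width):
    """Only thread 0 runs; any slot it cannot fill stays empty."""
    return sum(min(width, n) for n in ready[0])


def fmt(ready, width):
    """Fine-grained MT: one thread owns every slot in a cycle.

    Threads are visited round-robin, skipping any thread with nothing ready.
    """
    threads, cycles = len(ready), len(ready[0])
    nxt = 0  # next thread to consider
    issued = 0
    for c in range(cycles):
        for k in range(threads):
            t = (nxt + k) % threads
            if ready[t][c] > 0:
                issued += min(width, ready[t][c])
                nxt = (t + 1) % threads
                break
        # if no thread had work, the whole cycle is vertical waste
    return issued


def smt(ready, width):
    """SMT: each cycle, fill the free slots from any thread with ready work."""
    issued = 0
    for c in range(len(ready[0])):
        free = width
        for t in range(len(ready)):  # fixed priority; real policies differ
            take = min(free, ready[t][c])
            issued += take
            free -= take
        # slots still free here are horizontal waste no thread could fill
    return issued


if __name__ == "__main__":
    total = WIDTH * len(READY[0])
    for name, policy in [("superscalar", superscalar), ("FMT", fmt), ("SMT", smt)]:
        used = policy(READY, WIDTH)
        print(f"{name:12s} {used:2d}/{total} slots used ({used / total:.0%})")
```

With these made-up numbers the single-threaded superscalar fills 7 of 24 slots, FMT fills 14 and SMT fills 22. FMT recovers the cycles in which one thread has nothing to issue (vertical waste), while SMT also fills leftover slots within a cycle (horizontal waste). The exact figures say nothing beyond this toy trace.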
In the figure, (a) is a traditional superscalar processor, while (b) represents a Simultaneous Multithreading architecture.
Figure (b) shows an example of a 4-issue, 4-way SMT processor. It is drawn as nearly "fully utilized" with 4 threads, which is not to say that real workloads will behave that way. Even so, SMT can realistically achieve higher utilization than FMT, because functional units that one thread leaves unused in a cycle can be given to instructions from other threads. In this way, SMT approaches the efficiency of a CMP machine. Unlike a CMP machine, however, an SMT machine sacrifices little to no single-threaded performance (the small sacrifice is discussed later).
The reason is simple: when running a single thread, much of a CMP processor sits idle (and the more processors on the die, the more pronounced this becomes), whereas an SMT processor can dedicate all of its functional units to that thread. Although this is not as good as running multiple threads, the ability to adapt to both single-threaded and multithreaded workloads is a valuable feature. An SMT processor can exploit thread-level parallelism (TLP) when it is present and, when it is not, devote the whole core to instruction-level parallelism (ILP).
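The single-thread argument can be illustrated with another toy model. This is a sketch under hypothetical assumptions, not a performance model: the CMP splits a fixed total issue width (`TOTAL_WIDTH`) evenly across `CORES` narrow cores with one thread pinned per core, the SMT core shares all of that width among its threads, per-cycle ready counts stand in for each thread's ILP, and contention for shared resources and any clock-rate differences are ignored.

```python
"""Toy comparison of a CMP and an SMT core with the same total issue width.

Hypothetical assumptions: the CMP splits TOTAL_WIDTH evenly across CORES
narrow cores, one thread per core; the SMT core shares all TOTAL_WIDTH slots.
Shared-resource contention is ignored.
"""

TOTAL_WIDTH = 4  # total issue slots per cycle
CORES = 4        # CMP: four 1-issue cores; SMT: one 4-issue core

# Made-up ready-instruction counts per cycle for each thread.
ONE_THREAD = [[3, 1, 4, 2]]
FOUR_THREADS = [
    [3, 1, 4, 2],
    [1, 0, 2, 1],
    [0, 2, 1, 1],
    [2, 1, 0, 3],
]


def cmp_issued(threads, total_width=TOTAL_WIDTH, cores=CORES):
    """Each thread runs on its own narrow core; cores without a thread sit idle."""
    if len(threads) > cores:
        raise ValueError("toy model allows at most one thread per core")
    per_core = total_width // cores
    return sum(min(per_core, n) for trace in threads for n in trace)


def smt_issued(threads, total_width=TOTAL_WIDTH):
    """All threads share one wide core; each cycle issues up to total_width."""
    cycles = len(threads[0])
    return sum(min(total_width, sum(trace[c] for trace in threads))
               for c in range(cycles))


if __name__ == "__main__":
    slots = TOTAL_WIDTH * len(ONE_THREAD[0])
    for label, workload in [("1 thread", ONE_THREAD), ("4 threads", FOUR_THREADS)]:
        print(f"{label}: CMP {cmp_issued(workload)}/{slots}, "
              f"SMT {smt_issued(workload)}/{slots}")
```

With one thread, three of the four CMP cores sit idle and the thread is capped at one instruction per cycle (4 of 16 slots), while the SMT core gives it all four slots (10 of 16). With four threads both machines stay busy (13 and 16 slots here). Adding more, narrower cores to the CMP in this model would lower its single-thread figure further, matching the point above; the costs of sharing one core are outside what the model captures.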
>> SMT Induced Changes/Concerns About SMT