SLCentral - Your logical choice for computing and technology
    Fundamentals Of Multithreading
    Author: Paul Mazzucco
    Date Posted: June 15th, 2001

    On-Chip Multiprocessing

    To make a fair comparison between an on-chip multiprocessor (CMP) and a uniprocessor, one must compare similar architectural features. That is, the uniprocessor and the CMP chip should have a similar aggregate number of functional units, registers, and renaming registers. On a die with two CPUs, each individual CPU has half the functional units and registers of the single CPU on the uniprocessor die. Because there are two processors on the CMP chip, the total resources are the same. These processors can also share an on-die L2 cache. If the L1 caches were write-through, the cache-coherency problem between the two processors would be greatly simplified: every store would reach the shared L2, so the only remaining task would be to invalidate stale copies of that data in the other processor's L1.
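    The following is a minimal sketch of what write-through buys. It assumes two cores, each with a private write-through L1 in front of a shared L2. The class and method names are hypothetical, and the caches are modeled as plain dictionaries with no capacity limits. Because every store reaches the L2, a core that misses always refills with the newest value. The sketch also shows what write-through alone does not do: unless the other core's L1 copy is invalidated, that core keeps hitting on stale data.

```python
class SharedL2:
    """Hypothetical shared on-die L2 that every core's L1 writes through to."""

    def __init__(self, invalidate_others=True):
        self.lines = {}                  # address -> value (unlimited capacity)
        self.l1_caches = []              # private L1s of the attached cores
        self.invalidate_others = invalidate_others


class Core:
    """Hypothetical core with a private, write-through L1 (a dict)."""

    def __init__(self, l2):
        self.l1 = {}
        self.l2 = l2
        l2.l1_caches.append(self.l1)

    def load(self, addr):
        if addr in self.l1:              # L1 hit: no L2 access needed
            return self.l1[addr]
        value = self.l2.lines.get(addr, 0)  # miss: refill from L2 (memory assumed 0)
        self.l1[addr] = value
        return value

    def store(self, addr, value):
        self.l1[addr] = value
        self.l2.lines[addr] = value      # write-through: L2 always has the newest value
        if self.l2.invalidate_others:
            for other in self.l2.l1_caches:
                if other is not self.l1:
                    other.pop(addr, None)  # drop stale copies in the other L1s


def demo(invalidate_others):
    l2 = SharedL2(invalidate_others)
    core0, core1 = Core(l2), Core(l2)
    core1.load(0x40)          # core1 caches the old value (0)
    core0.store(0x40, 7)      # core0 writes 7 through to the L2
    return core1.load(0x40)   # what does core1 see now?


print(demo(True))   # 7: core1 missed and refilled from the up-to-date L2
print(demo(False))  # 0: core1 hit on its stale L1 copy
```

    Write-through keeps the L2 current, so the coherence mechanism only has to invalidate lines. It never has to locate a dirty copy inside another core's L1.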

    The general concept behind putting multiple cores on one die is to extract more performance by executing two threads at once. By doing so, the two processors together keep a higher percentage of the aggregate functional units doing useful work at any given time. An example is shown below.


    Figures adapted from Jack Lo's PhD dissertation [1] and Paul DeMone's "Simultaneous Multi-threat" [2]: (a) a conventional superscalar CPU; (b) a two-CPU multiprocessor.

    To explain the context-switch code, I defer to Mr. DeMone's explanation:

    Each thread runs for a short interval that ends when the program experiences an exception like a page fault, calls an operating system function, or is interrupted by an interval timer. When a thread is interrupted, a short segment of OS code (shown in Figure 1A as gray instructions in issue slots) is run which performs a context switch and switches execution to a new thread. Multitasking provides the illusion of simultaneous execution of multiple threads but does nothing to enhance the overall computational capability of the processor. In fact, excessive context switching causes processor cycles, which could have been used running user code, to be wasted in the OS.
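    The cost described in the quotation can be sketched as round-robin time slicing on a single CPU. The model is an assumption for illustration, not taken from the article. Each thread needs a fixed number of user cycles. An interval timer ends each slice after `quantum` cycles. Every context switch costs `switch_cost` cycles of OS code. The function name and all numbers are hypothetical.

```python
def run_time_sliced(work, quantum, switch_cost):
    """Round-robin multitasking of several threads on one CPU.

    work: user cycles each thread needs.
    quantum: user cycles a thread runs before the interval timer interrupts it.
    switch_cost: cycles of OS code run for each context switch.
    Returns (total_cycles, os_cycles, schedule); schedule lists the thread
    that ran in each slice.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    remaining = list(work)
    total_cycles = os_cycles = 0
    schedule = []
    while any(r > 0 for r in remaining):
        for tid in range(len(remaining)):
            if remaining[tid] == 0:
                continue
            ran = min(quantum, remaining[tid])
            remaining[tid] -= ran
            total_cycles += ran
            schedule.append(tid)
            if any(r > 0 for r in remaining):
                # Work is left, so the OS saves this context and loads the next one.
                total_cycles += switch_cost
                os_cycles += switch_cost
    return total_cycles, os_cycles, schedule


for quantum in (40, 20):
    total, os_cycles, _ = run_time_sliced([100, 100], quantum, switch_cost=10)
    print(f"quantum={quantum}: {total} cycles, {os_cycles} in the OS ({os_cycles / total:.0%})")
```

    Both runs do the same 200 cycles of user work, so the threads only appear to run simultaneously. Halving the quantum from 40 to 20 cycles raises the OS share from 20% to about 31%.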

    The more functional units a processor has, the smaller the fraction of them doing useful work in any given cycle. An on-chip multiprocessor gives each processor fewer functional units and assigns separate tasks (or threads) to each processor. This lets it achieve a higher combined throughput on both tasks. A comparable uniprocessor could finish any single thread, or task, faster than a CMP chip could. Although many of its functional units sit idle, it also has "bursts" of activity in which it computes many pieces of data at once and uses all of its functional units. The idea behind on-chip multiprocessors is to smooth out this bursty activity. Rather than relying on a wide processor that only occasionally fills all its units, each narrower processor uses what it has more often, and therefore more efficiently. Leaving some functional units unused during a clock cycle is known as horizontal waste, and reducing it is the goal of CMP.
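    The trade-off above can be sketched with a toy model. The following assumptions are illustrative and not from the article. Each thread is a list of batches of mutually independent instructions. Each batch depends on the one before it, and every instruction takes one cycle. An 8-wide uniprocessor without multithreading runs two threads one after the other. A CMP splits the same 8 issue slots into two 4-wide cores. The batch sizes and function names are made up.

```python
from math import ceil


def cycles_on_core(groups, width):
    """Cycles for one thread on a core issuing up to `width` instructions per cycle.

    Each entry in `groups` is a batch of independent instructions; a batch cannot
    begin issuing until the previous batch has issued (unit latency assumed).
    """
    return sum(ceil(g / width) for g in groups)


def compare(threads, uni_width=8, cores=2):
    """Compare a uni_width-wide uniprocessor with a CMP of `cores` narrower cores.

    Both designs have the same aggregate issue width (a fair comparison).
    """
    assert len(threads) == cores, "one thread per CMP core"
    core_width = uni_width // cores
    instructions = sum(sum(t) for t in threads)
    # Uniprocessor: threads are time-sliced one after another (switch cost ignored).
    uni_cycles = sum(cycles_on_core(t, uni_width) for t in threads)
    # CMP: each thread has its own core; the chip is done when the slowest core is.
    cmp_cycles = max(cycles_on_core(t, core_width) for t in threads)
    return {
        "uni_cycles": uni_cycles,
        "cmp_cycles": cmp_cycles,
        # Issue slots that went unused over the whole run.
        "uni_idle_slots": uni_cycles * uni_width - instructions,
        "cmp_idle_slots": cmp_cycles * core_width * cores - instructions,
    }


thread_a = [1, 2, 8, 3, 1, 6]   # hypothetical per-batch instruction counts
thread_b = [4, 1, 1, 5, 2, 2]
print(cycles_on_core(thread_a, 8), cycles_on_core(thread_a, 4))  # one thread: wide vs narrow core
print(compare([thread_a, thread_b]))
```

    Thread A alone needs 6 cycles on the 8-wide core but 8 on a 4-wide core, so the uniprocessor wins on a single thread. The pair finishes in 8 cycles on the CMP versus 12 on the uniprocessor, leaving 28 idle issue slots instead of 60.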

    Another advantage of a CMP chip over a larger, more complex uniprocessor is that a smaller, simpler processor is easier to design. This helps in two ways. First, designers spend less time on the chip, so time to market is shorter. Second, smaller, less complex processors tend to run at higher clock frequencies. A CMP chip in a multithreaded (or multiprogrammed) environment can therefore run faster for two reasons: it uses the available resources more efficiently across threads, and it may reach a higher clock rate than a monolithic processor.
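    If run time is assumed to be simply cycles divided by clock frequency, the two advantages multiply. Let C be the cycles each design needs to finish the same multithreaded workload, and let f be its clock frequency:

```latex
\text{speedup}_{\mathrm{CMP}}
  = \frac{C_{\mathrm{uni}} / f_{\mathrm{uni}}}{C_{\mathrm{CMP}} / f_{\mathrm{CMP}}}
  = \frac{C_{\mathrm{uni}}}{C_{\mathrm{CMP}}} \cdot \frac{f_{\mathrm{CMP}}}{f_{\mathrm{uni}}}
```

    For example, take hypothetical numbers: a CMP that needs 8 cycles where the uniprocessor needs 12, running at a clock 10% higher. It would finish (12/8) × 1.1 = 1.65 times as fast.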

    The MAJC architecture from Sun Microsystems makes use of CMP. It allows one to four processors to share the same die, each running a separate thread. Each processor is limited to four functional units. Each of these units can execute both integer and floating-point operations, which makes the MAJC architecture more flexible.

    Another example of an on-chip multiprocessor is IBM's Power4. This architecture does not follow the philosophy of using smaller, easier-to-implement CPUs. Instead, it uses processors that could each be considered full-fledged server chips in their own right. Even so, IBM has put two of them on each die. The die is expected to measure roughly 400 mm², smaller than HP's PA-8500 despite having the same amount of on-die cache [3].

    There are problems with CMP, however. A traditional CMP chip sacrifices single-thread performance to speed the completion of two or more threads. This makes a CMP chip less flexible for general use. If only one thread is running, half of a two-processor chip's resources sit idle and are completely useless. Similarly, adding a second processor to a traditional SMP system does nothing for a single-threaded program. Another approach to making better use of a CPU's functional units is called coarse-grained multithreading.
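    The flexibility problem can be expressed in numbers with a hypothetical helper. It assumes a CMP in which each processor runs exactly one thread and cannot lend its functional units to another processor.

```python
def idle_processor_fraction(runnable_threads: int, processors: int) -> float:
    """Fraction of a CMP chip's processors left with nothing to run.

    Assumes one thread per processor and no resource sharing between processors.
    """
    if processors <= 0 or runnable_threads < 0:
        raise ValueError("need at least one processor and a non-negative thread count")
    busy = min(runnable_threads, processors)
    return 1 - busy / processors


for threads in (1, 2):
    print(f"{threads} thread(s) on a 2-processor CMP: {idle_processor_fraction(threads, 2):.0%} idle")
```

    A single thread leaves half of a two-processor chip idle, whereas a wide uniprocessor would let that thread use all of its functional units. On a four-processor die of the kind MAJC allows, `idle_processor_fraction(1, 4)` gives 0.75.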

    Copyright 1998-2007 SLCentral. All Rights Reserved.