SLCentral - Your logical choice for computing and technology
    Fundamentals Of Multithreading
    Author: Paul Mazzucco
    Date Posted: June 15th, 2001

    Introduction

    NOTE: This article intentionally ignores the discussion of NUMA architectures. It is aimed at smaller-scale multiprocessing, which serves as a lead-in to the various forms of on-chip multithreading.

    I remember, quite distinctly, my first encounter with someone who had a multiprocessor system as his home machine. "Bob" (a fake name to hide his identity) claimed that he owned a 466MHz computer when, at the time, the top-of-the-line Intel machine was a Pentium II 450. I told him that no, he didn't have a 466MHz machine; "Bob" didn't know about Alpha processors, and had never heard of "overclocking," so neither of those could explain it. Then it hit me: he had a dual Pentium II 233MHz rig. When I asked him if this was the case, he said that it was, and that two 233MHz chips are comparable to a 466MHz machine.

    What "Bob" didn't realize was that programs have to be written in a special way (called multithreading) in order to take advantage of multiple processors. Nor did he realize that he needed to be using an OS which supports multiprocessing, such as Windows NT or a *nix (any sort of Unix variant). Mac OS versions before OS X had only a crude form of multiprocessing support (OS X is based on BSD, a Unix variant, and has full-fledged multiprocessing support). Generally speaking, the computer needs OS support as well as application support to take advantage of multiprocessing.
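    To make concrete what "written in a special way" means, here is a minimal sketch (in modern Python, with hypothetical function names of my choosing) of a program that explicitly splits its work across two threads. Without this kind of structure, a second processor simply sits idle; even with it, whether the threads actually land on separate processors is up to the OS scheduler.

```python
import threading

def sum_range(start, end, results, index):
    """Worker: sum the integers in [start, end) and store the partial result."""
    total = 0
    for i in range(start, end):
        total += i
    results[index] = total

# Split the work into two halves, one per thread (and, on a
# multiprocessor OS, potentially one per processor).
results = [0, 0]
t1 = threading.Thread(target=sum_range, args=(0, 500, results, 0))
t2 = threading.Thread(target=sum_range, args=(500, 1000, results, 1))
t1.start()
t2.start()
t1.join()   # wait for both workers before combining their results
t2.join()

grand_total = results[0] + results[1]
```

    Note that the programmer must decide how to partition the work and when to wait for the pieces to finish; the OS and hardware do not do this automatically, which is exactly why an unmodified single-threaded program gains nothing from a second processor.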

    The other thing "Bob" didn't realize is that for the vast majority of current consumer applications, the multithreading support, if it exists at all, does not often expedite the completion of a task proportionally with the number of processors in the system. This fact does not preclude an application from experiencing linear scaling with respect to the number of processors - nay, some applications can experience superlinear scaling, though these are few and far between.

    Amdahl's Law

    Assuming that an application is multithreaded (written to execute in a parallel manner, rather than a serial or purely sequential one), there are inherent difficulties in making it run faster in proportion to the number of processors: the program must be written in a parallel fashion, and it must also be resource friendly.

    This is explained by Amdahl's Law: "…the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used," (Hennessy and Patterson, 29). This law applies to more than just changing sequential code to parallel code.

    Assume that a program consists of two main parts, A and B, and that each can be optimized. Part A represents 90% of the execution time, and B represents the remaining 10%. Assume that B can be optimized so that it finishes in one tenth the time of the original version, and that A can be optimized so that it completes its part of the program in 2/3 the time it previously needed. If both parts A and B take the same amount of effort to optimize, and the programmer has time only to optimize one part, which should the programmer work on? Obviously, he/she should work on part A.

    Just to walk through the thinking involved, assume that this program requires 100 seconds to complete. Part A consumes 90 seconds of execution time, and B requires 10 seconds. After optimization, part A would take 60 seconds, and B a mere 1 second. The choice is between a total of 70 seconds of execution time if A is optimized, and 91 seconds if B is optimized. Though this article is not intended to be about software engineering, if one looks at the payoffs for where to put effort into programming, the choice should be obvious.
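    The arithmetic above can be checked mechanically. A small sketch, using the figures from the example:

```python
# Worked example from the text: a 100-second program,
# part A takes 90 s (90%), part B takes 10 s (10%).
time_a, time_b = 90.0, 10.0

# Option 1: optimize A to run in 2/3 of its original time.
optimized_a = time_a * 2.0 / 3.0 + time_b   # 60 + 10 seconds

# Option 2: optimize B to run in 1/10 of its original time.
optimized_b = time_a + time_b / 10.0        # 90 + 1 seconds
```

    The 21-second gap between the two options is the whole point of Amdahl's Law: effort spent on the small fraction of execution time has a small ceiling on its payoff, no matter how dramatic the per-part improvement.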

    So where do multiprocessor-based systems fit in here? Multithreaded programs must be written to be as parallel as possible. The same discussion about speeding up parts of a program to increase performance applies to multithreading. To achieve a linear speedup (a performance increase of X due to having X processors), generally all of the code must be written in a parallel manner (generally, and not always, due to other circumstances discussed later). The mathematical representation of Amdahl's law with respect to parallel computing follows:


    Speedup = 1 / ((1 - Fraction_parallel) + Fraction_parallel / Number_of_processors)

    Adapted from "Computer Architecture: A Quantitative Approach," second edition, page 645.

    And even this formula doesn't tell the whole story. It assumes that the parallel code is, per processor, just as fast as the sequential code, which may or may not be the case. It also assumes that there is no overhead due to communication latencies, which is, of course, not the case. Few commercial consumer applications support nearly perfectly parallel code. The only way superlinear performance increases occur is for the extra cache afforded by each additional processor to dramatically reduce main-memory bandwidth usage (which is not common).
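    As an illustration of how the parallel fraction bounds the payoff, here is a short sketch applying Amdahl's law (Speedup = 1 / ((1 - P) + P/N)) to a program that is 90% parallel; the numbers show why adding processors quickly hits diminishing returns:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Ideal speedup under Amdahl's law: the sequential part is unaffected,
    and the parallel part divides evenly across processors (no overhead)."""
    sequential = 1.0 - parallel_fraction
    return 1.0 / (sequential + parallel_fraction / n_processors)

# Even with 90% parallel code, speedup saturates far below N:
# the 10% sequential portion caps it at 10x no matter how many
# processors are added.
for n in (2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

    Only a perfectly parallel program (P = 1.0) reaches the linear speedup of N, which is why the latency and bandwidth effects discussed next matter so much in practice.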

    The assumption about latencies is important because it is twofold: first, that the program sees the same latencies from all processors; and second, that the latencies themselves are not, in fact, exacerbated by the presence of the other processors executing their own code.


    Article Navigation
    1. Introduction/Amdahl's Law
    2. Latencies And Bandwidth
    3. Latencies And Bandwidth Cont.
    4. ILP Background
    5. On-Chip Multiprocessing
    6. Coarse-Grained Multithreading
    7. Fine-Grained Multithreading
    8. Simultaneous Multithreading
    9. SMT Induced Changes/Concerns About SMT
    10. Jackson Technology And SMT
    11. Applications Of Multithreading: Dynamic Multithreading
    12. Applications Of Multithreading: Redundancy Is Faster?
    13. Summary Of The Forms Of Multithreading And Conclusion
    14. Bibliography
    Copyright © 1998-2007 SLCentral. All Rights Reserved.