Multiprocessing and Multithreading (Microarchitecture)
Computer architects have become stymied by the growing mismatch between CPU operating frequencies and DRAM access times. None of the techniques that exploited instruction-level parallelism (ILP) within one program could make up for the long stalls that occurred when data had to be fetched from main memory. Additionally, the large transistor counts and high operating frequencies needed for the more advanced ILP techniques required power dissipation levels that could no longer be cheaply cooled. For these reasons, newer generations of computers have started to exploit higher levels of parallelism that exist outside of a single program or program thread.
This trend is known as throughput computing. The idea originated in the mainframe market, where online transaction processing emphasized not just the execution speed of one transaction but the capacity to deal with massive numbers of transactions. With transaction-based applications such as network routing and web-site serving increasing in the last decade, the computer industry has re-emphasized capacity and throughput issues.
One technique for achieving this parallelism is multiprocessing: computer systems with multiple CPUs. Once reserved for high-end mainframes and supercomputers, small-scale (2–8) multiprocessor servers have become commonplace in the small business market. For large corporations, large-scale (16–256) multiprocessors are common. Even personal computers with multiple CPUs have appeared since the 1990s.
With the further transistor size reductions made available by semiconductor technology advances, multi-core CPUs have appeared, in which multiple CPUs are implemented on the same silicon chip. They were first used in chips targeting embedded markets, where simpler and smaller CPUs allow multiple instantiations to fit on one piece of silicon. By 2005, semiconductor technology allowed dual high-end desktop CPUs to be manufactured in volume as chip multiprocessor (CMP) chips. Some designs, such as Sun Microsystems' UltraSPARC T1, have reverted to simpler (scalar, in-order) designs in order to fit more processors on one piece of silicon.
Another technique that has become more popular is multithreading. In multithreading, when the processor has to fetch data from slow system memory, instead of stalling until the data arrives, the processor switches to another program or program thread that is ready to execute. Though this does not speed up a particular program or thread, it increases overall system throughput by reducing the time the CPU is idle.
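The throughput argument can be illustrated with a toy model. The sketch below is not a description of any real processor: it assumes every thread alternates between a fixed compute burst and a fixed-latency memory miss, that switching threads costs nothing, and that the core always picks the lowest-numbered runnable thread. The function name `simulate` and all of its parameters are made up for this illustration.

```python
def simulate(num_threads, bursts_per_thread, compute_cycles, miss_latency):
    """Toy switch-on-miss core. Returns (total_cycles, busy_cycles).

    Assumptions (illustrative only): each thread runs `bursts_per_thread`
    compute bursts of `compute_cycles` each, misses in memory after every
    burst, and waits `miss_latency` cycles for the data. Switching is free.
    """
    remaining = [bursts_per_thread] * num_threads  # bursts left per thread
    ready_at = [0] * num_threads                   # cycle when each thread's data arrives
    cycle = 0
    busy = 0
    while any(remaining):
        runnable = [t for t in range(num_threads)
                    if remaining[t] and ready_at[t] <= cycle]
        if not runnable:
            # Every unfinished thread is waiting on memory: the core idles
            # until the earliest outstanding miss completes.
            cycle = min(ready_at[t] for t in range(num_threads) if remaining[t])
            continue
        t = runnable[0]
        cycle += compute_cycles
        busy += compute_cycles
        remaining[t] -= 1
        ready_at[t] = cycle + miss_latency         # thread stalls on its next miss
    return cycle, busy


if __name__ == "__main__":
    for n in (1, 4):
        total, busy = simulate(n, bursts_per_thread=4,
                               compute_cycles=10, miss_latency=30)
        print(f"{n} thread(s): {total} cycles, utilization {busy / total:.0%}")
```

With these assumed numbers, one thread keeps the core busy for 40 of 130 cycles, while four threads keep it busy for all 160 cycles. Thread 0 still finishes at cycle 130 in both runs, which matches the point that multithreading raises throughput without speeding up any single thread.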
Conceptually, multithreading is equivalent to a context switch at the operating system level. The difference is that a multithreaded CPU can do a thread switch in one CPU cycle instead of the hundreds or thousands of CPU cycles a context switch requires. This is achieved by replicating the state hardware (such as the register file and program counter) for each active thread.
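To make the role of replicated state concrete, here is a minimal sketch contrasting the two approaches. It is a software analogy only: the class names, the 32-register file, and the `words_copied` counter are all assumptions introduced for illustration, and real context-switch cost involves much more than register copies.

```python
from dataclasses import dataclass, field

NUM_REGS = 32  # assumed register-file size, for illustration only


@dataclass
class ThreadContext:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)


class SoftwareSwitchedCore:
    """One hardware context; state is spilled to and reloaded from memory."""

    def __init__(self, num_threads):
        self.hw = ThreadContext()                                   # the only hardware copy
        self.saved = [ThreadContext() for _ in range(num_threads)]  # copies "in memory"
        self.current = 0
        self.words_copied = 0

    def switch_to(self, tid):
        # Save the outgoing thread's PC and registers, then load the incoming ones.
        self.saved[self.current] = ThreadContext(self.hw.pc, list(self.hw.regs))
        incoming = self.saved[tid]
        self.hw = ThreadContext(incoming.pc, list(incoming.regs))
        self.words_copied += 2 * (NUM_REGS + 1)
        self.current = tid


class MultithreadedCore:
    """One hardware context per thread; a switch only changes a selector."""

    def __init__(self, num_threads):
        self.contexts = [ThreadContext() for _ in range(num_threads)]
        self.current = 0
        self.words_copied = 0  # never incremented: nothing is copied

    @property
    def hw(self):
        return self.contexts[self.current]

    def switch_to(self, tid):
        self.current = tid
```

Both classes preserve each thread's state across switches. The first does so by moving `2 * (NUM_REGS + 1)` words per switch, while the second moves none because every thread's registers and program counter already live in their own hardware copy.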
A further enhancement is simultaneous multithreading. This technique allows superscalar CPUs to execute instructions from different programs or threads simultaneously in the same cycle.
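A rough way to picture simultaneous multithreading is as filling the issue slots of a superscalar core from more than one thread per cycle. The sketch below is a deliberately simplified model: the per-cycle counts of ready instructions are invented inputs, unissued instructions are not carried over to the next cycle, and threads are served in fixed priority order. The names `issue_cycle` and `slot_utilization` are hypothetical.

```python
def issue_cycle(ready_per_thread, issue_width):
    """Fill up to `issue_width` slots in one cycle, threads in priority order.

    Returns the number of instructions issued from each thread.
    """
    slots = issue_width
    issued = []
    for ready in ready_per_thread:
        take = min(ready, slots)
        issued.append(take)
        slots -= take
    return issued


def slot_utilization(trace, issue_width, smt):
    """Fraction of issue slots used over a trace of per-cycle ready counts.

    With smt=False only the first thread may issue; with smt=True every
    thread can contribute instructions in the same cycle.
    """
    used = 0
    for ready_per_thread in trace:
        candidates = ready_per_thread if smt else ready_per_thread[:1]
        used += sum(issue_cycle(candidates, issue_width))
    return used / (issue_width * len(trace))


if __name__ == "__main__":
    # Each tuple: instructions ready this cycle for (thread 0, thread 1).
    trace = [(2, 3), (1, 4), (0, 2), (3, 1)]
    print("single thread:", slot_utilization(trace, issue_width=4, smt=False))
    print("SMT:          ", slot_utilization(trace, issue_width=4, smt=True))
```

In this made-up trace the single thread fills 6 of 16 slots, and letting the second thread share each cycle fills 14 of 16. The numbers only illustrate the mechanism and say nothing about real workloads.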