1.6 General-purpose processors are optimized for general-purpose computing. That is, they are optimized for behavior that is generally found across a large number of applications. However, once the domain is restricted somewhat, the behavior found across a large number of the target applications may be different from that of general-purpose applications. One such application is deep learning, or neural networks. Deep learning can be applied to many different applications, but the fundamental building block of inference (using the learned information to make decisions) is the same across them all. Inference operations are largely parallel, so they are currently performed on graphics processing units (GPUs), which are specialized more toward this type of computation, not to inference in particular. In a quest for more performance per watt, Google has created a custom chip, the Tensor Processing Unit (TPU), to accelerate inference operations in deep learning. This approach can be used for speech recognition and image recognition, for example. This problem explores the trade-offs between this processor, a general-purpose processor (Haswell E5-2699 v3), and a GPU (NVIDIA K80), in terms of performance and cooling. If heat is not removed from the computer efficiently, the fans will blow hot air back onto the computer, not cold air. Note: the differences are more than processor; on-chip memory and DRAM also come into play. Therefore the statistics are at a system level, not a chip level.
a. If Google’s data center spends 70% of its time on workload A and 30% of its time on workload B when running GPUs, what is the speedup of the TPU system over the GPU system?
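Since the workload fractions in part (a) are measured as time on the baseline (GPU) system, the overall speedup is a weighted harmonic combination of the per-workload speedups. A minimal sketch follows; the per-workload speedup values are placeholders, not the actual numbers from Figure 1.27.

```python
def overall_speedup(fractions, speedups):
    """Overall speedup when `fractions` are fractions of *baseline* time
    and `speedups` are the per-workload speedups of the new system."""
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))

# Hypothetical per-workload TPU-over-GPU speedups (not Figure 1.27 data):
print(overall_speedup([0.7, 0.3], [2.0, 1.5]))  # ≈ 1.82
```

The same helper answers part (d) by passing three fractions and three per-workload speedups measured against the general-purpose system.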
b. If Google’s data center spends 70% of its time on workload A and 30% of its time on workload B when running GPUs, what percentage of Max IPS does it achieve for each of the three systems?
c. Building on (b), assuming that the power scales linearly from idle to busy power as IPS grows from 0% to 100%, what is the performance per watt of the TPU system over the GPU system?
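Part (c)'s linear power model can be sketched as follows. Utilization is the fraction of Max IPS found in part (b); the idle/busy wattages and IPS values below are placeholders, since the actual Figure 1.28 numbers are not reproduced here.

```python
def power_at(util, idle_w, busy_w):
    """Linearly interpolated system power at a given utilization (0..1)."""
    return idle_w + (busy_w - idle_w) * util

def perf_per_watt(achieved_ips, util, idle_w, busy_w):
    """Achieved instructions per second divided by modeled power draw."""
    return achieved_ips / power_at(util, idle_w, busy_w)

# Hypothetical values: 50% of Max IPS, 100 W idle, 300 W busy.
print(power_at(0.5, 100.0, 300.0))            # 200.0 W
print(perf_per_watt(50e9, 0.5, 100.0, 300.0))  # IPS per watt
```

The ratio of the two systems' `perf_per_watt` results gives the comparison the question asks for.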
d. If another data center spends 40% of its time on workload A, 10% of its time on workload B, and 50% of its time on workload C, what are the speedups of the GPU and TPU systems over the general-purpose system?
e. A cooling door for a rack costs $4000 and dissipates 14 kW (into the room; additional cost is required to get it out of the room). How many Haswell-, NVIDIA-, or Tensor-based servers can you cool with one cooling door, assuming TDP in Figures 1.27 and 1.28?
f. Typical server farms can dissipate a maximum of 200 W per square foot. Given that a server rack requires 11 square feet (including front and back clearance), how many servers from part (e) can be placed on a single rack, and how many cooling doors are required?
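The arithmetic behind parts (e) and (f) is simple budget division: a cooling door supports 14 kW, and the floor limit caps a rack at 200 W/sq ft × 11 sq ft = 2200 W. A sketch under an assumed per-server TDP (the real TDPs come from Figures 1.27 and 1.28):

```python
import math

COOLING_DOOR_W = 14_000        # one cooling door dissipates 14 kW
FLOOR_LIMIT_W_PER_SQFT = 200   # farm-wide dissipation limit
RACK_AREA_SQFT = 11            # rack footprint incl. clearance

def servers_per_door(server_tdp_w):
    """Part (e): whole servers one 14 kW door can cool."""
    return COOLING_DOOR_W // server_tdp_w

def servers_per_rack(server_tdp_w):
    """Part (f): whole servers within the 2200 W floor budget per rack."""
    return (FLOOR_LIMIT_W_PER_SQFT * RACK_AREA_SQFT) // server_tdp_w

def doors_per_rack(server_tdp_w):
    """Part (f): cooling doors needed for a fully populated rack."""
    rack_power = servers_per_rack(server_tdp_w) * server_tdp_w
    return math.ceil(rack_power / COOLING_DOOR_W)

# Hypothetical 500 W server TDP (not a Figure 1.27/1.28 value):
print(servers_per_door(500), servers_per_rack(500), doors_per_rack(500))
```

Note that at realistic server TDPs the floor-power limit, not the cooling door, is the binding constraint, so a single door typically suffices per rack.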