Lab Exercise #13 -- SPARC Performance Evaluation A. The files named "lab13.main.c" and "lab13.sub.A.s" contain the source code for a program which performs some floating point calculations. 1. Copy the files into your directory, then translate and execute the program using the commands: gcc lab13.main.c lab13.sub.A.s a.out Examine the source code and the output, then answer the questions below. Please note that the program may take several seconds to execute due to the number of times that function "sum_vector" is invoked. a) How many times is function "sum_vector" invoked by "main"? ____________ b) Execute the program three times and record the number of nanoseconds displayed by the program for each execution. Nanosecs: ________________________ Nanosecs: ________________________ Nanosecs: ________________________ Note: the number of nanoseconds may vary due to the limitations of the system clock and other external factors. 2. Describe the method which is used to measure the the number of nanoseconds consumed by a particular code segment. Use the command "man gethrtime" for more information about function "gethrtime". B. Modify function "sum_vector" to decrease the number of nanoseconds used by the program, without modifying its functionality. 1. In early versions of the SPARC, integer multiplication and division were not implemented as hardware instructions due to the length of time required for those operations to complete. Instead, those operations were performed by software routines. List the instructions in "lab13.sub.A.s" that represent the multiplication operation used to compute the array offset. 2. In later versions of the SPARC, hardware multiplication and division instructions were added to the integer unit. a) Copy function "sum_vector" into a file named "lab13.sub.B.s", then substitute the appropriate multiply instruction for the multiplication operation used to compute the array offset. b) Execute the program three times and record the number of nanoseconds displayed by the program for each execution. Nanosecs: ________________________ Nanosecs: ________________________ Nanosecs: ________________________ 3. Even with hardware multiplication and division instructions, it is often desirable to substitute shift instructions for multiplication and division operations, since shift instructions take fewer clock cycles to complete. a) Copy function "sum_vector" into a file named "lab13.sub.C.s", then substitute the appropriate shift instruction for the multiplication operation used to compute the array offset. b) Execute the program three times and record the number of nanoseconds displayed by the program for each execution. Nanosecs: ________________________ Nanosecs: ________________________ Nanosecs: ________________________ 4. The effect of all control-transfer instructions is delayed by one cycle, due to the instruction pipeline used in the SPARC. The instruction following a control-transfer instruction is said to appear in the "delay slot". In function "sum_vector", all control-transfer instructions are followed by "nop" instructions. Copy "lab13.sub.C.s" into a file named "lab13.sub.D.s", then attempt to use all delay slots. Where possible, replace each "nop" instruction with an instruction (from elsewhere in the function) which performs useful work. a) How many delay slots were you able to fill with useful work? ______ b) Execute the program three times and record the number of nanoseconds displayed by the program for each execution. Nanosecs: ________________________ Nanosecs: ________________________ Nanosecs: ________________________ c) For each "nop" instruction which still appears in a delay slot, explain why you were unable to replace that instruction with an instruction which performs useful work.