Perfmon is a tool that allows user-level code to access the performance counters present in the Ultra-series workstations and servers produced by Sun Microsystems. This is accomplished by a loadable driver that re-programs devices with performance counters so that user-level code can read them (normally, access to these counters is restricted to code running in privileged mode). For some devices, like the UltraSPARC CPU, accessing the performance counters requires special machine instructions. A user library component of Perfmon provides access to these instructions via C function calls. The library also provides other useful functions such as memory/instruction barriers. Currently, the only devices supported are the UltraSPARC-I and UltraSPARC-II CPUs, and they are the only devices discussed in the remainder of this document. See the section on Future Work for devices that may be supported in future versions of Perfmon.
There are two parts to collecting performance data on UltraSPARC CPUs. The first is to program the Performance Control Register (PCR) to indicate the type of events that you wish to count. Access to the PCR is always privileged and requires a call into the Perfmon device driver. The second part is to read the Performance Instrumentation Counter (PIC) register to get the current count of the watched events. Access to the PIC is normally privileged, but can be made non-privileged by clearing the lowest bit of the PCR. Perfmon also enables user access to the UltraSPARC's TICK register, which is incremented once per machine clock cycle at all times. This is done by clearing the upper bit of the TICK register (TICK.npt). See the Perfmon User's Guide and the UltraSPARC-I User's Manual for more information.
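As a rough illustration of the register layout described above, the following sketch composes a PCR value and strips the privilege bit from a raw TICK reading. The bit positions (PRIV at bit 0, ST at bit 1, UT at bit 2, event selectors S0 at bits 7:4 and S1 at bits 14:11, TICK.npt at bit 63) are assumptions recalled from the UltraSPARC-I/II manuals and should be checked against the UltraSPARC-I User's Manual. The macro and function names are hypothetical and are not part of Perfmon.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed PCR layout; verify against the UltraSPARC-I User's Manual. */
#define PCR_PRIV      (1ULL << 0)   /* set: PIC readable only in privileged mode */
#define PCR_ST        (1ULL << 1)   /* count events in system (kernel) mode */
#define PCR_UT        (1ULL << 2)   /* count events in user mode */
#define PCR_S0_SHIFT  4             /* event selector for the first counter */
#define PCR_S1_SHIFT  11            /* event selector for the second counter */
#define PCR_SEL_MASK  0xFULL        /* selectors are assumed to be 4 bits wide */

/* Assumed TICK layout: bit 63 is the non-privileged trap bit (TICK.npt). */
#define TICK_NPT      (1ULL << 63)

/* Build a PCR value that leaves PRIV clear so user code may read the PIC. */
uint64_t make_pcr(unsigned s0_event, unsigned s1_event,
                  int count_user, int count_system)
{
    uint64_t pcr = 0;

    assert(s0_event <= PCR_SEL_MASK && s1_event <= PCR_SEL_MASK);
    if (count_user)
        pcr |= PCR_UT;
    if (count_system)
        pcr |= PCR_ST;
    pcr |= ((uint64_t)s0_event & PCR_SEL_MASK) << PCR_S0_SHIFT;
    pcr |= ((uint64_t)s1_event & PCR_SEL_MASK) << PCR_S1_SHIFT;
    return pcr;
}

/* Extract the cycle count from a raw TICK value by masking off TICK.npt. */
uint64_t tick_cycles(uint64_t raw_tick)
{
    return raw_tick & ~TICK_NPT;
}
```

A value built this way would be handed to the driver, since only privileged code can write the PCR. The event selector numbers themselves are listed in the CPU manual and are not reproduced here.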
There were two basic requirements in designing Perfmon. The first was to allow user programs to access the performance counters, which is a privileged operation. The second was to have lightweight access to the accumulated data to minimize the amount of error introduced by the act of reading the performance registers.
Since access to the PCR is always privileged, and access to the PIC and TICK is by default privileged, it was necessary to write some code that runs in the kernel context. For maximum flexibility and ease of installation, it was decided to write a loadable device driver rather than have a specially modified kernel.
The loadable driver is a standard, autoconfiguring SVR4 character device driver. In addition to the static structures and functions needed to support a device driver (see Writing Device Drivers in AnswerBook for more details), there are two functions in the device driver that needed Perfmon-specific code to be written.
When the device driver is initially loaded, the following sequence of events happens:
At this point, the driver is loaded and ready to accept requests from the user. The communication channel that is used between a user program and the device driver is the ioctl() system call. The sequence of events from the driver's point of view is:
/dev/perfmon. This ends up calling perfmon_open(), which the kernel locates through the static cb_ops structure. In the case of the Perfmon device driver, there is no special permission checking or state information that needs to be taken care of upon an open(), so perfmon_open() always returns successfully.
PERFMON_SETPCR: The user passed in a pointer to a 64-bit value that is to be stored in the PCR register. Since the pointer refers to a user address, which the kernel cannot safely dereference directly, we call copyin() to copy the value from the user's buffer into kernel space. Once we have the value, we simply call pm_set_pcr(), which sets the value of the PCR register.
PERFMON_GETPCR: The user passed in a pointer to a 64-bit buffer in which the current value of the PCR register is to be stored. We have the same address-space problem as before, so we call pm_get_pcr() to get the current PCR value and then store it in the user's buffer using copyout().
PERFMON_FLUSH_CACHE: In this case, we just call the pre-existing kernel routine that flushes the CPU's cache. This is done by reading a range of kernel addresses that is guaranteed to alias with all cache lines, causing them to be flushed. There was a slight problem with the implementation of this ioctl(), which is described in the section below.
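The dispatch logic for these three requests can be modeled in user space. The sketch below is not the driver's code: the command numbers are hypothetical, model_copyin() and model_copyout() are simple stand-ins for the kernel's copyin() and copyout(), and a plain variable stands in for the hardware PCR. It only shows the shape of the control flow, including the error returned when a user pointer cannot be copied.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command codes; the real values live in Perfmon's header. */
enum perfmon_cmd { PERFMON_SETPCR = 1, PERFMON_GETPCR, PERFMON_FLUSH_CACHE };

uint64_t model_pcr;     /* stands in for the hardware PCR */
int model_flushes;      /* counts simulated cache flushes */

/* Stand-in for copyin(): 0 on success, -1 on a bad user address. */
static int model_copyin(const void *uaddr, void *kaddr, size_t len)
{
    if (uaddr == NULL)
        return -1;
    memcpy(kaddr, uaddr, len);
    return 0;
}

/* Stand-in for copyout(): 0 on success, -1 on a bad user address. */
static int model_copyout(const void *kaddr, void *uaddr, size_t len)
{
    if (uaddr == NULL)
        return -1;
    memcpy(uaddr, kaddr, len);
    return 0;
}

/* Returns 0 on success or an errno value, as a driver ioctl routine would. */
int perfmon_ioctl_model(int cmd, void *arg)
{
    uint64_t value;

    switch (cmd) {
    case PERFMON_SETPCR:
        if (model_copyin(arg, &value, sizeof value) != 0)
            return EFAULT;
        model_pcr = value;          /* the driver calls pm_set_pcr() here */
        return 0;
    case PERFMON_GETPCR:
        value = model_pcr;          /* the driver calls pm_get_pcr() here */
        if (model_copyout(&value, arg, sizeof value) != 0)
            return EFAULT;
        return 0;
    case PERFMON_FLUSH_CACHE:
        model_flushes++;            /* the driver flushes the CPU cache here */
        return 0;
    default:
        return ENOTTY;              /* unknown request */
    }
}
```

Returning EFAULT for a failed copy and ENOTTY for an unknown request follows common driver convention; the text above does not say which error codes the Perfmon driver actually returns.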
If the driver is ever unloaded, the following sequence of events takes place:
Earlier during development, I saw cases on MP machines where the driver would load, make the cross-call to turn off the TICK.npt bit, and return. However, when I ran a user program that tried to read TICK, it would crash with an Illegal instruction error, indicating that the TICK.npt bit had not been turned off. If I waited a minute or two, the problem would go away and everything would work perfectly, implying that the cross-calls were working but were slow to take effect. This was finally resolved by adding calls to xc_attention() and xc_dismissed() around the cross-call. These functions force all CPUs into a tight loop, where they wait to receive cross-calls, and then release them. Since installing this code, I have not been able to reproduce the problem, so I am assuming that it is fixed.
The only tricky ioctl() to implement was PERFMON_FLUSH_CACHE, which flushes the cache on the current CPU. The actual flushing is done by calling a pre-existing kernel routine, cpu_flush_ecache(), that accesses a region of memory that aliases with each cache line in the CPU. The tricky part was getting access to this routine. Under Solaris 2.6 (where I did my initial development), cpu_flush_ecache() is a global kernel symbol, so I can simply reference it in my driver code and it is resolved when the driver is loaded. Under Solaris 2.5.1, however, this function is not a global symbol and cannot be resolved by the kernel module linker (krtld) at module load time, although the symbol can be resolved once the driver is loaded and running in kernel space. To support this function, I therefore need to make calls into krtld to resolve cpu_flush_ecache() myself. Luckily, this turned out to be less complicated than it sounds. The first time that a cache flush is requested by the user, I look up the address of cpu_flush_ecache() using kobj_getsymvalue(). I then keep a pointer to this function for later use, along with a flag indicating that I have attempted the lookup (since it is possible that the symbol does not exist). To make the driver MT-safe, the lookup is protected by a mutex lock to avoid any possible race conditions.
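The lookup-once pattern described above can be sketched in user space. This is an analogy rather than the driver's code: a POSIX mutex replaces the kernel mutex, lookup_symbol() is a hypothetical stand-in for kobj_getsymvalue(), and fake_flush() stands in for cpu_flush_ecache(). The counters exist only so that the behavior can be observed.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef void (*flush_fn_t)(void);

static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;
static flush_fn_t flush_fn;     /* cached address; NULL if the symbol is absent */
static int lookup_attempted;    /* set once, even when the lookup fails */

int lookup_calls;               /* demonstration counters */
int flush_calls;

/* Stand-in for the real cache flush routine. */
static void fake_flush(void)
{
    flush_calls++;
}

/* Hypothetical stand-in for a kernel symbol lookup. */
static flush_fn_t lookup_symbol(const char *name)
{
    lookup_calls++;
    return strcmp(name, "cpu_flush_ecache") == 0 ? fake_flush : NULL;
}

/* Resolve the routine on first use, then call it. Returns 0 or -1. */
int flush_cache(void)
{
    flush_fn_t fn;

    pthread_mutex_lock(&lookup_lock);
    if (!lookup_attempted) {
        flush_fn = lookup_symbol("cpu_flush_ecache");
        lookup_attempted = 1;   /* do not retry a failed lookup */
    }
    fn = flush_fn;
    pthread_mutex_unlock(&lookup_lock);

    if (fn == NULL)
        return -1;              /* symbol does not exist on this system */
    fn();
    return 0;
}
```

Keeping a separate "attempted" flag, rather than testing the pointer for NULL, means a missing symbol is looked up only once instead of on every request.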
The user-land component of Perfmon was relatively easy and quick to implement. The library functions are all written in assembly since they need to use special machine instructions to do their work. Also, since most of the performance counter registers are 64-bit, and the Solaris compilers and OS are currently 32-bit, the library routines had to split the 64-bit registers into two separate registers so that the calling C code could deal with them properly.
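On the C side, the two 32-bit halves returned by such a routine have to be reassembled or interpreted by the caller. The helpers below are a minimal sketch with hypothetical names. They assume that the 64-bit PIC holds two independent 32-bit counters, with the first counter in the low half and the second in the high half, which should be confirmed in the UltraSPARC-I User's Manual.

```c
#include <stdint.h>

/* Rebuild a 64-bit register value from the two halves handed back to C. */
uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Assumed PIC layout: first counter in bits 31:0, second in bits 63:32. */
uint32_t pic0_of(uint64_t pic)
{
    return (uint32_t)(pic & 0xFFFFFFFFu);
}

uint32_t pic1_of(uint64_t pic)
{
    return (uint32_t)(pic >> 32);
}

/*
 * Events counted between two readings of one 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result is still
 * correct if the counter wrapped around once between readings.
 */
uint32_t counter_delta(uint32_t before, uint32_t after)
{
    return after - before;
}
```

For example, a counter read as 0xFFFFFFF0 before a run and 0x00000010 after it has advanced by 0x20 events.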
Another issue for writing the user-level code was the fact that the performance counters are kept on a per-processor basis rather than a per-process basis. This means that if you run your program on an MP machine, and it migrates between CPUs during its run-time (which is pretty likely given Solaris' work-grabbing scheduler), the data read from the CPU performance counters is useless. Fortunately, there is a non-privileged system call named processor_bind() that will let you bind your process (or a single LWP) to a particular CPU.
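A minimal sketch of binding the calling process to one CPU with processor_bind() might look like the following. The wrapper name bind_self_to_cpu() is hypothetical. The Solaris-specific call is guarded so that the file still compiles elsewhere, where the wrapper simply reports that binding is unsupported.

```c
#include <errno.h>
#include <stddef.h>
#if defined(__sun)
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#endif

/*
 * Bind the calling process to the given CPU.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int bind_self_to_cpu(int cpu_id)
{
#if defined(__sun)
    /* P_PID with P_MYID selects the calling process. */
    return processor_bind(P_PID, P_MYID, (processorid_t)cpu_id, NULL);
#else
    (void)cpu_id;
    errno = ENOSYS;     /* processor_bind() is Solaris-specific */
    return -1;
#endif
}
```

The binding should be made before the counters are cleared, so that every reading comes from the same CPU. Passing P_LWPID instead of P_PID would bind a single LWP rather than the whole process.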
Also, since most programs using Perfmon require some setup, a skeleton program is provided to reduce the effort of developing and testing them. The basic outline of the skeleton program is:
The interaction between user code and driver code of a typical user program using Perfmon would go something like this:
            User Code             |           Kernel Code
C                   assembly      | C                    assembly
----------------------------------+----------------------------------
open("/dev/perfmon")              | perfmon_open()
                                  |
ioctl(PERFMON_SETPCR)             | perfmon_ioctl()
                                  |   copyin()
                                  |                      pm_set_pcr()
                                  |
ioctl(PERFMON_FLUSH_CACHE)        | perfmon_ioctl()
                                  |   cpu_flush_ecache()
                                  |
gethrtime()                       |
                    clr_pic()     |
                    cpu_sync()    |
                    get_pic()     |
                                  |
Run code to be analyzed           |
                                  |
                    cpu_sync()    |
                    get_pic()     |
gethrtime()                       |
                                  |
Analyze results                   |
                                  |
close("/dev/perfmon")             | perfmon_close()
Since Perfmon requires the installation of a device driver on every machine that is to run it, the installation procedure was designed to be as easy as possible for the system administrator.
Solaris supports the installation of a collection of files through a mechanism called packages. Each package consists of a collection of files to be installed, optional scripts that are run before and after installation/removal, and two files that are used by Solaris to identify the package and its components.
When the system administrator wishes to add Perfmon to a machine, they only need to type the command pkgadd -d MSUperf and answer "yes" to two questions. This causes the following sequence of events to occur:
depend file that is part of the Perfmon package. This tells the OS which system packages must be installed in order for Perfmon to function.
postinstall script that is included in the Perfmon package is executed. This causes the following events to take place:
/etc/devlink.tab is examined and any existing entries relating to Perfmon are deleted.
add_drv command. This causes a major device number to be picked by Solaris and registered in the file /etc/name_to_major. The file /etc/minor_perm is updated to reflect the desired permission on the Perfmon device node when it gets created.
/etc/devlink.tab gets the entry for the Perfmon device driver added to it. This will cause a symbolic link to be created from /dev/perfmon to the actual device node, which usually resides at /devices/pseudo/perfmon@0:perfmon. This is just for convenience to the user.
drvconfig -i perfmon.
/etc/devlink.tab. This is forced by running the program
At this point, the Perfmon driver is installed, loaded, and ready for use. If the system administrator ever wishes to remove Perfmon from the system, all they must do is execute the pkgrm MSUperf and all of the above steps are
There are plenty of things that can be done to extend the features and usefulness of Perfmon. Some of the items that are planned for the future are: