The first and foremost factor to keep in mind in choosing a methodology to address some particular workload question is the purpose or goal of the research. This is true whether the selection is from among the kinds of methods discussed here or from other sources.
The method selected must provide measures that allow the detection of operationally important changes in the operator's ability to satisfy job demands as a function of the workload variables being manipulated. It is not sufficient that a given measure or pattern of measures merely reveal decrements for one configuration of demands in relation to some other configuration; rather, the decrements must be meaningfully relatable to critical operational tasks in terms of operator reliability, system safety, or probability of meeting job goals.
Alternatively, and this is much more difficult to establish, the method should provide for compelling predictions of the extent to which the operator could satisfy the job demands under operational conditions, even where no decrements are found for a given workload configuration. At the same time, every possible effort within reason and the scope of available resources should be made to design the research so that maximum generality is possible across systems. Clearly, when the method and dependent variables to be measured are selected, commitments are implicitly made to a particular realm of discourse as regards system-workload parameters. The researcher must ensure that the basic problem or question which gave rise to the research in the first place can, in fact, be handled within that realm of discourse; the importance of the selection of dependent variables has been noted in considerable detail elsewhere.
The most pressing and difficult problem in assessing workload effects, whatever the method selected, is that of developing reliable, valid, quantitative criteria to reflect system performance. Criteria against which to evaluate research results are needed. Compelling distinctions must be available to distinguish acceptable from unacceptable, good from acceptable, and excellent from good performances of the system. The distinctions must be quantitative and reliable, and they must permit the disentanglement of operator performance, machine performance, and system (operator-machine) performance. Ultimately, what is needed is a method that permits assigning the reliable variance, as appropriate, to the human, to the machine, and to the human-machine interface.
For some specific questions, the problem may appear deceptively approachable. For example, in order to determine which of two instrument landing systems makes the smaller contribution to pilot workload, it should be possible to obtain accurate measures of such performances as the deviation of the aircraft from the glide slope and the localizer, and perhaps from command airspeed. Comparisons of these measures of performance obtained with the two displays should provide an index of their workload-inducing properties. However, it is entirely conceivable that one display would lead to smaller errors only because the pilot could, by working harder, take advantage of some peculiarity of that display in holding to the proper course. At the same time, the pilot might very well be less able to respond appropriately to some emergency condition that might arise from some other quarter. Thus, in this specific example, an additional variable would be needed: one that would shed light on how much of the pilot's workload capacity was being used up by each display.
The example is admittedly highly artificial, but one of its purposes is merely to illustrate that what appears to be a simple measurement problem may not be so simple or easily solved after all. Another purpose served by the example is to suggest that when conclusions based on a specific set of measures are drawn, the results may imply extrapolations that go well beyond the circumstances under which the measurements were made.
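The instrument-landing comparison can be made concrete with a small sketch. Everything in it is an assumption introduced for illustration: the function names, the use of root-mean-square deviation as the tracking measure, the premise that glide-slope and localizer deviations are recorded in the same units, and the use of a single secondary-task score (higher meaning more spare capacity) as the additional variable. The point is only that a display is declared less demanding when it wins on both tracking error and spare capacity; otherwise the comparison is left open.

```python
import math

def rms(deviations):
    """Root-mean-square of a list of signed deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def summarize(glide_slope_dev, localizer_dev, secondary_score):
    """Collapse one display's trials into two numbers.

    Assumption: both deviation series share the same units, so their
    RMS values can simply be added. secondary_score is a hypothetical
    measure of spare capacity (higher = more capacity left over).
    """
    return {
        "tracking_rms": rms(glide_slope_dev) + rms(localizer_dev),
        "secondary_score": secondary_score,
    }

def compare(display_a, display_b):
    """Name the less demanding display, or report that the data conflict."""
    a_tracks_better = display_a["tracking_rms"] < display_b["tracking_rms"]
    b_tracks_better = display_b["tracking_rms"] < display_a["tracking_rms"]
    a_more_reserve = display_a["secondary_score"] > display_b["secondary_score"]
    b_more_reserve = display_b["secondary_score"] > display_a["secondary_score"]
    if a_tracks_better and a_more_reserve:
        return "A"
    if b_tracks_better and b_more_reserve:
        return "B"
    # Smaller errors bought by working harder land here.
    return "inconclusive"
```

A display that yields smaller tracking errors but a lower secondary-task score is reported as "inconclusive", which is the case the example warns about.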
The approach to measurement described by Cotterman and Wood in their evaluation of performance in a space-vehicle simulator, as cited in the preceding section, appears to show considerable promise as a technique for converting "raw" performance measurements to probabilities of meeting criterion requirements. However, there is a gap between their application and the typical workload-measurement situation. Specifically, in the case of the Lunar Excursion Module, the maximum values of various parameters could be specified quite readily; for example, engineering specifications dictated that the impact velocity of the vehicle on landing could not exceed specified values without risk of damage. Such precise limits are less clearly identifiable in the majority of aircraft operating situations where, typically, rather broad latitude is possible in the flight parameters without risk of entering unsafe conditions of flight. Thus, in some areas of application, the specifications required by the Cotterman and Wood procedure might be somewhat arbitrary. Perhaps, at least for research purposes, it would be necessary and profitable to set up much more stringent criteria than normal. The criteria cannot be made too stringent, however, for the difficulty level must logically permit the typical operator from the population of successfully employed operators to perform satisfactorily under normal conditions.
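One simple way to turn raw measurements into a probability of meeting a criterion is sketched below. It is not a reconstruction of the Cotterman and Wood procedure, whose details are not given here; it assumes that each measure is approximately normally distributed across trials, that each criterion is an upper limit, and that the measures are statistically independent. The function names and the dictionary layout are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def prob_within_limit(samples, limit):
    """Estimate P(measure <= limit) from trial data.

    Assumption: the measure is roughly normal, so the sample mean and
    standard deviation are enough to describe it.
    """
    fitted = NormalDist(mean(samples), stdev(samples))
    return fitted.cdf(limit)

def prob_all_criteria_met(trials_by_measure, limits):
    """Probability that every measure stays within its limit.

    Assumption: measures are independent, so the individual
    probabilities can be multiplied. Correlated measures would need
    a joint model instead.
    """
    probability = 1.0
    for name, samples in trials_by_measure.items():
        probability *= prob_within_limit(samples, limits[name])
    return probability
```

Tightening a limit, as suggested for research purposes, lowers the resulting probability, which gives one way to check whether a proposed criterion still leaves the typical operator a reasonable chance of satisfactory performance.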
Where quantitative measures of "secondary tasks" are available from either system measures or measures of experimental tasks, the paired-comparisons scaling procedure could be used to develop a scale of workload or task difficulty. If adequate numbers of subjects were made available, and if suitable tasks and measurement situations could be agreed upon, substantial progress might even be made toward the development of a generalized method of specifying operator workload.
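A minimal sketch of paired-comparisons scaling follows, using Thurstone's Case V model as one common choice; the text does not say which variant was intended. The input format is an assumption: `wins[i][j]` counts how often condition i was judged (or measured, for example by a poorer secondary-task score) to be more demanding than condition j. Proportions are clamped away from 0 and 1 so that the normal deviate stays finite; the clamp bounds are arbitrary.

```python
from statistics import NormalDist

def thurstone_case_v(wins, floor=0.01):
    """Return interval-scale workload values, lowest condition set to 0.

    wins[i][j] = number of comparisons in which condition i was found
    more demanding than condition j (hypothetical input layout).
    """
    n = len(wins)
    standard_normal = NormalDist()
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i == j or total == 0:
                continue  # no information; contributes z = 0
            proportion = wins[i][j] / total
            # Clamp unanimous outcomes so inv_cdf stays finite.
            proportion = min(max(proportion, floor), 1.0 - floor)
            z_sum += standard_normal.inv_cdf(proportion)
        scale.append(z_sum / n)
    lowest = min(scale)
    return [value - lowest for value in scale]
```

The resulting values are meaningful only as differences between conditions, and their stability depends on the number of subjects and comparisons available, which is the practical constraint noted in the paragraph above.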