The Dangers and Pitfalls of Software Engineering Measurements
In the software engineering field, measurements are used to better understand, control, and improve behaviors and results throughout the software engineering process and product (Fenton 4). In practice, however, the software industry faces several disadvantages and problems when measuring software and its development processes. Until these issues are individually addressed, software engineering measurements should be used with caution. This paper attempts to impartially address many of these issues and to suggest methods of overcoming them.
One of the larger problems with software measurement is the discontinuity between “measurement research and measurement practice” (Pfleeger 33). Researchers in software engineering measurement are usually more interested in validating theoretical concepts and conclusions, while practitioners (software engineers or managers) usually want “short-term, useful results” that have been empirically tested (Pfleeger 34). More precisely, software engineering measurements are only useful when researchers, practitioners, and even customers work together to define their goals, state their needs, and solve their problems. Integrating research with practical application has several other advantages: new measurement technologies can be applied immediately, and full-scale processes and products can be implemented more easily. Unfortunately, “such collaboration is rare,” because these participants usually have motivations that are “different and sometimes conflicting” or do not communicate with each other properly (Pfleeger 33). For example, practitioners sometimes use the incorrect type of measurement scale when dealing with their data. This is not only the fault of practitioners for making a simple mistake; researchers are also at fault for not developing better measurement tools to assist and facilitate them, especially since practitioners are not interested in the details of software engineering measurement and are not intimately familiar with all the measurement rules, as researchers are or should be. Furthermore, practitioners will avoid using measurements that they do not understand well or are not comfortable using.
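To illustrate the kind of measurement-scale mistake mentioned above, consider a hypothetical set of defect severity ratings. Severities form an ordinal scale (the gaps between levels are not equal), so the arithmetic mean is not a meaningful summary, while the median is. The severity levels and data below are invented for illustration:

```python
# Hypothetical example of a measurement-scale mistake: defect severities
# form an ordinal scale, so the arithmetic mean is not a valid summary,
# but the median (which depends only on ordering) is.
from statistics import median

SEVERITY = {"cosmetic": 1, "minor": 2, "major": 3, "critical": 4}

reported = ["minor", "minor", "critical", "cosmetic", "major"]
codes = sorted(SEVERITY[s] for s in reported)

# Invalid on an ordinal scale: implies the gap between levels is uniform.
invalid_mean = sum(codes) / len(codes)

# Valid on an ordinal scale: depends only on the ordering of the levels.
valid_median = median(codes)

print(invalid_mean)   # 2.4 -- not a severity level at all
print(valid_median)   # 2  -- i.e. "minor"
```

A practitioner who averages such ratings is using a ratio-scale operation on ordinal data, exactly the kind of simple mistake the text describes.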
In short, researchers should develop better measurement tools to bridge the gap between researchers and practitioners, because practitioners who apply measurements incorrectly will most likely reach erroneous conclusions and could even be counterproductive. One possible measurement tool would be a catalogue of software engineering measurement techniques for different types of development projects and their processes, outlining which method or methods are best suited to each project type and why. Such a tool would be as simple as possible for practitioners to understand and use, while still having a strong theoretical backing for researchers.
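A minimal sketch of what such a catalogue could look like follows. The project types, metric names, and rationales here are invented placeholders, not a validated mapping; the point is only the shape of the tool: a practitioner asks by project type and receives a recommendation plus its reasoning.

```python
# Illustrative sketch of a measurement-selection aid: map a project
# type to a candidate metric and a one-line rationale. The entries
# below are invented placeholders, not a validated catalogue.
RECOMMENDATIONS = {
    "maintenance": ("change effort per module",
                    "tracks how costly modifications are over time"),
    "greenfield":  ("requirements volatility",
                    "early instability predicts later rework"),
    "safety-critical": ("test coverage",
                        "supports the goal of demonstrable testability"),
}

def recommend(project_type):
    """Return (metric, rationale) for a project type, if known."""
    return RECOMMENDATIONS.get(project_type,
                               ("none", "no guidance recorded for this type"))

metric, why = recommend("maintenance")
print(metric, "-", why)
```

The rationale field matters as much as the metric name: it keeps the development goal attached to the measurement, which is the failure mode discussed later in this paper.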
Another problem is the difficulty researchers have in correctly applying measurements to software code. For example, many researchers still question whether program or module size, measured as the total number of lines of code or as a Halstead measure (the total number of operators and operands), is proper and/or accurate enough. With either measurement, it is commonly believed that smaller programs or modules are better, because smaller size implies that they are less complex and easier to understand and change. Since the size and complexity of programs and modules are useful in predicting the effort needed to implement changes, a great deal of attention has been given to these measurements. Unfortunately, they can be misleading. Consider this example:
FOR i = 1 TO n DO
    READ (x[i])
This code fragment reads in a list of ‘n’ variables, has a total of 2 lines, and has a Halstead number of about 5. Even though this code fragment is small by both measurements, it is not robust enough to handle the case where the user wants to read any number of variables up to ‘n’. For example:
i = 1
WHILE (NOT EOF) AND (i <= n) DO
    READ (x[i])
    i = i + 1
END
This new code fragment has a total of 5 lines and a Halstead number of about 9 (Pfleeger 34). Even though the second code fragment is larger by both software measurements, it has better functionality and is more robust. Yet it is also more complex, harder to understand, and more difficult to test and modify. Clearly, both code fragments have advantages and disadvantages that are independent of their size or complexity, so researchers cannot and should not determine which code is better by size or complexity alone. It is therefore very easy for researchers or practitioners to misuse software engineering measurements, even when those measurements are individually valid and initially seem logical for certain applications. This is a problem of misusing measurement processes, whereby the “metrics are used without keeping the development goals in mind”; code size, for instance, should only “support goals of testability and maintainability” (Pfleeger 35). It can be inferred that a larger, more diverse set of metrics would be preferable for judging how well other goals are supported, even though a larger set of data would be more difficult to interpret. Thus, the practical applications of each type of measurement should be defined in advance, so that practitioners can be notified of these applications in a measurement tool.
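The two size measures discussed above can be sketched mechanically. The tokenizer below is deliberately naive (real Halstead counting requires language-aware classification of operators and operands), so its Halstead totals will not match careful hand counts; the point is only that both measures grow for the second, more robust fragment:

```python
# A crude sketch of the two size measures from the text: physical lines
# of code and the Halstead length N = N1 + N2 (operators plus operands).
# Token classification is deliberately naive and illustrative only.
import re

KEYWORDS = {"FOR", "TO", "DO", "WHILE", "READ", "END", "NOT", "EOF", "AND", "OR"}

def size_measures(fragment):
    lines = sum(1 for ln in fragment.splitlines() if ln.strip())
    tokens = re.findall(r"[A-Za-z]+|\d+|<=|[=+\-<>]", fragment)
    # Identifiers and literals count as operands; keywords and symbols
    # count as operators.
    operands = sum(1 for t in tokens
                   if t not in KEYWORDS and (t.isalpha() or t.isdigit()))
    operators = len(tokens) - operands
    return lines, operators + operands

loop = "FOR i = 1 TO n DO\n    READ (x[i])"
robust = ("i = 1\nWHILE (NOT EOF) AND (i <= n) DO\n"
          "    READ (x[i])\n    i = i + 1\nEND")

for name, frag in [("for-loop", loop), ("while-loop", robust)]:
    lines, halstead = size_measures(frag)
    print(name, lines, halstead)
```

Under this crude scheme the fragments measure (2 lines, Halstead 10) and (5 lines, Halstead 20): the line counts match the text, and the Halstead totals differ from the hand counts only because of how borderline tokens are classified, which itself illustrates how fragile such measures are.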
There are also several problems with measuring processes, which fall into the following categories. First, measurement processes “require validation, which is difficult to do.” Second, it is difficult for software managers to track all the process measures throughout the entire software life-cycle. It is relatively easier to measure individual processes when they are not intertwined with other processes, but unfortunately this is not adequate for tracking all process types. Third, measuring processes usually requires a model to better understand the “interrelation” between processes; however, such models are difficult to attain and more difficult to interpret.

A specific type of measurement process that has several problems is data mining. Data mining is “the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions” (Simoudis 26). The first step in data mining is data selection, in which practitioners select the target data they are interested in while ignoring extraneous data. This step raises the question of whether the practitioners’ selection is accurate or even useful, especially since the selection might be based on a hypothesis or opinion. The next step is to transform the data into a set that is more practical and easier to understand; this adds an extra level of complexity at which practitioners can easily make a mistake. The third step is the mining itself, in which practitioners use induction or reasoning to develop a classification scheme for the data they are examining. In the real world, this induction is commonly just someone’s opinion about how the data should be classified, not necessarily an accurate representation of the data. The final step is the interpretation of the results, which can be very opinionated and biased (Simoudis 27).
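The four data-mining steps can be walked through on toy data. The defect records, the derived metric, and the classification threshold below are all invented; the threshold in step 3 is exactly the kind of human judgement the text warns about:

```python
# A toy walk-through of the four data-mining steps: selection,
# transformation, mining (a hand-chosen classification rule), and
# interpretation. All records and thresholds are invented.
records = [
    {"module": "parser",  "loc": 1200, "defects": 30},
    {"module": "ui",      "loc": 400,  "defects": 2},
    {"module": "network", "loc": 900,  "defects": 18},
    {"module": "docs",    "loc": 0,    "defects": 0},   # extraneous
]

# 1. Selection: keep only records with code attached (already a judgement).
selected = [r for r in records if r["loc"] > 0]

# 2. Transformation: derive defect density (defects per KLOC).
for r in selected:
    r["density"] = r["defects"] / (r["loc"] / 1000)

# 3. Mining: classify by a chosen threshold -- an opinion, not a fact.
THRESHOLD = 10.0
for r in selected:
    r["risk"] = "high" if r["density"] > THRESHOLD else "low"

# 4. Interpretation: summarize -- equally open to bias.
high_risk = [r["module"] for r in selected if r["risk"] == "high"]
print(high_risk)   # ['parser', 'network']
```

Note that a different (equally defensible) threshold in step 3 would change the conclusion in step 4, which is the paper's point about opinion entering the process.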
“Thus, even as attention turns increasingly to process in the larger community, process measurement research and practice lag behind the use of other measurements” (Pfleeger 39). Therefore, there is a strong need for researchers to further develop methods to measure processes that are simple and easy enough for practitioners to understand and use.
There are even problems with the measurements themselves, even though measuring a software product might seem simple and absolute. It is not uncommon in the real world for the integrity of data to be questionable. Measurements can easily be incorrect, incomplete, or inaccurate, especially when researchers and practitioners do not take into account the human factor in data collecting (Glass 15). Measurements are sometimes taken through surveys or questionnaires, in which the responses given are what engineers think the answer should be, “rather than what they believe to be true.” In other cases, measurements are just estimations, guesses, or (even worse) made up, and are later treated as concrete evidence (Glass 16). Obviously, these problems can quickly result in erroneous measurement predictions and models. Unfortunately, it is very difficult to account for the human factor in data collecting, and researchers need to address this issue more carefully. The only current method for checking the quality of measurements is careful inspection of the data-collecting process, yet this does not guarantee the quality of the inspections or the data.
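Part of such an inspection could be automated. The sketch below shows what sanity checks on self-reported effort figures might look like; the specific checks and thresholds are invented examples, not an established method:

```python
# A minimal sketch of automated sanity checks that flag suspicious
# survey responses before they are treated as concrete evidence.
# The checks and thresholds are invented illustrations.
def suspicious(hours_reported):
    """Return reasons a reported weekly effort figure deserves review."""
    reasons = []
    if hours_reported < 0:
        reasons.append("negative effort is impossible")
    if hours_reported > 80:
        reasons.append("more than 80 hours in a week is implausible")
    if hours_reported > 0 and hours_reported % 10 == 0:
        reasons.append("round number suggests a guess, not a measurement")
    return reasons

for h in [37.5, 40, -2]:
    print(h, suspicious(h))
```

Such checks cannot prove a figure is honest, but they cheaply surface the estimations and guesses that the text says get mistaken for concrete evidence.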
There are also several problems with measuring software resources. First, it is very difficult to quantify the “quality and variability” of software engineers (Ghezzi 418 and Pfleeger 40). There is “little attention being paid to human resource measurement, as developers and managers find it threatening” and too complex to do. In contrast, “more attention has been paid to other resources: budget and schedule assessment, and effort, cost, and schedule prediction.” Many models can be developed to predict each of these measurements, but there is no method for integrating them into a single comprehensive model. Even though such a holistic model would be more accurate than any individual model, it has never been known to be applied in practice. The second problem with measuring resources, therefore, is that a holistic model covering all resource measurements exists only in theory. And finally, resource models have a short life-span when they are applied and are not used widely enough by practitioners (Pfleeger 40). As a result, there are several problems with measuring software resources that need to be addressed by both researchers and practitioners.
“A measure can be useful as a predictor without being valid in the sense of measurement theory.” In other words, a measurement can have some usefulness in predicting future values of the same measurement even when it is not valid under measurement theory. To complicate the issue further, model validity is separate from measurement validity, because “more accurate models” can be developed on which to base better measurements (Pfleeger 36). Since the complexity of measurement theory is not directly related to measurement validity or model validity, it is increasingly likely that practitioners will not have a complete understanding of software measurements and all of their components (Pfleeger 37). This again shows the need for researchers to develop better tools to educate practitioners in the complex field of software engineering measurements.
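The predictor-without-validity point can be made concrete with a least-squares line fitted from module size to past change effort. The data points are invented, and nothing here claims lines of code validly measure complexity; the line is useful purely because it predicts:

```python
# A sketch of a measure used purely as a predictor: fit a straight
# line from module size to past change effort (invented data) and use
# it only to forecast, making no claim of measurement-theoretic validity.
sizes   = [100, 200, 400, 800]        # lines of code
efforts = [5.0, 9.0, 17.0, 33.0]      # hours to modify (hypothetical)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(efforts) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, efforts))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

def predict(size):
    """Forecast change effort (hours) from module size."""
    return intercept + slope * size

print(round(predict(300), 1))   # 13.0
```

Whether the fitted model itself is any good is a separate question of model validity, which is exactly the distinction drawn above.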
Unfortunately, in the software industry, prediction models have a short life-span when they are applied, and they are not used widely enough by practitioners. For instance, “the lack of models in software engineering is symptomatic of a lack of systems focus” and/or understanding (Pfleeger 37). There is also a misconception among practitioners that models are too vague and promise too much, which also needs to be addressed. Only once practitioners understand that useful results from prediction models do not come quickly, but with time and patience, will software engineering measurements be more accepted in the software industry.
In conclusion, software engineering measurements have several issues and problems that still need to be addressed by both researchers and practitioners. This is “part the fault of researchers, who have not described the limitations of and constraints on techniques put forth for practical use” or documented them for easy use and understanding (Pfleeger 42). The scientific community cannot expect practitioners to be as well informed as researchers “in statistics, probability, or measurement theory, or even in the intricacies of calculating code complexity or modeling parameters. Instead, we encourage researchers to fashion results into tools and techniques that practitioners can easily understand and apply” (Pfleeger 42). The software engineering measurement problems discussed here are the following: the discontinuity between measurement research and practice, and the incorrect application of measurements. Other problems were found in measuring the software development process, the software product (such as code or design documentation), and even the software resources. These problems show a strong need for researchers to further develop better methods of measuring software engineering, such as bridging the gap between researchers and practitioners so that practitioners can more easily and correctly use methods for measuring software. Researchers also need to take into consideration the human factor in data collecting, which can be very error-prone. As a result of all these factors and more, practitioners would be greatly helped by a single tool that assists them throughout the entire measuring process, not just one that does mathematical calculations on measurements.
by Phil for Humanity
NOTE: This paper was first published in the Fall of 1997.
- Fenton, Norman E., and Shari Lawrence Pfleeger. (1997). Software metrics: A Rigorous & Practical Approach. Second Edition. PWS Publishing Company and International Thomson Computer Press.
- Ghezzi, Carlo, Mehdi Jazayeri, and Dino Mandrioli. (1991). Fundamentals of Software Engineering. Prentice-Hall, Inc.
- Glass, Robert L. (1997 July/August). Telling Good Numbers from Bad Ones. IEEE Software, pp. 15-16, 19.
- Jain, Raj. (1991). The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley & Sons, Inc.
- Pfleeger, Shari Lawrence, Ross Jeffery, Bill Curtis, and Barbara Kitchenham. (1997 March/April). Status Report on Software Measurement. IEEE Software, pp. 33-37, 39-42.
- Simoudis, Evangelos. (1996 October). Reality Check for Data Mining. IEEE Expert: Intelligent Systems and Their Applications, pp. 26-33.