Verification & Validation Channel

Verification is a quality control process to evaluate whether a system complies with a set of specifications or regulations. Validation is a quality assurance process to establish a high degree of assurance that a system fulfills its intended requirements. This series focuses on the impacts on analysis, design, build, and test when trying to answer the two questions "are you building the right system?" and "did you build it correctly?"

Unit test tools and automatic test generation

Monday, March 19th, 2012 by Mark Pitchford

When are unit test tools justifiable? Ultimately, the justification of unit test tools comes down to a commercial decision. The later a defect is found in product development, the more costly it is to fix (Figure 1). This is a concept first established in 1975 with the publication of Brooks' "The Mythical Man-Month" and proven many times since through various studies.

The later a defect is identified, the higher the cost of rectifying it.

The automation of any process changes the dynamic of commercial justification. This is especially true of test tools given that they make earlier unit test more feasible. Consequently, modern unit test almost implies the use of such a tool unless only a handful of procedures are involved. Such unit test tools primarily serve to automatically generate the harness code which provides the main and associated calling functions or procedures (generically “procedures”). These facilitate compilation and allow unit testing to take place.
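To make that concrete, here is a minimal sketch of the kind of harness such a tool might generate. The procedure under test, set_motor_speed(), and its parameters are purely hypothetical, and a real tool would produce considerably more elaborate scaffolding, but the structure is the same: assign the inputs, call the procedure in isolation, and check the result.

```c
#include <stdio.h>

/* Hypothetical procedure under test (it would normally live in its own
   source file and be pulled into the harness at build time). */
static int set_motor_speed(int requested_rpm, int max_rpm)
{
    return (requested_rpm > max_rpm) ? max_rpm : requested_rpm;
}

/* Sketch of the kind of main() a unit test tool generates: assign the
   inputs chosen by the operator, call the procedure in isolation, and
   compare the result against the expected value. */
int main(void)
{
    int requested_rpm = 3000;
    int max_rpm       = 2500;
    int expected      = 2500;

    int actual = set_motor_speed(requested_rpm, max_rpm);

    printf("set_motor_speed(%d, %d) = %d, expected %d: %s\n",
           requested_rpm, max_rpm, actual, expected,
           (actual == expected) ? "PASS" : "FAIL");
    return (actual == expected) ? 0 : 1;
}
```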

The tools not only provide the harness itself, but also statically analyze the source code to provide the details of each input and output parameter or global variable in an easily understood form. Where unit testing is performed on an isolated snippet of code, stubbing of called procedures can be an important aspect of unit testing. This can also be automated to further enhance the efficiency of the approach.
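The sketch below illustrates the stubbing idea under the same caveats: read_sensor() and sensor_out_of_range() are invented for the example, and a stub generated by a tool would be more general than this hand-coded one. The point is that a stub lets the operator dictate the values returned to the procedure under test without involving the real hardware or the rest of the application.

```c
#include <stdio.h>

/* The real read_sensor() talks to hardware; for an isolated unit test a
   stub takes its place, returning whatever value the test case dictates. */
static int stub_sensor_value = 0;

int read_sensor(void)                 /* stub replacing the real procedure */
{
    return stub_sensor_value;
}

/* Procedure under test: depends on read_sensor(), which is now stubbed. */
int sensor_out_of_range(int limit)
{
    return read_sensor() > limit;
}

int main(void)
{
    stub_sensor_value = 150;          /* drive the stub for this test case */
    printf("out_of_range: %d (expected 1)\n", sensor_out_of_range(100));

    stub_sensor_value = 50;           /* and again for the next test case  */
    printf("out_of_range: %d (expected 0)\n", sensor_out_of_range(100));
    return 0;
}
```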

This automation makes the assignment of values to the procedure under test a simpler process, and one which demands little intimate knowledge of the code on the part of the test tool operator. This distance enables the necessary unit test objectivity because it divorces the test process from that of code development where circumstances require it, and from a pragmatic perspective, substantially lowers the level of skill required to develop unit tests.

This ease of use means that unit test can now be considered a viable development option with each procedure targeted as it is written. When these early unit tests identify weak code, the code can be corrected immediately while the original intent remains fresh in the mind of the developer.

Automatically generating test cases

Generally, the output data generated through unit tests is an important end in itself, but this is not necessarily always the case. There may be occasions when the fact that the unit tests have successfully completed is more important than the test data itself. This happens when source code is tested for robustness. To provide for such eventualities, it is possible to use test tools to automatically generate test data as well as the test cases. High levels of code execution coverage can be achieved by this means alone, and the resultant test cases can be complemented by means of manually generated test cases in the usual way.
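As a much-simplified illustration of the principle (the adc_to_percent() procedure and the choice of values are assumptions for the example, not the output of any particular tool), automatically generated robustness data amounts to sweeping boundary and extreme values through the procedure and confirming that every call completes:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical procedure under test: convert a raw ADC reading into a
   percentage, with guards against out-of-range input. */
static int adc_to_percent(int raw)
{
    if (raw < 0)    { return 0;   }
    if (raw > 1023) { return 100; }
    return (raw * 100) / 1023;
}

/* Crude illustration of automatically generated robustness data: sweep
   boundary and extreme values and confirm that every call completes.
   Here the fact that the sweep finishes matters more than the outputs. */
int main(void)
{
    const int samples[] = { INT_MIN, -1, 0, 1, 511, 1023, 1024, INT_MAX };
    const size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++) {
        (void)adc_to_percent(samples[i]);
    }
    printf("robustness sweep completed: %u calls\n", (unsigned)n);
    return 0;
}
```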

An interesting application for this technology involves legacy code. Such code is often a valuable asset, proven in the field over many years, but likely developed on an experimental, ad hoc basis by a series of expert “gurus” – expert at getting things done and in the application itself, but not necessarily at complying with modern development practices.

Frequently this SOUP (software of unknown pedigree) forms the basis of new developments which are obliged to meet modern standards either due to client demands or because of a policy of continuous improvement within the developer organization. This situation may be further exacerbated by the fact that coding standards themselves are the subject of ongoing evolution, as the advent of MISRA C:2004 clearly demonstrates.

If there is a need to redevelop code to meet such standards, then there is a need not only to identify the aspects of the code which do not meet them, but also to ensure that in doing so the functionality of the software is not altered in unintended ways. The existing code may well be the soundest or only documentation available, and so a means needs to be provided to ensure that it is dealt with as such.

Automatically generated test cases can be used to address just such an eventuality. By generating test cases using the legacy code and applying them to the rewritten version, it can be proven that the only changes in functionality are those deemed desirable at the outset.
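A minimal sketch of that idea is shown below, with both the "legacy" checksum and its rewrite invented for the purpose of illustration. A test tool would capture the legacy outputs as a baseline file rather than calling both versions side by side, but the principle is the same: the same generated vectors are applied to both implementations, and any divergence is flagged.

```c
#include <stdio.h>

/* Hypothetical legacy implementation, proven in the field. */
static int checksum_legacy(const unsigned char *data, unsigned int len)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < len; i++) {
        sum += data[i];
    }
    return (int)(sum & 0xFFu);
}

/* Rewritten version intended to be functionally identical. */
static int checksum_rewrite(const unsigned char *data, unsigned int len)
{
    unsigned int sum = 0;
    while (len-- > 0u) {
        sum += *data++;
    }
    return (int)(sum & 0xFFu);
}

/* Apply the same generated test vectors to both versions and flag any
   divergence in behavior between the legacy code and the rewrite. */
int main(void)
{
    unsigned char buf[16];
    int failures = 0;

    for (unsigned int seed = 0; seed < 100u; seed++) {
        for (unsigned int i = 0; i < sizeof buf; i++) {
            buf[i] = (unsigned char)(seed * 31u + i * 7u);
        }
        if (checksum_legacy(buf, sizeof buf) != checksum_rewrite(buf, sizeof buf)) {
            failures++;
        }
    }
    printf("%d divergences between legacy and rewritten code\n", failures);
    return failures;
}
```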

The Apollo missions may have seemed irrelevant at the time, and yet hundreds of everyday products were developed or modified using aerospace research—from baby formula to swimsuits. Formula One racing is considered a rich man’s playground, and yet British soldiers benefit from the protective qualities of the light, strong materials first developed for racing cars. Hospital patients and premature babies stand a better chance of survival than they would have done a few years ago, thanks to the transfer of F1 know-how to the medical world.

Likewise, unit testing has long been perceived to be a worthy ideal—an exercise for those few involved with the development of high-integrity applications with budgets to match. But the advent of unit test tools offers mechanisms that optimize the development process for all. The availability of such tools has made this technology and unit testing itself an attractive proposition for applications where sound, reliable code is a commercial requirement, rather than only those applications with a life-and-death imperative.

Unit, Regression and System Testing

Monday, February 20th, 2012 by Mark Pitchford

While unit testing at the time of development is a sound principle to follow, all too often ongoing development compromises the functionality of the software that is already considered complete. Such problems are particularly prevalent when adding functionality to code originally written with no forethought for later enhancements.

Regression testing is what's needed here. By using a test case file to store a sequence of tests created for the original SOUP (software of unknown pedigree), it is possible to recall and reapply it to the revised code to prove that none of the original functionality has been compromised.

Once configured, this regression testing can be initiated as a background task and run perhaps every evening. Reports highlight any changes to the output generated by earlier test runs. In this way, any code modifications leading to unintentional changes in application behavior can be identified and rectified immediately.

Modern unit test tools come equipped with user-friendly, point-and-click graphical user interfaces. However, when faced with thousands of test cases, a GUI is not always the most efficient way to handle the development of test cases. In recognition of this, test tools are designed to allow these test case files to be developed directly from applications such as Microsoft Excel. As before, the "regression test" mechanism can then be used to run the test cases held in these files.
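The sketch below shows the principle in its simplest possible form; the median3() procedure, the CSV layout, and the file name are assumptions for the example rather than the format of any particular tool. Each line of the file, which could be exported directly from a spreadsheet, becomes one test case, and the same file can then be replayed as part of the regression run.

```c
#include <stdio.h>

/* Hypothetical procedure under test: return the median of three values. */
static int median3(int a, int b, int c)
{
    if ((a >= b && a <= c) || (a <= b && a >= c)) { return a; }
    if ((b >= a && b <= c) || (b <= a && b >= c)) { return b; }
    return c;
}

/* Read test vectors of the form "a,b,c,expected" from a CSV file (for
   example, exported from a spreadsheet) and run each line as a test case. */
int main(void)
{
    FILE *f = fopen("median3_tests.csv", "r");
    if (f == NULL) {
        perror("median3_tests.csv");
        return 1;
    }

    int a, b, c, expected;
    int failures = 0;

    while (fscanf(f, "%d,%d,%d,%d", &a, &b, &c, &expected) == 4) {
        int actual = median3(a, b, c);
        if (actual != expected) {
            printf("FAIL: median3(%d,%d,%d) = %d, expected %d\n",
                   a, b, c, actual, expected);
            failures++;
        }
    }
    fclose(f);
    return failures;
}
```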

Unit and system test in tandem

Traditionally, many applications have been tested by functional means only. The source code is written in accordance with the specification, and then tested to see if it all works. The problem with this approach is that no matter how carefully the test data is chosen, the percentage of code actually exercised can be very limited.

That issue is compounded by the fact that the procedures tested in this way are only likely to handle data within the range of the current application and test environment. If anything changes a little – perhaps in the way the application is used or perhaps as a result of slight modifications to the code – the application could be running entirely untested execution paths in the field.

Of course, if all parts of the system are unit tested and collated on a piecemeal basis through integration testing, then this will not happen. But what if timescales and resources do not permit such an exercise? Unit test tools often provide the facility to instrument code. This instrumented code is equipped to “track” execution paths, providing evidence of the parts of the application which have been exercised during execution. Such an approach provides the information to produce data such as that depicted in Figure 1.
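The following is a deliberately simplified sketch of what instrumentation amounts to; a real tool inserts and manages the probes automatically and maps them back to the flow graph. A probe is planted on each branch, and after execution the probe record shows which paths were exercised.

```c
#include <stdio.h>

/* Simplified illustration of instrumentation: probes are inserted on each
   branch of the original code so that, after execution, the tool can
   report which paths were exercised and color the flow graph accordingly. */
#define NUM_PROBES 2
static unsigned char probe_hit[NUM_PROBES];
#define PROBE(n) (probe_hit[(n)] = 1u)

static int abs_value(int x)
{
    if (x < 0) {
        PROBE(0);                   /* negative branch     */
        return -x;
    }
    PROBE(1);                       /* non-negative branch */
    return x;
}

int main(void)
{
    (void)abs_value(5);             /* only the non-negative branch runs */

    for (int i = 0; i < NUM_PROBES; i++) {
        printf("probe %d: %s\n", i,
               probe_hit[i] ? "exercised" : "not exercised");
    }
    return 0;
}
```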

Color-coded dynamic flow graphs and call graphs illustrate the parts of the application which have been exercised. In this example, note that the red coloring highlights exercised code.

Code coverage is an important part of the testing process in that it shows the percentage of the code that has been exercised and proven during test. Proof that all of the code has been exercised correctly need not be based on unit tests alone. To that end, some unit tests can be used in combination with system test to provide a required level of execution coverage for a system as a whole.

This means that the system testing of an application can be complemented by unit tests that execute code which would not normally be exercised in the running of the application. Examples include defensive code (e.g., to prevent crashes due to inadvertent division by zero), exception handlers, and interrupt handlers.
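For example, a guard against division by zero should never be triggered in the running application, so system test alone cannot exercise it; a unit test can drive it directly and contribute that path to the overall coverage figure. The average_per_slot() procedure below is a hypothetical illustration:

```c
#include <stdio.h>

/* Hypothetical procedure with defensive code: a slot count of zero should
   never occur in the running application, so system test alone will not
   exercise the guard. */
static int average_per_slot(int total, int slots)
{
    if (slots == 0) {
        return 0;               /* defensive path: unreachable in normal use */
    }
    return total / slots;
}

/* A unit test can drive the defensive path directly and add it to the
   execution coverage achieved by system test. */
int main(void)
{
    int result = average_per_slot(100, 0);
    printf("defensive path returned %d (expected 0): %s\n",
           result, (result == 0) ? "PASS" : "FAIL");
    return (result == 0) ? 0 : 1;
}
```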

Unit test is just one weapon in the developer's armory. By using automated unit test both in isolation and in tandem with other techniques, the development of robust and reliable software doesn't need to carry the heavy development overhead it once did.

Unit testing: why bother?

Tuesday, October 25th, 2011 by Mark Pitchford

Unit test? Great in theory, but…

Unit test has been around almost as long as software development itself. It just makes sense to take each application building block, build it in isolation, and execute it with test data to make sure that it does just what it should do without any confusing input from the remainder of the application.

Without automation, the sting comes from not being able to simply lift a software unit from its development environment, compile and run it – let alone supply it with test data. For that to happen, you need a harness program acting as a holding mechanism that calls the unit, pulls in any included files, provides "stubs" to handle any procedure calls made by the unit, and offers any initialization sequences which prepare data structures for the unit under test to act upon. Not only is creating that harness laborious, but it takes a lot of skill. More often than not, the harness program requires at least as much testing as the unit under test.

Perhaps more importantly, a fundamental requirement of software testing is to provide an objective, independent view of the software. The very intimate code knowledge required to manually construct a harness compromises the independence of the test process, undermining the legitimacy of the exercise.

Deciding when to unit test

Unit test is not always justifiable and can vary in extent and scope depending on commercial issues such as the cost of failure in the field or the time unit testing will take.

To determine whether to move forward, you need to ask a couple of questions:

  • If unit testing is to take place, how much is involved?
  • Is it best to invest in a test tool, or is it more cost effective to work from first principles?

Developers must make pragmatic choices. Sometimes the choice is easy based on the criticality of the software. If the software fails, what are the implications? Will anyone be killed, as might be the case in aircraft flight control? Will the commercial implications be disproportionately high, as exemplified by a continuous plastics production plant? Or are the costs of recall extremely high, perhaps in a car’s engine controller? In these cases, extensive unit testing is essential and any tools that aid in that purpose make sense. On the other hand, if software is developed purely for internal use or is perhaps a prototype, then the overhead in unit testing all but the most vital of procedures would be prohibitive.

As you might expect, there is a grey area. Suppose the application software controls a mechanical measuring machine where the quantity of the devices sold is low and the area served is localized. The question becomes: Would the occasional failure be more acceptable than the overhead of unit test?

In these circumstances, it’s useful to prioritize the parts of the software which are either critical or complex. If a software error leads to a strangely colored display or a need for an occasional reboot, it may be inconvenient but not justification for unit testing. On the other hand, the unit test of code which generates reports showing whether machined components are within tolerance may be vital.

Beyond unit test

For some people, the terms “unit test” and “module test” are synonymous. For others, the term “unit” implies the testing of a single procedure, whereas “module” suggests a collection of related procedures, perhaps designed to perform some particular purpose within the application.

Using the latter definitions, manually developed module tests are likely to be easier to construct than unit tests, especially if the module represents a functional aspect of the application itself. In this case, most of the calls to procedures are related and the code accesses related data structures, which makes the preparation of the harness code more straightforward.

Test tools render the distinction between unit and module tests redundant. It is entirely possible to test a single procedure in isolation and equally possible to use the exact same processes to test multiple procedures, a file, or multiple files of procedures, a class (where appropriate), or a functional subset of an entire system. As a result, the distinction between unit and module test is one which has become increasingly irrelevant to the extent that the term “unit test” has come to include both concepts.

Such flexibility facilitates progressive integration testing. Procedures are first unit tested and then collated as part of the subsystems, which in turn are brought together to perform system tests. It also provides options when a pragmatic approach is required for less critical applications. A single set of test cases can exercise a specified procedure in isolation, with all of the procedures called as a result of exercising the specified procedure, or anything in between (See Figure 1). Test cases that prove the functionality of the whole call chain are easily constructed. Again, it is easy to “mix and match” the processes depending on the criticality of the code under review.

A single test case (inset) can exercise some or all of the call chain associated with it. In this "AdjustLighting" example, note that the red coloring highlights exercised code.

This all-embracing unit test approach can be extended to multithreaded applications. In a single-threaded application, the execution path is well-defined and sequential, such that no part of the code may be executed concurrently with any other part. In applications with multiple threads, there may be two or more paths executed concurrently, with interaction between the threads a commonplace feature of the system. Unit test in this environment ensures that particular procedures behave in an appropriate manner both internally and in terms of their interaction with other threads.
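A much-simplified sketch of the idea is shown below, using POSIX threads and a hypothetical increment_counter() procedure; a unit test tool working on a real target would manage the threading environment itself, but the intent is the same: exercise the procedure from concurrent threads and check that the shared data remains consistent.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical shared resource and the procedure under test. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void increment_counter(void)
{
    pthread_mutex_lock(&counter_lock);
    counter++;
    pthread_mutex_unlock(&counter_lock);
}

/* Test thread: hammer the procedure under test. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        increment_counter();
    }
    return NULL;
}

/* Unit test exercising the procedure from two concurrent threads and
   checking that the interaction leaves the shared data consistent. */
int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("counter = %ld (expected 200000): %s\n",
           counter, (counter == 200000L) ? "PASS" : "FAIL");
    return (counter == 200000L) ? 0 : 1;
}
```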

Sometimes, testing a procedure in isolation is impractical. For instance, if a particular procedure relies on the existence of some ordered data before it can perform its task, then similar data must be in place for any unit test of that procedure to be meaningful.

Just as unit test tools can encompass many different procedures as part of a single test, they can also use a sequence of tests with each one having an effect on the environment for those executed subsequently. For example, unit testing a procedure which accesses a data structure may be achieved by first implementing a test case to call an initialization procedure within the application, and then a second test case to exercise the procedure of interest.
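The sketch below illustrates such a sequence with a hypothetical lookup table: the first test case calls the application's own initialization procedure, and the second exercises the procedure of interest against the state that the first one created.

```c
#include <stdio.h>

/* Hypothetical application code: a lookup table that must be initialized
   before the procedure of interest can do anything meaningful. */
#define TABLE_SIZE 8
static int table[TABLE_SIZE];
static int table_ready = 0;

void table_init(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i] = i * i;
    }
    table_ready = 1;
}

int table_lookup(int index)
{
    if (!table_ready || index < 0 || index >= TABLE_SIZE) {
        return -1;              /* defensive path */
    }
    return table[index];
}

/* Sequenced test cases: the first calls the application's own
   initialization procedure, the second exercises the procedure of
   interest against the environment the first one established. */
int main(void)
{
    table_init();                          /* test case 1: set up state */

    int result = table_lookup(3);          /* test case 2: exercise     */
    printf("table_lookup(3) = %d (expected 9): %s\n",
           result, (result == 9) ? "PASS" : "FAIL");
    return (result == 9) ? 0 : 1;
}
```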

Unit test does not imply testing in only the development environment. Integration between test tools and development environments means that unit testing of software can take place seamlessly using the compiler and target hardware. This is another example of the development judgments required to find an optimal solution – from performing no unit test at all, through to testing all code on the target hardware. The trick is to balance the cost of test against the cost of failure, and the overhead of manual test versus the investment cost in automated tools.

Does adding an IP address change embedded designs?

Thursday, September 15th, 2011 by Robert Cravotta

A recent analysis from McAfee titled "Caution: Malware Ahead" suggests that the number of IP-connected devices will grow by a factor of fifty over a ten-year period, based on the number of IP-connected devices last year. The bulk of these devices are expected to be embedded systems. Additionally, connected devices are evolving from a one-way data communication path to a two-way dialog – creating potential new opportunities for hacking embedded systems.

Consider that each Chevy Volt from General Motors has its own IP address. The Volt uses an estimated 10 million lines of code executing over approximately 100 control units, and the number of test procedures to develop the vehicle was “streamlined” from more than 600 to about 400. According to Meg Selfe at IBM, they use the IP-connection for a few things today, like finding a charging station, but they hope to use it to push more software out to the vehicles in the future.

As IP-connected appliances become more common in the home and on the industrial floor, will the process for developing and verifying embedded systems change – or is the current process sufficient to address the possible security issues of selling and supporting IP-connected systems? Is placing critical and non-critical systems on separate internal networks sufficient in light of the intent of being able to push software updates to both portions of the system? Is the current set of development tools sufficient to enable developers to test and ensure their system's robustness against malicious attacks? Will new tools surface, or will they derive from tools already used in highly safety-critical application designs? Does adding an IP address to an embedded system change how we design and test it?

What should design reviews accomplish?

Wednesday, September 7th, 2011 by Robert Cravotta

I remember my first design review. Well, not exactly the review itself, but I remember the lessons I learned while doing it because it significantly shifted my view of what a design review is supposed to accomplish. I was tasked with reviewing a project and providing comments about the design. It was the nature of my mentor's response to my comments that started to shape my understanding that there can be disconnects between idealism and practicality.

In this review, I was able to develop a pretty detailed understanding of how the design was structured and how it would work. The idealist in me compelled me to identify not only potential problems in the design but also to suggest better ways of implementing portions of the design. My mentor's response to my suggestions caught me completely by surprise – he did not want to hear the suggestions. According to him, the purpose of the review was to determine whether the design did or did not meet the system requirements. The time for optimizing design decisions had passed – would the design accomplish the requirements or not?

His response baffled and irked me. Wasn’t a design review part of the process of creating the best design possible? Also, I had some really blindingly brilliant observations and suggestions that were now going to go to waste. Looking back, I think the hardline approach my mentor took helped make me a better reviewer and designer.

As it turns out, my suggestions were not discarded without a look; however, the design review is not the best point in the design cycle to explore the subtle nuances of one design approach versus another. Those types of discussions should have occurred and been completed before the design review process even started. On the other hand, for areas where the design does not or might not meet the system requirements, it is imperative that a discussion be initiated to identify where and why there might be some risks in the current design approach. My mentor’s harsh approach clarified the value of focusing observations and suggestions to those parts of the design that will yield the highest return for the effort spent doing the review.

Does this sound like how your design reviews proceed or do they take a different direction? What should be the primary accomplishment of a successful design review and what are those secondary accomplishments that may find their way into the engineering efforts that follow the review process?

Is bigger and better always better?

Wednesday, April 13th, 2011 by Robert Cravotta

The collision between an Airbus A380 and a Bombardier CRJ-700 this week at John F. Kennedy International Airport in New York City reminded me of some parallels and lessons learned from when we upgraded the target processor to a faster version. I shared one of the lessons learned from that event in an article about adding a version control inquiry into the system. A reader added that the solution we used could still suffer from a versioning mismatch and suggested that version identifications also include an automatically calculated date and time stamp of the compilation. In essence, these types of changes in our integration and checkout procedures helped mitigate several sources of human or operator error.

The A380 is currently the world's largest passenger jet, with a wingspan of 262 feet. The taxiways at JFK Airport are a standard 75 feet wide, but this accident is not purely the result of the plane being too large, as an Operation Plan for handling A380s at JFK Airport has been in successful use since the third quarter of 2008. The collision between the A380 and the CRJ appears to be the result of a series of human errors stacking onto each other (similar to the version inquiry scenario). Scanning the 36-page operation plan for the A380 provides a sense of how complicated it is to manage the ground operations for these behemoths.

Was the A380 too large for the taxiway? Did the CRJ properly clear the taxiway (per the operation plan) before the A380 arrived? Did someone in the control tower make a mistake in directing those two planes to be in those spots at the same time? Should someone have been able to see what was going to happen and stopped it in time? Should the aircraft sensors have warned the pilot that a collision was imminent? Was anyone in this process less alert or distracted at the wrong time? A number of air traffic controllers have been caught sleeping on the job within the last few months, with the third instance happening this week.

When you make changes to a design, especially when you add a bigger and better version of a component into the mix, it is imperative that the new component be put through regression testing to make sure no assumptions are broken. Likewise, the change should prompt an effort to ensure that the implied (or tribal knowledge) mechanisms for managing the system accommodate the new ways that human or operator error can affect the operation of the system.

Do you have any anecdotes that highlight how a new bigger and better component required your team to change other parts of the system or procedures to mitigate new types of problems?

Is peer code inspection worthwhile?

Wednesday, April 6th, 2011 by Robert Cravotta

I am a strong believer in applying multiple sets of eyes to tasks and projects. Design reviews provide a multi-stage mechanism for bringing independent eyes into a design to improve the probability of uncovering poor assumptions or missed details. Peer-performed code inspection is another mechanism for bringing multiple sets of eyes to the task of implementing software code. However, given the evolution of automated code checking tools, is the manual task of inspecting a peer's code still worthwhile?

Even when tools were not readily available to check a developer's code, my personal experience included some worthwhile and some useless code inspection efforts. In particular, the time I engaged in a useless code inspection was not so much about the code, but rather about how the team leader approached the code inspection and micromanaged the process. That specific effort left a bad taste in my mouth for overly formal and generic procedures applied to a task that requires specific and deep knowledge to perform well.

A staggering challenge facing code inspectors is the sheer magnitude of software that is available for inspection. The labor involved in inspecting software is significant, and it requires a high level of special skills and knowledge to perform. Tools that perform automated code inspections have proliferated, and they continue to improve over time, but are they a good enough alternative to peer code inspections? I like Jack Ganssle's "Guide to Code Inspections," but even his final statement in the article ("Oddly, many of us believe this open-source mantra with religious fervor but reject it in our own work.") suggests that the actions of software developers imply they do not necessarily consider code inspections a worthwhile redirection of the development team's time.

Is peer-based code inspection worthwhile? Are the automated code inspection tools good enough to replace peer inspection? When is peer inspection necessary, or in what ways is automated inspection insufficient?