The original domain of the TEMA toolset was the S60 UI framework. We modeled eleven of the built-in applications, including Calendar, Gallery, Music Player, Flash Player, Notes, Voice Recorder, Log, File Manager, Telephony, Contacts and Messaging. In all, the original model library contains about 110 action machines with thousands of states and transitions. The results of the case study are discussed in more detail in Automatic GUI test generation for smartphone applications - an evaluation.

Automating test execution proved problematic because S60 offers poor support for it. After trying a number of tools with little success, we handled the automation with a proprietary keyword-based test execution tool from one of our partners. Connecting the TEMA MBT test engine to the test automation tool required only a small amount of integration code. However, even this tool had some accuracy and maintenance problems caused by its use of optical character recognition in GUI verification.

During the case study, we found 21 bugs of varying severity and priority. Some of the bugs were detected in multiple phone models. The most severe of these bugs caused the application to hang and display a "System error" message.

The first case study taught us one thing that has been confirmed in later studies: modeling is itself an efficient testing method. In the S60 case, about two-thirds of the bugs were found while modeling an existing application; the remaining bugs were found during random test execution.

The S60 case also showed us another benefit of our approach: finding concurrency-related bugs with long-period random tests. For example, performing some multimedia-related functionality in one application and then switching to another application caused unexpected behavior in some circumstances. The concurrency features in our modeling approach are semi-automatic: the user defines states in the application models where the application can be set to the background (switching applications is possible only in states where the application's state will not change while it is in the background). When the test engine reaches such a state during test execution, it can bring any other application included in the test to the foreground. This benefit gave us the idea for a specific test generation algorithm that strives to produce unique switches between the tested applications.
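The source does not specify how the switch-oriented algorithm works internally, but its goal can be illustrated with a minimal sketch (all names hypothetical): at each switch-enabled state, greedily prefer a foreground application that forms a (current, next) switch pair not yet exercised, falling back to a random choice once all pairs from the current application have been covered.

```python
import random

def pick_next_app(current, apps, seen_switches):
    """Greedily choose the next foreground application, preferring
    (current, next) switch pairs that have not been exercised yet."""
    candidates = [a for a in apps if a != current]
    unseen = [a for a in candidates if (current, a) not in seen_switches]
    choice = random.choice(unseen or candidates)
    seen_switches.add((current, choice))
    return choice

def generate_switch_sequence(apps, length):
    """Produce a sequence of foreground applications that tries to cover
    as many distinct application-to-application switches as possible."""
    seen = set()
    current = apps[0]
    sequence = [current]
    for _ in range(length):
        current = pick_next_app(current, apps, seen)
        sequence.append(current)
    return sequence
```

A greedy heuristic like this does not guarantee full coverage of all ordered application pairs, but it biases a random walk toward switches that have not been tested before.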

When a long-period random test run ends, the tester must determine why it stopped. Because the test engine logs each action taken from the model during test execution, the test is easy to repeat, but finding the actual cause in a test that may have run for hours can be complicated. To debug the failures, we used two approaches. The simple approach, which is only possible in GUI testing, is based on recording the SUT on video and synchronizing the actions from the test log as subtitles. That way the tester can easily see when the SUT no longer corresponds to the model. The second approach, called trace incrementation, searches for the minimum subsequence of the original test that still repeats the error. The method starts by executing the final action of the original test and then gradually increments the test until the bug is found again. As an example, the "System error" bug mentioned earlier was originally found in a test run that lasted about 110 minutes and was about 1850 keywords long. Using the trace incrementation method, we were able to shorten the test to about 100 keywords that can be executed in a few minutes. Debugging topics are discussed further in Debug support for model-based GUI testing.
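The suffix-growing idea behind trace incrementation can be sketched as follows (an illustrative simplification, not the actual TEMA implementation; `reproduces_bug` stands for re-executing a candidate trace against the SUT and checking for the failure):

```python
def minimize_trace(trace, reproduces_bug):
    """Find the shortest suffix of a failing trace that still triggers the bug.

    Starts from the final action alone and grows the suffix backwards,
    re-executing the candidate after each extension, until the bug reappears.
    """
    for start in range(len(trace) - 1, -1, -1):
        suffix = trace[start:]
        if reproduces_bug(suffix):
            return suffix
    # If no proper suffix reproduces the bug, fall back to the full trace.
    return trace
```

Each candidate suffix requires a fresh execution on the SUT, so in practice the suffix can be grown in larger steps and then refined, trading extra actions per run for fewer runs.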