Android

The Android case introduced yet another platform and a new test automation environment. The tested Android device was a prototype version, on which we modeled the Messaging, Contacts and Calendar applications. Test automation was based on A-Tool, a commercial tool by Symbio. With A-Tool controlling the device, we used the OCR functionality of Microsoft Office Imaging to verify the state of the GUI from screenshots.
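The verification idea is simply to check that the text expected in the current GUI state can be read from a screenshot. Below is a minimal sketch of such a check; it uses pytesseract as a stand-in OCR backend instead of Microsoft Office Imaging, and the screenshot path and expected text are placeholders.

    from PIL import Image
    import pytesseract

    def gui_shows_text(screenshot_path, expected):
        """Return True if the expected string can be OCR'd from the screenshot."""
        text = pytesseract.image_to_string(Image.open(screenshot_path))
        return expected.lower() in text.lower()

    # e.g. verify that the Messaging inbox is visible
    # assert gui_shows_text("screen.png", "Inbox")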

Later on we developed our own test automation for the Android emulator (it can also be used on hardware-unlocked devices with the security features turned off), which uses an API to read the contents of the GUI. Controlling the device was handled with Android's Monkey testing tool. Using this adapter we were able to improve the accuracy that had been a problem with OCR. For the emulator, we modeled Messaging, Gallery, Camera and a news reader application called BBC News Widget. The test automation tool is available in our release package.
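As an illustration of the control side, the sketch below drives a device with the Monkey tool over adb from Python. This is not the project's adapter; the package name, event count and throttle value are placeholders.

    import subprocess

    ADB = "adb"                          # assumes adb is on the PATH
    PACKAGE = "com.example.messaging"    # hypothetical package under test

    def run_monkey(package, events=500, seed=42):
        """Send a pseudo-random stream of UI events to the given package."""
        subprocess.run(
            [ADB, "shell", "monkey", "-p", package,
             "-s", str(seed),            # fixed seed for a reproducible event stream
             "--throttle", "300",        # milliseconds to pause between events
             "-v", str(events)],
            check=True)

    def screenshot(out_path="screen.png"):
        """Pull a screenshot for later verification."""
        subprocess.run([ADB, "shell", "screencap", "-p", "/sdcard/screen.png"], check=True)
        subprocess.run([ADB, "pull", "/sdcard/screen.png", out_path], check=True)

    if __name__ == "__main__":
        run_monkey(PACKAGE)
        screenshot()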

From the modeling perspective, the Android case taught us something about model reuse. Some of the modeled applications had already been modeled for S60 (Messaging, Contacts), so we were able to reuse the high-level action machine models. For example, the same action machine models of Messaging were used for three different devices: S60 3rd edition and two different Android versions (the Messaging in the prototype Android device differed from the vanilla Messaging used in the emulator). However, Messaging in Android had a couple of fairly large differences from the S60 version (e.g. grouping messages into threads), which made using exactly the same action machines as with S60 somewhat cumbersome. We think that model reuse will be most efficient between different versions of the same product line (e.g. different Android versions). In any case, the S60 models were a good foundation for the Android models.

The BBC News Widget was a fruitful testing target. We found eight bugs during modeling and six while running randomly generated long-period tests. Two of the bugs caused the application to crash (one of them is shown in the video below). More information about the issues we found can be found here. The Android cases have been discussed in two publications: Model-based GUI testing of Android applications and Experiences of system-level model-based GUI testing of an Android application. The slides of the latter presentation can be downloaded from here.

Running simultaneous test runs

To save time, it may be a good idea to execute multiple test runs at the same time. This is possible if there are multiple devices available (or if using the Android emulator, which allows multiple instances to be created).
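For the emulator, running several instances side by side is mostly a matter of giving each one its own console port. A possible way to launch them from Python is sketched below; the AVD names are placeholders.

    import subprocess

    AVDS = ["test_avd_1", "test_avd_2", "test_avd_3"]   # hypothetical AVD names

    procs = []
    for i, avd in enumerate(AVDS):
        port = 5554 + 2 * i          # each emulator console uses its own even port
        procs.append(subprocess.Popen(
            ["emulator", "-avd", avd, "-port", str(port), "-no-window"]))

    # Each instance is then addressable individually, e.g.
    #   adb -s emulator-5554 shell monkey ...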

A problem with running multiple tests is that the test runs are likely to cover many of the same items if they don't know about each other. One way to tackle this problem is to make the test generation algorithms aware of each other via shared information.

We ran a small experiment on simultaneous testing using the Android models. The following test generation algorithms were used (a sketch of their selection logic follows the list).

RANDOM: Simply selects a random transition leaving the current model state to be executed.
TABU: Like RANDOM, but selects among the transitions that have not been executed before. If there are no unexecuted transitions leaving the current state, one of the already executed ones is selected.
SHARED_TABU: Like TABU, except that the set of executed transitions is shared by all the simultaneous test runs.
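The sketch below shows the selection logic of these strategies. Here transitions(state), returning the outgoing transitions of a state, is an assumed helper, and in practice the shared tabu set would be held by a coordinating process rather than a local Python set.

    import random

    def choose_random(state, transitions):
        """RANDOM: pick any transition leaving the current state."""
        return random.choice(transitions(state))

    def choose_tabu(state, transitions, executed):
        """TABU: prefer transitions that have not been executed before."""
        outgoing = transitions(state)
        fresh = [t for t in outgoing if t not in executed]
        choice = random.choice(fresh if fresh else outgoing)
        executed.add(choice)
        return choice

    # SHARED_TABU is the same as TABU, except that all simultaneous runs use
    # the same `executed` set, so a transition covered by any run becomes
    # tabu for every run.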

The following two figures show the improvement caused by the shared information used by SHARED_TABU. They show the results of running 1, 2, 3, 4, and 5 simultaneous tests. Each test run executed 250,000 transitions, so two simultaneous tests executed 500,000 in total, and so on.

The first figure shows how many unique (i.e. not previously executed) transitions were executed. With a single run, TABU and SHARED_TABU perform similarly, and both above RANDOM, as expected. For two or more simultaneous runs, the shared version is clearly better at executing unique transitions. The non-shared algorithm's performance drops to the level of RANDOM when running five tests at the same time.

The number of unique transitions executed.

The second figure shows the same data as the share of unique transitions relative to the total number of transitions executed. The ratio of SHARED_TABU drops, but not as much as that of plain TABU.

Uniqueness ratio of transitions