Our Testing the Limits guest this month is Bob Binder. A noted author, consultant and speaker, Bob is currently the President of System Verification Associates. With experience in startups and the enterprise, Bob has become a well-known expert on object-oriented testing, mobile app testing, test automation and many other areas. For more on his body of work, check out his website, read his blog or follow him on Twitter.
In this in-depth interview, we ask him about his latest mobile app testing course; the biggest testing mistake he ever made; the basics of model-based testing; the cost of open-source tools and much, much more. Enjoy!
uTest: You’ve received a lot of attention recently from your extensive course on How to Test Mobile Apps. What prompted you to design this course? And what was the biggest thing you learned about mobile app testing from this process?
BB: I’ll take the last question first.
The mobile app space is an unprecedented phenomenon in many ways. I just finished a study of mobile apps for a certain aspect of driving (cars) — I found about 250 on iOS and Android. In the two weeks it took to complete the study, four simply vanished from their portals and many others were updated. Churn is very high and release cycles are on the order of weeks to months.
Most of these apps were the work of a single person – let’s call them app artisans. The remainder were authored by big businesses or created as part of a startup product or service. However, some artisan apps had millions of downloads and four- or five-star ratings.
The course is an attempt to provide something useful to this community. Having worked on mobile app testing since 2002, when I heard that the millionth mobile app had been released in December 2012, I wondered “what kind of testing has been done on these apps?” My guess was: very little.
It seemed to me that artisans could benefit the most from good testing, but would probably have little inclination or time to do it. So, the challenge was to produce a course that assumed no prior knowledge of software engineering or testing and that did not require any tool support.
Instead of superficial hints to “explore” an app or platform-specific coding tricks, I provide very specific step-by-step guidance to develop a complete and reusable mobile app test plan. This approach is all manual and can be easily repeated for apps supported on multiple platforms (Android, iOS, etc.).
uTest: We don’t expect you to give away the good stuff, but what are some of the typical mobile app challenges your course will help companies overcome?
BB: App artisans often have a good intuitive sense of “coolness.” But they less often appreciate how easily dependencies and oversights can lead to both annoying and catastrophic bugs. And they don’t know how to be systematic in searching for these bugs. I show how to be systematic, efficient, and effective. I invite participants to use their own app to learn the techniques step-by-step.
Although great strides have been made in network reliability and coverage, mobile apps are subject to a much wider range of external events (suddenly, no bars) than desktop apps. The course includes a checklist of event conditions and explains how to generate events. Cast-iron cookware can be handy for generating some event scenarios: a phone sealed inside an iron pot quickly loses its bars.
The huge number of combinations is still the main headache. I developed event-feature coverage as a reasonable compromise. This requires that (1) every external event is triggered at least once and (2) every use case is disrupted at least once with an external event. I provide a worksheet to map this out and assign scenarios to test cases.
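Editor’s note: to make the two coverage rules concrete, here is a minimal sketch of the idea in Python. The event names, use cases, and pairing logic are purely illustrative and are not taken from the course worksheet.

```python
# Hypothetical sketch of event-feature coverage: (1) every external event
# fires at least once, and (2) every use case is disrupted at least once.
from itertools import cycle

external_events = ["incoming_call", "loss_of_signal", "low_battery",
                   "screen_rotation", "push_notification"]
use_cases = ["log_in", "browse_catalog", "add_to_cart", "check_out"]

def event_feature_scenarios(events, cases):
    """Pair each use case with an event, then cover any leftover events."""
    scenarios = []
    event_cycle = cycle(events)
    for case in cases:                       # rule (2): each use case disrupted
        scenarios.append((case, next(event_cycle)))
    covered = {event for _, event in scenarios}
    for event in events:                     # rule (1): each event triggered
        if event not in covered:
            scenarios.append((cases[0], event))
    return scenarios

for case, event in event_feature_scenarios(external_events, use_cases):
    print(f"run '{case}', interrupt with '{event}'")
```

Each line of the output corresponds to one row of the worksheet: a use case to run and the external event that interrupts it.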
uTest: Fill in the blank: The biggest weakness in the way mobile apps are currently tested is ____?
BB: … the tendency to only exercise happy paths and “cool” stuff. Before uTest and its competitors offered crowdsourced coverage of device/platform/locale/network combinations, I would have said the inability to cover a significant number of “mobile testing nightmare” combinations.
uTest: You were awarded a patent for your “model-based testing of mobile systems.” Can you give our readers a quick overview of the problems this system solves? Why is the world a better place with this system?
BB: The testing technology described in the patent can be used to achieve high confidence that mobile apps will be reliable end-to-end for a large user population.
To truly evaluate a multi-user, client/server mobile app, you need a realistic exercise of features, controlling and observing both a multiplicity of endpoints and the server side(s). This can reveal functional bugs. But, in contrast to fixed endpoints, there’s another dimension that determines overall system reliability: mobile endpoints move as they send and receive data. Owing to the vagaries of radio transmission, this means some bugs are only triggered as an endpoint moves through space. Also, location-specific behavior is unique to mobile endpoints. In a large population of mobile endpoints, movement will exert varying loads on the wireless network infrastructure. So, besides testing features at the top layer, the lower data link layers must be controlled so that the entire system is evaluated under the full range of location-sensitive data link variation – in effect, a testing sandwich.
The patent describes how to achieve all this. We worked for two years to build a prototype. It included a model-based test generator that produced test suites with user behavior (endpoint inputs) interleaved with movement through a simulated space and data link conditions at each position. It could generate a test to take away all your bars or add nasty noise just when you were going to, say, transfer money, send a text, or watch a sports video. We built a distributed test harness that simultaneously applied app-layer test inputs to an endpoint and commanded a software-defined radio to emulate the effects of its movement on lower layers. Virtual mobiles were used to ramp up the total number of endpoints. The test harness also controlled the server side (e.g., load/restore a database).
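Editor’s note: for readers who want a feel for what such an interleaved test suite looks like, here is a bare-bones, hypothetical sketch. The waypoints, link conditions, user actions, and the generator itself are made up for illustration; this is not the patented prototype.

```python
# Hypothetical sketch: interleave app-layer inputs with movement through a
# simulated space and the data-link condition at each position.
import random

random.seed(42)  # a generated suite should be reproducible

WAYPOINTS = [("downtown", "strong_signal"), ("tunnel", "no_signal"),
             ("suburb", "weak_signal"), ("stadium", "congested")]
USER_ACTIONS = ["log_in", "check_balance", "transfer_funds", "send_text"]

def generate_suite(steps=6):
    """Yield (command, argument) pairs mixing movement, link state, and inputs."""
    for _ in range(steps):
        place, link = random.choice(WAYPOINTS)
        yield ("move_to", place)       # command for the radio emulator
        yield ("set_link", link)       # impair the data link at this position
        yield ("app_input", random.choice(USER_ACTIONS))

for command in generate_suite():
    print(command)
```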
We published our research results and demoed the prototype at ICSE 2005. After that, we commercialized the distributed test harness and shipped six releases as “MTS.” Our customers put MTS to good use, for example testing the client app on a handheld scanner used daily by tens of thousands of package deliverers, supporting a national logistics system. We also developed a controllable wide-area network emulator that could inject noise and delay into any IP packet traffic. This allowed testing of transmission paths with variable multi-second delays, corresponding to endpoints on the other side of the world, and impairments from sunspots, bad equipment, etc. – even though both endpoints were located in the same physical lab. Just like the prototype, this was controllable with the same test objects that drove the user interface or backend servers.
As a general-purpose test harness, MTS blended the best features of the xUnit pattern (like JUnit) and TTCN to provide controllable test objects. In contrast to the limited one-track xUnit execution strategy, MTS provided complete flexibility in test execution logic. It provided a TTCN-style adapter framework to separate physical and logical interfaces, allowing simpler and more robust test objects.
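Editor’s note: the separation of logical and physical interfaces is easy to show in miniature. The sketch below is not MTS; the class and method names are invented and the transport is stubbed out, but it captures why test objects written against a logical interface stay simple and robust.

```python
# Hypothetical sketch of a TTCN-style adapter: the test object knows the
# behavior under test; the adapter knows how bytes actually reach the system.
from abc import ABC, abstractmethod

class EndpointAdapter(ABC):
    """Physical interface to the system under test."""
    @abstractmethod
    def send(self, message: str) -> str: ...

class HttpAdapter(EndpointAdapter):
    def send(self, message: str) -> str:
        # A real adapter would POST to the endpoint; stubbed for the sketch.
        return f"200 OK ({message})"

class TransferFundsTest:
    """Logical test object: no knowledge of the transport."""
    def __init__(self, adapter: EndpointAdapter):
        self.adapter = adapter

    def run(self) -> bool:
        reply = self.adapter.send("transfer $100 to savings")
        return reply.startswith("200")

# Swapping transports (HTTP, serial, radio emulator) means swapping adapters,
# not rewriting test objects.
assert TransferFundsTest(HttpAdapter()).run()
```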
So, we solved a lot of hard problems, published research results, and created value for our customers.
uTest: Another one of your areas of expertise is open-source test tools. Not long ago, you said that “they’re free, but they’re not cheap.” Can you give our readers an example of this axiom in action? In other words, what is the big downside to open-source test tools?
BB: I still don’t completely understand the economics of open-source software. I hope Steven Levitt (Freakonomics) takes a look and figures it out. The wisecrack is based on my experiences with open source. There are many open-source systems that can be used out of the box to very good effect – that’s the “free” part. But what happens if you run into a bug or need something new? If you’re lucky, you can convince the maintainers to work on your problem. If you’re not, you either have to find a workaround or become a maintainer yourself – that’s the “not cheap” part. I don’t know of many development organizations that aren’t already maxed out with their own codebase. Unless the modification is trivial, you have to learn the open-source codebase internals and figure out how to change it without breaking it. Of course, once you’ve done that, you’ve just committed to maintaining your version of that “free” codebase in addition to what you had to begin with. This can be a nasty surprise.
uTest: You’ve worked in the start-up world and with giant companies like Microsoft. How does your approach to successful testing differ based on company size? Does it stay the same in any way?
BB: In a large organization, there are more stakeholders and often a lot of in-house proprietary technology. So, you first have to understand how to align goals for these players and how to adapt testing strategies and technology to their existing environment. In smaller organizations, all of this is generally simpler. Testing burns money and it has to earn its keep. I always seek to optimize what test can do, but that’s specific to each situation. The client decides what reliability they have to have, how much they’re willing to spend to get it, and what kind of disruption they can tolerate getting there. Sometimes, education about these issues is helpful. I bring an ever-increasing bag of tricks to every situation. The puzzle is how to use available people, tools, and techniques to deliver the best possible result within the constraints of a particular situation.
uTest: You have written a lot about object-oriented programming and are well known for your book “Testing Object-Oriented Systems.” What are some of the new problems that object-oriented programming causes for testers (as opposed to procedural programming)?
BB: Testing is mostly a search problem – we know bugs exist, but we can only guess where they might be hiding. To the extent we have clues about those hiding places, we should use them. OO languages have three essential differences from procedural languages: inheritance, polymorphism, and intra-class dependencies. This results in well-understood but non-obvious dependencies. It is relatively easy to write OO code that seems to be correct but is very buggy in obscure ways. Unless unit tests are written to seek these bugs, it is also easy to write, run and pass many tests on very buggy code.
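Editor’s note: here is a tiny, hypothetical illustration of the kind of obscure OO bug Bob describes; the example is ours, not from the book. A happy-path test passes for both classes, yet the subclass silently breaks an invariant the superclass guarantees.

```python
# Hypothetical example: an override that looks harmless but violates the
# superclass contract, so naive tests pass on very buggy code.

class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

class OverdraftAccount(Account):
    def withdraw(self, amount):        # guard silently dropped
        self.balance -= amount

def happy_path_test(account):
    account.balance = 100
    account.withdraw(30)
    assert account.balance == 70       # passes for both classes

happy_path_test(Account())
happy_path_test(OverdraftAccount())    # also passes; the bug stays hidden

# Only a test aimed at the inherited contract exposes it:
try:
    OverdraftAccount(10).withdraw(50)
    print("superclass invariant violated without any error")
except ValueError:
    pass
```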
So, we should use test strategies that target bugs that are likely in OO software. My book provides fifteen test design patterns that do just that. My favorites are Invariant Boundaries, Modal Class Test, Modal Hierarchy Test, Polymorphic Server Test, and Class Association Test.
The Percolation pattern shows how to use built-in test to achieve robust white-box frameworks. It also shows how subtle OO interactions can be.
uTest: You worked on two big projects – one with the Chicago Board Options Exchange and the other with Microsoft – where there was a lot at stake. When did you feel confident that you had tested thoroughly enough?
BB: The Microsoft project used a requirements coverage goal established by a Federal Court to validate documentation of hundreds of Microsoft server APIs. There were over 60,000 total requirements – some APIs had several thousand. As each API was tested, a few requirements turned out to be untestable owing to technical problems. One of my roles was to scrutinize testers’ model-based test suites and test runs to see whether all testable requirements had adequate tests and whether any untested requirements were truly infeasible. For example, when encryption was required, it was not possible to completely check the contents of over-the-wire packets. So, if after study and discussion with the testers, I was convinced that a requirement was infeasible and the government could be convinced of that, we would waive coverage. If not, the testers were obligated to revise the test suite.
At the CBOE, I developed what I call multi-dimensional testing. Essentially, this means generating realistic and highly varied test inputs which are applied at varying rates, thereby achieving functional, performance, and stress testing in the same run. Multi-dimensional testing is very good at revealing nasty bugs that hide until you reach an unusual condition under a high load. For this, I developed a model-based testing tool that used a statistical usage profile to generate tens of millions of unique and realistic test inputs, and also used a load profile to change the rate at which these inputs were submitted. We also developed an automated oracle to evaluate the result of each test input. With this, we generated and ran many large test runs, each a unique blend of functional and stress testing. When these all passed without any failure on the release candidate, I was confident that the system under test was stable. In the first six months of production, zero failures occurred. Since then, the system has been handling ever-increasing loads and, unlike some other exchanges, has not once made headlines with a big outage.
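Editor’s note: a bare-bones sketch of the multi-dimensional idea, for readers who think in code. The usage profile, load profile, operations, and the submit() stub are all invented for illustration; the real tool was far more elaborate.

```python
# Hypothetical sketch: draw inputs from a statistical usage profile and pace
# them according to a load profile, so one run blends functional and stress.
import random
import time

random.seed(7)

usage_profile = {"quote": 0.6, "limit_order": 0.3, "cancel": 0.1}  # P(operation)
load_profile = [10, 50, 200, 50]       # target inputs per second, phase by phase

def submit(operation):
    """Stand-in for driving the system under test and checking an oracle."""
    return f"{operation}: ok"

def run_multidimensional(phase_seconds=1.0):
    operations, weights = zip(*usage_profile.items())
    for rate in load_profile:
        deadline = time.time() + phase_seconds
        while time.time() < deadline:
            submit(random.choices(operations, weights=weights)[0])
            time.sleep(1.0 / rate)     # pace inputs to the current load level

run_multidimensional()
```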
uTest: Fill in the blank: The biggest mistake I ever made on a testing project was _____.
BB: Generating data-driven GUI scripts that read test inputs from a generated test data file. At the outset, this seemed like the cleanest software design to drive GUIs with model-based test suites. However, separating the test data from the scripts made them hard to understand and debug. Brute force turned out to be more practical: generate scripts with all test data in-lined. Although very big script files resulted, they were much easier to debug and hand-hack for unusual conditions. We didn’t try to maintain these scripts – we simply tweaked model parameters and re-generated.
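Editor’s note: the trade-off is easy to see in a pair of hypothetical fragments. Neither is from the actual project, and the FakeGui driver is imaginary; they simply contrast the two designs Bob describes.

```python
class FakeGui:
    """Minimal stand-in for a GUI automation driver."""
    def set(self, field, value):
        print(f"set {field} = {value}")
    def click(self, control):
        print(f"click {control}")

# Data-driven: the script stays generic, but to understand any one test you
# must cross-reference a separately generated data file.
def run_data_driven(gui, rows):
    for row in rows:                  # rows come from the generated data file
        gui.set("quantity", row["qty"])
        gui.set("symbol", row["symbol"])
        gui.click("submit")

# Generated with data in-lined: bulky, but every step is self-explanatory and
# an unusual case can be hand-hacked directly in place.
def run_inlined(gui):
    gui.set("quantity", "100")
    gui.set("symbol", "IBM")
    gui.click("submit")
    gui.set("quantity", "0")          # boundary case, tweaked by hand
    gui.set("symbol", "MSFT")
    gui.click("submit")

run_data_driven(FakeGui(), [{"qty": "100", "symbol": "IBM"},
                            {"qty": "0", "symbol": "MSFT"}])
run_inlined(FakeGui())
```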
uTest: You’re a frequently cited author yourself, so we’re curious: which writers and bloggers do you read on a regular basis?
BB: I regularly read IEEE Software, IEEE Computer, Communications of the ACM, and Software Testing, Verification and Reliability, both print and online. I’ve been following Grady Booch’s new project, Computing: The Human Experience, fairly closely: http://computingthehumanexperience.com/
uTest: Your blog is always very up-to-date with the latest trends and topics. What’s the next BIG THING in the field of software development that no one is talking about yet?
BB: Alien abductions. (Editor’s note: I can’t wait to hear this as the keynote at a QA conference!)
uTest: What’s Robert Binder doing when he’s off duty?
BB: I like reading history: I’m currently about halfway through A History of the World in 100 Objects. Judith (“26 years of wedded bliss”) and I like hiking. We’re planning our first water hike – a kayak excursion on Lake Michigan – this month. I’m working on passing the U.S. Army physical fitness test. I need to drop a few minutes from my two-mile run for that, so I’m now running about twelve miles a week, including two days of interval training. I like to BBQ – I think my steaks come out better than Gibson’s and my Texas smoked brisket is getting better. For mindless diversion and some campy laughs, I watch Supernatural; I won’t miss an episode of Justified. I’m a big fan of the Chicago Bears, White Sox, and Blackhawks.
Editor’s Note: We hope you enjoyed our interview with Bob Binder. Please share your thoughts and reactions in the comments section. Until next time, happy testing!