Feb 21, 2023

Testing to Scale: Interview with Neil Baliga 


Photo credit: Hunter Harritt on Unsplash

 

Sometimes things fall through the cracks, and I was happy to rediscover a piece that hadn't been published yet. I recall first meeting Neil Baliga, President of Verifide Technologies, Inc., at a conference several years ago. Later I interviewed him as part of the research for my book, Space Is Open for Business. We shared a concern about how many of these space hardware companies were going to scale. The space sector had long focused on low-volume, highly bespoke machines, while New Space organizations aspired to do things faster and differently. Although some of the original interviews didn't make it into the book, Neil took it upon himself to reflect on our conversations and write a piece that contains some of the questions I posed.

 

The difference between verification and validation: why does knowing the difference matter?

 

The concept of Verification and Validation is at the heart of the ISO 9000 quality standards, and there is a significant difference between the two. Computer science professor Barry Boehm came up with an easy summary:

Verification: Did we build the product right?

Validation: Did we build the right product?

 

Verification is the process of making sure the product meets its specifications and requirements. Verification does not pass any judgment on whether the specifications themselves are acceptable for the product's use case.

Validation, on the other hand, is the process of ensuring that the requirements are correct and adequate to meet the user's needs. This is often more difficult than verification because feedback is essential to achieve validation. One of the benefits of Agile development methods is that both Verification and Validation are performed at each iteration, ensuring that the project does not go too far without some feedback and validation.
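To make the distinction concrete, here is a minimal sketch (hypothetical product, specification, and numbers, not from the interview) of how verification and validation ask different questions about the same device:

```python
# Hypothetical example: a radio module spec'd for >= 10 dBm output power.
SPEC_MIN_POWER_DBM = 10.0        # what the requirement document says
USER_NEEDED_POWER_DBM = 13.0     # what the customer's link budget actually requires

def verify(measured_power_dbm: float) -> bool:
    """Verification: did we build the product right (does it meet the spec)?"""
    return measured_power_dbm >= SPEC_MIN_POWER_DBM

def validate() -> bool:
    """Validation: did we build the right product (is the spec itself adequate)?"""
    return SPEC_MIN_POWER_DBM >= USER_NEEDED_POWER_DBM

unit = 11.2  # measured output power of one unit, in dBm
print(verify(unit))    # True  -> the unit meets its specification
print(validate())      # False -> the specification does not meet the user's need
```

A unit can pass verification perfectly and still fail validation, which is exactly the failure mode described below.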

Knowing the difference between the two is important because they are separate and equal parts of a product's qualification. Going through product development and verification only to find out the product doesn't meet the intended use case is a recipe for failure. It is also essential that the entire product development team understand these concepts, because validation requires participation from marketing, field engineers, and support personnel as well.

For early-stage companies without a sufficient number of users, validation of the product is more difficult, which is why user growth is often favored over revenue in many business circles.

 

 

How and why should testing matter to an investor who is looking at various space opportunities?

 

Investing in companies carries risk. Investing in technology companies carries additional risks: risk that technical challenges prevent the product from being manufactured, risk that the product does not meet market needs (volume, performance, etc.), risk that production schedules prevent an ideal time to market, and risk that poor product quality will adversely affect the company's image. For early-stage companies, such stumbles out of the gate can be a major hindrance to their ability to grow. Even an established company like Samsung took a major hit to its stature with its Note7 battery problems.

Testing is critical to alleviating these risks and to ensuring that customers, especially early ones, are satisfied with the product and willing to give the endorsements that fuel the growth stage of the company.

Space technology is a special case: many new space companies are venturing into technologies that have not been proven, and as such, testing will be integral to proving the merits of the business model and to finding problems early and correcting them.

There are two aspects of the new space industry that are of particular importance for testing:

High Volume Production

Testing is a time-consuming process and a big contributor to the cost and schedule of manufacturing space products. With space technology, the ability to diagnose problems after launch is extremely limited.

Newer space companies are primarily focused on smaller devices built in higher volumes, and many of them are doing their own manufacturing. When both the product and the supporting production environment are brand new, it is to be expected that there will be problems in manufacturing that take a good chunk out of throughput.

Many of the new space companies are basing their entire business model on a certain cost model for each unit. To meet these cost and schedule targets, they need highly efficient testing infrastructure to execute, analyze, and disposition results. Without such efficiency in testing, the time and cost to manufacture each unit will be much larger and, in the worst cases, will result in product failures.
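As a rough illustration of why test efficiency drives unit economics, here is a back-of-the-envelope sketch (all numbers hypothetical) of how test time per unit translates into required test stations, and therefore capital and schedule:

```python
# Hypothetical throughput model: how many test stations are needed
# to hit a monthly production target at a given test time per unit?
import math

units_per_month = 200               # production target (hypothetical)
test_hours_per_unit = 6.0           # end-to-end test, analysis, and disposition time
station_hours_per_month = 22 * 16   # 22 working days, two shifts per station

stations_needed = math.ceil(units_per_month * test_hours_per_unit / station_hours_per_month)
print(stations_needed)      # 4 stations with these assumptions

# Halving test time through automation halves the required infrastructure:
stations_if_automated = math.ceil(units_per_month * (test_hours_per_unit / 2) / station_hours_per_month)
print(stations_if_automated)  # 2 stations
```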

Growth-stage companies often miscalculate the impact of testing on their production schedules as they grow from an R&D concept phase to a full-blown production phase. Ad hoc processes for testing and analysis may work in the R&D and prototype phases, but they will not allow a company to scale fast enough, especially in a competitive industry. For example, Boeing's Dreamliner is about four years behind schedule, but Boeing is partially getting away with it because there isn't significant competition other than Airbus. The same would not be true for a smaller company, which would have lost significant market share.

High Iterations

Another aspect for an investor to be aware of with new technology is frequent iteration on the product, as is done with Agile development methods. This is common in early and growth-stage companies, and for good reason: frequent iterations allow incremental refinement of the product, periodic validation, and a better-quality product over time.

Frequent iterations, however, collide with the realities of manufacturing complex products.

Firstly, these iterations usually impact the product design, which triggers repetitive rework of the production and testing infrastructure for each iteration. For example, tests written for v1.0 of the product will not be sufficient to test v2.0, and different test equipment hardware and production systems may need to be built for the v2.0 iteration. Good test infrastructure allows a company to handle these iterations with less repetitive development (software bugs, etc.).

Secondly, the adage "if it ain’t broke, don't fix it" applies so appropriately here because each iteration gives birth to new risks which were painstakingly retired in the previous iteration. Testing is the only thing that can alleviate these risks. 

In the software world, where Agile methods were conceptualized, high levels of automated and regression testing are at the heart of these concepts. Each iteration of the software requires engineers to update the unit tests so that regression testing keeps pace as the product evolves. Absent good testing infrastructure, iterations would be few and far between and would defeat the very purpose of Agile. For new space companies looking to employ Agile methods, efficient test automation and tooling need to be an essential part of the product plan.
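As a minimal sketch of what such automation can look like (hypothetical device, measurements, and limits; not Verifide's tooling), a parameterized regression suite reruns every check against every product revision, so a regression introduced by a design change surfaces automatically:

```python
# Minimal regression-test sketch using pytest (hypothetical device and limits).
# Each product iteration reruns the entire suite, so regressions introduced by
# a new design revision are caught automatically.
import pytest

# Pretend measurement results for two design revisions of the same product.
MEASUREMENTS = {
    "v1.0": {"output_power_dbm": 11.5, "current_draw_a": 0.9},
    "v2.0": {"output_power_dbm": 12.1, "current_draw_a": 1.4},  # new design draws more current
}

LIMITS = {
    "output_power_dbm": (10.0, 15.0),
    "current_draw_a": (0.0, 1.2),
}

@pytest.mark.parametrize("revision", MEASUREMENTS.keys())
@pytest.mark.parametrize("quantity", LIMITS.keys())
def test_within_limits(revision, quantity):
    low, high = LIMITS[quantity]
    value = MEASUREMENTS[revision][quantity]
    assert low <= value <= high, f"{revision} {quantity}={value} outside [{low}, {high}]"
```

Running `pytest -q` flags the v2.0 current-draw case as a failure, catching the regression without any test code being rewritten by hand.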

 

Looking out 5-10 years: would you be open to sharing where you envision testing trending?

 

I believe there will be three trends in the future of testing, with some overlap in their adoption. What will drive them are the new norms of tighter schedules, lower costs due to increased competition, higher labor costs, and higher reliability expectations as we come to depend on technology more than ever before.

 

Built-in test

Testing products prior to shipment requires a lot of equipment and infrastructure. Expensive test systems are used to execute test plans and verify the product against its specifications. Once the product is in the field, many devices are fitted with diagnostics and self-test capability to do some level of measurement and analysis without the need for heavy test system infrastructure. Though this ability allows some level of disposition, it is very limited by the quality, grade, and number of the sensors on the device itself.

A trend I believe is coming is that Moore's law and advances in technology in respective industries (RF, semiconductors, etc.) will allow a higher grade of measurement capability to be hosted on the device itself. This will mitigate the need for complex and expensive test equipment hardware and software.

This is the ideal case: when a device can test itself, factories can diagnose and even repair devices remotely without the need for material returns.

For space-based systems, built-in test would be even more useful, since bringing the device back to the factory is nearly impossible. This has not happened yet, in large part due to the cost of the additional weight and equipment needed to measure data, the CPU and memory requirements to perform analysis, bandwidth limitations, etc.

There is some overlap here with the Internet of Things (IoT) trend: large numbers of sensors feeding data for real-time evaluation of health and to aid in diagnostics. A secondary, evolutionary trend beyond built-in test would be the ability to prevent the failure in the first place. On a technical level this is very doable, but it will require large numbers of sensors and analytical models to make it happen.
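A minimal sketch of the idea, with hypothetical sensor names and limits: the device samples its own telemetry, checks it against stored limits, and reports a health summary that a factory or operator could act on remotely:

```python
# Hypothetical built-in self-test (BIST): the device samples its own sensors,
# checks them against stored limits, and reports health without external test equipment.
import random  # stand-in for real sensor drivers in this sketch

SELF_TEST_LIMITS = {
    "bus_voltage_v": (27.0, 33.0),
    "board_temp_c": (-20.0, 60.0),
    "rx_noise_floor_dbm": (-120.0, -100.0),
}

def read_sensor(name: str) -> float:
    """Placeholder for a real telemetry read; returns a simulated value."""
    nominal = {"bus_voltage_v": 30.0, "board_temp_c": 25.0, "rx_noise_floor_dbm": -110.0}[name]
    return nominal + random.uniform(-2.0, 2.0)

def run_self_test() -> dict:
    report = {}
    for name, (low, high) in SELF_TEST_LIMITS.items():
        value = read_sensor(name)
        report[name] = {"value": round(value, 2), "pass": low <= value <= high}
    return report

if __name__ == "__main__":
    print(run_self_test())  # downlinked as telemetry for remote diagnosis
```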

Test Data Analytics 

At a very basic level, testing is about measurements that collect data and decisions made upon that data. Traditionally, a test will measure some quantities, evaluate them against a set of limits, and determine whether the test has passed or failed. This simplistic method of testing gives first-pass confidence that the device meets its specifications.

Even though the pass or fail disposition says something, it is the actual measurement data that tells a much more intricate story about the device. High-volume manufacturers already use statistical process metrics on measurement data, such as Cpk analysis, to get more deterministic confidence bounds and predict failure rates.
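As an illustration (hypothetical measurements and limits), the same data set supports both the traditional pass/fail disposition and a Cpk estimate of how much margin the process has against its specification limits:

```python
# Hypothetical example: one set of measurement data supports both a simple
# pass/fail disposition and a Cpk estimate that predicts the failure rate.
import statistics

measurements = [10.8, 11.1, 10.9, 11.3, 11.0, 10.7, 11.2, 11.1, 10.9, 11.0]  # e.g. dBm
LSL, USL = 10.0, 12.0   # lower/upper specification limits

# Traditional disposition: every unit either passes or fails its limits.
all_pass = all(LSL <= m <= USL for m in measurements)

# Process capability: how much margin the measured distribution has against the limits.
mean = statistics.mean(measurements)
sigma = statistics.stdev(measurements)
cpk = min(USL - mean, mean - LSL) / (3 * sigma)

print(all_pass)          # True
print(round(cpk, 2))     # ~1.8 here; Cpk >= 1.33 is a common capability target
```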

Thus far, test has mostly been associated with test equipment and test code; however, the trend is definitely moving toward data being at the center of testing. In the next 5-10 years, when the so-called Sensor Revolution takes a stronger hold, the ability to measure will be significantly improved and will generate far more data than traditional testing has needed to handle for pass/fail disposition.

Coupled with large volumes of data will be the need for real-time analysis and insight. It will not suffice to collect a month's worth of data and spend the next month trying to make sense of it. Everything from automated cars, homes, and satellites to weapons of war will need to harness large amounts of data and provide insight on a timely basis. Data-centric algorithms and frameworks will take the place of traditional measurement equipment and test code, which will cede market share as standardized measurements and sensors become the norm.

Let it Fail

There is a current (negative) trend, which may gain steam, where failure is expected and redundancy takes the place of quality. In such a model, the system and its capacity are planned around failure rates it can tolerate through redundancy. For example, a satellite constellation may keep more satellites in orbit than needed to handle periodic failures. This is not new; manufacturers have always shipped with a known process capability. However, this trend seems to be moving toward a much higher tolerance of failure in industries where such tolerance was not previously acceptable.
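To make the redundancy trade concrete, here is a small sketch (hypothetical constellation size and failure probability) of sizing spares under a simple binomial model that assumes failures are independent, i.e., process-related rather than design-related:

```python
# Hypothetical sizing of redundancy: how many satellites must be launched so
# that, with an assumed independent per-unit failure probability, at least
# `needed` remain operational with the desired confidence?
from math import comb

def prob_at_least(needed: int, launched: int, p_fail: float) -> float:
    """P(at least `needed` of `launched` units survive), binomial model."""
    p_ok = 1.0 - p_fail
    return sum(comb(launched, k) * p_ok**k * p_fail**(launched - k)
               for k in range(needed, launched + 1))

needed, p_fail, target = 60, 0.05, 0.99
launched = needed
while prob_at_least(needed, launched, p_fail) < target:
    launched += 1
print(launched)  # satellites to launch, including spares, under these assumptions

# The catch raised in the next paragraph: if the failure is a design flaw,
# p_fail is not independent per unit -- every satellite shares the flaw,
# and no number of spares protects the constellation.
```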

The cause of any particular failure can lie in requirements, design, process, or testing. The let-it-fail trend is a dangerous one because it assumes the cause of failure will be process related. When failures are instead due to design, no amount of redundancy is protective, because the flaw runs deeper and is embedded in every unit. This was the case with the Samsung Note7 battery, where ALL phones had to be recalled and the product had to be retired. With new technology and early-stage companies, the rate of failure is not known and there are no relevant analytical models, which makes the let-it-fail strategy even riskier.

In some products, failures can be tolerated to the extent that there are no catastrophic secondary and tertiary failures (e.g., the BP oil spill, where a blowout preventer part failed). In high-technology products, however, the loss is much larger than the sum of its parts: it takes time and money to figure out what failed, a redesign may be warranted, new product lots may be held from release, and the result is a much larger cost in money and schedule. The SpaceX rocket explosion in September 2016 halted its launch operations for four months. Even with high redundancy, it is problematic to assume that failures will be easily diagnosed and production will move on without impact.

 

Learn more about Neil and his business, Verifide Technologies: 

https://www.linkedin.com/in/neil-baliga-15121/

https://www.verifide.com/web/

 

If you would like to send me a tip, you can do so easily through this link --> https://www.buymeacoffee.com/robertjacobson

 
