AdaCore issued the following statement:
Last week an error in some automated high-frequency trading software
from Knight Capital Group caused the program to go seriously amok, and
when the cyberdust cleared, the company was left holding the bill for
almost a half-billion dollars to cover the erroneous trades. Much of the
ensuing uproar has cited the incident as rationale for additional
regulation and/or putting humans more directly in the decision loop.
However, that argument is implicitly based on the assumption that
software, or at least automated trading software, is intrinsically
unreliable and cannot be trusted. Such an assumption is faulty. Reliable
software is indeed possible, and people's lives and well-being depend on
it every day. But it requires an appropriate combination of technology,
process, and culture.
In this specific case, the Knight software was an update that was
intended to accommodate a new NYSE system, the Retail Liquidity Program,
which went live on August 1. Other trading companies' systems were able
to cope with the new NYSE program; Knight was not so fortunate.
It's clear that Knight's software was deployed without adequate
verification. With a deadline that could not be extended, Knight had to
choose between two alternatives: delaying its new system until it
had a high degree of confidence in its reliability (possibly resulting
in a loss of business to competitors in the interim), or deploying an
incompletely verified system and hoping that any bugs would be minor.
It did not choose wisely.
With a disaster of this magnitude (Knight's stock has nosedived since
the incident), there is of course a great deal of post-mortem analysis:
what went wrong, and how can such failures be prevented in the future?
The first question can only be answered in detail by the Knight software
developers themselves, but several general observations may be made.
First, the company's verification processes were clearly insufficient.
This is sometimes phrased as "not enough testing", but there is more to
verification than testing: for example, source code analysis, by humans
or by automated tools, to detect potential errors and vulnerabilities.
Second, the process known in other domains as hazard analysis or safety
analysis was not followed. Such an analysis involves planning for "what
if..." scenarios: if the software fails, what is the worst that can
happen? Answering such questions could have resulted in code to perform
limit checks or carry out "fail soft" procedures.
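To make "limit checks" and "fail soft" concrete, here is a minimal Python sketch of the kind of defensive code a hazard analysis might call for. It is not a description of Knight's or NYSE's systems: the class names `RiskLimits` and `TradingGuard`, the three limits, the assumption that every accepted order fills in full, and the policy of halting after a fixed number of rejected orders are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    """Hypothetical per-strategy limits; the values are illustrative only."""
    max_order_qty: int      # largest single order allowed (shares)
    max_abs_position: int   # largest net long or short position allowed
    max_rejects: int        # rejected orders tolerated before halting


class TradingGuard:
    """Pre-trade limit checks plus a fail-soft halt (a sketch, not a real system)."""

    def __init__(self, limits: RiskLimits) -> None:
        self.limits = limits
        self.position = 0     # current net position (shares)
        self.rejects = 0      # number of orders rejected so far
        self.halted = False   # once True, every new order is refused

    def submit(self, qty: int) -> bool:
        """Return True if a signed order quantity may be sent to the market."""
        if self.halted:
            return False
        within_size = 0 < abs(qty) <= self.limits.max_order_qty
        within_position = abs(self.position + qty) <= self.limits.max_abs_position
        if within_size and within_position:
            # Simplifying assumption: an accepted order fills in full.
            self.position += qty
            return True
        self.rejects += 1
        if self.rejects >= self.limits.max_rejects:
            # Fail soft: degrade to a mode that refuses new orders and waits
            # for a human, rather than continuing to trade in an unknown state.
            self.halted = True
        return False


guard = TradingGuard(RiskLimits(max_order_qty=1_000, max_abs_position=5_000, max_rejects=3))
print(guard.submit(500))     # True
print(guard.submit(10_000))  # False: exceeds max_order_qty
print(guard.submit(-600))    # True: net position is now -100
for _ in range(2):
    guard.submit(5_000)      # two more oversized orders; the third reject overall trips the halt
print(guard.halted)          # True
print(guard.submit(100))     # False: halted until a human intervenes
```

The key design choice is that anomalous behavior causes the software to reduce what it is allowed to do, rather than simply logging the problem and carrying on; the specific thresholds would come out of the "what if..." analysis itself.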
The question of how to prevent such incidents in the future is more
interesting. Some commentators have claimed that the underlying
application (calculating trades within microseconds to take advantage of
fraction-of-a-cent price differentials) is simply a bad idea that
frightens investors and should be banned or heavily regulated. There are
arguments on both sides of that issue, and we will leave that discussion
to others. However, if such trading is permitted, then how are its risks
to be mitigated?
To put things in perspective, in spite of the attention that the
incident has attracted, the overall system (the trading infrastructure)
basically worked. Certainly Knight itself was affected, but the problem
was localized: we didn't have another "flash crash". We don't know yet
whether this is because we got lucky or because the "circuit breakers"
in the NYSE system were sufficient, but it's clear that such an error
has the potential to cause much larger problems.
What is needed is a change in the way that such critical software is
developed and deployed. Safety-critical domains such as commercial
avionics, where software failure could directly cause or contribute to
the loss of human life, have known about this for decades. These
industries have produced standards for software certification that
heavily emphasize appropriate "life cycle" processes for software
development, verification, and quality assurance. A "safety culture" has
infused the entire industry, with hazard/safety analysis a key part of
the overall process. Until the software has been certified as compliant
with the standard, the plane does not fly. The result is an impressive
record in practice: no human fatality on a commercial aircraft has been
attributed to a software error.
High-frequency trading is not avionics flight control, but the aviation
industry has demonstrated that safe, reliable real-time software is
possible, practical, and necessary. It requires appropriate development
technology and processes as well as a culture that thinks in terms of
safety (or reliability) first. That is the real lesson to be learned
from last week's incident. It doesn't come for free, but it certainly
costs less than $440M.
About AdaCore
Founded in 1994, AdaCore is the leading provider of commercial software
solutions for Ada, a state-of-the-art programming language designed for
large, long-lived applications where safety, security, and reliability
are critical. AdaCore's flagship product is the open source GNAT Pro
development environment, which comes with expert online support and is
available on more platforms than any other Ada technology. AdaCore has
an extensive worldwide customer base; see http://www.adacore.com/home/company/customers/
for further information.
Ada and GNAT Pro are seeing growing use in high-integrity and
safety-certified applications, including commercial aircraft avionics,
military systems, air traffic management/control, railroad systems, and
medical devices, and in security-sensitive domains, such as financial
services. The SPARK Pro toolset, available from AdaCore, is especially
useful in such contexts.
AdaCore has North American headquarters in New York and European
headquarters in Paris. www.adacore.com
