Traders typically have some firm beliefs on back-testing. Some believe it’s required, others think it’s useful, and there’s a group that thinks back-testing is a way to fool yourself. I’ll give you some of my thoughts on why I believe that back-testing is a crucially important aspect of understanding your system. Even an experienced trader should carefully consider the results of a detailed back-test when taking on a new strategy or operating in a new time frame so that he or she is not misled. I’ll also provide some of my views on the limitations of back-testing because they are significant and not to be overlooked.
Back-testing can take many forms: on trading platforms, in back-testing software, on paper, over long periods of time or only in specific market types. Rather than spending time on the types of back-testing, however, I think the more important consideration is how much back-testing you need to do.
In some cases, experienced traders need only minimal back-testing to be convinced that an idea is worth trading with live money at a reduced risk level, particularly when it resembles reliable systems they have traded before. In other cases, they may want to fully check out a new idea before trying to trade it live, even with small position sizes.
A properly constructed back-test will identify whether an idea has a persistent edge and under what conditions that edge will manifest. By controlling the parameters one at a time, we can isolate the ones that add the most value to the idea. We can also test for robustness and see how sensitive the edge is to changes in those parameters.
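As a rough illustration of testing sensitivity to a parameter, the sketch below sweeps one parameter of a toy rule and reports how much the result moves between neighboring values. The momentum rule, the price list and the names `momentum_pnl`, `sweep` and `neighbor_spread` are all hypothetical and chosen only for illustration. They are not a recommended system.

```python
def momentum_pnl(prices, lookback):
    """Toy rule (an assumption for illustration): hold long over the next bar
    whenever the current price is above the price `lookback` bars ago."""
    pnl = 0.0
    for i in range(lookback, len(prices) - 1):
        if prices[i] > prices[i - lookback]:
            pnl += prices[i + 1] - prices[i]
    return pnl


def sweep(prices, lookbacks):
    """Run the same back-test once per parameter value."""
    return {n: momentum_pnl(prices, n) for n in lookbacks}


def neighbor_spread(results):
    """Largest absolute change in P&L between adjacent parameter values.
    A large jump suggests the edge depends on one exact setting."""
    keys = sorted(results)
    return max(abs(results[b] - results[a]) for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    # Made-up prices, not market data.
    prices = [100, 101, 103, 102, 104, 107, 106, 108, 111, 110, 112]
    results = sweep(prices, lookbacks=[1, 2, 3, 4])
    for lookback, pnl in results.items():
        print(f"lookback={lookback}: pnl={pnl:+.1f}")
    print("largest jump between neighbors:", neighbor_spread(results))
```

An edge that holds up across neighboring parameter values is generally easier to trust than one that appears at a single setting.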
We may be able to identify specific market conditions where the edge is significant and tradable. We may also be able to identify a subset of the available markets in which the idea works best.
A particular edge may not convince some people unless they see it work over multiple time frames, in multiple markets and in all different market conditions. Others are satisfied if an idea works within a definable set of parameters. This is a matter of beliefs and personal taste, as back-testing is not an academic exercise in the pursuit of absolute truth. We’re traders; we want to make money.
What Back-testing Can Tell Us
Back-testing should tell us the likely win rate, the importance of slippage and commissions, the trading frequency, the maximum adverse excursion, the longest winning and losing streaks we should normally expect, and the maximum and average figures for both wins and losses.
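Several of these figures can be computed directly from a list of per-trade results. The sketch below is a minimal example that assumes each trade is recorded as a single net number (for instance an R-multiple). The function names and the sample trades are hypothetical. Maximum adverse excursion is left out because it needs intra-trade price data, not just final results.

```python
from statistics import mean


def longest_streak(results, is_member):
    """Length of the longest run of consecutive trades satisfying is_member."""
    best = current = 0
    for r in results:
        current = current + 1 if is_member(r) else 0
        best = max(best, current)
    return best


def summarize(trades):
    """Summarize per-trade net results (assumed units: R-multiples)."""
    if not trades:
        raise ValueError("no trades to summarize")
    wins = [t for t in trades if t > 0]
    losses = [t for t in trades if t < 0]
    return {
        "trades": len(trades),
        "win_rate": len(wins) / len(trades),
        "avg_win": mean(wins) if wins else 0.0,
        "avg_loss": mean(losses) if losses else 0.0,
        "max_win": max(wins, default=0.0),
        "max_loss": min(losses, default=0.0),
        "longest_win_streak": longest_streak(trades, lambda t: t > 0),
        "longest_loss_streak": longest_streak(trades, lambda t: t < 0),
        "expectancy": mean(trades),  # average result per trade
    }


if __name__ == "__main__":
    # Made-up results for illustration only.
    sample = [1.5, -1.0, 2.0, -1.0, -1.0, 0.5, 3.0, -0.5]
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")
```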
One of the most important result sets for analysis is the distribution of results in the form of a frequency histogram. We would like to see a somewhat normal distribution that has most of the trades clustered around the mean with an orderly profit tail to the right that suggests we have the possibility of large winning trades. We would also like to see a carefully controlled left tail of losses, which suggests that we are able to engineer our risk carefully.
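A frequency histogram of this kind can be built with a few lines of code. The sketch below is one simple way to do it, assuming trades are recorded as net results per trade. The bin width, the function names and the sample data are illustrative assumptions.

```python
import math
from collections import Counter


def histogram(trades, bin_width=1.0):
    """Count trades per bin. Bin k covers [k * bin_width, (k + 1) * bin_width).
    Empty bins between the extremes are kept so gaps in the tails stay visible."""
    if not trades:
        return {}
    counts = Counter(math.floor(t / bin_width) for t in trades)
    return {k * bin_width: counts[k] for k in range(min(counts), max(counts) + 1)}


def render(hist):
    """Draw the histogram as text, one row per bin, labeled by the bin's left edge."""
    return "\n".join(f"{left:+6.1f} | {'#' * n}" for left, n in hist.items())


if __name__ == "__main__":
    # Made-up results for illustration only.
    sample = [1.5, -1.0, 2.0, -1.0, -1.0, 0.5, 3.0, -0.5]
    print(render(histogram(sample)))
```

Reading the rows from top to bottom shows the loss tail first and the profit tail last, which makes it easy to see whether losses stop at a controlled level while winners are allowed to run.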
Under these kinds of conditions and looking for this kind of information, back-testing is an important part of the trader’s repertoire. Having robust back-test data in hand allows us to determine where, when and under what conditions this idea is tradable, and what results to expect. When we proceed into live market trading as a prototype system with reduced real risk, we can then compare the actual live-money results against the back-test to see whether the trade can be managed as intended.
Limitations of Back-testing
As much as back-testing can reveal, it can also hide one overlooked truth that can be mortally dangerous for traders: back-testing results describe historical performance, but they do not forecast how a system will perform from now on. This is perhaps the biggest challenge with back-testing.
Even if a trader performs a rigorous back-test with full knowledge of the limits of its ability to forecast into the future, it’s common to see a large discrepancy between back-test results and actual results from live trading. This phenomenon has several potential sources. Sometimes, back-tests are conducted in isolation and not as part of a full portfolio of strategies. Often back-tests may use optimistic values (though they may have seemed realistic at the time) for slippage and commissions. Many times back-tests are simply unable to integrate the human dimension of executing a set of trading rules. Experience shows me that this is perhaps one of the most important aspects overlooked when evaluating back-testing results.
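The effect of optimistic cost assumptions is easy to check by re-running the numbers at several cost levels. The sketch below assumes a flat cost per trade (slippage plus commissions) expressed in the same units as the trade results. That is a simplification, and the names and sample figures are hypothetical.

```python
from statistics import mean


def net_expectancy(gross_trades, cost_per_trade):
    """Average result per trade after subtracting a flat cost from every trade."""
    return mean([t - cost_per_trade for t in gross_trades])


def breakeven_cost(gross_trades):
    """Flat per-trade cost at which the average trade nets to zero."""
    return mean(gross_trades)


if __name__ == "__main__":
    # Made-up gross results (before costs) for illustration only.
    gross = [1.5, -1.0, 2.0, -1.0, -1.0, 0.5, 3.0, -0.5]
    for cost in (0.0, 0.1, 0.25, 0.5):
        print(f"cost={cost:.2f}: expectancy={net_expectancy(gross, cost):+.4f}")
    print("break-even cost per trade:", breakeven_cost(gross))
```

If the break-even cost sits close to what slippage and commissions plausibly run in live trading, the back-tested edge leaves very little margin for error.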
Regardless of the reasons, some traders place so much reliance on their back-testing that very different live trading results fail to convince them they missed something important in their back-test. They might persist in trading a system that will simply not work in the real world. This situation illustrates the combination of an overconfidence bias and the need to be right.
Professional engineers and doctors are especially prone to this problem because of their belief systems and testing experiences from their previous professions. Those professions place a premium on being right to be successful. Yet in trading, the ability to act with incomplete knowledge and the willingness to be wrong can lead to excellent trading results.
Also, there is always a real danger of curve fitting and data mining to find a perfect system that would have worked under an exact set of market conditions in the past. The obvious problem with that thinking is that those exact conditions may never occur again. Perfect-fit thinking neglects the reality of the market as a complex adaptive system that never shows you the same face twice. It also exposes the problem of overreliance on the power of computation.
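One common guard against curve fitting is to pick parameters on one stretch of history and then judge them on a later stretch that played no part in the selection. The sketch below shows the idea with a toy momentum rule. The rule, the split point and the names `momentum_pnl` and `holdout_check` are assumptions made for illustration, not a definitive procedure.

```python
def momentum_pnl(prices, lookback):
    """Toy rule (an assumption for illustration): hold long over the next bar
    whenever the current price is above the price `lookback` bars ago."""
    pnl = 0.0
    for i in range(lookback, len(prices) - 1):
        if prices[i] > prices[i - lookback]:
            pnl += prices[i + 1] - prices[i]
    return pnl


def holdout_check(prices, lookbacks, cut):
    """Choose the best lookback on prices[:cut], then score it on prices[cut:]."""
    in_sample, out_of_sample = prices[:cut], prices[cut:]
    best = max(lookbacks, key=lambda n: momentum_pnl(in_sample, n))
    return {
        "best_lookback": best,
        "in_sample_pnl": momentum_pnl(in_sample, best),
        "out_of_sample_pnl": momentum_pnl(out_of_sample, best),
    }


if __name__ == "__main__":
    # Made-up prices, not market data.
    prices = [100, 101, 103, 102, 104, 107, 106, 108, 111, 110,
              112, 111, 109, 110, 108, 107, 109, 108, 106, 107]
    print(holdout_check(prices, lookbacks=[1, 2, 3, 4], cut=12))
```

A parameter set that looks excellent in-sample but falls apart out-of-sample is a warning sign that it was fitted to the past and has not captured a persistent edge.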
Back-testing offers many powerful advantages to the professional trader. At the same time, it must be undertaken with full recognition of the limits of its usefulness. The professional trader will take back-testing results into consideration as a way to select particular systems to prototype. In the prototype phase, the trader fully engages real money, real markets and the human factors to generate real-world results. If those results prove attractive, the trader can then commit the system to full position sizing, or “full production mode.”