CAPS Scoring Tweaks - Please break them
August 18, 2011
Having posted numerous times over the last year that we are analyzing how picks and scoring are working, I wanted to share with the CAPS community some of the general results and a few ideas for nudging the scoring metric back to reflect the original intent. The idea behind this and future blogs is to help answer some of the questions about why we haven't pursued some paths already, show some concise demonstrations of certain issues (such as accuracy) that players have expressed concern with, and propose some possible tweaks to the system that might alleviate those issues. With the tweaks, what I really hope to find are people who are willing to think about how they would game the new system and seek to break it in certain ways. As always, we don't want to solve old issues by introducing new ones that are just as troublesome.
In the beginning.......
Admittedly I wasn't working for TMF when CAPS was first envisioned, so I'm making up any specifics here, but the general idea was to create a stock picking community that was fun and engaging to participate in, and to aggregate the information into a stock rating system. One of the key ideas here was to keep it engaging, which among other things required the construction of incentives for players to keep making picks. This isn't the level of incentives that get people battling it out for the top fool quote spot, but rather the incentive for people who might or might not want to participate to jump in and get their feet wet with stock picks. Doing this required two basic components: 1) make the potential CAPS player ranking grow with the number of successful picks, and 2) make the scoring metrics sufficiently transparent that it was easy to figure out how your score is being calculated.
So what was the metric that was created? Anyone truly worried about this topic has probably got this memorized. The raw score from each pick is accumulated in an open ended fashion, so it can grow or shrink to unlimited size. This is the aspect of the scoring which draws people in to keep making picks that they believe will beat the market, even if just by a little, because they can grow their lifetime score. Because the cumulative raw score would encourage picking any ticker that a player believed had the slightest chance of beating the market, a closed metric was introduced to calculate an average value component as well. This was the accuracy score, whose intention was to provide a guide to the likelihood of an individual pick exceeding chance (hold your objections - we'll get to the problems and tweaks to accuracy below). This bounded metric, if it were working as intended, would get players to balance the sheer volume of picks made for points with their ability to consistently pick stocks that beat the market. Thus someone might opt for picking fewer stocks that they have great confidence in to move up on the accuracy metric, whereas a different player might pick a large number of stocks that they have moderate conviction in to garner points.
Reality sets in......
In the early stages of testing the scoring metrics, it rapidly became obvious that some level of minimum positive performance had to be set to determine positive accuracy, since it was simple to capture minuscule price fluctuations that always ended up positive. The threshold of +5% was set on an empirical basis, and has been the subject of much discussion and analysis since then. The problem with this threshold is that, with the level of volatility in the markets over the past several years, exceeding it in very short periods of time has become a fairly straightforward endeavour, which quite a few players have systematically set out to exploit. I don't say exploit in an overly negative sense - it is an artifact of the scoring metrics which simply "is" - but it has potentially served to distract these players from examining and picking stocks with other types of strategies which were historically beneficial to the community. The loss of strategic diversity is what concerns me the most, since that diversity is the greatest information edge that community intelligence embodies.
Changing Player Behaviour.....
Saying that I want to change player pick behaviour is a somewhat dangerous thing, because that smacks of hubris in me thinking I know better than the community about what it should do. Still, given the comments that have been coming in for a few years, and the rather narrow strategy that is employed to boost accuracy to the point that it has distorted the information accuracy was intended to convey, there are a couple of tweaks to the scoring system that I would like people to think about, and to figure out how these might themselves be exploited.
Quality of score: a simplified version of examining the raw score return vs. the volatility of the pick. High quality picks would have a high (score/volatility) measure, indicating that the return is more due to real price movement than to chance price fluctuations. For picks that are relying more on high volatility than actual directional movements to cross the +5% for accuracy capture, the Quality metric would end up with a low value, so there would be some balance that has to be struck between the two. This is still a fairly straightforward metric to understand, but should have an impact on player pick selection.
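As a rough illustration of the (score/volatility) idea, here is a minimal Python sketch. The function name `pick_quality`, the 252 trading days per year, and the choice to scale annual volatility down to the pick's holding period with a square-root-of-time rule are all assumptions made for the example, not the actual CAPS calculation.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: used only to rescale annual volatility


def pick_quality(score, annual_volatility, days_held):
    """Hypothetical quality measure: score divided by holding-period volatility.

    score             -- pick score vs. the benchmark, as a fraction (0.06 = +6 points)
    annual_volatility -- annualized volatility of the pick relative to the benchmark
    days_held         -- number of trading days the pick was open
    """
    if annual_volatility <= 0 or days_held <= 0:
        raise ValueError("annual_volatility and days_held must be positive")
    # Assumed square-root-of-time scaling of volatility to the holding period.
    period_volatility = annual_volatility * math.sqrt(days_held / TRADING_DAYS_PER_YEAR)
    return score / period_volatility


# Same +6 point score over 20 trading days, on a calm stock and on a wild one.
calm = pick_quality(0.06, annual_volatility=0.15, days_held=20)
wild = pick_quality(0.06, annual_volatility=0.60, days_held=20)
print(f"calm pick quality: {calm:.2f}, wild pick quality: {wild:.2f}")
```

Under these assumptions the same score earns a lower quality value on the more volatile stock, which is the balance described above.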
Accuracy Decay: Banking accuracy and holding on to it forever is the scoring bugaboo. With the typical pick volatility of ~30% annually in the past year (volatility is this high due to the relative movements of the market and the individual stocks), you can play a simulation game of making hypothetical picks that don't have any expected return (meaning the same rate of return for the stock and the benchmark). If you let that run and make a habit of closing every pick when it exceeds +5%, you find that you get an accuracy of 68%. If you have an actual information advantage (meaning the stock really does have a higher rate of return than the benchmark), the accuracy rate jumps up significantly from there. The reason that accuracy isn't working as it was intended is that a player can capture all the picks at different times in their random walks. As a result, players are no longer playing in the same statistics arena with each other, so the comparison of their accuracy is not so simple. In the extreme, a player who is solely focused on closing out +5% gains and a player who holds picks for very long periods of time cannot be compared to one another.
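The simulation game described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pick score is modeled as a driftless random walk with normally distributed daily steps, a year has 252 trading days, and each pick is allowed to run for at most `horizon_days`. The exact accuracy you get depends heavily on that horizon and on how often the score is checked, so this sketch is not expected to land on the 68% figure quoted above; the point is the gap between the two strategies.

```python
import math
import random


def simulate_accuracy(n_picks=2000, annual_vol=0.30, horizon_days=252,
                      threshold=0.05, take_profit=True, seed=0):
    """Fraction of zero-expected-return picks that end at or above the threshold.

    All parameters are hypothetical knobs for this sketch.
    take_profit=True  -- close the pick the first day its score reaches the threshold
    take_profit=False -- hold every pick for the full horizon
    """
    rng = random.Random(seed)
    daily_sigma = annual_vol / math.sqrt(252)  # assumed square-root-of-time scaling
    accurate = 0
    for _ in range(n_picks):
        score = 0.0  # score vs. benchmark, starts at zero with no drift
        for _ in range(horizon_days):
            score += rng.gauss(0.0, daily_sigma)
            if take_profit and score >= threshold:
                break  # bank the accurate pick immediately
        if score >= threshold:
            accurate += 1
    return accurate / n_picks


closer = simulate_accuracy(take_profit=True)
holder = simulate_accuracy(take_profit=False)
print(f"close at +5%: {closer:.0%} accurate, hold to horizon: {holder:.0%} accurate")
```

Neither simulated player has any information advantage, yet the one who closes at +5% reports a much higher accuracy, which is why the two numbers cannot be compared directly.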
The level of complexity to "fix" the accuracy issue is pretty high, and requires a lot of assumptions about the statistics of stock price movements that may or may not be valid. Rather than structure some overly deep compensation mechanism, I'm more interested in a simple heuristic that allows the closed pick accuracy to decay over time in a sensible fashion. The open pick accuracy I have no issue with - these should always contribute fully to the accuracy score, because they are still being actively bounced around by the market. But when a pick closes, you only want that pick to contribute to your future calculation of accuracy in proportion to how much information it contained when it was closed. The simplest thing would be to set a fixed timeline, say 1 year, and have the contribution to accuracy decay linearly with time. But that doesn't make a lot of sense, because the timeline for decay is arbitrary, and the value of the accuracy information is assumed to be equal for every pick. Kicking it up a notch, the three relevant factors seem to be: 1) magnitude of pick score, 2) volatility of pick, 3) duration for which the pick lasted. Let's examine this in more detail.
The magnitude and the volatility go hand in hand, in that we are interested in how likely it was that we got the pick score we did, given the volatility. This works for both winning and losing scores. The time that the pick lasted is relevant because it can be used to reflect the typical pick style of a player. In particular, players who largely make short term picks would only hold on to that accuracy for a short period of time, so they would have to constantly keep making picks or start holding longer to maintain their accuracy rating. I do not want to punish short term picks, but if that is primarily a player's style then I would expect them to need to maintain the activity to maintain their player standing. The pick decay might look something like this:
decay factor = exp(-time*(decay function))
The decay function would go faster for shorter pick hold times, and go slower for pick scores that are more standard deviations of the pick volatility. So, even if you have a very short duration pick, it might not decay if the stock score is so large (compared to the volatility) that it offsets the short time decay.
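To make the shape of this concrete, here is one possible form of the decay, sketched in Python. The specific decay function used here, 1 / (days_held * (1 + z)), where z is the score measured in standard deviations of the holding-period volatility, is purely hypothetical and chosen only because it has the two properties described above; the names, the 252-day year, and the square-root-of-time scaling are likewise assumptions.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: used only to rescale annual volatility


def decay_factor(days_since_close, score, annual_volatility, days_held):
    """Hypothetical weight of a closed pick in the accuracy calculation.

    decay factor = exp(-time * decay_rate), with an assumed
    decay_rate = 1 / (days_held * (1 + z)).
    """
    if annual_volatility <= 0 or days_held <= 0 or days_since_close < 0:
        raise ValueError("volatility and days_held must be positive, time non-negative")
    period_volatility = annual_volatility * math.sqrt(days_held / TRADING_DAYS_PER_YEAR)
    # z: size of the score in standard deviations; abs() so losses are treated alike.
    z = abs(score) / period_volatility
    decay_rate = 1.0 / (days_held * (1.0 + z))  # faster for short holds, slower for large z
    return math.exp(-days_since_close * decay_rate)


# Three closed picks, all examined 30 trading days after closing, 30% annual volatility.
quick_small = decay_factor(30, score=0.05, annual_volatility=0.30, days_held=5)
quick_large = decay_factor(30, score=0.50, annual_volatility=0.30, days_held=5)
long_small = decay_factor(30, score=0.05, annual_volatility=0.30, days_held=250)
print(f"5-day +5%: {quick_small:.2f}, 5-day +50%: {quick_large:.2f}, "
      f"250-day +5%: {long_small:.2f}")
```

With this particular form, a quick +5% capture fades fastest, a quick but very large score fades more slowly, and a long-held pick keeps most of its weight. Other functional forms with the same properties would work equally well as a starting point for trying to break the idea.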
There are a lot of other details in both of these tweaks, and multiple other ideas being batted around, but I am hoping that some of my fellow CAPS gearheads will think about these ideas and see if they a) seem like they might make the scoring system better, and b) would not introduce additional problems.
Fool On!
Xander