By Jon Knowles
The CAS-OAK series were conducted following Ed May's guidelines and practices as much as possible. For example, the group manager did not do any viewing on days when a CAS event was on tap. (Ed May, I believe, does not view at all.) Each viewer had their own day when their event was up and did no other viewing that day, we did not discuss the series as a group during the trial, and so on.
Two of the series were successful in terms of hit rate, while the pass rate has been quite high (about 80%), though not much higher than the rate Ed May reports for his own work (about 70%).
- CAS-OAK Series A had 4 hits, 0 misses, 16 passes for the 20 events. 6 viewers, 1 group manager.
- CAS-OAK Series B was a pilot solo trial and stats were not intended to be kept with APP stats. 0 hits, 1 miss, 9 passes. 1 viewer, self-coding (scoring) of sessions. Based on this one pilot series of 10 events, solo use of CAS is not recommended.
- CAS-OAK Series C had 2 hits, 2 misses, 16 passes for the 20 events. 5 viewers, 1 group manager.
Totals for CAS-OAK A and C: 6 hits, 2 misses, 32 passes for 40 events; 15% hits per event, 80% passes.
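As a bookkeeping check, the totals and rates above can be recomputed in a few lines of Python; the per-prediction hit rate (hits divided by the predictions actually made, i.e. non-passes) is also worth tracking:

```python
# Per-series tallies taken from the list above (Series B excluded, as in the totals).
series = {
    "A": {"hits": 4, "misses": 0, "passes": 16},
    "C": {"hits": 2, "misses": 2, "passes": 16},
}

hits = sum(s["hits"] for s in series.values())
misses = sum(s["misses"] for s in series.values())
passes = sum(s["passes"] for s in series.values())
events = hits + misses + passes

print(f"{hits} hits, {misses} misses, {passes} passes over {events} events")
print(f"hit rate per event: {hits / events:.0%}")    # 15%
print(f"pass rate: {passes / events:.0%}")           # 80%
print(f"hit rate per prediction: {hits / (hits + misses):.0%}")  # 75% of the 8 predictions
```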
However, the 2 misses in Series C very much appear to be the result of what I call "sparse coding". Ed May alluded to this possibility in one of the papers he coauthored during the many years of Cognitive Sciences Laboratory research that resulted in the CAS software. Sparse coding occurs when the coder (scorer) moves only a few sliders (2 or 3) of the most general CAS categories, which are used to compile profiles of each of the 300 photos in the pool. Very high reliability can result, which in turn can lead the CAS software to award a "faux" high Figure of Merit (FoM).
This is apparently what occurred with the first two events in Series C. A possible miss on a later event in Series C was avoided when I decided on a pass because the coding again appeared to be sparse. The high FoM did not pan out, so the pass was appropriate.
Sparse coding can occur when the viewer's session contains little that the coder can match to the 24 CAS categories; the coder may have little choice if the data is sparse. In these situations the coder should pass, regardless of the Figure of Merit (a sketch of such a guard follows below). When viewers provide "enough" data suggestive of what appears in an outdoor photo, sparse coding can be avoided.
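To make the pass rule concrete, here is a minimal sketch. It assumes the fuzzy-set definition of the FoM from the CSL analysis papers (accuracy times reliability); the 24 slider values are modeled as numbers in [0, 1], and the minimum-categories cutoff (`MIN_CODED`) is a hypothetical guard, not an actual CAS parameter:

```python
FOM_THRESHOLD = 0.4519  # prediction threshold used in the series
MIN_CODED = 4           # hypothetical cutoff: fewer coded categories counts as sparse

def figure_of_merit(response, target):
    """Fuzzy-set FoM = accuracy * reliability.

    response, target: 24 slider values in [0, 1].
    accuracy    -- how much of the target profile the response captured
    reliability -- how much of the response was correct
    """
    overlap = sum(min(r, t) for r, t in zip(response, target))
    if not overlap:
        return 0.0
    accuracy = overlap / sum(target)
    reliability = overlap / sum(response)
    return accuracy * reliability

def decide(response, target):
    """Pass on sparse coding regardless of the FoM, per the rule above."""
    if sum(1 for r in response if r > 0) < MIN_CODED:
        return "pass (sparse coding)"
    return "predict" if figure_of_merit(response, target) > FOM_THRESHOLD else "pass"

# With only two general sliders moved, reliability is perfect, so the FoM
# looks strong even though the session carried almost no information.
target = [1.0, 0.8, 0.6] + [0.0] * 21   # hypothetical photo profile
sparse = [1.0, 0.8] + [0.0] * 22        # sparsely coded session
print(figure_of_merit(sparse, target))  # 0.75 -- a "faux" high FoM
print(decide(sparse, target))           # pass (sparse coding)
```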
This means that so far, when the CSL procedures are followed, CAS-OAK has had no misses: with sessions well coded and only FoMs above the threshold of .4519 used as predictors, the record is 6 hits and 32 passes (excluding the 2 sparse-coded events that resulted in misses).
Are there any other indicators of high success rates, aside from a Figure of Merit above .4519?
What about basing a prediction on the higher of the two FoMs? After 40 events, the higher FoM was a hit only 45% of the time.
What about when the absolute difference between the two FoMs is greater than, say, .2? Looking at hypothetical rather than actual results: when the higher FoM would have been a hit, the difference between the two FoMs exceeded .2 61% of the time; when the higher FoM would have been a miss, it exceeded .2 only 36% of the time. So a difference greater than .2 may be an indicator of better-than-50% success. If this holds up in further series, how much better remains to be seen.
What about the absolute difference between the two FoMs for the actual hits and passes? The mean absolute difference for the 6 actual hits is .407; for the 31 passes (excluding the 3 sparse-coded events) it is .141, a difference of .266. Is there a threshold here? In other words, if the absolute difference between the two FoMs is greater than .2, or .25, or .27, do we have a strong indicator of success even when neither FoM meets the .4519 threshold? Is there a reliable threshold in the .2 to .3 range? The sketch below shows how one might scan for it.
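If the two FoMs and the judged outcome were tabulated per event, the threshold question could be checked directly. A minimal sketch, with placeholder records rather than the actual CAS-OAK data:

```python
# Placeholder records: two FoMs per event plus the judged outcome.
events = [
    {"fom_a": 0.62, "fom_b": 0.18, "outcome": "hit"},
    {"fom_a": 0.48, "fom_b": 0.41, "outcome": "pass"},
    # ... one record per event, 40 in all
]

def mean(xs):
    return sum(xs) / len(xs)

diffs_hit = [abs(e["fom_a"] - e["fom_b"]) for e in events if e["outcome"] == "hit"]
diffs_pass = [abs(e["fom_a"] - e["fom_b"]) for e in events if e["outcome"] == "pass"]

print(f"mean |dFoM| for hits:   {mean(diffs_hit):.3f}")   # .407 in the series so far
print(f"mean |dFoM| for passes: {mean(diffs_pass):.3f}")  # .141 in the series so far

# Scan candidate thresholds in the .2 to .3 range asked about above.
for thr in (0.20, 0.25, 0.27, 0.30):
    hit_frac = sum(d > thr for d in diffs_hit) / len(diffs_hit)
    pass_frac = sum(d > thr for d in diffs_pass) / len(diffs_pass)
    print(f"|dFoM| > {thr:.2f}: {hit_frac:.0%} of hits, {pass_frac:.0%} of passes")
```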
We have 40 events in the CAS-OAK series, which I understand is in many instances enough for initial statistical meaning. I encourage those in APP who are far more versed in statistics than I am to comment on, critique, and (if they have time) develop this data.
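As one starting point for that discussion, a simple check is an exact binomial test on the predictions actually made (passes make no prediction). This assumes each prediction is a binary call with 50% chance odds, which is typical for ARV designs but is an assumption here, not something stated above:

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """One-sided exact binomial: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 6 hits out of the 8 predictions made (sparse-coded misses included).
print(f"{p_at_least(6, 8):.3f}")  # ~0.145 -- suggestive, not significant at .05
# Excluding the 2 sparse-coded misses gives 6 of 6, but dropping misses
# after the fact is post hoc, so treat this number with caution.
print(f"{p_at_least(6, 6):.3f}")  # ~0.016
```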