Statistical studies of impact crater populations have been used to model the ages of planetary surfaces for several decades [1]. This approach assumes that crater counts are approximately invariant, and that a "correct" population will be identified so long as the counter is skilled and diligent. In reality, however, crater counting is somewhat subjective, so variability between counters, and even day-to-day variability for a single counter, is expected [e.g. 2, 3]. This study was undertaken to quantify that variability between crater counting experts.

Eight scientists, each with at least five years of crater counting experience, were recruited to count craters on the same image using their preferred counting method. The software used included ArcGIS (by ESRI) with various extensions, JMARS (by Arizona State U), DS9 (by Smithsonian Astrophysical Obs.) with custom add-ons, and the Moon Mappers interface (by CosmoQuest). In addition, two researchers (Antonenko and Robbins) used several different interfaces to test the effect of software on their results. The counting was conducted on a ~4 km² segment of a Lunar Reconnaissance Orbiter Narrow-Angle Camera image, M146959973L (63 cm/px), centered on the Apollo 15 landing site. With ~1000 craters in the 10-400 m range, this region is in saturation equilibrium for craters < ~150 m (typical for mare [4]), and so represents an extreme case for crater counting repeatability.

Results from experts were grouped using a clustering code to identify which marked craters represented the same crater. Craters marked by 5 or more experts were deemed "verified" and added to a final crater catalog. Individual results were compared to each other and to the final catalog. Analyses were done in units of pixels, so that results may be generalized.
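The grouping step can be sketched as a simple greedy clustering. The study's actual clustering code is not reproduced here; the mark format, the centroid-distance matching rule, and the `match_frac` threshold below are illustrative assumptions, while the five-expert verification cutoff follows the text.

```python
import numpy as np

def cluster_marks(marks, n_experts_required=5, match_frac=0.5):
    """Group (expert, x, y, D) crater marks that plausibly denote the
    same crater, then keep craters marked by enough distinct experts.

    Two marks are taken as the same crater when their centers lie
    within match_frac of their mean radius; this rule and the greedy
    scheme are illustrative assumptions, not the study's actual code.
    """
    marks = sorted(marks, key=lambda m: -m[3])  # seed with largest craters
    clusters = []
    for expert, x, y, d in marks:
        for c in clusters:
            cx = np.mean([m[1] for m in c])
            cy = np.mean([m[2] for m in c])
            cd = np.mean([m[3] for m in c])
            if np.hypot(x - cx, y - cy) < match_frac * 0.5 * (d + cd):
                c.append((expert, x, y, d))
                break
        else:
            clusters.append([(expert, x, y, d)])
    # a crater is "verified" when marked by >= n_experts_required experts
    return [c for c in clusters
            if len({m[0] for m in c}) >= n_experts_required]
```

With marks given in pixel coordinates, the output clusters form the verified catalog against which individual results can be compared.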

Cumulative size-frequency distribution (CSFD) plots, which show the total number of craters larger than a given diameter (D) as a function of D, reveal significant variability between experts. A dispersion of >20% was found in the number of craters identified at any given diameter: standard deviations ranged from 20.7% for D ≥ 18 px (~12 m for this data set) to 31.5% for D ≥ 100 px (~70 m). This implies that expert CSFD results are more consistent for smaller craters than for larger ones, possibly because large craters are fewer in number and more degraded. This is similar to the results of [2].
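The dispersion measure above can be computed in a few lines; this is a minimal sketch (the diameter lists and cutoffs in the docstring example are placeholders, not the study's data), taking per-expert diameter lists in pixels.

```python
import numpy as np

def csfd(diameters_px, d_cutoffs_px):
    """Cumulative size-frequency distribution: for each cutoff,
    the number of craters with D >= that cutoff."""
    d = np.asarray(diameters_px, dtype=float)
    return np.array([(d >= c).sum() for c in d_cutoffs_px])

def relative_dispersion(expert_diameters, d_cutoffs_px):
    """Sample standard deviation of the cumulative counts across
    experts, expressed as a fraction of the mean count at each cutoff
    (e.g. 0.207 would correspond to the 20.7% figure quoted above)."""
    counts = np.array([csfd(d, d_cutoffs_px) for d in expert_diameters])
    return counts.std(axis=0, ddof=1) / counts.mean(axis=0)
```

Working in pixel cutoffs (e.g. 18 px and 100 px) keeps the comparison independent of any one image's resolution, as in the study.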

Kolmogorov-Smirnov (K-S) tests were used to compare the populations of expert data. Under these tests, craters with D ≥ 18 px (~12 m for this data set) show poor agreement between experts, with 54% of data pairs representing different populations (P-value < 0.01). Agreement improved significantly for larger diameters, with 39% of pairs representing different populations at D ≥ 22 px (~15 m) and only 18% at D ≥ 25 px (~17 m), suggesting that aliasing effects occur at smaller diameters. However, expert results agree well with the final crater catalog: 75% of experts match the catalog population, regardless of diameter. Multiple analyses are clearly needed to examine different aspects of how well populations match.
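A pairwise comparison of this kind can be sketched with SciPy's two-sample K-S test. The function below is illustrative, not the study's code; the diameter cutoff and significance level mirror the values quoted above.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ks_2samp

def fraction_distinct(expert_diameters, d_min_px, alpha=0.01):
    """Fraction of expert pairs whose diameter distributions
    (restricted to D >= d_min_px) differ at significance alpha
    under a two-sample Kolmogorov-Smirnov test."""
    samples = [np.asarray(d, dtype=float) for d in expert_diameters]
    samples = [s[s >= d_min_px] for s in samples]
    pairs = list(combinations(samples, 2))
    n_different = sum(ks_2samp(a, b).pvalue < alpha for a, b in pairs)
    return n_different / len(pairs)
```

Evaluating this at cutoffs of 18, 22, and 25 px would yield fractions comparable to the 54%, 39%, and 18% figures above.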

Consistency among different interfaces for individual experts was also variable. Robbins conducted counts using Moon Mappers and ArcGIS (with personal software for fitting circles). His results show good agreement over the entire diameter range: the two CSFDs are within 1 standard deviation of each other's error bars and have a K-S test P-value of 0.59 at D ≥ 25 px (~17 m). Antonenko conducted counts using Moon Mappers, JMARS, and ArcGIS (with CraterHelper tools). Her results are more complicated: all three methods agree to within 1 standard deviation for large craters (D > 80 px / ~55 m), but the ArcGIS and JMARS data each differ by >1 standard deviation from the other methods for medium (30 < D < 80 px, or 20 < D < 55 m) and small (D < 25 px / ~17 m) craters, respectively. For D ≥ 25 px (~17 m), K-S test P-values of <0.05 suggest that none of the Antonenko data sets unambiguously represent the same population. Thus, even individual experts may produce varying results when using different interfaces.

This study has significant implications for comparisons of model surface ages determined by different researchers. Results show that variability in crater counts between different experts using different interfaces is typically ~20-30%, but individual counts can differ by as much as a factor of 2-3. Disagreements of that order in the literature could therefore be due simply to counting differences. Furthermore, aggregating data from multiple people and methods is expected to give more reliable results.

References

[1] Shoemaker & Hackman (1962), in The Moon, LPI, TX, 289-300.

[2] Greeley & Gault (1970), Moon, 2, 10-77.

[3] Hiesinger et al. (2012), JGR 117, doi: 10.1029/2011JE003935.

[4] Shoemaker (1965), in Nature of Lun. Surfaces, 23-77.