I volunteer with a local community organization in my hometown, and I work on their technology advisory team. I was recently discussing the impact of spam on the organization with some really smart technical people, and I was amazed to find myself having to explain the base-rate fallacy.
The implications of the base-rate fallacy for a technology like spam filtering are huge - even an incredibly small rate of false positives can make it almost impossible to be confident in a positive result. This one's tough for most people to accept - when we think of tuning spam software (or an IDS), we think that the goal should be to avoid false negatives.
Unfortunately, the harder you tune to push false negatives down, the more false positives you induce, and things get worse in a real hurry.
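To make that concrete, here is a quick sketch in Python of the Bayes' theorem arithmetic behind the fallacy. The function name and all of the numbers (a base rate of 1 in 10,000 and a 99% detection rate) are my own illustrative assumptions, not measurements from any real spam filter or IDS.

```python
def posterior(base_rate, detection_rate, false_positive_rate):
    """Return P(bad | alarm) using Bayes' theorem.

    base_rate            -- P(bad): fraction of events that are really bad
    detection_rate       -- P(alarm | bad): 1 minus the false-negative rate
    false_positive_rate  -- P(alarm | not bad)
    """
    true_alarms = detection_rate * base_rate
    false_alarms = false_positive_rate * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


if __name__ == "__main__":
    # Assumed, purely illustrative inputs.
    BASE_RATE = 1e-4
    DETECTION_RATE = 0.99
    for fp_rate in (0.1, 0.01, 0.001, 0.0001):
        p = posterior(BASE_RATE, DETECTION_RATE, fp_rate)
        print(f"false-positive rate {fp_rate:g}: P(bad | alarm) = {p:.4f}")
```

Under these assumed inputs, a detector that catches 99% of the bad events and has a 1% false-positive rate still leaves fewer than 1 in 100 alarms pointing at something real, because the harmless events vastly outnumber the bad ones.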
The seminal paper on this for information security people is Stefan Axelsson's "The Base Rate Fallacy and its Implications for the Difficulty of Intrusion Detection". The abstract of the paper:
Many different requirements can be placed on intrusion detection systems. One such important requirement is that it be effective i.e. that it should detect a substantial percentage of intrusions into the supervised system, while still keeping the false alarm rate at an acceptable level. This paper aims to demonstrate that, for a reasonable set of assumptions, contrary to what has previously been thought, the false alarm rate is the limiting factor for the performance of the intrusion detection system. This is due to the base-rate fallacy phenomenon, that in order to achieve substantial values of the Bayesian detection rate, P(Intrusion | Alarm), we have to achieve - a perhaps unattainably low - false alarm rate, on the order of [...], or [...]
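The abstract's claim can be turned around: pick the P(Intrusion | Alarm) you want and solve Bayes' theorem for the false alarm rate you would need to get there. A rough sketch follows; the function name and the sample inputs (a base rate of 2 in 100,000 events and a 70% detection rate) are hypothetical values of mine, not figures taken from the paper.

```python
def required_false_alarm_rate(base_rate, detection_rate, target):
    """Largest P(Alarm | no intrusion) that still gives P(Intrusion | Alarm) >= target.

    With p = base_rate, d = detection_rate, f = false alarm rate:
        target = d*p / (d*p + f*(1 - p))
    which rearranges to
        f = d*p*(1 - target) / (target*(1 - p))
    """
    if not (0.0 < base_rate < 1.0 and 0.0 < target <= 1.0):
        raise ValueError("base_rate must be in (0, 1) and target in (0, 1]")
    return detection_rate * base_rate * (1.0 - target) / (target * (1.0 - base_rate))


if __name__ == "__main__":
    # Assumed, purely illustrative inputs.
    BASE_RATE = 2e-5
    DETECTION_RATE = 0.7
    for target in (0.1, 0.5, 0.9):
        f = required_false_alarm_rate(BASE_RATE, DETECTION_RATE, target)
        print(f"P(Intrusion | Alarm) >= {target}: false alarm rate <= {f:.2e}")
```

With inputs like these, just getting half of the alarms to be real requires a false alarm rate of roughly the same order as the base rate itself, which is the sense in which the false alarm rate, not the detection rate, becomes the limiting factor.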
I remember reading that paper for the first time... it was like the description of a religious epiphany - I felt as though the lights came on, and all was suddenly clear to me. IDS wasn't dead... just misused. As far as I'm concerned, that paper should be required reading for every new IT engineer, and they should be asked what other domains it applies to (e.g. spam filtering).