The dangers of relying on p-values are well known. "In most science journals, researchers report p-values without apology, and readers interpret them as evidence that the apparent effect is real. The lower the p-value, the higher their confidence in this conclusion." (Think Stats).
A good example of the problem can be found here. In brief, imagine this:
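- we test how effective 1000 different drugs are
- of which only 100 are truly efficacious, so the other 900 are not
- the significance threshold (α) for the p-value is a fairly typical 0.05 (5%)
- of the 900 useless drugs, 45 (=900*0.05) will appear efficacious when they are not.
- of the 100 efficacious drugs, 5 (=100*0.05) will appear useless when they are not. (For simplicity we assume a 5% miss rate here too; in practice this rate depends on the statistical power of the test, not on the significance threshold.)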
We would therefore dismiss 5 effective cures (false negatives) and believe that 45 useless drugs work (false positives). Since only 100 drugs are truly effective, these errors are large: of the 140 drugs that appear efficacious, 45 (almost a third) are not.
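The expected counts above can be checked with a small Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the original author's method: each useless drug is assumed to pass the test with probability α = 0.05, and each effective drug is assumed to be missed with probability 0.05 (the hypothetical `miss_rate` parameter). The function name `simulate_screening` is illustrative only.

```python
import random


def simulate_screening(n_drugs=1000, n_effective=100, alpha=0.05,
                       miss_rate=0.05, trials=1000, seed=42):
    """Monte Carlo sketch of the drug-screening thought experiment.

    Assumptions (illustrative only): each useless drug is declared
    effective with probability `alpha`, and each effective drug is
    declared useless with probability `miss_rate`.
    Returns the average number of false positives and false negatives.
    """
    rng = random.Random(seed)
    n_useless = n_drugs - n_effective
    total_fp = 0
    total_fn = 0
    for _ in range(trials):
        # False positives: useless drugs that pass the significance test.
        total_fp += sum(rng.random() < alpha for _ in range(n_useless))
        # False negatives: effective drugs that fail the test.
        total_fn += sum(rng.random() < miss_rate for _ in range(n_effective))
    return total_fp / trials, total_fn / trials


if __name__ == "__main__":
    avg_fp, avg_fn = simulate_screening()
    print(f"average false positives: {avg_fp:.1f}")  # close to 900 * 0.05 = 45
    print(f"average false negatives: {avg_fn:.1f}")  # close to 100 * 0.05 = 5
```

Any single batch of 1000 drugs will scatter around these figures; averaging over many simulated batches shows that 45 and 5 are expected values, not guaranteed outcomes.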
From Machine Learning in Action (Manning)
Precision = TP/(TP+FP). Precision tells us the fraction of records that were positive from the group that the classifier predicted to be positive.
Recall = TP/(TP+FN). Recall measures the fraction of positive examples the classifier got right.
In this scenario we have 95 true positives (the remaining 5 effective drugs were wrongly deemed ineffective), 45 false positives and 5 false negatives. Precision is therefore 95/(95+45) ≈ 0.68 and recall is 95/(95+5) = 0.95.
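The two formulas can be applied directly to the drug example. A minimal sketch; the helper names `precision` and `recall` are illustrative and not taken from any particular library.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


# Counts from the drug example: 95 effective drugs detected,
# 45 useless drugs wrongly flagged, 5 effective drugs missed.
tp, fp, fn = 95, 45, 5
print(f"precision = {precision(tp, fp):.2f}")  # 95 / 140 -> 0.68
print(f"recall    = {recall(tp, fn):.2f}")     # 95 / 100 -> 0.95
```

Recall looks excellent, yet precision shows that almost a third of the "discoveries" are false: the same imbalance between 900 useless and 100 effective drugs that makes a 5% threshold misleading.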