I would say SMART is better at confirming a problem than at catching one.
I've seen many hard drives fail in my time. I just hauled off a box of about 130. It's been my observation that SMART is the first sign in about 2% of cases. The remainder I would split about 65/35, with sudden death (click click click) being 65% and slow performance being 35%. Of that 65%, however, had the owners been checking IO speeds, I strongly suspect they'd have received advance warning before total failure. A lot of people say their computer had been sluggish recently, and I assume that was it. So it's possible that the "slow IO" warning occurs as much as 80% of the time before a catastrophic failure.
In almost every case where slow IO is the suspicious symptom, SMART is passing when I check the drive. Then I surface scan it with that nugget I posted earlier and see the slow IO. Sometimes I then see it get worse and start producing IO errors. At that point, SMART toggles to FAIL on about 20% of drives; the other 80% remain at PASS until they suddenly start timing out or fail altogether, and are a real pain to get anything off of.
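For anyone who wants to try the "watch for slow IO" idea without my script, here is a rough Python sketch of the general technique: read the drive sequentially in fixed-size chunks, time each read, and note the offsets that take too long. This is not the nugget mentioned above. The function name `find_slow_chunks`, the chunk size, and the half-second threshold are all made up for illustration, and sensible values depend on the drive.

```python
import time


def find_slow_chunks(path, chunk_size=1024 * 1024, slow_seconds=0.5,
                     clock=time.monotonic):
    """Read `path` sequentially; return (offset, seconds) for each slow chunk.

    `chunk_size` and `slow_seconds` are illustrative defaults, not tuned values.
    """
    slow = []
    offset = 0
    # buffering=0 bypasses Python's own buffer. The OS page cache can still
    # hide slow media on repeat reads, so treat results as a rough signal.
    with open(path, "rb", buffering=0) as dev:
        while True:
            start = clock()
            data = dev.read(chunk_size)
            elapsed = clock() - start
            if not data:
                break  # end of file or device
            if elapsed >= slow_seconds:
                slow.append((offset, elapsed))
            offset += len(data)
    return slow
```

Pointing it at a raw device (something like `/dev/sdX`, a placeholder) normally needs root. Read errors surface as `OSError`, which a real scan would want to catch and log rather than crash on.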
Needless to say, I don't give SMART any kudos. It's a nice idea, but I just don't understand why slow IO doesn't trip a flag; it never seems to. And slow IO is almost always a critical warning. I've only seen two drives in all my time that had slow IO in a range and continued to run otherwise reliably for more than a couple of weeks.
I am currently running a rather primitive script on the servers here that does a slow, incremental background scan on all attached volumes, ensuring a complete read pass on all attached drives at least once per month. (DRIVES, not VOLUMES; it watches both slices of a mirror.) If anyone is interested in playing with it, let me know. It's designed for machines that run 24/7 and have at least one large volume that needs to be kept an eye on. Known to work on servers with 7+ TB of drives. Very spartan interface (bash), but it emails warnings. It tagged a failing 1TB drive here a month ago, which turned out to be a failing bridge board in the enclosure. I had the replacement drive in hand the day it failed to remount.
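To give an idea of what "incremental" means here, below is a minimal Python sketch of the resume-where-you-left-off part. It is not my bash script, only an illustration of the idea: each run reads a fixed budget of bytes starting from a saved offset, stores the new offset, and wraps back to the start once it hits the end. The names `scan_increment` and `load_offset` and the JSON state file are assumptions made up for the example. Timing and email alerts are left out.

```python
import json


def load_offset(state_path):
    """Return the saved scan offset, or 0 if there is no usable state file."""
    try:
        with open(state_path) as f:
            return int(json.load(f)["offset"])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0


def scan_increment(path, state_path, budget_bytes, chunk_size=1024 * 1024):
    """Read up to `budget_bytes` from `path`, resuming at the saved offset.

    Returns (bytes_read, wrapped); `wrapped` is True when this run reached
    the end of the device, which means a full pass has just completed.
    """
    offset = load_offset(state_path)
    bytes_read = 0
    wrapped = False
    with open(path, "rb", buffering=0) as dev:
        dev.seek(offset)
        while bytes_read < budget_bytes:
            data = dev.read(min(chunk_size, budget_bytes - bytes_read))
            if not data:
                offset = 0  # hit the end: start over on the next run
                wrapped = True
                break
            offset += len(data)
            bytes_read += len(data)
    # Persist progress so the next run continues from here.
    with open(state_path, "w") as f:
        json.dump({"offset": offset}, f)
    return bytes_read, wrapped
```

Run once a day from cron with a budget of roughly the drive size divided by 30, and each drive gets a full read pass about once a month. Scanning the underlying devices rather than the mounted volume is what lets both halves of a mirror get read.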