A Mac Mini (server model, dual HDDs) was checked in with a trashed OS. (Server OS, yay.) After wasting several days finding out that Apple has a bug in their service diagnostics that hangs during memory checks, I finally got to checking the HDD, which ALSO hangs the new ASD. So I pulled both HDDs to scan them. One dumps out of my script with what is usually an I/O error.
But it wasn't. dd had terminated without reading the data AND without causing an I/O error to show up in the logs. And when I tried to rerun the dd block read that returned the error, it worked fine.
After running the script several more times, it appeared that dd was dumping out for no apparent reason at random points on the hard drive. OK, time to upgrade the script so it tells me what dd is returning if not zero. dd's man page was unhelpful, saying the exit code is "0 for success, nonzero for an error", so I had just been checking for 0 or not 0. Now that I have it reporting, I get exit code 137.
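For anyone who wants to do the same kind of reporting, here is a minimal sketch in Python. My actual script isn't shown here, so the function name `run_and_report` and the dd arguments in the comment at the bottom are purely illustrative. One thing to watch: Python's `subprocess` reports a child killed by a signal as a negative return code, while a shell shows the same event in `$?` as 128 plus the signal number.

```python
import subprocess


def run_and_report(argv):
    """Run a command and describe how it ended, not just pass/fail."""
    result = subprocess.run(argv)
    rc = result.returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # subprocess uses -N to mean "terminated by signal N" on POSIX.
        # A shell would report this same event as 128 + N in $?.
        return f"terminated by signal {-rc}"
    return f"exited with code {rc}"


# Hypothetical usage; the device path and block numbers are made up:
# print(run_and_report(["dd", "if=/dev/disk2", "of=/dev/null",
#                       "bs=1m", "skip=1000", "count=1"]))
```

Logging the full description instead of a bare "failed" is what makes a case like this one visible in the first place.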
All attempts to track down the meaning of code 137 have proved fruitless. I even found the source code (I think here), and it appears to only be able to return 0 or 1. Considering the number, I assume 137 isn't being generated by dd itself, but by something dd calls that isn't supposed to return nonzero, which causes dd to halt and pass that value along as its own return code.
All of this is probably a waste of time on my part; the hard drive is at the very least possessed and needs to be replaced, but this has me very curious. Someone else pointed out that return codes over 127 may be a passed code, indicating the return code from the upper level is 9 (128+9=137), but I don't know where to look for that. I'm not kill -9'ing it (signal 9). It's stopping on its own, AFAIK.
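To make the 128+9 arithmetic concrete: POSIX shells report a process that was terminated by signal N as exit status 128+N, so under that reading 137 would mean signal 9. Below is a small sketch of the decoding in Python. The helper name `decode_status` is made up for illustration, and it assumes the status came from a shell's `$?`; a program can in principle call exit with a value above 128 on its own, so this is an interpretation, not proof.

```python
import signal


def decode_status(status: int) -> str:
    """Interpret an exit status the way a POSIX shell reports it in $?."""
    if status == 0:
        return "success"
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # No signal with this number on the current platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with error code {status}"


print(decode_status(137))  # on macOS/Linux: killed by signal 9 (SIGKILL)
```

That only says what the number means, not who sent the signal, which is still the open question here.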