Originally Posted By: RHV
Well, from the point of view of time, cloning by booting from a different volume is not the easiest way. The easiest way is to clone from the boot volume.

I wasn't talking about the easiest way to do it (which, by the way, is not to do it at all), but rather the easiest way to do it reliably.

The first rule of optimization is: any optimization must be correct. No one will be impressed if you get the wrong answer faster.

Originally Posted By: RHV
And, in my eight or so years of experience using Mac OS X's DU for cloning, up to Snow Leopard and now with Lion using CCC, cloning from the boot volume works just peachy fine. And I never bother to close down the internet connection either. Unlike Tacit, I don't bother to close down all the apps I've just been using. So what if Text Edit is open and so is Safari? Makes no difference.

There was a formative moment early in my career. I was working as a Tech Rep for Burroughs Corporation, and one of my fellow Tech Reps told a customer, a heavy user of Burroughs' reader/sorter machines (used to sort checks by their MICR coding), "Don't worry about that. There's only a one-in-a-million chance that would ever happen." The customer replied, "At Bank of America, one in a million happens fifty times a day."

That made a big impression on me. It doesn't matter how unlikely something is. If it can happen, it will. Testing once or twice or even a million times isn't good enough. You have to design your algorithms and procedures to make errors impossible, not merely unlikely.

If you tell me you've been making clones of running systems for eight years (and also tell me you test each one by booting from it for a day), I figure you've tested it only about 400 times (once a week at most), and "running with it for a day" hardly comes close to a thorough test. I am not impressed.

Backing up any digital file while it's being updated is a well-known danger. Look up the "readers and writers problem" in any text on databases, but don't think for a minute that the problem only affects "database applications". Or look up "readers and writers problem" in any text on concurrent programming. Even something as simple as incrementing a number in memory can be problematic on a machine with multiple cores.
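To make that concrete, here's a minimal sketch in Python (a toy of my own, not anything from DU or CCC) of the classic lost-update race. Four threads each increment a shared counter; the read and the write are separate steps, and a thread switch between them silently loses an update:

[code]
import sys
import threading

sys.setswitchinterval(1e-6)  # ask for frequent thread switches, to expose the race

counter = 0

def bump(times):
    global counter
    for _ in range(times):
        value = counter       # step 1: read
        counter = value + 1   # step 2: write -- a switch in between loses an update

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # should be 400000; on most runs it comes up short
[/code]

Run it a few times. Sometimes it prints 400000. Usually it doesn't. And nothing tells you which kind of run you just got, which is exactly the trouble with "I tested it and it worked."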

This is not a hypothetical risk. It's common, especially among neophytes who think "I tested it and it works" means that it will always work. Experienced programmers know better. If you want to take a digital snapshot of something, you have to freeze it for the duration of the snapshot. One of the reasons we prefer digital copies over analog copies is that successive analog copies get blurrier and blurrier, but digits don't blur. Trouble is, when you make a digital copy of a changing file, you eventually discover that digits don't blur, they break.
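Here's the same disease at the file level, again as a hypothetical Python sketch (the file names and the two-field "record" are my own invention). A writer keeps updating a record whose two halves must always agree; a naive copy, taken while the writer runs, can capture one half old and one half new:

[code]
import os
import shutil
import threading
import time

PATH = "live.dat"
with open(PATH, "wb") as f:
    f.write(b"0000 0000")          # two fields that must always match

stop = threading.Event()

def writer():
    n = 0
    while not stop.is_set():
        n = (n + 1) % 10000
        with open(PATH, "r+b") as f:
            f.write(b"%04d" % n)   # update the first field...
            time.sleep(0.001)      # ...the file is inconsistent right here...
            f.seek(5)
            f.write(b"%04d" % n)   # ...and consistent again

t = threading.Thread(target=writer)
t.start()
time.sleep(0.05)

shutil.copyfile(PATH, "clone.dat")  # "back up" the file while it's being updated

stop.set()
t.join()

with open("clone.dat", "rb") as f:
    first, second = f.read().split()
print("good copy" if first == second
      else "torn copy: %s != %s" % (first.decode(), second.decode()))

os.remove(PATH)
os.remove("clone.dat")
[/code]

On any given run you may get a good copy. That's not reassurance; that's the bug hiding.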

Edsger Dijkstra, one of the pioneers of Computer Science, famously said, "Program testing can be used to show the presence of bugs, but never to show their absence." The tests you've made fall under that heading. The fact that you haven't found a bug yet doesn't mean you won't find one tomorrow, or even that there weren't undiscovered bugs in the past.

My data is more valuable to me than that. I want to know that my backups are good. Hoping isn't good enough.

Originally Posted By: RHV
After doing a clone, I always test it by booting from it for a test day. Never found a problem.

A day? How thorough a test can you do in a day? I've got 1.6 million files on my boot volume. How could I ever test all of them in a single day?

And you think booting off the clone (and not getting any real work done during that day, because you'd be updating the wrong disk) is easier than what I do, which costs me only about an hour of downtime?

Originally Posted By: RHV
I'll happily and thankfully bow to Ganbustein if he can produce a good sized sample of competent Mac users who have had trouble cloning from the boot volume. Even better, if that sample was statistically significant. (Most samples aren't.)

Hal has already pointed out where you can find some. Most users who have problems won't say, "I cloned a running system and this is what went wrong." Usually, if they say anything at all, they'll say, "I'm having a problem with X," and if we're lucky we'll eventually pull out of them the oh-by-the-way comment that they restored from such a backup two years ago, and that the problem has been lurking in their system ever since. But usually, even if the backup is bad, they never find out, because most users never test their backups, and the few who do make only a few spot checks, usually limited to "It boots, so it must be good."

I also talked at length about all the files that Retrospect found changing during each backup. Many of those were files I really wanted reliable backups of. And there's also the problem that many applications cache data in RAM, and don't bother keeping their disk files consistent until the program quits. Many of those applications are actually "system processes" that run in the background, so you don't even know you're running them, yet cloning their disk files will give you inconsistent data.
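The cached-in-RAM problem is even easier to demonstrate. A hypothetical sketch in Python (toy file names again): the "application" writes through an ordinary buffered stream, so until it flushes, the bytes exist only in its own memory, and a backup taken mid-run gets nothing:

[code]
import os
import shutil

f = open("app.dat", "w")   # the "application" opens a buffered file (created empty on disk)
f.write("data the app believes it has saved")   # still sitting in the buffer, not on disk

shutil.copyfile("app.dat", "app-clone.dat")     # back up the file while the app is running

with open("app-clone.dat") as clone:
    print(repr(clone.read()))   # prints '' -- the backup missed everything

f.close()                  # only now do the bytes reach the disk file
os.remove("app.dat")
os.remove("app-clone.dat")
[/code]

The exact buffering threshold varies, but for a short write like this the clone comes up empty every time. Scale that up to a background process that holds megabytes in RAM for hours, and you see what your clone is actually capturing.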

I have at times had to recover from backup. Most of those times, the problem was that the disk catalog had become corrupted. For that reason, I don't trust anything that copies the catalog itself. In particular, I don't trust a sector-level copy, like the one DU makes, even when I'm not booted from the volume. If I had used a sector-level copy in those situations, my "backup" would have been just as bad as the original.

You're looking at it wrong. It's not a question of "what are the odds?" It's a question of "what's reliable?" For bugs, even a single occurrence is "statistically significant", especially if that occurrence happens to me.