Understanding Trash - FineTunedMac

A coupl'a weeks ago, Carbon Copy Cloner popped up an I/O error alert...a folder was sitting on bad media, and TechTool Pro confirmed the presence of bad blocks. (The blocks were newly developed, and the drive tanked within a week.)

CCC's advice was to delete the folder in Finder, which went as expected, but emptying Trash was a-whole-nother matter...

"Empty Trash" popped an immediate "can't do" (Sorry, but I wasn't paying attention and didn't make note of the reason.), so I tried "Secure Empty Trash", which froze my deuced Mac(hina) before failing, followed by "rm -rf", which did the same thing. (Being a bit compulsive, I tried both "Secure Empty Trash" and "rm -rf" at least twice and hosed my boot volume!)

Question: What are the mechanics (electronics?) of filling and emptying Trash that allowed the former but choked on the latter?

Thanks.

The former (moving a folder to the trash) affected only that folder, not anything inside it. Everything that was in the folder is still in the folder. It's just that the folder is now in a different place.

The latter (emptying the trash) requires that the folder and everything inside it be deleted. The problem was that some of the things in the folder could not be deleted.
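To make that concrete, here's a rough Python sketch of what "emptying" a folder has to do. It isn't how Finder or rm is actually implemented, and the function name `empty_folder` is hypothetical. It shows that every item inside has to be removed, deepest first, before the folder itself can go. A single item the filesystem refuses to delete therefore leaves every folder above it in place.

```python
import os

def empty_folder(path):
    """Delete `path` and everything inside it, deepest entries first.

    Returns a list of (path, error) pairs for entries that could not be
    removed. A directory can only be removed once it is empty, so one
    undeletable file also keeps every enclosing folder in place.
    """
    failures = []
    # topdown=False yields the deepest directories first, so children are
    # removed before the folder that contains them.
    for dirpath, dirnames, filenames in os.walk(path, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                os.remove(full)
            except OSError as exc:
                failures.append((full, exc))
        for name in dirnames:
            full = os.path.join(dirpath, name)
            # Symlinks to directories are listed but never descended into.
            if os.path.islink(full):
                try:
                    os.remove(full)
                except OSError as exc:
                    failures.append((full, exc))
        try:
            os.rmdir(dirpath)  # fails with ENOTEMPTY if anything survived
        except OSError as exc:
            failures.append((dirpath, exc))
    return failures
```

An empty failure list means the whole tree is gone. Anything else names the entries that blocked the job.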

The word deleted deserves a little clarification here. Putting something in the Trash does not delete it. The Trash is simply a folder (well, actually, it's a collection of folders, but that's not relevant here). Moving something to the Trash is no different than moving it to the desktop or to your documents folder or to any other folder. The volume catalog still knows where the thing is, both logically (its name and what folder it's in) and physically (what sectors of the disk it occupies, if it's a file). Deleting means that the volume catalog forgets all that, and any disk sectors it was using become available for re-use.
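A small Python illustration of "moving is not deleting", assuming a POSIX filesystem where a rename within one volume keeps the file's identity. The helper `move_to_trash` and the `.Trash` folder name used here are hypothetical stand-ins, not Finder's actual mechanism. The point is that after the move the file has the same inode number and the same contents. Only the directory entry changed.

```python
import os
import tempfile

def move_to_trash(path, trash_dir):
    """Move `path` into `trash_dir` by renaming it (hypothetical helper).

    A rename only rewrites directory (catalog) entries: on the same volume
    the file keeps its identity (inode number) and its data stays put.
    """
    os.makedirs(trash_dir, exist_ok=True)
    dest = os.path.join(trash_dir, os.path.basename(path))
    os.rename(path, dest)
    return dest

if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    doc = os.path.join(workdir, "report.txt")
    with open(doc, "w") as f:
        f.write("still here")
    before = os.stat(doc).st_ino
    trashed = move_to_trash(doc, os.path.join(workdir, ".Trash"))
    # Same inode, same contents: nothing was deleted, only relocated.
    print(os.stat(trashed).st_ino == before)  # True
    # Unlinking is the step that lets the filesystem reclaim the sectors.
    os.remove(trashed)
```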

From your description, it sounds like what was damaged was the volume catalog's info about one or more of the files inside the folder. As long as the filesystem didn't try to do anything to that file, the damaged info about it was never noticed, but once you tried to actually delete the file, the damage could no longer be ignored. (For example, if the list of in-use disk sectors was corrupted, there would be no way to know which sectors were to become available.)

My bad!

Great explanation - as usual - but I forgot to say that the folder was empty.

Sorry.

Edit: It never occurred to me that an empty folder might be structurally different as respects emptying Trash than one with files in it, but perhaps that's why TTP reported "N/A" for the name of the folder sitting on the bad blocks.

An "empty" folder may still have invisible files in it. In particular, it almost certainly contains a .DS_Store file. If it has a custom icon, it will also contain an invisible Icon\r file. If it's one of the folders installed by default, either as a standard system install or during initialization of a new user's home folder, it will likely have an invisible (but empty) .localized file.
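A quick way to see what's really inside an "empty" folder is to list its raw directory entries. Here's a Python sketch under a simplifying assumption: it treats dot-files and the custom-icon file `Icon\r` as the entries Finder hides, and it ignores the separate per-file "hidden" flag Finder also honors. The function name `invisible_entries` is hypothetical.

```python
import os

# Name of the invisible custom-icon file (ends in a carriage return).
CUSTOM_ICON_NAME = "Icon\r"

def invisible_entries(folder):
    """Return entries in `folder` that Finder typically doesn't show.

    Simplified assumption: dot-files and Icon\\r only; the per-file
    hidden flag is not checked.
    """
    hidden = []
    for name in sorted(os.listdir(folder)):
        if name.startswith(".") or name == CUSTOM_ICON_NAME:
            hidden.append(name)
    return hidden
```

If this returns anything for a folder that Finder shows as empty, emptying the Trash still has real files to delete.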

A truly empty folder typically has no disk sectors allocated to it, but since OS X 10.5 Leopard (and optionally under OS X 10.4 Tiger), even a folder can have extended attributes, which are stored in the xattr fork. (Since OS X 10.6 Snow Leopard, a short xattr fork is stored directly in the catalog, but a long xattr fork must be stored as a genuine fork, consuming disk sectors. Wherever xattrs are stored, they are not counted by the du command.) (The xattr command synthesizes artificial extended attributes for FinderInfo and for the resource fork, but these are not true extended attributes, and the du command will count the sectors used by the resource fork.)

But of course, the real problem is that you have a corrupted catalog. All routines that touch the catalog depend on the promises made by a valid catalog. They should (and usually do) check for should-never-happen conditions that would thwart whatever action they're trying to achieve, but detecting the error only means they can tell you about it. It does not mean they can succeed.

In short, once the catalog is corrupted, all bets are off. Garbage in; garbage out. The behavior of a filesystem whose data structures are invalid is unpredictable and often makes no apparent sense. Catalog damage is like rot; it will spread.

Your safest course of action is to be sure you have a good file-level (not sector-level) backup (or better yet two or three of them), erase your volume (which creates a known good, albeit empty, filesystem), and restore from backup.

Thanks, again, but I dunno; I may never understand this one.

Get Info on the current incarnation of the folder shows "Zero KB on disk (Zero bytes) for 0 items", and toggling invisibles shows nothing, but CCC identified the original as sitting on bad media. Does that mean that CCC couldn't read the bad blocks to determine that there was nothing to read? (The original was an empty, never-touched iPhoto default folder, so I'm assuming that its current incarnation is identical.)

At any rate, as I said, the drive tanked within days and has since been replaced. (But thanks for the good advice.)

Originally Posted By: artie505

Thanks, again, but I dunno; I may never understand this one.

Get Info on the current incarnation of the folder shows "Zero KB on disk (Zero bytes) for 0 items", and toggling invisibles shows nothing, but CCC identified the original as sitting on bad media. Does that mean that CCC couldn't read the bad blocks to determine that there was nothing to read? (The original was an empty, never-touched iPhoto default folder, so I'm assuming that its current incarnation is identical.)

What it means is that when CCC attempted to read the data block pointed to by the entry in the volume structure (directory), it did not get a proper response or, more likely, when it attempted to verify the read operation it got a different answer, and this continued through multiple attempts. In other words, each read attempt yielded a different answer. Even if the data block was supposedly empty, a read attempt would have found some data bits, whether all zeros, all ones, or random, and on a healthy block the verification step would have found the same data bits.
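Here's a Python sketch of the read-then-verify idea described above. It is not CCC's code. The function `verified_read`, the retry count, and the idea of passing a file-like object for the device are all assumptions for illustration. A block counts as good only if every read succeeds and every read returns the same bytes.

```python
def verified_read(dev, offset, size, attempts=3):
    """Read `size` bytes at `offset`, re-reading to confirm the result.

    Returns the data if every attempt succeeds and all attempts return
    identical bytes; returns None if a read raises OSError or the attempts
    disagree (the symptom of a failing block).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    results = []
    for _ in range(attempts):
        try:
            dev.seek(offset)
            results.append(dev.read(size))
        except OSError:
            return None  # the drive refused to return the block at all
    if all(r == results[0] for r in results):
        return results[0]
    return None  # reads disagreed, so the data can't be trusted
```

Even an all-zero "empty" block passes as long as every read returns the same zeros. What fails is inconsistency.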

That is generally the result of the magnetic media flaking off the surface of the disk platter, and the subsequent failure of the drive would seem to confirm that. Theoretically, if an attempt is made to write to a bad data block, the firmware in the drive itself will validate the write, and if it cannot validate after several attempts the data block will be marked as bad in the bitmap and the address remapped to a spare data block, if there are any spare data blocks remaining on the drive. But in this case it is a read failure, and too late for any remapping to do any good. Again, given the subsequent drive failure, it is quite possible, even probable, there were no spare data blocks remaining on the drive.

Thanks for clarifying.

> Again given the subsequent drive failure it is quite possible, even probable, there were no spare data blocks remaining on the drive.

The first thing I looked at after seeing the CCC error pop-up was DiskWarrior's nightly report, done about 10 hours earlier, which showed all 50 spare blocks still available...no use attempts, and for what it's worth, an immediate surface scan turned up only 3 (consecutively numbered) bad blocks. (When bad blocks are remapped to spares do they still show up as bad in a surface scan?)

Edit: I imagine they do...that mapping them out affects the catalog but leaves them "visible" to a surface scan?

I guess Trash's choking comes down to

Originally Posted By: ganbustein

In short, once the catalog is corrupted, all bets are off.

Originally Posted By: artie505

The first thing I looked at after seeing the CCC error pop-up was DiskWarrior's nightly report, done about 10 hours earlier, which showed all 50 spare blocks still available...no use attempts, and for what it's worth, an immediate surface scan turned up only 3 (consecutively numbered) bad blocks. (When bad blocks are remapped to spares do they still show up as bad in a surface scan?)

Edit: I imagine they do...that mapping them out affects the catalog but leaves them "visible" to a surface scan?

Previously remapped bad blocks do not normally show up in a surface scan. The critical item in a surface scan is when new bad blocks appear, which indicates the media is continuing to flake off the platter. Three consecutive new bad data blocks says this drive is failing, and fast. The surface scan itself, as performed by Drive Genius or TechTool Pro, should force the data block to be remapped, but they still report that a bad data block was detected in that particular scan. The number of spare data blocks on a drive is a function of the total drive capacity and engineering decisions by the manufacturer, but 50 seems to me to be a very small number of available spares.
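For anyone curious what a surface scan boils down to, here's a rough Python sketch. It reads every block in turn, records the ones that error out or come back short, and then groups them into consecutive runs like the "3 consecutively numbered" blocks mentioned above. The function names are hypothetical. The scan takes any seekable file-like object. On a real Mac you'd point it at a raw device node (something like `/dev/diskN`, which needs admin rights), and that part is an assumption I haven't shown.

```python
def scan_for_bad_blocks(dev, block_size, block_count):
    """Return block numbers that could not be read in full."""
    bad = []
    for block in range(block_count):
        try:
            dev.seek(block * block_size)
            data = dev.read(block_size)
        except OSError:
            bad.append(block)  # read error reported by the device
            continue
        if len(data) != block_size:
            bad.append(block)  # short read: treat as unreadable
    return bad

def consecutive_runs(blocks):
    """Group block numbers into (first, last) runs of consecutive blocks."""
    runs = []
    for b in sorted(blocks):
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b  # extends the current run
        else:
            runs.append([b, b])  # starts a new run
    return [tuple(r) for r in runs]
```

A cluster of adjacent failures, as opposed to one isolated block, is the pattern that suggests media coming off a region of the platter.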

As for bad data blocks in the catalog, that is serious, but I know of cases where DiskWarrior, TechTool Pro, or Drive Genius have been able to recover a volume where that has happened, admittedly with data loss, and sometimes severe data loss. But from the sound of things this drive might not have lasted long enough to recover the volume structure.

Originally Posted By: joemikeb

The surface scan itself, as performed by Drive Genius or Tech Tool Pro, should force the data block to be remapped but they still report a bad data block had been detected in that particular scan.

Now that I think back on my out-of-control button pushing when CCC returned the error, I remember running a surface scan with "dd" and at least starting two or three TTP scans, and every scan detected the same bad blocks, so it looks like a scan does not force remapping, or at least not in the sense we're talking about.

Originally Posted By: joemikeb

The number of spare data blocks on a drive is a function of the total drive capacity and engineering decisions by the manufacturer, but 50 seems to me to be a very small number of available spares.

I'm pretty much positive that the four Hitachi drives (60GB - 250GB...best guess) I've had, whether pre-installed or purchased, had only 5 spare blocks each, and I know that the 500GB WD Scorpio Blue I purchased had 40, and the (failed) 500GB Toshiba that came with my MBP had 50; the 500GB Apple-branded HDD with which AppleCare replaced the Toshiba has only 5 spares.

And since we're on the subject, what is the difference between a spare and any other unused block, i.e. why are spares necessary on a drive that's likely got hundreds of MB of empty blocks?

Originally Posted By: joemikeb

But from the sound of things this drive might not have lasted long enough to recover the volume structure.

It most certainly didn't, and once I was fully backed up and had an AppleCare case # there was no need to attempt heroics. (The scary part was that my backup drive had just developed new bad blocks [...tanked in short order]; I held my breath 'til the new drive was installed and I was restored. Strange that the only drives that have gone south on me did so within days of each other.)

Originally Posted By: artie505

And since we're on the subject, what is the difference between a spare and any other unused block, i.e. why are spares necessary on a drive that's likely got hundreds of MB of empty blocks?

Every sector on a disk has a sector address. Except for spare blocks.

A filesystem will note that some of those sector addresses point to sectors that contain data. Other sector addresses point to sectors that are currently unused, and as such are available for new data, be it for a new file or for extending an existing file.

When a drive detects that a sector has gone bad, it can reassign the sector's sector address to one of the spare sectors. Before that, the spare sector had no sector address. After, the bad sector has no sector address. The filesystem has no knowledge that anything has happened, and continues using the same sector address, but data for that sector address is now being stored at a different place on the disk surface.

How many spare sectors you have on a drive is a design choice made by the drive manufacturer, but typically there will be several per cylinder. When possible, a bad sector will be remapped to a spare sector on the same cylinder, the idea being to preserve the sector number to head position mapping. What happens when all the spare sectors on a cylinder have been used up, and another is needed, is another manufacturer decision, but in most cases the bad sector is simply reported as bad. Bad sectors are usually not remapped to spare sectors on a different cylinder.
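The remapping described in the last two paragraphs can be modeled as a lookup table in the drive firmware. Here's a toy Python model, purely illustrative and not any manufacturer's firmware. `ToyDrive` and its parameters are hypothetical. Logical addresses map to physical (cylinder, slot) pairs. Each cylinder has a few spare slots with no logical address until one is needed to replace a bad sector on that same cylinder. When a cylinder runs out, the sector is just reported bad.

```python
class ToyDrive:
    """Toy model of drive-firmware sector remapping (not real firmware).

    Logical sector addresses (what the filesystem uses) map to physical
    (cylinder, slot) pairs. Slots numbered >= sectors_per_cyl are spares
    that have no logical address until they replace a bad sector.
    """

    def __init__(self, cylinders, sectors_per_cyl, spares_per_cyl):
        self.sectors_per_cyl = sectors_per_cyl
        # Initially each logical address points at its "natural" slot.
        self.mapping = {
            c * sectors_per_cyl + s: (c, s)
            for c in range(cylinders)
            for s in range(sectors_per_cyl)
        }
        # Unused spare slots, tracked per cylinder.
        self.free_spares = {
            c: [(c, sectors_per_cyl + k) for k in range(spares_per_cyl)]
            for c in range(cylinders)
        }

    def remap(self, lba):
        """Point logical address `lba` at a spare on the same cylinder.

        Returns the new physical slot, or None if that cylinder has no
        spare left (the sector is then simply reported as bad).
        """
        cylinder = lba // self.sectors_per_cyl
        spares = self.free_spares[cylinder]
        if not spares:
            return None
        new_slot = spares.pop(0)
        # The old slot now has no logical address; the filesystem keeps
        # using the same `lba` and never notices the change.
        self.mapping[lba] = new_slot
        return new_slot
```

The filesystem only ever sees `lba`. The table change stays inside the drive, which is why the OS doesn't hear about it.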

When the filesystem is told that a sector is bad, it reports that to the OS, which may collect the sector into an artificial file all of whose sectors are known to be bad. That marks those sectors "in use", so they will not be assigned to new files. (HFS and all its variants reserve a special file number (which Unix calls an inode number) for this special file.) This artificial file has no name, and cannot be deleted. (Deleting it would make those bad sectors available for use. Obviously not a good idea.)
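Here's a toy Python allocator showing why parking bad sectors in an undeletable file works. `ToyVolume` is hypothetical, and so is the reserved ID value, which is only illustrative. Once a block belongs to the bad-block file it is never "free", so new files never get it. Refusing to delete that file keeps it that way.

```python
BAD_BLOCK_FILE_ID = 5  # illustrative reserved file number for bad blocks

class ToyVolume:
    """Toy allocator: each block is free (None) or owned by a file ID."""

    def __init__(self, block_count):
        self.owner = [None] * block_count

    def mark_bad(self, block):
        """Hand a bad block to the artificial bad-block file."""
        self.owner[block] = BAD_BLOCK_FILE_ID

    def allocate(self, file_id, count):
        """Assign `count` free blocks to `file_id`; return their numbers."""
        if file_id == BAD_BLOCK_FILE_ID:
            raise ValueError("reserved file ID")
        free = [b for b, o in enumerate(self.owner) if o is None][:count]
        if len(free) < count:
            raise OSError("volume full")
        for b in free:
            self.owner[b] = file_id
        return free

    def delete(self, file_id):
        """Free every block of `file_id`; the bad-block file is protected."""
        if file_id == BAD_BLOCK_FILE_ID:
            raise PermissionError("the bad-block file cannot be deleted")
        self.owner = [None if o == file_id else o for o in self.owner]
```

With blocks 1 and 2 marked bad, allocating skips straight past them, and deleting ordinary files never puts them back in circulation.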

Thanks for the explanation; it's logical and even makes perfect sense.

Since I've got zero idea what criteria manufacturers use to decide how many spare blocks an HDD should have, I'll guess that a lower number may be best, because, after all, how long do you want to "secretly" perpetuate a problem that will ultimately result in drive failure? (I check DiskWarrior's nightly "spares" report, but it didn't report any of my bad blocks.)

Does OS X pop up an I/O error when a spare block is used?

Originally Posted By: artie505

Does OS X pop up an I/O error when a spare block is used?

If it did there would be absolutely no point in remapping an address to a spare data block. The remapping feature is entirely within the drive firmware and theoretically invisible to the operating system, or at least it should be.

FWIW, back in the day I saw 10 GB hard drives with up to 100 spare data blocks. But in those days it was assumed that every drive shipped with at least one or two bad data blocks.

Now, I don't get it... The spare block scheme sounds like it does little more than save a few CPU cycles by swapping out bad blocks without generating a pop-up while it masks a possible imminently fatal problem if the bad blocks are newly developed, rather than "factory issue".

Granted that the scheme was concocted in the early days of HDDs when quality may have been lower than it is today, not to mention years before the Google report, but newly developed bad blocks must always have been recognized as a red flag, and hundreds of spare blocks sounds like an invitation to disaster.

Sounds pretty durn counterintuitive, not to mention counterproductive!

What am I missing? What benefit does the spare block scheme provide that outweighs the importance of advising users about bad blocks?

Originally Posted By: joemikeb

The remapping feature is entirely within the drive firmware and theoretically invisible to the operating system, or at least it should be.

DiskWarrior's daily hardware reports include

4/18/14 3:51:36 AM [0x0-0x15015].com.alsoft.diskwarriorstarter DiskWarriorDaemon: [Fri Apr 18 03:51:36 EDT 2014] : Spare blocks for ATA device 'HGST HTS725050A7E630', serial number 'TF0500WH1029ML', appear to still be available. (Total Available: 5) (Use Attempts: 0)

so the remapping feature is visible in some fashion.

At least in the early days, the spare block scheme was a technique to lower the reject rate of drive platters. Without spare data blocks the reject rate would have approached 100%. I have no idea what the rate is today, but I would venture that without spare data blocks the platter reject rate would still be quite high and would significantly multiply the cost of drives.

The drive firmware does keep a map of bad data blocks that can be read by utilities such as DiskWarrior, but that is like S.M.A.R.T., where the drive accumulates the data but does not signal the OS when the values change. Again, this is a function of the hardware/firmware in the drive itself and is completely independent of the operating system. I suppose it would be possible to have a utility function that continually monitored those values, but the impact on I/O performance would probably be unacceptable to most users.
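A middle ground is to check periodically rather than continually, the way DiskWarrior's nightly report already does, and flag any change. Here's a Python sketch of that comparison. Big assumption: the regex is built only from the single DiskWarriorDaemon line quoted earlier in this thread, so the real log format may well differ. `parse_spares` and `spares_changed` are hypothetical names.

```python
import re

# Built from the one DiskWarriorDaemon log line quoted in this thread;
# the real format may differ.
SPARES_RE = re.compile(r"\(Total Available: (\d+)\) \(Use Attempts: (\d+)\)")

def parse_spares(line):
    """Return (total_available, use_attempts) from a log line, or None."""
    m = SPARES_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

def spares_changed(previous, current):
    """True if spares were consumed or use attempts rose between readings."""
    prev_avail, prev_attempts = previous
    avail, attempts = current
    return avail < prev_avail or attempts > prev_attempts
```

Comparing today's reading against yesterday's costs almost nothing in I/O, and a drop in available spares is exactly the "newly developed bad blocks" warning sign discussed above.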

Thanks for the explanation.

It seems like it would be a good idea, though, to deal with the uncertainty that accompanies the economic reality, perhaps with a built-in utility that monitors the drive at a user's option and reports any spares used. (Does anybody know if DW pops up a notification if a spare block is used?)