I know there are many ways to do this, but I haven't found one that I'm really happy with.

I often find articles on the web that I'd like to download, but I want only the text, not an HTML file.

I don't want to select the text and then cut and paste. I've done that many times in the past, but it's tiresome, a pain in the neck, and often doesn't work very well: I end up dealing with awkward paragraph spacing, extra spaces, odd line breaks, and so on. (Tom Bender's Tex-Edit used to do a great job of cleaning up files, but that too was time-consuming.) I know browsers are supposed to be able to save pages as text only (I have Firefox 3.6 and Safari 4.1), but I can't seem to get the result I want. I've also tried converting the HTML with TextEdit, but that doesn't work very well either.

This is especially a problem with big files. I tried using TextEdit to convert them, but TextEdit does a poor job with long documents. That leaves NeoOffice, which should be able to convert HTML to text, but it doesn't seem to work.

For example, a long file from Project Gutenberg:

http://www.gutenberg.org/files/16350/16350-h/16350-h.htm
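
To show the kind of result I'm after, here is a rough sketch in Python of the sort of tag-stripping I mean, using that Gutenberg file. The output filename and the way the spacing is cleaned up are just my guesses at what would suit me; I'm not set on doing it with a script, this is only to illustrate what "text only" means to me.

    #!/usr/bin/env python3
    # Rough sketch: download a page and keep only its visible text.
    # The URL is the Gutenberg file mentioned above; the output name
    # "16350.txt" and the skipped tags are just assumptions.
    import urllib.request
    from html.parser import HTMLParser

    class TextOnly(HTMLParser):
        """Collect the text content of a page, ignoring tags and scripts."""
        def __init__(self):
            super().__init__()
            self.parts = []
            self.skip = 0  # depth inside <script>/<style> blocks

        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self.skip += 1

        def handle_endtag(self, tag):
            if tag in ("script", "style") and self.skip:
                self.skip -= 1

        def handle_data(self, data):
            if not self.skip:
                self.parts.append(data)

    url = "http://www.gutenberg.org/files/16350/16350-h/16350-h.htm"
    page = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")

    parser = TextOnly()
    parser.feed(page)

    # Strip stray spaces and drop the blank lines the removed tags leave behind.
    lines = "".join(parser.parts).splitlines()
    text = "\n".join(line.strip() for line in lines if line.strip())

    with open("16350.txt", "w") as out:
        out.write(text)

If there's an app or a built-in tool that produces the same result without a script, that would suit me even better.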


A second problem: I also have an XHTML file downloaded from a Windows desktop, but I can't convert it either. Surely something on my computer ought to be able to read this file, but so far I've had no luck. I tried TextEdit, but it didn't work.
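
I would guess the same kind of tag-stripping applies here too, just reading the file from disk instead of fetching it from a URL, something along these lines (the filename is made up, and TextOnly is the little parser class from the sketch above):

    # Same idea as the sketch above, but reading a local file.
    # "article.xhtml" is just a placeholder for the file from the Windows machine.
    with open("article.xhtml", encoding="utf-8", errors="replace") as f:
        parser = TextOnly()
        parser.feed(f.read())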