I received this in my in-basket and thought I would share it.
The latest incarnation of my DTDGenerator utility is at http://prdownloads.sourceforge.net/saxon/dtdgen7-0.zip
It’s a tiny download, just one HTML file and a single Java class, and the only thing it now needs to run is a SAX2 parser. Because it works entirely serially, it runs essentially at parser speed.
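To illustrate why a purely serial approach can keep pace with the parser, here is a minimal sketch of a SAX2 handler that records, in a single pass, which child elements occur inside each element. It is not the DTDGenerator source: the class name `ChildCollector` and its `collect` method are hypothetical, and the real utility tracks considerably more (attributes, ordering, repetition).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the actual DTDGenerator class.
public class ChildCollector extends DefaultHandler {
    // Stack of currently open element names; the top is the current parent.
    private final Deque<String> open = new ArrayDeque<>();
    // For each element name, the set of child element names seen so far.
    final Map<String, Set<String>> children = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        children.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        String parent = open.peek();
        if (parent != null) {
            children.get(parent).add(qName);
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // Parses one document from a string and returns the parent-to-children map.
    public static Map<String, Set<String>> collect(String xml) throws Exception {
        ChildCollector handler = new ChildCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.children;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<doc><p>text <b>bold</b></p><p/></doc>"));
    }
}
```

Nothing is buffered apart from the stack of open elements and the summary map, so memory use depends on document depth and vocabulary size rather than on document length.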
Sometime I will get round to:
(a) processing a collection of source documents, rather than just one (you can write a SAX filter to wrap a sequence of documents into a single document)
(b) generating a schema rather than a DTD.
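The filter mentioned in (a) could look something like the sketch below. It is an assumption about one way to do it, not code from the download: the class `DocumentSequenceFilter`, its `parseAll` method, and the wrapper element name are all hypothetical. The idea is to swallow each inner document's start/end events and emit a single synthetic root element around the lot.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: presents several documents as one to the downstream handler.
public class DocumentSequenceFilter extends XMLFilterImpl {
    private final String wrapperName;

    public DocumentSequenceFilter(XMLReader parent, String wrapperName) {
        super(parent);
        this.wrapperName = wrapperName;
    }

    // Suppress the per-document events; parseAll emits one pair for the whole sequence.
    @Override public void startDocument() { }
    @Override public void endDocument() { }

    // Parses each source in turn, wrapped in a single synthetic root element.
    // A content handler must have been set before this is called.
    public void parseAll(List<InputSource> sources) throws Exception {
        ContentHandler out = getContentHandler();
        out.startDocument();
        out.startElement("", wrapperName, wrapperName, new AttributesImpl());
        for (InputSource source : sources) {
            parse(source); // element events pass through XMLFilterImpl unchanged
        }
        out.endElement("", wrapperName, wrapperName);
        out.endDocument();
    }

    // Demo helper: returns the element names the downstream handler sees, in order.
    public static List<String> elementNames(String wrapper, String... docs) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DocumentSequenceFilter filter = new DocumentSequenceFilter(reader, wrapper);
        List<String> names = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        List<InputSource> sources = new ArrayList<>();
        for (String doc : docs) {
            sources.add(new InputSource(new StringReader(doc)));
        }
        filter.parseAll(sources);
        return names;
    }
}
```

A handler placed behind such a filter sees one document whose root is the wrapper element, so any DTD generated from it would contain an extra declaration for the wrapper that needs removing by hand.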
My own experience of it is that it does a pretty good job of generating a DTD similar to the one the user would have written, except in the case where many elements have similar structural rules, e.g. HTML “inline” elements. In these cases you tend to find that only a small number of the theoretically valid structures are used in real instance documents, e.g. you may never encounter one particular inline element nested within another, so the generated DTD ends up stricter than a hand-written one would be.