TidyCOM

A COM Wrapper for HTML Tidy

Author: André Blavier

Unfortunately, I am no longer able to maintain TidyGUI and TidyCOM. However, note that the HTML Tidy core software is still alive and well (see the Tidy SourceForge project).
TidyGUI and TidyCOM (based on the 4th August 2000 version of HTML Tidy) will still be available from this site as long as necessary. If anybody is willing to continue the development of these programs, feel free to do so--all source code is available.
Many many thanks to the numerous folks who helped me improve the programs with their suggestions, bug reports and friendly messages.


What is it?

TidyCOM is a Windows COM component wrapping Dave Raggett's HTML Tidy, a free utility application from the World Wide Web Consortium that helps you clean up your web pages.

HTML Tidy is available from the W3C as a command-line program, à la Unix. To better fit the Windows environment, I have written a GUI front-end for Tidy called TidyGUI, and a COM component wrapper for Tidy, available here.

Details

Version

The current version (1.2.6, 27 June 2001) is based on the 4th August 2000 version of HTML Tidy.

COM Classes and Interfaces

TidyCOM exposes the TidyObject COM class (with interface ITidyObject). Through its Options property you can access the TidyOptions class (with interface ITidyOptions). The TidyOptions class allows you to change every Tidy option setting except write-back.

Here's how you could use TidyCOM in VBScript:

Set TidyObj = CreateObject("TidyCOM.TidyObject")
TidyObj.Options.Doctype = "strict"
TidyObj.Options.DropFontTags = true
TidyObj.Options.OutputXhtml = true
TidyObj.Options.Indent = 2 'AutoIndent
TidyObj.Options.TabSize = 8
TidyObj.TidyToFile "bad.html", "good.html"

Or you could simply load a Tidy configuration file:

Set TidyObj = CreateObject("TidyCOM.TidyObject")
TidyObj.Options.Load "myconfig.txt"
TidyObj.TidyToFile "bad.html", "good.html"

Both interfaces are dual interfaces, so TidyCOM can be used from scripting languages and from compiled languages alike.
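
For string-to-string tidying, the release notes mention TidyMemToMem(sourceString). The following is only a minimal sketch: it assumes that the method returns the tidied markup as a string, that the script runs under Windows Script Host (for WScript.Echo), and the input fragment is made up for illustration.

```vbscript
Set TidyObj = CreateObject("TidyCOM.TidyObject")
TidyObj.Options.OutputXhtml = True

' Hypothetical input: a small page with an unclosed <b> element
Source = "<html><head><title>Test</title></head><body><p><b>Hello</p></body></html>"

' Assumption: TidyMemToMem returns the tidied markup as a string
Tidied = TidyObj.TidyMemToMem(Source)
WScript.Echo Tidied
```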

Warning: TidyCOM's code is not re-entrant--no more than one instance of TidyObject should be alive at the same time in the same process.
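
One way to respect this restriction in a script is to release each TidyObject before creating the next one. This is a sketch rather than a documented pattern; the file names are hypothetical.

```vbscript
' First job: create, use, then release the only instance
Set TidyObj = CreateObject("TidyCOM.TidyObject")
TidyObj.TidyToFile "first.html", "first-tidy.html"
Set TidyObj = Nothing   ' drop the reference so the instance is released

' Second job: only now create a new instance
Set TidyObj = CreateObject("TidyCOM.TidyObject")
TidyObj.Options.Load "myconfig.txt"
TidyObj.TidyToFile "second.html", "second-tidy.html"
Set TidyObj = Nothing
```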

The ITidyObject Interface

The ITidyOptions Interface

There is also a read-write property for each option that can be used in configuration files. Only write-back is missing; you can achieve its effect with TidyToFile(sourceFile, sourceFile). See the complete list of option properties.
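
As a sketch of the write-back workaround, pass the same path as both source and destination. The file name page.html is hypothetical, and since the file is overwritten you may want to keep a backup copy.

```vbscript
Set TidyObj = CreateObject("TidyCOM.TidyObject")
TidyObj.Options.Load "myconfig.txt"

' Same path for input and output: the tidied result replaces the original file
TidyObj.TidyToFile "page.html", "page.html"
```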

Platforms

TidyGUI is supposed to work on all Win32 versions of Windows. It has been tested on Windows 95 and NT4.

Release Notes

Version 1.2.6 (27 June 2001)
Based on HTML Tidy version 4th August 2000.
TidyMemToMem(sourceString) changed again--it should now work on all Win32 platforms, whatever the input string's size. Special thanks to Greg Clouston for his kind help.
Version 1.2.5 (26 May 2001)
Based on HTML Tidy version 4th August 2000.
Corrected a bug in TidyMemToMem(sourceString) causing trouble with large files.
Version 1.2.4 (10 Mar. 2001)
Based on HTML Tidy version 4th August 2000.
Corrected a bug inverting the Markup option logic (thanks to Andrew Kidd for this one).
Performance should now be better when tidying large files.
Version 1.2.3
Based on HTML Tidy version 4th August 2000.
Corrected a few bugs (special thanks to Ricardo Amador).
Version 1.2.2
Based on HTML Tidy version 4th August 2000.
Corrected a major bug introduced in the previous version causing various options (like clean!) to have no effect (thanks to Mark Carrington who found that bug).
Version 1.2.1
Based on HTML Tidy version 4th August 2000.
Added support for string-to-string tidying.
Version 1.2
Based on HTML Tidy version 8th July 2000.
Version 1.1
Based on HTML Tidy version 30th April 2000. Corrects a design flaw that prevented pure automation clients, e.g. JScript or Perl (not VBScript), from accessing the options interface.
Version 1.0 (first version)
Based on HTML Tidy version 30th April 2000.

Download and installation

Three steps are required to install TidyCOM:

  1. Download TidyCOM.zip. It contains two files: the component (TidyCOM.dll) and a utility program to register the component (regsvr32.exe).
  2. Extract the component from the .zip file.
  3. Register the component, typically with regsvr32.exe (in a command prompt, type: regsvr32 <full_pathname_of_TidyCOM.dll>; to unregister, type: regsvr32 /u <full_pathname_of_TidyCOM.dll>).

Source code is available here (.zip file--also contains source code of TidyGUI). TidyCOM was developed with Visual C++ 6.0 and ATL.