
Review: RTF to XHTML Converter

Introduction

In December, Max Sautin, president of SautinSoft, sent us an email inviting us to review his RTF to HTML Converter. Since I know something about both RTF and HTML, I volunteered to do the review. The SautinSoft website at http://www.sautin.com/ is pretty straightforward. The website claims the Converter can take RTF documents and turn them into small, clean HTML documents ready for your website or anywhere else you want an HTML document. In this review, I'll discuss how well RTF to HTML works. As many of you know, Microsoft Word can also save documents as HTML, so I'll compare Word's HTML output with the HTML produced by RTF to HTML.

Background

RTF is short for Rich Text Format. RTF is a document specification created by Microsoft. These filenames end with “.rtf”. RTF is an open specification, which means any programmer wanting to add RTF capabilities to their program can download the latest specification and read the rules. It is not, however, open source. (Open source means many programmers and technical experts can collaborate on a project.) Microsoft maintains tight control over the RTF specification. In fact, when Word 2003 was released, it had new features which forced Microsoft to update the RTF specification.

HTML is, as many of you know, short for HyperText Markup Language. It's the most common language for web pages. The current versions are HTML 4.01, XHTML 1.0, and XHTML 1.1 (the X stands for eXtensible). With HTML, you can optionally specify style-related items in a separate CSS (Cascading Style Sheets) file. This lets you keep the content in the HTML documents and move the look-and-feel to another file. CSS makes changing whole sites a lot easier. HTML filenames end with either “.html” or “.htm”.

The Three Documents

1. This document is very plain. It's simply a copy of the minutes from another local computer club (Trilogy Computer Club). There's no special formatting and no table present. I thought it would make a good test for converting simpler documents. The document has one extra feature: a date is written using superscripting.
2. The second document I found by using Google. It's a basic document with some indenting, bulleted lists, and background shading. This, I felt, was a good representation of an average real-world document.
3. The third document I also found by using Google. It's a rather complete test document. It runs the gamut in terms of different paragraph styles, using tables, images, footnoting, and some other things.

Critiquing


I focused on three areas: ease of use, accuracy of conversion, and resulting file sizes.
1. Ease of use:
Was it easy for me to find the options I wanted? Yes.
Did the options make sense? Yes.
Did the program behave the way I expected? Yes.
Did it have any bugs? Some minor and major bugs.
2. Accuracy of conversion:
Did the resulting file look the same as the original?
Did the resulting HTML file pass the official World Wide Web Consortium's validator without errors? (The World Wide Web Consortium, or W3C, is the nonprofit organization that oversees development of many of the technologies related to the web. It is the organization that defines which tags we can use in HTML and XHTML.)
3. Resulting file sizes: If I used Microsoft Word to save a file as HTML, Word would add a lot of non-HTML information to the file just for its own use. This information is of no use to any web browser, web server, or anything else, except perhaps other Microsoft Office programs. Unfortunately, these additions unnecessarily increase the size of the HTML file. They also keep the file from passing the W3C validator.
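
To make the "non-HTML information" concrete, here is a minimal sketch that counts a few Office-only constructs that Word's HTML export is known to emit: Office XML namespace declarations, `<o:p>` elements, `mso-` style properties, and `<!--[if ...]>` conditional comments. The pattern list is illustrative rather than complete, and the sample string is a hand-written fragment, not taken from any of the three test documents.

```python
import re

# Illustrative patterns for Office-specific markup; this list is an
# assumption for the sketch and is not exhaustive.
OFFICE_PATTERNS = {
    "namespace": re.compile(r'xmlns:[ovw]="urn:schemas-microsoft-com:[^"]*"'),
    "o:p tag": re.compile(r"</?o:p\b[^>]*>"),
    "mso- style": re.compile(r"\bmso-[a-z-]+\s*:"),
    "conditional comment": re.compile(r"<!--\[if [^\]]*\]>"),
}


def count_office_markup(html_text):
    """Return how many times each Office-only construct appears."""
    return {
        name: len(pattern.findall(html_text))
        for name, pattern in OFFICE_PATTERNS.items()
    }


# Hand-written fragment in the style of Word's "save as HTML" output.
SAMPLE = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
    "<!--[if gte mso 9]><xml></xml><![endif]-->"
    '<p class=MsoNormal style="mso-margin-top-alt:auto">Minutes<o:p></o:p></p>'
    "</html>"
)

if __name__ == "__main__":
    for name, count in count_office_markup(SAMPLE).items():
        print(f"{name}: {count}")
```

A count like this is only a rough indicator of bloat. It does not replace running the file through the W3C validator.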

The Goods

For this kind of test, the computer make and model don't matter much. For the record, I'm using a 2.66 GHz Pentium 4 Fujitsu laptop with 1 GB of RAM. Word is version 2003.

The Settings

I used the same settings for all three documents. I told the Converter to produce XHTML 1.0 and to put the style information in a separate CSS file named common.css. When I used Word, I chose to save everything as unfiltered HTML files.
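
For readers who haven't seen this arrangement, the sketch below shows roughly what an XHTML 1.0 page with its styling moved into common.css looks like, and how to confirm which stylesheet a page links to. The page content, the CSS rule, the Strict doctype, and the helper name `linked_stylesheets` are all assumptions made for illustration. This is not the Converter's actual output.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical contents of common.css: the look-and-feel lives here.
CSS_TEXT = "p.title { text-align: center; font-weight: bold; }\n"

# Hypothetical XHTML 1.0 page: content only, with a link to the CSS file.
XHTML_TEXT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Meeting minutes</title>
  <link rel="stylesheet" type="text/css" href="common.css" />
</head>
<body>
  <p class="title">Meeting minutes</p>
  <p>Held on the 5<sup>th</sup> of the month.</p>
</body>
</html>
"""


def linked_stylesheets(xhtml_text):
    """Parse the page as XML and return the href of each stylesheet link.

    Parsing fails with ET.ParseError if the markup is not well-formed,
    which is a weaker check than full W3C validation.
    """
    root = ET.fromstring(xhtml_text)
    links = root.iter("{%s}link" % XHTML_NS)
    return [link.get("href") for link in links if link.get("rel") == "stylesheet"]


if __name__ == "__main__":
    print(linked_stylesheets(XHTML_TEXT))
```

Because XHTML must be well-formed XML, an ordinary XML parser can read it. That is part of the appeal of asking for XHTML rather than plain HTML.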

The File Size Results

All sizes are in bytes.

Document    RTF       HTML1     HTML2
1            5,459     3,004     6,478
2           20,130    10,606    30,967
3           24,851     9,805    50,389

The numbers along the left indicate which document is represented. HTML1 is the result from the RTF to HTML conversion tool. HTML2 is the result from Word.
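
The raw byte counts are easier to compare as percentages of the original RTF size. The short sketch below does that arithmetic on the figures from the table. The function names are my own.

```python
# File sizes in bytes, copied from the table above.
SIZES = {
    1: {"rtf": 5459, "html1": 3004, "html2": 6478},
    2: {"rtf": 20130, "html1": 10606, "html2": 30967},
    3: {"rtf": 24851, "html1": 9805, "html2": 50389},
}


def percent_of_rtf(html_size, rtf_size):
    """Return the HTML size as a percentage of the original RTF size."""
    return round(100.0 * html_size / rtf_size, 1)


def summarize(sizes):
    """Map each document number to (HTML1 %, HTML2 %) relative to its RTF."""
    return {
        doc: (
            percent_of_rtf(row["html1"], row["rtf"]),
            percent_of_rtf(row["html2"], row["rtf"]),
        )
        for doc, row in sizes.items()
    }


if __name__ == "__main__":
    for doc, (converter_pct, word_pct) in summarize(SIZES).items():
        print(f"Document {doc}: HTML1 {converter_pct}% of RTF, HTML2 {word_pct}% of RTF")
```

For all three documents, the Converter's file is smaller than the source RTF, while Word's file is larger than it.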

Accuracy Results

1. This document fared well. Even the superscripting was converted.
2. Hanging paragraph failed. Entire document was centered when only the title should have been centered. Background color was lost.
3. Too many errors to list. All the document's text came through, but most of the styling was lost.

Summary

RTF to HTML Converter is off to a good start. With some improvements and refinements, it can become a very powerful tool for providing useful documents to a wider audience. For now, I would stick with using it for basic document conversion.

HTML to RTF Converter, which Max also sent me, is not much good for us in the U.S. It outputs the RTF using a page size of A3, A4, A5, B5, or other. Those page sizes are normal for most other parts of the world. However, “other” wouldn't save for me. Also, the page dimensions for “other” and the margins have to be defined in millimeters, not inches. This makes generating a good, usable RTF difficult. Hopefully, the issues in both programs can be fixed or improved soon. If so, I'll write an update for the review.

by Ric Fischer, ASCIi
Secretary
