Version 3.9e
Htmlpp is a preprocessor for HTML files, and is intended to simplify the task of maintaining large sets of HTML documents. You provide htmlpp with a document that is a mix of HTML-tagged text and htmlpp commands. Htmlpp generates a set of HTML files from that document.
To run htmlpp, use the following syntax:
htmlpp [-option...] filename ...
Where filename is assumed to have an extension '.txt' if necessary. The -debug option causes htmlpp to leave all its intermediate work files lying around. The -guru option makes htmlpp work in guru mode, which is explained in the guru mode section. The -env option tells htmlpp to include the current environment in the symbol table.
Htmlpp replaces symbols in command lines and HTML text. You can specify a symbol in various ways:
<A HREF="$(name)">name</A>. If the symbol name has an empty value, the <A...> and </A> tags are left out, i.e. the link is not active.
<A HREF="$(name)">label</A>. If the symbol name has an empty value, the <A...> and </A> tags are left out. You can use double quotes if the label itself contains ')'.
<A HREF="$(name)">$(name)</A>.
You can define symbols in terms of symbols: $($(name)) is quite okay, if you know what you are doing. Htmlpp inserts symbols in the above order, so it will translate all $(name)'s before looking at $(*name)'s.
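As a rough illustration of how a nested reference such as $($(name)) can resolve, here is a minimal Python sketch. It is not htmlpp's own code (htmlpp relies on Perl); the helper name expand, the pass limit, and the choice to treat an undefined symbol as empty are assumptions made for the example. It handles only the plain $(name) form and leaves the $(*name) forms untouched.

```python
import re

# Matches an innermost $(name): the name may not contain '(', ')', '$' or '*',
# so the outer part of $($(name)) and the $(*name) forms are skipped.
INNERMOST = re.compile(r"\$\(([^()$*]+)\)")

def expand(text, symbols, max_passes=20):
    """Expand $(name) references, innermost first (hypothetical helper)."""
    for _ in range(max_passes):
        # Assumption: an undefined symbol expands to the empty string.
        expanded = INNERMOST.sub(lambda m: symbols.get(m.group(1), ""), text)
        if expanded == text:
            break
        text = expanded
    return text

symbols = {"DIR_EXT": ".txt", ".txt": "Text file"}
print(expand("$($(DIR_EXT))", symbols))  # prints "Text file"
```

The pass limit is only there so that two symbols defined in terms of each other cannot loop forever.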
Symbols are of various types
Htmlpp provides these standard symbols for use at any point in the document:
1 1.1 1.2 1.2.1 1.2.2 ... etc.
Htmlpp automatically manages the numbering of header levels. You are, however, limited to the 'dotted number' syntax.
In addition, htmlpp will include the current environment symbols if you run it with the -env option. You can use this (although I don't see the utility immediately) to redefine any of the standard symbols such as $(EXT). Remember that you can also access any of the environment symbols using the %(...) syntax; e.g. %(PATH).
A htmlpp command starts with a dot, in column 1, followed by a keyword. You can put spaces between the dot and the keyword. To continue the command line over the next line, end the line with a hyphen (though you need to put at least the dot and the keyword on the same line). Commands can be in upper- or lower-case: .endblock and .EndBlock are equivalent.
These are the commands that htmlpp understands:
.define count = 1
.echo $(count)
.define count = $(count) + 1
.echo $(count)
Of course it helps to know that htmlpp will evaluate all variables before passing the expression to Perl to work out. So, the second .define is evaluated as '1 + 1'. If you decide to rely on Perl (a good bet for now), you can use the .define = command to execute shell commands, e.g.:
.if $(PASS)
. define junk = system "rm *.htm";
.endif
.define INC++ ""
Note that the empty string is treated as zero; the next time the symbol will be '1'. You can also use '--' after the symbol name to subtract one from its value each time it is used. You can stick the '++' or '--' before the symbol name: then the symbol is incremented or decremented before its value is taken.
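To make the difference between a trailing and a leading '++' concrete, here is a small Python model of a counting symbol. It is a sketch under assumptions, not htmlpp's implementation: the class name CounterSymbol and its parameters are invented for the example.

```python
class CounterSymbol:
    """Hypothetical model of a symbol defined with '++' or '--'."""

    def __init__(self, value="", step=1, prefix=False):
        self.value = value      # stored as text, like any other symbol
        self.step = step        # +1 for '++', -1 for '--'
        self.prefix = prefix    # True when '++' or '--' comes before the name

    def use(self):
        """Return the text inserted for one use of the symbol."""
        updated = str(int(self.value or 0) + self.step)  # "" counts as zero
        result = updated if self.prefix else self.value
        self.value = updated
        return result

inc = CounterSymbol("")                 # like: .define INC++ ""
print([inc.use() for _ in range(3)])    # prints ['', '1', '2']
```

With prefix=True the first use already yields '1', because the value is updated before it is taken.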
<H1>$(TITLE)</H1>
.if -f myfile.htm
An .if block must be entirely in one line.
.if $(number) == 0
.if $(number) != 1
.if $(number) > 2
.if $(string) eq "value"
.if $(string) ne "value"
.define LOCAL i:/site:
.define SERVER http://www.imatix.com
The directory must be relative to either of these two. It should start with '/' but not end with '/'. You can specify zero or more filenames or wildcards (htmlpp accepts * and ?, according to UNIX rules). If you specify no filespecs, htmlpp assumes you mean '*'. The filespecs can include Perl regular expressions: place the filespec between double quotes, e.g. to match all files with 'doc' or 'txt' somewhere in the name: .build dir /pub "doc|txt". An example might help:
.define .txt Text file
.define .htm HTML document
.define .zip ZIP archive
.block dir_open
<PRE>
.block dir_entry
$(*DIR_HREF="$(DIR_NAME)") $(DIR_SIZE) $($(DIR_EXT))
.block dir_close
</PRE>
.endblock
Note the sneaky double-dereferencing of $(DIR_EXT), which translates the file extension into a comment like 'Text file'. I usually stick all such .defines in a separate .include file, filetype.def.
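The two kinds of filespec accepted by .build dir (plain wildcards, and a regular expression between double quotes) can be told apart roughly as in this Python sketch. The helper name filespec_matches is hypothetical, and Python's re and fnmatch modules only approximate Perl regular expressions and UNIX wildcard rules, so take it as an illustration of the idea rather than of htmlpp's exact matching.

```python
import fnmatch
import re

def filespec_matches(filespec, filename):
    """Return True if filename matches one filespec (hypothetical helper)."""
    if len(filespec) >= 2 and filespec[0] == '"' and filespec[-1] == '"':
        # Quoted filespec: treat the content as a regular expression that
        # may match anywhere in the name.
        return re.search(filespec[1:-1], filename) is not None
    # Plain filespec: '*' and '?' wildcards. Note that fnmatch also accepts
    # [seq] patterns, which the text above does not mention.
    return fnmatch.fnmatchcase(filename, filespec)

names = ["index.htm", "readme.txt", "guide.doc", "tools.zip"]
print([n for n in names if filespec_matches('"doc|txt"', n)])
```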
Macros are a shorthand way to produce HTML tags and other constructs. This is how I define a macro 'H3':
.macro H3 <H3>$*</H3>
I use all uppercase names for macros, but this is just a convention, since the case is not important. We can use a macro like H3 in three ways:
.H3 some text
or
<!--.H3 some text-->
or
<.H3 some text>
The first form is good for titles and other constructs that come naturally on a line by themselves. Since it uses a syntax similar to htmlpp commands, there is a certain danger that a macro will conflict with some future command. This is just too bad; the alternative of inventing yet another syntax for macros was (for me) a worse choice. In any case, htmlpp will warn you if you try to define a macro that already exists as a command. The second form is compatible with HTML editors and some other HTML preprocessors, but is frankly a pain to type. The third form is good for mark-up tags. The second and third forms suffer from one problem: the whole thing has to come on a single line.
When you use a macro like this: <.H3 some text> you are supplying arguments. Here we supply two, 'some' and 'text'. You can refer to these as $1 and $2 inside the macro definition, or together as $*. Htmlpp can handle quotes correctly, so <.H3 "some text"> only supplies one argument, $1. The $+ symbol will expand to anything left over after $1, $2, etc. For instance, if you refer to $1 and $3 in the macro body, $+ refers to $4 and any remaining arguments.
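The argument rules can be sketched in a few lines of Python. This is an illustration only: the helper name expand_macro is invented, shlex is a stand-in for htmlpp's own quote handling (it will, for instance, object to a lone apostrophe), and expanding a missing argument to nothing is my assumption.

```python
import re
import shlex

def expand_macro(body, argument_text):
    """Substitute $1, $2, ..., $* and $+ in a macro body (hypothetical helper)."""
    args = shlex.split(argument_text)            # honours double quotes
    numbered = [int(n) for n in re.findall(r"\$(\d+)", body)]
    leftover = args[max(numbered, default=0):]   # arguments after the highest $N

    def substitute(match):
        token = match.group(1)
        if token == "*":
            return " ".join(args)
        if token == "+":
            return " ".join(leftover)
        index = int(token) - 1
        # Assumption: a missing argument expands to nothing.
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$(\d+|\*|\+)", substitute, body)

print(expand_macro("<H3>$*</H3>", "some text"))
print(expand_macro("$1 / $3 / $+", "a b c d e"))
```

In the second call the body refers to $1 and $3, so $+ picks up the fourth and fifth arguments.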
You can define a macro with a section that repeats for each argument. This is useful if you don't know in advance how many arguments you are going to have. For instance, the standard .THEAD macro generates a table heading for one, two, three, or more columns. You specify the repeating section as {...$n...}. The text between '{' and '}' is repeated for each argument; "$n" (dollar sign, small 'n') is replaced by the argument value.
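A repeating section behaves roughly like the Python sketch below. The macro body used here is made up for the example (it is not the actual definition of .THEAD in macro.def), and expand_repeat is a hypothetical name.

```python
import re

def expand_repeat(body, args):
    """Repeat each {...$n...} section once per argument (hypothetical helper)."""
    def repeat(match):
        section = match.group(1)
        # "$n" inside the section is replaced by the current argument value.
        return "".join(section.replace("$n", arg) for arg in args)

    return re.sub(r"\{(.*?)\}", repeat, body)

# Made-up macro body: one <TH> cell per argument.
print(expand_repeat("<TR>{<TH>$n</TH>}</TR>", ["Name", "Size", "Type"]))
```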
It is sometimes awkward to have to use quotes around multi-word arguments. Htmlpp lets you use underlines instead of spaces, so that this_is_treated_as_six_words.
The file macro.def that comes with htmlpp defines a set of standard macros. You can define multiline macros that include other commands, like .if and .include.
You can type accented characters directly, and htmlpp will do its best to convert these into HTML metacharacters. For instance, if your document contains an e-circumflex, htmlpp will replace it by the metacharacter &ecirc;.
This function works within certain limitations only. Firstly, your document will become non-portable: if you move it from a UNIX to a DOS box, the accents will get messed up, unless your file transfer software can handle accents too. Secondly, htmlpp uses a look-up table based on the ASCII value for each accented character it knows about. These tables are system-specific, so htmlpp does a little testing of the wind to figure out if it's running under a Unix or a DOS system. If you use htmlpp on a Mac, or on documents encoded using another character set -- e.g. Windows -- it won't work. Basically htmlpp handles MS-DOS accents if there is an environment variable 'COMPSPEC' defined, and Unix Latin-1 (a.k.a. ISO-8859-1) accents if there is a file called "/etc/passwd" on the system. Under Windows you should save documents as 'DOS text'.
Oh, one more thing. Htmlpp does not try to handle every single accented character, just the ones that I could find the HTML codes for, and the Unix and DOS values for. If you find that the accents you use come out as '?' (not found), just send me the HTML metachar for the accented character, plus the octal ASCII codes for Unix and MS-DOS. (Well, whatever you can get together would be nice too.)
Recognising that a True Guru does not have time to painfully mark-up large HTML documents, htmlpp includes a basic text-to-HTML converter. You can invoke this as a preprocessing phase to the normal htmlpp process. Right now, this is an either-or choice; you either use htmlpp commands in a HTML document, or a text document and guru mode, but not a mixture of the two modes. (Release 3.1 of htmlpp tried to make this work, but that did not last long :\ )
You can, usefully, use htmlpp's guru mode to mark-up a document, then fine-tune it by hand.
To use guru mode, run htmlpp with the '-guru' option:
htmlpp -guru filename
Guru mode works by recognising layout, and converting this to HTML. I've tried to keep a balance between features and complexity, to give you something useful without becoming too formal (which is what HTML is for). Basically, guru mode relies on layout rules that also help to make the text readable in any case. For example, blank lines and indentation are significant in most places. One consequence of this is that the plain text file is very readable even before it is HTML'd (assuming you do your bit.)
In guru mode, htmlpp reads an input text file (with any name and extension except '.hpp') and creates an output file with the same name and the extension '.hpp'. It then processes this file as it would any normal input file. The '.hpp' file remains afterwards, so you can use it as the basis for further refinement if wanted. (You should call it something else, to avoid embarrassing mistakes.)
The file 'guru.def' is always inserted at the start of the newly-created file. You can modify this file as wanted, to tune the results of guru mode. You cannot choose another name for this file other than by changing htmlpp's source code, which I don't recommend.
Htmlpp looks for a file called 'guru.fmt' which may exist and which may redefine the various HTML tags it uses. A file 'guru_opt.fmt' is supplied in the htmlpp distribution; rename or copy this to 'guru.fmt' and change any values you want to (I'd suggest you remove anything that does not change, just to make things clear). I've made it work in this way so that if you reinstall htmlpp, you don't lose your work.
Htmlpp handles three levels of headers, H1, H2, and H3. In the text these look like this:
Chapter Header
**************

Section Header
==============

Subsection Header
-----------------
The line following the header text must start with 3 or more asterisks, equals, or hyphens. There is no way to specify H4 or other headers. I recommend that you start the document with a chapter header.
You can also request a horizontal rule (<HR>) by putting four or more dots on a line by themselves:
....
The header text line must come after a blank line, or at the start of the document.
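The header rules (text after a blank line or at the start of the document, followed by a line starting with three or more asterisks, equals, or hyphens) can be checked with a short Python sketch like this one. The function name find_headers and the (level, text) result format are my own choices, not anything htmlpp exposes.

```python
import re

# A line that starts with 3 or more '*', '=' or '-' marks the line before it.
UNDERLINE = re.compile(r"^(\*{3,}|={3,}|-{3,})")
LEVELS = {"*": 1, "=": 2, "-": 3}   # H1, H2, H3

def find_headers(lines):
    """Return (level, text) pairs for the headers found (hypothetical helper)."""
    headers = []
    for i in range(len(lines) - 1):
        text = lines[i].strip()
        after_blank = i == 0 or lines[i - 1].strip() == ""
        match = UNDERLINE.match(lines[i + 1])
        if text and after_blank and match:
            headers.append((LEVELS[match.group(1)[0]], text))
    return headers

sample = [
    "Chapter Header", "**************", "",
    "Section Header", "==============", "",
    "Subsection Header", "-----------------",
]
print(find_headers(sample))
```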
If your document contains at least two chapters, htmlpp will insert a table of contents before the second chapter header. This works best if the first chapter is empty or contains a brief text to introduce the document. Htmlpp inserts the table of contents by adding a section header called 'Table of Contents', and then a line '.include contents.def', in the normal manner. You should not call the first chapter 'Table of Contents'.
Htmlpp inserts a '.page' command before each chapter header. Therefore, use chapter headers wisely to break the document into usable pages.
The guru.def file normally includes 'prelude.def', which defines page headers and footers for the document. You will normally tune these for any project -- the supplied files contain references to iMatix URLs that may not be appropriate for your work. I like to use the same headers and footers (the same prelude.def) for all the files in a project, including those that use guru mode.
A paragraph is anything following a blank line that does not look like something else. Basically, any plain text following a blank line is given a <P> tag. Note however the exceptions that follow...
If a line is indented by 4 or more spaces, or a tab, htmlpp treats the line as 'preformatted' text and inserts a <PRE> tag. You can mix blank lines with preformatted text.
A paragraph starting with a hyphen and a space is considered to be a bulleted list item. A paragraph starting with a digit and a dot and optionally a space is considered to be a numbered list item. You can put blank lines between list items, but it's not necessary. Cosmetically, when list items are short, blank lines are disturbing. But when list items are several lines, blank lines make the text more readable. Either way, htmlpp is happy.
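As a rough picture of how the first line of a block decides its treatment, here is a Python sketch covering preformatted text, bulleted items, numbered items, and plain paragraphs. The function name classify and the returned labels are invented for the example, and the order in which the checks are made is an assumption.

```python
import re

def classify(line):
    """Guess how guru mode would treat a line that follows a blank line."""
    if line.startswith("    ") or line.startswith("\t"):
        return "preformatted"       # 4 or more spaces, or a tab
    if line.startswith("- "):
        return "bullet"             # hyphen and a space
    if re.match(r"\d\.", line):
        return "numbered"           # digit and a dot, space optional
    return "paragraph"

for line in ["    htmlpp -guru filename", "- first item", "1. step one", "Plain text."]:
    print(classify(line), "|", line)
```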
A definition list is a line ending in ':' followed by some lines indented by one or more spaces. For example:
Definition:
  Explanation of definition.
You can put blank lines between definition items, but again, it's a matter of cosmetics. There should be a blank line before the first definition item, however.
Tables are one of the real pains of HTML markup, in my opinion. Here htmlpp tries to solve the most common case; a two-column table consisting of a term or value in one column, and an explanation in the second column.
A table can start with a header, which is a line like this:
Some column: Followed by some explanation:
Here, the colons (':') are important. Htmlpp also wants a capital letter at the start of both phrases, and a space after the first colon. The table header is optional; you can start immediately with table items. Either way, htmlpp needs a blank line before the table. A table item looks like this:
Some_word: Followed by some explanation
  which can come on several lines.
The first column must be a single word - if you want several words, use underlines. Htmlpp replaces these by spaces. The explanation can come on several lines, which must be indented by one or more spaces.
To insert a figure, use one of these conventions:
[Figure filename: caption]
[Figure "filename": caption]
Htmlpp inserts a figure caption, numbering the figures in a document from 1 upwards. The caption is followed by an <IMG> tag to display the file. You can use a URI (a path) as the filename, or a URL (with a host name specifier); you must put a URL in quotes. My preference is to put image files locally with the HTML files, and use a simple filename without a path. This is just easier to manage and lets you put the HTML files plus images in any directory. If htmlpp can find the image you specify, and it's a .GIF or .JPG file, it will insert the WIDTH= and HEIGHT= attributes automatically.
To insert a plain image, omit the 'Figure' keyword. For example, these are all examples of valid images:
[Figure somefile.gif: caption]
[somefile.gif: caption]
[Figure somefile.gif]
[somefile.gif]
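The four forms, plus the quoted form for URLs, can be pulled apart with one regular expression. The Python sketch below is only an illustration of the syntax: parse_figure, the result keys, and the host name in the last example are all hypothetical.

```python
import re

FIGURE = re.compile(
    r'^\[(Figure\s+)?'               # optional 'Figure' keyword
    r'(?:"([^"]+)"|([^\s:"\]]+))'    # quoted or plain filename
    r'(?::\s*(.*?))?\]$'             # optional ': caption'
)

def parse_figure(line):
    """Split an image line into its parts, or return None (hypothetical helper)."""
    match = FIGURE.match(line.strip())
    if not match:
        return None
    keyword, quoted, plain, caption = match.groups()
    return {
        "numbered": keyword is not None,   # 'Figure' asks for a numbered caption
        "file": quoted or plain,
        "caption": caption,
    }

print(parse_figure("[Figure somefile.gif: caption]"))
print(parse_figure("[somefile.gif]"))
print(parse_figure('[Figure "http://host/pic.gif": A caption]'))
```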
If you use <name@address>, this is converted into a mailto: URL hyperlink. If you use <http://address/document> -- or any other URL -- this is converted into a hyperlink as well.
Htmlpp does not presently allow links within the document or to other documents.
Since you're not typing HTML, htmlpp replaces <, > and & by HTML metacharacters. < and > are used to indicate hyperlinks.
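Put together, the hyperlink and metacharacter rules amount to something like the following Python sketch. It is not htmlpp's code: the function name convert is invented, the exact HTML that htmlpp emits for a link is an assumption, and the pattern only recognises addresses containing '@' and URLs of the form scheme://...

```python
import html
import re

# <name@address> or <scheme://...>; anything else in angle brackets is plain text.
LINK = re.compile(r"<([^<>\s]+@[^<>\s]+|[a-z]+://[^<>\s]+)>")

def convert(text):
    """Turn <...> links into anchors and escape the rest (hypothetical helper)."""
    parts, position = [], 0
    for match in LINK.finditer(text):
        parts.append(html.escape(text[position:match.start()], quote=False))
        target = match.group(1)
        href = target if "://" in target else "mailto:" + target
        parts.append('<A HREF="%s">%s</A>'
                     % (html.escape(href), html.escape(target, quote=False)))
        position = match.end()
    parts.append(html.escape(text[position:], quote=False))
    return "".join(parts)

print(convert("Mail <name@address> if a < b & c"))
```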
Htmlpp provides a number of intrinsic functions that you can use in your text. The syntax for using an intrinsic function is:
&function-name(arguments)
Formats the current date according to a picture that you specify. The picture can consist of any mixture of these elements:
Examples:
.echo &date('mmm d, yy') --> Dec 2, 95
.echo &date('d mmm, yy') --> 2 Dec, 95
.echo &date("yymd") --> 9512 2
.echo &date("yyyymmdd") --> 19951202
Formats the current time in the same way as the $(TIME) symbol. The difference is that $(TIME) is set when htmlpp starts working; &time() reflects the current time.
Since version 2.00, htmlpp uses a multipass technique to allow embedded blocks. For example, you can place .include actions in the header or footer blocks, or define your own blocks that have .define, .page, and other actions.
Htmlpp handles this using the following rules:
One consequence of this is that htmlpp needs a minimum of 3 passes to fully process a document: one to collect all the titles, one to insert page headers and footers, and a last one to break the text into individual pages. If any genius can help me reduce this to two passes (or one!), go ahead.
The upside is that you can do really funky stuff in headers and footers: for instance, the htmlpp pages build a document index in the footer, switching hyperlinks on and off to indicate the current page in the index.
To see what htmlpp is doing with its passes, use the -debug switch, like this:
htmlpp -debug filename
This leaves a number of .wrk files lying around; these contain the result of each pass.
Copyright © 1996-97 iMatix