Handling text transformations

Robert O'Callahan-3
nsTextTransform is used by nsTextFrame (and other places) to do a few
things. The primary mode used by nsTextFrame is to traverse through a
string a "word" at a time, detecting word breaks and transforming each
word as instructed. The transformations include CSS text-transform (e.g.
"capitalize"), CSS whitespace collapsing, discarding of various
characters we wish to ignore, and conversion of &nbsp; to whitespace.

This is all somewhat confusing to implement since nsTextFrame ends up
working with both pre-transformed and post-transformed strings, in both
Unicode and ASCII formats, and handling the mapping between them. The
transformed strings can be either shorter (because of character
discarding/whitespace collapsing) or longer (because uppercasing the
szlig, "ß", yields the two characters "SS").
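
To make the length changes concrete, here is a minimal sketch of both
cases. It is not the nsTextTransform code: the type TransformedText and
the two functions are hypothetical names, the whitespace rule is a
simplification of real CSS collapsing, and only ASCII letters plus the
szlig are uppercased.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the transformed text plus, for every
// transformed character, the index of the original character it came from.
struct TransformedText {
  std::u16string text;
  std::vector<std::size_t> originalOffset;
};

// Simplified collapsing: each run of space/tab/newline becomes one space,
// so the output can be shorter than the input.
TransformedText CollapseWhitespaceSketch(const std::u16string& in) {
  TransformedText out;
  bool inWhitespace = false;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char16_t ch = in[i];
    bool isWS = ch == u' ' || ch == u'\t' || ch == u'\n';
    if (isWS && inWhitespace) {
      continue;  // discard all but the first whitespace character of a run
    }
    out.text.push_back(isWS ? u' ' : ch);
    out.originalOffset.push_back(i);
    inWhitespace = isWS;
  }
  assert(out.text.size() == out.originalOffset.size());
  return out;
}

// Simplified uppercasing: ASCII letters only, plus U+00DF (szlig) -> "SS",
// so the output can be longer than the input.
TransformedText UppercaseSketch(const std::u16string& in) {
  TransformedText out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char16_t ch = in[i];
    if (ch == u'\u00DF') {
      // One original character produces two transformed characters.
      out.text.append(u"SS");
      out.originalOffset.push_back(i);
      out.originalOffset.push_back(i);
    } else {
      if (ch >= u'a' && ch <= u'z') {
        ch = static_cast<char16_t>(ch - u'a' + u'A');
      }
      out.text.push_back(ch);
      out.originalOffset.push_back(i);
    }
  }
  assert(out.text.size() == out.originalOffset.size());
  return out;
}
```

The originalOffset table is the kind of mapping that has to be carried
around whenever the pre-transformed and post-transformed strings differ
in length.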

In the brave new world of gfxTextRun I propose to move all actual text
transforming out of nsTextFrame into gfxTextRun, as follows:
-- CSS text-transforms will be implemented as special gfxTextRun
implementations that wrap the underlying platform implementation. The
idea is that text-transform simply means that you get different glyphs
(and possibly clustering) for the text.
-- gfxTextRuns will be required to treat &nbsp; as a regular space
character, always.
-- All the other transforms can be boiled down to deleting certain
characters from the string. I'm going to have the gfxTextRun interface
take as input a list of characters to skip (represented efficiently).
These skipped characters should simply be treated as if they were not in
the string; they don't generate glyphs or affect rendering or metrics in
any way. The string offsets passed into and out of gfxTextRun will still
be in terms of the original string, however.
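
One possible shape for the "list of characters to skip" is a run-length
encoding. The sketch below is only an illustration under that
assumption; SkipListSketch, BuildCollapsingSkipList and ApplySkipList
are hypothetical names and not the proposed gfxTextRun interface. The
builder plays the role of layout deciding what to skip (here, a
simplified whitespace-collapsing rule), and ApplySkipList shows what a
text run implementation might do with the list internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical skip list: alternating run lengths, always starting with a
// "keep" run. runs[0] characters are kept, runs[1] skipped, runs[2] kept...
class SkipListSketch {
 public:
  void Keep(std::size_t n) { Append(n, false); }
  void Skip(std::size_t n) { Append(n, true); }
  const std::vector<std::size_t>& Runs() const { return mRuns; }

 private:
  void Append(std::size_t n, bool skip) {
    if (n == 0) return;
    // Runs at even indices are "keep" runs, runs at odd indices are "skip".
    bool nextIsSkip = mRuns.size() % 2 == 1;
    if (skip == nextIsSkip) {
      mRuns.push_back(n);     // start a new run of the requested kind
    } else if (mRuns.empty()) {
      mRuns.push_back(0);     // empty leading "keep" run
      mRuns.push_back(n);
    } else {
      mRuns.back() += n;      // extend the current run
    }
  }
  std::vector<std::size_t> mRuns;
};

// Layout side (simplified): skip every whitespace character that directly
// follows another whitespace character.
SkipListSketch BuildCollapsingSkipList(const std::u16string& text) {
  SkipListSketch list;
  bool prevWS = false;
  for (char16_t ch : text) {
    bool isWS = ch == u' ' || ch == u'\t' || ch == u'\n';
    if (isWS && prevWS) {
      list.Skip(1);
    } else {
      list.Keep(1);
    }
    prevWS = isWS;
  }
  return list;
}

// Text run side: produce the string that is actually shaped, with the
// skipped characters treated as if they were never there.
std::u16string ApplySkipList(const std::u16string& text,
                             const SkipListSketch& list) {
  std::u16string out;
  std::size_t pos = 0;
  const std::vector<std::size_t>& runs = list.Runs();
  for (std::size_t i = 0; i < runs.size() && pos < text.size(); ++i) {
    std::size_t len = std::min(runs[i], text.size() - pos);
    if (i % 2 == 0) {
      out.append(text, pos, len);  // "keep" run
    }
    pos += len;
  }
  // Any text not covered by the runs is treated as kept.
  out.append(text, pos, text.size() - pos);
  return out;
}
```

With this encoding, text that has nothing skipped costs a single run
regardless of its length.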

The expected advantage is that nsTextFrame will only have to deal with
the DOM text and offsets into it, which should simplify that code a lot.
Also there might be performance advantages. For example, for Pango we
have to convert the input text to UTF-8, and we can strip out the skipped
characters at that stage, eliminating an unnecessary copy and dynamic
allocation.
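
As a rough illustration of stripping during conversion, the sketch below
converts UTF-16 to UTF-8 and drops skipped characters in the same pass.
It does not use any Pango or Gecko API; ConvertToUTF8Skipping and
AppendUTF8 are hypothetical names, the skip information is a plain
per-character flag vector for simplicity, and it assumes the flags never
split a surrogate pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends the UTF-8 encoding of one code point to |out|.
void AppendUTF8(char32_t cp, std::string& out) {
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Converts UTF-16 to UTF-8, omitting every character whose flag is set.
// There is no intermediate "collapsed UTF-16" copy of the text.
std::string ConvertToUTF8Skipping(const std::u16string& text,
                                  const std::vector<bool>& skipped) {
  assert(skipped.size() == text.size());
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (skipped[i]) {
      continue;  // treated as if it were not in the string
    }
    char32_t cp = text[i];
    bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
    if (isHigh && i + 1 < text.size() &&
        text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
      // Combine a surrogate pair into one code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // unpaired surrogate: substitute U+FFFD
    }
    AppendUTF8(cp, out);
  }
  return out;
}
```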

It does mean that some code will have to be duplicated across the
gfxTextRun implementations, in particular code for mapping string
offsets from the original string to the collapsed string and back, but
I'm writing a helper class to ameliorate that. The decision as to which
characters to skip still rests with layout.
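
The offset mapping such a helper has to provide can be sketched as
follows. This is not the actual helper class; OffsetMapperSketch and its
method names are hypothetical, and it trades memory for simplicity by
storing one table entry per character rather than a compact
representation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps offsets between the original string and the
// collapsed string (the original with skipped characters removed).
class OffsetMapperSketch {
 public:
  // skipped[i] is true if original character i is skipped.
  explicit OffsetMapperSketch(const std::vector<bool>& skipped) {
    mKeptBefore.reserve(skipped.size() + 1);
    std::size_t kept = 0;
    for (std::size_t i = 0; i < skipped.size(); ++i) {
      mKeptBefore.push_back(kept);
      if (!skipped[i]) {
        mOriginalOfKept.push_back(i);
        ++kept;
      }
    }
    mKeptBefore.push_back(kept);  // entry for offset == original length
  }

  // Original offset -> collapsed offset. A skipped character maps to the
  // collapsed offset of the next kept character.
  std::size_t OriginalToCollapsed(std::size_t original) const {
    assert(original < mKeptBefore.size());
    return mKeptBefore[original];
  }

  // Collapsed offset -> original offset of that kept character. The
  // end-of-string offset maps to the end of the original string.
  std::size_t CollapsedToOriginal(std::size_t collapsed) const {
    assert(collapsed <= mOriginalOfKept.size());
    if (collapsed == mOriginalOfKept.size()) {
      return mKeptBefore.size() - 1;
    }
    return mOriginalOfKept[collapsed];
  }

 private:
  std::vector<std::size_t> mKeptBefore;      // kept chars before each offset
  std::vector<std::size_t> mOriginalOfKept;  // original index of kept chars
};
```

For the text "a  b   c" with the second space of the first run and the
last two spaces of the second run skipped, original offset 3 (the "b")
maps to collapsed offset 2, and collapsed offset 4 (the "c") maps back
to original offset 7.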

nsTextTransform will still be used in some places, and I'll be leaving it
around for now, although I expect some of its code paths will no longer
be used.

Even if we kept using nsTextTransform in nsTextFrame, the interfaces
would have to change quite drastically because with gfxTextRun,
nsTextFrame will no longer do word-by-word processing of the text
(except for stuff like intrinsic min-width calculations), so
nsTextTransform::GetNextWord doesn't make sense.

dev-tech-layout mailing list