Biblical Language Fonts and Unicode

The following is an excerpt from a paper that I presented at SBL, Nov. 2001 on the use of technology in teaching Greek.

One of the major problems with using modern technology to teach Greek is with fonts and various technical issues related to how computers handle fonts–much of which is arcane or incomprehensible to most of us in the classroom.[1] We are in a transitional time technologically and standards continue to develop. The following section is more extensive than might at first seem appropriate in proportion to the rest of the paper, but this is a crucial area and deserves far more attention than has typically been given it by our particular academic community.

Since computers have traditionally used a standard typewriter-based keyboard for text entry, the non-technical user has assumed either that there have been only approximately 128 characters available (i.e., “lower ASCII,” those that can be directly accessed from the keyboard, shifted and unshifted positions), or, for those who managed to figure out how to access the “upper ASCII” characters (not intuitive for the non-technical user), 256 characters total.[2] While this is not really true, it is how many of us have viewed things.

Such a pragmatic and limited conception has worked adequately for standard English text entry since most users have, in the past, been concerned only with entering text and printing it out in an appropriate format, whether class notes, journal articles, or books. In this “standalone” format and for this limited purpose, it has worked. Even when we have tried to adapt this system to include Greek, we have managed to cobble together an approach which works (sometimes well, sometimes only marginally)–but again, primarily for standalone use in printing material. The complications have arisen when we have tried to share data between computers, in which case the cobbled solutions have proven inadequate. The rise of the web as a medium of information exchange and as a pedagogical environment has also highlighted the shortcomings of our system for “doing Greek” in biblical studies.

To date, most foreign language fonts in common use in English-speaking academic settings (in our case, classical and koine Greek) have functioned by using English character positions (technically, the Western Latin character set) but foreign language (i.e., Greek) character shapes (technically, glyphs) instead of English. Thus the obvious equivalents (such as alpha = a, beta = b) have been used along with some less obvious ones based on some similarity of appearance (e.g., omega = w; theta = q). Diacritical marks have been a constant problem in Greek. Some Greek fonts commonly used in our circles implement them by creating overstrike accents and breathing marks (i.e., the diacritical character appears over the preceding character). This overstrike (“combining characters”) approach works well in some software, but displays very poorly in others. A few other such fonts have tried using precomposed characters (i.e., providing a series of, say, alphas with different diacritics over each one). This severely presses the limits of easily accessible character positions and also results in the need to remember all the possible character combinations and their respective (physical) keyboard equivalents.
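The practice described above can be pictured as a simple lookup table. What follows is only a minimal Python sketch, not a tool discussed in this paper. Only the four pairings named in the text (a, b, w, q) come from the source; the other table entries and the names LEGACY_TO_UNICODE and legacy_to_unicode are assumptions for illustration. Actual fonts differ in their assignments, and diacritics and final sigma are ignored here.

```python
# Hypothetical table: Latin character position -> Unicode Greek letter.
LEGACY_TO_UNICODE = {
    "a": "\u03b1",  # alpha (pairing named in the text)
    "b": "\u03b2",  # beta (pairing named in the text)
    "w": "\u03c9",  # omega (pairing named in the text)
    "q": "\u03b8",  # theta (pairing named in the text)
    # The entries below are assumptions added so a whole word can be shown.
    "e": "\u03b5",  # epsilon
    "o": "\u03bf",  # omicron
    "s": "\u03c3",  # sigma (medial form only; final sigma is not handled)
}


def legacy_to_unicode(text):
    """Swap each mapped Latin character for its Greek letter; leave the rest alone."""
    return "".join(LEGACY_TO_UNICODE.get(ch, ch) for ch in text)


# The stored data is the Latin string "qeos"; only the font made it look Greek.
print(legacy_to_unicode("qeos"))
```

The point of the sketch is that a file prepared with such a font really contains Latin letters. On a computer without the same font, or one whose font uses different assignments, the reader sees "qeos" or something worse.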

Either approach leaves open the problem of entering such characters since they cannot be accessed directly from the keyboard. Two possible solutions are available. The first is a smart input system. This might be through the use of a keyboard utility such as Tavultesoft’s Keyman. I have not used this Windows-only utility, so I cannot comment on its usefulness.[3] [Later note: I have now begun using Keyman on an XP system and find it very well designed and efficient. You’ll also want the Classical Greek keyboard to work with Keyman for entering polytonic Greek Unicode text. For current details, see my main Unicode page.]

The second possible solution is to use some form of complex-script rendering technology which converts sequences of keystrokes into the appropriate, precomposed character or into the properly positioned sequence of each component element (character and diacritic/s).[4]

OpenType,[5] for example, if I understand its potential correctly, will allow a font designer to specify tables of characters that automatically convert a sequence of keystrokes into a precomposed character. For example, an alpha followed by an acute accent would be automatically mapped by the font to a character position (above the standard ASCII range) that is a precomposed character consisting of an alpha with an integral, properly-positioned acute accent.[6] This allows the user easy access to such characters from a standard keyboard (input) and also allows the font designer to correctly place the accent in relation to the underlying glyph.[7] If this materializes in a useful form and tools[8] become available for the creation of good quality Greek OpenType fonts by Greek scholars for free or inexpensive use by the scholarly community (rather than professional type designers and commercial developers at commercial prices), this could be a major boon for those of us who work with Greek text, particularly in a web environment.[9]
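The idea of a font turning a character sequence into a single glyph (compare Peter Constable's description in note 6) can be modeled in a few lines. The following Python sketch is a toy model only, written under stated assumptions: it does not use or imitate any real OpenType table format or API, and the glyph names, the two tables, and the function shape are all hypothetical.

```python
# Toy model: a "font" that maps character sequences to glyph names.
# Glyph names are invented for illustration, not taken from any real font.
SUBSTITUTIONS = {
    ("\u03b1", "\u0301"): "alpha_acute",  # alpha + combining acute -> one glyph
}
DEFAULT_GLYPHS = {
    "\u03b1": "alpha",
    "\u03b2": "beta",
    "\u0301": "acutecomb",
}


def shape(text):
    """Return glyph names for text, preferring a two-character substitution."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in SUBSTITUTIONS:
            glyphs.append(SUBSTITUTIONS[pair])
            i += 2
        else:
            # Characters the toy font lacks fall back to a "missing glyph" name.
            glyphs.append(DEFAULT_GLYPHS.get(text[i], ".notdef"))
            i += 1
    return glyphs


print(shape("\u03b1\u0301\u03b2"))  # ['alpha_acute', 'beta']
```

Note that the stored text is unchanged. The substitution happens only at display time, which is why the same data can be shown acceptably by fonts that are built in different ways.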

Assuming this scenario for the moment, some of us have concluded that such existing fonts pose a variety of problems, particularly in a Greek class environment. Many of these problems we see as cosmetic in the sense that they relate to readability on screen. For example, some such fonts are serviceable, but not well polished. Others are serif designs best suited for printed output (though they can be used in other contexts). Others have character assignments (using a Windows-standard installation) that vary more radically from what has become a fairly standard keyboard layout (at least in our academic circles) for basic characters other than alphabetic ones.[10] Some fonts have character shapes that are rather awkward and uneven. The most typographically polished fonts for koine Greek are expensive commercial fonts. Not only are these fonts too expensive to require of students, but licensing may be unduly restrictive, making them unusable for Internet course development using Adobe Acrobat (some commercial licenses forbid even subset, “print and preview” embedding in a pdf document!).

At this point I have settled on using a standard student font that comes with their textbook and is serviceable. I have been gradually moving all my course materials for classroom use to my own Galilee font since these materials are for video projection display purposes in the classroom or for reading on screen by Internet students. I have deliberately designed Galilee as a sans serif, thick-stem-width font to make it more legible in those settings. I am watching developments in Unicode and OpenType and will incorporate them when it is practical to do so.[11]  [Galilee is now available in Unicode format.]

What I have just described in the preceding paragraphs is probably the way that most of my fellow Greek teachers might evaluate the situation. It contains some truth and reflects several valid concerns. But it does not represent an accurate assessment of the situation. We non-technical users do not, at least in most cases, understand the underlying font technology and often assume that surface problems (e.g., typographical design and input complications) are the real problems. These two examples are problems, but relatively minor ones that have relatively straightforward solutions.[12]

The real issues lie elsewhere and the solution to them requires a more technical discussion. The currently emerging standard is Unicode, which provides a very large number of character assignments, including the characters of multiple languages, allowing a single encoding to represent and distinguish hundreds of different scripts in the same document.[13] In preparing this paper I have explored these font issues and the Unicode-related solutions. The following explanation of Unicode has been provided by Peter Constable of SIL International, one of the relatively few people who are conversant with both the biblical languages and Unicode, and is included here with his permission. It is very important that anyone planning to adapt technology for use in Greek pedagogy come to grips with this information, especially if the intended delivery medium includes the web.

Excursus on Unicode (Peter Constable, SIL)

Unicode provides a key building block that is solving these font problems and making it possible to have software that knows how to handle both English (and other Latin text), and Greek text. So, what does it take to take advantage of these benefits offered by Unicode? There are four things that all need to fall into place: data, keyboards, fonts and applications. We’ll look briefly at each one in turn.

First, we need to create data that follows Unicode’s definitions for how characters match up with the numerical codes used to store them. Because Unicode is far more comprehensive than the old 8-bit character sets we have been used to working with, it’s also a bit more complicated. One problem related to switching to Unicode is that we have lots of data that isn’t in Unicode. That’s one of the challenges that we’ll have to overcome, and there are some tools for this that are starting to appear.

The next thing we need is keyboards.[14] There is good news here at least for Windows users: Windows XP comes with keyboards that can be used for entering polytonic Greek data encoded in Unicode. Not everyone will necessarily like the default keyboard layout, but programs like Tavultesoft Keyman can be used to design customized keyboard layouts.

The third thing we need is fonts. The fonts that we have been using will not work with Unicode because they have the Greek character shapes on the character codes for Latin characters. We will need fonts that have been designed to conform to Unicode encoding.

There are more font-related issues, but before we can talk about them, we need to explain a complication with Unicode that has to do with pre-existing standards. Because Unicode needed to be backward compatible with earlier standards, it supports two ways of dealing with Greek diacritics: either using separate “combining” characters for the diacritics, or using characters for precomposed combinations. So, for example, an alpha-acute can be represented in Unicode as an alpha (character code U+03B1) followed by a combining acute (U+0301), or it can be represented by a precomposed character alpha-acute (U+03AC). Of course, this has implications for our data and keyboards. Keyboards need to be designed to work one way or the other. As for the data, we need ways to deal with the possibility that the same thing can get represented in more than one way. This all sounds rather complex, and it is, but fortunately Unicode not only inherited this problem but has devised ways for software developers to solve it. Again, the technical details go beyond the scope of our discussion here.
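For readers who want to see the two representations side by side, here is a minimal Python sketch using the standard library's unicodedata module. The code points are the ones given above; the helper name same_text is an assumption for illustration, and comparing after normalization is only one of the ways software can treat the two forms as equal.

```python
import unicodedata

decomposed = "\u03b1\u0301"  # alpha followed by combining acute
precomposed = "\u03ac"       # precomposed alpha-acute


def same_text(a, b):
    """Compare two strings after bringing both to normalization form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(decomposed == precomposed)           # False: different code sequences
print(same_text(decomposed, precomposed))  # True: canonically equivalent
print(len(unicodedata.normalize("NFD", precomposed)))  # 2: base letter + accent
```

A plain string comparison (or a search) treats the two forms as different text, which is exactly the data problem described in this paragraph.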

This matter of having either non-composed or precomposed ways of representing Greek base plus diacritic combinations has implications for the fonts that we need. If data is encoded using the precomposed way of doing things, then the fonts just need to have shapes (“glyphs” in technical terms) for each of those precomposed combinations. That is not all that difficult. If data is using the non-composed way of doing things, though, then we are faced with the problem of overstriking diacritics: some combinations will not look good, and in some applications none of the combinations will look good.

Unicode normally assumes that scripts that have diacritics are encoded using the non-composed approach. It is intended to be used with software that automatically handles issues of making the diacritic combinations look good. Some software builds this support directly into the application, but most new software programs use standard font technologies known as “complex-rendering” or “smart-font” technologies. There are three of these technologies being introduced by different vendors.

The most commonly used of these is OpenType, developed jointly by Microsoft and Adobe. OpenType requires special software support. In Microsoft products, this is provided in a component known as Uniscribe. One of the aspects of Uniscribe is that support for particular scripts needs to be specifically written into the software. As a result, OpenType support with Uniscribe has been included with Microsoft applications starting with Internet Explorer 5.0 and Office 2000, but this did not include support for polytonic Greek. The Uniscribe software is being updated by Microsoft, however, and newer products will be able to support polytonic Greek quite well.

Apple Computer has a different font technology known as Apple Advanced Typography (AAT), which has been part of the Mac OS since OS 8.6. Unlike OpenType, AAT places all of the script-related support inside the font. Thus, if you’ve got the font, you don’t have to wonder whether the particular version of the Mac OS will support that script or not.

The third font technology is known as Graphite, and has been developed by SIL International. The initial version has been written to run on Windows (it is now open source, and there has been talk of a port to Linux). The main difference between Graphite and OpenType is that Graphite follows AAT in putting all of the script-specific support into the font. (This was done since SIL needs to be able to work with minority scripts on Windows without being dependent on Microsoft having added support for that script to the Uniscribe component.)

So, we need fonts that conform to Unicode, and for the non-composed way of storing data, we need fonts that use one of the “smart” font technologies to deal with getting all the diacritics in just the right place. This takes a greater amount of development effort than the earlier fonts did, but once it is all working our lives will be a lot easier.

The last thing we need is applications. We need applications that support Unicode-encoded data, but we also need applications that can work with the new “smart” font technologies. For AAT and Graphite, the fonts will not work at all unless applications have been specifically written to support those technologies. This has a significant consequence in that, so far, there are very few applications that support these technologies, and none of our existing applications can benefit from them. This will be of particular concern to Mac users, since AAT is the main solution on that platform but hardly any applications exist so far that support it.

For Windows users, the situation is much better. First, Uniscribe and OpenType support have been incorporated into Windows 2000 and Windows XP so that most existing applications will immediately gain at least some benefit from whatever level of script support is provided in Uniscribe and the OpenType fonts the user has on their system. Moreover, Microsoft’s major applications–Internet Explorer, the entire Office Suite, Publisher and FrontPage–all provide full support for Uniscribe. People wanting to work with polytonic Greek in those applications need only wait for a version of Uniscribe that supports Greek; if they are working with data that uses the precomposed way of doing things, they can start working with these applications right away.

An additional note on working with Unicode-encoded Greek on web pages: the reader must have a browser that supports Unicode. Recent versions of browsers (at least Internet Explorer and Netscape) do support Unicode. The reader also must have appropriate fonts, and appropriate “smart” font rendering support (if combining diacritic characters are used). Unicode-conformant Greek fonts are starting to appear, and Microsoft may well bundle some adequate fonts as they update IE and Windows (they are tending to do this as their developers add support for more and more scripts). Also, it’s important to note that the World Wide Web Consortium recommends that text data on the Web use precomposed characters where they exist in Unicode (technically, Unicode normalization form C). What this means is that “smart” font rendering actually isn’t needed for polytonic Greek. (That’s good news for many people using browsers other than Internet Explorer on Windows.)
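As an illustration of what normalization form C means for polytonic text, the following Python sketch composes an alpha typed with two separate combining marks into a single precomposed character before it would be placed on a web page. It is a sketch under stated assumptions: the helper names to_nfc and needs_normalizing are hypothetical, and the example presumes the marks were keyed in the order breathing, then accent.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode normalization form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def needs_normalizing(text):
    """True if the text is not already in normalization form C."""
    return to_nfc(text) != text


# alpha + combining smooth breathing (U+0313) + combining acute (U+0301)
typed = "\u03b1\u0313\u0301"

print(needs_normalizing(typed))        # True
print(len(typed), len(to_nfc(typed)))  # 3 1
print(to_nfc(typed) == "\u1f04")       # True: alpha with smooth breathing and acute
```

Once the text is in this form, the browser only has to find one glyph per character in the font, so no diacritic positioning is required of it.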

Of course, you also need a way to create those web pages, and know how to deal with the character encoding issues in the HTML editor. There are some important technical details (HTML numeric character references, charmap settings, Unicode character encoding forms and UTF-8) that would be too involved to cover in a brief overview. What is worth noting is that some ways of creating the HTML pages will work better than others.
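Two of the details just mentioned, numeric character references and UTF-8, can at least be glimpsed in a short Python sketch. It uses only the standard library; the helper name to_ncr is an assumption for illustration, and nothing here describes how any particular HTML editor behaves.

```python
import html


def to_ncr(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


alpha_acute = "\u03ac"  # precomposed alpha-acute

print(to_ncr(alpha_acute))          # &#940;
print(alpha_acute.encode("utf-8"))  # b'\xce\xac'

# Decimal and hexadecimal references name the same character.
print(html.unescape("&#940;") == html.unescape("&#x3AC;") == alpha_acute)  # True
```

A numeric character reference keeps the HTML file itself in plain ASCII, while UTF-8 stores the character directly as a sequence of bytes; in the latter case the page has to declare its encoding so the browser reads those bytes correctly.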

For example, let’s assume that you have a keyboard that can generate Unicode Greek characters. You could use this to enter text into Word 97 and later, and from there export HTML pages that will work. On the other hand, if you try keying the characters directly into FrontPage 2000, you’ll probably find that it won’t accept them.[15] If you can create the text, FrontPage 2000 will display it without a problem; it just isn’t set up to receive polytonic Greek from the keyboard. This is fixed, however, in FrontPage 2002.

There are additional complications you’ll run into with keyboarding: Windows 95/98/Me have significant limitations in this regard. There are ways to enter Unicode polytonic Greek on those platforms, but few apps support them. The same situation occurs on the Mac. If you create a keyboard layout using Keyman 5, you can key Unicode polytonic Greek into Word 97 and later on Win9x/Me. There are not many other options for those platforms. I don’t know of any options at this time for creating Unicode polytonic Greek in HTML on the Mac. There may be something out there, but it would probably have to be something that was developed for the NextStep platform and ported to the Mac. The best platform for working with polytonic Greek at this point is probably Windows XP or Windows 2000.

So, there are at least some ways to create the pages. Many users will have adequate browsers, but they may not yet have the fonts. Indeed, there isn’t yet much of a selection of Unicode-conformant fonts for polytonic Greek. It’s not all in place yet, but we’re getting there.[16]

[1] Please note that this simplified summary is intended for the non-technical user, not for the font designer or programmer. Thus keyboard and “keys” is the way the non-technical user (i.e., the average Greek instructor) views this situation. There are technical distinctions that should be made between encodings, codepoints, codespaces, fonts, characters, glyphs, etc. that are not defined here in technical terms, though I have tried to reflect the basic notions in a reasonably accurate way. Also please note that in the first part of this section I have tried to describe the situation as non-technical users typically view it; in the second part I have attempted to provide some correctives and more technically accurate information to move us towards some real solutions. Even these, however, are not complete. More detail is necessary, but I have already extended this section beyond proportionate limits within the overall purpose of the paper. I want to express my appreciation to Peter Kirk, Patrick Rourke, and especially to Peter Constable of the Biblical-Languages List for their assistance and patience in helping me understand some of the technicalities that are included here. Misrepresentations that remain are, of course, my own. 
[2] It is possible to access upper ASCII characters from both Macintosh (through use of the option key) and Windows applications (and, I am told, even from DOS), but a standard installation of Windows does not make this at all easy or intuitive. In MS Word running under Windows 95/98/ME, e.g., one accesses the upper ASCII characters by pressing LeftAlt-zero (on the keypad) and then typing (with the Alt key still down) the 3-digit numerical ASCII character code (Num Lock must be on). This may work in other programs also. In any event, this requires a four-keystroke, arcane, non-mnemonic sequence to enter a single character.
[3] Those who have are quite pleased with the results, but given the context of this paper, it should be pointed out that it is unrealistic to expect students to master the use of such a utility for purposes of learning Greek. It may serve a helpful purpose in terms of the instructor preparing materials that the student only views (e.g., on screen). It is important to remember that although many more current students use computers and would, perhaps, call themselves “computer literate,” that seldom describes the proficiency necessary to install and use such utilities as Keyman. I have enough difficulties getting some of my students to install the Greek font necessary to read the materials and take the online quizzes!
[4] The current list of candidates includes OpenType (Microsoft and Adobe), Apple’s AAT (Apple Advanced Typography), and Graphite (SIL International). See further information in Peter Constable’s excursus below. Either conversion option described above would work for koine Greek; there are probably advantages to one or the other in other languages such as Arabic, which has a large number of characters that have multiple forms depending on their position in a word. Using precomposed characters requires a larger number of glyphs in the font. Specifying the proper positioning of each of the separate, decomposed characters requires fewer glyphs but involves a greater complexity in specifying the relative positions of each possible combination.
[5] The OpenType font format is an extension of the TrueType font format with added support for PostScript font data. These fonts are also referred to as TrueType Open v.2.0 fonts. For more information, see Adobe’s and Microsoft’s typography pages. (OpenType was developed jointly by Adobe and Microsoft). More technical OpenType information is also available.
[6] As Peter Constable explained it to me, OpenType “transforms character sequences into glyph sequences. It might take a character sequence <alpha, acute> and output a single glyph; in another font (implemented in a different way), it might take that same sequence and output a sequence of glyphs <glyf_alpha, glyf_acute> with a set of positioning coordinates” (biblical-languages list post, 11/08/2001).
[7] This is not easier for the font designer–it is a more complex design process with many more variables to define, but it is surely easier for the end user. Other possible features of such smart font technology include such things as automatic final sigmas whenever a sigma is followed by a space or punctuation mark–though this could be counterproductive in some instances for designing first year teaching materials where this is not a desirable behavior in some instances. Of course, the same font technology could also define a non-transforming sigma for such instances.
[8] Tools for working with OpenType are just becoming available; FontLab v. 4, released December 2001, provides the first such “friendly” tool for this technology of which I am aware. (Both Microsoft and Apple have highly specialized font tools available, but they are intended for professional type designers.) I have been working with FontLab for the past year and have found it to be relatively easy to learn and use. FontLab appears to be the premier font creation tool in the “under $1,000.00 range,” surpassing the former dominance of Fontographer (which has not been updated in many years). There are, of course, much more sophisticated tools used by commercial type houses, but these programs, if commercially available, cost thousands of dollars.
[9] I have developed a basic Greek font (“Galilee”) that I plan to move to the OpenType format as soon as that is feasible. It will be freely available for use by the scholarly community. Information on the current beta version in TrueType format may be found on the font page. The beta version is available for download from the same page. One of the goals for this project is to design a font whose glyphs are optimized for screen display (either projected or on the web) rather than primarily for printed output. Other fonts are currently under development by various parties.
[10] Such problems as character assignments can, of course, be corrected by use of a keyboard utility, etc., but that is not a helpful solution in the context of teaching Greek due to the lack of technical sophistication of the average student. The goal is to keep technology from being any more intrusive than necessary. If a student can’t use it “straight out of the box,” then he or she isn’t likely to be highly motivated to learn how to modify it. There are always the exceptions–students who love to tinker with such things, but courses, particularly online courses where the student isn’t physically present to help sort out such complexities, should not be designed with that sort of student as the model.
[11] At this point the beta version of Galilee still uses the traditional tactic of using Greek glyphs for Latin character positions; it is not encoded for Unicode. Assuming that becomes feasible in the near future, I would still face the challenge of converting substantial quantities of my existing teaching material (in excess of 100 meg) to use a Unicode font for Greek. This is perhaps the greatest challenge for those who have already developed teaching material for Greek using any of the current fonts.
[12] The typographical design issues are solved with a font editor such as Fontographer or FontLab (though remember that only the original font designer/owner can legally distribute a modified font) and input issues can be addressed with a keyboard utility such as Keyman. Both of these solutions, while “straightforward,” are neither easy nor intuitive for the typical user (professor or student). Some are also expensive (i.e., the cost of font design software).
[13] More technically, Unicode allows large codeset fonts to use the same encoding standard for multilingual data. There have been systems devised previously to enable more than 256 character positions in a font both on Mac and Windows (e.g., Far East Windows used double-byte encodings). Initial Unicode standards allotted two bytes per character rather than one byte, allowing 256² possible characters: 65,536. The current Unicode specification apparently allows more than a million possible characters using multi-byte encodings. For more information on the technical issues surrounding Unicode Greek fonts, see “Unicode Polytonic Greek for the Web,” by Patrick Rourke (his specialty is classical Greek). Note that this book-length document is being revised (the draft posted as of this writing is v. 0.93).
[14] The reference here is to logical keyboards, not the physical keyboards on which we type.
[15] There is a way you could get it to accept Greek characters other than the diacritics, i.e. what would be needed for Modern Greek, but since that’s not what this audience wants, there’s not much point describing it.
[16] Peter Constable, Non-Roman Script Initiative, SIL International, Dallas, TX, personal correspondence, 11/8/01; cf. the SIL website. I have added one note and made a few minor revisions in wording for stylistic purposes.