Unicode in X11

2018-03-23

Why I'm writing this

For some time I've been trying to work out how to display text written in other languages (e.g. Chinese) in X11 windows. This is very poorly documented. I have recently succeeded, and here is what I have learned.

What was available

The standard text, the Xlib Programming Manual, predates Unicode and isn't much help. There's a man page, Xutf8DrawString(3), but actual code would be more helpful. I found two pages with helpful code: xtut7.c (linked to from Xlib tutorial part 7 -- FontSets) and LECTURE 11 日本語文字列を表示する ("Displaying Japanese strings").

How to draw Unicode in C

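    XDrawString(display, window, gc, x, y, string, strlen(string));

should be replaced by

    XmbDrawString(display, window, fontSet, gc, x, y, string, strlen(string));

XmbDrawString interprets string in the encoding of the locale the font set was created under, so for UTF-8 text call setlocale(LC_ALL, "") under a UTF-8 locale before creating the font set. Xutf8DrawString takes the same arguments and always expects UTF-8.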

An XFontSet contains one or more fonts, and can be created by

    char **missingList;
    int missingCount;
    char *defString;
    XFontSet fontSet =
        XCreateFontSet(display, fontName,
                       &missingList, &missingCount, &defString);
    if (fontSet == NULL) {
        printf("Failed to create fontset\n");
        return;
    }
    XFreeStringList(missingList);
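
The missing-charset list is worth a look before it is freed: XCreateFontSet fills it with the names of the character sets for which it found no font, which is often the reason some characters fail to appear. As a rough sketch, a helper along the following lines could turn the list into a readable message. The name describeMissingCharsets and its buffer-based interface are hypothetical and not part of Xlib. It takes only plain C types, so it can be tried without an X server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Xlib: joins the charset names that
   XCreateFontSet reported as missing into "name1, name2, ...".
   Stops cleanly before any name that would not fit in the buffer.
   Returns the length of the string written to buffer. */
size_t describeMissingCharsets(char *buffer, size_t size,
                               char **missingList, int missingCount)
{
    size_t used = 0;
    int i;

    assert(buffer != NULL && size > 0);
    buffer[0] = '\0';
    for (i = 0; i < missingCount; ++i) {
        const char *separator = (i > 0) ? ", " : "";
        size_t sepLen = strlen(separator);
        size_t nameLen = strlen(missingList[i]);
        if (used + sepLen + nameLen + 1 > size) {
            break;  /* next name would overflow: keep what we have */
        }
        memcpy(buffer + used, separator, sepLen);
        /* Copy the name together with its terminating NUL. */
        memcpy(buffer + used + sepLen, missingList[i], nameLen + 1);
        used += sepLen + nameLen;
    }
    return used;
}
```

It would be called with missingList and missingCount after XCreateFontSet returns and before XFreeStringList, printing the buffer only when missingCount is greater than zero.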
  

fontName is a null-terminated string with a value like "-jis-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1983-0" but it can also contain wildcards, e.g. "-*-fixed-medium-r-normal--16-*-*-*" or even "-*-*-*-*-*-*-*-*-*-*-*-*-*-*". You can also use several comma-separated font names. On GNU/Linux, the program xfontsel shows the fonts available on your machine and lets you compose font names.

It's best to have as few fonts as possible in your font set; otherwise you can run into performance problems. Wildcards and comma-separated lists of font names should therefore be avoided if at all possible. Sometimes, however, it's difficult to identify the specific fonts X11 is using. Also, even when I do completely specify the font name, for reasons unknown to me, I usually get a font set with not one but two equivalent fonts with the same name but different memory addresses. When that happens, it's still safe to use the first. To extract the fonts, call XFontsOfFontSet:

    XFontStruct **fonts;
    char **fontNames;
    int numFonts = XFontsOfFontSet(fontSet, &fonts, &fontNames);
  

You'll probably also need to know how far your text extends above and below its baseline in the window it's displayed in:

    int fontAscent = fonts[0]->ascent;
    int fontDescent = fonts[0]->descent;
  

If you have multiple fonts, you'll need to iterate through fonts and pick the one with the largest ascent, and the one with the largest descent:

    int i, fontAscent = 0, fontDescent = 0;
    for (i = 0; i < numFonts; ++i) {
        if (fonts[i]->ascent > fontAscent) {
            fontAscent = fonts[i]->ascent;
        }
        if (fonts[i]->descent > fontDescent) {
            fontDescent = fonts[i]->descent;
        }
    }
  

If you have a single font, you can use XTextWidth as you did with 7-bit ASCII. Bear in mind that XTextWidth treats each byte as a separate character, so it is only reliable for single-byte text in the font's own encoding:

    int textWidth = XTextWidth(fonts[0], string, strlen(string));

but with multiple fonts, you'll need to call

    int textWidth =
        Xutf8TextEscapement(fontSet, string, strlen(string));

Be warned: a single call to Xutf8TextEscapement can literally take seconds to execute.
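
One way to soften that cost, if the same strings are measured repeatedly, is to remember the widths already computed. The following is only a sketch under that assumption: WidthCache, MeasureFn and cachedTextWidth are hypothetical names, and demoMeasure is a stand-in that counts its calls so the effect is visible without an X server. In a real program the measuring function would be a thin wrapper around Xutf8TextEscapement for one font set. The cache is keyed on the string alone, so each font set would need its own cache.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WIDTH_CACHE_SIZE 64

/* Any function that measures a string, e.g. a (hypothetical) wrapper
   around Xutf8TextEscapement bound to one particular font set. */
typedef int (*MeasureFn)(const char *string, int numBytes);

typedef struct {
    char *strings[WIDTH_CACHE_SIZE];  /* owned copies of measured strings */
    int   widths[WIDTH_CACHE_SIZE];
    int   count;                      /* slots in use */
    int   next;                       /* slot to overwrite once full */
} WidthCache;                         /* must start zero-initialised */

/* Returns the width of string, calling measure only on a cache miss. */
int cachedTextWidth(WidthCache *cache, MeasureFn measure, const char *string)
{
    int i, width, slot;
    size_t numBytes;
    char *copy;

    assert(cache != NULL && measure != NULL && string != NULL);
    for (i = 0; i < cache->count; ++i) {
        if (strcmp(cache->strings[i], string) == 0) {
            return cache->widths[i];
        }
    }
    numBytes = strlen(string);
    width = measure(string, (int)numBytes);
    copy = (char *)malloc(numBytes + 1);
    if (copy == NULL) {
        return width;  /* out of memory: return the width uncached */
    }
    memcpy(copy, string, numBytes + 1);
    if (cache->count < WIDTH_CACHE_SIZE) {
        slot = cache->count++;
    } else {
        /* Cache full: overwrite the oldest slot in round-robin order. */
        slot = cache->next;
        cache->next = (cache->next + 1) % WIDTH_CACHE_SIZE;
        free(cache->strings[slot]);
    }
    cache->strings[slot] = copy;
    cache->widths[slot] = width;
    return width;
}

/* Stand-in for the slow call: pretends every byte is 8 pixels wide
   and counts how often it is invoked. */
int demoMeasureCalls = 0;

int demoMeasure(const char *string, int numBytes)
{
    (void)string;
    ++demoMeasureCalls;
    return 8 * numBytes;
}
```

Measuring the same string twice through cachedTextWidth invokes the measuring function only once. Whether this helps in practice depends on how often your program repeats strings.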

Right to left

If you want to draw Hebrew or Arabic text, there's a further hurdle: drawing from right to left. Unpointed Hebrew characters each occupy two bytes in UTF-8. Starting with the first character of the word, you decrease x by one character width, pass the next two bytes to XmbDrawString, and repeat until the word is finished. I presume the same approach works for Arabic and Farsi, although Arabic script also joins adjacent letters, which drawing one character at a time does not handle.
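
The stepping described above can be sketched without any X11 calls by working out, for each character, where its bytes start, how many there are, and the x at which to draw it. This is a minimal sketch, assuming a fixed-width font and text without combining marks. RtlGlyph, utf8SequenceLength and layoutRightToLeft are hypothetical names. Rather than assuming two bytes per character, it reads the length from each lead byte, so stray ASCII punctuation does not break the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One character to draw. */
typedef struct {
    size_t offset;  /* byte offset of the character within the string */
    int    length;  /* number of bytes in its UTF-8 encoding */
    int    x;       /* x coordinate to pass to the drawing call */
} RtlGlyph;

/* Length in bytes of the UTF-8 sequence starting with lead, or 0 if
   lead cannot start a sequence (e.g. it is a continuation byte). */
int utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Fills glyphs with right-to-left positions for the characters of text.
   The first character's right edge is at rightX, so it is drawn at
   rightX - charWidth, the next one a further charWidth to the left.
   Returns the number of glyphs written (at most maxGlyphs), or -1 if
   the text is malformed or truncated UTF-8. */
int layoutRightToLeft(const char *text, int rightX, int charWidth,
                      RtlGlyph *glyphs, int maxGlyphs)
{
    size_t numBytes, offset = 0;
    int count = 0, x = rightX;

    assert(text != NULL && glyphs != NULL);
    numBytes = strlen(text);
    while (offset < numBytes && count < maxGlyphs) {
        int length = utf8SequenceLength((unsigned char)text[offset]);
        if (length == 0 || offset + (size_t)length > numBytes) {
            return -1;
        }
        x -= charWidth;
        glyphs[count].offset = offset;
        glyphs[count].length = length;
        glyphs[count].x = x;
        ++count;
        offset += (size_t)length;
    }
    return count;
}
```

Each entry can then be drawn with XmbDrawString(display, window, fontSet, gc, glyphs[i].x, y, text + glyphs[i].offset, glyphs[i].length).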

Results

The C/C++ code shown above is based on code in my Emblem virtual machine. The code which runs on it, and opens and outputs to the Canvas shown below, is:

(setf CANVAS (new Canvas))
(openCanvas CANVAS)

;; Chinese
(drawText (windowOfCanvas CANVAS) (gcOfCanvas CANVAS) 100 100 BLACK
          "你好吗?"
          (findFontSet "-*-*-*-*-*-*-*-*-*-*-*-*-*-*"))

;; Korean
(drawText (windowOfCanvas CANVAS) (gcOfCanvas CANVAS) 100 150 BLACK
	  "안녕하십니까?"
	  (findFontSet "-*-*-*-*-*-*-*-*-*-*-*-*-*-*"))

;; Japanese
(setf FIXED16 (findFontSet "-*-fixed-medium-r-normal--16-*-*-*"))
(setf GREETING "こんにちは、お元気ですか?")
(drawLine (windowOfCanvas CANVAS) (gcOfCanvas CANVAS)
	  100 200 (+ 100 (textWidth FIXED16 GREETING)) 200 RED 1 0)
(drawText (windowOfCanvas CANVAS) (gcOfCanvas CANVAS) 100 200 BLACK
	  GREETING FIXED16)

;; Hebrew
(drawHebrewText (windowOfCanvas CANVAS) (gcOfCanvas CANVAS) 100 250 BLACK
		"שלום"
		(findFontSet "-misc-fixed-medium-r-normal--20-200-75-75-c-100-*-1"))

[Image: UnicodeInX11.png]

© Copyright Donald Fisk 2018