
@Bryan Oakley I do not think the font is the problem here; its rendering is. For example, when I type the two Unicode characters U+0BAE and U+0BC6, they should be combined into a single Tamil character, displayed as "மெ". But I think tkinter has no rendering engine for displaying some Unicode scripts.

@Vamana Yes, Indian languages have a 'combined single character notation', which requires two Unicode characters, as I said above. When I type, say, charA and then charB, the display should render a single character, say charBA. Instead it displays charAB (which is wrong).
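A minimal sketch of what is actually stored when those two characters are typed, assuming Python 3 and only the standard-library unicodedata module. The helper name describe_code_points is made up for illustration. It suggests that the string keeps the consonant first and the vowel sign second, so drawing the vowel sign to the left of the consonant is left entirely to the renderer.

```python
import unicodedata

def describe_code_points(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]

# The two code points from the comment above: consonant, then vowel sign.
syllable = "\u0BAE\u0BC6"
for code_point, name in describe_code_points(syllable):
    print(code_point, name)
```

If the output lists the consonant before the vowel sign, the data is in the expected order and any swapped glyphs on screen come from the display layer, not from the string.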

@schlenk Yes, you are correct. I initially used IDLE, then tried running Python in a Linux console; both rendered the Tamil text wrongly. Hence I came to tkinter, which has also been in vain. I am currently using file I/O. Now I think I should learn how to make a simple web page using Python for input and output, so that the browser renders the text correctly.
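One possible shape for that browser workaround, as a rough sketch assuming Python 3. The function name write_html_page and the file name tamil_demo.html are placeholders, not anything from the original code. It writes the text into a small UTF-8 HTML page so that the browser's own text shaping handles the Tamil rendering.

```python
import html
import os
import tempfile

def write_html_page(text, path):
    """Write text into a small UTF-8 HTML page and return the path."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body><p>%s</p></body></html>\n" % html.escape(text)
    )
    # Encode explicitly as UTF-8 so the file matches the declared charset.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(page)
    return path

if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "tamil_demo.html")
    print(write_html_page("\u0BAE\u0BC6", out))
```

The printed path can then be opened in a browser by hand, or with the standard-library webbrowser module.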

python - tkinter cannot display unicode characters correctly - Stack O...

python unicode tkinter tamil

I managed to figure it out for myself. The comment above was helpful, but I still needed a way to concatenate the unicode characters because I was building a much larger list of characters.

To concatenate the Unicode characters, I used Python's built-in unicode function.

I returned the superscript like the above linked thread demonstrated:

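try:
    unicode          # Python 2: the built-in unicode type
except NameError:
    unicode = str    # Python 3: str is already Unicode

def get_superscript_unicode(n):
    # Map a digit (1-7) to its Unicode superscript character.
    codes = {
        1: u"\u00B9",
        2: u"\u00B2",
        3: u"\u00B3",
        4: u"\u2074",
        5: u"\u2075",
        6: u"\u2076",
        7: u"\u2077",
    }
    return unicode(codes[n])

def build_label(items):
    # items holds (n, other_text) pairs; both names are placeholders
    # standing in for the larger string being built.
    s = u""
    for n, other_text in items:
        s += unicode(get_superscript_unicode(n)) + unicode(other_text)
    return unicode(s)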

I probably have one too many calls to unicode there. I kind of quickly took out the relevant pieces of a much more complicated string being built.
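For comparison, here is a sketch of the same idea under Python 3, where str is already Unicode and no conversion calls are needed. The names SUPERSCRIPT_DIGITS, to_superscript and build_label are illustrative only, and the table is extended to the digits 0, 8 and 9, which the original mapping did not cover.

```python
# Digits 0-9 mapped to superscripts; 0, 8 and 9 go beyond the original table.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)

def to_superscript(n):
    """Return the decimal digits of n as superscript characters."""
    return str(n).translate(SUPERSCRIPT_DIGITS)

def build_label(items):
    """Join (n, other_text) pairs as superscript followed by text."""
    return "".join(to_superscript(n) + other_text for n, other_text in items)

print(build_label([(2, "x"), (10, "y")]))
```

Building the string with a single join avoids the repeated concatenation, which may matter when the list of characters is large.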

displaying subscript/superscripts in python tkinter listbox with unico...

python unicode tkinter notation

The issue is that, at some point, the Unicode symbol is converted into a sequence of bytes using an encoding that does not support that particular character. That causes you to get the replacement character instead, which happens to be a ? for this particular conversion.
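The effect can be reproduced outside Tkinter. The following is a minimal sketch, assuming Python 3; the helper name encode_lossy is made up, and it is not a claim about where in Tkinter the conversion happens. It only shows that encoding with the "replace" error handler substitutes ? whenever the target encoding lacks the character.

```python
def encode_lossy(text, encoding):
    """Encode text, substituting '?' for characters the encoding lacks."""
    return text.encode(encoding, errors="replace")

tamil_ma = "\u0BAE"  # TAMIL LETTER MA, absent from ASCII and cp1252
for encoding in ("ascii", "cp1252", "utf-8"):
    print(encoding, encode_lossy(tamil_ma, encoding))
```

Only the UTF-8 result keeps the character; the two 8-bit encodings collapse it to a single question mark byte.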

The core of Tk is Unicode-aware and at least the initial stage of scripting will be using UTF-8; the character is (well, almost certainly) getting through from the keyboard and Windows correctly. What happens then is that the character is conveyed to the Python layer; I don't know that part of Tkinter very well, but it is where I suspect the problem is (e.g., if the wrong type of string is being generated). In other words, it smells like it might be a subtle Tkinter bug. (By comparison, Tcl's internal notion of strings is entirely Unicode-aware, which I rely on in my code rather a lot and have done for many years. This definitely has some trade-offs, and I know that Python's choice among those trade-offs was different.)

You can check further by seeing what exact type of string you've got. It should be a Unicode string or you'll be forever having problems with this sort of thing (some platforms and deployments must natively deal with far more than 256 characters).
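One way to do that check, sketched for Python 3 with a hypothetical helper named describe_value. It reports whether a value is text or already-encoded bytes and, for text, lists the code points, so a literal U+003F can be told apart from the character that was typed.

```python
def describe_value(value):
    """Report the type of value and, for text, its code points."""
    if isinstance(value, bytes):
        return "bytes (already encoded): %r" % (value,)
    if isinstance(value, str):
        points = " ".join("U+%04X" % ord(ch) for ch in value)
        return "str (Unicode text): %s" % points
    return "unexpected type: %s" % type(value).__name__

print(describe_value("\u0BAE\u0BC6"))
print(describe_value(b"?"))
```

In a Tkinter program the same helper could be applied to whatever the widget hands back, for example the result of an Entry widget's get() method. If the report already shows U+003F there, the character was lost before the value reached your code.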

Thanks, this is a great start. So, any Unicode issue is really a problem with how Tkinter is using Tcl/Tk; ultimately the UTF-8 string is getting dropped somewhere after it gets to Tkinter?

Hi, I realized that if this answer is accepted, it would discourage others from pointing out more places where a question mark can be inserted. For example, I read somewhere in the Tcl/Tk documentation that if there is a problem finding the font for a given character, then a question mark will be used instead.

@BiagioArobba I'm guessing that the problem is at the point where the string out of Tk is turned into a Python string, and that it is either using the wrong encoding or converting to the wrong type of string at that point. I've only ever very briefly skimmed the implementation of Tkinter (I'm a Tk maintainer, not a Tkinter one), and not this specific bit, so I really don't know what's going on. I just know what the problem smells like.

@BiagioArobba And the replacement where Tk can't render a character is not necessarily ?; that really depends on the platform. (On Windows, in my experience it's typically a hollow box.) The ? is more common as a replacement character when converting from Unicode to an 8-bit character set ( is what you get going the other direction). Which isn't actually a Tk operation; Tk is internally Unicode-aware (but only in the BMP that's a standing bug)
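Both points in this comment can be checked from plain Python. This is a sketch assuming Python 3, with the illustrative helper names decode_lossy and outside_bmp. The first shows the replacement character that appears when bytes are decoded into Unicode; the second lists characters beyond the BMP, which the comment says Tk did not handle at the time.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def decode_lossy(data, encoding="utf-8"):
    """Decode bytes, substituting U+FFFD for undecodable input."""
    return data.decode(encoding, errors="replace")

def outside_bmp(text):
    """Return the characters of text that lie beyond U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

print(decode_lossy(b"\xff") == REPLACEMENT)
print(outside_bmp("\u0BAE\U0001F600"))
```

Tamil letters sit inside the BMP, so an empty result from outside_bmp would suggest the BMP limitation is not what affects Tamil text.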

python - Where in the Tkinter application stack is a question mark ("?...

python utf-8 tkinter tk

I faced similar problems. I had been using the Zero Width Joiner (U+200D) to explicitly tell the rendering engine to join two characters. That used to work in 2010, but it looks like there have been changes in the rendering engine (that I am now aware of), and now in 2011 I find that having the joiner creates the problem! (It broke my working code.) I had to remove the explicit zero width joiners to get my code working again. Hope this helps.
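Removing the joiners can be as small as the following sketch, assuming Python 3; the function name strip_zero_width_joiners is made up for illustration. Note that U+200D is meaningful in some scripts and in emoji sequences, so stripping it everywhere is an assumption that fits this case, not a general recommendation.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

def strip_zero_width_joiners(text):
    """Return text with every U+200D removed."""
    return text.replace(ZWJ, "")

joined = "\u0BAE" + ZWJ + "\u0BC6"
print(len(joined), len(strip_zero_width_joiners(joined)))
```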

python - tkinter cannot display unicode characters correctly - Stack O...

python unicode tkinter tamil

I assume one of the sequences that does not display correctly is the code point pair 0BA9 0BC6 (TAMIL SYLLABLE NNNE), where 0BC6 is a reordrant class zero combining mark according to the Unicode standard, which basically means the glyphs get swapped when rendered.

The only way to fix it is to file a bug at the Tk bug tracker and hope it gets fixed.

Fixing it would probably be quite a task, needing something like Pango or the Windows equivalent to render Tamil correctly.
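The properties of the two code points mentioned above can be inspected with the standard-library unicodedata module. This is a sketch assuming Python 3, and mark_info is a hypothetical helper name. A combining class of 0 together with a mark category means normalization never reorders the vowel sign, so any visual reordering has to come from the text shaping layer.

```python
import unicodedata

def mark_info(ch):
    """Return the name, general category and combining class of ch."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
    }

for ch in "\u0BA9\u0BC6":
    print("U+%04X" % ord(ch), mark_info(ch))
```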

python - tkinter cannot display unicode characters correctly - Stack O...

python unicode tkinter tamil

It's hard to diagnose a program without code. See if you can boil down the code to something short that exhibits the problem, and post that.

I'm not familiar with Tamil glyphs, and they're pretty small, but looking at the screenshots, it looks like all the glyphs are there but certain glyphs are getting swapped, am I right?

(Hmm, I guess this should have been a "comment", not an "answer". Still finding my way around this site.)

python - tkinter cannot display unicode characters correctly - Stack O...

python unicode tkinter tamil