I just tried to tagscript some Cirno images missing ⑨, and it broke fabulously. It seems it applies the "h" tag instead. I dunno why that happens. The only thing that comes to mind, given the name "h", is rails being stupid and somehow leaking the html escaping function into the output instead of the actual tag, but that's just a wild guess. I can't reproduce it with other tags, however. At first I thought it was simply a matter of the tag being Unicode, but a simple test with "å" shows it works fine.
For the time being, I can patch it up with a mass edit afterwards, but ultimately I'd like it, well, not to be broken.
It's more likely the javascript which handles "edit tag script". If you try putting "⑨" into the dialog, and then selecting "edit tag script" again, it's already turned into "h".
Yes, it seems to be cutting off the high byte of each two-byte character - I tried putting "▒" (0x2592 in unicode) into "edit tag script", and when I selected the option again, it had turned into 0x0092. Note also that "⑨" is 0x2468 and "h" is 0x0068.
And "å" is 0x00E5, which is why it's not mangled. Unicode seems to match up with Latin-1 in the least significant byte, and å is in Latin-1. Interesting... well, I knew Unicode matched up with ASCII in the least significant 7 bits.
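To make the pattern above concrete, here's a rough sketch that reproduces the same mangling by keeping only the low byte of each character. `lowByteOnly` is a made-up name for illustration; this is not the site's actual code, just a model of the symptom.

```javascript
// Hypothetical helper (not from the site's code): keep only the low
// 8 bits of each UTF-16 code unit, which mimics the observed mangling.
function lowByteOnly(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) & 0xff);
  }
  return out;
}

// 0x2468 & 0xFF is 0x68, 0x2592 & 0xFF is 0x92, 0x00E5 & 0xFF is 0xE5.
const samples = ["⑨", "▒", "å"].map(function (ch) {
  return lowByteOnly(ch).charCodeAt(0).toString(16);
});
```

With this, "⑨" comes out as "h" and "å" is left alone, which matches what the dialog shows.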
Then maybe it's a particular browser's implementation of javascript. I'm using Firefox 3.0.4, fwiw. Also fails to work in my old Opera 9.26 installation, but apparently they rewrote the entire Opera javascript engine in 9.50.
Since even just entering it breaks on FF, that must be a problem with cookies. The script text is stored in a cookie as returned by the dialog, so either storing, retrieving, or parsing the cookie fails. I suppose it should really be url-encoded when stored in the cookie, instead of being stored verbatim, since retrieving it url-decodes the value.
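Something along these lines is what I mean by url-encoding the value on the way in. It's only a sketch: the cookie name `tag_script` and both function names are assumptions for illustration, and the real code would assign the result to `document.cookie` rather than just building a string.

```javascript
// Hypothetical helpers; the cookie name used below is an assumption.
// Build the "name=value" pair with the value percent-encoded, so only
// ASCII ever ends up in the cookie.
function buildCookie(name, value) {
  return name + "=" + encodeURIComponent(value);
}

// Find a cookie by name in a "a=1; b=2" style string and url-decode it.
function readCookie(cookieString, name) {
  const parts = cookieString.split("; ");
  for (const part of parts) {
    const eq = part.indexOf("=");
    if (eq !== -1 && part.slice(0, eq) === name) {
      return decodeURIComponent(part.slice(eq + 1));
    }
  }
  return null;
}

const stored = buildCookie("tag_script", "⑨ cirno");
const restored = readCookie("other=1; " + stored, "tag_script");
```

Since the stored value is plain ASCII (`%E2%91%A8` for ⑨), it shouldn't matter what the browser does with non-latin1 characters in cookies.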
Apparently FF just doesn't support non-latin1 characters in cookies.
Oh, you're right. See also https://bugzilla.mozilla.org/show_bug.cgi?id=259987 , which looks like an extremely old bug. Opera 9.6 stores the cookie as "â¨" (which are characters 0xE2, 0x91, and 0xA8, i.e. the UTF-8 bytes of "⑨" read as Latin-1, represented as 0xC3A2C291C2A8 in UTF-8), but seems to be able to retrieve it correctly.
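For what it's worth, the byte values above can be checked with a quick sketch like this. The function names are made up, and whether Opera really works this way internally is a guess on my part; the code only shows that the numbers line up.

```javascript
// Hypothetical helpers for checking the byte values; not Opera's code.
// Encode the text as UTF-8, then read each byte as one Latin-1 character.
function utf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// The reverse: treat each character as one byte and decode as UTF-8.
function latin1AsUtf8(text) {
  const bytes = Uint8Array.from(text, function (ch) {
    return ch.charCodeAt(0) & 0xff;
  });
  return new TextDecoder("utf-8").decode(bytes);
}

const mangled = utf8AsLatin1("⑨");      // characters 0xE2, 0x91, 0xA8
const recovered = latin1AsUtf8(mangled); // back to the original character
```

So if the browser undoes the same step when reading the cookie back, the value survives the round trip even though it looks like garbage in storage.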