Donmai

Unicode Tags

I made a list of tags that contain Unicode (non-ASCII) characters, since they're generally not supposed to be used. Each of these tags has 1 or more posts. Some of them use invisible characters or letters that look like plain ASCII.

arai_harumaki‎
ero_doll_☆
└(^o^)┐≡
n℃
zelos/0s
urikohime_(kagami_no_machi_no_kaguya)
n●va_sacchin
ikm♂


miyabi★mt.b
ikusa∞
ane_suku_~_wagamama_!_ane_to_sukumizu_yuuwaku_tokkun
yama_(two:five&seven)
aki⑥
halno_ichigo
naked_gun_33⅓
36号
9193℃
come♥uma
(v)・∀・(v)
saiga_(祭画)
4℃
oyu​nom​i_i​i
ca(asoo2)2aq
neko_neko_☆_super_fever_night_(vocaloid)
n→r
gogo_no_abatī
kiyose_kenkyū-in
blaze(改名考案中)
para☆iso
‪hakaokuri_no_uta_(vocaloid)‬
p@sta
(。•ㅅ•。)
у
jounetsu_no_lovers_♂
jounetsu_no_lovers_♀
focke‐wulf
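
For anyone who wants to reproduce a list like the one above, here is a minimal sketch of how such tags could be flagged. It is not the script used for this thread. The function names and the sample strings are made up for illustration, with the invisible characters written as escapes so they are visible.

```python
import unicodedata

def non_ascii_report(tag):
    """Return (codepoint, name, category) for every non-ASCII character in tag."""
    report = []
    for ch in tag:
        if ord(ch) > 0x7F:
            report.append((
                f"U+{ord(ch):04X}",
                unicodedata.name(ch, "<unnamed>"),
                unicodedata.category(ch),
            ))
    return report

def has_invisible(tag):
    """True if the tag contains a format character (category Cf),
    which covers zero-width spaces and directional marks."""
    return any(unicodedata.category(ch) == "Cf" for ch in tag)

# Illustrative samples only; the escapes are assumptions about what
# look-alike or invisible characters in a tag could be.
samples = ["ero_doll_\u2606", "oyu\u200bnomi", "focke\u2010wulf", "plain_tag"]
for tag in samples:
    print(repr(tag), non_ascii_report(tag), has_invisible(tag))
```

Printing the tag with `repr()` is what makes the invisible ones stand out, since they otherwise look identical to a clean tag.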

These tags are not unicode but are under 0x008F in the first character block or really shouldn't be there in the first place. None of them have more than 4 posts:

surströmming
février

eden*_they_were_only_two_on_the_planet
re°
kaniehtí:io
tim_löchner
mai_françois_joy
majimbù
°w°
n*madchen
nn*
mikaël_aguirre
jérémie_périn
nazgûl
eiki`
sweet³_room
shi*ki_(sm041yh)
josé_luis_rosado
michaël_crouzat
faldhatée
shenyíng_ehuo
saara_mäkinen
nürburgring
manuel_xavier_rodríguez_erdoíza
let’s_try_bass_fishing
jamie_(let’s_try_bass_fishing)
fortissimo_exs//akkord:nächsten_phase
déméchrelle_(felarya)
mistéxpi
acetea_rémi

They use one or more of the following characters: *`’•–°³äçéëíöùûü

Kudos to ghostrigger for fixing the majority of them!

These next tags use characters that are VERY rarely used, at least compared to the rest. They are: "#$[]}

" is used by 12 tags with a total of 1098 posts - don't_say_"lazy" has 986 of them
# is used by 22 tags with a total of 99 posts
$ is used by 4 tags with a total of 49 posts
[ is used by 14 tags with a total of 179 posts
] is used by 15 tags with a total of 180 posts
} is used by only 1 tag with a total of 13 posts - :}
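
Counts like the ones above can be tallied with something along these lines. This is only a sketch: it assumes the tags are already loaded as a name-to-post-count mapping, and `rare_char_usage` and the sample dictionary are hypothetical (the two real entries use the post counts quoted above).

```python
RARE_CHARS = '"#$[]}'

def rare_char_usage(tag_counts):
    """Map each rare character to [number of tags using it, total posts]."""
    usage = {ch: [0, 0] for ch in RARE_CHARS}
    for name, posts in tag_counts.items():
        # set() so a tag that repeats a character is only counted once
        for ch in set(name) & set(RARE_CHARS):
            usage[ch][0] += 1
            usage[ch][1] += posts
    return usage

# Tiny sample, not the full crawl
sample = {'don\'t_say_"lazy"': 986, ":}": 13, "plain_tag": 5}
for ch, (tags, posts) in rare_char_usage(sample).items():
    print(f"{ch} is used by {tags} tags with a total of {posts} posts")
```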

Since I'm not entirely sure if these tags may be changed, I have them pasted here: http://paste2.org/Hwt3GjnC . If they are, I can post them here directly.

They all appear to be fixed now. I'll check for them again next month.

I did fix a few of them, but I only changed what I had confidence in; anything else I left for the staff.

As for the other characters I mentioned, should the [ ] brackets be changed to ( ) parentheses for consistency?

ROMaster2 said:

Checked out last month's tags, all good but two. rhm-borsig_waffenträger was one but it was fixed before I checked it.

The other one I can't search for at all. It's " ". It's this character specifically: http://www.fileformat.info/info/unicode/char/3000/index.htm

If it can't be found I may be able to find it when I format all the posts I crawled and search manually.

The reason you can't search for the ideographic space is because the site normalizes them to become regular spaces.

That tag you're talking about must have been created before the site started normalizing this.
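
To illustrate why a search for that character comes back empty, here is a rough sketch of this kind of normalization. How the site actually does it isn't stated here, so NFKC plus whitespace splitting is an assumption, and `normalize_query` is a made-up name.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

def normalize_query(query):
    """Fold compatibility characters, then split the query into tags.
    NFKC maps U+3000 to an ordinary U+0020 space."""
    return unicodedata.normalize("NFKC", query).split()

print(normalize_query("tag_a" + IDEOGRAPHIC_SPACE + "tag_b"))  # two separate tags
print(normalize_query(IDEOGRAPHIC_SPACE))  # nothing left to search for
```

Under that assumption, a tag whose whole name is U+3000 turns into an empty query, so it can never be matched by name.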

Toks said:

The reason you can't search for the ideographic space is because the site normalizes them to become regular spaces.

That tag you're talking about must have been created before the site started normalizing this.

But it wasn't present the month before.

EMUltra3 said:

But it wasn't present the month before.

And aren't you using /cache/tags.json? That was outdated by several months for a while, which would explain why things weren't present.

Updated by Log

Toks said:

And aren't you using /cache/tags.json? That was outdated by several months for a while, which would explain why things weren't present.

No, I crawl http://hijiribe.donmai.us/tags.xml monthly. I'll find the post once I format the posts I crawled.

By the way, the tag number is 507708. That means it has existed since before last month, but someone used it as a tag for some reason.
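
A sketch of how a crawled page could be scanned for such tags, by id rather than by name. The XML layout (`tag`, `id`, `name`, `post-count` elements) is an assumption about what tags.xml returns, and the sample document, including the post counts in it, is made up so the example runs without hitting the site.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the id 507708 comes from the post above.
SAMPLE_XML = """<tags>
  <tag><id>507708</id><name>&#x3000;</name><post-count>1</post-count></tag>
  <tag><id>1</id><name>plain_tag</name><post-count>10</post-count></tag>
</tags>"""

def find_suspect_tags(xml_text):
    """Return (id, name) for every tag whose name has a non-ASCII character."""
    root = ET.fromstring(xml_text)
    hits = []
    for tag in root.iter("tag"):
        name = tag.findtext("name", default="")
        if any(ord(ch) > 0x7F for ch in name):
            hits.append((int(tag.findtext("id")), name))
    return hits

for tag_id, name in find_suspect_tags(SAMPLE_XML):
    print(tag_id, repr(name))
```

Keeping the id alongside the name matters here, since the name alone can't be searched for once it is normalized.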
