Donmai

Punctuation etc.

Posted under General

I am afraid we have another debatable issue with latinization: guidelines for spacing and hyphenation.

The following examples should prove that Danbooru latinizations lack consistency at the moment, even if every tag might have been created following some valid norms:

All three possible variants for 'rose madder colored' there. Furthermore, I once tried to figure out the "correct" roomaji spelling for the latter game title by comparing different sources, and ended up with Aki no Urara no: Akaneiro Shoutengai.

(Note: the English-language Wikipedia has articles entitled "Akaneiro ni Somaru Saka" and "Shōtengai".)

I expect a few people on Danbooru with some knowledge of Japanese to have opinions on this...

Updated by jxh2154

In my opinion, writing words such as akaneiro together may be misleading, because the ei group is not a long e vowel here. The situation seems similar to the case of a single n, where we add an apostrophe to distinguish it from the normal na/ni/nu/ne/no syllables.
Thus, I'm against writing it together. Preferably with a hyphen or separately.

We had a rather large argument about utawareru_mono vs. utawarerumono in the past, so I'm afraid that a consensus on spacing might be hard to find.

richie has a good point about hyphenating to distinguish letter combinations that don't make the sound they normally make. I'm not sure how to make that principle into an actual rule for hyphenation, though.

I still don't see the logic in writing a verb modifier and noun together as if they were one word, but...

In this case, I'm leaning towards "akaneiro" because 茜色 has effectively become a single lexical word (like 黄色 for "yellow"). Also, I don't like the hyphen.

Personally, I don't feel that long vowel confusion is something we need to worry about; if someone wanted to know the Japanese spelling there are a hundred places where they could look it up.

richie said:
stuff

No. That makes no sense. Do you want to write the English word "delineate" as "deline-ate" because the "ea" is not a diphthong? I assume not. The same reasoning applies here. If people don't understand how kana translates into pronunciation, it's not our job to teach them. This is the same kind of thinking which leads to using macrons.

Honestly I still think using 日本式ローマ字表記法 is the way to go, but at least danbooru's brand of Hepburn still assigns letter sequences to kana characters in a uniform way, independent of the context of an individual kana in whatever you're romanizing. Now you're suggesting that we stick hyphens in based on some criterion that isn't even reflected in the kana spelling of a word. That I cannot accept.

richie said:
In my opinion, writing words such as akaneiro together may be misleading, because the ei group is not a long e vowel here.

"ei" is a proper diphthong, though, not an extended "e", even if it ends up that way in the slur of casual everyday speech.

Akaneiro ni Somaru Saka:
JPN title: あかね色に染まる坂
alias akaneiro ni somarusaka -> akaneiro ni somaru saka
Reasoning: "saka" is the object of "somaru", rather than part of a compound, so it needs to stand by itself.

Aki no Urara no Akaneiro Shoutengai
JPN title: 秋のうららの~あかね色商店街~
alias aki no urarano akane iro shouten gai -> aki no urara no akaneiro shoutengai
Ignore the tildes.
Reasoning: Unless the "no" is part of a kanji reading, it shouldn't be grouped in that way.

0xCCBA696 said:
No. That makes no sense. Do you want to write the English word "delineate" as "deline-ate" because the "ea" is not a diphthong? I assume not.

*shrug* This is not a discussion about reforming English spelling.

The same reasoning applies here. If people don't understand how kana translates into pronunciation, it's not our job to teach them.

Really? Then why are we already using all these apostrophes for a single n? "If people don't understand how to pronounce it correctly then it's their problem", right?

Honestly I still think using 日本式ローマ字表記法 is the way to go, but at least danbooru's brand of Hepburn still assigns letter sequences to kana characters in a uniform way, independent of the context of an individual kana in whatever you're romanizing.

Obviously not true. Right now, when we write a single 'n', we either add or omit the apostrophe, depending on the context of this kana.

I acknowledge that 黄色 is a single lexical word, so I'm not too happy about writing it separately either.
Still, simply writing it together creates a situation where correct pronunciation is no longer possible without knowledge of the context. And if the hyphen looks so terribly wrong, then we can use akane'iro. Makes no difference to me.

richie said:
Still, simply writing it together creates a situation where correct pronunciation is no longer possible without knowledge of the context.

Let me make sure I've got this straight: most words with えい (say, 映画 "eiga") pronounce it as a diphthong [ei], whereas in 茜色, ねい is pronounced as two vowels in hiatus. You're saying that because they're pronounced differently, they should have different romanizations.

First, is the bit I wrote about pronunciation true? I've never been good with phonetics in any language, least of all one I don't speak natively. Second, regardless of pronunciation, the distinction you're talking about isn't made in kana: the ねい in 佞姦 is written the same way as the ねい in 茜色, despite being pronounced differently.

glasnost said:
Let me make sure I've got this straight: most words with えい (say, 映画 "eiga") pronounce it as a diphthong [ei], whereas in 茜色, ねい is pronounced as two vowels in hiatus.

Yes.

You're saying that because they're pronounced differently, they should have different romanizations.

Yes.

First, is the bit I wrote about pronunciation true? I've never been good with phonetics in any language, least of all one I don't speak natively.

Yes, it is. Of course my personal experience means nothing as an argument, but if you check a dictionary, for example here: http://dictionary.goo.ne.jp/leaf/jn/18477/m0u/%E6%98%A0%E7%94%BB/ and here: http://dictionary.goo.ne.jp/leaf/jn/1405/m0u/%E8%8C%9C%E8%89%B2/

you will notice 映画 written in kana as えいが
while 茜色 is spelled あかね-いろ

...yes, this is a Japanese dictionary, and there is a small hyphen between akane and iro.

Second, regardless of pronunciation, the distinction you're talking about isn't made in kana: the ねい in 佞姦 is written the same way as the ねい in 茜色, despite being pronounced differently.

That's true, but... we *are* already making such distinctions when romanizing the ん kana.
Example:
なに -> nani (what)
なんい -> nan'i (difficulty)

That's fine and good, you say - we use different kana, so we write it differently. OK, but we also romanize:
おんな -> onna, not on'na
In other words, we romanize the ん kana differently depending on the context of the surrounding kana (sometimes with an apostrophe, sometimes without). Why don't we add an apostrophe after every n? It's obvious: because we're lazy, and we add it only when it's really necessary.
Well, that's exactly the same reason why we don't add an apostrophe after every single kana: it's absolutely unnecessary. But when we have the vowels e and i in hiatus, while we usually write the diphthong as ei? I say we should mark that difference, especially since we have a precedent, and since even the Japanese consider the problem big enough to acknowledge it in their dictionaries.
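
For illustration, here is a minimal sketch of the ん rule as described in this post: the apostrophe is added only when ん is followed by a vowel or y. The table `KANA_TO_ROMAJI` and the function `romanize` are hypothetical names, and the table covers only the kana used in this thread's examples; this is not Danbooru's actual tooling.

```python
# Hypothetical mini-table: only the kana needed for the examples in this thread.
KANA_TO_ROMAJI = {
    "あ": "a", "い": "i", "え": "e", "お": "o",
    "か": "ka", "が": "ga", "な": "na", "に": "ni", "ね": "ne",
    "ま": "ma", "で": "de", "ろ": "ro", "ん": "n",
}


def romanize(kana: str) -> str:
    """Romanize kana one character at a time, context-free except for ん."""
    out = []
    for i, ch in enumerate(kana):
        out.append(KANA_TO_ROMAJI[ch])
        # Assumed rule: add an apostrophe after ん only when the next kana
        # starts with a vowel or y, i.e. only when "n" + next would otherwise
        # read as a single na/ni/nu/ne/no (or nya/nyu/nyo) syllable.
        if ch == "ん" and i + 1 < len(kana):
            next_romaji = KANA_TO_ROMAJI[kana[i + 1]]
            if next_romaji[0] in "aiueoy":
                out.append("'")
    return "".join(out)


print(romanize("なに"))      # nani
print(romanize("なんい"))    # nan'i
print(romanize("おんな"))    # onna
print(romanize("あかねいろ"))  # akaneiro
```

Note that under this rule えいが and あかねいろ both come out with a plain "ei", which is exactly the case being argued about here.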


We don't have strict spacing and punctuation rules because there is nowhere near the agreement on them that there is on the bulk of our romanization rules. Romanization has something to fall back on (kana spelling); spacing has no such objective evidence, as far as I've ever been able to determine. Neither does punctuation, most of the time.

richie said:
you will notice 映画 written in kana as えいが
while 茜色 is spelled あかね-いろ

...yes, this is a Japanese dictionary, and there is a small hyphen between akane and iro.

Okay, that sells it. Given that the hyphen seems to be standard for pronunciation indication in all the online JP->JP dictionaries I checked, and given the precedent for a similar practice in romanization in the form of the web address Log linked, I don't see any reason not to include the hyphen in our own romanization scheme.

+1 akane-iro_ni_somaru_saka.

Hmm. OK, I'll buy that coming from goo 辞書. +1 to akane-iro ni somaru saka.

But richie, I gotta say, your "romanizing ん as n'" example is a special case. The reasoning for that has to do with reversibility of the romanization scheme. If we always romanized ん as "n", then なんい and なに in your example would be romanized the same way, despite being written differently in kana. That is unacceptable, which is why we romanize ん as "n'" in such contexts. You're right that the reason we romanize it differently (as "n") in all other contexts (such as in まなんで = manande) is simply laziness.

Technically it would probably be a more "regular" scheme if you just used "n'" everywhere for ん, and that is in fact how I tend to type in the IME. But think about it this way - the apostrophe is not necessarily part of the romanization of ん itself, but is rather something added to disambiguate parsing. "nani" will be chunked by a greedy algorithm as "na" + "ni"; the contentless "'" breaks up "ni" and makes the algorithm reanalyze "ni" as "n" + "i", also an acceptable romanization of something. But adding a hyphen in "akane-iro" doesn't disambiguate anything. The parser is still going to see "a" + "ka" + "ne" + "i" + "ro". So it serves no purpose in terms of the task of romanization of kana. And note that all of this has NOTHING to do with pronunciation whatsoever.
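
The greedy chunking described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: `ROMAJI_TO_KANA`, `chunk`, and `to_kana` are hypothetical names, the table holds just the syllables needed for the two examples, and "'" and "-" are treated as contentless separators.

```python
# Hypothetical mini-table covering only "nani", "nan'i" and "akane-iro".
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "ka": "か", "na": "な", "ni": "に",
    "ne": "ね", "ro": "ろ", "n": "ん",
}
MAX_LEN = max(len(key) for key in ROMAJI_TO_KANA)
SEPARATORS = "'-"  # carry no kana of their own; they only stop a match


def chunk(romaji: str) -> list[str]:
    """Split romaji into syllables, always taking the longest match first."""
    chunks = []
    pos = 0
    while pos < len(romaji):
        if romaji[pos] in SEPARATORS:
            pos += 1
            continue
        for length in range(MAX_LEN, 0, -1):
            piece = romaji[pos:pos + length]
            if piece in ROMAJI_TO_KANA:
                chunks.append(piece)
                pos += length
                break
        else:
            raise ValueError(f"cannot parse {romaji!r} at position {pos}")
    return chunks


def to_kana(romaji: str) -> str:
    return "".join(ROMAJI_TO_KANA[c] for c in chunk(romaji))


print(chunk("nani"))       # ['na', 'ni']
print(chunk("nan'i"))      # ['na', 'n', 'i']
print(chunk("akaneiro"))   # ['a', 'ka', 'ne', 'i', 'ro']
print(chunk("akane-iro"))  # ['a', 'ka', 'ne', 'i', 'ro']
```

With this sketch the apostrophe changes the parse of "nani", while the hyphen in "akane-iro" leaves the parse unchanged, which is the distinction being drawn above.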

I suspect the reason goo 辞書 writes it as "あかね-いろ" is simply that it is not, in fact, fully lexicalized as Soljashy stated. Note that goo 辞書 writes 灰色 as "はいいろ", even though the いい there is not a long い but contains a hiatus as well. So while I now agree with your conclusion, I still think your justification for that conclusion is lacking.

0xCCBA696 said:
I suspect the reason goo 辞書 writes it as "あかね-いろ" is simply that it is not, in fact, fully lexicalized as Soljashy stated.

What are the qualifications for lexicalization anyway? Does it need to appear in Japanese dictionaries, but without a hyphen?

But richie, I gotta say, your "romanizing ん as n'" example is a special case. The reasoning for that has to do with reversibility of the romanization scheme. If we always romanized ん as "n", then なんい and なに in your example would be romanized the same way, despite being written differently in kana. That is unacceptable

Devil's advocate: why is it unacceptable?
If you answer as you do below, that we need the apostrophe for correct and unambiguous parsing, then I say two magic words: "zu" and "ji" (yes, I know how you love Hepburn :))
If you answer that the apostrophe shows us the correct pronunciation, then for consistency's sake we should distinguish the typical "ei" diphthong from the case where "e" and "i" are in hiatus.
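
A small sketch of the "zu" and "ji" point, on the reading that it refers to Hepburn writing ず and づ both as "zu", and じ and ぢ both as "ji". The table `HEPBURN` and the function `collisions` are hypothetical names, and the table is deliberately tiny.

```python
from collections import defaultdict

# Hypothetical excerpt of a Hepburn table; す and し are included only to
# show kana that do NOT collide.
HEPBURN = {
    "ず": "zu", "づ": "zu",
    "じ": "ji", "ぢ": "ji",
    "す": "su", "し": "shi",
}


def collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Return every romanization that is shared by more than one kana."""
    by_romaji = defaultdict(list)
    for kana, romaji in table.items():
        by_romaji[romaji].append(kana)
    return {r: ks for r, ks in by_romaji.items() if len(ks) > 1}


print(collisions(HEPBURN))  # {'zu': ['ず', 'づ'], 'ji': ['じ', 'ぢ']}
```

Any romanization that appears in the result cannot be converted back to a unique kana, so the scheme is already not fully reversible even without the ん question.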

(...)But adding a hyphen in "akane-iro" doesn't disambiguate anything. The parser is still going to see "a" + "ka" + "ne" + "i" + "ro". So it serves no purpose in terms of the task of romanization of kana. And note that all of this has NOTHING to do with pronunciation whatsoever.

Yes, of course that's all true if you assume that the whole point of romanization is only to help you parse text back into the correct kana. I beg to differ on that point. Plus, we are already in Hepburn territory, which means we've already decided that unambiguous kana romanization has lower priority than unambiguous pronunciation of the romanized text.

I suspect the reason goo 辞書 writes it as "あかね-いろ" is simply that it is not, in fact, fully lexicalized as Soljashy stated.

Maybe.

Note that goo 辞書 writes 灰色 as "はいいろ", even though the いい there is not a long い but contains a hiatus as well.

Well, it's still possible that nobody cares anymore about the difference between a long vowel and a double vowel.
For me the えい diphthong is clearly different from え and い in hiatus. But to tell いい from a long い? I give up here.

I know very, very little Japanese, but I am also very interested in different languages' phonological systems, viewing them from a general linguistic point of view. Forgive me if I have missed something essential.

To me akaneiro seems like the natural spelling for at least two reasons:

  • Cases like nani ~ nan'i are more complicated. They are not only about pronunciation differences but also about avoiding transliteration bias. I prefer nan'i primarily because it protects from latinizing なんい > nani and then hiraganizing the same word erroneously back nani > なに. This is especially important when there is minimal or no context available, or when an absolute beginner wants to find out the proper hiragana spelling. (Okay, 0xCCBA696 and Richie already debated this.)
  • More importantly, the hyphen already has a completely different function: marking certain suffixes, most notably honorifics, on a semantic basis. That is why even akane'iro seems more logical than akane-iro -- unless the -iro type morphemes are always hyphenated on a semantic basis, regardless of hiatuses or other phone(ma)tic context.

However, if using the hyphen is an established and widely used practice, as Glasnost pointed out, I see no reason why Danbooru should prefer my logic to it.

Could someone please give a general rule of when this kind of hyphen use occurs? To mark hiatus on noun-suffix boundaries? Or between the components of compound words? Or both? Perhaps even between two consecutive suffixes? (I understand iro as an independent noun that is used more or less like a suffix to form color names.)

