Kajin M̧ajeļ
Austronesian / Malayo-Polynesian / Eastern Malayo-Polynesian / Oceanic / Micronesian / Central Micronesian / Western Micronesian / Marshallese /
this is only a brief overview based on surface-level research. corrections and comments are more than welcome; direct them to atmatmmachine.nz.
notes on character names
Kajin M̧ajeļ
University of Hawaiʻi Press
| vowels | minuscule | a | e | i | o | u | |
|---|---|---|---|---|---|---|---|
| | majuscule | A | E | I | O | U | |
| | minuscule | ā | o̧ | ō | ū | | |
| | majuscule | Ā | O̧ | Ō | Ū | | |
| consonants | minuscule | b | d | j | k | l | ļ |
| | majuscule | B | D | J | K | L | Ļ |
| | minuscule | m | m̧ | n | ņ | n̄ | p |
| | majuscule | M | M̧ | N | Ņ | N̄ | P |
| | minuscule | r | t | w | | | |
| | majuscule | R | T | W | | | |
| | minuscule | ọ | ḷ | ṃ | ṇ | ñ | |
| | majuscule | Ọ | Ḷ | Ṃ | Ṇ | Ñ | |
this orthography has been adopted by the Nitijeļā as the official orthography of Kajin M̧ajeļ. this is the ‘new’ orthography.
cedillas and macrons are used.
note about presentation: if using [u+013b]/[u+013c] rather than [u+004c]/[u+006c] followed by [u+0327], care should be taken, as the precomposed characters are more likely to be displayed in the Latvian form, which looks more like a comma below than a cedilla and is not the correct presentation in Kajin M̧ajeļ. even with the combining sequence, some fonts still present the Latvian form for the majuscule. the specifics vary from font to font, but any letter shared with Latvian is likely to be rendered incorrectly, so Ļ ļ and Ņ ņ need careful attention to their presentation.
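the two encodings mentioned in this note are canonically equivalent in Unicode, so normalisation alone is unlikely to change how a font draws them, but it is a quick way to check which letters have a precomposed form at all. a minimal Python sketch using only the standard `unicodedata` module; the helper name `has_precomposed` is made up for this illustration:

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA
MACRON = "\u0304"   # COMBINING MACRON

def has_precomposed(base: str, mark: str) -> bool:
    """True if NFC folds base + combining mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

# L + combining cedilla composes to U+013B, the code point shared with Latvian
assert unicodedata.normalize("NFC", "L" + CEDILLA) == "\u013b"
# and U+013C decomposes back to l + combining cedilla
assert unicodedata.normalize("NFD", "\u013c") == "l" + CEDILLA

for base, mark in [("l", CEDILLA), ("n", CEDILLA), ("m", CEDILLA),
                   ("o", CEDILLA), ("n", MACRON), ("a", MACRON)]:
    status = "precomposed" if has_precomposed(base, mark) else "combining only"
    print(f"{base} + U+{ord(mark):04X}: {status}")
```

since both encodings normalise to the same thing, getting the cedilla form to display comes down to font choice rather than to which encoding is stored.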
Kajin Majōl
various missionaries
this is the orthography introduced by missionaries; it is phonetically inconsistent. this is the ‘old’ orthography.
MOD Marshallese
internet usage
the Marshallese Online Dictionary (MOD) has adopted precomposed characters. there are no precomposed characters for o̧, m̧, and n̄ (ļ and ņ do have precomposed forms, but these are the Latvian-presenting characters noted above), so these five letters are replaced with Ọ [u+1ecc] ọ [u+1ecd], Ḷ [u+1e36] ḷ [u+1e37], Ṃ [u+1e42] ṃ [u+1e43], Ṇ [u+1e46] ṇ [u+1e47], and Ñ [u+00d1] ñ [u+00f1] respectively. no precomposed characters are used in documenting these precomposed characters, as per the directory-wide ruleset on diacritics; the irony is acknowledged.
cedillas are entirely replaced with dots below and the macron on the N/n is replaced with a tilde above.
note that anyone seeking to create a M̧ajeļ corpus would do well to standardise the Unicode characters across sources: although this is the internet orthography, there are still notable exceptions to this set of characters. the situation is likely to fragment further as proper characters for Kajin M̧ajeļ are introduced into Unicode.
apparently some fonts designed for Kajin M̧ajeļ work in a non-Unicode-compliant manner: they take Latin letters that Kajin M̧ajeļ does not use and redraw them as the glyphs that have no precomposed characters. have not been able to find an example of this, but would love to obtain one.
dialects
Rālik
western Marshallese
haven’t found any variations in orthography at this stage.
Ratak
eastern Marshallese
ditto.
Byron W. Bender
1968, 1976, and included alongside MOD spelling in the MOD
Bender developed a morphophonemic orthography, which was used in his 1969 book Spoken Marshallese. in 1976 Bender adapted the orthography, and this modified version appeared alongside the new orthography in the Marshallese-English Dictionary (MED). when the MED was digitised by Stephen Trussel, Bender made some adaptations so that the orthography was suitable for Unicode.
will come back and make tables for these when the exercise seems worthwhile.
other notes
before the inclusion of the macron in Unicode in 1993, macrons were encoded as diaereses; these may or may not appear visually as macrons, depending on the font used.
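for legacy text of this kind, one possible clean-up is to swap every combining diaeresis for a combining macron. the Python sketch below does this blindly, which assumes the text contains no diaereses that are meant as diaereses; the function name `diaeresis_to_macron` is invented for the example.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS
MACRON = "\u0304"     # COMBINING MACRON

def diaeresis_to_macron(text: str) -> str:
    """Replace every diaeresis with a macron, returning NFC text."""
    # decompose so precomposed letters such as U+00E4 expose their combining mark
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(DIAERESIS, MACRON))

print(diaeresis_to_macron("R\u00e4lik"))  # legacy spelling in, macron spelling out
```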
data
precomposed characters are used in this section when available.
divided alphabetisation (as per MOD):
A a Ā ā B b D d E e I i J j K k L l Ļ ļ M m M̧ m̧ N n Ņ ņ N̄ n̄ O o O̧ o̧ Ō ō P p R r T t U u Ū ū W w
remarks: each letter is sorted as a distinct letter in the divided alphabetisation; the unified alphabetisation, which is easier to use, treats diacritics only within the same column: e.g., oo precedes ōō precedes o̧o and none of these precede oran̄e.
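the divided alphabetisation can be sketched as a sort key in Python: split each word into letters (a base character plus any combining marks) and rank each letter by its position in the alphabet listed above. the names `letters` and `divided_key` are hypothetical, and the unified ordering is not attempted here because its tie-breaking rule is not spelled out in these notes.

```python
import unicodedata

# the divided alphabet from this section, minuscule letters only
ALPHABET = "a ā b d e i j k l ļ m m̧ n ņ n̄ o o̧ ō p r t u ū w".split()
# keys are stored decomposed so precomposed and combining input compare equal
RANK = {unicodedata.normalize("NFD", letter): i for i, letter in enumerate(ALPHABET)}

def letters(word: str) -> list:
    """Split a word into base letters with their combining marks attached."""
    clusters = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def divided_key(word: str) -> list:
    # known letters sort by alphabet position; anything else sorts after w
    return [(RANK[c], "") if c in RANK else (len(RANK), c) for c in letters(word)]

words = ["ōō", "oran̄e", "o̧o", "oo"]
print(sorted(words, key=divided_key))
```

under this key the four words come out as oo, oran̄e, o̧o, ōō, because o, o̧ and ō are ranked as three separate letters.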
regex:
vowels: (?'mah_vowels'A|a|Ā|ā|E|e|I|i|O|o|O̧|o̧|Ō|ō|U|u|Ū|ū)
consonants: (?'mah_consonants'B|b|D|d|J|j|K|k|L|l|M|m|M̧|m̧|N|n|Ņ|ņ|N̄|n̄|P|p|R|r|T|t|W|w)
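a possible Python version of these two patterns is sketched below. Python spells named groups as `(?P<name>...)`, and the alternatives are listed longest first so that, on decomposed text, a plain o cannot match ahead of o̧ and leave the cedilla behind. the helper names `alternation` and `classify` are invented for this example.

```python
import re
import unicodedata

VOWELS = "A a Ā ā E e I i O o O̧ o̧ Ō ō U u Ū ū".split()
CONSONANTS = "B b D d J j K k L l Ļ ļ M m M̧ m̧ N n Ņ ņ N̄ n̄ P p R r T t W w".split()

def alternation(name, letter_list):
    """Build a named group matching any of the letters, in decomposed form."""
    # decompose, then put two-code-point letters first so they win over bare ones
    forms = sorted((unicodedata.normalize("NFD", x) for x in letter_list),
                   key=len, reverse=True)
    return "(?P<" + name + ">" + "|".join(re.escape(f) for f in forms) + ")"

MAH_LETTER = re.compile(
    alternation("mah_vowels", VOWELS) + "|" + alternation("mah_consonants", CONSONANTS)
)

def classify(text):
    """Yield (letter, group name) for each letter of the alphabet found in text."""
    # the text is decomposed too, so precomposed input matches the same patterns
    for m in MAH_LETTER.finditer(unicodedata.normalize("NFD", text)):
        yield m.group(), m.lastgroup

for letter, kind in classify("M̧ajeļ"):
    print(letter, kind)
```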