I would like to make a similar extension to translate Cyrillic into the official (i.e. scientific) transliteration. Unfortunately I do not have the necessary programming experience and skills. I can provide the transliteration table. If anyone has the time and the inclination…
For example: Александр Сергеевич Пушкин → Aleksandr Sergeevič Puškin
That should be straightforward if it is a simple one-to-one character mapping. Feel free to post the transliteration table here.
What would you suggest as an icon for such an extension?
That sounds great! As an icon I would suggest Я-A (the last and first characters of the Cyrillic alphabet, which also suggests the transliteration from Cyrillic into Latin characters).
How should the transliteration table be formatted? I could make a table in Excel or Numbers with the mapping. Or is a .txt file better? Do you need additional information, such as the HTML code or the Unicode hex value (U+010D) for č? And what should be done with capitals? Three Cyrillic characters are transliterated into two Latin characters: Щ/щ → Šč/šč, Я/я → Ja/ja and Ю/ю → Ju/ju. This creates another problem: if, for example, Щ is used in a fully capitalised word, the transliteration is ŠČ. If, however, only the first character of a word is a capital, the transliteration is Šč. For example, the name Щедрин becomes Ščedrin or, when fully capitalised (ЩЕДРИН), ŠČEDRIN. I hope this is clear.
If you let me know, then it will be a pleasure to create the table…
A spreadsheet — either Numbers or Google Sheets would be fine for me. You can simply include the Unicode characters directly in the table cells.
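Since the cells can hold the characters themselves, the HTML and Unicode codes probably do not need to be supplied separately: they can be derived from each character. A minimal Python sketch of that idea (the helper name `describe` is made up for illustration and is not part of any extension):

```python
def describe(ch: str) -> dict:
    """Return the Unicode code point and a numeric HTML entity for one character."""
    cp = ord(ch)  # ord() expects a single (precomposed) character
    return {"char": ch, "unicode": f"U+{cp:04X}", "html": f"&#{cp};"}


# Print the derived codes for a few of the Latin letters with diacritics.
for ch in "\u010d\u0161\u017e":  # č, š, ž
    print(describe(ch))
```

This assumes the table uses precomposed characters (a single code point each), which is what typing č directly into a cell normally produces.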
If there are any more special cases like Щ then describe these separately.
Thanks! Already working on it.
This reference to ISO 9 is misleading. I am a Slavist (educated at the universities of Ghent, Moscow and Saint Petersburg) and have been a lecturer of Russian for many years at various Belgian and Dutch universities. I also taught Dutch in Russia and lived in Petersburg for about ten years, where I worked as director of the Netherlands Scientific and Cultural Institute. The ISO 9 standard has never, at any time, become established in academia. Only in the early 1990s was the system briefly used by a few, mainly American, Slavists. The only system used in the vast majority of scientific publications is this: Scientific transliteration of Cyrillic - Wikipedia. See also: Romanization of Russian - Wikipedia.
Hi Nick, I hereby provide you with the transliteration table. In addition to the characters themselves, I have also provided the HTML code and the Unicode Hexadecimal code. Is this workable?
The following applies to the ‘two-character’ rules: when a word starts with a capital letter but the rest is written in lowercase, the first transliteration character is capitalized and the second is lowercase. However, if the entire word is in capital letters, the second character is of course also capitalized. The compound characters are х, щ, ю, and я, which are transliterated as ch, šč, ju, and ja respectively. I hope this is clear.
Transliteration table.numbers.zip (760.3 KB)
Thank you! What is to be done if a single ‘compound’ character is encountered on its own, not as part of a word, so that it has no character following it?
That’s an interesting question. Do you mean when a compound character is used separately, e.g. in an abbreviation? I can’t immediately imagine other cases. Or in so-called autonymous use, which actually only occurs in linguistic textbooks or linguistic studies? Or do you mean something else?
Anyway, I think that then only the case of the original Cyrillic character should be considered. If that is a capital letter, then both parts of the compound character are also capitalized, if it is a lowercase letter, then both parts of the compound character are also displayed as lowercase. E.g. the letter Щ → ŠČ (upper case), щ → šč (lower case).
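The casing rules described so far could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `LOWER` dictionary is a small hand-picked excerpt rather than the full table, the function names are invented, and deriving the capital forms from the lowercase ones is a shortcut chosen for brevity, not necessarily how an actual extension would do it.

```python
import re

# Illustrative excerpt only; the full mapping is the table discussed in this thread.
LOWER = {"а": "a", "д": "d", "е": "e", "и": "i", "н": "n", "р": "r",
         "щ": "šč", "ю": "ju", "я": "ja"}


def transliterate_word(word: str) -> str:
    """Transliterate one word, applying the compound-letter casing rule."""
    all_caps = word.isupper()  # also True for a lone capital such as "Щ"
    out = []
    for ch in word:
        latin = LOWER.get(ch.lower())
        if latin is None:
            out.append(ch)                   # not in the table: leave unchanged
        elif ch.islower():
            out.append(latin)                # щ -> šč
        elif all_caps:
            out.append(latin.upper())        # Щ in ЩЕДРИН, or standalone Щ -> ŠČ
        else:
            out.append(latin.capitalize())   # Щ in Щедрин -> Šč
    return "".join(out)


def transliterate(text: str) -> str:
    """Apply transliterate_word to every run of letters in the text."""
    return re.sub(r"[^\W\d_]+", lambda m: transliterate_word(m.group()), text)


print(transliterate("Щедрин ЩЕДРИН Щ щ"))
```

The word is classified once with `str.isupper()`, so a standalone capital counts as "fully capitalised", which matches the suggestion above for a compound character that appears on its own.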
Well this sort of thing is quite a fun task to put together. Here is my first attempt at the extension: CyrillicTransliteration.popclipextz (source)
Have a go and see how it goes.
Here you can see the form into which I transformed the table:
{
  "upper": {
    "А": "A",
    "Б": "B",
    "В": "V",
    "Г": "G",
    "Д": "D",
    "Е": "E",
    "Ж": "Ž",
    "З": "Z",
    "И": "I",
    "Й": "J",
    "К": "K",
    "Л": "L",
    "М": "M",
    "Н": "N",
    "О": "O",
    "П": "P",
    "Р": "R",
    "С": "S",
    "Т": "T",
    "У": "U",
    "Ф": "F",
    "Х": ["CH", "Ch"],
    "Ц": "C",
    "Ч": "Č",
    "Ш": "Š",
    "Щ": ["ŠČ", "Šč"],
    "Ъ": "\"\"",
    "Ы": "Y",
    "Ь": "'",
    "Э": "Ė",
    "Ю": ["JU", "Ju"],
    "Я": ["JA", "Ja"]
  },
  "lower": {
    "а": "a",
    "б": "b",
    "в": "v",
    "г": "g",
    "д": "d",
    "е": "e",
    "ж": "ž",
    "з": "z",
    "и": "i",
    "й": "j",
    "к": "k",
    "л": "l",
    "м": "m",
    "н": "n",
    "о": "o",
    "п": "p",
    "р": "r",
    "с": "s",
    "т": "t",
    "у": "u",
    "ф": "f",
    "х": "ch",
    "ц": "c",
    "ч": "č",
    "ш": "š",
    "щ": "šč",
    "ъ": "\"\"",
    "ы": "y",
    "ь": "'",
    "э": "ė",
    "ю": "ju",
    "я": "ja"
  }
}
and here was some test output:
Александр Сергеевич Пушкин => Aleksandr Sergeevič Puškin (expected: Aleksandr Sergeevič Puškin) - OK
Щ => ŠČ (expected: ŠČ) - OK
Щедрин => Ščedrin (expected: Ščedrin) - OK
ЩЕДРИН => ŠČEDRIN (expected: ŠČEDRIN) - OK
Щедрин ЩЕДРИН => Ščedrin ŠČEDRIN (expected: Ščedrin ŠČEDRIN) - OK
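Alongside output tests like these, a table in the shape shown above can also be sanity-checked mechanically before it is used. The following Python sketch is an assumption-laden illustration, not part of the extension: `check_table` is a made-up name, the JSON is a three-letter excerpt, and the list values are read as [all-caps form, capitalised form], which is how they appear to be used.

```python
import json

# Excerpt in the same shape as the table above.
TABLE_JSON = """
{
  "upper": {"Ш": "Š", "Щ": ["ŠČ", "Šč"], "Ю": ["JU", "Ju"]},
  "lower": {"ш": "š", "щ": "šč", "ю": "ju"}
}
"""


def check_table(table: dict) -> list:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for cyr, latin in table["upper"].items():
        low = table["lower"].get(cyr.lower())
        if low is None:
            problems.append(f"{cyr}: no lowercase entry")
            continue
        # Single-letter entries are plain strings; compound ones are lists.
        forms = latin if isinstance(latin, list) else [latin]
        for form in forms:
            if form.lower() != low:
                problems.append(f"{cyr}: {form!r} does not match lowercase {low!r}")
    return problems


table = json.loads(TABLE_JSON)
print(check_table(table) or "table is consistent")
```

A check like this flags any letter whose capital and lowercase transliterations disagree, which is easy to miss by eye in a table of this size.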
Dear Nick,
The extension works flawlessly! I’ve tested it on large fragments of text and the speed is amazing. You can’t imagine how much time this saves. I will gladly recommend PopClip (and this extension) to all fellow Slavists (who work with Mac OS). Is it necessary for the text to appear as a pop-up first? This seems redundant to me. The transliteration may simply replace the original text immediately. Really fantastic! When I think how 35 years ago I wrote complicated macros in WordPerfect to achieve the same result… I’m getting old… Thanks a lot!
Great, I’m glad to hear it will be useful!
I’ve updated it to paste directly, instead of showing the preview pop-up. You’ll just have to re-download it (from the same link above).
Also you can make it copy, instead of paste, by holding the shift key.
I am wondering, if I were to publish this to the official extensions list, how I should describe this particular transliteration scheme (since I understand there may be other possible schemes?). Does it have a particular name or description?
Slavists (and other linguists) usually refer to this as the scientific transliteration, the one that is common in scientific publications. Normative here is certainly the journal Russian Linguistics, a peer-reviewed journal published by Springer and devoted to the empirical and theoretical study of Russian and other Slavic languages in all their diversity; it uses this transliteration. It might also be an idea to make some other (much less used) transliteration schemes available as extensions. Now that this scheme is here, this seems like a piece of cake to me. Maybe as an option in one and the same extension?
If you are able to provide the other schemes formatted as text, the way I posted the table above (this is called JSON format), then it would be relatively trivial for me to add them to the same extension. However, if there were any other special rules that needed to be incorporated (other than the capitalization rule for two-character pairs already established), then it would be more difficult.
I’ll look into this over the weekend, is that okay for you? I have to teach Thursday and Friday and tomorrow I will be cycling all day. I think that the basic principle is the same for the different standards and that only the table itself will change per standard. Thank you for following this up so well!
Yes of course, enjoy your time — I’ve plenty else to do
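The idea discussed above, one extension in which only the table changes per scheme, might look roughly like this in Python. Everything here is a hedged sketch: the `SCHEMES` registry and the scheme names are invented for illustration, only a handful of lowercase letters are included, and ISO 9 appears simply because it was mentioned earlier in the thread.

```python
# Excerpts only: a few lowercase letters per scheme, for illustration.
SCHEMES = {
    "scientific": {"и": "i", "ж": "ž", "ч": "č", "ш": "š",
                   "щ": "šč", "ю": "ju", "я": "ja"},
    "iso9": {"и": "i", "ж": "ž", "ч": "č", "ш": "š",
             "щ": "ŝ", "ю": "û", "я": "â"},
}


def transliterate(text: str, scheme: str = "scientific") -> str:
    """Lowercase-only sketch: the algorithm stays fixed, the table is swapped."""
    table = str.maketrans(SCHEMES[scheme])  # single-character keys, string values
    return text.translate(table)


print(transliterate("щи", "scientific"))
print(transliterate("щи", "iso9"))
```

Capital letters and the two-character casing rule are deliberately left out; a scheme with additional special rules would need more than a table swap, as noted above.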