Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
This module deals with Chinese characters and Sino-Korean words written in hanja.
Synopsis
- data HanjaPhoneticization = HanjaPhoneticization {}
- def :: Default a => a
- phoneticizeHanja :: HanjaPhoneticization -> [HtmlEntity] -> [HtmlEntity]
- phoneticizeHanjaChar :: Char -> Char
- type HanjaDictionary = Trie Text
- type HanjaWordPhoneticizer = Text -> Text
- phoneticizeHanjaWord :: HanjaWordPhoneticizer
- phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer
- withDictionary :: HanjaDictionary -> HanjaWordPhoneticizer -> HanjaWordPhoneticizer
- type HanjaWordRenderer = HtmlTagStack -> Text -> Text -> [HtmlEntity]
- hangulOnly :: HanjaWordRenderer
- hanjaInParentheses :: HanjaWordRenderer
- hanjaInRuby :: HanjaWordRenderer
- convertInitialSoundLaw :: Char -> Char
- initialSoundLawTable :: Map Char Char
- initialSoundLawTable' :: Map Char (Set Char)
- revertInitialSoundLaw :: Char -> Set Char
Korean mixed-script (國漢文混用) transformation
data HanjaPhoneticization
Settings to transform Sino-Korean words written in hanja into hangul letters.
Constructors
HanjaPhoneticization
Instances
Default HanjaPhoneticization (defined in Text.Seonbi.Hanja)
phoneticizeHanja
:: HanjaPhoneticization | Configures the phoneticization details. |
-> [HtmlEntity] | HTML entities, possibly containing hanja words, whose hanja words are all to be phoneticized into hangul-only words. |
-> [HtmlEntity] | HTML entities with hangul-only words in place of the hanja words. |
Transforms hanja words in the given HTML entities into corresponding hangul words.
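To make the data flow concrete, here is a minimal sketch of the idea behind phoneticizeHanja under simplifying assumptions: Entity is a hypothetical stand-in for HtmlEntity (tags are opaque strings and there is no tag stack), a hanja character is assumed to be anything in the CJK Unified Ideographs or CJK Compatibility Ideographs blocks, and the phoneticizer is a plain String -> String function rather than HanjaPhoneticization settings. It is not the library's implementation.

```haskell
-- Hypothetical stand-in for HtmlEntity: tags are opaque, text is plain.
data Entity
  = TagNode String   -- an opening or closing tag, passed through untouched
  | TextNode String  -- character data that may contain hanja
  deriving (Eq, Show)

-- Assumption: hanja = CJK Unified Ideographs or CJK Compatibility Ideographs.
isHanja :: Char -> Bool
isHanja c = (c >= '\x4E00' && c <= '\x9FFF') || (c >= '\xF900' && c <= '\xFAFF')

-- Replace every maximal run of hanja inside text nodes with its reading.
phoneticizeEntities :: (String -> String) -> [Entity] -> [Entity]
phoneticizeEntities phoneticize = map go
  where
    go (TextNode s) = TextNode (rewrite s)
    go tag = tag
    rewrite [] = []
    rewrite s@(c : cs)
      | isHanja c = let (word, rest) = span isHanja s
                    in phoneticize word ++ rewrite rest
      | otherwise = c : rewrite cs
```

Because each text node is rewritten on its own, this sketch would not join a hanja word that is split across two text nodes.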
Single character phoneticization
phoneticizeHanjaChar :: Char -> Char
Reads a hanja character as a hangul character.
>>> phoneticizeHanjaChar '漢'
'한'
Note that it does not follow Initial Sound Law (頭音法則):
>>> phoneticizeHanjaChar '六'
'륙'
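A character-level reading can be pictured as a table lookup. The following sketch assumes a hypothetical five-entry sampleReadings table (the library's own table is far larger) and, as an assumption not stated in this documentation, returns characters it does not know unchanged. Readings are stored in their original sound, which is why 六 reads 륙 here, matching the note above.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sample of hanja readings, in their original sound
-- (i.e., before Initial Sound Law).
sampleReadings :: Map.Map Char Char
sampleReadings = Map.fromList
  [('漢', '한'), ('字', '자'), ('六', '륙'), ('來', '래'), ('日', '일')]

-- Look up a character's reading; unknown characters are returned unchanged
-- (an assumption of this sketch, not documented library behavior).
phoneticizeCharSketch :: Char -> Char
phoneticizeCharSketch c = Map.findWithDefault c c sampleReadings
```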
Word phoneticization
type HanjaDictionary = Trie Text
A dictionary whose keys are hanja words and whose values are their corresponding hangul readings, e.g., [("敗北", "패배")].
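This page does not describe the interface of the Trie type, so the following is a hypothetical, minimal character trie on top of Data.Map. It only illustrates why a trie suits a hanja dictionary: a single walk from the root finds the longest registered word at the start of the input. The names empty, insert, fromList, and longestPrefix are illustrative, not the library's.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (fromMaybe)

-- A node holds an optional value (a complete key ends here) and its children.
data Trie v = Trie (Maybe v) (Map.Map Char (Trie v))

empty :: Trie v
empty = Trie Nothing Map.empty

insert :: String -> v -> Trie v -> Trie v
insert [] v (Trie _ kids) = Trie (Just v) kids
insert (c : cs) v (Trie mv kids) =
  let child = fromMaybe empty (Map.lookup c kids)
  in Trie mv (Map.insert c (insert cs v child) kids)

fromList :: [(String, v)] -> Trie v
fromList = foldr (uncurry insert) empty

-- Longest key that is a prefix of the input, with the unconsumed suffix.
longestPrefix :: Trie v -> String -> Maybe (v, String)
longestPrefix = go Nothing
  where
    go best (Trie mv kids) s =
      let best' = maybe best (\v -> Just (v, s)) mv
      in case s of
           c : cs | Just t <- Map.lookup c kids -> go best' t cs
           _ -> best'
```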
type HanjaWordPhoneticizer
= Text | A Sino-Korean (i.e., hanja) word (漢字語) to phoneticize. |
-> Text | The hangul reading of the given Sino-Korean word. |
A function to phoneticize a Sino-Korean (i.e., hanja) word (漢字語) into hangul letters.
See also phoneticizeHanjaWord, phoneticizeHanjaWordWithInitialSoundLaw, and withDictionary.
phoneticizeHanjaWord :: HanjaWordPhoneticizer
Reads a hanja word and returns a corresponding hangul word.
>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWord "漢字"
"한자"
Note that it does not apply Initial Sound Law (頭音法則):
>>> phoneticizeHanjaWord "來日"
"래일"
phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer
It is like phoneticizeHanjaWord, but it also applies Initial Sound Law (頭音法則).
>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWordWithInitialSoundLaw "來日"
"내일"
>>> phoneticizeHanjaWordWithInitialSoundLaw "未來"
"미래"
withDictionary
:: HanjaDictionary | Hangul readings of Sino-Korean words. |
-> HanjaWordPhoneticizer | A fallback phoneticizer for words not registered in the dictionary (the examples below use phoneticizeHanjaWord). |
-> HanjaWordPhoneticizer | A combined phoneticizer. |
Reads a hanja word according to the given dictionary, or falls back to the other phoneticizer if there is no such word in the dictionary.
In the simplest case, it replaces a registered word with its reading:
>>> :set -XOverloadedLists -XOverloadedStrings
>>> let phone = withDictionary [("自轉車", "자전거")] phoneticizeHanjaWord
>>> phone "自轉車"
"자전거"
However, if a word contains characters that are not registered in the dictionary, it does its best to fill in the unregistered prefixes, infixes, and suffixes using the fallback phoneticizer:
>>> phone "自轉車道路"
"자전거도로"
>>> phone "二輪自轉車"
"이륜자전거"
Word rendering
type HanjaWordRenderer
= HtmlTagStack | The HTML tag stack within which the rendered HTML entities are placed. |
-> Text | A Sino-Korean (i.e., hanja) word (漢字語) to render. |
-> Text | Hangul letters that phoneticize the Sino-Korean word. |
-> [HtmlEntity] | Rendered HTML entities. |
A function to render a Sino-Korean (i.e., hanja) word (漢字語).
Choose one of hangulOnly, hanjaInParentheses, and hanjaInRuby.
hangulOnly :: HanjaWordRenderer
Renders a word in hangul only, with no hanja at all (e.g., 안녕히).
hanjaInParentheses :: HanjaWordRenderer
Renders a word in hangul followed by hanja in parentheses (e.g., 안녕(安寧)히).
hanjaInRuby :: HanjaWordRenderer
Renders a word in a ruby tag (e.g., <ruby>安寧<rp>(</rp><rt>안녕</rt><rp>)</rp></ruby>히).
See Use Cases & Exploratory Approaches for Ruby Markup for more information.
Initial sound law (頭音法則)
convertInitialSoundLaw :: Char -> Char
Converts a hangul character according to Initial Sound Law (頭音法則).
>>> convertInitialSoundLaw '념'
'염'
If the input is not a hangul syllable, or the syllable is not subject to the law, it returns the input unchanged:
>>> convertInitialSoundLaw 'A'
'A'
>>> convertInitialSoundLaw '가'
'가'
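convertInitialSoundLaw is described here in terms of a table (initialSoundLawTable below). To show what such a table encodes, here is an alternative sketch that computes the conversion with Unicode Hangul syllable arithmetic, where a syllable is 0xAC00 + (initial × 21 + medial) × 28 + final. The vowel conditions are an assumption summarizing the commonly cited rules: ㄴ before ㅕ, ㅛ, ㅠ, ㅣ becomes ㅇ; ㄹ before ㅑ, ㅕ, ㅖ, ㅛ, ㅠ, ㅣ becomes ㅇ; ㄹ before ㅏ, ㅐ, ㅗ, ㅚ, ㅜ, ㅡ becomes ㄴ. The library's table may cover a different set of syllables, and convertSketch is a hypothetical name.

```haskell
import Data.Char (chr, ord)

-- Split a precomposed Hangul syllable (U+AC00..U+D7A3) into
-- (initial, medial, final) jamo indices.
decompose :: Char -> Maybe (Int, Int, Int)
decompose c
  | n >= 0 && n < 11172 = Just (n `div` 588, (n `mod` 588) `div` 28, n `mod` 28)
  | otherwise = Nothing
  where n = ord c - 0xAC00

compose :: Int -> Int -> Int -> Char
compose i m f = chr (0xAC00 + (i * 21 + m) * 28 + f)

-- Initial indices: ㄴ = 2, ㄹ = 5, ㅇ = 11.
-- Medial indices: ㅏ0 ㅐ1 ㅑ2 ㅕ6 ㅖ7 ㅗ8 ㅚ11 ㅛ12 ㅜ13 ㅠ17 ㅡ18 ㅣ20.
-- The final consonant is kept as is (e.g., 념 -> 염).
convertSketch :: Char -> Char
convertSketch c = case decompose c of
  Just (2, m, f) | m `elem` [6, 12, 17, 20] -> compose 11 m f  -- 녀뇨뉴니 -> 여요유이
  Just (5, m, f)
    | m `elem` [2, 6, 7, 12, 17, 20] -> compose 11 m f         -- 랴려례료류리 -> 야여예요유이
    | m `elem` [0, 1, 8, 11, 13, 18] -> compose 2 m f          -- 라래로뢰루르 -> 나내노뇌누느
  _ -> c  -- not a syllable, or not subject to the law
```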
initialSoundLawTable :: Map Char Char
The Initial Sound Law (頭音法則) table according to South Korean Hangul Orthography (한글 맞춤법) Clause 5, Section 52, Chapter 6 (第6章52項5節). Keys are original Sino-Korean sounds and values are the sounds converted according to the law.
initialSoundLawTable' :: Map Char (Set Char)
Contains the same contents as initialSoundLawTable, except that keys and values are swapped: keys are converted sounds and values are sets of possible original sounds.
revertInitialSoundLaw :: Char -> Set Char
It is a kind of inverse of convertInitialSoundLaw, except that it returns a set of candidates instead of a single canonical answer, because Initial Sound Law (頭音法則) is not one-to-one: several original sounds can be converted into the same sound (e.g., both 념 and 렴 become 염).
>>> revertInitialSoundLaw '예'
fromList "례"
>>> revertInitialSoundLaw '염'
fromList "념렴"
It returns an empty set if the input is not subject to the law:
>>> revertInitialSoundLaw '가'
fromList ""
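A reversed table like initialSoundLawTable' can be derived mechanically from a forward table by swapping keys and values and merging collisions into sets; reverting is then a lookup with an empty-set default. The sketch below assumes a hypothetical four-entry forward table; the names forward, invert, backward, and revertSketch are illustrative, not the library's.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical excerpt of a forward table (original sound -> converted sound).
forward :: Map.Map Char Char
forward = Map.fromList [('념', '염'), ('렴', '염'), ('례', '예'), ('래', '내')]

-- Swap keys and values; original sounds sharing a converted sound are unioned.
invert :: Map.Map Char Char -> Map.Map Char (Set.Set Char)
invert table = Map.fromListWith Set.union
  [ (converted, Set.singleton original) | (original, converted) <- Map.toList table ]

-- Computed once as a top-level value.
backward :: Map.Map Char (Set.Set Char)
backward = invert forward

-- Candidate original sounds; empty when the input is not a converted sound.
revertSketch :: Char -> Set.Set Char
revertSketch c = Map.findWithDefault Set.empty c backward
```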