| Safe Haskell | Safe-Inferred |
|---|---|
| Language | Haskell2010 |
Text.Seonbi.Hanja
Description
This module deals with Chinese characters and Sino-Korean words written in hanja.
Synopsis
- data HanjaPhoneticization = HanjaPhoneticization {}
- def :: Default a => a
- phoneticizeHanja :: HanjaPhoneticization -> [HtmlEntity] -> [HtmlEntity]
- phoneticizeHanjaChar :: Char -> Char
- type HanjaDictionary = Trie Text
- type HanjaWordPhoneticizer = Text -> Text
- phoneticizeHanjaWord :: HanjaWordPhoneticizer
- phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer
- withDictionary :: HanjaDictionary -> HanjaWordPhoneticizer -> HanjaWordPhoneticizer
- type HanjaWordRenderer = HtmlTagStack -> Text -> Text -> [HtmlEntity]
- hangulOnly :: HanjaWordRenderer
- hanjaInParentheses :: HanjaWordRenderer
- hanjaInRuby :: HanjaWordRenderer
- convertInitialSoundLaw :: Char -> Char
- initialSoundLawTable :: Map Char Char
- initialSoundLawTable' :: Map Char (Set Char)
- revertInitialSoundLaw :: Char -> Set Char
Korean mixed-script (國漢文混用) transformation
data HanjaPhoneticization Source #
Settings to transform Sino-Korean words written in hanja into hangul letters.
Constructors
| HanjaPhoneticization | |
Fields
| |
Instances
| Default HanjaPhoneticization Source # | |
Defined in Text.Seonbi.Hanja Methods | |
phoneticizeHanja Source #
Arguments
| :: HanjaPhoneticization | Configures the phoneticization details. |
| -> [HtmlEntity] | HTML entities, possibly containing hanja words, whose hanja words are to be phoneticized into the corresponding hangul-only words. |
| -> [HtmlEntity] | HTML entities in which every hanja word has been replaced by a hangul-only word. |
Transforms hanja words in the given HTML entities into corresponding hangul words.
Single character phoneticization
phoneticizeHanjaChar :: Char -> Char Source #
Reads a hanja character as a hangul character.
>>> phoneticizeHanjaChar '漢'
'한'
Note that it does not follow Initial Sound Law (頭音法則):
>>> phoneticizeHanjaChar '六'
'륙'
Word phoneticization
type HanjaDictionary = Trie Text Source #
Represents a dictionary that has hanja keys and values of their
corresponding hangul readings, e.g., [("敗北", "패배")].
type HanjaWordPhoneticizer Source #
Arguments
| = Text | A Sino-Korean (i.e., hanja) word (漢字語) to phoneticize. |
| -> Text | Hangul letters that phoneticize the given Sino-Korean word. |
A function to phoneticize a Sino-Korean (i.e., hanja) word (漢字語)
into hangul letters.
See also phoneticizeHanjaWord, phoneticizeHanjaWordWithInitialSoundLaw,
and withDictionary.
phoneticizeHanjaWord :: HanjaWordPhoneticizer Source #
Reads a hanja word and returns a corresponding hangul word.
>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWord "漢字"
"한자"
Note that it does not apply Initial Sound Law (頭音法則):
>>> phoneticizeHanjaWord "來日"
"래일"
phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer Source #
It is like phoneticizeHanjaWord, but it also applies
Initial Sound Law (頭音法則).
>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWordWithInitialSoundLaw "來日"
"내일"
>>> phoneticizeHanjaWordWithInitialSoundLaw "未來"
"미래"
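The difference between the two phoneticizers can be pictured as converting only the word-initial syllable. The following is a minimal sketch under that assumption, not the library's implementation: `toyLawTable` and `applyToInitial` are hypothetical names, the table holds just two entries taken from examples on this page, and the real function may handle cases this sketch ignores.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical toy table: only two entries, both taken from examples
-- on this page.  The real table is far larger.
toyLawTable :: Map.Map Char Char
toyLawTable = Map.fromList [('래', '내'), ('념', '염')]

-- Convert the word-initial syllable if the table has an entry for it;
-- every later syllable is left unchanged.
applyToInitial :: String -> String
applyToInitial [] = []
applyToInitial (c : cs) = Map.findWithDefault c c toyLawTable : cs
```

With these definitions, `applyToInitial "래일"` gives `"내일"`, while `applyToInitial "미래"` stays `"미래"` because 래 is not word-initial there.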
withDictionary Source #
Arguments
| :: HanjaDictionary | Hangul readings of Sino-Korean words. |
| -> HanjaWordPhoneticizer | A fallback phoneticizer for unregistered words.
E.g., |
| -> HanjaWordPhoneticizer | A combined phoneticizer. |
Reads a hanja word according to the given dictionary, or falls back to the other phoneticizer if there is no such word in the dictionary.
In the simplest case, it replaces a registered word with its registered reading:
>>> :set -XOverloadedLists -XOverloadedStrings
>>> let phone = withDictionary [("自轉車", "자전거")] phoneticizeHanjaWord
>>> phone "自轉車"
"자전거"
However, if it encounters words or characters that are not registered in the dictionary, it does its best to fill in the unregistered prefixes, infixes, and suffixes using the fallback phoneticizer:
>>> phone "自轉車道路"
"자전거도로"
>>> phone "二輪自轉車"
"이륜자전거"
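One way to picture the dictionary-plus-fallback behaviour is a left-to-right scan that prefers the longest registered entry and hands each unmatched run to the fallback. This is only a sketch under stated assumptions, not the library's algorithm: it uses a plain `Map` and `String` where the module uses a trie and `Text`, and `ToyDictionary`, `toyFallback`, `longestMatch`, and `toyWithDictionary` are all hypothetical names.

```haskell
import Data.List (inits, stripPrefix)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins: a Map instead of a trie, String instead of Text.
type ToyDictionary = Map.Map String String
type ToyPhoneticizer = String -> String

-- Hypothetical per-character fallback with a handful of readings;
-- characters without an entry pass through unchanged.
toyFallback :: ToyPhoneticizer
toyFallback = map (\c -> Map.findWithDefault c c readings)
  where
    readings = Map.fromList
      [ ('自', '자'), ('轉', '전'), ('車', '차'), ('道', '도')
      , ('路', '로'), ('二', '이'), ('輪', '륜') ]

-- The reading of the longest dictionary key that is a prefix of the
-- input, paired with the unconsumed rest of the input.
longestMatch :: ToyDictionary -> String -> Maybe (String, String)
longestMatch dict s =
  case [ (reading, rest)
       | prefix <- reverse (drop 1 (inits s))  -- non-empty, longest first
       , Just reading <- [Map.lookup prefix dict]
       , Just rest <- [stripPrefix prefix s]
       ] of
    (m : _) -> Just m
    [] -> Nothing

-- Scan left to right.  Unmatched characters accumulate (reversed) in
-- `pending` and are handed to the fallback as one run.
toyWithDictionary :: ToyDictionary -> ToyPhoneticizer -> ToyPhoneticizer
toyWithDictionary dict fallback = go []
  where
    go pending [] = flush pending
    go pending s@(c : cs) =
      case longestMatch dict s of
        Just (reading, rest) -> flush pending ++ reading ++ go [] rest
        Nothing -> go (c : pending) cs
    flush [] = []
    flush pending = fallback (reverse pending)
```

Passing each unmatched run to the fallback as a whole, rather than one character at a time, lets a word-aware fallback see the full run. In this sketch, `toyFallback "自轉車"` alone yields `"자전차"`, whereas `toyWithDictionary (Map.fromList [("自轉車", "자전거")]) toyFallback` reads `"自轉車道路"` as `"자전거도로"`.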
Word rendering
type HanjaWordRenderer Source #
Arguments
| = HtmlTagStack | The tag stack into which the rendered HTML entities are interleaved. |
| -> Text | A Sino-Korean (i.e., hanja) word (漢字語) to render. |
| -> Text | Hangul letters that phoneticize the Sino-Korean word. |
| -> [HtmlEntity] | Rendered HTML entities. |
A function to render a Sino-Korean (i.e., hanja) word (漢字語).
Choose one of hangulOnly, hanjaInParentheses, or hanjaInRuby.
hangulOnly :: HanjaWordRenderer Source #
Renders a word in hangul only, with no hanja at all (e.g., 안녕히).
hanjaInParentheses :: HanjaWordRenderer Source #
Renders a word in hangul followed by hanja in parentheses
(e.g., 안녕(安寧)히).
hanjaInRuby :: HanjaWordRenderer Source #
Renders a word in a ruby tag (e.g.,
<ruby>安寧<rp>(</rp><rt>안녕</rt><rp>)</rp></ruby>히).
For more information, see also Use Cases & Exploratory Approaches for Ruby Markup.
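The three rendering styles can be compared side by side with a simplified sketch. It is an illustration under loud assumptions, not the module's code: the hypothetical `ToyRenderer` returns a `String` of markup instead of `[HtmlEntity]` and takes no `HtmlTagStack`, and the `toy*` names are invented here.

```haskell
-- Hypothetical simplification of HanjaWordRenderer:
-- first argument is the hanja word, second is its hangul reading.
type ToyRenderer = String -> String -> String

-- Keep only the hangul reading.
toyHangulOnly :: ToyRenderer
toyHangulOnly _ hangul = hangul

-- Hangul reading followed by the hanja in parentheses.
toyHanjaInParentheses :: ToyRenderer
toyHanjaInParentheses hanja hangul = hangul ++ "(" ++ hanja ++ ")"

-- Hanja as the ruby base and the hangul reading as the ruby text,
-- with <rp> fallbacks for agents that do not support ruby.
toyHanjaInRuby :: ToyRenderer
toyHanjaInRuby hanja hangul =
  "<ruby>" ++ hanja ++ "<rp>(</rp><rt>" ++ hangul ++ "</rt><rp>)</rp></ruby>"
```

For the word 安寧 with the reading 안녕, these produce `안녕`, `안녕(安寧)`, and `<ruby>安寧<rp>(</rp><rt>안녕</rt><rp>)</rp></ruby>` respectively, matching the examples above once the trailing 히 is appended.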
Initial sound law (頭音法則)
convertInitialSoundLaw :: Char -> Char Source #
Converts a hangul character according to Initial Sound Law (頭音法則).
>>> convertInitialSoundLaw '념'
'염'
If the input is not a hangul syllable, or the law does not apply to the syllable, it returns the input unchanged:
>>> convertInitialSoundLaw 'A'
'A'
>>> convertInitialSoundLaw '가'
'가'
initialSoundLawTable :: Map Char Char Source #
The Initial Sound Law (頭音法則) table according to South Korean Hangul Orthography (한글 맞춤법) Clause 5, Section 52, Chapter 6 (第6章52項5節). Keys are original Sino-Korean sounds and values are the sounds converted according to the law.
initialSoundLawTable' :: Map Char (Set Char) Source #
Contains the same contents as initialSoundLawTable except that
keys and values are swapped: keys are converted sounds and values are
sets of possible original sounds.
revertInitialSoundLaw :: Char -> Set Char Source #
A kind of inverse of convertInitialSoundLaw,
except that it returns a set of candidates rather than a single canonical answer,
because the Initial Sound Law (頭音法則) is not a bijective function.
>>> revertInitialSoundLaw '예'
fromList "례"
>>> revertInitialSoundLaw '염'
fromList "념렴"
It returns an empty set if an input is not applicable to the law:
>>> revertInitialSoundLaw '가'
fromList ""
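The relationship between the forward table, the swapped table, and the reverting function can be sketched with `Data.Map` and `Data.Set` from the containers package. This is an illustration only, assuming a hypothetical three-entry excerpt (`toyTable`) built from the examples on this page; `toyTable'` and `toyRevert` are likewise hypothetical names, and the module's actual definitions may differ.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical three-entry excerpt: original sound -> converted sound.
toyTable :: Map.Map Char Char
toyTable = Map.fromList [('례', '예'), ('념', '염'), ('렴', '염')]

-- Swap keys and values.  Several originals can share one converted
-- sound, so each value is the set of all originals that map to it.
toyTable' :: Map.Map Char (Set.Set Char)
toyTable' =
  Map.fromListWith Set.union
    [ (converted, Set.singleton original)
    | (original, converted) <- Map.toList toyTable
    ]

-- Candidates for the original sound; empty when the law does not apply.
toyRevert :: Char -> Set.Set Char
toyRevert c = Map.findWithDefault Set.empty c toyTable'
```

Because 념 and 렴 both convert to 염, `toyRevert '염'` is the two-element set `fromList "념렴"`, which is why the inverse has to return a set rather than a single character.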