Safe Haskell	Safe-Inferred
Language	Haskell2010

Text.Seonbi.Hanja

Contents

Korean mixed-script (國漢文混用) transformation
Single character phoneticization
Word phoneticization
Word rendering
Initial sound law (頭音法則)

Description

This module deals with Chinese characters and Sino-Korean words written in hanja.

Synopsis

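data HanjaPhoneticization = HanjaPhoneticization {
    phoneticizer :: HanjaWordPhoneticizer,
    wordRenderer :: HanjaWordRenderer,
    homophoneRenderer :: HanjaWordRenderer,
    debugComment :: Bool
}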
def :: Default a => a
phoneticizeHanja :: HanjaPhoneticization -> [HtmlEntity] -> [HtmlEntity]
phoneticizeHanjaChar :: Char -> Char
type HanjaDictionary = Trie Text
type HanjaWordPhoneticizer = Text -> Text
phoneticizeHanjaWord :: HanjaWordPhoneticizer
phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer
withDictionary :: HanjaDictionary -> HanjaWordPhoneticizer -> HanjaWordPhoneticizer
type HanjaWordRenderer = HtmlTagStack -> Text -> Text -> [HtmlEntity]
hangulOnly :: HanjaWordRenderer
hanjaInParentheses :: HanjaWordRenderer
hanjaInRuby :: HanjaWordRenderer
convertInitialSoundLaw :: Char -> Char
initialSoundLawTable :: Map Char Char
initialSoundLawTable' :: Map Char (Set Char)
revertInitialSoundLaw :: Char -> Set Char

Korean mixed-script (國漢文混用) transformation

data HanjaPhoneticization

Settings to transform Sino-Korean words written in hanja into hangul letters.

Constructors

HanjaPhoneticization

Fields

phoneticizer :: HanjaWordPhoneticizer
A function to phoneticize a hanja word. Use phoneticizeHanjaWordWithInitialSoundLaw for South Korean orthography, or phoneticizeHanjaWord for North Korean orthography.
wordRenderer :: HanjaWordRenderer
A function to render a hanja word. See also HanjaWordRenderer.
homophoneRenderer :: HanjaWordRenderer
A function to render a hanja word that needs disambiguation. It is used instead of wordRenderer when two or more words in a text share the same hangul reading but are written with distinct hanja characters, e.g., 小數/素數 (소수).
debugComment :: Bool
Whether to insert HTML comments containing debugging information into the result. This changes only the HTML source code, not how the resulting HTML is rendered.
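
To picture when homophoneRenderer applies, the following is a minimal sketch, not the library's implementation, of how homophones could be detected: group the hanja spellings found in a text by their hangul reading and keep the readings that have more than one distinct spelling. The name homophoneReadings and the use of plain String pairs are assumptions made for this example only.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical helper: given (hanja, hangul reading) pairs collected from
-- a text, return the readings shared by two or more distinct hanja words.
homophoneReadings :: [(String, String)] -> Set.Set String
homophoneReadings pairs =
    Map.keysSet (Map.filter ((> 1) . Set.size) spellingsByReading)
  where
    -- Map each reading to the set of distinct hanja spellings seen for it.
    spellingsByReading = Map.fromListWith Set.union
        [ (reading, Set.singleton hanja) | (hanja, reading) <- pairs ]

main :: IO ()
main = do
    let wordsInText =
            [("小數", "소수"), ("素數", "소수"), ("漢字", "한자"), ("小數", "소수")]
    -- Only 소수 is shared by two distinct hanja spellings.
    print (homophoneReadings wordsInText == Set.singleton "소수")  -- prints True
```

In this sketch a repeated occurrence of the same hanja word does not count as a homophone, because spellings are collected into a set.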

Instances

Default HanjaPhoneticization

Defined in Text.Seonbi.Hanja

Methods

def :: HanjaPhoneticization

def :: Default a => a

phoneticizeHanja

Arguments

:: HanjaPhoneticization	Configures the phoneticization details.
-> [HtmlEntity]	HTML entities, possibly containing hanja words, in which every hanja word is to be phoneticized into the corresponding hangul-only word.
-> [HtmlEntity]	HTML entities that have no hanja words but hangul-only words instead.

Transforms hanja words in the given HTML entities into corresponding hangul words.
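
As a hedged illustration of how the settings might be put together, the sketch below overrides the fields of def with functions documented on this page. The binding name southKoreanSettings is hypothetical, the dictionary entry is borrowed from the withDictionary example, and the snippet assumes the seonbi and text packages are available.

```haskell
{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.IO as TIO
import Text.Seonbi.Hanja

-- | Hypothetical settings: South Korean orthography with a one-entry
-- dictionary, hangul-only output, and hanja in parentheses for homophones.
southKoreanSettings :: HanjaPhoneticization
southKoreanSettings = def
    { phoneticizer =
        withDictionary
            [("自轉車", "자전거")]
            phoneticizeHanjaWordWithInitialSoundLaw
    , wordRenderer = hangulOnly
    , homophoneRenderer = hanjaInParentheses
    , debugComment = False
    }

main :: IO ()
main =
    -- The configured phoneticizer is an ordinary function on Text, so it
    -- can be tried on its own before being passed to phoneticizeHanja.
    TIO.putStrLn (phoneticizer southKoreanSettings "自轉車")
```

Passing southKoreanSettings as the first argument of phoneticizeHanja would then apply these choices to a list of HTML entities.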

Single character phoneticization

phoneticizeHanjaChar :: Char -> Char

Reads a hanja character as a hangul character.

>>> phoneticizeHanjaChar '漢'
'한'

Note that it does not follow Initial Sound Law (頭音法則):

>>> phoneticizeHanjaChar '六'
'륙'

Word phoneticization

type HanjaDictionary = Trie Text

Represents a dictionary that has hanja keys and values of their corresponding hangul readings, e.g., [("敗北", "패배")].

type HanjaWordPhoneticizer

Arguments

= Text	A Sino-Korean (i.e., hanja) word (漢字語) to phoneticize.
-> Text	Hangul letters that phoneticize the given Sino-Korean word.

A function to phoneticize a Sino-Korean (i.e., hanja) word (漢字語) into hangul letters. See also phoneticizeHanjaWord, phoneticizeHanjaWordWithInitialSoundLaw, and withDictionary.

phoneticizeHanjaWord :: HanjaWordPhoneticizer

Reads a hanja word and returns a corresponding hangul word.

>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWord "漢字"
"한자"

Note that it does not apply Initial Sound Law (頭音法則):

>>> phoneticizeHanjaWord  "來日"
"래일"

phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer

It is like phoneticizeHanjaWord, but it also applies Initial Sound Law (頭音法則).
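
One way to picture the difference between the two phoneticizers is as a single extra step applied to the first syllable of the word. The sketch below works under that simplifying assumption; the three-entry toyTable and the name applyAtWordStart are hypothetical, and the real function may handle further cases that this sketch ignores.

```haskell
import qualified Data.Map.Strict as Map

-- | A tiny, hypothetical excerpt of an initial-sound-law table:
-- original sound -> converted sound.
toyTable :: Map.Map Char Char
toyTable = Map.fromList [('래', '내'), ('륙', '육'), ('념', '염')]

-- | Convert only the first syllable of a hangul reading; every later
-- syllable is kept as it is.
applyAtWordStart :: String -> String
applyAtWordStart [] = []
applyAtWordStart (c : cs) = Map.findWithDefault c c toyTable : cs

main :: IO ()
main = do
    -- 래 at the start of the word is converted.
    print (applyAtWordStart "래일" == "내일")  -- prints True
    -- 래 in the middle of the word is left alone.
    print (applyAtWordStart "미래" == "미래")  -- prints True
```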

>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWordWithInitialSoundLaw  "來日"
"내일"
>>> phoneticizeHanjaWordWithInitialSoundLaw  "未來"
"미래"

withDictionary

Arguments

:: HanjaDictionary	Hangul readings of Sino-Korean words.
-> HanjaWordPhoneticizer	A fallback phoneticizer for unregistered words, e.g., phoneticizeHanjaWordWithInitialSoundLaw.
-> HanjaWordPhoneticizer	A combined phoneticizer.

Reads a hanja word according to the given dictionary, or falls back to the other phoneticizer if there is no such word in the dictionary.
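
The general idea can be sketched as a left-to-right scan that prefers the longest registered word at each position and hands any unregistered stretch to the fallback. This is only an illustrative approximation: it uses a Map over String instead of a Trie over Text, and the names sketchWithDictionary and toyFallback, as well as the four toy readings, are assumptions of this example rather than part of the library.

```haskell
import Data.List (isPrefixOf, sortOn)
import qualified Data.Map.Strict as Map

type Dictionary = Map.Map String String
type Phoneticizer = String -> String

-- | Hypothetical stand-in for withDictionary.
sketchWithDictionary :: Dictionary -> Phoneticizer -> Phoneticizer
sketchWithDictionary dict fallback = go []
  where
    -- Non-empty dictionary entries, longest keys first.
    entries =
        sortOn (negate . length . fst)
            (filter (not . null . fst) (Map.toList dict))
    -- 'pending' holds unregistered characters seen so far, in reverse order.
    go pending [] = flush pending
    go pending word@(c : rest) =
        case filter ((`isPrefixOf` word) . fst) entries of
            (key, reading) : _ ->
                flush pending ++ reading ++ go [] (drop (length key) word)
            [] -> go (c : pending) rest
    -- Hand the collected unregistered stretch to the fallback.
    flush [] = []
    flush pending = fallback (reverse pending)

-- | Toy per-character fallback with a handful of readings.
toyFallback :: Phoneticizer
toyFallback = map (\c -> Map.findWithDefault c c toyReadings)
  where
    toyReadings = Map.fromList
        [('道', '도'), ('路', '로'), ('二', '이'), ('輪', '륜')]

main :: IO ()
main = do
    let phone = sketchWithDictionary (Map.fromList [("自轉車", "자전거")]) toyFallback
    print (phone "自轉車" == "자전거")          -- prints True
    print (phone "自轉車道路" == "자전거도로")  -- prints True
    print (phone "二輪自轉車" == "이륜자전거")  -- prints True
```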

In the simplest case, it replaces a registered word with its reading one-to-one:

>>> :set -XOverloadedLists -XOverloadedStrings
>>> let phone = withDictionary [("自轉車", "자전거")] phoneticizeHanjaWord
>>> phone "自轉車"
"자전거"

However, if it encounters words or characters that are not registered in the dictionary, it does its best to fill in the prefixes, infixes, and suffixes using the fallback phoneticizer:

>>> phone "自轉車道路"
"자전거도로"
>>> phone "二輪自轉車"
"이륜자전거"

Word rendering

type HanjaWordRenderer

Arguments

= HtmlTagStack	The tag stack into which the rendered HTML entities are interleaved.
-> Text	A Sino-Korean (i.e., hanja) word (漢字語) to render.
-> Text	Hangul letters that phoneticize the Sino-Korean word.
-> [HtmlEntity]	Rendered HTML entities.

A function to render a Sino-Korean (i.e., hanja) word (漢字語). Choose one of hangulOnly, hanjaInParentheses, or hanjaInRuby.

hangulOnly :: HanjaWordRenderer

Renders a word in hangul only, with no hanja at all (e.g., 안녕히).

hanjaInParentheses :: HanjaWordRenderer

Renders a word in hangul followed by hanja in parentheses (e.g., 안녕(安寧)히).

hanjaInRuby :: HanjaWordRenderer

Renders a word in a ruby tag (e.g., <ruby>安寧<rp>(</rp><rt>안녕</rt><rp>)</rp></ruby>히).

For more information, see also Use Cases & Exploratory Approaches for Ruby Markup.
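
To compare the three output shapes side by side, here is a simplified sketch that builds the markup as plain strings. It leaves out the HtmlTagStack argument and the [HtmlEntity] result of the real renderers, and the sketch* names are hypothetical; the expected strings are taken from the examples given for each renderer.

```haskell
-- | Each sketch takes the hanja word and its hangul reading, in the same
-- order as the two Text arguments of HanjaWordRenderer.
sketchHangulOnly :: String -> String -> String
sketchHangulOnly _hanja hangul = hangul

sketchParentheses :: String -> String -> String
sketchParentheses hanja hangul = hangul ++ "(" ++ hanja ++ ")"

sketchRuby :: String -> String -> String
sketchRuby hanja hangul =
    "<ruby>" ++ hanja ++ "<rp>(</rp><rt>" ++ hangul ++ "</rt><rp>)</rp></ruby>"

main :: IO ()
main = do
    print (sketchHangulOnly "安寧" "안녕" == "안녕")         -- prints True
    print (sketchParentheses "安寧" "안녕" == "안녕(安寧)")  -- prints True
    print (sketchRuby "安寧" "안녕"
        == "<ruby>安寧<rp>(</rp><rt>안녕</rt><rp>)</rp></ruby>")  -- prints True
```

The rp elements carry the parentheses that are shown only by user agents without ruby support, which is why the ruby form degrades to something close to the parenthesized form.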

Initial sound law (頭音法則)

convertInitialSoundLaw :: Char -> Char

Converts a hangul character according to Initial Sound Law (頭音法則).

>>> convertInitialSoundLaw '념'
'염'

If the input is not a hangul syllable, or the syllable is not subject to the law, it returns the input unchanged:

>>> convertInitialSoundLaw 'A'
'A'
>>> convertInitialSoundLaw '가'
'가'
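
For intuition, the law can also be approximated arithmetically by decomposing a precomposed hangul syllable into its initial consonant, vowel, and final consonant. The sketch below encodes the commonly stated form of the rule (ㄴ before ㅕ, ㅛ, ㅠ, ㅣ becomes ㅇ; ㄹ before ㅑ, ㅕ, ㅖ, ㅛ, ㅠ, ㅣ becomes ㅇ; ㄹ before ㅏ, ㅐ, ㅗ, ㅚ, ㅜ, ㅡ becomes ㄴ). It is an assumption-laden illustration rather than the library's method, which is based on a lookup table and may cover a different set of syllables; the name sketchInitialSoundLaw is hypothetical.

```haskell
import Data.Char (chr, ord)

-- | Hypothetical rule-based approximation of the Initial Sound Law.
sketchInitialSoundLaw :: Char -> Char
sketchInitialSoundLaw c
    | not isSyllable = c
    | initial == nieun && medial `elem` [6, 12, 17, 20] = withInitial ieung
    | initial == rieul && medial `elem` [2, 6, 7, 12, 17, 20] = withInitial ieung
    | initial == rieul && medial `elem` [0, 1, 8, 11, 13, 18] = withInitial nieun
    | otherwise = c
  where
    -- Precomposed syllables occupy U+AC00..U+D7A3, laid out as
    -- (initial * 21 + medial) * 28 + final.
    offset = ord c - 0xAC00
    isSyllable = offset >= 0 && offset <= 11171
    (initial, rest) = offset `divMod` (21 * 28)
    medial = rest `div` 28
    -- Swap the initial consonant, keeping the vowel and final consonant.
    withInitial i = chr (0xAC00 + i * 21 * 28 + rest)
    -- Indices of the initial consonants ㄴ, ㄹ, and ㅇ.
    nieun, rieul, ieung :: Int
    nieun = 2
    rieul = 5
    ieung = 11

main :: IO ()
main =
    -- Non-syllables and unaffected syllables pass through unchanged.
    print (map sketchInitialSoundLaw "념래륙가A" == "염내육가A")  -- prints True
```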

initialSoundLawTable :: Map Char Char

The Initial Sound Law (頭音法則) table according to South Korean Hangul Orthography (한글 맞춤법) Clause 5, Section 52, Chapter 6 (第6章52項5節). Keys are original Sino-Korean sounds and values are the sounds converted according to the law.

initialSoundLawTable' :: Map Char (Set Char)

Contains the same contents as initialSoundLawTable, except that keys and values are swapped: keys are converted sounds and values are the sets of possible original sounds.
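
A reversed table of this shape can be derived mechanically from a forward table. The following sketch shows one way to do so with Data.Map and Data.Set from the containers package; the three-entry toyTable and the names invertTable and candidates are hypothetical and only mirror the syllables used in the examples on this page.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | A tiny, hypothetical forward table: original sound -> converted sound.
toyTable :: Map.Map Char Char
toyTable = Map.fromList [('념', '염'), ('렴', '염'), ('례', '예')]

-- | Swap keys and values, collecting every original sound that converts
-- to the same sound into one set.
invertTable :: Map.Map Char Char -> Map.Map Char (Set.Set Char)
invertTable table = Map.fromListWith Set.union
    [ (converted, Set.singleton original)
    | (original, converted) <- Map.toList table
    ]

-- | Look up the possible original sounds; empty if the law does not apply.
candidates :: Char -> Set.Set Char
candidates c = Map.findWithDefault Set.empty c (invertTable toyTable)

main :: IO ()
main = do
    print (candidates '염' == Set.fromList "념렴")  -- prints True
    print (candidates '예' == Set.fromList "례")    -- prints True
    print (Set.null (candidates '가'))              -- prints True
```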

revertInitialSoundLaw :: Char -> Set Char

A kind of inverse of convertInitialSoundLaw. It returns a set of candidates instead of a single canonical answer, because the Initial Sound Law (頭音法則) is not one-to-one: several original sounds can convert to the same sound.

>>> revertInitialSoundLaw '예'
fromList "례"
>>> revertInitialSoundLaw '염'
fromList "념렴"

It returns an empty set if the law does not apply to the input:

>>> revertInitialSoundLaw '가'
fromList ""