seonbi-0.5.0: SmartyPants for Korean language
Safe Haskell: Safe-Inferred
Language: Haskell2010

Text.Seonbi.Hanja

Description

This module deals with Chinese characters and Sino-Korean words written in hanja.

Korean mixed-script (國漢文混用) transformation

data HanjaPhoneticization

Settings to transform Sino-Korean words written in hanja into hangul letters.

Constructors

HanjaPhoneticization 

Fields

Instances

Default HanjaPhoneticization

Defined in Text.Seonbi.Hanja

def :: Default a => a

phoneticizeHanja

Arguments

:: HanjaPhoneticization

Configures the phoneticization details.

-> [HtmlEntity]

HTML entities, possibly containing hanja words, in which all hanja words are to be phoneticized into the corresponding hangul-only words.

-> [HtmlEntity]

HTML entities in which the hanja words have been replaced by hangul-only words.

Transforms hanja words in the given HTML entities into corresponding hangul words.

Single character phoneticization

phoneticizeHanjaChar :: Char -> Char

Reads a hanja character as a hangul character.

>>> phoneticizeHanjaChar '漢'
'한'

Note that it does not apply Initial Sound Law (頭音法則):

>>> phoneticizeHanjaChar '六'
'륙'

Word phoneticization

type HanjaDictionary = Trie Text

Represents a dictionary that has hanja keys and values of their corresponding hangul readings, e.g., [("敗北", "패배")].

type HanjaWordPhoneticizer

Arguments

 = Text

A Sino-Korean (i.e., hanja) word (漢字語) to phoneticize.

-> Text

Hangul letters that phoneticize the given Sino-Korean word.

A function to phoneticize a Sino-Korean (i.e., hanja) word (漢字語) into hangul letters. See also phoneticizeHanjaWord, phoneticizeHanjaWordWithInitialSoundLaw, and withDictionary.

phoneticizeHanjaWord :: HanjaWordPhoneticizer

Reads a hanja word and returns a corresponding hangul word.

>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWord "漢字"
"한자"

Note that it does not apply Initial Sound Law (頭音法則):

>>> phoneticizeHanjaWord "來日"
"래일"

phoneticizeHanjaWordWithInitialSoundLaw :: HanjaWordPhoneticizer

It is like phoneticizeHanjaWord, but it also applies Initial Sound Law (頭音法則).

>>> :set -XOverloadedStrings
>>> phoneticizeHanjaWordWithInitialSoundLaw "來日"
"내일"
>>> phoneticizeHanjaWordWithInitialSoundLaw "未來"
"미래"
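
The two results above differ because the law only affects a syllable at the start of a word: 來 is read 내 in 來日 but stays 래 in 未來. A minimal sketch of that idea follows, using plain String rather than Text. The names toyConvert and applyToFirstSyllable are hypothetical and not part of this module, and the real function may handle more cases than this.

```haskell
-- Toy stand-in for a per-character Initial Sound Law converter.
-- It covers only two syllables, purely for illustration.
toyConvert :: Char -> Char
toyConvert '래' = '내'
toyConvert '륙' = '육'
toyConvert c    = c

-- Apply the given converter to the first syllable only and
-- leave the remaining syllables of the reading untouched.
applyToFirstSyllable :: (Char -> Char) -> String -> String
applyToFirstSyllable _ []       = []
applyToFirstSyllable f (c : cs) = f c : cs
```

With these definitions, applyToFirstSyllable toyConvert turns the raw reading "래일" into "내일" and leaves "미래" as it is.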

withDictionary

Arguments

:: HanjaDictionary

Hangul readings of Sino-Korean words.

-> HanjaWordPhoneticizer

A fallback phoneticizer for unregistered words, e.g., phoneticizeHanjaWordWithInitialSoundLaw.

-> HanjaWordPhoneticizer

A combined phoneticizer.

Reads a hanja word according to the given dictionary, or falls back to the other phoneticizer if there is no such word in the dictionary.

It basically replaces one word with another:

>>> :set -XOverloadedLists -XOverloadedStrings
>>> let phone = withDictionary [("自轉車", "자전거")] phoneticizeHanjaWord
>>> phone "自轉車"
"자전거"

However, if it encounters words or characters that are not registered in the dictionary, it does its best to interpolate prefixes, infixes, and suffixes using the fallback phoneticizer:

>>> phone "自轉車道路"
"자전거도로"
>>> phone "二輪自轉車"
"이륜자전거"
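
The interpolation described above can be approximated by a longest-prefix scan. The following is only a sketch under stated assumptions: it uses Map String String in place of the Trie-based HanjaDictionary, String in place of Text, and the names Phoneticizer, withDictionarySketch, and toyFallback are hypothetical. The actual withDictionary may resolve overlaps and fallbacks differently.

```haskell
import Data.List (isPrefixOf, sortOn)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..))

-- Simplified stand-in for HanjaWordPhoneticizer.
type Phoneticizer = String -> String

-- Scan the word left to right. Wherever a dictionary key matches,
-- emit its reading; runs of unmatched characters in between are
-- handed to the fallback phoneticizer.
withDictionarySketch :: Map.Map String String -> Phoneticizer -> Phoneticizer
withDictionarySketch dict fallback = go []
  where
    -- Longest keys first, so the first prefix hit is the longest match.
    -- Empty keys are dropped because they would never consume input.
    entries =
      sortOn (Down . length . fst)
        (filter (not . null . fst) (Map.toList dict))

    -- `pending` holds the unmatched characters seen so far, reversed.
    go pending [] = flush pending
    go pending rest@(c : cs) =
      case [(k, v) | (k, v) <- entries, k `isPrefixOf` rest] of
        (k, v) : _ -> flush pending ++ v ++ go [] (drop (length k) rest)
        []         -> go (c : pending) cs

    flush [] = []
    flush pending = fallback (reverse pending)

-- Toy fallback that knows four characters only (illustrative).
toyFallback :: Phoneticizer
toyFallback = map readChar
  where
    readChar '道' = '도'
    readChar '路' = '로'
    readChar '二' = '이'
    readChar '輪' = '륜'
    readChar c    = c
```

Given the dictionary Map.fromList [("自轉車", "자전거")] and toyFallback, this sketch reads "自轉車道路" as "자전거도로" and "二輪自轉車" as "이륜자전거".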

Word rendering

type HanjaWordRenderer

Arguments

 = HtmlTagStack

Where the rendered HTML entities get interleaved.

-> Text

A Sino-Korean (i.e., hanja) word (漢字語) to render.

-> Text

Hangul letters that phoneticize the Sino-Korean word.

-> [HtmlEntity]

Rendered HTML entities.

A function to render a Sino-Korean (i.e., hanja) word (漢字語). Choose one of hangulOnly, hanjaInParentheses, and hanjaInRuby.

hangulOnly :: HanjaWordRenderer

Renders a word in hangul only, with no hanja at all (e.g., 안녕히).

hanjaInParentheses :: HanjaWordRenderer

Renders a word in hangul followed by hanja in parentheses (e.g., 안녕(安寧)히).

hanjaInRuby :: HanjaWordRenderer

Renders a word in a ruby tag (e.g., <ruby>安寧<rp>(</rp><rt>안녕</rt><rp>)</rp></ruby>히).

For more information, see also Use Cases & Exploratory Approaches for Ruby Markup.
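
To compare the three renderers side by side, here is a simplified sketch that produces plain strings. It is not the real API: the actual renderers take an HtmlTagStack and return [HtmlEntity], whereas the hypothetical functions below ignore the tag stack, skip HTML escaping, and use String instead of Text.

```haskell
-- Simplified renderer shape: hanja word -> hangul reading -> markup.
type RendererSketch = String -> String -> String

-- Hangul only; the hanja word is discarded.
hangulOnlySketch :: RendererSketch
hangulOnlySketch _ hangul = hangul

-- Hangul followed by the hanja word in parentheses.
hanjaInParenthesesSketch :: RendererSketch
hanjaInParenthesesSketch hanja hangul = hangul ++ "(" ++ hanja ++ ")"

-- Hanja as the ruby base, hangul as the ruby text, with <rp>
-- fallbacks for user agents that do not support ruby.
hanjaInRubySketch :: RendererSketch
hanjaInRubySketch hanja hangul =
  "<ruby>" ++ hanja ++ "<rp>(</rp><rt>" ++ hangul ++ "</rt><rp>)</rp></ruby>"
```

Applying each sketch to "安寧" and "안녕" and appending "히" reproduces the three example outputs given in the descriptions above.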

Initial sound law (頭音法則)

convertInitialSoundLaw :: Char -> Char

Converts a hangul character according to Initial Sound Law (頭音法則).

>>> convertInitialSoundLaw '념'
'염'

If the input is not a hangul syllable, or the law does not apply to the syllable, it returns the input unchanged:

>>> convertInitialSoundLaw 'A'
'A'
>>> convertInitialSoundLaw '가'
'가'
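
For intuition, the same behaviour can be approximated with Unicode arithmetic on precomposed hangul syllables instead of a lookup table. This is a rule-based alternative and an assumption on my part, not how this module is implemented (it documents a table, initialSoundLawTable), so the two may disagree on syllables that never occur as Sino-Korean readings. The name convertSketch is hypothetical.

```haskell
import Data.Char (chr, ord)

-- Rule-based approximation of the Initial Sound Law.
-- A precomposed syllable is 0xAC00 + (lead * 21 + vowel) * 28 + final.
convertSketch :: Char -> Char
convertSketch c
  | code < 0xAC00 || code > 0xD7A3 = c  -- not a precomposed hangul syllable
  -- ㄴ is dropped before ㅕ, ㅛ, ㅠ, ㅣ (e.g., 념 -> 염).
  | lead == nieun && vowel `elem` [6, 12, 17, 20] = recompose ieung
  -- ㄹ is dropped before ㅑ, ㅕ, ㅖ, ㅛ, ㅠ, ㅣ (e.g., 례 -> 예).
  | lead == rieul && vowel `elem` [2, 6, 7, 12, 17, 20] = recompose ieung
  -- ㄹ becomes ㄴ before ㅏ, ㅐ, ㅗ, ㅚ, ㅜ, ㅡ (e.g., 래 -> 내).
  | lead == rieul && vowel `elem` [0, 1, 8, 11, 13, 18] = recompose nieun
  | otherwise = c
  where
    code, offset, lead, vowel, final :: Int
    code   = ord c
    offset = code - 0xAC00
    lead   = offset `div` (21 * 28)
    vowel  = (offset `mod` (21 * 28)) `div` 28
    final  = offset `mod` 28

    -- Indices of the initial consonants ㄴ, ㄹ, and ㅇ.
    nieun, rieul, ieung :: Int
    nieun = 2
    rieul = 5
    ieung = 11

    -- Rebuild the syllable with a new initial consonant.
    recompose :: Int -> Char
    recompose l = chr (0xAC00 + (l * 21 + vowel) * 28 + final)
```

Non-hangul input falls through the first guard and syllables outside the three rules fall through the last one, which mirrors the unchanged-input behaviour described above.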

initialSoundLawTable :: Map Char Char

The Initial Sound Law (頭音法則) table according to South Korean Hangul Orthography (한글 맞춤법) Clause 5, Section 52, Chapter 6 (第6章52項5節). Keys are original Sino-Korean sounds, and values are the sounds converted according to the law.

initialSoundLawTable' :: Map Char (Set Char)

Contains the same contents as initialSoundLawTable, except that keys and values are swapped: keys are converted sounds and values are the sets of possible original sounds.

revertInitialSoundLaw :: Char -> Set Char

A kind of inverse of convertInitialSoundLaw, except that it returns a set of candidates instead of a single canonical answer, because Initial Sound Law (頭音法則) is not a bijective function.

>>> revertInitialSoundLaw '예'
fromList "례"
>>> revertInitialSoundLaw '염'
fromList "념렴"

It returns an empty set if the law does not apply to the input:

>>> revertInitialSoundLaw '가'
fromList ""
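
The relationship between the forward table, the reversed table, and the revert function can be sketched as follows. This is an illustration only: sampleTable is a hypothetical three-entry excerpt rather than the real initialSoundLawTable, and invertTable and revertSketch are assumed names that do not exist in this module.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical excerpt: original sound -> converted sound.
sampleTable :: Map.Map Char Char
sampleTable = Map.fromList [('념', '염'), ('렴', '염'), ('례', '예')]

-- Swap keys and values. Several originals can share one converted
-- sound, so the values are collected into sets.
invertTable :: Map.Map Char Char -> Map.Map Char (Set.Set Char)
invertTable table =
  Map.fromListWith Set.union
    [ (converted, Set.singleton original)
    | (original, converted) <- Map.toList table
    ]

-- Look up the candidates; an empty set means the law does not apply.
revertSketch :: Char -> Set.Set Char
revertSketch c = Map.findWithDefault Set.empty c (invertTable sampleTable)
```

Here revertSketch '염' yields the two candidates 념 and 렴, while revertSketch '가' yields the empty set.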