seonbi-0.4.0: SmartyPants for Korean language
Safe Haskell: Safe-Inferred
Language: Haskell2010

Text.Seonbi.Punctuation

Description

This module deals with punctuation in Korean text.

Arrows

data ArrowTransformationOption

Substitution options for the transformArrow function. These options can be combined as elements of a set.

  • []: Transform only leftwards and rightwards arrows.
  • [LeftRight]: Transform bi-directional arrows as well as left/rightwards arrows.
  • [DoubleArrow]: Transform double arrows as well as single arrows.
  • [LeftRight, DoubleArrow]: Transform all types of arrows.

Constructors

LeftRight

A bidirectional arrow (e.g., ↔︎).

DoubleArrow

An arrow which has two lines (e.g., ⇒).

transformArrow :: Set ArrowTransformationOption -> [HtmlEntity] -> [HtmlEntity]

Transforms hyphens, equals signs, and less-than and greater-than inequality symbols that mimic arrows into actual arrow characters:

  • -> turns into → (U+2192 RIGHTWARDS ARROW).
  • <- turns into ← (U+2190 LEFTWARDS ARROW).
  • <-> turns into ↔ (U+2194 LEFT RIGHT ARROW) if LeftRight is configured.
  • => turns into ⇒ (U+21D2 RIGHTWARDS DOUBLE ARROW) if DoubleArrow is configured.
  • <= turns into ⇐ (U+21D0 LEFTWARDS DOUBLE ARROW) if DoubleArrow is configured.
  • <=> turns into ⇔ (U+21D4 LEFT RIGHT DOUBLE ARROW) if both DoubleArrow and LeftRight are configured.
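The rules above can be illustrated with a minimal sketch on a plain String. This is not seonbi's implementation: the ArrowOption type and the arrowsSketch function are hypothetical stand-ins for ArrowTransformationOption and transformArrow, and the sketch ignores HTML entities entirely. Longer patterns are tried first, so <-> and <=> take precedence over their two-character prefixes.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Set as Set

-- Hypothetical stand-in for ArrowTransformationOption.
data ArrowOption = LeftRight | DoubleArrow
  deriving (Eq, Ord, Show)

-- Hypothetical sketch of transformArrow on a plain String (no HTML).
arrowsSketch :: Set.Set ArrowOption -> String -> String
arrowsSketch opts = go
  where
    lr  = LeftRight `Set.member` opts
    dbl = DoubleArrow `Set.member` opts
    -- (pattern, replacement, enabled?), longest patterns first
    table =
      [ ("<=>", "⇔", lr && dbl)
      , ("<->", "↔", lr)
      , ("=>",  "⇒", dbl)
      , ("<=",  "⇐", dbl)
      , ("->",  "→", True)
      , ("<-",  "←", True)
      ]
    go [] = []
    go s@(c : rest) =
      case [ (p, r) | (p, r, on) <- table, on, p `isPrefixOf` s ] of
        (p, r) : _ -> r ++ go (drop (length p) s)  -- first enabled match wins
        []         -> c : go rest                  -- no arrow here: copy char
```

For example, arrowsSketch (Set.fromList [LeftRight]) "a <-> b" yields "a ↔ b", while => stays as it is unless DoubleArrow is in the set.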

Quotes

data CitationQuotes

A set of quoting parentheses to be used by the quoteCitation function.

There are two presets: angleQuotes and cornerBrackets. Both wrap titles in a <cite> element. To disable the wrapping element, set the htmlElement field to Nothing, e.g.:

angleQuotes { htmlElement = Nothing }

Constructors

CitationQuotes 

Fields

  • title :: (Text, Text)

    The leading and trailing punctuation marks that surround the title of a novel, newspaper, magazine, movie, television program, etc.

  • subtitle :: (Text, Text)

    The leading and trailing punctuation marks that surround the title of a short story, chapter, article, episode, etc.

  • htmlElement :: Maybe (HtmlTag, HtmlRawAttrs)

    An optional pair of an HTML element and its attributes to surround citations. E.g., if it is Just (Cite, " class=\"autogen\""), titles are transformed like <cite class="autogen">이런 날</cite>.
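To make the three fields concrete, the following sketch models a simplified record with the same field names and a helper that applies it to a single title. CitationQuotesSketch, angleQuotesSketch, and citeSketch are hypothetical; the HTML element is reduced to a (tag name, raw attributes) pair of Strings, and placing the punctuation inside the element is an assumption of this sketch, not documented seonbi behavior.

```haskell
-- Hypothetical, simplified mirror of CitationQuotes: the HTML element is a
-- (tag name, raw attributes) pair of Strings instead of (HtmlTag, HtmlRawAttrs).
data CitationQuotesSketch = CitationQuotesSketch
  { title       :: (String, String)
  , subtitle    :: (String, String)
  , htmlElement :: Maybe (String, String)
  }

-- Hypothetical preset resembling angleQuotes.
angleQuotesSketch :: CitationQuotesSketch
angleQuotesSketch = CitationQuotesSketch
  { title       = ("《", "》")
  , subtitle    = ("〈", "〉")
  , htmlElement = Just ("cite", "")
  }

-- Surround a title (or a subtitle, when the flag is True) with the configured
-- punctuation, then with the optional element (an assumed ordering).
citeSketch :: CitationQuotesSketch -> Bool -> String -> String
citeSketch q isSubtitle text =
  case htmlElement q of
    Nothing           -> quoted
    Just (tag, attrs) -> "<" ++ tag ++ attrs ++ ">" ++ quoted ++ "</" ++ tag ++ ">"
  where
    (open, close) = if isSubtitle then subtitle q else title q
    quoted        = open ++ text ++ close
```

As with angleQuotes { htmlElement = Nothing }, a record update such as angleQuotesSketch { htmlElement = Nothing } drops the wrapping element and leaves only the punctuation.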

data Quotes

Pairs of quotation marks that substitute for folk (straight) single and double quotes. Used by the transformQuote function.

There are several presets, including curvedQuotes, guillemets, and curvedSingleQuotesWithQ:

  • curvedQuotes uses the South Korean curved quotation marks, which follow English quotes (‘: U+2018, ’: U+2019, “: U+201C, ”: U+201D).
  • guillemets uses the North Korean angular quotation marks, which are influenced by Russian guillemets but adjusted to use East Asian angle quotes instead (〈: U+3008, 〉: U+3009, 《: U+300A, 》: U+300B).
  • curvedSingleQuotesWithQ is almost the same as curvedQuotes, but wraps text in a <q> tag instead of curved double quotes.

Constructors

Quotes 

Instances

Show Quotes

Eq Quotes

Ord Quotes

data QuotePair

A pair of an opening quote and a closing quote.

Constructors

QuotePair Text Text

Wrap the quoted text with a pair of punctuation characters.

HtmlElement HtmlTag HtmlRawAttrs

Wrap the quoted text (HTML elements) in an element such as a <q> tag.

angleQuotes :: CitationQuotes

Cite a title using angle quotes, used by South Korean orthography in horizontal writing (橫書), e.g., 《나비와 엉겅퀴》 or 〈枾崎의 바다〉.

cornerBrackets :: CitationQuotes

Cite a title using corner brackets, used by South Korean orthography in vertical writing (縱書) and Japanese orthography, e.g., 『나비와 엉겅퀴』 or 「枾崎의 바다」.

curvedQuotes :: Quotes

English-style curved quotes (‘: U+2018, ’: U+2019, “: U+201C, ”: U+201D), which are used by South Korean orthography.

curvedSingleQuotesWithQ :: Quotes

Use English-style curved quotes (‘: U+2018, ’: U+2019) for single quotes, and HTML <q> tags for double quotes.

guillemets :: Quotes

East Asian guillemets (〈: U+3008, 〉: U+3009, 《: U+300A, 》: U+300B), which are used by North Korean orthography.

horizontalCornerBrackets :: Quotes

Traditional horizontal corner brackets (「: U+300C, 」: U+300D, 『: U+300E, 』: U+300F), which are used by East Asian orthography.

horizontalCornerBracketsWithQ :: Quotes

Use horizontal corner brackets (「: U+300C, 」: U+300D) for single quotes, and HTML <q> tags for double quotes.

quoteCitation

Arguments

:: CitationQuotes

Quoting parentheses to wrap titles.

-> [HtmlEntity]

The input HTML entities to transform.

-> [HtmlEntity] 

People tend to cite the title of a work (e.g., a book, a paper, a poem, a song, a film, a TV show, a game) by wrapping it in inequality symbols, like <<나비와 엉겅퀴>> or <枾崎의 바다>, instead of proper angle quotes like 《나비와 엉겅퀴》 or 〈枾崎의 바다〉.

This function transforms all folk citation quotes in the given HTML fragments into typographic citation quotes:

  • Pairs of less-than and greater-than inequality symbols (< & >) into pairs of proper angle quotes (〈 & 〉)
  • Pairs of two consecutive inequality symbols (<< & >>) into pairs of proper double angle quotes (《 & 》)
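A rough sketch of these two rules on plain, already-decoded text follows. citationSketch, closeWith, and noEdgeSpace are hypothetical helpers, not seonbi's API: they always emit angle quotes rather than consulting a CitationQuotes value, and, as an assumption of this sketch, a pair is only rewritten when the enclosed text is non-empty, does not start or end with a space, and contains no other inequality symbol, so that comparisons such as 1 < 2 and 3 > 2 survive.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical sketch: <<Title>> becomes 《Title》 and <Title> becomes 〈Title〉.
-- Output brackets are hard-coded instead of coming from a CitationQuotes value.
citationSketch :: String -> String
citationSketch s = case s of
  '<' : '<' : rest
    | Just (inner, after) <- closeWith ">>" rest -> "《" ++ inner ++ "》" ++ citationSketch after
  '<' : rest
    | Just (inner, after) <- closeWith ">" rest -> "〈" ++ inner ++ "〉" ++ citationSketch after
  c : rest -> c : citationSketch rest
  [] -> []

-- Find the closing marker; fail on any other '<' or '>' before it, and
-- require the enclosed text to be non-empty with no space at either edge.
closeWith :: String -> String -> Maybe (String, String)
closeWith marker = go []
  where
    go acc xs
      | marker `isPrefixOf` xs, noEdgeSpace (reverse acc) =
          Just (reverse acc, drop (length marker) xs)
    go _ [] = Nothing
    go acc (x : xs)
      | x `elem` "<>" = Nothing
      | otherwise     = go (x : acc) xs

noEdgeSpace :: String -> Bool
noEdgeSpace inner = case (inner, reverse inner) of
  (a : _, b : _) -> not (isSpace a || isSpace b)
  _              -> False
```

Trying << before < matters: otherwise <<나비와 엉겅퀴>> would be read as a single < followed by text containing another <.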

transformQuote

Arguments

:: Quotes

Pairs of quoting punctuation marks and wrapping elements.

-> [HtmlEntity]

The input HTML entities to transform.

-> [HtmlEntity] 

Transforms pairs of apostrophes (': U+0027) and straight double quotes (": U+0022) into more appropriate quotation marks, such as typographic single quotes (‘: U+2018, ’: U+2019) and double quotes (“: U+201C, ”: U+201D), or instead wraps the quoted text in an HTML element such as a <q> tag. Which of these happens depends on the Quotes value passed as the first argument; see also Quotes.
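The double-quote half of this behavior can be sketched on a plain String. QuotePairSketch mirrors the two QuotePair constructors (QuotePair and HtmlElement), and wrapQuoted and doubleQuotesSketch are hypothetical helpers; the sketch does not look at HTML tags and leaves apostrophes alone, so it is an illustration rather than a reimplementation of transformQuote.

```haskell
-- Hypothetical mirror of QuotePair; seonbi's HtmlTag and HtmlRawAttrs are
-- simplified to plain Strings here.
data QuotePairSketch
  = PunctuationPair String String  -- opening and closing marks
  | ElementPair String String      -- tag name and raw attributes
  deriving (Eq, Show)

-- Wrap the text found between a pair of quotes.
wrapQuoted :: QuotePairSketch -> String -> String
wrapQuoted (PunctuationPair open close) s = open ++ s ++ close
wrapQuoted (ElementPair tag attrs) s =
  "<" ++ tag ++ attrs ++ ">" ++ s ++ "</" ++ tag ++ ">"

-- Replace each balanced pair of straight double quotes (U+0022).
-- Apostrophes are deliberately not handled: telling a closing single quote
-- from an apostrophe (as in "don't") needs more context than this has.
doubleQuotesSketch :: QuotePairSketch -> String -> String
doubleQuotesSketch pair = go
  where
    go s = case break (== '"') s of
      (before, '"' : rest) -> case break (== '"') rest of
        (inner, '"' : after) -> before ++ wrapQuoted pair inner ++ go after
        _                    -> s  -- no closing quote: leave the rest as is
      _ -> s  -- no more quotes
```

Passing ElementPair "q" "" instead of a punctuation pair corresponds to the <q>-based presets such as curvedSingleQuotesWithQ.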

verticalCornerBrackets :: Quotes

Vertical corner brackets (﹁: U+FE41, ﹂: U+FE42, ﹃: U+FE43, ﹄: U+FE44), which are used by East Asian orthography.

verticalCornerBracketsWithQ :: Quotes

Use vertical corner brackets (﹁: U+FE41, ﹂: U+FE42) for single quotes, and HTML <q> tags for double quotes.

Stops: periods, commas, & interpuncts

data Stops

A set of stops (period, comma, and interpunct) to be used by the normalizeStops function.

There are three presets: horizontalStops, verticalStops, and horizontalStopsWithSlashes.

Constructors

Stops 

Fields

Instances

Show Stops

Eq Stops

horizontalStops :: Stops

Stop sentences in the modern Korean style which follows Western stops. E.g.:

봄·여름·가을·겨울. 어제, 오늘.

horizontalStopsWithSlashes :: Stops

Similar to horizontalStops, except that slashes are used instead of interpuncts. E.g.:

봄/여름/가을/겨울. 어제, 오늘.

normalizeStops :: Stops -> [HtmlEntity] -> [HtmlEntity]

Normalizes sentence stops (periods, commas, and interpuncts).

transformEllipsis :: [HtmlEntity] -> [HtmlEntity]

Until 2015, the National Institute of Korean Language (國立國語院) allowed only ellipses (…) to mark an omitted word, phrase, line, or paragraph, or speechlessness. However, people tend to use three or more consecutive periods (...) instead of a proper ellipsis. Although NIKL has since started to allow consecutive periods as well as an ellipsis for these uses, the ellipsis is still the proper punctuation in principle.

This function transforms every run of three consecutive periods in the given HTML fragments into a proper ellipsis.
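A minimal sketch of the substitution on a plain String, assuming runs are matched greedily from left to right; ellipsisSketch is a hypothetical name and the function ignores HTML structure.

```haskell
-- Hypothetical sketch of transformEllipsis on a plain String: each run of
-- three periods, scanned left to right, becomes U+2026 HORIZONTAL ELLIPSIS.
ellipsisSketch :: String -> String
ellipsisSketch ('.' : '.' : '.' : rest) = '\x2026' : ellipsisSketch rest
ellipsisSketch (c : rest)               = c : ellipsisSketch rest
ellipsisSketch []                       = []
```

Under this greedy rule six periods become two ellipses (……), a run of four yields an ellipsis followed by a single period, and one or two periods are left untouched.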

verticalStops :: Stops

Stop sentences in the pre-modern Korean style which follows Chinese stops. E.g.:

봄·여름·가을·겨울。어제、오늘。
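The difference between the presets can be illustrated with a small sketch. StopsSketch and its fields are hypothetical, since the real Stops fields are not listed above, and the recognition rules are assumptions of this sketch: 。 and 、 always count as stops, . and , only before whitespace or the end of the text (so 3.14 is left alone), and · or ・ count as interpuncts.

```haskell
import Data.Char (isSpace)

-- Hypothetical stand-in for Stops; not seonbi's real field names.
data StopsSketch = StopsSketch
  { periodMark     :: String
  , commaMark      :: String
  , interpunctMark :: String
  , spaceAfter     :: Bool  -- insert a space after a period or comma?
  }

horizontalSketch, verticalSketch, slashesSketch :: StopsSketch
horizontalSketch = StopsSketch "." "," "·" True
verticalSketch   = StopsSketch "。" "、" "·" False
slashesSketch    = horizontalSketch { interpunctMark = "/" }

-- Hypothetical sketch of normalizeStops on a plain String.
normalizeSketch :: StopsSketch -> String -> String
normalizeSketch st = go
  where
    go [] = []
    go (c : rest)
      | c == '。' || (c == '.' && atBoundary rest) = stop (periodMark st) rest
      | c == '、' || (c == ',' && atBoundary rest) = stop (commaMark st) rest
      | c `elem` "·・"                              = interpunctMark st ++ go rest
      | otherwise                                    = c : go rest
    -- '.' and ',' only count as stops before whitespace or the end of text
    atBoundary (x : _) = isSpace x
    atBoundary []      = True
    -- emit the mark, drop following spaces, and re-add one if configured
    stop mark rest =
      let rest' = dropWhile (== ' ') rest
          gap   = if spaceAfter st && not (null rest') then " " else ""
      in mark ++ gap ++ go rest'
```

Feeding the horizontalStops example into normalizeSketch verticalSketch produces the verticalStops example, and vice versa.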

Dashes

transformEmDash :: [HtmlEntity] -> [HtmlEntity]

Transforms the following folk em dashes into proper em dashes (—: U+2014 EM DASH):

  • A hyphen (-: U+002D HYPHEN-MINUS) surrounded by spaces.
  • Two or three consecutive hyphens (-- or ---).
  • A hangul vowel (ㅡ: U+3161 HANGUL LETTER EU) surrounded by spaces. Some Korean writers use this vowel ("eu") instead of an em dash out of ignorance or negligence.
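The three rules can be sketched on a plain String as follows. emDashSketch and stripAny are hypothetical helpers, and keeping the surrounding spaces around a replaced hyphen or ㅡ is an assumption of this sketch; the em dash is written as the escape '\x2014' so it is easy to spot in source.

```haskell
import Data.List (isPrefixOf)

-- U+2014 EM DASH, written as an escape so it is easy to spot.
emDash :: Char
emDash = '\x2014'

-- Hypothetical sketch of transformEmDash on a plain String.
emDashSketch :: String -> String
emDashSketch s
  -- two or three consecutive hyphens (the longer pattern is tried first)
  | Just rest <- stripAny ["---", "--"] s = emDash : emDashSketch rest
  -- a hyphen or HANGUL LETTER EU (U+3161) with a space on each side;
  -- the spaces are kept, which is an assumption of this sketch
  | ' ' : c : ' ' : rest <- s, c `elem` ['-', '\x3161'] =
      ' ' : emDash : ' ' : emDashSketch rest
emDashSketch (c : rest) = c : emDashSketch rest
emDashSketch [] = []

-- Strip the first matching prefix, if any.
stripAny :: [String] -> String -> Maybe String
stripAny prefixes s =
  case [drop (length p) s | p <- prefixes, p `isPrefixOf` s] of
    r : _ -> Just r
    []    -> Nothing
```

A lone hyphen inside a word, as in e-mail, matches none of the rules and is left alone.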