seonbi-0.4.0: SmartyPants for Korean language
Safe Haskell: Safe-Inferred
Language: Haskell2010

Text.Seonbi.Html.TextNormalizer

Documentation

escapeHtmlEntities :: Text -> Text

Escape the characters that have special meaning in HTML markup (for example <, >, &, and ") into their corresponding character entities in the given text.

>>> escapeHtmlEntities "<foo & \"bar\">"
"&lt;foo &amp; &quot;bar&quot;&gt;"

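The escaping rule above can be illustrated with a minimal sketch. It is not the library's implementation: escapeSketch is a hypothetical name, it works on String rather than Text so that it needs only base, and it assumes that exactly the four characters shown in the example are escaped.

```haskell
-- Hypothetical, String-based stand-in for escapeHtmlEntities.
-- Assumption: only <, >, & and " are replaced; everything else is kept.
escapeSketch :: String -> String
escapeSketch = concatMap escapeChar
  where
    escapeChar :: Char -> String
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '&' = "&amp;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]  -- ordinary characters pass through unchanged
```

>>> escapeSketch "<foo & \"bar\">"
"&lt;foo &amp; &quot;bar&quot;&gt;"
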
normalizeCdata :: HtmlEntity -> HtmlEntity

Transform a given HtmlCdata node into an equivalent HtmlText node.

>>> import Text.Seonbi.Html.Tag
>>> normalizeCdata HtmlCdata { tagStack = [P], text = "<p id=\"foo\">" }
HtmlText {tagStack = fromList [P], rawText = "&lt;p id=&quot;foo&quot;&gt;"}

normalizeText :: [HtmlEntity] -> [HtmlEntity]

scanHtml may emit two or more consecutive HtmlText fragments even when they could be represented as a single HtmlText fragment, which makes postprocessing hard.

The normalizeText function merges such consecutive HtmlText fragments into one where possible, which simplifies postprocessing:

>>> :set -XOverloadedStrings -XOverloadedLists
>>> normalizeText [HtmlText [] "Hello, ", HtmlText [] "world!"]
[HtmlText {tagStack = fromList [], rawText = "Hello, world!"}]

It also converts every HtmlCdata fragment into an HtmlText fragment and merges it with the adjacent HtmlText fragments:

>>> :{
normalizeText [ HtmlText [] "foo "
              , HtmlCdata [] "<bar>", HtmlText [] " baz!"
              ]
:}
[HtmlText {tagStack = fromList [], rawText = "foo &lt;bar&gt; baz!"}]
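
The two steps described above (convert CDATA to escaped text, then merge neighbors) can be sketched with a simplified model. Everything here is an assumption made for illustration: the Entity type, its constructors, and normalizeSketch are hypothetical names, the tag stack is modeled as a plain list of strings, and fragments are merged only when their tag stacks are equal. The real function operates on HtmlEntity values and may differ in detail.

```haskell
-- Hypothetical, simplified model of the entity stream.
data Entity
  = TextNode [String] String   -- tag stack, already-escaped text
  | CdataNode [String] String  -- tag stack, unescaped text
  | OtherNode String           -- any non-text entity (e.g. a tag)
  deriving (Eq, Show)

-- Escape the four HTML special characters (assumed set).
escapeSketch :: String -> String
escapeSketch = concatMap escapeChar
  where
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '&' = "&amp;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]

-- Step 1: turn every CDATA node into an equivalent text node.
-- Step 2: fold from the right, gluing a text node onto the text node
-- that follows it when both share the same tag stack.
normalizeSketch :: [Entity] -> [Entity]
normalizeSketch = foldr merge [] . map cdataToText
  where
    cdataToText (CdataNode stack t) = TextNode stack (escapeSketch t)
    cdataToText e                   = e

    merge (TextNode s a) (TextNode s' b : rest)
      | s == s' = TextNode s (a ++ b) : rest
    merge e rest = e : rest
```

>>> normalizeSketch [TextNode [] "foo ", CdataNode [] "<bar>", TextNode [] " baz!"]
[TextNode [] "foo &lt;bar&gt; baz!"]

A non-text entity between two text fragments keeps them apart, so only directly adjacent fragments are merged in this sketch.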