seonbi-0.4.0: SmartyPants for Korean language
Safe Haskell: Safe-Inferred
Language: Haskell2010

Text.Seonbi.Html.Clipper

Documentation

clipPrefixText :: Text -> [HtmlEntity] -> Maybe [HtmlEntity]

Clips the given prefix text from the HTML fragments. It returns Nothing if the first text element does not start with the prefix, if the first element is not an HtmlText node, or if the list of HTML fragments is empty.

>>> :set -XOverloadedLists
>>> :set -XOverloadedStrings
>>> clipPrefixText "foo" [HtmlText [] "bar", HtmlStartTag [] P ""]
Nothing
>>> clipPrefixText "foo" [HtmlStartTag [] P "", HtmlText [] "foo"]
Nothing
>>> clipPrefixText "foo" []
Nothing

If the first element is an HtmlText node and its rawText starts with the given prefix text, it returns a Just value holding the list of HTML fragments with that prefix removed. An HtmlText node left empty by the removal is dropped from the result.

>>> clipPrefixText "foo" [HtmlText [] "foobar", HtmlStartTag [] P ""]
Just [HtmlText {... "bar"},HtmlStartTag {...}]
>>> clipPrefixText "foo" [HtmlText [] "foo", HtmlStartTag [] P ""]
Just [HtmlStartTag {..., tag = P, ...}]
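The rule above can be sketched on a simplified model. This is not the library's implementation: the Node type, its constructors, and clipPrefixSketch are hypothetical stand-ins for HtmlEntity and clipPrefixText, and String is used in place of Text so that only base is needed.

```haskell
import Data.List (stripPrefix)

-- Hypothetical, simplified stand-in for the library's HtmlEntity type.
data Node
    = TextNode String   -- raw text of a text node
    | TagNode String    -- any start tag, identified by its name
    deriving (Eq, Show)

-- Strip a prefix from the first node, which must be a text node.
clipPrefixSketch :: String -> [Node] -> Maybe [Node]
clipPrefixSketch prefix (TextNode raw : rest) =
    case stripPrefix prefix raw of
        Nothing        -> Nothing                          -- prefix does not match
        Just ""        -> Just rest                        -- node fully consumed, so drop it
        Just remaining -> Just (TextNode remaining : rest) -- keep what is left
clipPrefixSketch _ _ = Nothing  -- empty list, or the first node is not text
```

In this model, clipping "foo" from a leading TextNode "foobar" leaves TextNode "bar", while clipping it from TextNode "foo" removes the node entirely, mirroring the two examples above.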

The given text is treated as raw text. Even if two HTML entities refer to the same character, the match may fail unless they share exactly the same representation, e.g.:

>>> clipPrefixText "&amp;" [HtmlText [] "&#38;"]
Nothing
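The raw-text comparison can be illustrated with plain strings. The sketch below is an assumption-laden illustration using only Data.List.stripPrefix from base rather than the library itself; the two entity spellings and the names rawMismatch and rawMatch are chosen here for the example.

```haskell
import Data.List (stripPrefix)

-- "&amp;" and "&#38;" both denote "&" once decoded, but they differ as raw
-- text, so a character-by-character prefix comparison does not match.
rawMismatch :: Maybe String
rawMismatch = stripPrefix "&amp;" "&#38;"

-- With an identical representation the prefix is found and removed.
rawMatch :: Maybe String
rawMatch = stripPrefix "&amp;" "&amp; more"
```

A caller that needs entity-insensitive matching would have to bring both sides to a common representation before clipping.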

In the same manner, it does not look for a prefix in HtmlCdata, e.g.:

>>> clipPrefixText "foo" [HtmlCdata [] "foo", HtmlStartTag [] P ""]
Nothing

In order to remove a prefix from both HtmlText and HtmlCdata, apply normalizeText first so that all HtmlCdata entities are transformed to equivalent HtmlText entities:

>>> import Text.Seonbi.Html.TextNormalizer (normalizeText)
>>> let normalized = normalizeText [HtmlCdata [] "foo", HtmlStartTag [] P ""]
>>> clipPrefixText "foo" normalized
Just [HtmlStartTag {..., tag = P, ...}]

It also works when the HTML fragments contain HtmlComment entities before the first text node. Such comments are left untouched, e.g.:

>>> clipPrefixText "bar" [HtmlComment [] "foo", HtmlText [] "barbaz"]
Just [HtmlComment {... "foo"},HtmlText {... "baz"}]
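One way to picture the comment handling is a recursion that passes comments through unchanged until it reaches the first non-comment node. This is a sketch on a simplified model, not the library's code: Node, its constructors, and clipPrefixSkippingComments are hypothetical names, and String stands in for Text.

```haskell
import Data.List (stripPrefix)

-- Hypothetical, simplified stand-in for the library's HtmlEntity type.
data Node
    = TextNode String
    | CommentNode String
    | TagNode String
    deriving (Eq, Show)

clipPrefixSkippingComments :: String -> [Node] -> Maybe [Node]
clipPrefixSkippingComments prefix (c@(CommentNode _) : rest) =
    -- Keep the comment as it is and look for the text node further on.
    (c :) <$> clipPrefixSkippingComments prefix rest
clipPrefixSkippingComments prefix (TextNode raw : rest) =
    case stripPrefix prefix raw of
        Nothing        -> Nothing
        Just ""        -> Just rest
        Just remaining -> Just (TextNode remaining : rest)
clipPrefixSkippingComments _ _ = Nothing  -- empty list, or a non-text, non-comment node
```

In this model, clipping "bar" from a CommentNode "foo" followed by TextNode "barbaz" keeps the comment and leaves TextNode "baz".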

clipSuffixText :: Text -> [HtmlEntity] -> Maybe [HtmlEntity]

Clips the given suffix text from the HTML fragments, in the same manner as clipPrefixText.
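Assuming the suffix case mirrors the prefix case at the other end of the list, it could be sketched as follows. The Node type, stripSuffixStr, and clipSuffixSketch are hypothetical names on a simplified model with String in place of Text; the library's actual behavior for edge cases is not confirmed by this sketch.

```haskell
import Data.List (stripPrefix)

-- Hypothetical, simplified stand-in for the library's HtmlEntity type.
data Node = TextNode String | TagNode String
    deriving (Eq, Show)

-- Strip a suffix by stripping the reversed suffix from the reversed string.
stripSuffixStr :: String -> String -> Maybe String
stripSuffixStr suffix = fmap reverse . stripPrefix (reverse suffix) . reverse

-- Strip a suffix from the last node, which must be a text node.
clipSuffixSketch :: String -> [Node] -> Maybe [Node]
clipSuffixSketch _ [] = Nothing
clipSuffixSketch suffix nodes =
    case last nodes of
        TextNode raw ->
            case stripSuffixStr suffix raw of
                Nothing        -> Nothing
                Just ""        -> Just (init nodes)  -- node fully consumed
                Just remaining -> Just (init nodes ++ [TextNode remaining])
        _ -> Nothing  -- the last node is not text
```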

clipText :: Text -> Text -> [HtmlEntity] -> Maybe [HtmlEntity]

Clips the given prefix text and suffix text from the HTML fragments. It is simply the composition of clipPrefixText and clipSuffixText. It returns Nothing if either the prefix or the suffix does not match.
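Composing two Maybe-returning clipping steps so that either failure yields Nothing is what Kleisli composition in the Maybe monad provides. The sketch below shows the idea on plain strings; stripSuffixStr and clipBothSketch are hypothetical helpers, and whether the library composes its functions with (>=>) specifically is an assumption.

```haskell
import Control.Monad ((>=>))
import Data.List (stripPrefix)

-- Strip a suffix by stripping the reversed suffix from the reversed string.
stripSuffixStr :: String -> String -> Maybe String
stripSuffixStr suffix = fmap reverse . stripPrefix (reverse suffix) . reverse

-- The suffix step runs only if the prefix step succeeded; a Nothing from
-- either step makes the whole result Nothing.
clipBothSketch :: String -> String -> String -> Maybe String
clipBothSketch prefix suffix = stripPrefix prefix >=> stripSuffixStr suffix
```

For instance, clipping "(" and ")" from "(hello)" gives Just "hello", while an input missing either delimiter gives Nothing.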