seonbi-0.5.0: SmartyPants for Korean language
Safe Haskell: Safe-Inferred
Language: Haskell2010

Text.Seonbi.Html

Description

Seonbi transforms HTML as its basic unit, so this module provides the types and functions for scanning, printing, and normalizing HTML.

HTML scanner

See the Text.Seonbi.Html.Scanner module for more.

data Result r

Constructors

Fail Text [String] String 
Done Text r 

Instances

Functor Result
Defined in Data.Attoparsec.Text.Lazy

Methods

fmap :: (a -> b) -> Result a -> Result b

(<$) :: a -> Result b -> Result a

Show r => Show (Result r)
Defined in Data.Attoparsec.Text.Lazy

Methods

showsPrec :: Int -> Result r -> ShowS

show :: Result r -> String

showList :: [Result r] -> ShowS

NFData r => NFData (Result r)
Defined in Data.Attoparsec.Text.Lazy

Methods

rnf :: Result r -> ()

HTML printer

See the Text.Seonbi.Html.Printer module for more.

printHtml :: [HtmlEntity] -> Text

Prints a list of HtmlEntity values as a lazy Text.

>>> let Done "" tokens = scanHtml "<p>Hello,<br>\n<em>world</em>!</p>"
>>> printHtml tokens
"<p>Hello,<br>\n<em>world</em>!</p>"

printText :: [HtmlEntity] -> Text

Prints only the text content (including CDATA sections), without tags, as a lazy Text.

>>> let Done "" tokens = scanHtml "<p>Hello,<br>\n<em>world</em>!</p>"
>>> printText tokens
"Hello,\nworld!"

Entities are decoded:

>>> let Done "" tokens = scanHtml "<p><code>&lt;&gt;&quot;&amp;</code></p>"
>>> printText tokens
"<>\"&"

printXhtml :: [HtmlEntity] -> Text

Similar to printHtml, except that it renders void (self-closing) tags like <br/> instead of <br>.

>>> let Done "" tokens = scanHtml "<p>Hello,<br>\n<em>world</em>!</p>"
>>> printXhtml tokens
"<p>Hello,<br/>\n<em>world</em>!</p>"

Note that normal tags are never rendered as self-closing, even when they are empty; only the tags that the HTML specification defines as void are:

>>> let Done "" tokens' = scanHtml "<p></p><p><br></p>"
>>> printXhtml tokens'
"<p></p><p><br/></p>"
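
The difference between the two printers can be sketched with a small model that depends only on base. Everything here is hypothetical and is not the library's implementation: the Token type, the isVoid lookup (which lists just three void tags), and the render function stand in for HtmlEntity, htmlTagKind, and printHtml/printXhtml respectively.

```haskell
-- Hypothetical, reduced token type; the library's HtmlEntity carries more data.
data Token = Start String | End String | Chars String
  deriving (Eq, Show)

-- Hypothetical lookup that knows only a few void tags.
isVoid :: String -> Bool
isVoid name = name `elem` ["br", "hr", "img"]

-- Render tokens; the flag selects XHTML-style output for void tags only.
render :: Bool -> [Token] -> String
render xhtml = concatMap go
  where
    go (Start name)
      | xhtml && isVoid name = "<" ++ name ++ "/>"
      | otherwise            = "<" ++ name ++ ">"
    go (End name) = "</" ++ name ++ ">"
    go (Chars s)  = s

-- Token stream modelling "<p></p><p><br></p>".
example :: [Token]
example = [Start "p", End "p", Start "p", Start "br", End "p"]
```

In this model, render True example keeps the empty p element as "<p></p>" and changes only the br tag, giving "<p></p><p><br/></p>".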

HTML entities

See the Text.Seonbi.Html.Entity module for more.

data HtmlEntity

An event entity emitted by scanHtml.

Constructors

HtmlStartTag

Represents a token that opens an HTML element.

Note that rawAttributes is not parsed, structured data but a raw string, as its name implies.

The tagStack does not include the tag being opened.

Fields

HtmlEndTag

Represents a token that closes an HTML element. The tagStack does not include the tag being closed.

Fields

HtmlText

Represents a token of a text node. Note that rawText is not parsed, structured data but a raw string, as its name implies. Two or more consecutive HtmlText values can be emitted even when no element opening or closing separates them.

Fields

HtmlCdata

Represents a token of a CDATA section.

Fields

HtmlComment

Represents a token of an HTML comment.

Fields

type HtmlRawAttrs = Text

All attributes of an element, as a single unparsed string.

HTML tags

See the Text.Seonbi.Html.Tag module for more.

data HtmlTag

HTML tags. This enumeration type contains both HTML 5 and 4 tags for maximum compatibility.

Instances

Show HtmlTag
Defined in Text.Seonbi.Html.Tag

Eq HtmlTag
Defined in Text.Seonbi.Html.Tag

Methods

(==) :: HtmlTag -> HtmlTag -> Bool

(/=) :: HtmlTag -> HtmlTag -> Bool

Ord HtmlTag
Defined in Text.Seonbi.Html.Tag

htmlTagKind :: HtmlTag -> HtmlTagKind

The kind of an HtmlTag.

>>> Data.Set.filter ((== EscapableRawText) . htmlTagKind) htmlTags
fromList [TextArea,Title]

htmlTagName :: HtmlTag -> Text

The name of an HtmlTag in lowercase.

>>> htmlTagName TextArea
"textarea"

For every tag, the name equals the lowercased Show output:

\ t -> htmlTagName t == (toLower $ pack $ show (t :: HtmlTag))

HTML text normalization

normalizeText :: [HtmlEntity] -> [HtmlEntity]

scanHtml may emit two or more consecutive HtmlText fragments even when they could be represented as a single HtmlText fragment, which makes postprocessing harder.

The normalizeText function merges such consecutive HtmlText fragments into one where possible, which simplifies postprocessing:

>>> :set -XOverloadedStrings -XOverloadedLists
>>> normalizeText [HtmlText [] "Hello, ", HtmlText [] "world!"]
[HtmlText {tagStack = fromList [], rawText = "Hello, world!"}]

It also converts every HtmlCdata fragment into HtmlText, escaping its content, and merges it with the adjacent text:

>>> :{
normalizeText [ HtmlText [] "foo "
              , HtmlCdata [] "<bar>", HtmlText [] " baz!"
              ]
:}
[HtmlText {tagStack = fromList [], rawText = "foo &lt;bar&gt; baz!"}]
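
The merging behaviour shown above can be approximated with a right fold over a simplified entity type. This is a minimal sketch under stated assumptions, not the library's code: Node, escape, and normalize are hypothetical names, String stands in for Text, tag stacks are ignored entirely, and the choice to escape exactly <, > and & in CDATA content is an assumption inferred from the example output.

```haskell
-- Hypothetical, reduced stand-in for HtmlEntity (no tag stacks, String for Text).
data Node = TextNode String | CdataNode String | TagNode String
  deriving (Eq, Show)

-- Assumed escaping rule: only <, > and & are replaced.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- Turn CDATA into escaped text, then merge neighbouring text nodes.
normalize :: [Node] -> [Node]
normalize = foldr merge []
  where
    merge node acc = case (asText node, acc) of
      (TextNode a, TextNode b : rest) -> TextNode (a ++ b) : rest
      (node', _)                      -> node' : acc
    asText (CdataNode s) = TextNode (escape s)
    asText other         = other
```

Applied to [TextNode "foo ", CdataNode "<bar>", TextNode " baz!"], this model yields the single node TextNode "foo &lt;bar&gt; baz!", while a TagNode between two text nodes keeps them separate.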

HTML hierarchical stacks

See the Text.Seonbi.Html.TagStack module for more.

data HtmlTagStack

Represents the chain of open elements at the current parsing position in an HtmlTag tree.

For example, if scanHtml has read "<a href="#"><b><i>foo</i> bar", the position is represented as HtmlTagStack [B, A].

Note that the tags are stored in reverse order, from the deepest to the shallowest, so that inserting a deeper tag is efficient.
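
To illustrate why the deepest-first order makes entering a nested element cheap, here is a small list-based model. The Tag and TagStack types and the push and pop helpers are hypothetical stand-ins, not the library's definitions; only the ordering convention is taken from the description above.

```haskell
-- Hypothetical stand-in for HtmlTag; only three tags are modelled.
data Tag = A | B | I
  deriving (Eq, Show)

-- Deepest tag first, mirroring the order described above.
newtype TagStack = TagStack [Tag]
  deriving (Eq, Show)

emptyStack :: TagStack
emptyStack = TagStack []

-- Entering a deeper element is a single cons onto the list head.
push :: Tag -> TagStack -> TagStack
push t (TagStack ts) = TagStack (t : ts)

-- Leaving the innermost element drops the head; an empty stack is unchanged.
pop :: TagStack -> TagStack
pop (TagStack (_ : ts)) = TagStack ts
pop s                   = s

-- State after reading: <a href="#"><b><i>foo</i> bar
afterExample :: TagStack
afterExample = pop (push I (push B (push A emptyStack)))
```

Here afterExample evaluates to TagStack [B, A]: the i element has been opened and closed again, leaving b as the innermost open element and a as its parent.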