{"id":6815,"date":"2022-12-20T19:33:41","date_gmt":"2022-12-20T22:33:41","guid":{"rendered":"http:\/\/lode.uno\/linux-man\/index.php\/2022\/12\/20\/htmlpullparser-man3\/"},"modified":"2022-12-20T19:33:41","modified_gmt":"2022-12-20T22:33:41","slug":"htmlpullparser-man3","status":"publish","type":"post","link":"https:\/\/lode.uno\/linux-man\/2022\/12\/20\/htmlpullparser-man3\/","title":{"rendered":"HTML::PullParser (man3)"},"content":{"rendered":"<h1 align=\"center\">HTML::PullParser<\/h1>\n<p> <a href=\"#NAME\">NAME<\/a><br \/> <a href=\"#SYNOPSIS\">SYNOPSIS<\/a><br \/> <a href=\"#DESCRIPTION\">DESCRIPTION<\/a><br \/> <a href=\"#EXAMPLES\">EXAMPLES<\/a><br \/> <a href=\"#SEE ALSO\">SEE ALSO<\/a><br \/> <a href=\"#COPYRIGHT\">COPYRIGHT<\/a> <\/p>\n<hr>\n<h2>NAME <a name=\"NAME\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">HTML::PullParser \u2212 Alternative HTML::Parser interface<\/p>\n<h2>SYNOPSIS <a name=\"SYNOPSIS\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">use HTML::PullParser; <br \/> $p = HTML::PullParser->new(file => &quot;index.html&quot;, <br \/> start => &#39;event, tagname, @attr&#39;, <br \/> end => &#39;event, tagname&#39;, <br \/> ignore_elements => [qw(script style)], <br \/> ) || die &quot;Can&#39;t open: $!&quot;; <br \/> while (my $token = $p->get_token) { <br \/> # ...do something with $token <br \/> }<\/p>\n<h2>DESCRIPTION <a name=\"DESCRIPTION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">HTML::PullParser is an alternative interface to the HTML::Parser class. It essentially turns HTML::Parser inside out. 
You associate a file (or any IO::Handle object or string) with the parser at construction time and then repeatedly call $parser->get_token to obtain the tags and text found in the parsed document.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">The following methods are provided: <br \/> $p = HTML::PullParser->new( file => $file, %options ) <br \/> $p = HTML::PullParser->new( doc => $doc, %options )<\/p>\n<p style=\"margin-left:17%;\">An &#8220;HTML::PullParser&#8221; can be made to parse from either a file or a literal document, depending on whether the &#8220;file&#8221; or &#8220;doc&#8221; option is passed to the parser\u2019s constructor.<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">The &#8220;file&#8221; passed in can be either a file name or a file handle object. If a file name is passed and it can\u2019t be opened for reading, the constructor returns an undefined value and $! tells you why it failed. Otherwise the argument is taken to be some object that the &#8220;HTML::PullParser&#8221; can <b>read()<\/b> from when it needs more data. The stream will be <b>read()<\/b> until <small>EOF,<\/small> but not closed.<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">A &#8220;doc&#8221; can be passed plain or as a reference to a scalar. If a reference is passed, the value of the referenced scalar should not be changed before all tokens have been extracted.<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">Next, the information to be returned for the different token types must be set up. This is done by associating an argspec (as defined in HTML::Parser) with each event you are interested in. 
For instance, if you want &#8220;start&#8221; tokens to be reported as the string &#8216;S&#8217; followed by the tagname and the attributes, you might pass a &#8220;start&#8221; option like this:<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">$p = HTML::PullParser->new( <br \/> doc => $document_to_parse, <br \/> start => &#39;&quot;S&quot;, tagname, @attr&#39;, <br \/> end => &#39;&quot;E&quot;, tagname&#39;, <br \/> );<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">Finally, other &#8220;HTML::Parser&#8221; options, such as &#8220;ignore_tags&#8221; and &#8220;unbroken_text&#8221;, can be passed in. Note that you should not use the <i>event<\/i>_h options to set up parser handlers; doing so would confuse the internal logic of &#8220;HTML::PullParser&#8221;.<\/p>\n<p style=\"margin-left:11%;\">$token = $p->get_token<\/p>\n<p style=\"margin-left:17%;\">This method returns the next <i>token<\/i> found in the <small>HTML<\/small> document, or &#8220;undef&#8221; at the end of the document. The token is returned as an array reference. 
The contents of this array match the argspec set up during &#8220;HTML::PullParser&#8221; construction.<\/p>\n<p style=\"margin-left:11%;\">$p->unget_token( @tokens )<\/p>\n<p style=\"margin-left:17%;\">If you find that you have read too many tokens, you can push them back so that they are returned again the next time $p->get_token is called.<\/p>\n<h2>EXAMPLES <a name=\"EXAMPLES\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">The \u2019eg\/hform\u2019 script shows how one might parse the form section of HTML documents using HTML::PullParser.<\/p>\n<h2>SEE ALSO <a name=\"SEE ALSO\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">HTML::Parser, HTML::TokeParser<\/p>\n<h2>COPYRIGHT <a name=\"COPYRIGHT\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">Copyright 1998\u22122001 Gisle Aas.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">This library is free software; you can redistribute it and\/or modify it under the same terms as Perl itself.<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>  HTML::PullParser \u2212 Alternative HTML::Parser interface 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[3182,3007],"class_list":["post-6815","post","type-post","status-publish","format-standard","hentry","category-sin-categoria","tag-htmlpullparser","tag-man3"],"gutentor_comment":0,"_links":{"self":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/posts\/6815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/comments?post=6815"}],"version-history":[{"count":0,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/posts\/6815\/revisions"}],"wp:attachment":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/media?parent=6815"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/categories?post=6815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/tags?post=6815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}