Perl - I'm not getting HTML tags while parsing
The fragment of HTML code I want to parse is this:
    <ul class="authors">
      <li class="author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/person">
        <a href="/search?facet-creator=%22charles+l.+fefferman%22" itemprop="name">charles l. fefferman</a>,
      </li>
      <li class="author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/person">
        <a href="/search?facet-creator=%22jos%c3%a9+l.+rodrigo%22" itemprop="name">josé l. rodrigo</a>
      </li>
I want to extract the whole <a> elements, but when I parse the page with WWW::Mechanize::TreeBuilder I get only their content, the names of the authors. So:
Content I'm expecting:
<a href="/search?facet-creator=%22charles+l.+fefferman%22" itemprop="name">charles l. fefferman</a>, <a href="/search?facet-creator=%22jos%c3%a9+l.+rodrigo%22" itemprop="name">josé l. rodrigo</a>
Content I'm receiving:
charles l. fefferman, josé l. rodrigo
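Here is the code responsible for parsing this:
    use WWW::Mechanize;
    use WWW::Mechanize::TreeBuilder;

    my $mech = WWW::Mechanize->new();
    # Add the HTML::TreeBuilder methods (look_down etc.) to the mech object
    WWW::Mechanize::TreeBuilder->meta->apply($mech);
    $mech->get($addressdio);
    my @authors = $mech->look_down('class', 'author');
    print "authors: <br />";
    foreach (@authors) {
        # as_text() returns only the text content, without any tags
        print $_->as_text(), "<br />";
    }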
I thought as_text() would return the element with its tags, since the CGI output is HTML, but it gives me only the text.
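I handled it in a totally different way, using HTML::TagParser:
    use HTML::TagParser;

    my $html = HTML::TagParser->new("overwrite.xml");
    my @li = $html->getElementsByAttribute('class', 'author');
    foreach (@li) {
        # The first child of each <li class="author"> is the <a> element
        my $anchor = $_->firstChild();
        my $link = $anchor->getAttribute('href');
        print $_->innerText, " ", $link, "<br />";
    }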
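For comparison, here is a minimal sketch of staying with the tree-based approach. HTML::Element offers as_HTML() next to as_text(): the first serializes an element together with its tag and attributes, the second keeps only the text. The sketch assumes HTML::TreeBuilder is installed and parses a string instead of fetching a page; the `$fragment` variable is illustrative and holds a copy of the question's markup shortened to one author.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Illustrative input: the question's fragment, shortened to one author.
my $fragment = <<'HTML';
<ul class="authors">
  <li class="author" itemprop="author">
    <a href="/search?facet-creator=%22charles+l.+fefferman%22" itemprop="name">charles l. fefferman</a>,
  </li>
</ul>
HTML

my $tree = HTML::TreeBuilder->new_from_content($fragment);

# Collect every <a> nested inside an element with class="author".
my @links = map { $_->look_down(_tag => 'a') }
            $tree->look_down(class => 'author');

# as_HTML keeps the tag and attributes; as_text keeps only the text.
my @markup = map { $_->as_HTML } @links;
my @text   = map { $_->as_text } @links;

print "markup: $_\n" for @markup;
print "text:   $_\n" for @text;

# Free the tree explicitly (needed on older HTML::Tree versions).
$tree->delete;
```

Since WWW::Mechanize::TreeBuilder hands back HTML::Element objects from look_down, the same as_HTML() call should work in the loop from the question, though that was not tested here.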