mysql - Fetched website name contains HTML code in Python 2.7


I am facing a problem when running a Python script that downloads company details from business directories: company name, address, location, and web address.

When the script fetches a company's website name (for example www.example.com), it picks up the surrounding HTML code instead of just the name, and that HTML code is what gets stored in the MySQL server.

I am using the following Python libraries: BeautifulSoup, lxml, html, hashlib, and urllib2. The website name ends up in the MySQL server wrapped in HTML code, like this:

<input><tr><td>www.example.com</td></tr></input> 

I want to remove the HTML tags and store only the company's web URL, www.example.com, in the MySQL server.

My code is here:

for hit in soup2.findAll(attrs={'id' : 'website_0'}):
    web = str(hit).replace('<input type="hidden" value="', '')
    web = web.replace('" id="website_0" />', '')
if web == "":
    flog.write("\nwebsite extraction... failed")
    print "none"
else:
    flog.write("\nwebsite extraction... ok")
    print web
    companyobj.setweb(web)

Any solution or suggestion on how to fix this?

You have (at least) two options: using re or BeautifulSoup.

Using re

import re

cleanse_url = re.compile(r'<[^>]*>')  # matches any HTML tag

web = ""  # default, so the check below still works if nothing matches
for hit in soup2.findAll(attrs={'id' : 'website_0'}):
    web = str(hit).replace('<input type="hidden" value="', '')
    web = web.replace('" id="website_0" />', '')
if web == "":
    flog.write("\nwebsite extraction... failed")
    print "none"
else:
    web = cleanse_url.sub('', web)  # strip any remaining HTML tags
    flog.write("\nwebsite extraction... ok")
    print web
    companyobj.setweb(web)
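To see what the pattern does in isolation, here is a minimal sketch that applies the same regular expression to the sample string from the question. The helper name `strip_tags` is made up for this illustration and is not part of the original script.

```python
import re

# Same pattern as above: '<', then any run of characters that are not '>', then '>'.
TAG_PATTERN = re.compile(r'<[^>]*>')


def strip_tags(markup):
    """Remove tag-like substrings and trim surrounding whitespace."""
    return TAG_PATTERN.sub('', markup).strip()


sample = '<input><tr><td>www.example.com</td></tr></input>'
print(strip_tags(sample))
```

Note that this pattern treats the first `>` it meets as the end of a tag, so it can misfire when an attribute value itself contains `>`. For the simple markup shown in the question that should not matter.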

Using BeautifulSoup's Tag.text

I think this option is better, because tag.text strips the tags together with their attributes.

web = ""  # default, so the check below still works if nothing matches
for hit in soup2.findAll(attrs={'id' : 'website_0'}):
    web = hit.text  # let BeautifulSoup return only the text content
if web == "":
    flog.write("\nwebsite extraction... failed")
    print "none"
else:
    flog.write("\nwebsite extraction... ok")
    print web
    companyobj.setweb(web)
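The idea behind this option is that a parser hands back only the text nodes, so tags and attributes never reach the result. The sketch below illustrates that idea with the standard library's HTMLParser instead of BeautifulSoup, purely so it runs without extra packages. It is a stand-in, not how tag.text is implemented, and the names `TextCollector` and `visible_text` are invented for this example.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TextCollector(HTMLParser):
    """Collects text nodes; tags and their attributes are ignored."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # Called only for text between tags.
        self.chunks.append(data)


def visible_text(markup):
    """Return the concatenated text content of markup, trimmed."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered text
    return ''.join(collector.chunks).strip()


sample = '<input type="hidden" id="website_0"><tr><td>www.example.com</td></tr></input>'
print(visible_text(sample))
```

Unlike the regex, this never inspects attribute values as text, which is the property that makes the parser-based route more robust.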
