python 2.7 - Problems using Scrapy to scrape craigslist.org
I'm using the following set-up to scrape the prices from http://puertorico.craigslist.org/search/sya
    def parse(self, response):
        items = Selector(response).xpath("//p[@class='row']")
        for items in items:
            item = StackItem()
            item['prices'] = items.xpath("//a[@class='i']/span[@class='price']/text()").extract()
            item['url'] = items.xpath("//span[@class='href']/@href").extract()
            yield item
My problem is that when I run the script, all of the prices are shown for each item. How can I get only the price of a particular item, together with its URL, for each available item?
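You can try using a relative XPath expression based on the context of the previous one, traversing nodes until you reach the one you want. An expression that starts with // searches the whole document no matter which node it is called on, which is why every item ends up with all the prices. Also, use a different variable in the for loop, like: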
    import scrapy
    from craiglist.items import StackItem

    class CraiglistSpider(scrapy.Spider):
        name = "craiglist_spider"
        allowed_domains = ["puertorico.craigslist.org"]
        start_urls = (
            'http://puertorico.craigslist.org/search/sya',
        )

        def parse(self, response):
            # One selector per listing row.
            items = response.selector.xpath("//p[@class='row']")
            for i in items:
                item = StackItem()
                # "./" keeps each query relative to the current row.
                item['prices'] = i.xpath("./a/span[@class='price']/text()").extract()
                item['url'] = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
                yield item
It yields:
    {'prices': [u'$130'], 'url': [u'/sys/5105448465.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$250'], 'url': [u'/sys/5096083890.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$100'], 'url': [u'/sys/5069848699.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$35'], 'url': [u'/syd/5007870110.html']}
    ...
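To see the difference between a document-wide and a row-relative query without running a spider, here is a minimal sketch that uses only the standard library. It is not the Scrapy API: ElementTree's limited XPath subset stands in for Scrapy's selectors, the document-wide "//" behaviour is emulated by searching from the root, and the markup in SAMPLE is a made-up two-row page, not the real craigslist HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row sample page (an assumption, not real craigslist markup).
SAMPLE = """
<div>
  <p class="row">
    <a class="i" href="/sys/1.html"><span class="price">$130</span></a>
  </p>
  <p class="row">
    <a class="i" href="/sys/2.html"><span class="price">$250</span></a>
  </p>
</div>
""".strip()

root = ET.fromstring(SAMPLE)
rows = root.findall(".//p[@class='row']")

# Document-wide search, emulating what "//..." returns in Scrapy
# regardless of which row the query is called on.
all_prices = [s.text for s in root.findall(".//span[@class='price']")]

# Row-relative search: "./..." starts at the current row only.
per_row = []
for row in rows:
    prices = [s.text for s in row.findall("./a/span[@class='price']")]
    urls = [a.get("href") for a in row.findall("./a")]
    per_row.append({"prices": prices, "url": urls})

print(all_prices)
print(per_row)
```

The document-wide list contains every price on the page, while each entry of per_row holds only the price and link found inside its own row, which is the behaviour the relative expressions give in the spider.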