python 2.7 - Problems using Scrapy to scrape craigslist.org
I'm using the following set-up to scrape the prices from http://puertorico.craigslist.org/search/sya
    def parse(self, response):
        items = Selector(response).xpath("//p[@class='row']")
        for items in items:
            item = StackItem()
            item['prices'] = items.xpath("//a[@class='i']/span[@class='price']/text()").extract()
            item['url'] = items.xpath("//span[@class='href']/@href").extract()
            yield item

My problem is that when I run the script, all of the prices are shown for each item. How can I get the price of a particular item, together with its URL, for each available item?
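You can try using a relative XPath expression based on the context of the previous one, traversing the nodes until you reach the one you want. An expression that starts with // searches the whole document even when it is called on a single row, which is why every item ends up with all of the prices. Also, use a different variable in the for loop so it does not shadow the list it iterates over, like: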
    import scrapy
    from craiglist.items import StackItem


    class CraiglistSpider(scrapy.Spider):
        name = "craiglist_spider"
        allowed_domains = ["puertorico.craigslist.org"]
        start_urls = (
            'http://puertorico.craigslist.org/search/sya',
        )

        def parse(self, response):
            # Select one node per listing row.
            items = response.selector.xpath("//p[@class='row']")
            for i in items:
                item = StackItem()
                # "./" keeps each query relative to the current row.
                item['prices'] = i.xpath("./a/span[@class='price']/text()").extract()
                item['url'] = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
                yield item

It yields:
    {'prices': [u'$130'], 'url': [u'/sys/5105448465.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$250'], 'url': [u'/sys/5096083890.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$100'], 'url': [u'/sys/5069848699.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$35'], 'url': [u'/syd/5007870110.html']}
    ...
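To see the difference between a document-wide query and a row-relative one without running the spider, here is a minimal sketch. It rests on several assumptions: it uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors, the SAMPLE markup is a made-up, simplified stand-in for the listing page, and it is written for Python 3. ElementTree does not accept a leading // on an element, so the document-wide case is emulated by searching from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the listing markup (not the real page).
SAMPLE = """
<div>
  <p class="row">
    <a class="i"><span class="price">$130</span></a>
    <span class="txt"><span class="pl"><a href="/sys/1.html">PC</a></span></span>
  </p>
  <p class="row">
    <a class="i"><span class="price">$250</span></a>
    <span class="txt"><span class="pl"><a href="/sys/2.html">Laptop</a></span></span>
  </p>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = root.findall(".//p[@class='row']")

# Document-wide query: searching from the root returns every price on the
# page, which is what a "//..." expression amounts to in the question's code.
all_prices = [s.text for s in root.findall(".//span[@class='price']")]

# Row-relative queries: "./..." starts at the current row, so each item
# only receives its own price and link.
per_row = []
for row in rows:
    prices = [s.text for s in row.findall("./a/span[@class='price']")]
    urls = [a.get("href")
            for a in row.findall("./span[@class='txt']/span[@class='pl']/a")]
    per_row.append({"prices": prices, "url": urls})

print(all_prices)
print(per_row)
```

The first list contains both prices, while each dictionary in the second holds only the price and link of its own row. In Scrapy the same rule applies to sub-selectors: start the expression with ./ (or .// for any depth) when you want it evaluated relative to the current node.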