python 2.7 - Problems using Scrapy to scrape craigslist.org


I'm using the following setup to scrape prices from http://puertorico.craigslist.org/search/sya:

    def parse(self, response):
        items = Selector(response).xpath("//p[@class='row']")
        for items in items:
            item = StackItem()
            item['prices'] = items.xpath("//a[@class='i']/span[@class='price']/text()").extract()
            item['url'] = items.xpath("//span[@class='href']/@href").extract()
            yield item

My problem is that when I run the script, all of the prices are shown for each item. How can I get the price of a particular item, together with its URL, for each available item?

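You can try using a relative XPath expression based on the context of the previous one, traversing the nodes until you reach the one you want. An expression that starts with // searches the whole document even when it is called on a row selector, which is why every item gets every price. Also, use a different variable in the for loop, like: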

    import scrapy
    from craiglist.items import StackItem

    class CraiglistSpider(scrapy.Spider):
        name = "craiglist_spider"
        allowed_domains = ["puertorico.craigslist.org"]
        start_urls = (
            'http://puertorico.craigslist.org/search/sya',
        )

        def parse(self, response):
            # One selector per listing row.
            items = response.selector.xpath("//p[@class='row']")
            for i in items:
                item = StackItem()
                # The leading "./" keeps each query inside the current row.
                item['prices'] = i.xpath("./a/span[@class='price']/text()").extract()
                item['url'] = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
                yield item

It yields:

    {'prices': [u'$130'], 'url': [u'/sys/5105448465.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$250'], 'url': [u'/sys/5096083890.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$100'], 'url': [u'/sys/5069848699.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$35'], 'url': [u'/syd/5007870110.html']}
    ...
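
To see the difference between a document-wide and a row-relative search without running a spider, here is a minimal sketch using only the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors. The markup is a hypothetical, simplified version of the results page (only the two prices and hrefs are taken from the output above), and ElementTree supports just a subset of XPath, so this illustrates the idea rather than reproducing Scrapy's behaviour exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the search results page.
PAGE = """
<html><body>
  <p class="row">
    <a class="i" href="/sys/5105448465.html"><span class="price">$130</span></a>
    <span class="txt"><span class="pl"><a href="/sys/5105448465.html">listing</a></span></span>
  </p>
  <p class="row">
    <a class="i" href="/sys/5096083890.html"><span class="price">$250</span></a>
    <span class="txt"><span class="pl"><a href="/sys/5096083890.html">listing</a></span></span>
  </p>
</body></html>
"""

root = ET.fromstring(PAGE)
rows = root.findall(".//p[@class='row']")

# Searching from the document root inside the loop (what a leading "//"
# does in Scrapy): every row ends up with every price on the page.
document_wide = [
    [span.text for span in root.findall(".//span[@class='price']")]
    for _ in rows
]

# Searching relative to each row: every row gets only its own price and link.
per_row = [
    {
        "prices": [s.text for s in row.findall("./a/span[@class='price']")],
        "url": [a.get("href")
                for a in row.findall("./span[@class='txt']/span[@class='pl']/a")],
    }
    for row in rows
]

print(document_wide)
print(per_row)
```

The first list repeats both prices for each row, which matches the symptom in the question; the second pairs each price with the URL from the same row.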
