python 2.7 - Problems using Scrapy to scrape craigslist.org -

- July 15, 2012

I'm using the following set-up to scrape the prices from http://puertorico.craigslist.org/search/sya:

    def parse(self, response):
        items = Selector(response).xpath("//p[@class='row']")
        for items in items:
            item = StackItem()
            item['prices'] = items.xpath("//a[@class='i']/span[@class='price']/text()").extract()
            item['url'] = items.xpath("//span[@class='href']/@href").extract()
            yield item

My problem is that when I run the script, all of the prices are shown for each item. How can I get only the price of a particular item, together with its URL, for each available item?

You can try using a relative XPath expression based on the context of the previous one, traversing the nodes until you reach the one you want. Also, use a different variable in the for loop, like:

    import scrapy
    from craiglist.items import StackItem

    class CraiglistSpider(scrapy.Spider):
        name = "craiglist_spider"
        allowed_domains = ["puertorico.craigslist.org"]
        start_urls = (
            'http://puertorico.craigslist.org/search/sya',
        )

        def parse(self, response):
            # One selector per listing row
            items = response.selector.xpath("//p[@class='row']")
            for i in items:
                item = StackItem()
                # The leading "./" keeps each query relative to the current row
                item['prices'] = i.xpath("./a/span[@class='price']/text()").extract()
                item['url'] = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
                yield item

It yields:

    {'prices': [u'$130'], 'url': [u'/sys/5105448465.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$250'], 'url': [u'/sys/5096083890.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$100'], 'url': [u'/sys/5069848699.html']}
    2015-07-05 23:58:22 [scrapy] DEBUG: Scraped from <200 http://puertorico.craigslist.org/search/sya>
    {'prices': [u'$35'], 'url': [u'/syd/5007870110.html']}
    ...
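To see why the relative expression matters, here is a minimal sketch that runs without Scrapy or network access. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup in PAGE is a hypothetical, simplified version of the listing rows (the hrefs and titles are made up). The point it tries to illustrate is that a search over the whole document, repeated inside the loop, returns every price each time (which is what an XPath starting with "//" does when called on a row selector in Scrapy), while a search relative to the current row returns only that row's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the rows queried in the answer.
PAGE = """
<div>
  <p class="row">
    <a class="i"><span class="price">$130</span></a>
    <span class="txt"><span class="pl"><a href="/sys/1.html">PC</a></span></span>
  </p>
  <p class="row">
    <a class="i"><span class="price">$250</span></a>
    <span class="txt"><span class="pl"><a href="/sys/2.html">Laptop</a></span></span>
  </p>
</div>
"""

root = ET.fromstring(PAGE)
rows = root.findall(".//p[@class='row']")

# Document-wide search repeated once per row: every row sees all prices.
document_wide = [
    [span.text for span in root.findall(".//span[@class='price']")]
    for _ in rows
]

# Search relative to the current row: each row sees only its own data.
per_row = [
    {
        "prices": [s.text for s in row.findall("./a/span[@class='price']")],
        "url": [
            a.get("href")
            for a in row.findall("./span[@class='txt']/span[@class='pl']/a")
        ],
    }
    for row in rows
]

print(document_wide)
print(per_row)
```

With two rows, document_wide holds both prices twice, while per_row holds one price and one URL per row. In the spider itself the same effect comes from starting each expression with "./" (or ".//" to search anywhere below the row).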
