python - Exporting to CSV format incorrect in scrapy -

- April 15, 2013

I'm trying to print out a CSV file after scraping, using pipelines, but the formatting is a bit weird: instead of printing from top to bottom, it prints everything at once after scraping page 1, and then all of page 2 in one column. I have attached pipelines.py and one line of the CSV output (it is quite large). How do I make it print column-wise instead of all at once for one page?

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # One output file per spider, named after the spider
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names', 'stars', 'subjects', 'reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        # The exporter writes one CSV row per item
        self.exporter.export_item(item)
        return item

And output.csv:

names   stars   subjects
vivek0388,nikhilvashisth,docsharad,abhimanyu_swarup,suresh n,kaushalhkapadia,jyotimallick,nitin t,mhdmumbai,suniltukrel
(column 2) 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars
(column 3) best stay,awesome view... nice experience!,highly mismanaged , dishonest.,a wonderful experience,good place average front office,honeymoon,awesome resort,amazing,ooty's beauty!!,good stay , food

It should look like this:

vivek0388      5 of 5
nikhilvashisth 5 of 5
docsharad      5 of 5
... and so on
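The single-row output is consistent with each scraped item holding whole lists (all names on the page, all stars on the page, and so on) while the exporter writes one row per item. As far as I know, `CsvItemExporter` joins list values with a comma by default, which would produce exactly this shape. The sketch below only imitates that behaviour with the standard `csv` module; the `one_row_per_item` helper and the sample item are hypothetical, and it is written for Python 3 rather than the Python 2 used in the post.

```python
import csv
import io

# Hypothetical item shaped like the one in the question:
# every field is a list covering the whole page.
item = {
    "names": ["vivek0388", "nikhilvashisth", "docsharad"],
    "stars": ["5 of 5 stars", "4 of 5 stars", "1 of 5 stars"],
}


def one_row_per_item(item, fields):
    """Imitate an exporter that writes one row per item, joining list values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    # Each list collapses into a single comma-joined cell
    writer.writerow([",".join(item[f]) for f in fields])
    return buf.getvalue()


output = one_row_per_item(item, ["names", "stars"])
print(output)
```

The header is followed by a single data row, with every name packed into the first cell and every rating into the second.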

Edit:

items = [{'reviews:': "", 'subjects:': "", 'names:': "", 'stars:': ""} for k in range(1000)]
if (sites and len(sites) > 0):
    for site in sites:
        i += 1
        items[i]['names'] = item['names']
        items[i]['stars'] = item['stars']
        items[i]['subjects'] = item['subjects']
        items[i]['reviews'] = item['reviews']
        yield Request(url="http://tripadvisor.in" + site, callback=self.parse)
    for k in range(1000):
        yield items[k]

Figured it out: zip the lists, loop through the result, and write each row with csv. It was much less complicated once I read the docs.

import csv
import itertools


class CSVPipeline(object):

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'wb'), delimiter=',')
        # Header row
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, ampa):
        # Each field is a list for the whole page; zip pairs them up per review
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item
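To see what the `zip` step does outside of Scrapy, here is a minimal standalone sketch. It assumes Python 3 and writes to an in-memory buffer instead of `items.csv`; the `write_rows` helper and the sample item are hypothetical, not part of the pipeline above.

```python
import csv
import io


def write_rows(item, fields, out):
    """Write one CSV row per review by zipping the item's parallel lists."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    # zip(*lists) turns column lists into row tuples
    for row in zip(*(item[f] for f in fields)):
        writer.writerow(row)


# Hypothetical item: each field holds a list for the whole page
item = {
    "names": ["vivek0388", "nikhilvashisth", "docsharad"],
    "stars": ["5 of 5 stars", "4 of 5 stars", "1 of 5 stars"],
}

buf = io.StringIO()
write_rows(item, ["names", "stars"], buf)
result = buf.getvalue()
print(result)
```

One thing to keep in mind: `zip` stops at the shortest list, so if one field is missing an entry for some review, the trailing rows are silently dropped.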
