python - Exporting to CSV format incorrect in scrapy
I'm trying to print out a CSV file after scraping, using pipelines, but the formatting is a bit weird: instead of printing from top to bottom, it prints everything at once after scraping page 1, and then all of page 2, in one column. I have attached pipelines.py and one line from the CSV output (it is quite large). How do I make it print column-wise instead of all at once for one page?
pipelines.py
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

from scrapy import signals
from scrapy.contrib.exporter import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        self.exporter.fields_to_export = ['names', 'stars', 'subjects', 'reviews']
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
And the output.csv:
names stars subjects
vivek0388,nikhilvashisth,docsharad,abhimanyu_swarup,suresh n,kaushalhkapadia,jyotimallick,nitin t,mhdmumbai,suniltukrel (column 2) 5 of 5 stars,4 of 5 stars,1 of 5 stars,5 of 5 stars,3 of 5 stars,4 of 5 stars,5 of 5 stars,5 of 5 stars,4 of 5 stars,4 of 5 stars (column 3) best stay,awesome view... nice experience!,highly mismanaged , dishonest.,a wonderful experience,good place average front office,honeymoon,awesome resort,amazing,ooty's beauty!!,good stay , food
It should look like this:
vivek0388       5 of 5
nikhilvashisth  5 of 5
docsharad       5 of 5
... and so on
Edit:
items = [{'reviews:': "", 'subjects:': "", 'names:': "", 'stars:': ""} for k in range(1000)]
if (sites and len(sites) > 0):
    for site in sites:
        i += 1
        items[i]['names'] = item['names']
        items[i]['stars'] = item['stars']
        items[i]['subjects'] = item['subjects']
        items[i]['reviews'] = item['reviews']
        yield Request(url="http://tripadvisor.in" + site, callback=self.parse)
for k in range(1000):
    yield items[k]
Figured it out: use the csv module, zip the field lists together, loop through the result, and write each row. It is much less complicated once you read the docs.
import csv
import itertools


class CSVPipeline(object):

    def __init__(self):
        # 'wb' is the Python 2 form; on Python 3 open with mode 'w' and newline=''
        self.csvwriter = csv.writer(open('items.csv', 'wb'), delimiter=',')
        # Header row
        self.csvwriter.writerow(['names', 'stars', 'subjects', 'reviews'])

    def process_item(self, item, ampa):
        # Each field holds a list with one entry per review; zip pairs them up row by row
        rows = zip(item['names'], item['stars'], item['subjects'], item['reviews'])
        for row in rows:
            self.csvwriter.writerow(row)
        return item
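To see why zipping fixes the layout, here is a minimal standalone sketch of the same idea, written for Python 3 and run against an in-memory buffer instead of a file. The helper names (`item_to_rows`, `write_item`) and the sample item are made up for illustration; the review texts in particular are placeholders, not real scraped data.

```python
import csv
import io

FIELDS = ['names', 'stars', 'subjects', 'reviews']


def item_to_rows(item, fields=FIELDS):
    """Turn one item whose fields are parallel lists into one tuple per review."""
    return list(zip(*(item[field] for field in fields)))


def write_item(writer, item, fields=FIELDS):
    """Write each zipped tuple as its own CSV row."""
    for row in item_to_rows(item, fields):
        writer.writerow(row)


# Hypothetical item shaped like the scraped one: every field is a list
sample_item = {
    'names': ['vivek0388', 'nikhilvashisth'],
    'stars': ['5 of 5 stars', '4 of 5 stars'],
    'subjects': ['best stay', 'a wonderful experience'],
    'reviews': ['review text 1', 'review text 2'],
}

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)          # header row
write_item(writer, sample_item)  # one row per review
output_lines = buffer.getvalue().splitlines()

for line in output_lines:
    print(line)
```

Note that `zip` stops at the shortest list, so if one field was scraped with fewer entries than the others, the trailing reviews are silently dropped; `itertools.zip_longest` is an option if you would rather pad the missing values.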