python 2.7 - Using Scrapy's FormRequest, no form is submitted
After trying Scrapy's first tutorial I was excited about it, so I wanted to try form submission as well.
I have the following script, but if I print out response.body I still get the page with the form: nothing happened. Can someone help me get the results page?
# spiders/holidaytaxi.py
import scrapy
from scrapy.http import Request, FormRequest
from scrapy.selector import HtmlXPathSelector, Selector


class HolidaytaxiSpider(scrapy.Spider):
    name = "holidaytaxi"
    allowed_domains = ["holidaytaxis.com"]
    start_urls = ['http://holidaytaxis.com/en']

    def parse(self, response):
        return [FormRequest.from_response(
            response,
            formdata={
                'bookingtypeid': 'return',
                'airpotzgroupid_chosen': 'Turkey',
                'pickup_chosen': 'Antalya Airport',
                'dropoff_chosen': 'Alanya',
                'arrivaldata': '12-07-2015',
                'arrivalhour': '12',
                'arrivalmin': '00',
                'departuredata': '14-07-2015',
                'departurehour': '12',
                'departuremin': '00',
                'adults': '2',
                'children': '0',
                'infants': '0'
            },
            callback=self.parseresponse
        )]

    def parseresponse(self, response):
        print "hello world"
        print response.status
        print response
        heading = response.xpath('//div/h2')
        print "heading: ", heading
The output is:
2015-07-05 16:23:59 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-07-05 16:24:01 [scrapy] DEBUG: Redirecting (301) to <GET http://www.holidaytaxis.com/en> from <GET http://holidaytaxis.com/en>
2015-07-05 16:24:02 [scrapy] DEBUG: Crawled (200) <GET http://www.holidaytaxis.com/en> (referer: None)
2015-07-05 16:24:03 [scrapy] DEBUG: Crawled (200) <POST http://www.holidaytaxis.com/en/search> (referer: http://www.holidaytaxis.com/en)
hello world
200
<200 http://www.holidaytaxis.com/en/search>
heading:  []
The main problem is in how you are passing the booking type, country, pickup and dropoff. You need to pass the corresponding "id"s instead of literal strings.
The following would work in your case:
return FormRequest.from_response(
    response,
    formxpath="//form[@id='transfer_search']",  # select the search form explicitly
    formdata={
        # internal ids, not the display names shown in the dropdowns
        'bookingtypeid': '1',
        'airportgroupid': '14',
        'pickup': '121',
        'dropoff': '1076',
        'arrivaldate': '12-07-2015',
        'arrivalhour': '12',
        'arrivalmin': '00',
        'departuredate': '14-07-2015',
        'departurehour': '12',
        'departuremin': '00',
        'adults': '2',
        'children': '0',
        'infants': '0',
        'submit': 'get quote'
    },
    callback=self.parseresponse
)
Note that I've also fixed the arrivaldate and departuredate parameter names.
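Since the server only understands ids, one way to keep the spider readable is to write the human-readable choices once and translate them right before building formdata. This is a minimal sketch: resolve_ids and the lookups tables are hypothetical names, and the tables are filled by hand with the ids from the working request above. In a real spider they would be built from the search page (countries) and from the pickup/dropoff lookups. An unknown name raises a KeyError listing the valid choices, which is easier to debug than a form that is silently not accepted.

```python
def resolve_ids(choices, lookups):
    """Translate {field: human-readable name} into {field: id}.

    `lookups` maps each form field to its own {name: id} dictionary.
    """
    resolved = {}
    for field, name in choices.items():
        table = lookups[field]
        if name not in table:
            # Fail loudly instead of posting a value the server will ignore.
            raise KeyError("%r is not a valid %s; choose one of %s"
                           % (name, field, sorted(table)))
        resolved[field] = table[name]
    return resolved


# Hypothetical lookup tables, filled with the ids used in the working request.
lookups = {
    "airportgroupid": {"Turkey": "14"},
    "pickup": {"Antalya Airport": "121"},
    "dropoff": {"Alanya": "1076"},
}

formdata = resolve_ids(
    {"airportgroupid": "Turkey", "pickup": "Antalya Airport", "dropoff": "Alanya"},
    lookups,
)
# Fields that are not dropdown choices can be added directly.
formdata.update({"bookingtypeid": "1", "arrivaldate": "12-07-2015"})
print(formdata["pickup"])  # 121
```

The resulting dictionary can then be passed as formdata to FormRequest.from_response.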
You may want to ask how I got these ids. Good question: I used the browser developer tools and studied the outgoing POST request issued when the search form is submitted.
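When comparing your spider's request with the one shown in the developer tools, it helps to look at the raw urlencoded body. The sketch below (Python 3 standard library only) roughly approximates what FormRequest.from_response does: it keeps the field values already present in the form, lets formdata override them, and urlencodes the result, ignoring details such as which submit button gets clicked. build_post_body and diff_bodies are hypothetical helpers, and the form_defaults and browser body are made-up values for illustration.

```python
from urllib.parse import urlencode, parse_qsl


def build_post_body(form_defaults, formdata):
    """Approximate from_response: form fields are kept unless formdata
    overrides them; the merged fields are urlencoded."""
    fields = dict(form_defaults)  # copy so the caller's dict is untouched
    fields.update(formdata)       # formdata wins on name clashes
    return urlencode(sorted(fields.items()))  # sorted for a stable body


def diff_bodies(ours, browsers):
    """Return {field: (our_value, browser_value)} for every differing field.

    `browsers` is the request body copied from the developer tools.
    """
    a = dict(parse_qsl(ours, keep_blank_values=True))
    b = dict(parse_qsl(browsers, keep_blank_values=True))
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


# Made-up values: the real defaults come from the page's form inputs.
form_defaults = {"bookingtypeid": "1", "adults": "1"}
ours = build_post_body(form_defaults, {"airportgroupid": "turkey", "adults": "2"})
browser = "bookingtypeid=1&airportgroupid=14&adults=2"
print(diff_bodies(ours, browser))  # {'airportgroupid': ('turkey', '14')}
```

A difference like the one printed here (a display name where the browser sends an id) is exactly the kind of mismatch that makes the server return the search form again.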
Now, the real problem is how to get these ids in your Scrapy code. Booking types are easy to handle: there are just 3 types, with ids 1 to 3. The list of countries is available on the same search form page, in the select tag with id="airportgroupid", so you can construct a mapping dictionary between a country name and its internal id, e.g.:
# map each option's label (country name) to its value (internal id)
countries = {
    option.xpath("@label").extract()[0]: option.xpath("@value").extract()[0]
    for option in response.xpath("//select[@id='airportgroupid']//option")
}
country_id = countries["Turkey"]
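If you want to check the mapping outside Scrapy, for example against a saved copy of the search page, the same idea can be written with the Python 3 standard library's html.parser. This is a sketch under the same assumption as the snippet above: each option inside the select with id="airportgroupid" carries the country name in its label attribute and the id in value. Unlike extract()[0], it skips options missing either attribute (such as a placeholder) instead of raising IndexError. SelectOptionsParser, option_map and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class SelectOptionsParser(HTMLParser):
    """Collect {label: value} for <option> tags inside <select id=select_id>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.inside = False  # True while we are inside the wanted <select>
        self.options = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.inside = True
        elif tag == "option" and self.inside:
            label, value = attrs.get("label"), attrs.get("value")
            if label and value:  # skip placeholder options
                self.options[label] = value

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


def option_map(html, select_id="airportgroupid"):
    parser = SelectOptionsParser(select_id)
    parser.feed(html)
    parser.close()
    return parser.options


# Made-up markup shaped like a country <select>; options may sit in <optgroup>s.
sample = """
<select id="airportgroupid">
  <option value="">Select a country</option>
  <optgroup label="Europe">
    <option label="Turkey" value="14">Turkey</option>
  </optgroup>
</select>
<select id="other"><option label="Spain" value="99">Spain</option></select>
"""
countries = option_map(sample)
print(countries["Turkey"])  # 14
```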
It gets more difficult for the pickup and dropoff locations: they depend on the booking type and the country, and are retrieved with additional XHR requests to the "http://www.holidaytaxis.com/en/search/getpickup" and "http://www.holidaytaxis.com/en/search/getdropoff" endpoints.
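A sketch of how the pickup lookup could be reproduced. Only the endpoint URL comes from the answer; everything else is an assumption for illustration: that the endpoint accepts a GET request, the query parameter names (bookingtypeid, airportgroupid), and a JSON response shaped like [{"id": ..., "name": ...}]. Check the real XHR request and response in the developer tools before relying on any of it. pickup_url and locations_from_json are hypothetical helpers.

```python
import json
from urllib.parse import urlencode

# Endpoint from the answer; the query parameter names below are HYPOTHETICAL.
PICKUP_URL = "http://www.holidaytaxis.com/en/search/getpickup"


def pickup_url(booking_type_id, country_id):
    """Build a GET URL for the pickup lookup (parameter names assumed)."""
    query = urlencode({"bookingtypeid": booking_type_id,
                       "airportgroupid": country_id})
    return PICKUP_URL + "?" + query


def locations_from_json(body):
    """Map location name -> id, ASSUMING a JSON list of {"id", "name"} objects."""
    return {item["name"]: str(item["id"]) for item in json.loads(body)}


print(pickup_url("1", "14"))
# Made-up response body in the assumed format:
sample_body = '[{"id": 121, "name": "Antalya Airport"}]'
print(locations_from_json(sample_body)["Antalya Airport"])  # 121
```

In a spider, the callback that parses the search page could yield scrapy.Request(pickup_url(...), callback=self.parse_pickups), where parse_pickups is a hypothetical callback that calls locations_from_json(response.body) and then builds the final FormRequest. The dropoff lookup would follow the same pattern.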