csv - Looping through names on a list to run through another program in Python
I've been trying to run a list of Twitter user names through a Python script that downloads their tweet history from Twitter's API. I have the user names in a CSV file, which I tried to import into a list and then pass through the script one by one using a for loop. However, I'm getting an error that suggests the entire list is being dumped into the script at once:
<ipython-input-24-d7d2e882d84c> in get_all_tweets(screen_name)
     60
     61     #write the csv
---> 62     with open('%s_tweets.csv' % screen_name, 'wb') as f:
     63         writer = csv.writer(f)
     64         writer.writerow(["id","created_at","text"])

IOError: [Errno 36] File name too long: '0 tonyabbottmhr\n1 albomp\n2 johnalexandermp\n3 karenandrewsmp\n4
For brevity's sake, I'm including the list directly in the code; the part that imports the names from the CSV into the list is commented out.
Apologies: in order to run the script, one needs Twitter API credentials. The code is below:
#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
import csv
import os
import pandas as pd

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

os.chdir('file/dir/path')

mps = ['tonyabbottmhr', 'albomp', 'johnalexandermp', 'karenandrewsmp']

#df = pd.read_csv('twitmp.csv')
#for row in df:
#    mps.append(df.accname)

def get_all_tweets(screen_name):
    #Twitter only allows access to a user's most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)

    pass

if __name__ == '__main__':
    #pass in the username of the account you want to download
    for i in range(len(mps)):
        get_all_tweets(mps[i])
It seems that this portion of the code is what's giving me trouble:

#df = pd.read_csv('twitmp.csv')
#for row in df:
#    mps.append(df.accname)
Here are the problems.
Problem 1
When one iterates over a DataFrame object, one iterates over its column names, which is not what you want here. You can see this by running list(df), which returns the list of column names.
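To see this in isolation, here is a small sketch. It is written for Python 3 (unlike the Python 2 script in the question) and uses a made-up frame standing in for twitmp.csv; the `party` column is an assumption added only so that there is more than one column to look at.

```python
import pandas as pd

# Hypothetical stand-in for twitmp.csv; only `accname` comes from the question.
df = pd.DataFrame({
    "accname": ["tonyabbottmhr", "albomp", "johnalexandermp"],
    "party": ["a", "b", "a"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [col for col in df]
print(labels)    # ['accname', 'party']
print(list(df))  # the same list of column labels
```

So a `for row in df:` loop runs once per column, and the loop variable is just a column name.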
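Problem 2
When you append df.accname, you are in fact appending the entire column. In the end, mps becomes a list of DataFrame columns (pandas Series), each element identical and equal to df.accname. That is why the file name in your error message contains the whole column, index numbers included: the script formats an entire Series into the name instead of a single user name.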
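The effect can be reproduced without touching the Twitter API. The following is a rough sketch in Python 3, assuming a tiny frame with the same `accname` column as in the question; the row values are just the names visible in the error message.

```python
import pandas as pd

# Assumed sample data; the real file has more rows.
df = pd.DataFrame({"accname": ["tonyabbottmhr", "albomp", "johnalexandermp"]})

mps = []
for row in df:              # runs once per column label
    mps.append(df.accname)  # appends the whole column every time

first = mps[0]
# Same formatting as in get_all_tweets: the Series is converted to its
# multi-line text form, index and all.
filename = "%s_tweets.csv" % first

print(type(first).__name__)  # Series
print("\n" in filename)      # True: the "file name" spans several lines
```

With a long enough column, that string exceeds the file system's limit on file name length, which matches the `[Errno 36]` in the question.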
Solution
All you need is:

df = pd.read_csv('twitmp.csv')
mps = df.accname.tolist()
# or df.accname.astype(str).tolist() if they aren't strings, which they should be
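If you want to check this without the real file, a sketch along these lines should do (Python 3). The CSV text is invented for illustration and is read from memory via `io.StringIO`; `df["accname"]` is simply the bracket spelling of `df.accname`.

```python
import io
import pandas as pd

# Hypothetical file contents; in the question this would be twitmp.csv on disk.
csv_text = "accname\ntonyabbottmhr\nalbomp\njohnalexandermp\n"

df = pd.read_csv(io.StringIO(csv_text))

# Convert the column to a plain Python list of strings.
mps = df["accname"].astype(str).tolist()

print(mps)  # ['tonyabbottmhr', 'albomp', 'johnalexandermp']
```

Each element of `mps` is now an ordinary string, so `'%s_tweets.csv' % name` produces one short file name per account.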
Bonus
When you loop over mps, try using enumerate. It gives you two variables, and the code is cleaner, in my opinion:

for i, name in enumerate(mps):
    get_all_tweets(name)

You can still use the index of the name (i) within each iteration, if you need it.
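As one possible use of the index, here is a sketch in Python 3 that prints a progress counter. The `get_all_tweets` below is a stand-in that only builds the output file name, not the real downloader from the question, and starting the count at 1 is just a choice for nicer output.

```python
def get_all_tweets(screen_name):
    # Stand-in for the real function; it only returns the target file name.
    return "%s_tweets.csv" % screen_name

mps = ["tonyabbottmhr", "albomp", "johnalexandermp"]

results = []
for i, name in enumerate(mps, start=1):
    # i counts the accounts from 1; name is a single screen name.
    print("[%d/%d] downloading %s" % (i, len(mps), name))
    results.append(get_all_tweets(name))
```

If the index is not needed at all, a plain `for name in mps:` does the same job.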