python - Add column of empty lists to DataFrame


Similar to the question "How to add an empty column to a dataframe?", I am interested in knowing the best way to add a column of empty lists to a DataFrame.

What I am trying to do is initialize a column and, as I iterate over the rows to process some of them, add a filled list in this new column to replace the initialized value.

For example, if below is my initial DataFrame:

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})  # sample DataFrame

>>> df
   a  b
0  1  5
1  2  6
2  3  7

Then I want to end up with something like this, where each row has been processed separately (sample results shown):

>>> df
   a  b          c
0  1  5     [5, 6]
1  2  6     [9, 0]
2  3  7  [1, 2, 3]

Of course, if I try to initialize with df['e'] = [] as I would with any other constant, it thinks I am trying to add a sequence of items with length 0, and hence fails.
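To make that failure concrete, here is a minimal sketch. It assumes a reasonably recent pandas, and the exact error text varies between versions, so only the exception type is checked:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

try:
    # A bare [] is read as a sequence of 0 values for a 3-row frame,
    # not as a constant to broadcast to every row.
    df['e'] = []
    failed = False
except ValueError as exc:
    failed = True
    print("assignment rejected:", exc)
```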

If I try initializing the new column as None or NaN, I run into the following issues when trying to assign a list to a location.

df['d'] = None

>>> df
   a  b     d
0  1  5  None
1  2  6  None
2  3  7  None

Issue 1 (it would be perfect if I could get this approach to work! Maybe it is something trivial I am missing):

>>> df.loc[0,'d'] = [1,3]
...
ValueError: Must have equal len keys and value when setting with an iterable

Issue 2 (this one works, but not without a warning, because it is not guaranteed to work as intended):

>>> df['d'][0] = [1,3]
C:\Python27\Scripts\ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

Hence I resort to initializing with empty lists and extending them as needed. There are a couple of methods I can think of to initialize this way, but is there a more straightforward way?
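The initialize-then-extend workflow described here can be sketched as follows. The process helper is hypothetical and only stands in for whatever per-row work is needed. The sketch assumes each cell holds its own list object, so extending one affects only that row:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

# One distinct empty list per row.
df['c'] = [[] for _ in range(len(df))]


def process(a, b):
    """Hypothetical per-row computation; replace with the real logic."""
    return [a + b, a * b]


for idx in df.index:
    a = int(df.at[idx, 'a'])
    b = int(df.at[idx, 'b'])
    # df.at returns the list stored in the cell, so extend() fills it in place.
    df.at[idx, 'c'].extend(process(a, b))

print(df)
```

Because the list is mutated in place, nothing has to be assigned back to the DataFrame inside the loop.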

Method 1:

df['empty_lists1'] = [list() for x in range(len(df.index))]

>>> df
   a  b   empty_lists1
0  1  5             []
1  2  6             []
2  3  7             []

Method 2:

df['empty_lists2'] = df.apply(lambda x: [], axis=1)

>>> df
   a  b   empty_lists1   empty_lists2
0  1  5             []             []
1  2  6             []             []
2  3  7             []             []

Summary of questions:

Is there any minor syntax change that can be made in Issue 1 that would allow a list to be assigned to a None/NaN-initialized field?

If not, what is the best way to initialize a new column with empty lists?
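On the first question, one option worth trying is the scalar accessor df.at in place of df.loc. This is only a sketch. It assumes the column has object dtype (which it does after df['d'] = None) and a reasonably recent pandas, and older versions may behave differently:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
df['d'] = None  # object-dtype column, so a cell can hold any Python object

# .at addresses exactly one cell, so the list is stored as a single value
# instead of being unpacked across several cells.
df.at[0, 'd'] = [1, 3]

print(df)
```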

One more way is to use np.empty:

df['empty_list'] = np.empty((len(df), 0)).tolist() 
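For anyone wondering why this works: an array of shape (n, 0) has n rows with no elements, and tolist() turns each row into its own empty list. A quick check, using n = 3 purely for illustration:

```python
import numpy as np

# Shape (3, 0): three rows, zero columns.
rows = np.empty((3, 0)).tolist()

print(rows)
# Each row becomes a separate list object, not a shared one.
print(rows[0] is rows[1])
```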

You could also knock off .index in your "Method 1" when trying to find the len of df.

df['empty_list'] = [[] for _ in range(len(df))]
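A note on why a comprehension is used rather than the shorter [[]] * len(df): as far as I can tell, list multiplication repeats a reference to one single list, so every row would end up sharing it. A small sketch of the difference (the column names are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

df['shared'] = [[]] * len(df)                  # one list referenced three times
df['separate'] = [[] for _ in range(len(df))]  # three independent lists

# Append to the first row only in each column.
df.at[0, 'shared'].append('x')
df.at[0, 'separate'].append('x')

shared_values = df['shared'].tolist()
separate_values = df['separate'].tolist()
print(shared_values)
print(separate_values)
```

In the shared column the appended item shows up in every row, while in the comprehension-built column only the first row changes.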

Turns out, np.empty is faster...

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(pd.np.random.rand(1000000, 5))

In [3]: timeit df['empty1'] = pd.np.empty((len(df), 0)).tolist()
10 loops, best of 3: 127 ms per loop

In [4]: timeit df['empty2'] = [[] for _ in range(len(df))]
10 loops, best of 3: 193 ms per loop

In [5]: timeit df['empty3'] = df.apply(lambda x: [], axis=1)
1 loops, best of 3: 5.89 s per loop
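The session above relies on the pd.np alias, which newer pandas releases no longer provide. Below is a sketch of the same comparison with NumPy imported directly and the standard timeit module. The function names and the much smaller row count are my own choices, and no particular timings are implied:

```python
import timeit

import numpy as np
import pandas as pd

n_rows = 2000  # far fewer than the 1,000,000 rows above, so this runs quickly
df = pd.DataFrame(np.random.rand(n_rows, 5))


def with_np_empty():
    df['empty1'] = np.empty((len(df), 0)).tolist()


def with_comprehension():
    df['empty2'] = [[] for _ in range(len(df))]


def with_apply():
    df['empty3'] = df.apply(lambda x: [], axis=1)


for func in (with_np_empty, with_comprehension, with_apply):
    # Average over a few runs; absolute numbers depend on machine and versions.
    seconds = timeit.timeit(func, number=3) / 3
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per loop")
```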
