python - Add column of empty lists to DataFrame
Similar to the question "How to add an empty column to a DataFrame?", I am interested in knowing the best way to add a column of empty lists to a DataFrame.
What I am trying to do is initialize a column and, as I iterate over the rows to process some of them, add a filled list in this new column to replace the initialized value.
For example, if below is my initial DataFrame:
df = pd.DataFrame({'a': [1,2,3], 'b': [5,6,7]})  # sample DataFrame
>>> df
   a  b
0  1  5
1  2  6
2  3  7
Then I want to end up with something like this, where each row has been processed separately (sample results shown):
>>> df
   a  b          c
0  1  5     [5, 6]
1  2  6     [9, 0]
2  3  7  [1, 2, 3]
Of course, if I try to initialize with df['e'] = [], as I would with any other constant, pandas thinks I am trying to add a sequence of items of length 0, and hence it fails.
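A minimal reproduction of that failure, as a sketch. The exact error message differs between pandas versions, so only the exception type is relied on here.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

failed = False
try:
    # A bare list is read as the column's values, one item per row,
    # so a length-0 list cannot fill a 3-row column.
    df['e'] = []
except ValueError as exc:
    failed = True
    print(type(exc).__name__, exc)
```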
If I try initializing the new column as None or NaN, I run into the following issues when trying to assign a list to a location.
df['d'] = None
>>> df
   a  b     d
0  1  5  None
1  2  6  None
2  3  7  None
Issue 1 (it would be perfect if I could get this approach to work! Maybe I am missing something trivial):
>>> df.loc[0,'d'] = [1,3]
...
ValueError: Must have equal len keys and value when setting with an iterable
Issue 2 (this one works, but not without a warning, because it is not guaranteed to work as intended):
>>> df['d'][0] = [1,3]
C:\Python27\Scripts\ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
Hence I resort to initializing with empty lists and extending them as needed. There are a couple of methods I can think of to initialize this way, but is there a more straightforward way?
Method 1:
df['empty_lists1'] = [list() for x in range(len(df.index))]
>>> df
   a  b empty_lists1
0  1  5           []
1  2  6           []
2  3  7           []
Method 2:
df['empty_lists2'] = df.apply(lambda x: [], axis=1)
>>> df
   a  b empty_lists1 empty_lists2
0  1  5           []           []
1  2  6           []           []
2  3  7           []           []
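The initialize-then-extend workflow described above could look roughly like the sketch below. The `process` function is a hypothetical stand-in for the real per-row computation, and the column name `c` is taken from the sample result.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
df['c'] = [[] for _ in range(len(df))]  # one independent list per row


def process(a, b):
    # Hypothetical placeholder for the real per-row processing.
    return [a, b, a + b]


for idx in df.index:
    # .at returns the list object stored in the cell, so extend()
    # mutates it in place and no reassignment is needed.
    df.at[idx, 'c'].extend(process(int(df.at[idx, 'a']), int(df.at[idx, 'b'])))

print(df)
```

Because the lists are mutated in place, this avoids assigning a list to a cell altogether.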
Summary of questions:
Is there any minor syntax change to Issue 1 that would allow a list to be assigned to a None/NaN-initialized field?
If not, what is the best way to initialize a new column with empty lists?
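On the first question, one option worth trying is the scalar accessor `DataFrame.at`, which targets a single cell. This is a sketch that assumes the column was initialized with None (and so has object dtype); it is not taken from the answer that follows, and the behavior may differ for a NaN-initialized float column.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
df['d'] = None  # object-dtype column, so a cell can hold any Python object

# .at addresses exactly one cell, so the list is stored as a single value
# instead of being matched element-by-element against a selection.
df.at[0, 'd'] = [1, 3]

print(df)
```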
One more way is to use np.empty:
df['empty_list'] = np.empty((len(df), 0)).tolist()
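To see why this works: an array of shape (n, 0) has n rows and no columns, so `tolist()` turns it into n separate empty lists. A small check, with n = 3 chosen only for illustration:

```python
import numpy as np

rows = np.empty((3, 0)).tolist()

print(rows)                # [[], [], []]
print(rows[0] is rows[1])  # False: each row is its own list object
```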
You could also knock off .index in your "Method 1" when finding the len of df.
df['empty_list'] = [[] for _ in range(len(df))]
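The comprehension matters because it builds a new list for every row. The shorter-looking `[[]] * len(df)` repeats a reference to one list, so a change to any row shows up in all of them. A sketch of the difference (the column names `shared` and `separate` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

df['shared'] = [[]] * len(df)                  # three references to ONE list
df['separate'] = [[] for _ in range(len(df))]  # three distinct lists

df.at[0, 'shared'].append('x')
df.at[0, 'separate'].append('x')

print(df['shared'].tolist())    # [['x'], ['x'], ['x']]
print(df['separate'].tolist())  # [['x'], [], []]
```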
Turns out, np.empty is faster...
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(pd.np.random.rand(1000000, 5))
In [3]: timeit df['empty1'] = pd.np.empty((len(df), 0)).tolist()
10 loops, best of 3: 127 ms per loop
In [4]: timeit df['empty2'] = [[] for _ in range(len(df))]
10 loops, best of 3: 193 ms per loop
In [5]: timeit df['empty3'] = df.apply(lambda x: [], axis=1)
1 loops, best of 3: 5.89 s per loop
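To repeat the comparison outside IPython, a sketch using the standard `timeit` module is given below. It imports NumPy directly, since the `pd.np` alias was removed in pandas 2.0, and it uses a smaller frame (10,000 rows, an arbitrary choice) so the `apply` variant finishes quickly. The function names are illustrative, and no timings are claimed here; results depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10_000, 5))  # smaller than the original 1,000,000 rows


def with_np_empty():
    df['empty1'] = np.empty((len(df), 0)).tolist()


def with_comprehension():
    df['empty2'] = [[] for _ in range(len(df))]


def with_apply():
    df['empty3'] = df.apply(lambda row: [], axis=1)


# Best of three single runs per variant, in seconds.
timings = {
    func.__name__: min(timeit.repeat(func, number=1, repeat=3))
    for func in (with_np_empty, with_comprehension, with_apply)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```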