Separating elements of a Pandas DataFrame in Python

- August 15, 2014

I have a pandas DataFrame that looks like the following:

        time  measurement
    0      0            1
    1      1            2
    2      2            3
    3      3            4
    4      4            5
    5      0            2
    6      1            3
    7      2            4
    8      3            5
    9      4            6
    10     0            3
    11     1            4
    12     2            5
    13     3            6
    14     4            7
    15     0            1
    16     1            2
    17     2            3
    18     3            4
    19     4            5
    20     0            2
    21     1            3
    22     2            4
    23     3            5
    24     4            6
    25     0            3
    26     1            4
    27     2            5
    28     3            6
    29     4            7

which can be generated with the following code:

    import pandas

    time = [0, 1, 2, 3, 4]

    # Experimental repeat 1: three sub-experiments starting at 1, 2 and 3
    repeat_1_conc_1 = [1, 2, 3, 4, 5]
    repeat_1_conc_2 = [2, 3, 4, 5, 6]
    repeat_1_conc_3 = [3, 4, 5, 6, 7]
    d1 = pandas.DataFrame([time, repeat_1_conc_1]).transpose()
    d2 = pandas.DataFrame([time, repeat_1_conc_2]).transpose()
    d3 = pandas.DataFrame([time, repeat_1_conc_3]).transpose()

    # Experimental repeat 2: same starting conditions
    repeat_2_conc_1 = [1, 2, 3, 4, 5]
    repeat_2_conc_2 = [2, 3, 4, 5, 6]
    repeat_2_conc_3 = [3, 4, 5, 6, 7]
    d4 = pandas.DataFrame([time, repeat_2_conc_1]).transpose()
    d5 = pandas.DataFrame([time, repeat_2_conc_2]).transpose()
    d6 = pandas.DataFrame([time, repeat_2_conc_3]).transpose()

    # Stack all six time courses and replace the repeated 0-4 index with 0-29
    df = pandas.concat([d1, d2, d3, d4, d5, d6]).reset_index()
    df.drop('index', axis=1, inplace=True)
    df.columns = ['time', 'measurement']
    print(df)

If you look at the code, you'll see that I have 2 experimental repeats in the same DataFrame, which should be separated at df.iloc[:15]. Additionally, within each experiment I have 3 sub-experiments that can be thought of as the starting conditions of a dose response, i.e. the first sub-experiment starts at 1, the second at 2 and the third at 3. These should be separated at index intervals of `len(time)`, i.e. 0-4 (5 elements), within each experimental repeat. Can you please tell me the best way to separate this data into the individual time-course measurements of each experiment? I'm not sure what the best data structure to use is, but I need to be able to access the data of each sub-experiment of each experimental repeat easily. Perhaps something like:

    repeat1 =
        time  measurement
    0      0            1
    1      1            2
    2      2            3
    3      3            4
    4      4            5

    5      0            2
    6      1            3
    7      2            4
    8      3            5
    9      4            6

    10     0            3
    11     1            4
    12     2            5
    13     3            6
    14     4            7

    repeat2 =
        time  measurement
    15     0            1
    16     1            2
    17     2            3
    18     3            4
    19     4            5

    20     0            2
    21     1            3
    22     2            4
    23     3            5
    24     4            6

    25     0            3
    26     1            4
    27     2            5
    28     3            6
    29     4            7
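A plain dictionary of slices is one way to get the `repeat1`/`repeat2` structure sketched above. The sketch below assumes the layout described in the question (2 repeats × 3 sub-experiments × `len(time)` rows, stored in that order); `starts`, `n_repeats`, `repeats` and `sub_experiments` are hypothetical names, and the DataFrame is rebuilt compactly so the block runs on its own.

```python
import pandas

# Assumed layout (from the question): 2 repeats x 3 sub-experiments x len(time) rows
time = [0, 1, 2, 3, 4]
starts = [1, 2, 3]   # hypothetical: starting value of each sub-experiment
n_repeats = 2

# Rebuild the example DataFrame: measurement = start + time, in repeat/sub-experiment order
rows = [(t, s + t) for _ in range(n_repeats) for s in starts for t in time]
df = pandas.DataFrame(rows, columns=['time', 'measurement'])

rows_per_repeat = len(starts) * len(time)  # 15 rows per experimental repeat

# repeat number (1-based) -> block of rows, original index preserved
repeats = {
    r + 1: df.iloc[r * rows_per_repeat:(r + 1) * rows_per_repeat]
    for r in range(n_repeats)
}

# (repeat, sub-experiment) -> block of len(time) rows
sub_experiments = {
    (r, s + 1): block.iloc[s * len(time):(s + 1) * len(time)]
    for r, block in repeats.items()
    for s in range(len(starts))
}

print(repeats[2])
print(sub_experiments[(2, 2)])
```

The slices keep the original row labels (15-29 for the second repeat), so each block can still be related back to `df`.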

IIUC, you may set a MultiIndex so that you can index df, accessing experiments and subexperiments easily:

    In [261]: dfi = df.set_index([df.index//15+1, df.index//5 - df.index//15*3 + 1])

    In [262]: dfi
    Out[262]:
         time  measurement
    1 1     0            1
      1     1            2
      1     2            3
      1     3            4
      1     4            5
      2     0            2
      2     1            3
      2     2            4
      2     3            5
      2     4            6
      3     0            3
      3     1            4
      3     2            5
      3     3            6
      3     4            7
    2 1     0            1
      1     1            2
      1     2            3
      1     3            4
      1     4            5
      2     0            2
      2     1            3
      2     2            4
      2     3            5
      2     4            6
      3     0            3
      3     1            4
      3     2            5
      3     3            6
      3     4            7
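The hard-coded `15` and `5` in the `set_index` call are `3 * len(time)` and `len(time)`. A sketch of the same index built from those sizes, with named levels, assuming the row order shown in the question (`starts`, `n_repeats` and the level names `repeat`/`subexp` are illustrative choices, not from the original answer):

```python
import numpy as np
import pandas

# Rebuild the question's data (assumed: 2 repeats x 3 sub-experiments x 5 time points)
time = [0, 1, 2, 3, 4]
starts = [1, 2, 3]
n_repeats = 2
rows = [(t, s + t) for _ in range(n_repeats) for s in starts for t in time]
df = pandas.DataFrame(rows, columns=['time', 'measurement'])

n_time = len(time)                      # rows per sub-experiment (5)
rows_per_repeat = len(starts) * n_time  # rows per repeat (15)
pos = np.arange(len(df))                # 0-based row positions

# Same numbering as the set_index call, derived from the sizes instead of 15 and 5
repeat = pos // rows_per_repeat + 1
subexp = (pos % rows_per_repeat) // n_time + 1

dfi = df.copy()
dfi.index = pandas.MultiIndex.from_arrays([repeat, subexp], names=['repeat', 'subexp'])
print(dfi)
```

`(pos % rows_per_repeat) // n_time + 1` gives the same values as `df.index//5 - df.index//15*3 + 1`, but keeps working if the number of time points or sub-experiments changes.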

Selecting subexperiments:

    In [263]: dfi.loc[1,1]
    Out[263]:
         time  measurement
    1 1     0            1
      1     1            2
      1     2            3
      1     3            4
      1     4            5

    In [264]: dfi.loc[2,2]
    Out[264]:
         time  measurement
    2 2     0            2
      2     1            3
      2     2            4
      2     3            5
      2     4            6
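If the index levels are given names, a boolean mask on `get_level_values` makes it explicit which level each number refers to. A minimal sketch, assuming level names `repeat` and `subexp` (not in the original answer); `select_subexp` is a hypothetical helper:

```python
import numpy as np
import pandas

# Rebuild a named-level version of dfi (assumed layout: 2 repeats x 3 sub-experiments x 5 rows)
time = [0, 1, 2, 3, 4]
rows = [(t, s + t) for _ in range(2) for s in [1, 2, 3] for t in time]
df = pandas.DataFrame(rows, columns=['time', 'measurement'])
pos = np.arange(len(df))
dfi = df.copy()
dfi.index = pandas.MultiIndex.from_arrays(
    [pos // 15 + 1, (pos % 15) // 5 + 1], names=['repeat', 'subexp']
)


def select_subexp(frame, repeat, subexp):
    """Hypothetical helper: rows of one sub-experiment of one repeat."""
    idx = frame.index
    # Boolean mask built per named level, so the meaning of each number is explicit
    mask = (idx.get_level_values('repeat') == repeat) & (idx.get_level_values('subexp') == subexp)
    return frame[mask]


print(select_subexp(dfi, 2, 2))
```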

To select all subexperiments of the second experiment:

    In [266]: dfi.loc[2,:]
    Out[266]:
       time  measurement
    1     0            1
    1     1            2
    1     2            3
    1     3            4
    1     4            5
    2     0            2
    2     1            3
    2     2            4
    2     3            5
    2     4            6
    3     0            3
    3     1            4
    3     2            5
    3     3            6
    3     4            7
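When every sub-experiment needs the same processing, `groupby` on the index levels visits each (repeat, sub-experiment) pair in turn instead of selecting them one by one. A sketch, assuming named levels `repeat` and `subexp` (not used in the original answer); `time_courses` and `final_values` are illustrative names:

```python
import numpy as np
import pandas

# Rebuild a named-level dfi (assumed layout: 2 repeats x 3 sub-experiments x 5 rows)
time = [0, 1, 2, 3, 4]
rows = [(t, s + t) for _ in range(2) for s in [1, 2, 3] for t in time]
df = pandas.DataFrame(rows, columns=['time', 'measurement'])
pos = np.arange(len(df))
dfi = df.copy()
dfi.index = pandas.MultiIndex.from_arrays(
    [pos // 15 + 1, (pos % 15) // 5 + 1], names=['repeat', 'subexp']
)

grouped = dfi.groupby(level=['repeat', 'subexp'])

# One entry per (repeat, subexp) pair, each holding its own time course
time_courses = {key: block for key, block in grouped}

# Per-sub-experiment summaries come out indexed by (repeat, subexp)
final_values = grouped['measurement'].last()
print(final_values)
```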

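Alternatively, you can create your own slicing function:

    def my_slice(rep=1, subexp=1):
        # Convert 1-based repeat / sub-experiment numbers to 0-based offsets
        rep -= 1
        subexp -= 1
        # Each repeat spans 15 rows and each sub-experiment 5 rows
        start = rep*15 + subexp*5
        # .iloc slices by position and is end-exclusive, so this returns 5 rows
        return df.iloc[start:start + 5, :]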
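`my_slice` hard-codes 15 rows per repeat and 5 per sub-experiment. A hypothetical variant (`slice_subexp`, not from the original answer) takes those sizes as parameters (`n_time` = `len(time)`, `n_sub` = number of sub-experiments) and rejects a sub-experiment number that would run into the next repeat:

```python
import pandas

# Rebuild the question's data (assumed: 2 repeats x 3 sub-experiments x 5 time points)
time = [0, 1, 2, 3, 4]
rows = [(t, s + t) for _ in range(2) for s in [1, 2, 3] for t in time]
df = pandas.DataFrame(rows, columns=['time', 'measurement'])


def slice_subexp(frame, rep=1, subexp=1, n_time=5, n_sub=3):
    """Hypothetical variant of my_slice with the block sizes as parameters."""
    if not 1 <= subexp <= n_sub:
        raise ValueError(f"subexp must be between 1 and {n_sub}")
    # Skip (rep - 1) whole repeats, then (subexp - 1) sub-experiments within the repeat
    start = (rep - 1) * n_sub * n_time + (subexp - 1) * n_time
    return frame.iloc[start:start + n_time]


print(slice_subexp(df, 2, 2))
```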

Demo:

    In [174]: my_slice(1,1)
    Out[174]:
       time  measurement
    0     0            1
    1     1            2
    2     2            3
    3     3            4
    4     4            5

    In [175]: my_slice(2,1)
    Out[175]:
        time  measurement
    15     0            1
    16     1            2
    17     2            3
    18     3            4
    19     4            5

    In [176]: my_slice(2,2)
    Out[176]:
        time  measurement
    20     0            2
    21     1            3
    22     2            4
    23     3            5
    24     4            6

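PS: a slightly more convenient way to concatenate the DataFrames:

    df = pandas.concat([d1, d2, d3, d4, d5, d6], ignore_index=True)
    df.columns = ['time', 'measurement']

so you don't need the subsequent .reset_index() and .drop() calls (the columns still need renaming).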
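Going one step further than `ignore_index=True` (an alternative that is not part of the original answer): if each sub-experiment's DataFrame is stored in a dict keyed by a `(repeat, subexp)` tuple, `pandas.concat` turns the tuple keys into index levels, so a labelled MultiIndex comes out directly without any index arithmetic. The dict `frames` and the level names are assumptions for illustration; the loop regenerates the question's values.

```python
import pandas

time = [0, 1, 2, 3, 4]

# Hypothetical: one DataFrame per (repeat, subexp), same values as the question
frames = {}
for rep in (1, 2):
    for subexp, start in enumerate([1, 2, 3], start=1):
        frames[(rep, subexp)] = pandas.DataFrame(
            {'time': time, 'measurement': [start + t for t in time]}
        )

# Tuple keys become the outer index levels; each frame's own 0-4 index becomes 'row'
df = pandas.concat(frames, names=['repeat', 'subexp', 'row'])

print(df.loc[(2, 2)])
```

With this layout `df.loc[2]` gives the whole second repeat and `df.loc[(2, 2)]` a single time course.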
