Separating elements of a Pandas DataFrame in Python
I have a pandas DataFrame that looks like the following:
        time  measurement
    0      0            1
    1      1            2
    2      2            3
    3      3            4
    4      4            5
    5      0            2
    6      1            3
    7      2            4
    8      3            5
    9      4            6
    10     0            3
    11     1            4
    12     2            5
    13     3            6
    14     4            7
    15     0            1
    16     1            2
    17     2            3
    18     3            4
    19     4            5
    20     0            2
    21     1            3
    22     2            4
    23     3            5
    24     4            6
    25     0            3
    26     1            4
    27     2            5
    28     3            6
    29     4            7
which can be generated with the following code:
    import pandas

    time = [0, 1, 2, 3, 4]

    # First experimental repeat: three starting concentrations
    repeat_1_conc_1 = [1, 2, 3, 4, 5]
    repeat_1_conc_2 = [2, 3, 4, 5, 6]
    repeat_1_conc_3 = [3, 4, 5, 6, 7]
    d1 = pandas.DataFrame([time, repeat_1_conc_1]).transpose()
    d2 = pandas.DataFrame([time, repeat_1_conc_2]).transpose()
    d3 = pandas.DataFrame([time, repeat_1_conc_3]).transpose()

    # Second experimental repeat
    repeat_2_conc_1 = [1, 2, 3, 4, 5]
    repeat_2_conc_2 = [2, 3, 4, 5, 6]
    repeat_2_conc_3 = [3, 4, 5, 6, 7]
    d4 = pandas.DataFrame([time, repeat_2_conc_1]).transpose()
    d5 = pandas.DataFrame([time, repeat_2_conc_2]).transpose()
    d6 = pandas.DataFrame([time, repeat_2_conc_3]).transpose()

    df = pandas.concat([d1, d2, d3, d4, d5, d6]).reset_index()
    df.drop('index', axis=1, inplace=True)
    df.columns = ['time', 'measurement']
    print(df)
If you look at the code, you'll see that I have 2 experimental repeats in the same DataFrame, which should be separated at df.iloc[:15]. Additionally, within each experiment I have 3 sub-experiments, which can be thought of as the starting conditions of a dose response, i.e. the first sub-experiment starts at 1, the second at 2 and the third at 3. These should be separated at index intervals of `len(time)`, which is 0-4, i.e. 5 elements, for each experimental repeat.

Could somebody please tell me the best way to separate this data into individual time course measurements for each experiment? I'm not sure what the best data structure to use would be, but I need to be able to access the data for each sub-experiment of each experimental repeat easily. Perhaps something like:
    repeat1 =
        time  measurement
    0      0            1
    1      1            2
    2      2            3
    3      3            4
    4      4            5
    5      0            2
    6      1            3
    7      2            4
    8      3            5
    9      4            6
    10     0            3
    11     1            4
    12     2            5
    13     3            6
    14     4            7

    repeat2 =
        time  measurement
    15     0            1
    16     1            2
    17     2            3
    18     3            4
    19     4            5
    20     0            2
    21     1            3
    22     2            4
    23     3            5
    24     4            6
    25     0            3
    26     1            4
    27     2            5
    28     3            6
    29     4            7
IIUC, you can set a MultiIndex so that you can index your DF, accessing experiments and subexperiments easily:
    In [261]: dfi = df.set_index([df.index//15+1, df.index//5 - df.index//15*3 + 1])

    In [262]: dfi
    Out[262]:
         time  measurement
    1 1     0            1
      1     1            2
      1     2            3
      1     3            4
      1     4            5
      2     0            2
      2     1            3
      2     2            4
      2     3            5
      2     4            6
      3     0            3
      3     1            4
      3     2            5
      3     3            6
      3     4            7
    2 1     0            1
      1     1            2
      1     2            3
      1     3            4
      1     4            5
      2     0            2
      2     1            3
      2     2            4
      2     3            5
      2     4            6
      3     0            3
      3     1            4
      3     2            5
      3     3            6
      3     4            7
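The index arithmetic can also be checked on its own. The following is only a minimal sketch, assuming 2 repeats of 3 sub-experiments with 5 time points each. The constants `N_TIME` and `N_SUBEXP` and the level names `repeat` and `subexp` are illustrative and not part of the original answer; the modulo form is an equivalent way of writing `df.index//5 - df.index//15*3 + 1`.

```python
import numpy as np
import pandas as pd

N_TIME = 5     # time points per sub-experiment (assumed)
N_SUBEXP = 3   # sub-experiments per repeat (assumed)

# Rebuild the 30-row frame from the question.
time = list(range(N_TIME))
starts = [1, 2, 3, 1, 2, 3]  # starting value of each sub-experiment
df = pd.DataFrame({
    "time": time * len(starts),
    "measurement": [s + t for s in starts for t in time],
})

pos = np.arange(len(df))
repeat = pos // (N_TIME * N_SUBEXP) + 1    # 1 for rows 0-14, 2 for rows 15-29
subexp = (pos // N_TIME) % N_SUBEXP + 1    # cycles 1, 2, 3 within each repeat

dfi = df.set_index([repeat, subexp])
dfi.index.names = ["repeat", "subexp"]     # hypothetical level names
```

With this index, `dfi.loc[(2, 2)]` selects the five rows of the second sub-experiment of the second repeat.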
Selecting subexperiments:
    In [263]: dfi.loc[1,1]
    Out[263]:
         time  measurement
    1 1     0            1
      1     1            2
      1     2            3
      1     3            4
      1     4            5

    In [264]: dfi.loc[2,2]
    Out[264]:
         time  measurement
    2 2     0            2
      2     1            3
      2     2            4
      2     3            5
      2     4            6
Select all subexperiments of the second experiment:
    In [266]: dfi.loc[2,:]
    Out[266]:
       time  measurement
    1     0            1
    1     1            2
    1     2            3
    1     3            4
    1     4            5
    2     0            2
    2     1            3
    2     2            4
    2     3            5
    2     4            6
    3     0            3
    3     1            4
    3     2            5
    3     3            6
    3     4            7
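If separate objects are preferred, closer to the `repeat1` / `repeat2` layout sketched in the question, one possible option is a dictionary of DataFrames keyed by (repeat, sub-experiment). This is only a sketch under the same assumptions about the layout (15 rows per repeat, 5 rows per sub-experiment); the name `pieces` is hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the 30-row frame from the question.
time = [0, 1, 2, 3, 4]
starts = [1, 2, 3, 1, 2, 3]
df = pd.DataFrame({
    "time": time * len(starts),
    "measurement": [s + t for s in starts for t in time],
})

pos = np.arange(len(df))
repeat = pos // 15 + 1        # assumes 15 rows per repeat
subexp = (pos // 5) % 3 + 1   # assumes 5 rows per sub-experiment

# One DataFrame per (repeat, subexp) pair; the original row labels are kept.
pieces = {key: group for key, group in df.groupby([repeat, subexp])}

second_repeat_third_subexp = pieces[(2, 3)]
```

Each value in `pieces` is an ordinary DataFrame, so a single time course can be passed around without carrying the full index.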
Alternatively, you can create your own slicing function:
    def my_slice(rep=1, subexp=1):
        rep -= 1
        subexp -= 1
        # .loc slices by label and includes the end point, hence "+ 4"
        return df.loc[rep*15 + subexp*5 : rep*15 + subexp*5 + 4, :]
Demo:
    In [174]: my_slice(1,1)
    Out[174]:
       time  measurement
    0     0            1
    1     1            2
    2     2            3
    3     3            4
    4     4            5

    In [175]: my_slice(2,1)
    Out[175]:
        time  measurement
    15     0            1
    16     1            2
    17     2            3
    18     3            4
    19     4            5

    In [176]: my_slice(2,2)
    Out[176]:
        time  measurement
    20     0            2
    21     1            3
    22     2            4
    23     3            5
    24     4            6
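The slicing function hard-codes the sizes 15 and 5. A possible generalisation is sketched below, assuming the rows are ordered repeat by repeat and sub-experiment by sub-experiment. The function name `slice_subexp` and the parameters `n_time` and `n_subexp` are hypothetical, and positional `.iloc` slicing is swapped in for the label-based slice used in the answer.

```python
import pandas as pd


def slice_subexp(frame, rep=1, subexp=1, n_time=5, n_subexp=3):
    """Return the rows of one sub-experiment (rep and subexp are 1-based)."""
    start = ((rep - 1) * n_subexp + (subexp - 1)) * n_time
    # .iloc is positional and excludes the end point, hence start + n_time.
    return frame.iloc[start:start + n_time]


# Rebuild the 30-row frame from the question.
time = [0, 1, 2, 3, 4]
starts = [1, 2, 3, 1, 2, 3]
df = pd.DataFrame({
    "time": time * len(starts),
    "measurement": [s + t for s in starts for t in time],
})

second = slice_subexp(df, rep=2, subexp=2)
```

Because the slice is positional, this version does not depend on the DataFrame having a default 0..n-1 index.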
PS: a slightly more convenient way to concatenate your DFs:
    df = pandas.concat([d1, d2, d3, d4, d5, d6], ignore_index=True)
so you don't need the subsequent .reset_index() and drop() calls.
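To see that the two routes give the same result, here is a small sketch on three frames only. The names `frames`, `long_way` and `short_way` are illustrative.

```python
import pandas as pd

time = [0, 1, 2, 3, 4]
frames = [
    pd.DataFrame({"time": time, "measurement": [s + t for t in time]})
    for s in (1, 2, 3)
]

# Route from the question: concatenate, then rebuild the index by hand.
long_way = pd.concat(frames).reset_index().drop("index", axis=1)

# Route from the answer: let concat renumber the rows directly.
short_way = pd.concat(frames, ignore_index=True)
```

Both frames end up with row labels running from 0 to 14 and identical values.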