R - Split a data.frame using intervals
This question is quite similar to the post "Splitting a data frame into a list using intervals", but the answer there doesn't apply to my data because I don't have a column with binary values.
My data looks like this:
> df
    v1       v2       v3         v4 v5     v6       v7       v8    v9 v10 v11
1 chr1 49828662 49828663  rs7531656  0 +|chr1 48998526 50489626 agbl4   1   -
2 chr1 62594676 62594677  rs2481665  0 +|chr1 62208148 62629591  patj   1   +
3 chr1 62633580 62633581  rs2457831  0 +|chr1 62208148 62629591  patj   1   +
4 chr1 66379767 66379768 rs12757124  0 +|chr1 66378927 66840262 pde4b   1   +
5 chr1 66392060 66392061 rs55824844  0 +|chr1 66378927 66840262 pde4b   1   +
6 chr1 66393984 66393985 rs35185259  0 +|chr1 66378927 66840262 pde4b   1   +
What I need is to split this file based on column v2, in intervals of 5e+05, so that the output looks like this:
[[1]]
1 chr1 49828662 49828663 rs7531656 0 +|chr1 48998526 50489626 agbl4 1 -

[[2]]
2 chr1 62594676 62594677 rs2481665 0 +|chr1 62208148 62629591 patj 1 +
3 chr1 62633580 62633581 rs2457831 0 +|chr1 62208148 62629591 patj 1 +

[[3]]
4 chr1 66379767 66379768 rs12757124 0 +|chr1 66378927 66840262 pde4b 1 +
5 chr1 66392060 66392061 rs55824844 0 +|chr1 66378927 66840262 pde4b 1 +
6 chr1 66393984 66393985 rs35185259 0 +|chr1 66378927 66840262 pde4b 1 +
My data has ~5 million rows, so speed is an issue, but I can deal with that later.
Maybe you are looking for this instead:
split(df, cumsum(c(FALSE, diff(df$v2) > 5e5)))

$`0`
    v1       v2       v3        v4 v5     v6       v7       v8    v9 v10 v11
1 chr1 49828662 49828663 rs7531656  0 +|chr1 48998526 50489626 agbl4   1   -

$`1`
    v1       v2       v3        v4 v5     v6       v7       v8   v9 v10 v11
2 chr1 62594676 62594677 rs2481665  0 +|chr1 62208148 62629591 patj   1   +
3 chr1 62633580 62633581 rs2457831  0 +|chr1 62208148 62629591 patj   1   +

$`2`
    v1       v2       v3         v4 v5     v6       v7       v8    v9 v10 v11
4 chr1 66379767 66379768 rs12757124  0 +|chr1 66378927 66840262 pde4b   1   +
5 chr1 66392060 66392061 rs55824844  0 +|chr1 66378927 66840262 pde4b   1   +
6 chr1 66393984 66393985 rs35185259  0 +|chr1 66378927 66840262 pde4b   1   +
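To see why this one-liner groups the rows the way it does, it may help to spell out the intermediate steps. The sketch below rebuilds only three of the sample columns and uses the helper names gaps, starts_new, group_id and pieces, which are made up for illustration and are not part of the original answer. It assumes the rows are already sorted by v2 and all come from one chromosome, as in the sample.

```r
# Rebuild the six sample rows (only the columns needed for the illustration).
df <- data.frame(
  v1 = "chr1",
  v2 = c(49828662, 62594676, 62633580, 66379767, 66392060, 66393984),
  v4 = c("rs7531656", "rs2481665", "rs2457831",
         "rs12757124", "rs55824844", "rs35185259"),
  stringsAsFactors = FALSE
)

# Distance between each row's v2 and the previous row's v2.
# diff() returns one value fewer than the number of rows.
gaps <- diff(df$v2)

# TRUE marks a row that starts a new group (gap larger than 5e5).
# The first row has no predecessor, so it gets FALSE.
starts_new <- c(FALSE, gaps > 5e5)

# The running sum of those marks gives every row a group id.
group_id <- cumsum(starts_new)

# split() cuts the data frame into one piece per group id.
pieces <- split(df, group_id)

print(group_id)
print(names(pieces))
```

With data covering several chromosomes, the gap between the last row of one chromosome and the first row of the next would not be meaningful, so the chromosome column would probably need to be taken into account as well.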