r - dplyr arrange() function: sort by missing values


I am attempting to work through Hadley Wickham's R for Data Science and have gotten tripped up on the following question: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na())". I am using the flights dataset included in the nycflights13 package. Given that arrange() sorts all unknown values to the bottom of the data frame, I am not sure how one would do the opposite across the missing values of all variables. I realize this question can be answered with base R code, but I am specifically interested in how it would be done using dplyr and a call to the arrange() and is.na() functions. Thanks.

We can wrap is.na() in desc() to get the missing values at the start:

    library(dplyr)
    library(nycflights13)

    # Rows with NA in each listed column are sorted to the top
    flights %>%
      arrange(desc(is.na(dep_time)),
              desc(is.na(dep_delay)),
              desc(is.na(arr_time)),
              desc(is.na(arr_delay)),
              desc(is.na(tailnum)),
              desc(is.na(air_time)))
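To see why this works, here is a minimal sketch on a made-up tibble (the `toy` data and its `id` column are hypothetical, not part of flights). is.na() returns FALSE/TRUE, FALSE sorts before TRUE, and desc() reverses that so the TRUE rows (the NAs) come first. The extra `id` key is only there to make the order within each group explicit.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- tibble(
  id       = 1:5,
  dep_time = c(517L, NA, 533L, NA, 542L)
)

# Logical flag per row: TRUE where dep_time is missing
is.na(toy$dep_time)

# desc() puts TRUE (missing) first; id breaks ties within each group
sorted <- toy %>%
  arrange(desc(is.na(dep_time)), id)

sorted$id
```

An equivalent spelling, if you prefer to avoid desc(), would be arrange(!is.na(dep_time), id), since the negated flag is FALSE for the missing rows.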

The NA values are found in the following variables, based on

    names(flights)[colSums(is.na(flights)) > 0]
    #[1] "dep_time"  "dep_delay" "arr_time"  "arr_delay" "tailnum"   "air_time"
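The column lookup above can be unpacked step by step. This is a small base R sketch on an invented data frame (`toy` and its columns are hypothetical): is.na() on a data frame gives a logical matrix, colSums() counts the TRUE values per column, and the comparison keeps only the columns with at least one NA.

```r
# Hypothetical toy data frame with NAs in two of three columns
toy <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", "z"),
  c = c(NA, NA, 2)
)

# Number of missing values per column (named numeric vector)
na_counts <- colSums(is.na(toy))

# Names of the columns that contain at least one NA
na_cols <- names(toy)[na_counts > 0]
na_cols
```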

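Instead of passing each variable name one at a time, we can also build the expressions as strings and pass them to arrange_(), the standard-evaluation counterpart of arrange():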

    # Build one "desc(is.na(col))" string per column that contains NAs
    nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) > 0], "))")

    r1 <- flights %>%
      arrange_(.dots = nm1)

    r1 %>%
      head()
    #   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
    #  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
    #1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
    #2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
    #3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
    #4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
    #5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
    #6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
    #variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
    #  time_hour <time>.

Update

With newer versions of the tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also make use of arrange_at, arrange_all, and arrange_if:

    # Names of the columns that contain at least one NA
    nm1 <- names(flights)[colSums(is.na(flights)) > 0]

    r2 <- flights %>%
      arrange_at(vars(nm1), funs(desc(is.na(.))))

Or use arrange_if, which selects the columns with a predicate instead of by name:

    # Predicate: TRUE for columns that contain at least one NA
    f <- rlang::as_function(~ any(is.na(.)))

    r3 <- flights %>%
      arrange_if(f, funs(desc(is.na(.))))

    identical(r1, r2)
    #[1] TRUE

    identical(r1, r3)
    #[1] TRUE
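In more recent dplyr releases (1.0.0 and later, as far as I know), the scoped verbs and funs() have been superseded or deprecated in favour of across(). A rough equivalent of the arrange_if call might look like the sketch below. It uses an invented tibble rather than flights (`toy` and its `id` column are hypothetical), and it assumes a dplyr version where across() and where() are available inside arrange().

```r
library(dplyr)

# Hypothetical toy data: two columns contain NAs, id does not
toy <- tibble(
  id       = 1:4,
  dep_time = c(517L, NA, 533L, NA),
  tailnum  = c("N14228", "N24211", NA, NA)
)

# where(~ anyNA(.x)) picks the columns that have at least one NA;
# each picked column is turned into a "missing first" sort key.
# id is added last only to make the order of ties explicit.
sorted <- toy %>%
  arrange(across(where(~ anyNA(.x)), ~ desc(is.na(.x))), id)

sorted$id
```

The sort keys are applied in column order, so rows missing dep_time come first, and within those the rows also missing tailnum lead.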
