apache spark - how to filter() a pairRDD according to two conditions -


How can I filter a pair RDD if I have two conditions in the filter, one testing the key and the other testing the value? (I'd like a portion of code, because the portion I used sadly didn't work.)

JavaPairRDD filtering = pairRDD1.filter((x, y) -> (x._1.equals(y._1)) && (x._2.equals(y._2)));
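As an aside on why that snippet fails to compile: in Spark's Java API, `filter` on a `JavaPairRDD` takes a predicate over a single `Tuple2`, not two separate arguments. Two conditions are combined inside one lambda with `&&`, roughly `pairRDD1.filter(pair -> keyCondition(pair._1) && valueCondition(pair._2))`. A minimal Spark-free sketch of the same idea, using plain Java streams over key/value pairs (the data and conditions here are made up for illustration):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PairFilterSketch {
    public static void main(String[] args) {
        // Stand-in for a pair RDD: a list of (key, value) pairs.
        List<Map.Entry<String, Integer>> pairs = List.of(
                new SimpleEntry<>("a", 1),
                new SimpleEntry<>("a", 2),
                new SimpleEntry<>("b", 1),
                new SimpleEntry<>("b", 3));

        // One predicate, two conditions: the first tests the key,
        // the second tests the value, joined with &&.
        List<Map.Entry<String, Integer>> kept = pairs.stream()
                .filter(p -> p.getKey().equals("a") && p.getValue() > 1)
                .collect(Collectors.toList());

        System.out.println(kept); // prints [a=2]
    }
}
```

The key point is that the predicate receives the whole pair as one argument; both conditions live in the same lambda body.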

You can't use a regular filter for this, because filter checks one item at a time. You have to compare multiple items to each other, and then check which ones to keep. Here's an example that keeps only the items which are repeated:

val items = List(1, 2, 5, 6, 6, 7, 8, 10, 12, 13, 15, 16, 16, 19, 20)
val rdd = sc.parallelize(items)

// map each item to a (value, 1) pair
val mapped = rdd.map { case (x) => (x, 1) }
// sum the counts per value
val reduced = mapped.reduceByKey { case (x, y) => x + y }
// keep only the values that occurred more than once
val filtered = reduced.filter { case (item, count) => count > 1 }

// print out the results:
filtered.collect().foreach { case (item, count) =>
  println(s"Keeping $item because it occurred $count times.")
}

It's not the most performant way to do this, but it should give you an idea of the approach.
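Since the question was about the Java API, here is a sketch of the same map/reduceByKey/filter pipeline using plain Java streams (no Spark dependency, so it runs as-is): `groupingBy` plus `counting` plays the role of the map-to-(x, 1) step followed by `reduceByKey`, and the final stream filter keeps items with a count greater than one.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RepeatedItemsSketch {
    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 5, 6, 6, 7, 8, 10, 12, 13, 15, 16, 16, 19, 20);

        // Equivalent of map { (x, 1) } + reduceByKey: count occurrences per value.
        Map<Integer, Long> counts = items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Equivalent of the final filter: keep items that occurred more than once.
        Map<Integer, Long> repeated = counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

        repeated.forEach((item, count) ->
                System.out.println("Keeping " + item + " because it occurred " + count + " times."));
    }
}
```

With the sample data this keeps 6 and 16, each with a count of 2, matching what the Scala version above prints.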

