Recursively monitor an HDFS directory with Spark Streaming


I need to stream data from an HDFS directory using Spark Streaming.

javadstream<string> lines = ssc.textfilestream("hdfs://ip:8020/directory"); 

The above does a pretty good job of monitoring the HDFS directory for new files, but it is limited to a single directory level and doesn't monitor nested directories.
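For reference, a minimal sketch of the setup around that call (the app name and the 10-second batch interval are my own placeholders):

JavaDStream setup sketch:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class HdfsDirectoryStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("HdfsDirectoryStream");
        // Batch interval is arbitrary here; pick one that suits the workload.
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Picks up new files created directly under /directory,
        // but not files in nested subdirectories.
        JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");
        lines.print();

        ssc.start();
        ssc.awaitTermination();
    }
}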

I came across the following posts, which mention adding a depth parameter to the API:

https://mail-archives.apache.org/mod_mbox/spark-reviews/201502.mbox/%3c20150220121124.dbb5fe03f7@git1-us-west.apache.org%3e

https://github.com/apache/spark/pull/2765

The problem is that in Spark 1.6.1 (the version I tested) this parameter is not present, so I cannot use it, and I don't want to change the original source either.

javadstream<string> lines = ssc.textfilestream("hdfs://ip:8020/*/*/*/"); 

Some posts on Stack Overflow mention using the above glob syntax, but that doesn't work either.

Am I missing something?

It looks like the patch that was created was never approved, due to difficulties with S3 and directory depth.

https://github.com/apache/spark/pull/6588
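In the meantime, a workaround I am considering (a sketch only, against the Spark 1.6 API; the class and helper names below are my own) is to list the nested directories once at startup with the Hadoop FileSystem API, open one textFileStream per directory, and union them:

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RecursiveHdfsStream {

    // Collect the given directory and all of its subdirectories, recursively.
    static void collectDirs(FileSystem fs, Path dir, List<Path> out) throws IOException {
        out.add(dir);
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
                collectDirs(fs, status.getPath(), out);
            }
        }
    }

    // Hypothetical helper: one textFileStream per directory, unioned into a single DStream.
    static JavaDStream<String> recursiveTextFileStream(JavaStreamingContext ssc, String root)
            throws IOException {
        FileSystem fs = FileSystem.get(URI.create(root), new Configuration());
        List<Path> dirs = new ArrayList<>();
        collectDirs(fs, new Path(root), dirs);

        List<JavaDStream<String>> streams = new ArrayList<>();
        for (Path dir : dirs) {
            streams.add(ssc.textFileStream(dir.toString()));
        }
        JavaDStream<String> first = streams.remove(0);
        return streams.isEmpty() ? first : ssc.union(first, streams);
    }
}

The obvious caveat is that directories created after the job starts are not picked up, which seems to be exactly the hard part the patch was trying to solve.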

