Recursively monitor an HDFS directory with Spark Streaming -
I need to stream data from an HDFS directory using Spark Streaming.
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");
The above does a pretty good job of monitoring the HDFS directory for new files, but it is limited to that single directory level and doesn't monitor nested directories.
I came across the following pull request, which mentions adding a depth parameter to the API:
https://github.com/apache/spark/pull/2765
The problem is that in Spark version 1.6.1 (which I tested) the parameter is not present, so I cannot use it, and I don't want to change the original source either.
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/*/*/*/");
Some posts on Stack Overflow mention using the above glob syntax, but it doesn't work either.
Am I missing something?
It looks like the patch was created but never approved, due to difficulties with S3 and handling directory depth.
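Without the depth parameter, one workaround is to enumerate the subdirectories yourself and create one textFileStream per directory, then union the resulting streams with JavaStreamingContext.union. Below is a minimal sketch of the enumeration step; it uses java.nio on the local filesystem as a stand-in (on HDFS you would use Hadoop's FileSystem.listStatus instead), and the class name DirLister is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirLister {
    // Collect every directory under root (including root itself).
    // Each returned path would then become one textFileStream in the job.
    static List<String> listDirsRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isDirectory)
                       .map(Path::toString)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("monitor");
        Files.createDirectories(root.resolve("2016/04/07"));
        for (String dir : listDirsRecursively(root)) {
            // In the real Spark job (sketch, not verified against 1.6.1):
            //   streams.add(ssc.textFileStream(dir));
            // then combine them with ssc.union(first, rest).
            System.out.println(dir);
        }
    }
}
```

The obvious limitation is that this only picks up directories that exist when the streaming context starts; new nested directories created afterwards would not be monitored.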