java - Reading data from Azure Blob with Spark
I am having an issue reading data from Azure Blobs via Spark Streaming:
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");
The code above works with HDFS, but I am unable to read a file from an Azure Blob:
https://blobstorage.blob.core.windows.net/containerid/folder1/
The path above is what is shown in the Azure UI, but it doesn't work. Am I missing something, and how can I access it?
I know Event Hubs is the ideal choice for streaming data, but my current situation demands using storage rather than queues.
In order to read data from blob storage, there are two things that need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means you also need the hadoop-azure JAR available on your classpath (note that there may be runtime requirements for more JARs from the Hadoop family):
JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");
Now, call onto the file using the wasb[s]:// prefix (note the [s] is optional for a secure connection):
ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");
It goes without saying that you'll need to have the proper permissions set on the location you are querying in blob storage.
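Putting the pieces together, here is a minimal sketch of what such a streaming job could look like. The storage account name ("youraccount"), key ("yourkey"), container name ("yourcontainer"), folder path, class name, and the 30-second batch interval are all placeholder assumptions to be replaced with your own values:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class AzureBlobStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("AzureBlobStreaming");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Point the wasb:// scheme at the native Azure file system and supply the account key.
        // "youraccount" and "yourkey" are placeholders for your storage account and its access key.
        Configuration hadoopConf = ssc.sparkContext().hadoopConfiguration();
        hadoopConf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        hadoopConf.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");

        // Watch a folder in the container for newly arriving text files (wasbs:// for a secure connection).
        JavaDStream<String> lines = ssc.textFileStream(
                "wasbs://yourcontainer@youraccount.blob.core.windows.net/folder1/");
        lines.print();

        ssc.start();
        ssc.awaitTermination();
    }
}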