java - Reading data from Azure Blob with Spark
I am having an issue reading data from Azure Blobs via Spark Streaming:
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");
The code above works with HDFS, but I am unable to read a file from an Azure Blob:
https://blobstorage.blob.core.windows.net/containerid/folder1/
The path above is what is shown in the Azure UI, but it doesn't work. Am I missing something, and how can I access it?
I know Event Hubs is the ideal choice for streaming data, but my current situation demands using storage rather than queues.
In order to read data from blob storage, there are two things that need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means you also need the hadoop-azure JAR available on your classpath (note that there may be runtime requirements for more JARs from the Hadoop family):
JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");
Now, call onto the file using the wasb[s]:// prefix (note the [s] is optional for a secure connection):
ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");
It goes without saying that you'll need to have the proper permissions set on the location you are querying in blob storage.
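Putting the pieces together, here is a minimal sketch of what such a streaming job could look like. The storage account name ("youraccount"), key ("yourkey"), container name ("yourcontainer"), folder path, class name, and the 30-second batch interval are all placeholder assumptions to be replaced with your own values:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class AzureBlobStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("AzureBlobStreaming");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Point the wasb:// scheme at the native Azure file system and supply the account key.
        // "youraccount" and "yourkey" are placeholders for your storage account and its access key.
        Configuration hadoopConf = ssc.sparkContext().hadoopConfiguration();
        hadoopConf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        hadoopConf.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");

        // Watch a folder in the container for newly arriving text files (wasbs:// for a secure connection).
        JavaDStream<String> lines = ssc.textFileStream(
                "wasbs://yourcontainer@youraccount.blob.core.windows.net/folder1/");
        lines.print();

        ssc.start();
        ssc.awaitTermination();
    }
}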