amazon web services - Modeling data in NoSQL DynamoDB

- January 15, 2011

I'm trying to figure out how to model the following data in an AWS DynamoDB table.

I have a lot of IoT devices, and each sends telemetry data every few seconds.

attributes

device_id
timestamp
malware_name
company_name
action_performed (two possible values)

queries

Show incidents that happened in the last week.
Show incidents for a specific device_id.
Show incidents with action "unable_to_remove".
Show incidents related to a specific malware.
Show incidents related to a specific company.

thoughts

I understand I can add GSIs on each attribute, but I'd rather use GSIs only if there is no other choice, since they cost me more money.
What should the main primary key (partition-key:sort-key) be?

Please share your thoughts; I care about them more than I care about a perfect answer. I'm trying to learn how to think and what to consider, rather than just get an answer to a specific question.

Thanks a lot!

If you absolutely need the queryability patterns you mentioned, you have no way out but to create GSIs for each. That comes with its own set of caveats:

For query #1, the GSI would have incident_date (or whatever) as the partition key and device_id as the sort key. This might lead to hot partitioning in DynamoDB, depending on your access patterns.
There is a limit of 5 GSIs per table, and you'll use them up right away. What will you do if you need to support another kind of query in the future?
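
To make the query #1 caveat concrete, here is a rough sketch of what such a GSI and the "last week" lookup could look like. It only builds plain dictionaries shaped like the parameters of boto3's low-level `create_table` and `query` calls; nothing is sent to AWS. The table name `incidents`, the index name `by_incident_date`, and the choice to store incident_date as an ISO day string are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical GSI definition, shaped like one entry of the
# GlobalSecondaryIndexes parameter of create_table (on-demand billing assumed,
# so no ProvisionedThroughput is given).
INCIDENT_DATE_GSI = {
    "IndexName": "by_incident_date",  # assumed name
    "KeySchema": [
        {"AttributeName": "incident_date", "KeyType": "HASH"},  # partition key
        {"AttributeName": "device_id", "KeyType": "RANGE"},     # sort key
    ],
    "Projection": {"ProjectionType": "ALL"},
}


def last_week_partition_keys(today):
    """Return the seven incident_date values (ISO days) ending at `today`."""
    return [(today - timedelta(days=n)).isoformat() for n in range(7)]


def last_week_queries(today):
    """Build one Query request per day.

    A Query needs an equality condition on the partition key, so a date range
    across days cannot be expressed as a single request against this index.
    """
    return [
        {
            "TableName": "incidents",  # assumed name
            "IndexName": INCIDENT_DATE_GSI["IndexName"],
            "KeyConditionExpression": "incident_date = :d",
            "ExpressionAttributeValues": {":d": {"S": day}},
        }
        for day in last_week_partition_keys(today)
    ]
```

Two things show up in this sketch: "last week" turns into seven separate queries, and every incident written today shares one partition key value, which is where the hot partition risk comes from.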

While evaluating the pros and cons of using NoSQL for a given situation, one needs to consider both read and write access patterns. So the question you should ask is: why DynamoDB?

For example, do you really need real-time queries? If not, you can use DynamoDB as the main database and periodically sync the data (using AWS Lambda or Kinesis Firehose) to EMR or Redshift for later batch processing.

Edit: proposed primary key:

device_id as the partition key and incident_date as the sort key, if you know that no two or more incidents for a given device_id can come at the exact same time.
If the above doesn't work, then incident_id as the partition key and incident_date as the sort key.
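
The "exact same time" condition matters because a put with an existing primary key replaces the stored item. The toy in-memory class below is only a stand-in for a table keyed by (device_id, incident_date), not DynamoDB itself; the class name, the sample device IDs, and the ISO timestamp format are assumptions made for illustration.

```python
class IncidentTable:
    """Toy in-memory stand-in for a table keyed by (device_id, incident_date)."""

    def __init__(self):
        self._items = {}

    def put(self, item):
        # As with DynamoDB's PutItem, writing an existing primary key
        # replaces the item that was stored under it.
        key = (item["device_id"], item["incident_date"])
        self._items[key] = item

    def query_device(self, device_id, start=None, end=None):
        """Return one device's incidents ordered by sort key, optionally bounded."""
        rows = [
            item
            for (dev, ts), item in self._items.items()
            if dev == device_id
            and (start is None or ts >= start)
            and (end is None or ts <= end)
        ]
        return sorted(rows, key=lambda item: item["incident_date"])


table = IncidentTable()
table.put({"device_id": "dev-1", "incident_date": "2024-03-01T10:00:00Z", "malware_name": "a"})
# Same device and same timestamp: this silently replaces the item above.
table.put({"device_id": "dev-1", "incident_date": "2024-03-01T10:00:00Z", "malware_name": "b"})
table.put({"device_id": "dev-1", "incident_date": "2024-03-02T10:00:00Z", "malware_name": "c"})
rows = table.query_device("dev-1")
```

After these three writes only two items remain for dev-1, which is the data loss the first option risks. The same key does cover "incidents for a specific device_id" directly, and the sort key allows a time range within one device.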
