Amazon EC2 - EC2 suitability for syncing large CSV files from an FTP server
I have to execute a task twice per week. The task consists of fetching a 1.4 GB CSV file from a public FTP server. I then have to process it (apply filters, discard rows, make calculations) and sync it to a Postgres database hosted on AWS RDS. For each row, I have to retrieve the SKU entry from the database and determine whether it needs an update or not.
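My question is whether EC2 would work as a solution for me. My main concern is memory. I have searched for solutions such as https://github.com/goodby/csv, which handle this issue by fetching the file row by row instead of pulling it all into memory, but they do not work if I try to read the .csv directly from the FTP server.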
Can anyone provide some insight? Is AWS EC2 a good platform to solve this problem? How would you deal with the CSV size and the memory limitations?
The FTP protocol doesn't do "streaming". You cannot read a file from FTP chunk by chunk.
Honestly, downloading the file and triggering a run on a bigger instance is not a big deal if you only run it twice a week. You can choose an r3.large (it costs less than $0.20/hour), execute the job as soon as possible, and stop the instance. The internal SSD disk space should give you the best possible I/O compared to EBS.
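A minimal sketch of the "download first, then read row by row" idea, using only the Python standard library. The host, paths, the `sku` column name, and the `keep_row` filter are all assumptions for illustration, not details from the question.

```python
import csv
from ftplib import FTP


def download_from_ftp(host, remote_path, local_path, user="anonymous", password=""):
    """Download remote_path to local_path without holding the whole file in memory."""
    with FTP(host) as ftp, open(local_path, "wb") as out:
        ftp.login(user, password)
        # retrbinary passes each received block to out.write, so data goes straight to disk.
        ftp.retrbinary(f"RETR {remote_path}", out.write)


def iter_rows(local_path):
    """Yield one row at a time (as a dict keyed by the header) from the local CSV."""
    with open(local_path, newline="") as f:
        yield from csv.DictReader(f)


def keep_row(row):
    """Hypothetical filter: keep only rows with a non-empty 'sku' column."""
    return bool(row.get("sku"))
```

Usage would look something like `download_from_ftp("ftp.example.com", "feed.csv", "/mnt/ssd/feed.csv")` followed by `for row in iter_rows("/mnt/ssd/feed.csv"): ...`. Memory use then depends on the size of one row rather than the size of the file.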
Just make sure your OS and code are deployed on EBS for future reuse (unless you have an automated code deployment mechanism). You must also make sure RDS can handle the burst of I/O, otherwise it will become the bottleneck.
Even better, using the r3.large instance, you can split the CSV file into smaller chunks, load them in parallel, and shut down the instance after it finishes. You then only need to pay the minimal root EBS storage cost afterwards.
I would not suggest Lambda if the process is lengthy, since Lambda is meant for short, fast processing (it will terminate after 300 seconds).
(Update): If you just open the file, the simple ways of parsing it read it sequentially, which may not put the whole CPU to full use. You can split the CSV file by following the reference answer here.
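As a rough sketch of the splitting step (not necessarily the method in the referenced answer), the file can be cut into chunks by row count, repeating the header in each chunk so every piece can be parsed on its own. The function name, chunk file naming, and default chunk size are assumptions.

```python
import csv
import os


def split_csv(src_path, out_dir, rows_per_chunk=500_000):
    """Split src_path into files of at most rows_per_chunk data rows each.

    The header row is repeated at the top of every chunk.
    Returns the list of chunk file paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out_file = None
    writer = None
    try:
        with open(src_path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                return chunk_paths  # empty input file, nothing to split
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:
                    # Start a new chunk file and write the header into it.
                    if out_file is not None:
                        out_file.close()
                    path = os.path.join(out_dir, f"chunk_{len(chunk_paths):04d}.csv")
                    out_file = open(path, "w", newline="")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    chunk_paths.append(path)
                writer.writerow(row)
    finally:
        if out_file is not None:
            out_file.close()
    return chunk_paths
```

Because it reads and writes one row at a time, this also keeps memory use flat regardless of the size of the input file.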
Then, using the same script, you can process the chunks simultaneously by sending each run to the background. The example below shows how to put Python processes in the background under Linux.
parse_csvfile.py csv1 &
parse_csvfile.py csv2 &
parse_csvfile.py csv3 &
So instead of sequential I/O on a single file, you make use of multiple files. In addition, splitting the file should be a snap on SSD.
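If you would rather launch the background processes from Python than from the shell, something like the following sketch should be roughly equivalent. It starts one process per chunk and then waits for all of them, which makes it easy to know when the instance can be stopped. The `run_in_parallel` name is an assumption, and the script path stands in for `parse_csvfile.py`.

```python
import subprocess
import sys


def run_in_parallel(script_path, chunk_paths):
    """Start one Python process per chunk file, then wait for all of them.

    Returns the exit codes in the same order as chunk_paths.
    """
    # Popen returns immediately, so all processes run at the same time.
    procs = [subprocess.Popen([sys.executable, script_path, path]) for path in chunk_paths]
    # wait() blocks until each process has finished and returns its exit code.
    return [p.wait() for p in procs]
```

A non-zero exit code in the returned list tells you which chunk failed. In practice you would probably keep the number of chunks close to the number of CPU cores on the instance.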