python - Best way to join two large datasets in Pandas
I'm downloading two datasets from two different databases that need to be joined. Each of them is around 500 MB when I store it as a CSV. Separately they fit into memory, but when I load both I get a memory error, and I run into trouble when I try to merge them with pandas.

What is the best way to do an outer join on them without running into a memory error? I don't have any database servers at hand, but I can install any kind of open source software on my computer if that helps. Ideally I would still like to solve this in pandas, but I'm not sure whether that is possible at all.

To clarify: by merging I mean an outer join. Each table has two columns, product and version. I want to check which products and versions are in the left table only, in the right table only, and in both tables.

    pd.merge(df1, df2, left_on=['product', 'version'], right_on=['product', 'version'], how='outer')

This seems like a task that Dask was designed for. Essentially, Dask can do pandas operations out-of-core, so it can work with datasets that don't fit into memory. The dask.dataframe API is a subset of the pandas API, so there shouldn't be much of a learning curve. See the Dask DataFrame overview page for additional details.
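For what it's worth, here is a minimal sketch of how that might look with dask.dataframe (the file names left.csv and right.csv are placeholders for however you export the two tables):

    import dask.dataframe as dd

    # Lazily read both CSVs; Dask processes them in partitions instead of
    # loading everything into memory at once.
    df1 = dd.read_csv('left.csv')    # placeholder file name
    df2 = dd.read_csv('right.csv')   # placeholder file name

    # Tag each side so we can tell where a key came from after the outer join.
    df1 = df1.assign(in_left=True)
    df2 = df2.assign(in_right=True)

    # Out-of-core outer join on the key columns; same arguments as pandas.merge.
    merged = dd.merge(df1, df2, on=['product', 'version'], how='outer')

    # Keys missing from one side show up as NaN in that side's tag column.
    left_only  = merged[merged['in_right'].isnull()]
    right_only = merged[merged['in_left'].isnull()]
    in_both    = merged[merged['in_left'].notnull() & merged['in_right'].notnull()]

    # Nothing is computed until you ask for a result; write the results out
    # in chunks rather than collecting them into memory.
    left_only.to_csv('left_only-*.csv', index=False)
    right_only.to_csv('right_only-*.csv', index=False)
    in_both.to_csv('in_both-*.csv', index=False)

The tag columns are just one way to reproduce what pandas' indicator=True option gives you in a plain merge; the point is that Dask evaluates the whole pipeline lazily and never needs both full tables in memory at the same time.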