Passing a Pandas DataFrame object between components
I feel like I'm missing something, but when I try to pass a Pandas DataFrame object from one script to another, it arrives as a different datatype. Do I need to set up S3 storage in order to pass objects between scripts?
4 Replies
You can only pass JSON-serializable data between steps, so a DataFrame has to be converted to something like a dict or list before it is returned.
Also, for large data we really do recommend using S3.
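A minimal sketch of what the JSON-serializable constraint means for a small DataFrame: convert it to plain dicts and lists in one script, then rebuild it in the next. The function names `producer_step` and `consumer_step` and the sample columns are hypothetical, not part of any platform API. Note that this round trip may not preserve every dtype (datetimes and categoricals, for example), which is one more reason to use files for anything non-trivial.

```python
import json

import pandas as pd


def producer_step() -> dict:
    """Hypothetical first script: returns the frame as JSON-safe data."""
    df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.75, 1.0]})
    # to_json emits a JSON string; json.loads turns it into plain
    # dicts/lists/numbers that any JSON encoder can handle.
    return json.loads(df.to_json(orient="split"))


def consumer_step(payload: dict) -> pd.DataFrame:
    """Hypothetical second script: rebuilds the frame from the payload."""
    return pd.DataFrame(
        payload["data"],
        index=payload["index"],
        columns=payload["columns"],
    )


payload = producer_step()
restored = consumer_step(payload)
```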
You could probably use something like this: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_pickle.html
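A rough sketch of the pickle approach, assuming both scripts can read the same file. The temporary directory and the file name `frame.pkl` are placeholders; in practice the file would likely be uploaded to S3 or similar storage, and only its path or key passed between scripts. Only unpickle files from a source you trust.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.75, 1.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder path; a real setup would use shared storage.
    path = os.path.join(tmp_dir, "frame.pkl")

    # First script: write the DataFrame to disk.
    df.to_pickle(path)

    # Second script: load it back with dtypes and index intact.
    restored = pd.read_pickle(path)
```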
thanks!
If you're working with DataFrames, I recommend Parquet files:
https://arrow.apache.org/docs/python/parquet.html#reading-from-cloud-storage
The Wmill docs have samples for it too.