rubenf, 4mo ago

apache iceberg / asset view

Hi @kimsia, you're correct that right now Dagster focuses on assets while Windmill focuses on compute. One way to see it is that Windmill is lower level than Dagster, and you can build your own asset abstraction on top of it. We emphasize using object stores such as S3 by passing pointers/references to objects as inputs or outputs. When you do that, you can preview Parquet files directly in the UI and cache based on ETags. The asset view is an abstraction on top of that, which we will build later based on Apache Iceberg, but for now you will have to decide some of it for yourself. On the other hand, Windmill is a lot more performant and flexible than Dagster with respect to execution.
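To make the pointer-passing and ETag-caching idea concrete, here is a minimal sketch in plain Python. It is not Windmill's API: `ObjectRef`, `FakeObjectStore`, and `EtagCache` are hypothetical names, the store is an in-memory stand-in for S3, and the ETag is modeled as a SHA-256 content hash purely for illustration (real object stores compute ETags their own way).

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ObjectRef:
    """Hypothetical pointer to an object in a bucket (not a Windmill type)."""
    bucket: str
    key: str


class FakeObjectStore:
    """In-memory stand-in for an S3-like store, for illustration only."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, str], bytes] = {}

    def put(self, ref: ObjectRef, data: bytes) -> str:
        self._blobs[(ref.bucket, ref.key)] = data
        return self.etag(ref)

    def get(self, ref: ObjectRef) -> bytes:
        return self._blobs[(ref.bucket, ref.key)]

    def etag(self, ref: ObjectRef) -> str:
        # Assumption: the ETag is modeled as a content hash of the object.
        return hashlib.sha256(self._blobs[(ref.bucket, ref.key)]).hexdigest()


class EtagCache:
    """Re-runs a step only when the ETag of its input object has changed."""

    def __init__(self, store: FakeObjectStore) -> None:
        self.store = store
        self._results: Dict[tuple, ObjectRef] = {}
        self.misses = 0  # number of times the step actually executed

    def run(self, step: Callable[[bytes], bytes],
            src: ObjectRef, dst: ObjectRef) -> ObjectRef:
        # The cache key includes the input's ETag, so new content means a miss.
        cache_key = (step.__name__, src, self.store.etag(src), dst)
        if cache_key not in self._results:
            self.misses += 1
            self.store.put(dst, step(self.store.get(src)))
            self._results[cache_key] = dst
        # Steps exchange the pointer, never the data itself.
        return self._results[cache_key]


def uppercase(data: bytes) -> bytes:
    """Toy transformation standing in for a real pipeline step."""
    return data.upper()
```

Each step receives and returns only an `ObjectRef`, so the data stays in the object store, and a step is skipped when its input's ETag is unchanged. A home-grown asset abstraction could be layered on the same two ideas.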
9 Replies
kimsia, 4mo ago
so has the decision been made to focus on apache iceberg for asset view?
rubenf, 4mo ago
No, we're very agnostic. As long as you can store it on an object store such as S3, it will be handled well. For now we do live preview for Parquet and CSV using DataFusion. Iceberg is a metaformat above that, which we will adopt, as well as Delta Lake.
kimsia, 4mo ago
so what should I use so that when an updated Windmill adds support for Iceberg, I can still (relatively) easily migrate to it?
rubenf, 4mo ago
Yes, it will use the same underlying principles but will be more guided for people who prefer a higher level of abstraction.
kimsia, 4mo ago
My main reason for comparing Dagster and Windmill is that Dagster seems to promise easy reuse of assets, while Windmill appears to be easier to get started with. The best of both worlds would be ideal. I am now sufficiently intrigued to try out Windmill and look at Apache Iceberg.
rubenf, 4mo ago
One way to view our different approaches is that Dagster is laser-focused on asset-based data pipelines, while we are working on providing the most performant and powerful workflow engine, which we will then leverage to build data pipeline abstractions. So it's likely that right now, for data pipelines, Dagster has more QoL features, but they lose on performance and scalability, because those are the areas of focus we excel in, given that's a precondition for being the universal engine.
kimsia, 4mo ago
QoL = ? As in quality of life, meaning improvements or functionality that make a product easier, more convenient, or more pleasant to use?
rubenf, 4mo ago
Correct, with respect to asset-based abstractions.
kimsia, 4mo ago
ok thank you