I was curious which data tools are the go-to choices these days for on-prem and cloud-based platforms.
Based on my research, this is what I found:
For cloud, the choice seems to depend on the platform.
For on-prem, Hive, Spark, NiFi, Airflow, and Kafka seem to be good choices.
I would like to know which tools are in demand, in your opinion. It would also be nice to get some comments on how Docker and Kubernetes connect with these data tools. I heard the new version of Spark supports both, along with YARN.