Drafting Your Data Pipelines

Steven Aranibar
May 11, 2020

Updated: Apr 21, 2022

With careful consideration and some research into your market, the choices you need to make become narrower and clearer. I can now begin drafting my data ingestion/streaming pipeline without being overwhelmed.

For A Quick Recap

You can find the first blog post here, where I learned which tech is most in demand in Toronto: https://www.teamdatascience.com/post/dear-hiring-managers-i-m-here-to-help

And the second blog post is here, where I learned which Toronto industries need data engineers the most: https://www.teamdatascience.com/post/toronto-data-engineering-market

The Pipeline Proposal

I'll be creating several pipelines in this project, but first things first: I need to ingest the data, process it, and store it.

I'll use Python and Spark because they are the two most requested skills in Toronto. Kafka, while not among the top five most in-demand skills, was still the most requested buffer technology, which makes it worth including. The remaining technologies (stages 3, 4, 7 and 8) are all AWS services.
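
To make the ingest, buffer, process, store flow concrete, here is a minimal sketch in plain Python. It is not the actual pipeline: the deque is only a stand-in for a Kafka topic, the generator is a stand-in for a Spark job, and the dict is a stand-in for whichever AWS store ends up being used. The function names, the record fields (`skill`, `city`) and the sample data are all hypothetical, chosen just for illustration.

```python
import json
from collections import deque


def ingest(raw_lines, buffer):
    """Parse raw JSON lines and append them to the buffer.

    The buffer is an in-memory stand-in for a Kafka topic (assumption).
    """
    for line in raw_lines:
        buffer.append(json.loads(line))


def process(buffer):
    """Drain the buffer and yield cleaned records.

    This generator is a stand-in for a Spark processing job (assumption).
    """
    while buffer:
        record = buffer.popleft()
        # Skip records that have no usable "skill" field.
        if not record.get("skill"):
            continue
        yield {
            "skill": record["skill"].strip().lower(),
            "city": record.get("city", "unknown"),
        }


def store(records, sink):
    """Count records per skill in a dict, a stand-in for a real data store."""
    for record in records:
        sink[record["skill"]] = sink.get(record["skill"], 0) + 1
    return sink


def run_pipeline(raw_lines):
    """Wire the three stages together: ingest, then process, then store."""
    buffer = deque()
    ingest(raw_lines, buffer)
    return store(process(buffer), {})


# Hypothetical sample input, made up for this sketch.
sample = [
    '{"skill": " Python ", "city": "Toronto"}',
    '{"skill": "Spark", "city": "Toronto"}',
    '{"skill": "python"}',
    '{"city": "Toronto"}',
]

if __name__ == "__main__":
    print(run_pipeline(sample))
```

Keeping each stage behind its own function means a stand-in can later be swapped for the real technology without touching the other stages.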

What's Next

I'll be documenting how I build this setup in the AWS console (with screenshots).

You can find my LinkedIn here: https://www.linkedin.com/in/steven-aranibar-8891a2103/

And you can contact me at cloudengineertoronto@protonmail.com