Once you take the time to learn about your market, the choices you need to make become narrower and clearer. I can now begin drafting my data ingestion/streaming pipeline without being overwhelmed.
A Quick Recap
You can find the first blog post here, where I learned which tech is most in demand in Toronto: https://www.teamdatascience.com/post/dear-hiring-managers-i-m-here-to-help
And the second blog post is here, where I learned which Toronto industries need data engineers the most: https://www.teamdatascience.com/post/toronto-data-engineering-market
The Pipeline Proposal
I'll be creating several pipelines in this project, but first things first: I need to ingest the data, process it, and store it.
I'll use Python and Spark because they are the top two requested skills in Toronto. Kafka, while not among the top five most in-demand skills, was still the most requested buffer technology, which makes it worth including. The remaining technologies (stages 3, 4, 7 and 8) are all AWS technologies.
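To make the ingest, process, store flow concrete before any real infrastructure exists, here is a minimal sketch in plain Python. It is not the actual pipeline: the deque only stands in for a Kafka topic, the generator for a Spark job, and the list for AWS storage. The record fields ("title", "city") and all function names are assumptions made up for illustration.

```python
import json
from collections import deque


def ingest(raw_lines, buffer):
    """Parse raw JSON lines and push them onto the buffer (stand-in for a Kafka topic)."""
    for line in raw_lines:
        buffer.append(json.loads(line))


def process(buffer):
    """Drain the buffer and yield cleaned records (stand-in for a Spark job)."""
    while buffer:
        record = buffer.popleft()
        # Hypothetical cleaning rule: skip records with a missing or empty title.
        if record.get("title"):
            yield {
                "title": record["title"].strip().lower(),
                "city": record.get("city", "unknown"),
            }


def store(records, sink):
    """Append processed records to the sink (stand-in for AWS storage)."""
    for record in records:
        sink.append(record)
    return sink


def run_pipeline(raw_lines):
    """Run the three stages in order: ingest, process, store."""
    buffer = deque()
    ingest(raw_lines, buffer)
    return store(process(buffer), [])


# Hypothetical sample input: two raw records, one with an empty title.
sample = [
    '{"title": " Data Engineer ", "city": "Toronto"}',
    '{"title": "", "city": "Toronto"}',
]
result = run_pipeline(sample)
```

Keeping each stage as its own function means any one of them could later be swapped for the real technology without touching the other two.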
I'll be documenting how I build this setup in the AWS console (with screenshots).
You can find my LinkedIn profile here: https://www.linkedin.com/in/steven-aranibar-8891a2103/
And you can contact me at email@example.com