You want to become a data engineer, but don't know how to set up a data engineering project? I will show you!
Do not make this mistake!
First of all, avoid a mistake that unfortunately many people make!
People often want to build the whole thing from the very beginning.
They say: "Okay, I need to do a project. I need to build something big." They don't even know yet what data and which tools they want to use, but they want to build the full chain right away.
I always say: start small! Start by selecting a few data sets that make sense and that interest you.
Then start with one tool. Build something on top of it, and then build something on top of that. If necessary, you can swap one piece for another, for example because you are more interested in the other tool!
Do not learn everything!
It makes absolutely no sense to go through the Cookbook and try to learn every single tool in it! That is completely useless.
How does the Cookbook help you? Choose a few tools that interest you or that you can see are in demand. Then look at them, use them and learn how to work with them. That is the main thing!
Approaches to building a project
Does it make sense to hunt down some type of open API or data portal and just build a pipeline that extracts the data from there and does the manipulation?
Well, there are several ways you can do it.
The first is the one already mentioned in the question: you hunt down some sources like APIs.
But what you can also do, of course, is go out and find the data itself. Look through the data sources. There are tons of free ones.
Just last week I added a lot of them to the Cookbook. Go through them and find a data set, for example one in CSV format.
If you have a large file, you can always slice it and simulate an API, or simulate a source that is posting data somewhere. That is rather quick, and you don't have to fight with real APIs and so on.
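To make the slicing idea more concrete, here is a minimal sketch in Python. It reads a CSV in small slices and turns each slice into one JSON message, roughly the way a source that posts batches of records might. Everything in it is an assumption for illustration: the function names read_in_slices and simulate_source, the message layout with "batch" and "records", and the tiny in-memory sample that stands in for a large file you downloaded.

```python
import csv
import io
import json
from itertools import islice
from typing import Dict, Iterator, List


def read_in_slices(csv_file, slice_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of up to slice_size rows from an open CSV file."""
    reader = csv.DictReader(csv_file)
    while True:
        rows = list(islice(reader, slice_size))
        if not rows:
            return
        yield rows


def simulate_source(csv_file, slice_size: int = 2) -> Iterator[str]:
    """Yield one JSON message per slice, like a source posting batches."""
    for batch_number, rows in enumerate(read_in_slices(csv_file, slice_size)):
        # Hypothetical message layout; a real source defines its own.
        yield json.dumps({"batch": batch_number, "records": rows})


# Tiny made-up sample standing in for a large CSV file.
SAMPLE_CSV = "id,city,temp_c\n1,Berlin,12\n2,Munich,9\n3,Hamburg,11\n"

messages = list(simulate_source(io.StringIO(SAMPLE_CSV), slice_size=2))
for message in messages:
    print(message)
```

With a real file you would pass an open file handle instead of the in-memory sample, and instead of printing you could hand each message to whatever tool you picked as your first building block.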
Both approaches work! So do the thing that appeals to you the most! :-)
Still unsure? My Data Engineer Guide walks you through exactly this process in 9 modules. Just drop by and see if it is something for you :-)
>> created by Mira Roth