Basically, building a Big Data pipeline can be divided into five steps. As you can see in the following blueprint from my cookbook, these five steps are Connect, Buffer, Processing Framework, Store and Visualize.
If you want to build a Big Data pipeline, you first have an API. This API must then send its data somewhere, so it goes into a buffer, such as a cache or a message queue.
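To make this first step concrete, here is a minimal sketch of such an ingestion API, assuming Flask for the API and Apache Kafka as the buffer (via the kafka-python client); the endpoint path and the `events` topic name are made up for illustration:

```python
import json

from flask import Flask, jsonify, request
from kafka import KafkaProducer  # pip install kafka-python

app = Flask(__name__)

# Producer pointing at a local Kafka broker; dicts are serialized as JSON bytes.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.route("/ingest", methods=["POST"])
def ingest():
    """Accept a JSON payload and push it into the buffer (a Kafka topic)."""
    event = request.get_json(force=True)
    producer.send("events", event)  # hypothetical topic name
    return jsonify({"status": "queued"}), 202

if __name__ == "__main__":
    app.run(port=5000)
```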
From this buffer the data is then picked up and processed, for example with a stream processing framework.
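In its simplest form, the processing step is just a consumer loop that reads from the buffer, transforms each event and passes it on. This sketch again assumes the kafka-python client; the temperature fields are invented for the example:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Read raw events from the buffer, one message at a time.
consumer = KafkaConsumer(
    "events",  # same hypothetical topic the API writes to
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Example transformation: convert Celsius readings to Fahrenheit.
    if "temperature_c" in event:
        event["temperature_f"] = event["temperature_c"] * 9 / 5 + 32
        print(event)  # in a real pipeline this would go to the Store step
```

A real framework like Storm or Flink adds scaling, fault tolerance and windowing on top of this basic read-transform-forward loop.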
Then the results are written into a database - for example an SQL database.
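The Store step could then look like this. The sketch uses SQLite from Python's standard library as a stand-in for any SQL database, with a made-up `readings` table matching the example events above:

```python
import sqlite3

# Local SQLite database as a stand-in for PostgreSQL, MySQL, etc.
conn = sqlite3.connect("pipeline.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS readings (
           sensor_id TEXT,
           temperature_f REAL,
           received_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)

def store(event: dict) -> None:
    """Write one processed event into the Store step's table."""
    conn.execute(
        "INSERT INTO readings (sensor_id, temperature_f) VALUES (?, ?)",
        (event.get("sensor_id"), event.get("temperature_f")),
    )
    conn.commit()

store({"sensor_id": "s1", "temperature_f": 71.6})
```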
In the last step, Visualize, you can build a web UI or an app, or you can simply use a BI tool for that.
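If you build the web UI yourself, the backend can be as small as a single endpoint that reads from the Store step and hands the data to whatever frontend or BI tool you use; here is a sketch assuming Flask and the SQLite table from the previous example:

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/latest")
def latest():
    """Return recent readings so a web UI or BI tool can plot them."""
    conn = sqlite3.connect("pipeline.db")
    rows = conn.execute(
        "SELECT sensor_id, temperature_f, received_at "
        "FROM readings ORDER BY received_at DESC LIMIT 10"
    ).fetchall()
    conn.close()
    return jsonify(
        [
            {"sensor_id": s, "temperature_f": t, "received_at": r}
            for (s, t, r) in rows
        ]
    )

if __name__ == "__main__":
    app.run(port=5001)
```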
So if you go through the process this way, you can start building pipelines and learn the tools of each step.
So, going back to the example, you would learn how to build APIs and how to use buffers and caches. You would also learn how to use a streaming tool, such as Apache Storm or Apache Flink.
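For a taste of what Flink feels like, here is a tiny PyFlink job doing the same temperature conversion as the consumer loop above; the input data is hardcoded just for illustration:

```python
from pyflink.datastream import StreamExecutionEnvironment  # pip install apache-flink

env = StreamExecutionEnvironment.get_execution_environment()

# Hardcoded (sensor_id, temperature_c) pairs standing in for a real source.
readings = env.from_collection([("s1", 22.0), ("s2", 25.5)])

# Convert Celsius to Fahrenheit and print the result to stdout.
readings.map(lambda r: (r[0], r[1] * 9 / 5 + 32)).print()

env.execute("celsius_to_fahrenheit")
```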
Then you would learn how to use SQL databases and how to create them. Finally, you would need a way to display the whole thing: you could use a BI tool, or a dashboard tool such as Grafana.
So you really have complete freedom in your choice of tools here.
You can find this graphic and much more helpful information in the 100+ pages of my free cookbook. In it, I help you find the tools for the different stages and get started with them. I can't teach you everything, but I can help you find the right direction.
- Become a Data Engineer: Click here!
- My free 100+ pages Data Engineering Cookbook: Click here!
- Follow us on LinkedIn: Click here!
- Check out my YouTube: Click here!