In one of my livestreams (check out my YouTube channel), someone asked which tool and which architecture I would suggest for real-time analytics.
This question is difficult to answer because there are so many options.
First you have to define what you mean by real time. Do you mean milliseconds, one or two seconds, or something like 30 seconds, as is common with streaming?
As you can see, "real time" is a vague term that can be defined almost arbitrarily.
If you need real time in the strictest sense, you have to run the analytics embedded on the device itself, for example on your phone. This is the fastest option because everything runs on the phone and there is no round trip to a cloud.
There are a lot of tools for real time. Spark, for example, is a very good tool for real-time analysis. When you start streaming, set the micro-batches to a few seconds, and results arrive within that interval, which is fast enough for many use cases.
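To make the micro-batch idea concrete, here is a minimal sketch in plain Python. It is not Spark code: it only imitates what a micro-batch engine does, namely collecting events into fixed time windows and running the analysis once per window. The function names, the sample events, and the 2-second interval are assumptions chosen for illustration. In Spark Structured Streaming, the comparable knob is the processing-time trigger on the streaming query, for example `trigger(processingTime="2 seconds")`.

```python
from collections import defaultdict


def micro_batches(events, interval_s):
    """Group (timestamp_seconds, value) events into windows of interval_s seconds.

    Returns a dict mapping each window's start time to the values in it.
    """
    batches = defaultdict(list)
    for ts, value in events:
        # Every event belongs to the window that starts at the nearest
        # lower multiple of the batch interval.
        window_start = int(ts // interval_s) * interval_s
        batches[window_start].append(value)
    return dict(sorted(batches.items()))


def batch_averages(events, interval_s):
    """Run one small analysis (an average) per micro-batch."""
    return {
        start: sum(values) / len(values)
        for start, values in micro_batches(events, interval_s).items()
    }


# Hypothetical sensor readings: (seconds since start, measured value).
events = [(0.5, 10), (1.2, 20), (2.1, 30), (3.9, 50), (4.0, 70)]

averages = batch_averages(events, interval_s=2)
print(averages)  # {0: 15.0, 2: 40.0, 4: 70.0}
```

The batch interval is the trade-off: a shorter interval gives you fresher results, but each batch has less data and the fixed overhead per batch weighs more.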
You can also use a trigger-based approach. For example, you set up a Kubernetes cluster and build Docker containers with Python scripts in them. Then you trigger the containers on the fly and run them ad hoc, or in real time, whenever you need them.
These are just two examples; there are many more tools.
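One way to trigger a container on demand is to submit a Kubernetes Job each time an analysis is needed. The following is only a sketch of that idea, using nothing but the Python standard library to build the Job manifest. The job name, the image `registry.example.com/analytics:latest`, the script `analyze.py`, and its `--window` argument are all hypothetical placeholders, not something from a real setup.

```python
import json


def build_job_manifest(job_name, image, script_args):
    """Build a Kubernetes batch/v1 Job manifest that runs one Python script."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            # Do not retry automatically; an ad hoc run either works or fails.
            "backoffLimit": 0,
            # Let Kubernetes clean up the finished Job after five minutes.
            "ttlSecondsAfterFinished": 300,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "analysis",
                            "image": image,  # hypothetical image name
                            # analyze.py is a hypothetical script inside the image.
                            "command": ["python", "analyze.py", *script_args],
                        }
                    ],
                }
            },
        },
    }


manifest = build_job_manifest(
    job_name="adhoc-analysis-001",
    image="registry.example.com/analytics:latest",
    script_args=["--window", "30s"],
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest can be piped into `kubectl apply -f -`. Job names must be unique within a namespace, so every trigger needs a fresh name. Keep in mind that starting a container takes time, so this approach suits reaction times of seconds rather than milliseconds.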
But as I said before, you should define the following first:
What do you need?
What do you want to do in real life?
What kind of analysis do you want to do?
What kind of data do you have?
How much data do you have?
What is the reaction time or the time delay you need here?
Only then should you think about the choice of a tool.
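If it helps, you can write the answers to these questions down in a small structure before comparing tools. The sketch below is just one possible way to do that. The field names, the example values, and the tier thresholds (under 1 second, up to 10 seconds, everything above) are assumptions for illustration, not fixed rules.

```python
from dataclasses import dataclass


@dataclass
class AnalyticsRequirements:
    """Answers to the questions above, collected in one place."""
    goal: str                      # What do you want to do in real life?
    analysis_type: str             # What kind of analysis do you want to do?
    data_kind: str                 # What kind of data do you have?
    data_volume_gb_per_day: float  # How much data do you have?
    max_delay_s: float             # What reaction time or delay is acceptable?


def latency_tier(req):
    """Sort the required delay into a rough tier (thresholds are assumptions)."""
    if req.max_delay_s < 1:
        return "sub-second"
    if req.max_delay_s <= 10:
        return "seconds"
    return "tens of seconds or more"


# Hypothetical example project.
req = AnalyticsRequirements(
    goal="alert on failed payments",
    analysis_type="rolling error rate",
    data_kind="JSON events",
    data_volume_gb_per_day=5.0,
    max_delay_s=2.0,
)
print(latency_tier(req))  # seconds
```

The tier does not pick a tool for you, but it narrows the discussion: a sub-second requirement points toward running on the device, while a delay of a few seconds leaves room for micro-batch streaming or on-demand containers.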
Do you have experience with real-time analytics? Share it in the comments - I'm looking forward to it!