Security is a very important aspect of data engineering.
This often leads to the following questions:
What tools are available to ensure security in the data engineering process?
What should be considered when designing secure pipelines?
First of all, when you design the pipeline, you should compartmentalize it. Don't leave everything open; create separate security zones and place each tool in the zone where it belongs.
For this purpose, it's best to use the blueprint:
Do you want to give all these tools full access to all other resources? No, of course not. You should compartmentalize them.
So what you should do is:
1) On the one hand, you might want to set up a firewall here:
2) On the other hand, a firewall should also be set up between this:
So what you do is create rules so that the tools and systems working in the Connect phase do not have access to the Store phase. The same applies to Visualize.
You should therefore first create different zones and firewalls.
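To make the zone idea concrete, here is a minimal sketch of a default-deny rule set between zones. It is only an illustration, not a real firewall configuration: the zone names "connect", "store" and "visualize" follow the phases named above, while the "buffer" zone, the allowed flows and the `is_allowed` helper are assumptions made up for this example.

```python
# Hypothetical allow-list of (source zone, target zone) pairs.
# Anything not listed here is denied by default.
# Assumption: tools in the Connect phase may only reach a buffer (e.g. a message queue),
# and only the buffer side may reach the Store phase.
ALLOWED_FLOWS = frozenset({
    ("connect", "buffer"),
    ("buffer", "store"),
})


def is_allowed(source_zone: str, target_zone: str) -> bool:
    """Return True only if traffic from source_zone to target_zone is explicitly allowed."""
    return (source_zone, target_zone) in ALLOWED_FLOWS


if __name__ == "__main__":
    for source, target in [("connect", "buffer"), ("connect", "store"), ("connect", "visualize")]:
        verdict = "allow" if is_allowed(source, target) else "deny"
        print(f"{source} -> {target}: {verdict}")
```

The point of the design is the default: a tool in the Connect phase gets no path to Store or Visualize unless someone adds a rule on purpose.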
Another option is to secure the communication between the individual tools, for example with Kerberos authentication.
If, for example, you write from an API into a message queue, that connection would be secured and the communication encrypted:
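As a rough sketch of the encryption side, this is how a client could build a TLS context with Python's standard `ssl` module before connecting to a message queue. The function name `make_client_tls_context` and the `ca_file` parameter are assumptions for this example; Kerberos authentication is not covered here, and how the context is handed to a client depends on the message queue library you use.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate.

    ca_file is a hypothetical path to an internal CA bundle; if it is None,
    the system's default trusted certificates are loaded instead.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse old protocol versions; the default context already requires
    # a valid certificate and checks the hostname.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()
    print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

Many clients accept a context like this or equivalent certificate settings; check the documentation of your client for the exact option.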
Firewalls and SSL certificates get you to a good level.
These are basically the standard things you should do.
In a nutshell: look at firewalls and SSL certificates, and you're already in very good shape.
How do you ensure security in your data pipelines? I look forward to the exchange in the comments!
>> created by Mira Roth