Hello! I am sure this blog post gives you a quick way to set up Airflow on your desktop and get going.
What is Airflow?
I once had a scenario where a task ran on a Unix system and, upon completion, had to trigger another task on Windows. I had to install a few tools and integrate them to accomplish the workflow.
Scheduling and managing such tasks becomes even more complex.
Now imagine a single tool that covers all of the above issues and offers even more capabilities.
Apache Airflow is one such powerful tool: you can define and schedule tasks programmatically, and it offers a rich UI to monitor and manage your workflows.
In this blog, I am sharing my experience of:
Setting up Airflow
Deploying a DAG to invoke an AWS Lambda function.
Problem: Job2 has to execute after Job1.
Solution: In SQL, I can write logic that checks Job1's status and runs Job2 only once Job1 has completed.
But when your tasks depend on cloud services, Python scripts, and shell scripts, you need to look beyond database-side logic like the solution above; that is where workflow management comes in.
Apache Airflow is one such tool: it models task dependencies as a directed acyclic graph, and each task can be defined using one of a wide variety of operators.
Apache Airflow deployment
I configured Airflow on Windows by enabling the Windows Subsystem for Linux (WSL).
For detailed configuration steps, visit the link below.
The airflow.cfg file is located in the directory pointed to by the AIRFLOW_HOME environment variable.
A few configuration settings I want to talk about:
expose_config = True    # set this to True to view the config from the UI
dags_folder = <absolute path>    # path to your DAGs folder
load_examples = False    # set to False if you don't want the example DAGs loaded
How to create a DAG?
I created a DAG (Directed Acyclic Graph) to manually invoke a Lambda function.
The Python template is captured in the picture below.