Which Cloud tools in the industry match your goals?
Firstly, as a Data Engineer myself, I can tell you that there are so many tools to worry about and only a handful you will actually use!
Secondly, as you research and shortlist job opportunities, you’ll come across a lot of tools: Hadoop, Spark, AWS Lambda, TensorFlow, Kafka, Google Pub/Sub, Airflow, Hive, Azure Data Lake, Kubernetes and a few more. Some openings even list several languages, frameworks, and tools at once! The reality is you’ll never need to know most of them, and certainly not all.
You can identify the tools that align with your career goals with these 3 simple steps.
Be aware of the elements of a data science platform
Understand traditional data science roles & responsibilities
Focus first on companies — not job titles
Data Science Platform elements
It is important to be aware of the elements of the data science platform in a company. A platform can be on-premises, entirely on the cloud, or partly on the cloud (hybrid). Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP) are among the most widely used cloud platforms.
Andreas Kretz, a LinkedIn Top Voice in data engineering, created a blueprint for understanding any data platform. The blueprint is deliberately tool-agnostic, so it can be adjusted as a company’s requirements change.
Its modules represent complementary tasks within a team. On his YouTube channel, he explains them in detail. Here’s a brief overview:
Extract data from your data source (an API, a data warehouse or a relational database)
Send data to your API services, which write it to temporary storage so later modules can access it quickly
A few tools: REST APIs, Apache Flume, AWS API Gateway, Azure Event Hub, GCP Cloud Dataflow
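To make the ingestion step concrete, here is a minimal sketch in Python. All names are hypothetical; in a real platform the source would be a REST API or database and the sink an AWS API Gateway endpoint or Azure Event Hub, not an in-memory list.

```python
import json

def extract(source):
    """Pull raw records from a data source (here, any iterable)."""
    for record in source:
        yield record

def ingest(records, temp_store):
    """Write records to temporary storage so later modules can read them quickly."""
    for record in records:
        temp_store.append(json.dumps(record))
    return len(temp_store)

staging = []  # stand-in for the temporary storage layer
rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
count = ingest(extract(rows), staging)
print(count)  # 2
```

The point of the separation is that `extract` knows only about the source and `ingest` only about the staging area, so either side can be swapped out without touching the other.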
Publish data from API into your buffer message queue before storage
Buffer allows you to manage data load and avoid overwhelming your system
A few tools: Apache Kafka, Redis Pub/Sub, AWS Kinesis, Azure Data Factory, GCP Pub/Sub
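The buffering idea can be sketched with Python’s standard-library `queue.Queue` standing in for Kafka or GCP Pub/Sub. The bounded queue applies back-pressure: a publisher blocks when the buffer is full, so a slow consumer is never overwhelmed.

```python
from queue import Queue
from threading import Thread

buffer = Queue(maxsize=5)  # bounded: publishers block when it is full
results = []

def producer(messages):
    for msg in messages:
        buffer.put(msg)  # blocks if the buffer is full
    buffer.put(None)     # sentinel: signals no more messages

def consumer():
    while True:
        msg = buffer.get()
        if msg is None:
            break
        results.append(msg.upper())  # stand-in for real processing

t = Thread(target=consumer)
t.start()
producer(["click", "view", "purchase"])
t.join()
print(results)  # ['CLICK', 'VIEW', 'PURCHASE']
```

Real message queues add persistence, partitioning and delivery guarantees on top of this pattern, but the producer/buffer/consumer shape is the same.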
Analyze data (in storage) in the form of either streams (live data) or batches (data chunks)
Some use cases may also require you to write data to storage
A few tools: Apache Spark, MapReduce, AWS Lambda, Azure Databricks, GCP Compute Engine
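The batch-versus-stream distinction reduces to a small sketch: a batch job sees a complete chunk of data at once, while a stream job handles events one at a time as they arrive. Frameworks like Spark generalize exactly this idea.

```python
def process_batch(events):
    """Batch: aggregate over a complete chunk of data."""
    return sum(e["amount"] for e in events)

def process_stream(events):
    """Stream: emit a running total after every incoming event."""
    total = 0
    for e in events:
        total += e["amount"]
        yield total

events = [{"amount": 10}, {"amount": 5}, {"amount": 25}]
print(process_batch(events))         # 40
print(list(process_stream(events)))  # [10, 15, 40]
```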
Store data from Buffer in your Store module
A few tools: SQL databases, NoSQL databases (Hadoop HDFS, MongoDB etc.), data warehouses (Hive etc.), AWS Redshift, Azure Data Lake, GCP BigQuery.
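The Store module can be sketched with SQLite from Python’s standard library standing in for a warehouse like Redshift or BigQuery: records drained from the buffer are written to a durable table that later modules query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")

buffered = [("click",), ("view",), ("purchase",)]  # drained from the buffer
conn.executemany("INSERT INTO events (kind) VALUES (?)", buffered)
conn.commit()

count, = conn.execute("SELECT COUNT(*) FROM events").fetchone()
print(count)  # 3
conn.close()
```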
Visualize insights from your data
Set up a web user interface, a mobile app or even a data visualization tool
Or, create an API to provide developers access to data in your Store module
With an API, users can develop custom visualizations for their business case
A few tools: Android & iOS apps, dashboards (Grafana, Kibana etc.), web frameworks and servers (React, Tomcat etc.), data visualization tools (Tableau, Power BI etc.).
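The API route in particular can be sketched in a few lines: instead of building a dashboard yourself, expose aggregated data from the Store module as JSON so other developers can build custom visualizations. The table and function names here are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 120.0), ("US", 340.0), ("EU", 80.0)])

def sales_by_region(conn):
    """Return aggregated sales as a JSON string, ready for any chart library."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return json.dumps([{"region": r, "total": t} for r, t in rows])

payload = sales_by_region(conn)
print(payload)
# [{"region": "EU", "total": 200.0}, {"region": "US", "total": 340.0}]
conn.close()
```

A dashboard, a mobile app and a fellow team’s script can all consume the same JSON endpoint, which is why this option scales better than building one visualization per business case.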
Within this blueprint, roles have specialized rapidly in recent years. Understanding these roles will give you insight into how teams collaborate.
Traditional Roles & Responsibilities
Data science teams come together to solve complex data problems. The roles within a traditional data science team each have a specific set of responsibilities. There are data scientists, data engineers and data analysts – but there are also hybrid roles such as machine learning engineers!
Data Scientists
This is easily the most popular role.
They apply theoretical knowledge of statistics and algorithms to find the best way to solve a data science problem.
They take a business problem, translate it to a data question, create predictive models to answer the question and tell a story about their findings.
Data Engineers
Andreas calls data engineers the plumbers of data science.
They focus on programming to process large datasets, clean the data and implement requests that come from data scientists.
They build and maintain the data architecture of a project and ensure an uninterrupted flow of data between servers and applications.
Machine Learning Engineers
They take a data scientist’s analysis and deploy it in production.
This is because data scientists are traditionally not engineers: handling terabytes of data and implementing code in production is outside their core skill set.
Data Analysts
They investigate data and provide reports and visualizations to explain its hidden insights.
They help people from across the company understand specific queries with plots, charts and dashboards.
Two data scientists or data engineers at different companies, or even within the same company, could do totally different types of work.
So, focus first on companies — not job titles
Focus first on Companies — not job titles
Try finding a handful of companies that represent truly great places for you to work and develop your career (think about culture, opportunities, networking, and so on).
Ensure the shortlisted companies have similar data science platform structures. You can get an idea of a company’s structure by looking at the tools requested in its job openings. Companies that deploy their systems on the cloud with the same provider share many tools.
Next, identify what the deliverables will be in a specific data science role.
Will you need to write code in a production environment? Will you need to create data pipelines? Will you perform analyses with a visualization tool on on-premises data?
Deliverables tell candidates the nature of the everyday work, whereas job descriptions are written to encourage candidates with a variety of skills to apply.
Then, list the tools you’ll need, acquire those skills, and apply them through projects. You can categorize the tools by blueprint module.
Since your shortlisted companies have similar data platforms, you’ll have eliminated most tools on the market and will be left with a small number around which to structure your projects.
Once you have the relevant experience, go beyond publishing code on GitHub and write a detailed blog post on Medium or LinkedIn about your project. This gives you even more exposure and demonstrates a deep understanding of what you do.
With relevant experience and demonstrated exposure, you are going to connect with most hiring managers.
If you feel this article may help someone you know, do share with them this link: https://bit.ly/36Dpj2Z