ROLE DESCRIPTION
PURPOSE
You will collaborate closely with our data scientists, analysts, and software engineers to ensure efficient and reliable data flows throughout the organisation. The ideal candidate has a strong background in data engineering, excellent problem-solving skills, and a passion for working with large datasets.
OBJECTIVES (MAIN DUTIES AND RESPONSIBILITIES)
- Design, develop, and maintain scalable and efficient data pipelines and ETL processes to ingest, transform, and load data from various sources.
- Develop and implement data warehousing solutions.
- Collaborate with cross-functional teams to integrate various data sources.
- Ensure data quality and consistency across different data systems.
- Optimise data retrieval for dashboard/reporting solutions.
- Optimise data infrastructure, including data storage, data retrieval, and data processing for enhanced performance and scalability.
- Implement data quality and data governance processes to ensure accuracy, consistency, and integrity of data.
- Monitor and troubleshoot data pipelines to identify and resolve issues in a timely manner.
- Perform data profiling and analysis to identify data quality issues and propose improvements.
- Collaborate with data scientists and analysts to provide them with the necessary data sets for analysis and reporting.
- Stay up to date with emerging technologies and trends in data engineering and recommend new tools and frameworks to improve data infrastructure.
QUALIFICATIONS
- Preferably a bachelor’s degree in computer science, engineering, or a related field.
- Azure certifications are a plus.
KNOWLEDGE, SKILLS & EXPERIENCE
- Proven experience as a Data Engineer or in a similar role, with a minimum of 2 years' experience.
- Strong understanding of data modelling, data warehousing, and ETL processes.
- 1-2 years of experience working with Azure Synapse.
- Proficiency in SQL and experience working with relational databases (e.g., PostgreSQL, SQL Server).
- Strong programming skills in at least one scripting language (e.g., Python) and experience with data manipulation and transformation libraries (e.g., Pandas, PySpark).
- Comfortable working with Azure cloud-based infrastructure and services.
- Familiarity with data pipeline orchestration tools (e.g., Apache Airflow, Azure Functions, AWS Lambda) and workflow management systems.
- Experience with real-time data streaming technologies (e.g., Apache Kafka, Apache Flink).
- Knowledge of containerisation technologies and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with machine learning concepts and frameworks.
LANGUAGES
English / Afrikaans
NOTES
This role is an opportunity to be part of an innovative team, working on cutting-edge data solutions in Agri-Tech. If you have a passion for data and technology and meet the above requirements, we encourage you to apply.