Project Skills/Methodology:
• Design of the target Data Warehouse architecture
• Comparison and selection of DWH technologies and frameworks
• Agile refactoring of existing Data Warehouse components into a sustainable, scalable warehouse for company-wide data-driven reporting
• Maintenance and support for a custom legacy Python data pipeline framework
Technology:
• Microsoft SQL Server
• Python/Pandas for the legacy data processing framework
• Introduction of Apache Airflow as the next-generation workflow orchestration platform (see the sketch below)
• Azure DevOps and Azure Pipelines for code version control and CI/CD
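To illustrate the Airflow introduction, below is a minimal sketch of how a legacy Pandas load step can be wrapped as a scheduled Airflow task; the DAG id, extract path, connection string, and table name are hypothetical placeholders rather than the project's actual code.

```python
# Hedged sketch: a legacy Pandas load step wrapped as a daily Airflow task.
# DAG id, extract path, DSN, and table name are hypothetical placeholders.
from datetime import datetime

import pandas as pd
from sqlalchemy import create_engine
from airflow import DAG
from airflow.operators.python import PythonOperator

def load_report_table():
    # Legacy step: read a flat-file extract and stage it in the warehouse
    df = pd.read_csv("/data/exports/sales.csv")
    engine = create_engine("mssql+pyodbc://@dwh_dsn")  # hypothetical SQL Server DSN
    df.to_sql("stg_sales", engine, if_exists="replace", index=False)

with DAG(
    dag_id="legacy_sales_refresh",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="load_report_table", python_callable=load_report_table)
```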
Project Skills/Methodology:
• Data pipeline development lead for a team of 3 further data engineers
• Planning, implementing, and maintaining end-to-end data pipelines based on customer specifications
• Development of a shared Python data engineering utility module
• Software and data pipeline quality control in an agile development process
• Development of new infrastructure components to create a Data Mesh self-service architecture
Technology:
• Azure Databricks with Unity Catalog
• PostgreSQL database cluster
• Apache Airflow workflow orchestration platform
• Python/PySpark and SQL-based ETL/ELT data engineering stack (see the sketch below)
• Azure DevOps and Azure Pipelines for source code management (Git) and CI/CD
• Various data source types such as Azure Blob Storage, REST APIs, and Kafka topics
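As an illustration of this stack, a hedged PySpark sketch of a bronze-layer ELT step on Databricks; the storage path, deduplication key, and Unity Catalog names are hypothetical placeholders.

```python
# Hedged sketch of a PySpark ELT step on Databricks with Unity Catalog.
# The storage path, dedup key, and catalog.schema.table names are
# hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided by the Databricks runtime

raw = (
    spark.read.format("json")
    .load("abfss://landing@storageacct.dfs.core.windows.net/orders/")  # hypothetical
)

cleaned = (
    raw.withColumn("ingested_at", F.current_timestamp())
    .dropDuplicates(["order_id"])  # hypothetical business key
)

# Unity Catalog uses a three-level namespace: catalog.schema.table
cleaned.write.mode("append").saveAsTable("main.sales.orders_bronze")
```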
Project Skills/Methodology:
• Technical and architectural analysis and continuous improvement of an existing private cloud data engineering platform
• Development and integration of new system components for Apache Airflow (see the operator sketch below)
• Technical support for data pipeline orchestration using CI/CD infrastructure
Technology:
• Apache Airflow
• Python: Flask, Pandas, SQLAlchemy ORM
• Data pipelines collecting data from various source systems into an SQL-based data warehouse
• Docker, Docker Compose for service orchestration
• GitLab for source code management (Git) and CI/CD pipelines
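A hedged sketch of one kind of custom Airflow system component, a minimal operator built on the platform's Pandas/SQLAlchemy stack; the class name, connection string, and arguments are hypothetical illustrations.

```python
# Hedged sketch of a custom Airflow operator; all names are hypothetical.
import pandas as pd
from airflow.models.baseoperator import BaseOperator
from sqlalchemy import create_engine

class CsvToWarehouseOperator(BaseOperator):
    """Load a CSV extract into a warehouse table using Pandas + SQLAlchemy."""

    def __init__(self, source_uri: str, target_table: str, **kwargs):
        super().__init__(**kwargs)
        self.source_uri = source_uri
        self.target_table = target_table

    def execute(self, context):
        df = pd.read_csv(self.source_uri)
        engine = create_engine("postgresql://dwh")  # hypothetical warehouse DSN
        df.to_sql(self.target_table, engine, if_exists="append", index=False)
        self.log.info("Loaded %d rows into %s", len(df), self.target_table)
```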
Project Skills/Methodology:
• In-depth review of the existing system architecture based on available technical documentation and a series of stakeholder interviews
• High-level requirements analysis for the client's Business Intelligence platform needs
• Presentation of a solution strategy to the client addressing weaknesses and gaps in the current architecture, including
  – specific technological implementation patterns for ad-hoc improvements, and
  – a plan for a gradual mid-term migration from Azure Synapse Analytics to Azure Databricks for greater efficiency
Project Language: German
Project Skills/Methodology:
• Exploration of efficient delta loading mechanisms to transfer live sales data from SAP S/4HANA to an Azure SQL Data Warehouse
• Optimization of existing data pipelines
• Optimization of database transactions for ELT operations (extract, load, transform), as sketched below
• Implementation and documentation of a proof-of-concept data pipeline
Technology:
• Azure Data Factory as the data integration service
• Azure SQL Server as the cloud data warehouse
• SAP S/4HANA as the data source, using the SAP Operational Data Processing (ODP) framework for delta provisioning
• Azure Data Factory "SAP CDC" (change data capture) connector modules for data extraction
• T-SQL statements to create and optimize database indexes
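One transaction-level optimization of the kind explored here, sketched under assumptions: batched inserts via pyodbc's fast_executemany, which cuts per-row round trips during ELT loads. The DSN, staging table, and rows are hypothetical placeholders.

```python
# Hedged sketch: batched inserts into Azure SQL via pyodbc, one common way
# to reduce per-row round trips in ELT loads. DSN, table, and rows are
# hypothetical placeholders.
import pyodbc

conn = pyodbc.connect("DSN=azure_sql_dwh")  # hypothetical ODBC DSN
cursor = conn.cursor()
cursor.fast_executemany = True  # send parameter sets in bulk instead of row by row

rows = [(1, "2024-01-01", 19.99), (2, "2024-01-01", 5.49)]  # hypothetical delta rows
cursor.executemany(
    "INSERT INTO stg.sales_delta (order_id, order_date, amount) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
```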
Project (2):
Sales Prediction with AzureML for re-supply planning
Project Language: German, English
Project Skills/Methodology:
• Implementation and evaluation of a minimum viable product (MVP) system for future sales prediction to automate existing planning and re-supply processes
• Analysis of the existing manual planning methodology and development of an automated approach
• Creation of evaluation metrics for meaningful prediction quality (see the sketch below)
• Visualization of results and quality criteria for inspection and monitoring
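A hedged sketch of the kind of prediction-quality metrics created here (MAE, RMSE, MAPE); the DataFrame column names are hypothetical placeholders.

```python
# Hedged sketch of forecast-quality metrics; the DataFrame columns
# ("actual", "predicted") are hypothetical placeholders.
import numpy as np
import pandas as pd

def evaluate_forecast(df: pd.DataFrame) -> dict:
    """Compare predicted vs. actual sales and return common error metrics."""
    err = df["predicted"] - df["actual"]
    return {
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        # MAPE is undefined where actuals are zero, so mask those rows out
        "mape_pct": float((err.abs() / df["actual"].replace(0, np.nan)).mean() * 100),
    }

# Example usage with toy data:
scores = evaluate_forecast(pd.DataFrame({"actual": [100, 80], "predicted": [90, 85]}))
```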
Role: Lead architect
Technology:
• Apache Airflow as a horizontally scalable compute framework
• Pandas, SQLAlchemy, and pytest as Python libraries for analysis task implementation
• GitLab CI/CD as a process automation framework
• VMware vCenter as the server virtualization platform
• Docker, Docker Compose for service orchestration
Project Skills/Methodology:
• Development and integration of new system components in an agile interdisciplinary team
• Maintenance, migration, and consolidation of legacy system components
• Open-source release of the project code
Technology:
• PHP with the Drupal framework as the main programming language
• JavaScript/jQuery, HTML, CSS as supporting languages/tools
• MariaDB database, RESTful Object Storage service
• CentOS 7 and Ubuntu servers, Docker, Docker Compose for system operation
• CI/CD pipelines with GitLab
• LDAP, OAuth2/OIDC for integrated identity management
Team size: 6
Project language: German, English
Project Skills/Methodology:
• Designing and implementing a data warehouse ecosystem to meet contemporary and future requirements of data-driven medical research
• Developing a highly flexible framework for automated, scalable, and reproducible data processing tasks (ETL)
• Co-developing a secure data sharing interface for cross-site collaboration
• Growing a new team of software and data engineers
• Defining and establishing an agile team collaboration model suited to a heterogeneous DevOps reality
• Internal and external project management
Technology:
• Python as the main programming language
• FastAPI framework for REST webservice implementation (see the sketch below)
• Pandas as the primary data science library
• Angular, HTML, CSS as supporting languages/tools
• PostgreSQL, MariaDB, CouchDB, Object Storage services
• CentOS 7 and Ubuntu servers, Docker, Docker Compose for system operation
• Task automation with ActiveWorkflow, Apache Airflow, Celery, GitLab CI/CD
• LDAP, OAuth2/OIDC, Keycloak for integrated identity management
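A hedged sketch of a token-protected REST endpoint of the kind used for the data sharing interface; the route, response model, and stub logic are hypothetical simplifications, with real token validation delegated to Keycloak.

```python
# Hedged sketch of a token-protected FastAPI endpoint; the route, model,
# and stub response are hypothetical, and real token validation would go
# through Keycloak (OIDC).
from fastapi import Depends, FastAPI
from fastapi.security import OAuth2PasswordBearer
from pydantic import BaseModel

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")  # rejects requests without a bearer token

class Dataset(BaseModel):
    id: str
    name: str

@app.get("/datasets/{dataset_id}", response_model=Dataset)
def read_dataset(dataset_id: str, token: str = Depends(oauth2_scheme)):
    # A real implementation would verify `token` against Keycloak and read
    # metadata from PostgreSQL; this returns a stub record.
    return Dataset(id=dataset_id, name="example dataset")
```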
Team size: 8
Project language: German, English
Data Platform Architect
Lead Data Engineer
Lead Architect
Data Engineering Consultant
CI/CD & DevOps Expert
Agile Consultant
Healthcare