Work locations
Germany, Switzerland (possible)
Projects
4 years 9 months
2020-03 - now
Conducted economic viability analyses
Senior Data Scientist Consultant
Conducted economic viability analyses to guide strategic decisions and optimize resource allocation, integrating real-time data pipelines using Spark, Snowflake, and SQL.
Ideated and developed AI-driven product solutions tailored to client needs, leveraging LangChain and LlamaIndex for LLM application development and knowledge integration.
Designed and deployed Retrieval Augmented Generation (RAG) pipelines to improve information retrieval and contextual responses, using Pinecone and Weaviate for vector search optimization.
Built and fine-tuned Large Language Models (LLMs) for chatbot applications, utilizing OpenAI APIs, prompt engineering, and generative AI frameworks.
Created scalable RESTful APIs with FastAPI to deploy AI and ML services, ensuring seamless integration with existing systems.
Deployed generative AI models in production, focusing on quantization, inference optimization, and deployment using PyTorch, TensorFlow, and Hugging Face Transformers.
Applied computer vision techniques for creative tagging and performance analysis, leveraging CLIP, Deep Learning, and Keras, driving campaign ROI improvements.
Designed and implemented MLOps workflows, including monitoring, versioning, and scaling AI systems with Kubernetes, Docker, and cloud platforms like AWS EKS, GCP, and Azure.
Developed robust data pipelines for real-time analytics and machine learning workflows, integrating BigQuery, DBT, and Databricks to enhance performance and scalability.
Integrated vector databases (e.g., Pinecone, Weaviate) and embeddings for advanced semantic search and retrieval in AI-driven applications.
Collaborated with stakeholders across engineering, product, and business teams to define AI strategy and ensure alignment with business objectives.
Created comprehensive documentation and implemented monitoring solutions for ML systems using Datadog and other observability tools.
Optimized database performance for large-scale applications, supporting SQL databases and NoSQL solutions to handle high-volume AI workflows.
Spark, Snowflake, SQL, PyTorch, TensorFlow, Hugging Face Transformers, Docker, AWS EKS, GCP, Azure, NoSQL, LangChain, LlamaIndex, Pinecone, Weaviate, LLMs, CLIP, Deep Learning, Keras, Kubernetes, BigQuery, DBT, Databricks, Datadog
3 months
2024-08 - 2024-10
Healthcare Chatbot
Lead Data Scientist
Ideated and developed AI-driven product solutions tailored to client needs, leveraging LangChain and LlamaIndex for LLM application development and knowledge integration.
Designed and deployed Retrieval Augmented Generation (RAG) pipelines to improve information retrieval and contextual responses, using Pinecone and Weaviate for vector search optimization.
Built and fine-tuned Large Language Models (LLMs) for chatbot applications.
LLM, Generative AI, Python
Healthcare
8 months
2024-01 - 2024-08
Probabilistic Attribution
Lead Data Scientist
Addressed third-party cookie deprecation by applying probabilistic clustering to full-funnel attribution, using Spark to pipeline data to enhance conversion tracking and data accuracy.
Leveraged causal inference (doubly robust learners) on top of the clustering output to refine conversion tracking, boosting accuracy and stakeholder confidence.
Python, LightGBM, Causal Inference
3 years 8 months
2021-01 - 2024-08
Developed and architected very complex real-time ML systems
Senior Data Scientist
Architected and developed complex real-time ML systems and, as part of a team, took them to production multiple times.
Developed and managed an automated bidding model optimized to maximise revenue/margin, generating a 6% uplift in spend and a decrease in CPA.
Developed and managed automated text generation models (keywords, ad copy) using classic NLP and LLM techniques (self-hosted and external models, prompt engineering, persuasion techniques).
Led a research project on probabilistic attribution: addressed third-party cookie deprecation by applying probabilistic clustering to full-funnel attribution, using Spark to pipeline data to enhance conversion tracking and data accuracy.
Advanced attribution with causal inference: leveraged causal inference (doubly robust learners, EconML) on top of the clustering output to refine conversion tracking, boosting accuracy and stakeholder confidence.
Introduced "campaign pacing" as a KPI: developed, A/B tested, and deployed with Docker/Kubernetes a linear regression-based algorithm, achieving a 3.7% YoY revenue increase and improving pacing KPIs by 35%.
Successfully led A/B testing for new product features, driving improvement in core company KPIs.
Collaborated with cross-functional teams to enable data-driven decision-making and stakeholder buy-in.
Recruited new members for the data science team.
RADANCY
Remote DE
4 months
2024-01 - 2024-04
Campaign Pacing Optimization
Lead Data Scientist
Introduced "campaign pacing" as a KPI: developed, A/B tested, and deployed with Docker/Kubernetes a linear regression-based algorithm, achieving a 3.7% YoY revenue increase and improving pacing KPIs by 35%.
Successfully led A/B testing for new product features, driving improvement in core company KPIs.
Collaborated with cross-functional teams to enable data-driven decision-making and stakeholder buy-in.
3 months
2023-04 - 2023-06
LLM Fine Tuning
Lead Data Scientist
Deployed generative AI models in production, focusing on quantization, inference optimization, and deployment using PyTorch, TensorFlow, and Hugging Face Transformers.
Applied computer vision techniques for creative tagging and performance analysis, leveraging CLIP, Deep Learning, and Keras, driving campaign ROI improvements.
Designed and implemented MLOps workflows, including monitoring, versioning, and scaling AI systems with Kubernetes, Docker, and cloud platforms like AWS EKS, GCP, and Azure.
Developed robust data pipelines for real-time analytics and machine learning workflows, integrating BigQuery, DBT, and Databricks to enhance performance and scalability.
Architected and developed complex real-time ML systems and, as part of a team, took them to production multiple times.
Developed and managed an automated bidding model optimized to maximise revenue/margin, generating a 6% uplift in spend and a decrease in CPA.
Developed and managed automated text generation models (keywords, ad copy) using classic NLP and LLM techniques (self-hosted and external models, prompt engineering, persuasion techniques).
Python, NLP, LLM
1 year 1 month
2022-01 - 2023-01
Risk Adjusted Portfolio Optimization
Lead Data Scientist
Developed a system to optimize portfolio risk: key risk KPIs (implied volatility, maximum drawdown) decreased by 7.6% on average.
Delivered a system for option pricing using deep learning, leading to an improvement in average daily returns of 2.7%.
Managed end-to-end data processing, enhancing system performance through effective sourcing, preprocessing, and partitioning for model training and inference.
Utilized NLP techniques to analyze tweets, creating a feature store (embeddings) for machine learning models that enhanced stock market understanding.
Developed a backtesting and forward-testing framework.
8 months
2020-06 - 2021-01
Developed a system to optimize portfolio risk
Data Scientist
Developed a system to optimize portfolio risk: key risk KPIs (implied volatility, maximum drawdown) decreased by 7.6% on average.
Delivered a system for option pricing using deep learning, leading to an improvement in average daily returns of 2.7%.
Managed end-to-end data processing, enhancing system performance through effective sourcing, preprocessing, and partitioning for model training and inference.
Utilized NLP techniques to analyze tweets, creating a feature store (embeddings) for machine learning models that enhanced stock market understanding.
Developed a backtesting and forward-testing framework.
AXOVISION
1 year 8 months
2018-11 - 2020-06
develop a data pipeline
Junior Data Scientist
Collaborated with the team to build, from scratch in Spark (Scala), a data pipeline for IoT data from around 10,000 vehicles, amounting to around 100 GB of vehicle sensor data per day.
Developed a linear regression model for ranking drivers based on fuel efficiency.
Started the groundwork for a predictive maintenance system, carrying out extensive data analysis and connecting with stakeholders from non-technical backgrounds.
DAIMLER
Lisbon, PT
4 months
2016-06 - 2016-09
various data cleaning procedures
Data Scientist (Intern)
Collaborated with the team to engineer various data cleaning procedures for financial and transactional data using Python, Pandas, and Airflow.
Developed an NER system to identify key information in financial documents and PDFs, using OCR and Python.
Python, Pandas, Airflow, PDFs, OCR
EY
Dublin, IE
Education and Training
2017
CS Machine Learning & Robotics
MSc
Instituto Superior Técnico, University of Lisbon, Lisbon, PT
Skills
Top skills
Data Scientist
Products / Standards / Experience / Methods
Technical:
Pandas
Scikit-learn
Keras
TensorFlow
CLTV
Churn
Python
BigQuery
DBT
Snowflake
Dash
Model Interpretability
Tree models
AWS
GCP
Databricks
Scala
PySpark
Spark
Kubernetes
Docker
Excel
Datadog
Looker
Tableau
A/B Testing
KPI
Optimization
MILP
LP
Deep Learning
XGBoost
LightGBM