Mateusz Dalba

Data Science | Machine Learning | AI Engineering

About Me

With over five years of experience in the field of Artificial Intelligence, I have built up my skills and expertise across various roles, including Data Scientist, Machine Learning Engineer, and AI Engineer.

Specializing in the development of advanced machine learning models, I focus on creating solutions that are not only innovative but also practical and impactful.

Education

I hold a Bachelor's degree in Quantitative Methods in Economics and Information Systems from the Warsaw School of Economics.

I also completed a Master's degree in Advanced Data Analytics - Big Data, where my thesis focused on "Deep Learning Poisoning - Backdoor Attacks."

Curriculum Vitae

Experience

This is my up-to-date career timeline.

2024.02 - present

AI Engineer

Developing various AI/ML products for a Finnish company in the glass manufacturing industry.

2022.03 - 2024.06

Senior Machine Learning Engineer

As a member of the AI Engineering team, I have implemented and consulted on various advanced analytics projects across different industries.

2023.05 - 2023.12

Computer Vision Engineer

Designed and implemented a CV module from scratch for an intelligent shopping cart product.

2022.05 - present

Data Science Mentor

As a mentor at Future Collars, I taught multiple groups of learners through comprehensive courses on Python, SQL, and Machine Learning.

2020.08 - 2022.01

Data Scientist

Data Scientist on the Research and Development team, working on various AI/ML projects. This was my first job as a Data Scientist; it started as an unpaid internship.

2018.05 - 2018.12

Junior Statistical Programmer

Member of the Statistical Programming team. This was my first job in the data industry.

Commercial Projects

Here are some of the significant commercial projects I have worked on.
Due to the commercial nature of these projects, I am limited in the details I can share.
However, I have highlighted my key contributions to each project.

Technical Chatbot - Automating Support Team

Overview: A technical chatbot designed to assist and automate the support team’s operations.

Objectives: To reduce the response time for common customer queries and to automate repetitive support tasks.

Data: Jira, instructions, Confluence, and other relevant documentation in various formats.

Modeling: Fine-tuned GPT models using OpenAI. Prompt Engineering. Custom ML models trained with HuggingFace and Scikit-Learn.

Technology: NLP, OCR, Python, OpenAI, Flask, Streamlit, HuggingFace, NumPy, Scikit-Learn, Azure Cloud.

Challenges: Poor data quality. Handling diverse customer queries, ensuring seamless integration with existing systems, and maintaining high accuracy in responses.

Outcomes: Significant reduction in response time, increased customer satisfaction, and support team members freed up for more complex tasks.

"Auto ML" - Software to automate the Data Science team's repetitive tasks

Overview: An application designed to automate routine tasks performed by the data science team.

Objectives: Streamline workflows, reduce manual effort, and increase overall efficiency.

Data: CSV, Excel files, MySQL, Snowflake, data analytics methodologies extracted from Jupyter Notebooks.

Modeling: Supervised ML, Unsupervised ML, Neural Networks, Statistics, Advanced Data Analytics.

Technology: Python, Scikit-Learn, Streamlit, Django, Keras, Matplotlib, Seaborn, Plotly, Pandas, SciPy, AWS Cloud.

Challenges: Dynamically changing methodologies. Ensuring compatibility with existing workflows and handling diverse data sources.

Outcomes: Increased productivity, reduced errors, and more time for data scientists to focus on complex analysis.

Demand Forecasting Application in Energy Industry

Overview: A forecasting application designed to predict demand for energy products accurately.

Objectives: To enhance the accuracy of energy demand forecasts and optimize resource allocation.

Data: Records of historical demand.

Modeling: Time Series Analysis and Forecasting. Regression models. Neural Networks. Statistics.

Technology: Python, Pandas, Streamlit, Flask, Azure Cloud, Scikit-Learn, Statsmodels, Skforecast, Plotly, Seaborn, Matplotlib, SciPy.

Challenges: Forecast evaluation and XAI (model interpretability). Dealing with seasonality and trends, and integrating with existing systems.

Outcomes: The application was embedded into the demand planning team's main forecasting process. Improved demand forecasting accuracy, better resource management, and cost savings.
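
To illustrate the forecast evaluation challenge mentioned above, here is a minimal sketch of one common approach: scoring forecasts against a seasonal naive baseline on a held-out period. It is not the project's code. The function names, the demand values, and the season length of 4 are all made up for illustration.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Repeat the last full season of observations across the horizon."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up demand series with a repeating pattern of length 4 (assumption).
demand = [10, 20, 30, 20, 12, 22, 32, 22, 14, 24, 34, 24]
train, test = demand[:8], demand[8:]

# Hold out the last season and score the baseline on it.
baseline = seasonal_naive_forecast(train, season_length=4, horizon=len(test))
baseline_mae = mean_absolute_error(test, baseline)
print(f"Seasonal naive MAE: {baseline_mae}")
```

A candidate model is only worth deploying if it beats this kind of baseline on the same held-out period.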

Intelligent Shopping Carts

Overview: A computer vision module for an intelligent autonomous shopping cart.

Objectives: To enable the shopping cart to recognize and track products autonomously.

Data: Images of products taken from the shopping cart camera.

Modeling: Object detection, object recognition, semantic segmentation, Deep Neural Networks, YOLO.

Technology: Python, OpenCV, Keras, TensorFlow, Flask, PIL, YOLO, NumPy, Skimage, Matplotlib, Pandas.

Challenges: Lack of an additional camera to capture the depth of the cart. An uncontrolled ("wild") supermarket environment with dynamically changing data. Real-time image processing, handling various lighting conditions, and ensuring high accuracy in product recognition.

Outcomes: Enhanced shopping experience, reduced checkout time, and increased customer satisfaction.

Detection of Polish banknotes - AI assistant

Overview: Computer vision software that detects Polish banknotes, mainly used by blind people.

Objectives: To accurately detect Polish banknotes in real-time.

Data: Self-gathered dataset of Polish banknotes, extended with artificially created samples.

Modeling: Multiple custom-trained Deep Neural Networks.

Technology: Python, OpenCV, NumPy, TensorFlow, Keras, Pandas, Matplotlib, Skimage, TFLite.

Challenges: Hard edge cases, banknotes in varying physical condition, and varying lighting conditions.

Outcomes: Blind users can identify which banknotes they have in their wallets.

Professional Racing - driving styles optimization application

Overview: An application designed to optimize racing strategies for professional and amateur drivers.

Objectives: To analyze race data and provide actionable insights for strategy optimization.

Data: Telemetry data. Car sensor data. Data taken from simulators and real cars. Expert knowledge.

Modeling: Supervised ML. Unsupervised ML. Neural Networks. Statistics.

Technology: Streamlit, Flask, Python, Scikit-Learn, Keras, TensorFlow, NumPy, Matplotlib, Seaborn, Plotly, SciPy.

Challenges: Transferring knowledge from simulators to real-life scenarios. Real-time data analysis, handling large volumes of data, and integrating with racing telemetry systems.

Outcomes: Enhanced racing strategies, improved race performance, and data-driven decision making.

Virtual Fitting Room

Overview: A virtual fitting room application that allows users to try on clothes digitally.

Objectives: To enhance online shopping by providing an immersive fitting room experience, enabling users to visualize how clothing items will look on them before making a purchase.

Data: VITON dataset. Human face datasets (e.g. CelebA). Human posture datasets. Data scraping.

Modeling: Semantic segmentation. Generative Adversarial Networks (GANs). Human keypoint detection neural networks.

Technology: Python, TensorFlow, Keras, OpenCV, Skimage, NumPy, Cloud computing.

Challenges: Lack of data. Producing accurate try-on predictions with as few artifacts as possible.

Outcomes: Increased user engagement, reduced return rates, and improved online shopping experience.

NLP-based projects - Sentiment Analysis, Keyword Extraction, Semantic Similarity

Overview: A suite of NLP projects aimed at enhancing support efficiency and understanding customer sentiments through advanced text analysis techniques.

Objectives: To improve support operations by extracting and analyzing keywords and semantic similarities between projects, as well as to gauge customer sentiments and perceptions through feedback analysis.

Data: Project documents, support tickets, customer feedback, surveys, and social media comments.

Modeling: NLP modeling - semantic similarity, sentiment analysis, keyword extraction.

Technology: Databricks, Python, PySpark, HuggingFace, NumPy, Scikit-Learn, Keras, TensorFlow.

Challenges: Managing and analyzing diverse and large volumes of data, ensuring accurate semantic analysis, and integrating insights into existing systems for actionable results.

Outcomes: Enhanced support efficiency through faster case resolution and relevant case matching, improved understanding of customer sentiments, actionable insights for business strategies, and better customer relationship management.
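
The idea behind relevant case matching can be sketched with cosine similarity over simple word counts. This is a deliberately simplified stand-in, not the project's approach. A real system would more likely compare learned text embeddings, and the function names and sample tickets below are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the candidate text closest to the query."""
    query_vec = bag_of_words(query)
    return max(candidates,
               key=lambda c: cosine_similarity(query_vec, bag_of_words(c)))


# Hypothetical support tickets, invented for this example.
tickets = [
    "login page returns error after password reset",
    "invoice export to csv is missing columns",
    "cannot connect to database from reporting tool",
]
match = most_similar("error on login after reset", tickets)
print(match)
```

Swapping `bag_of_words` for an embedding model keeps the same matching logic while capturing meaning beyond exact word overlap.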

Mini Data Science Intro Course

Overview: I created an introductory course on data analytics and data science from scratch. The course was divided into 10 lessons.

Objectives: To provide a comprehensive introduction to data analytics and data science principles and practices.

Technology: Python, Jupyter Notebooks, Data Visualization, Pandas, NumPy, Scikit-Learn, TensorFlow, Keras, Streamlit.

Challenges: Creating engaging content, ensuring accessibility, and covering a wide range of topics comprehensively.

Outcomes: Positive feedback from learners, and increased understanding of data science concepts.

Other small and medium-size AI/ML related projects

Overview: Various small and medium-sized projects related to data science.

Objectives: To tackle diverse data science challenges and deliver practical solutions.

Technology: Python, R, data visualization tools, machine learning libraries, Databricks, Azure Cloud, AWS Cloud, Google Cloud.

Challenges: Handling diverse project requirements, managing limited resources, and ensuring timely delivery.

Outcomes: Successful project completions, satisfied clients, and practical solutions to data-related problems.

Drawing Recognition Demo

Draw one of these: airplane, automobile, bird, cat, deer, dog, frog, horse, ship or truck and let the neural network recognize your drawing!

How It Works

The drawing recognition demo uses a neural network model trained on the CIFAR-10 dataset. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 classes, with 6,000 images per class.

When you draw something on the canvas, the drawing is sent to a VGG-19 neural network model that has been fine-tuned for this task. The model processes your drawing and predicts which of the 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) it most closely resembles.

The prediction result is displayed below the canvas.
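
The last step, turning the model's ten output scores into a displayed label, can be sketched as follows. This is a minimal illustration, not the demo's actual code: the scores are made-up values standing in for the output of the fine-tuned network, and the function names are hypothetical. The class order follows the standard CIFAR-10 label indices.

```python
import math

# Standard CIFAR-10 label order (indices 0 to 9).
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(CIFAR10_CLASSES):
        raise ValueError("expected one score per CIFAR-10 class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CIFAR10_CLASSES[best], probs[best]


# Hypothetical scores, as if returned by the model for one drawing.
example_logits = [0.1, 0.3, 0.2, 2.5, 0.0, 1.1, 0.4, 0.2, 0.1, 0.3]
label, confidence = top_prediction(example_logits)
print(f"{label} ({confidence:.0%})")
```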

Services

I offer a range of professional services in Data Science, Machine Learning, and Artificial Intelligence.
Here are the key services I provide:

Business Inquiries

If you are interested in any of the services listed above or have other inquiries, please fill out the form below.
I look forward to discussing how I can help you achieve your goals.