Achille

data scientist,
AI engineer,
machine learning engineer

curious,
growth mindset,
detail-oriented

raises thoughtful questions to challenge norms

Projects

Safran Aircraft Engines Fuel Forecasting



Collaborative 5-month supervised project with Safran.
Engine data from over 3,000 flights across 3 different aircraft was analyzed to identify the key factors influencing fuel consumption and to engineer features. Predictive models were developed to estimate overall fuel usage and consumption by specific flight phase (taxi, climb, cruise, and descent). Performance was evaluated using a relative IQR error score, achieving a 9.23% error for the multiphase model and 16% for the global model.

View Full Report Here


Agile, CRISP-DM, Python, OOP, Statsmodels, Git, GitHub

BeautiRAG - Multi-modal Agentic RAG Chatbot



This project is a multi-modal Agentic RAG application that processes text, PDFs, images, and audio, built with Next.js, FastAPI, and LangChain. It extracts text from images with Tesseract (OCR) and from audio with OpenAI Whisper, generates embeddings, and enables semantic search with FAISS, so users can chat with their documents using LLMs. The whole app runs locally in Docker containers, which simplifies deployment.

View Full Project Here


RAG, LangChain, FAISS, FastAPI backend, Docker, Docker Compose, Python, Next.js, Tailwind CSS, React, Pydantic

AI Agents for Financial Monitoring


This project utilizes the Agno framework to create a multi-agent workflow that provides real-time financial insights. It features agents for web news retrieval, historical stock data from Yahoo Finance, chart visualization, and HTML report compilation. The project combines real-time data analysis with web search, offering a user-friendly Streamlit interface and persistent data storage via SQLite.

View Full Project Here


Agentic AI, Multi-Agent Workflow, Prompt Engineering, OpenAI, Streamlit, Agno, Phidata, SQLite, Pydantic

Predictive Modeling of Accident Impacts



This project focuses on predicting how long traffic accidents affect traffic flow, to support better decision-making and risk assessment. Using a dataset of over 7 million data points, predictive models were trained and evaluated to analyze how accidents influence congestion and flow patterns. The model could provide valuable insight into expected traffic delays, helping to optimize response strategies and minimize disruption during accident scenarios. A comprehensive report presents the findings.

View Full Report Here


Python, PySpark, AWS EMR, AWS EC2, ML Pipelines, Doc2Vec, Random Forest, XGBoost, CatBoost, LightGBM, Git, GitHub

SpotifAI



A Python-based automation tool that integrates with the Spotify Web API to analyze user preferences and generate personalized music recommendations using an AI model (the OpenAI API or any other model via OpenRouter). Each execution selects a user-specified number of fresh, never-before-suggested tracks and automatically updates your playlist. Whether you run it once or ten times a day, your playlist evolves with you.

View Full Project Here


Python, API integration, Automation, Prompt Engineering, OpenAI
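
The "fresh, never-before-suggested tracks" behavior described for SpotifAI can be illustrated with a minimal sketch. This is not the project's actual code: the function name `select_fresh_tracks`, the use of plain track-ID strings, and the in-memory history list are all assumptions made for illustration, and no Spotify or OpenAI calls are involved.

```python
def select_fresh_tracks(candidates, history, count):
    """Return up to `count` track IDs from `candidates` that are not in `history`.

    Hypothetical helper: `candidates` stands in for model-suggested track IDs,
    `history` for the IDs suggested in earlier runs.
    """
    seen = set(history)
    fresh = []
    for track_id in candidates:
        if len(fresh) >= count:
            break
        if track_id in seen:
            continue  # already suggested before, or duplicated in this batch
        seen.add(track_id)
        fresh.append(track_id)
    return fresh


# Example run: the history grows after each execution,
# so later runs never repeat an earlier suggestion.
history = ["track_a"]
picked = select_fresh_tracks(["track_a", "track_b", "track_b", "track_c"], history, 2)
history.extend(picked)
```

Keeping the history between runs (for example in a small file or database) is what would let each execution skip everything suggested earlier.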

Improving Predictions: Impact of Outliers



This project presents a comprehensive analysis of the impact of anomalies on the effectiveness of a prediction model. The study covers exploratory data analysis, regression modeling, handling of outliers and multicollinearity, and the application of machine learning algorithms. The final model achieves a significant improvement in predictive accuracy, demonstrating the advantage of non-linear approaches such as Gradient Boosting models over traditional regression models. A comprehensive report presents the findings. The final model can be tested through a user-facing API built with FastAPI and containerized with Docker.

View Full Report Here


Python, Random Forest, XGBoost, CatBoost, LightGBM, HTML, CSS, JavaScript, FastAPI, Pydantic, Streamlit, Git, GitHub, Docker

Work Experience

Astek Group

My Role

R&D AI Engineer

Time Frame

May - Nov, 2024

Context


The internship focused on developing a Proof of Concept (PoC) that uses Large Language Models (LLMs) to transcribe text into Facile à Lire et à Comprendre (FALC, French for "Easy to Read and Understand"), a simplified language designed for individuals with intellectual disabilities. The project aimed to improve information accessibility and was carried out in collaboration with the KILÉMA publishing house.

Problem


Traditional text simplification methods and models lacked the precision and adaptability to consistently produce accessible content in FALC. Automated evaluation metrics were insufficient to accurately measure the relevance and usability of the simplified text.

Goal


To create a PoC capable of generating FALC transcriptions using LLMs. This involved prompt engineering, fine-tuning a LLaMA 3.1 model, and establishing a more relevant evaluation framework, both manual and quantitative, to guide development.

Outcome


Successfully developed and presented a PoC validated at 90% by the stakeholders of KILÉMA publishing house, leading to the development of a minimum viable product (MVP). The project introduced a new manual, quantitative evaluation method that was 50% more relevant than existing automatic metrics. The PoC achieved a SARI score of 49.79 and a BERTScore of 0.8, demonstrating strong performance.

Skills

Python, OOP, PyTorch, Linux OS, Git, GitLab, Docker, AWS SageMaker, AWS ECS, OpenAI, Prompt Engineering, Ollama, HuggingFace

PyxiScience

My Role

Python Engineer

Time Frame

Apr - Aug, 2023

Context


The internship project centered on developing an innovative interactive platform to enhance the learning experience for undergraduate mathematics students.

Goal


To create a mathematics learning platform capable of providing guided exercises with automated solutions and varying difficulty levels, helping students gain a deeper understanding of mathematical concepts.

Outcome


Successfully designed and implemented algorithms for generating exercises of varying difficulty levels, offering step-by-step solutions using PythonTeX and LaTeX. Collaborated in an Agile team with effective sprint planning and communication while developing the backend using FastAPI and managing a PostgreSQL database.

Skills

Agile, SCRUM, Python, Git, FastAPI, JavaScript, PostgreSQL, PythonTeX, LaTeX

Intellect Academy

My Role

Data Scientist

Time Frame

May - Aug, 2022

Context


The internship involved designing a document classification system using Machine Learning and Natural Language Processing (NLP) techniques. The system aimed to make document handling more efficient for the organization.

Goal


To develop a robust classification system using ML algorithms, implement KPIs for performance tracking, and ensure seamless integration into the organization's infrastructure.

Outcome


Implemented preprocessing with NLTK, trained a Doc2Vec model for embeddings, and built a classification model with Scikit-Learn, achieving 95% accuracy. An API was built with API Gateway for deployment through AWS SageMaker, reducing document selection time by 30% and enhancing overall productivity.

Skills

Python, NLP, Doc2Vec, Git, FastAPI, Docker, AWS Cloud, API Gateway