Programming Projects

1. Spero: A Chatbot for Mental Health Support

Spero is an NLP-based chatbot designed to provide personalized emotional support for people seeking mental health assistance. It uses Retrieval-Augmented Generation (RAG): relevant excerpts are retrieved from therapist-client transcript data and supplied to the language model as context, with the aim of producing more grounded and empathetic responses. To assess the chatbot's effectiveness, its responses are compared against real transcript data using BLEU and ROUGE scores. The project aims to bridge gaps in mental health care by offering 24/7 support, particularly for people facing accessibility barriers. Spero is built in Python, using Pandas and NumPy for data preprocessing and OpenAI's API for LLM-based text generation.
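As a rough illustration of the retrieval and evaluation ideas described above, the sketch below scores text with a simplified ROUGE-1 F1 (unigram overlap) and reuses that score to pick the closest transcript excerpt. It is not Spero's actual code: the function names and sample excerpts are hypothetical, and a real RAG pipeline would more likely retrieve with embeddings and evaluate with an established BLEU/ROUGE implementation.

```python
import re
from collections import Counter

# Hypothetical transcript excerpts, invented for this example.
TRANSCRIPTS = [
    "It sounds like work has been stressful lately.",
    "Tell me more about your sleep.",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1: F1 over unigrams shared by the two texts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def retrieve(query, transcripts):
    """Return the excerpt with the highest unigram overlap with the query."""
    return max(transcripts, key=lambda t: rouge1_f1(query, t))

print(retrieve("I can't sleep", TRANSCRIPTS))
```

The retrieved excerpt would then be added to the prompt sent to the language model, and the generated reply could be scored against a held-out therapist response with the same metric.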

2. Augmenting EEG Classification with GAN-Generated Data (BrainHack 2024)

This project develops a machine learning model to classify electroencephalogram (EEG) data recorded with a Muse headset. My team and I collected 200,000 datapoints from two 1-minute sessions per participant: one in which we focused on human faces and one in which we focused on breathing. We trained a Random Forest classifier on the data, achieving 94.8% accuracy on unseen data. Additionally, we synthesized 800,000 new datapoints using a Generative Adversarial Network (GAN). Results were visualized using confusion matrices, ROC curves (with AUC), loss plots (generator vs. discriminator), and learning curves.
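The confusion matrix behind such a visualization can be computed in a few lines. The sketch below uses made-up labels and predictions rather than the project's EEG data, and plain Python in place of whichever metrics library the project used; the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Invented example: five windows labelled by the task being performed.
y_true = ["faces", "faces", "breathing", "breathing", "faces"]
y_pred = ["faces", "breathing", "breathing", "breathing", "faces"]

cm = confusion_matrix(y_true, y_pred, labels=["faces", "breathing"])
print(cm)            # [[2, 1], [0, 2]]
print(accuracy(cm))  # 0.8
```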

3. Evaluating Algorithms for Gene Sequence Classification

This project compares Random Forest and Linear Regression models for classifying gene sequences, using Cytochrome c oxidase subunit I (COI) sequences as the dataset. After filtering the sequences (removing length outliers and sequences containing ambiguous nucleotides), I trained both models and benchmarked their performance. I also visualized the sequence length distribution, k-mer frequency proportions, and feature importance.
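The filtering and k-mer steps can be sketched as follows. This is a minimal illustration under assumptions: the length bounds, the value of k, and the function names are placeholders, not the thresholds or code used in the project.

```python
from collections import Counter

def is_clean(seq, min_len, max_len):
    """Keep sequences within the length bounds that contain only A, C, G, T."""
    return min_len <= len(seq) <= max_len and set(seq.upper()) <= set("ACGT")

def kmer_proportions(seq, k=3):
    """Map each k-mer in the sequence to its share of all k-mers."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

print(is_clean("ACGN", 1, 10))        # False: N is an ambiguous nucleotide
print(kmer_proportions("ACGTAC", 2))  # AC appears in 2 of the 5 2-mers
```

Proportions like these, computed per sequence over a fixed k-mer vocabulary, form the kind of numeric feature vector a classifier can be trained on.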

4. Road Sign Categorizer

This project develops a convolutional neural network (CNN) to categorize road sign images into 43 distinct categories. It uses the 'tensorflow' and 'scikit-learn' libraries, along with Python's built-in 'os' module, for data preprocessing and model building. With convolutional and pooling layers, the model achieves a classification accuracy exceeding 90%.
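To show what a pooling layer does, the sketch below applies 2x2 max pooling to a small grid in plain Python. It illustrates the operation only; the project itself relies on TensorFlow's layers, and the function name and sample values here are made up.

```python
def max_pool_2x2(image):
    """Halve each dimension by keeping the largest value in every 2x2 block."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

# Invented 4x4 single-channel "image".
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(image))  # [[6, 8], [14, 16]]
```

Pooling shrinks the feature maps between convolutional layers, which reduces computation and makes the network less sensitive to small shifts in the input.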

5. Venom Type Classifier

This project uses the Support Vector Machine (SVM) algorithm from the 'scikit-learn' library to predict snake venom types from their protein compositions. It can also visualize the dataset as a PCA plot, using the 'matplotlib' and 'pandas' libraries. The project provides two sample datasets: a complete one with both inputs (venom protein proportions) and outputs (venom types) for training, and an incomplete one with inputs only, on which the trained model can be run. By simplifying venom type classification, this tool can support biomedical venom research, and it could be refined further with additional training datasets.
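The idea behind a linear SVM can be sketched without any libraries. The code below is a stand-in, not the project's method: it trains a two-class linear SVM by sub-gradient descent on the hinge loss, whereas the project uses scikit-learn's implementation. The two features, the labels (+1 and -1 for two venom types), and all values are invented for illustration.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights slightly on every step (L2 regularization).
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # Sample is inside the margin: move the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Invented training set: proportions of two hypothetical protein families.
samples = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1],
           [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print(predict(w, b, [0.85, 0.15]))  # 1
print(predict(w, b, [0.15, 0.85]))  # -1
```

The second call mirrors how the unlabelled sample dataset would be used: inputs go in, and the trained model supplies the predicted venom type.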

6. SciRef

"SciRef" is a robust tool designed to streamline the creation of personalized scientific references using provided URLs. Building upon the functionality of a Python script that I developed previously, this application combines React as the frontend framework and Flask as the backend to offer a user-friendly interface. Within this interface, users have the flexibility to arrange scientific reference components in their preferred order, select a repository, and input the URL of a scientific journal article. Upon submission, the application instantly generates a customized scientific reference, ready for immediate use. Whether you're a researcher, student, or anyone in need of accurately citing online scientific sources, this program simplifies the process, making it accessible and efficient.

7. Colour Chaos!

"Colour Chaos!" is a colour-matching game created using the Pygame library in Python. The game displays words in different colours, and the player must type the correct colour of the word as quickly as possible. The game includes features like a title screen with instructions, a timer to track the player's completion time, a high score system, and a congratulatory message for achieving a new high score. Players also have the opportunity to restart the game and attempt to achieve a new high score after completing each round. Overall, this game aims to test the player's ability to quickly identify the colour of the displayed words and beat their previous best time to set new high scores.

8. Scientific Reference Customizer

This Python script is a command-line tool for generating formatted references to articles on PubMed. It takes a user-provided PubMed URL as input and extracts key citation information: the authors' names, article title, journal name, publication year, volume, page numbers, and DOI. Users can customize the order of these components to match their preferred reference style. Using web scraping with the BeautifulSoup library and Python's regular expressions, the script automates the tedious process of formatting citations by hand.
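The regular-expression part of such a script could resemble the sketch below, which validates a PubMed article URL and pulls a DOI out of a piece of text. The patterns and function names are illustrative assumptions, and the sample identifiers are invented; the page download and BeautifulSoup parsing are left out.

```python
import re

# PubMed article URLs end in a numeric PubMed ID (PMID).
PMID_URL = re.compile(r"^https?://pubmed\.ncbi\.nlm\.nih\.gov/(\d+)/?$")
# Loose DOI pattern: "10.", a registrant code, a slash, then a suffix.
DOI = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def extract_pmid(url):
    """Return the PMID from a PubMed URL, or None if the URL does not match."""
    match = PMID_URL.match(url.strip())
    return match.group(1) if match else None

def extract_doi(text):
    """Return the first DOI-like string in the text, without trailing punctuation."""
    match = DOI.search(text)
    return match.group(0).rstrip(".,;") if match else None

print(extract_pmid("https://pubmed.ncbi.nlm.nih.gov/12345678/"))  # 12345678
print(extract_doi("doi: 10.1000/example.123."))                   # 10.1000/example.123
```

Rejecting URLs that do not match before making any request gives the user a clear error early, instead of a confusing failure during scraping.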