Welcome to my AI Alignment Technical Research repository! This repository documents my ongoing learning and technical research in AI alignment. It covers foundational topics that I am studying before delving deeper into the broader challenge of aligning AI systems with human values and ethical standards.
AI alignment aims to ensure that AI systems behave in ways that benefit people and pursue the goals we actually intend. This repository explores three technical areas that support that aim (adversarial AI, explainable AI, and interpretable machine learning), which form the foundation of my research into human-aligned AI systems.
The repository is divided into several key areas:
- Adversarial AI: a detailed exploration of adversarial attack methods and defenses, which are critical to the robustness and reliability of AI systems.
  - Techniques: Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini & Wagner (C&W), DeepFool, Few Pixel, Patch
- Explainable deep learning: methods that make deep learning models more transparent and interpretable, so that their behavior can be checked against human reasoning and goals.
  - Techniques: Integrated Gradients, Attention, BERT
- Explainable NLP: explainability techniques applied to natural language processing, aimed at making model outputs understandable to humans.
  - Techniques: BERT & LIME
- Model-agnostic methods: a compilation of techniques that help explain any machine learning model, independent of its internal structure.
  - Techniques: Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE), Accumulated Local Effects (ALE)
- Interpretable models: models that are interpretable by design, which is crucial for AI systems that must be trusted and understood in high-stakes applications.
  - Algorithms: C4.5, Ruleset, TAO, Linear Regression, Logistic Regression, Generalized Additive Models (GAM)
- Visual explanations: using Grad-CAM to visualize which regions of an input a deep learning model focuses on, so that this focus can be compared with human expectations.
  - Technique: Grad-CAM
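To make the first adversarial technique concrete, here is a minimal sketch of FGSM. To stay self-contained it attacks a toy binary logistic-regression model rather than a deep network, and it assumes inputs lie in [0, 1]; the function name `fgsm_logistic` and the example weights are hypothetical, not code from this repository. FGSM takes a single step of size `eps` in the direction of the sign of the loss gradient with respect to the input, `x_adv = x + eps * sign(grad_x L)`. For logistic regression with cross-entropy loss, that gradient is `(p - y) * w`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_logistic(x, y, w, b, eps):
    """FGSM against a toy binary logistic-regression model (hypothetical helper).

    x: input vector, y: true label in {0, 1}, w and b: model weights and bias,
    eps: L-infinity perturbation budget.
    """
    p = sigmoid(x @ w + b)             # predicted probability of class 1
    grad_x = (p - y) * w               # d(cross-entropy)/dx for logistic regression
    x_adv = x + eps * np.sign(grad_x)  # one step along the sign of the gradient
    return np.clip(x_adv, 0.0, 1.0)    # keep inputs in the assumed [0, 1] range
```

For example, with `w = [2, -3, 1]`, `b = 0`, `x = [0.6, 0.2, 0.5]`, `y = 1` and `eps = 0.1`, the perturbed input is `[0.5, 0.3, 0.4]` and the logit drops from 1.1 to 0.5, pushing the prediction away from the true class. PGD, also listed above, can be seen as repeating this step several times with a projection back into the `eps`-ball.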
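The model-agnostic methods PDP and ICE need only black-box access to a prediction function. The sketch below assumes a NumPy feature matrix and a `predict` callable that returns one prediction per row; `ice_and_pdp` is a hypothetical helper name, not this repository's implementation. Each ICE curve fixes one instance, sweeps a single feature over a grid of values, and records the prediction; the PDP is the average of those curves.

```python
import numpy as np


def ice_and_pdp(predict, X, feature, grid):
    """Compute ICE curves and the partial dependence for one feature (hypothetical helper).

    predict: callable mapping an (n, d) array to n predictions.
    X: (n, d) data matrix; feature: column index; grid: values to sweep.
    Returns (ice, pdp): ice has shape (n, len(grid)); pdp is its mean over rows.
    """
    ice = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        X_mod = X.copy()              # work on a copy so X is left unchanged
        X_mod[:, feature] = v         # force the feature to v for every instance
        ice[:, j] = predict(X_mod)    # one point on each instance's ICE curve
    pdp = ice.mean(axis=0)            # PDP = average of the ICE curves
    return ice, pdp
```

Keeping the individual ICE curves alongside their average is useful because a flat PDP can hide instances whose curves move in opposite directions.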
This repository is part of my ongoing work in AI alignment research, and I welcome contributions, feedback, and collaboration from anyone interested in building human-aligned AI.