Resources at the intersection of AI AND Art. Mainly tools and tutorials, but with some inspiring people and places thrown in too!
For a broader resource covering more general creative coding tools (that you might want to use with what is listed here), check out terkelg/awesome-creative-coding or thatcreativecode.page. For resources on AI and deep learning in general, check out ChristosChristofidis/awesome-deep-learning and https://github.com/dair-ai.
Entries marked with ⭐️ signify my favorite resource(s) for that section/subsection (if I HAD to choose a single resource). Additionally, each subsection is usually ordered by specificity of content (most general listed first).
- Practical Deep Learning for Coders (fast.ai)
- Deep Learning (NYU)
- Introduction to Deep Learning (CMU)
- ⭐️ Deep Learning for Computer Vision (UMich)
- Deep Learning for Computer Vision (Stanford CS231n)
- Natural Language Processing with Deep Learning (Stanford CS224n)
- Deep Generative Models (Stanford)
- Deep Unsupervised Learning (UC Berkeley)
- Differentiable Inference and Generative Models (Toronto)
- ⭐️ Learning-Based Image Synthesis (CMU)
- Learning Discrete Latent Structure (Toronto)
- From Deep Learning Foundations to Stable Diffusion (fast.ai)
- ⭐️ Deep Learning for Art, Aesthetics, and Creativity (MIT)
- Machine Learning for the Web (ITP/NYU)
- Art and Machine Learning (CMU)
- New Media Installation: Art that Learns (CMU)
- Introduction to Computational Media (ITP/NYU)
- ⭐️ The AI that creates any picture you want, explained (Vox)
- I Created a Neural Network and Tried Teaching it to Recognize Doodles (Sebastian Lague)
- Neural Network Series (3Blue1Brown)
- Beginner's Guide to Machine Learning in JavaScript (Coding Train)
- Two Minute Papers
- ⭐️ Dive into Deep Learning (Zhang, Lipton, Li, and Smola)
- Deep Learning (Goodfellow, Bengio, and Courville)
- Computer Vision: Algorithms and Applications (Szeliski)
- Procedural Content Generation in Games (Shaker, Togelius, and Nelson)
- Generative Design (Benedikt Groß)
- ⭐️ VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance (Crowson and Biderman)
- Tutorial on Deep Generative Models (IJCAI-ECAI 2018)
- Tutorial on GANs (CVPR 2018)
- Lil'Log (Lilian Weng)
- Distill [on hiatus]
- ⭐️ Making Generative Art with Simple Mathematics
- Book of Shaders: Generative Designs
- Mike Bostock: Visualizing Algorithms (with Eyeo talk)
- Generative Examples in Processing
- Generative Music
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations: Paper predating Stable Diffusion that describes a method for image synthesis and editing with diffusion-based models.
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- High-Resolution Image Synthesis with Latent Diffusion Models: Original paper that introduced latent diffusion, the approach behind Stable Diffusion, and started it all.
- Prompt-to-Prompt Image Editing with Cross-Attention Control: Edit Stable Diffusion outputs by editing the original prompt.
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion: Learns a new "word" (text embedding) from a handful of example images of a concept, which can then be used in prompts. Kinda like style transfer... but with Stable Diffusion.
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation: Similar to Textual Inversion, but fine-tunes the model itself to place a specific subject in new contexts (e.g., this thing/person/etc., but underwater).
- Novel View Synthesis with Diffusion Models
- AudioGen: Textually Guided Audio Generation
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- Imagic: Text-Based Real Image Editing with Diffusion Models
- MDM: Human Motion Diffusion Model
- Soft Diffusion: Score Matching for General Corruptions
- Multi-Concept Customization of Text-to-Image Diffusion: Like DreamBooth but capable of synthesizing multiple concepts.
- eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
- Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
- Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
- Imagen Video: High Definition Video Generation with Diffusion Models
- Structure-from-Motion Revisited: prior work on sparse modeling (still needed/useful for NeRF)
- Pixelwise View Selection for Unstructured Multi-View Stereo: prior work on dense modeling (NeRF kinda replaces this)
- DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
- Deferred Neural Rendering: Image Synthesis using Neural Textures
- Neural Volumes: Learning Dynamic Renderable Volumes from Images
- ⭐️ NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis: The paper that started it all...
- Neural Radiance Fields for Unconstrained Photo Collections: NeRF in the wild (alternative to MVS)
- Nerfies: Deformable Neural Radiance Fields: Photorealistic NeRF from casual in-the-wild photos and videos (like from a cellphone)
- Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields: NeRF... but BETTER FASTER HARDER STRONGER
- Depth-supervised NeRF: Fewer Views and Faster Training for Free: Train NeRF models faster with fewer images by leveraging depth information
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding: a multiresolution hash table of trainable features that makes NeRF training rlllly FAST
- Understanding Pure CLIP Guidance for Voxel Grid NeRF Models: text-to-3D using CLIP
- NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields: NeRF for robots (and cars)
- nerf2nerf: Pairwise Registration of Neural Radiance Fields: aligns pairs of pretrained NeRFs
- The One Where They Reconstructed 3D Humans and Environments in TV Shows
- ClimateNeRF: Physically-based Neural Rendering for Extreme Climate Synthesis
- Realistic one-shot mesh-based head avatars
- Neural Point Catacaustics for Novel-View Synthesis of Reflections
- 3D Moments from Near-Duplicate Photos
- NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors
- DreamFusion: Text-to-3D using 2D Diffusion (Google)
- ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding (Salesforce)
- Extracting Triangular 3D Models, Materials, and Lighting From Images (NVIDIA)
- GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images (NVIDIA)
- 3D Neural Field Generation using Triplane Diffusion
- 🎠 MagicPony: Learning Articulated 3D Animals in the Wild
- ObjectStitch: Generative Object Compositing (Adobe)
- LADIS: Language Disentanglement for 3D Shape Editing (Snap)
- Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion (Microsoft)
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation (Snap)
- DiffRF: Rendering-guided 3D Radiance Field Diffusion (Meta)
- Novel View Synthesis with Diffusion Models (Google)
- ⭐️ Magic3D: High-Resolution Text-to-3D Content Creation (NVIDIA)
- Sampling Generative Networks
- Neural Discrete Representation Learning (VQVAE)
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
- A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN)
- ⭐️ Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2)
- Training Generative Adversarial Networks with Limited Data (StyleGAN2-ADA)
- Alias-Free Generative Adversarial Networks (StyleGAN3)
- Generating Diverse High-Fidelity Images with VQ-VAE-2
- Taming Transformers for High-Resolution Image Synthesis (VQGAN)
- Diffusion Models Beat GANs on Image Synthesis
- StyleNAT: Giving Each Head a New Perspective
- StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
- Image-to-Image Translation with Conditional Adversarial Nets (pix2pix)
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN)
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (pix2pixHD)
- Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects (SESAME)
- Semantic Image Synthesis with Spatially-Adaptive Normalization (SPADE)
- You Only Need Adversarial Supervision for Semantic Image Synthesis (OASIS)
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Multimodal Conditional Image Synthesis with Product-of-Experts GANs
- Palette: Image-to-Image Diffusion Models
- Sketch-Guided Text-to-Image Diffusion Models
- HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
- PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
- Pretraining is All You Need for Image-to-Image Translation (PITI)
- Generative Visual Manipulation on the Natural Image Manifold (iGAN)
- In-Domain GAN Inversion for Real Image Editing
- Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?
- Designing an Encoder for StyleGAN Image Manipulation
- Pivotal Tuning for Latent-based Editing of Real Images
- ⭐️ HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- High-Fidelity GAN Inversion for Image Attribute Editing
- Swapping Autoencoder for Deep Image Manipulation
- Sketch Your Own GAN
- Rewriting Geometric Rules of a GAN
- Anycost GANs for Interactive Image Synthesis and Editing
- Third Time’s the Charm? Image and Video Editing with StyleGAN3
- ⭐️ Discovering Interpretable GAN Controls (GANspace)
- Interpreting the Latent Space of GANs for Semantic Face Editing
- GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
- Unsupervised Extraction of StyleGAN Edit Directions (CLIP2StyleGAN)
- Seeing What a GAN Cannot Generate
- Deep Image Matting
- Background Matting: The World is Your Green Screen
- Robust Video Matting
- Semantic Image Matting
- Privacy-Preserving Portrait Matting
- Deep Automatic Natural Image Matting
- MatteFormer
- MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition
- ⭐️ Robust Human Matting via Semantic Guidance
- NVIDIA Imaginaire: 2D image synthesis library
- NVIDIA Omniverse: The platform for creating and operating metaverse applications
- mmgeneration
- Modelverse: Content-Based Search for Deep Generative Models
- PaddleGAN
- FFCV: an Optimized Data Pipeline for Accelerating ML Training
- ONNX Runtime
- DeepSpeed (training, inference, compression)
- TensorRT
- TensorFlow Lite
- TorchScript
- TorchServe
- AITemplate
- ⭐️ Stable Diffusion
- Imagen
- DALL·E 2
- VQGAN+CLIP
- Parti
- Muse: Text-To-Image Generation via Masked Generative Transformers: Uses masked image modeling with transformers; more efficient than diffusion or autoregressive text-to-image models.
- DreamStudio: Official Stability AI cloud-hosted service.
- ⭐️ Stable Diffusion Web UI: A user-friendly UI for SD with additional features to make common workflows easy.
- AI render (Blender): Render scenes in Blender using a text prompt.
- Dream Textures (Blender): Plugin to render textures, reference images, and backgrounds with SD.
- lexica.art: SD prompt search.
- koi (Krita): SD plugin for Krita for img2img generation.
- Alpaca (Photoshop): Photoshop plugin (beta).
- Christian Cantrell's Plugin (Photoshop): Another Photoshop plugin.
- Stable Diffusion Studio: Animation focused frontend for SD.
- DeepSpeed-MII: Low-latency and high-throughput inference for a wide variety of models/tasks (20,000+), including SD.
- LAION Datasets: Various very-large-scale image-text pair datasets (notably used to train the open-source Stable Diffusion models).
- LAION-Face
- Unsplash Images
- Pixabay
- Pexels
- Open Images: Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives.
- Mozilla Common Voice: 17,127 validated hours of transcribed speech covering 104 languages. Additionally, many of the recorded hours include demographic metadata like age, sex, and accent that can help improve the accuracy of speech recognition engines.
- Flickr Commons: Flickr Commons is a unique collection of historical photography from over 100 cultural institutions from all around the world, all with no known copyright restrictions.
- Internet Archive: Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more.
- Wikimedia Commons: a collection of 106,323,506 freely usable media files to which anyone can contribute.
- Prelinger Archives
- Getty Library Open Content Program: Making images from Getty’s collections freely available for study, teaching, and enjoyment.
- Smithsonian Open Access
- Public Domain Review: Focused on works now fallen into the public domain, the vast commons of out-of-copyright material that everyone is free to enjoy, share, and build upon without restrictions.
- Library of Congress
- Biodiversity Heritage Library
- The Met Open Access
- The National Gallery of Art Open Access
- Art Institute of Chicago Open Access
- NY Public Library Public Domain Collections
- Museum für Kunst und Gewerbe Hamburg Steintorplatz
- FairFace
- Conceptual Captions
- Quick, Draw!
- Open Images
- Visual Question Answering
- TensorFlow Flowers
- Stanford Online Products dataset
- DeepMind 3d Shapes
- PASS: An ImageNet replacement for self-supervised pretraining that contains no humans, enabling high-quality pretraining while significantly reducing privacy concerns.
- Labeled Faces in the Wild (LFW)
- CelebA
- LFWA+
- CelebAMask-HQ
- CelebA-Spoof
- UTKFace
- SSHQ: Full-body images at 1024 × 512 px
- Artbreeder
- Midjourney
- DALL·E 2 (OpenAI)
- Runway - AI-powered video editor.
- Facet AI - AI-powered image editor.
- Adobe Sensei - AI-powered features for the Creative Cloud suite.
- NVIDIA AI Demos
- ClipDrop and cleanup.pictures
A non-exhaustive list of people doing interesting things at the intersection of art, ML, and design.
- Memo Akten
- Neural Bricolage (helena sarin)
- Sofia Crespo
- Lauren McCarthy
- Philipp Schmitt
- Anna Ridler
- Tom White
- Ivona Tau
- Trevor Paglen
- Sasha Stiles
- Mario Klingemann
- Tega Brain
- Mimi Onuoha
- Allison Parrish
- Caroline Sinders
- Robbie Barrat
- Kyle McDonald
- Golan Levin
- STUDIO for Creative Inquiry
- ITP @ NYU
- Gray Area Foundation for the Arts
- Stability AI (Eleuther, LAION, et al.)
- Goldsmiths @ University of London
- UCLA Design Media Arts
- Berkeley Center for New Media
- Google Artists and Machine Intelligence
- Google Creative Lab
- The Lab at the Google Cultural Institute
- Sony CSL (Tokyo and Paris)
- Machine Learning for Art
- Tools and Resources for AI Art (pharmapsychotic) - Big list of Google Colab notebooks for generative text-to-image techniques as well as general tools and resources.
- Awesome Generative Deep Art - A curated list of Generative Deep Art / Generative AI projects, tools, artworks, and models
Contributions are welcome! Read the contribution guidelines first.