Improved Image Handling #719
Conversation
Here's an example of some of the clutter this is dealing with:

```python
# before
depth_image = PIL.ImageOps.flip(PIL.Image.fromarray(np.uint8(depth * 255)).convert('L')).resize(rounded_size) if depth is not None else None
init_image = None if image is None else (PIL.Image.open(image) if isinstance(image, str) else PIL.Image.fromarray(image.astype(np.uint8))).convert('RGB').resize(rounded_size)

# after
depth = image_to_np(depth, mode="L", size=rounded_size)
image = image_to_np(image, mode="RGB", size=rounded_size)
```

So much easier to read through.
@carson-katri I'd like to standardize how images are shared between the render engine nodes. I propose keeping the images flipped like Blender naturally has them and switching the color space to linear so that image operations match how they occur in other node editors. Also, do you think the number of channels should be standardized to 4, or should anything from 1 to 4 be allowed?
@NullSenseStudio Sorry for the delayed response. Other node editors use Color sockets for 4-channel images and Float sockets for 1 channel, and I don't think there are any options for 2 or 3 channels. So I'd say our nodes should always have 4 channels for images, and 1 channel for other 2D arrays (like depth operations).
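For reference, a minimal numpy sketch of what standardizing everything to 4 channels could look like; `to_rgba` is a hypothetical helper, not part of this PR:

```python
import numpy as np

def to_rgba(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pad a 1- to 4-channel float image to RGBA."""
    if image.ndim == 2:
        # grayscale without a channel dimension
        image = image[..., np.newaxis]
    if image.shape[-1] == 2:
        # gray + alpha: replicate gray into RGB, keep the alpha channel
        gray, alpha = image[..., :1], image[..., 1:]
        image = np.concatenate([np.repeat(gray, 3, axis=-1), alpha], axis=-1)
    elif image.shape[-1] == 1:
        # grayscale: replicate into RGB
        image = np.repeat(image, 3, axis=-1)
    if image.shape[-1] == 3:
        # add an opaque alpha channel
        image = np.concatenate([image, np.ones_like(image[..., :1])], axis=-1)
    return image
```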
With the render engine now outputting images in linear color space, you won't have to change the color management display device to None for accurate viewing (an option that has since been removed in Blender 4.0). The resize and image file nodes can still be used in earlier Blender versions, but without as good resize sampling or file compatibility. Dynamic sockets are fixed for Blender 4.0: getting sockets by their string name is no longer supported when they are disabled: https://projects.blender.org/blender/blender/commit/e4ad58114b9d56fe838396a97fe09aff32c79c6a
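For illustration, a minimal sketch of a name lookup that keeps working, assuming iteration over `node.inputs` still yields disabled sockets (the socket name here is hypothetical):

```python
import bpy

def find_input(node: bpy.types.Node, name: str):
    # node.inputs["Depth Map"] no longer finds disabled sockets in
    # Blender 4.0, so scan the collection instead of subscripting by name.
    for socket in node.inputs:
        if socket.name == name:
            return socket
    return None
```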
LGTM 👍
Summary
Image handling on the backend is currently quite a mess: images have to be flipped before returning, frequently converted between PIL and numpy (despite diffusers being able to input and output ndarrays), and depth maps need to be flipped before use and arrive in float32 rather than uint8, unlike any other image. This PR aims to simplify all of this with the new `image_utils` module.

Details
Images received by the backend will be in float32 RGBA format and won't require any flipping on receipt or output (unless some library, like Blender, requires them flipped). Depth maps for use in depth to image (not the depth ControlNet) will be in float32 grayscale without a channel dimension. This keeps all images close enough to what diffusers can handle, with minimal preprocessing on Dream Textures' side: usually just removing the alpha channel, extracting alpha as an inpaint mask, or resizing to certain dimensions. For custom backends that may require PIL images there's an extra `image_utils.np_to_pil()` function that handles the conversion without all the code clutter. The diffusers backend now primarily uses `image_utils.image_to_np()` for most of its needs; it acts as an all-in-one function that accepts various image types or file paths and calls on other `image_utils` functions as determined by its kwargs.

Returned images won't have to follow as rigid a requirement. The dtype can be any floating point or integer type, as long as it uses its proper type range (`int(0) = float(0)`, `int.max = float(1)`). Channels don't matter: images can be grayscale or RGB, with or without alpha.
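As a hedged illustration of that range convention, a helper like the following (the name `to_float32` is mine, not from the PR) could normalize any accepted dtype into [0, 1] float32:

```python
import numpy as np

def to_float32(image: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: map any integer dtype onto [0, 1] float32.

    Follows the convention above: int 0 -> 0.0, the dtype's max -> 1.0.
    """
    if np.issubdtype(image.dtype, np.integer):
        max_value = np.iinfo(image.dtype).max
        return image.astype(np.float32) / max_value
    # floating point input is assumed to already be in [0, 1]
    return image.astype(np.float32)
```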
The frontend's code is simplified with `image_utils.bpy_to_np()` and `image_utils.np_to_bpy()`. Both functions flip the image and handle color space conversion. `image_utils.np_to_bpy(..., float_buffer=True)`, while currently unused, would allow saving higher color precision and support potential future HDRI models.
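A hedged usage sketch of the round trip; the import path, the positional argument to `np_to_bpy`, and the image name are assumptions, and only `float_buffer=True` comes from the description above:

```python
import bpy
from dream_textures import image_utils  # import path is an assumption

# Blender image datablock -> flipped, color-managed float ndarray
pixels = image_utils.bpy_to_np(bpy.data.images["Render Result"])

# ndarray -> Blender image datablock; float_buffer=True would preserve
# full float precision for potential HDRI workflows
result = image_utils.np_to_bpy(pixels, float_buffer=True)
```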
Drawbacks

Using numpy directly instead of converting to PIL can cause issues that are hard to trace.
I've noticed that values even slightly below 0 or above 1 have a very bad effect on image-to-image saturation. Certain resizing methods and color transforms can shift values slightly outside this range, which isn't a problem for PIL due to its limited precision. Also, not removing the alpha channel before handing the image to diffusers leads to the 4-channel array being used directly as latents instead of going through encoding first. That normally causes an out-of-memory error, though I'm sure with enough memory it would produce strange results instead.
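A hedged sketch of the kinds of guards that sidestep both problems; the helper name and the mask convention are my assumptions:

```python
import numpy as np

def prepare_for_diffusers(image: np.ndarray):
    """Clamp out-of-range values and split off alpha before diffusers sees it."""
    # Resampling and color transforms can push values slightly outside
    # [0, 1], which noticeably distorts image-to-image saturation.
    image = np.clip(image, 0.0, 1.0)
    mask = None
    if image.ndim == 3 and image.shape[-1] == 4:
        # Passing RGBA straight to diffusers makes it treat the 4 channels
        # as latents; drop alpha and reuse it as an inpaint mask instead
        # (the exact mask convention here is an assumption).
        mask = image[..., 3]
        image = image[..., :3]
    return image, mask
```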