Add ability to generate images in the Media Inserter #535

Merged
dkotter merged 8 commits into develop from feature/inserter-media-category
Aug 10, 2023

Conversation

Collaborator

@dkotter dkotter commented Jul 13, 2023

Description of the Change

In WordPress 6.2, a new Media tab was added to the Inserter, allowing easy access to add existing media items to your content. A follow-up added Openverse support but no way for other external integrations to hook in.

There is now a public API for adding your own Inserter categories. It is available in Gutenberg 16.2 and should be in WordPress 6.3.

This PR integrates ClassifAI's image generation feature into the Inserter. When image generation is enabled through the OpenAI DALLE service, we register a new inserter category using the new registerInserterMediaCategory function. Whatever text is entered is sent as a prompt to our custom endpoint, which forwards it to OpenAI.

We then render the generated images. A user can select one or more of them to insert into their content, which also downloads each selected image to the Media Library.
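
For readers unfamiliar with the new API, the registration described above might look roughly like the sketch below. The label strings and the registerInserterMediaCategory function come from this PR. The category name, the requestImages helper, and the shape of its response are assumptions made for illustration, not ClassifAI's actual code.

```javascript
// Build the category object handed to registerInserterMediaCategory.
// `requestImages` is a hypothetical helper that sends the prompt to the
// plugin's endpoint and resolves to an array of `{ url }` objects.
function buildImageGenCategory( requestImages ) {
    return {
        name: 'classifai-generate-image', // hypothetical identifier
        labels: {
            name: 'Generate images',
            search_items: 'Enter a prompt',
        },
        mediaType: 'image',
        // Core calls this with the (debounced) search text.
        fetch: async ( { search = '' } = {} ) => {
            const prompt = search.trim();
            // Never spend an API request on an empty prompt.
            if ( ! prompt ) {
                return [];
            }
            const images = await requestImages( prompt );
            return images.map( ( image ) => ( {
                title: prompt,
                url: image.url,
                previewUrl: image.url,
                id: undefined, // not in the Media Library yet
                sourceId: image.url,
                alt: prompt,
                caption: prompt,
            } ) );
        },
        // Tells core this category is backed by an external service.
        isExternalResource: true,
    };
}

// In the editor this would be registered along the lines of:
// wp.data.dispatch( 'core/block-editor' )
//     .registerInserterMediaCategory( buildImageGenCategory( requestImages ) );
```

Passing requestImages in as an argument keeps the network call separate from the mapping logic, so the category can be exercised without a running editor or an API key.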

Media inserter
Media.Inserter.mov

Dev notes

We have limited control over this integration. Basically we can choose the text we show for the option (Generate images) and the text in the input (Enter a prompt). We then have control over the fetch method, which in our case makes a request to our custom API endpoint. We have no control over when that method is called or what happens on image selection/insert.

Our fetch method is called after text has been entered and a short pause has occurred. Unless you know exactly what prompt you want and type it quickly, we will most likely send multiple requests to OpenAI, and each one is billed. At the moment we have no way to throttle those requests, since the timing is controlled by core Gutenberg code. This is something we may want to call out, especially as image generation is the most expensive API request.
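
To make the cost concern concrete, here is a small sketch that counts how many requests a trailing-edge debounce would let through for a given typing pattern. The countDebouncedCalls helper is hypothetical, and the 250 ms wait is used only for illustration; it is not a confirmed value from core.

```javascript
// Count how many times a trailing-edge debounce with the given wait would
// fire for a list of keystroke timestamps (milliseconds, ascending).
function countDebouncedCalls( keystrokeTimes, waitMs ) {
    let calls = 0;
    for ( let i = 0; i < keystrokeTimes.length; i++ ) {
        const next = keystrokeTimes[ i + 1 ];
        // The timer fires if no further keystroke arrives within the wait.
        if ( next === undefined || next - keystrokeTimes[ i ] >= waitMs ) {
            calls++;
        }
    }
    return calls;
}

// Typing steadily: one request.
const steady = [ 0, 100, 200, 300, 400 ];
// The same number of keystrokes with two pauses to think: three requests.
const hesitant = [ 0, 100, 900, 1000, 2000 ];

console.log( countDebouncedCalls( steady, 250 ) ); // 1
console.log( countDebouncedCalls( hesitant, 250 ) ); // 3
```

Every pause longer than the wait produces a billed image generation request, which is why a slowly typed prompt costs more than a quickly typed one.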

How to test the Change

  1. Ensure you have the Gutenberg plugin installed, v16.2+
  2. Check out this branch and run npm install && npm run build
  3. Ensure you have the image generation feature configured
  4. Go to a post, open the Inserter and go to the Media tab
  5. Ensure you see a Generate images option there
  6. Click on that option and then enter a prompt
  7. Ensure images get generated
  8. Click on an image and ensure it gets imported and inserted
  9. Remove the Gutenberg plugin and test to ensure the feature no longer works but no console errors happen
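
Step 9 depends on the registration being guarded by feature detection. A minimal sketch of such a guard follows. registerIfSupported is a hypothetical helper name, and the dispatcher object is passed in so the check can be exercised outside the editor; this is not necessarily how the PR implements it.

```javascript
// Register the category only when the running editor exposes the API.
// Returns true if registration happened, false if it was skipped.
function registerIfSupported( blockEditorDispatch, category ) {
    const register =
        blockEditorDispatch &&
        blockEditorDispatch.registerInserterMediaCategory;
    if ( typeof register !== 'function' ) {
        return false; // older editor: skip quietly, no console errors
    }
    register.call( blockEditorDispatch, category );
    return true;
}
```

In the editor, the first argument would be wp.data.dispatch( 'core/block-editor' ).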

Changelog Entry

Added - Ability to generate images within the Inserter Media tab. As of WordPress 6.3, requires the latest version of the Gutenberg plugin to work. Also note that image generation requests are sent as soon as you stop typing, so you may end up making multiple requests as you type out your prompt (each of which is charged), depending on your typing speed.

Credits

Props @dkotter, @jeffpaul

Checklist:

  • I agree to follow this project's Code of Conduct.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests pass.

@dkotter dkotter added this to the 2.3.0 milestone Jul 13, 2023
@dkotter dkotter requested a review from a team as a code owner July 13, 2023 17:47
@dkotter dkotter self-assigned this Jul 13, 2023
@dkotter dkotter requested a review from jeffpaul as a code owner July 13, 2023 17:47
@dkotter dkotter changed the title Feature/inserter media category Add ability to generate images into the Media Inserter Jul 13, 2023
@dkotter dkotter requested review from a team and peterwilsoncc and removed request for a team and jeffpaul July 13, 2023 18:37
@dkotter dkotter changed the title Add ability to generate images into the Media Inserter Add ability to generate images in the Media Inserter Jul 13, 2023
@peterwilsoncc
Contributor

@dkotter I'll get in touch with Jeff during the week for some API keys so I can review this.

Contributor

@peterwilsoncc peterwilsoncc left a comment

This looks good to me and tests well (apparently the hook didn't make it to WP 6.3 so I needed to install Gutenberg).

Just one note inline to add some debouncing so we don't run up costs for people.

search_items: __( 'Enter a prompt', 'classifai' ),
},
mediaType: 'image',
fetch: async ( { search = '' } ) => {
Contributor

I think this needs to be debounced a little; it looks like it makes multiple requests while typing in the prompt.

Screen Shot 2023-08-02 at 9 34 16 am

Collaborator Author

I mentioned in the PR description that if someone doesn't type super fast, it will make multiple requests. I'm struggling to figure out how to add debouncing here, since debouncing already happens on the WordPress side (see https://github.com/WordPress/gutenberg/blob/trunk/packages/block-editor/src/components/inserter/hooks/use-debounced-input.js). Nothing I've tried for adding another layer of debouncing has worked. Either I'm not sure how to make this work (very likely) or this may not be possible, since the only thing we have control over is the actual fetch function, not when that function gets called.

Collaborator Author

As an example, if I add debounce around our fetch function:

fetch: debounce( async ( { search = '' } ) => {
    ...
}, 1000 ),

and leave everything else the same, it does properly debounce the requests but it does not properly render the results. It seems to always be one request behind. So if I enter an image prompt and wait for a second or two (for the debounce to kick in), I do see my request being made but no results are rendered. If I then change the image prompt, the previous results render immediately and then once the wait period is up, a new request is made (but the results from that aren't rendered unless I make changes to the prompt again).

Again, this may very well be something I'm doing wrong but could also be some limitations around how Core is doing the rendering here.
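
One possible explanation for the one-request-behind behavior, assuming the debounce used here follows lodash semantics (a call returns the result of the previous invocation rather than a promise for the upcoming one): core awaits whatever fetch returns, so it receives the stale value. The sketch below models that with a manual flush in place of a real timer. debounceLikeLodash is a simplified stand-in written for this illustration, not the library's code.

```javascript
function debounceLikeLodash( fn ) {
    let lastResult; // result of the most recent real invocation
    let pendingArgs = null;
    function debounced( ...args ) {
        pendingArgs = args; // remember the latest call
        return lastResult; // but hand back the previous result
    }
    // Stand-in for the wait period elapsing.
    debounced.flush = () => {
        if ( pendingArgs ) {
            lastResult = fn( ...pendingArgs );
            pendingArgs = null;
        }
        return lastResult;
    };
    return debounced;
}

const requests = [];
const fetchImages = debounceLikeLodash( ( { search } ) => {
    requests.push( search );
    return [ `image for "${ search }"` ];
} );

// Nothing has run yet, so there is nothing to render.
const first = fetchImages( { search: 'a cat' } );
// The wait elapses and the request for "a cat" is made.
fetchImages.flush();
// Changing the prompt immediately returns the stale "a cat" results.
const second = fetchImages( { search: 'a cat in a hat' } );
```

If this is the cause, the fix would have to live in the caller (core) rather than in the fetch function, which matches the limitation described above.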

Contributor

Sorry Darin, I missed this in the description.

As the feature didn't make it into WP 6.3, is it worth adding a note to the changelog entry that it requires the Gutenberg plugin or WP 6.4 or later?

Collaborator Author

I'm surprised this didn't make it into WP 6.3, as this change was added to Gutenberg in mid-June. I think this is probably worth holding off on until it comes to WordPress itself; at least that's my opinion.

I also think it's worth discussing a bit more around how requests are made here. Since image generation requests are expensive, it's not great that you'll almost certainly end up making multiple requests as you enter an image prompt. I'm wondering if it's worth opening an issue on the Gutenberg side to see if we can get this feature modified a bit? I would love to be able to toggle between search happening as you type (the way it works now) or the ability to only search if someone clicks a submit button (which would solve our needs). Or if we can filter the default debounce value to make that higher, that also would help.

Not sure if either of those are changes they'll deem worth adding but may be a good conversation to start.

@jeffpaul
Member

jeffpaul commented Aug 8, 2023

In discussing this with @dkotter in our 1:1 today, I'm fine with the following

  • merging/releasing this in the near term
  • adding a note in the readme and changelog that this feature, if prompts are entered particularly slowly, could result in additional API requests and higher charges per image generation than expected (albeit we're talking cents here and not thousands of dollars, but still worth calling out to some degree)
  • opening a follow-up issue here in ClassifAI to look at adding some sort of button click / other action to trigger the actual image generation once that's more feasible in Gutenberg
  • opening an upstream issue in Gutenberg to document this issue in hopes of getting the ability to add in said button click / other action to trigger this action for ClassifAI

@fabiankaegy curious if you've got any advice/recommendations on this as well?

@dkotter dkotter merged commit 99388b4 into develop Aug 10, 2023
@dkotter dkotter deleted the feature/inserter-media-category branch August 10, 2023 19:26