diff --git a/docs/source/en/main_classes/agent.mdx b/docs/source/en/main_classes/agent.mdx
index ee910b893b6d0e..953857c410cbba 100644
--- a/docs/source/en/main_classes/agent.mdx
+++ b/docs/source/en/main_classes/agent.mdx
@@ -19,7 +19,7 @@ can vary as the APIs or underlying models are prone to change.
-To learn more about agents and tools make sure to read the [introductory guide](../agents_and_tools). This page
+To learn more about agents and tools make sure to read the [introductory guide](../transformers_agents). This page
 contains the API docs for the underlying classes.
 ## Agents
diff --git a/docs/source/en/transformers_agents.mdx b/docs/source/en/transformers_agents.mdx
index 514e34c30b3158..9a45583b2ca27a 100644
--- a/docs/source/en/transformers_agents.mdx
+++ b/docs/source/en/transformers_agents.mdx
@@ -256,16 +256,16 @@ with the code generated by the agent.
 We identify a set of tools that can empower such agents. Here is an updated list of the tools we have integrated in `transformers`:
-- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](../model_doc/donut))
-- **Text question answering**: given a long text and a question, answer the question in the text ([Flan-T5](../model_doc/flan-t5))
-- **Unconditional image captioning**: Caption the image! ([BLIP](../model_doc/blip))
-- **Image question answering**: given an image, answer a question on this image ([VILT](../model_doc/vilt))
-- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](../model_doc/clipseg))
-- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](../model_doc/whisper))
-- **Text to speech**: convert text to speech ([SpeechT5](../model_doc/speecht5))
-- **Zero-shot text classification**: given a text and a list of labels, identify to which label the text corresponds the most ([BART](../model_doc/bart))
-- **Text summarization**: summarize a long text in one or a few sentences ([BART](../model_doc/bart))
-- **Translation**: translate the text into a given language ([NLLB](../model_doc/nllb))
+- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](./model_doc/donut))
+- **Text question answering**: given a long text and a question, answer the question in the text ([Flan-T5](./model_doc/flan-t5))
+- **Unconditional image captioning**: Caption the image! ([BLIP](./model_doc/blip))
+- **Image question answering**: given an image, answer a question on this image ([VILT](./model_doc/vilt))
+- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](./model_doc/clipseg))
+- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
+- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
+- **Zero-shot text classification**: given a text and a list of labels, identify to which label the text corresponds the most ([BART](./model_doc/bart))
+- **Text summarization**: summarize a long text in one or a few sentences ([BART](./model_doc/bart))
+- **Translation**: translate the text into a given language ([NLLB](./model_doc/nllb))
 These tools have an integration in transformers, and can be used manually as well, for example: