
fix: Use use_auth_token in all cases when loading from the HF Hub #3094

Merged · 17 commits · Aug 25, 2022
7 changes: 6 additions & 1 deletion docs/_src/api/api/document_classifier.md
@@ -84,7 +84,7 @@ With this document_classifier, you can directly get predictions via predict()
#### TransformersDocumentClassifier.\_\_init\_\_

```python
def __init__(model_name_or_path: str = "bhadresh-savani/distilbert-base-uncased-emotion", model_version: Optional[str] = None, tokenizer: Optional[str] = None, use_gpu: bool = True, return_all_scores: bool = False, task: str = "text-classification", labels: Optional[List[str]] = None, batch_size: int = 16, classification_field: str = None, progress_bar: bool = True)
def __init__(model_name_or_path: str = "bhadresh-savani/distilbert-base-uncased-emotion", model_version: Optional[str] = None, tokenizer: Optional[str] = None, use_gpu: bool = True, return_all_scores: bool = False, task: str = "text-classification", labels: Optional[List[str]] = None, batch_size: int = 16, classification_field: str = None, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

Load a text classification model from Transformers.
@@ -117,6 +117,11 @@ or an entailment.
- `batch_size`: Number of Documents to be processed at a time.
- `classification_field`: Name of Document's meta field to be used for classification. If left unset, Document.content is used by default.
- `progress_bar`: Whether to show a progress bar while processing.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
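For illustration, a minimal sketch of passing the token when the classifier lives in a private Hub repository; the repository name here is hypothetical:

```python
from haystack.nodes import TransformersDocumentClassifier

# Reuses the token cached by `transformers-cli login` (stored in ~/.huggingface)
classifier = TransformersDocumentClassifier(
    model_name_or_path="my-org/private-emotion-model",  # hypothetical private repo
    use_auth_token=True,
)
```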

<a id="transformers.TransformersDocumentClassifier.predict"></a>

7 changes: 6 additions & 1 deletion docs/_src/api/api/evaluation.md
@@ -123,7 +123,7 @@ Print the evaluation results
#### semantic\_answer\_similarity

```python
def semantic_answer_similarity(predictions: List[List[str]], gold_labels: List[List[str]], sas_model_name_or_path: str = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2", batch_size: int = 32, use_gpu: bool = True) -> Tuple[List[float], List[float], List[List[float]]]
def semantic_answer_similarity(predictions: List[List[str]], gold_labels: List[List[str]], sas_model_name_or_path: str = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2", batch_size: int = 32, use_gpu: bool = True, use_auth_token: Optional[Union[str, bool]] = None) -> Tuple[List[float], List[float], List[List[float]]]
```

Computes Transformer-based similarity of predicted answer to gold labels to derive a more meaningful metric than EM or F1.
@@ -141,6 +141,11 @@ pointing to downloadable models.
- `batch_size`: Number of prediction label pairs to encode at once.
- `use_gpu`: Whether to use a GPU or the CPU for calculating semantic answer similarity.
Falls back to CPU if no GPU is available.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
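A sketch of passing an explicit token string instead of the cached one. The SAS model name is hypothetical, and the import path below is a guess that may differ between Haystack versions:

```python
# Import path may vary with the Haystack version; adjust if needed.
from haystack.modeling.evaluation.metrics import semantic_answer_similarity

top1_sas, topk_sas, pair_scores = semantic_answer_similarity(
    predictions=[["Berlin", "The capital is Berlin"]],
    gold_labels=[["Berlin"]],
    sas_model_name_or_path="my-org/private-sas-model",  # hypothetical private model
    use_auth_token="hf_xxx",  # an explicit Hub token, e.g. from a secrets store
)
```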

**Returns**:

5 changes: 5 additions & 0 deletions docs/_src/api/api/extractor.md
@@ -24,6 +24,11 @@ The entities extracted by this Node will populate Document.entities
- `use_gpu`: Whether to use the GPU or not.
- `batch_size`: The batch size to use for entity extraction.
- `progress_bar`: Whether to show a progress bar or not.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
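A sketch of supplying the token from an environment variable; `HF_TOKEN` is an arbitrary name chosen here, not a Haystack convention, and the model repo is hypothetical:

```python
import os

from haystack.nodes import EntityExtractor

extractor = EntityExtractor(
    model_name_or_path="my-org/private-ner-model",     # hypothetical private repo
    use_auth_token=os.environ.get("HF_TOKEN"),         # None is fine for public models
)
```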

<a id="entity.EntityExtractor.run"></a>

18 changes: 15 additions & 3 deletions docs/_src/api/api/generator.md
@@ -138,7 +138,7 @@ i.e. the model can easily adjust to domain documents even after training has finished.
#### RAGenerator.\_\_init\_\_

```python
def __init__(model_name_or_path: str = "facebook/rag-token-nq", model_version: Optional[str] = None, retriever: Optional[DensePassageRetriever] = None, generator_type: str = "token", top_k: int = 2, max_length: int = 200, min_length: int = 2, num_beams: int = 2, embed_title: bool = True, prefix: Optional[str] = None, use_gpu: bool = True, progress_bar: bool = True)
def __init__(model_name_or_path: str = "facebook/rag-token-nq", model_version: Optional[str] = None, retriever: Optional[DensePassageRetriever] = None, generator_type: str = "token", top_k: int = 2, max_length: int = 200, min_length: int = 2, num_beams: int = 2, embed_title: bool = True, prefix: Optional[str] = None, use_gpu: bool = True, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

Load a RAG model from Transformers along with passage_embedding_model.
@@ -160,6 +160,12 @@ See https://huggingface.co/models for a full list of available models.
- `embed_title`: Whether to embed the title of the passage while generating its embedding
- `prefix`: The prefix used by the generator's tokenizer.
- `use_gpu`: Whether to use GPU. Falls back on CPU if no GPU is available.
- `progress_bar`: Whether to show a tqdm progress bar.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
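As a sketch: the documented default `facebook/rag-token-nq` is public, so `use_auth_token` only matters once you point at a private checkpoint (the one below is hypothetical):

```python
from haystack.nodes import RAGenerator

generator = RAGenerator(
    model_name_or_path="my-org/private-rag-checkpoint",  # hypothetical private model
    top_k=2,
    use_auth_token=True,  # falls back to the token cached by `transformers-cli login`
)
```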

<a id="transformers.RAGenerator.predict"></a>

@@ -256,7 +262,7 @@ the [Hugging Face Model Hub](https://huggingface.co/models?pipeline_tag=text2tex
#### Seq2SeqGenerator.\_\_init\_\_

```python
def __init__(model_name_or_path: str, input_converter: Optional[Callable] = None, top_k: int = 1, max_length: int = 200, min_length: int = 2, num_beams: int = 8, use_gpu: bool = True, progress_bar: bool = True)
def __init__(model_name_or_path: str, input_converter: Optional[Callable] = None, top_k: int = 1, max_length: int = 200, min_length: int = 2, num_beams: int = 8, use_gpu: bool = True, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

**Arguments**:
@@ -272,6 +278,12 @@ top_k: Optional[int] = None) -> BatchEncoding:
- `min_length`: Minimum length of generated text
- `num_beams`: Number of beams for beam search. 1 means no beam search.
- `use_gpu`: Whether to use GPU or the CPU. Falls back on CPU if no GPU is available.
- `progress_bar`: Whether to show a tqdm progress bar.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
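A minimal sketch for loading a private seq2seq checkpoint (the model name is hypothetical):

```python
from haystack.nodes import Seq2SeqGenerator

generator = Seq2SeqGenerator(
    model_name_or_path="my-org/private-lfqa-bart",  # hypothetical private model
    num_beams=8,
    use_auth_token=True,
)
```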

<a id="transformers.Seq2SeqGenerator.predict"></a>

@@ -311,7 +323,7 @@ Uses the GPT-3 models from the OpenAI API to generate Answers based on the Documents.
The Documents can come from a Retriever or you can supply them manually.

To use this Node, you need an API key from an active OpenAI account. You can sign-up for an account
on the [OpenAI API website](https://openai.com/api/)).
on the [OpenAI API website](https://openai.com/api/).

<a id="openai.OpenAIAnswerGenerator.__init__"></a>

7 changes: 6 additions & 1 deletion docs/_src/api/api/pipelines.md
@@ -509,7 +509,7 @@ Thus [AB] <-> [BC] (score ~50) gets recalculated with B <-> B (score ~100) scoring ~75 in total.

```python
@send_event
def eval(labels: List[MultiLabel], documents: Optional[List[List[Document]]] = None, params: Optional[dict] = None, sas_model_name_or_path: str = None, sas_batch_size: int = 32, sas_use_gpu: bool = True, add_isolated_node_eval: bool = False, custom_document_id_field: Optional[str] = None, context_matching_min_length: int = 100, context_matching_boost_split_overlaps: bool = True, context_matching_threshold: float = 65.0) -> EvaluationResult
def eval(labels: List[MultiLabel], documents: Optional[List[List[Document]]] = None, params: Optional[dict] = None, sas_model_name_or_path: str = None, sas_batch_size: int = 32, sas_use_gpu: bool = True, add_isolated_node_eval: bool = False, custom_document_id_field: Optional[str] = None, context_matching_min_length: int = 100, context_matching_boost_split_overlaps: bool = True, context_matching_threshold: float = 65.0, use_auth_token: Optional[Union[str, bool]] = None) -> EvaluationResult
```

Evaluates the pipeline by running the pipeline once per query in debug mode
@@ -563,6 +563,11 @@ If we detect that the score is near a half match and the matching part of the candidate is at its boundaries,
we cut the context on the same side, recalculate the score and take the mean of both.
Thus [AB] <-> [BC] (score ~50) gets recalculated with B <-> B (score ~100) scoring ~75 in total.
- `context_matching_threshold`: Score threshold that candidates must surpass to be included in the result list. Range: [0,100]
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
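A sketch of how the new parameter reaches evaluation, assuming `pipeline` is an already-built QA pipeline and `eval_labels` is a `List[MultiLabel]`; the SAS model name is hypothetical:

```python
eval_result = pipeline.eval(
    labels=eval_labels,
    params={"Retriever": {"top_k": 5}},
    sas_model_name_or_path="my-org/private-sas-model",  # hypothetical private model
    use_auth_token=True,  # lets the SAS model load with the cached Hub token
)
```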

<a id="base.Pipeline.get_nodes_by_class"></a>

7 changes: 6 additions & 1 deletion docs/_src/api/api/pseudo_label_generator.md
@@ -53,7 +53,7 @@ For example:
#### PseudoLabelGenerator.\_\_init\_\_

```python
def __init__(question_producer: Union[QuestionGenerator, List[Dict[str, str]]], retriever: BaseRetriever, cross_encoder_model_name_or_path: str = "cross-encoder/ms-marco-MiniLM-L-6-v2", max_questions_per_document: int = 3, top_k: int = 50, batch_size: int = 16, progress_bar: bool = True)
def __init__(question_producer: Union[QuestionGenerator, List[Dict[str, str]]], retriever: BaseRetriever, cross_encoder_model_name_or_path: str = "cross-encoder/ms-marco-MiniLM-L-6-v2", max_questions_per_document: int = 3, top_k: int = 50, batch_size: int = 16, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

Loads the cross-encoder model and prepares PseudoLabelGenerator.
@@ -69,6 +69,11 @@ questions/document pairs in a Dictionary format {"question": "question text ..."
- `top_k` (`int (optional)`): The number of answers retrieved for each question, defaults to 50.
- `batch_size` (`int (optional)`): The number of documents to process at a time.
- `progress_bar` (`bool (optional)`): Whether to show a progress bar, defaults to True.
- `use_auth_token` (`Union[str, bool] (optional)`): The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
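A sketch, assuming `retriever` is an already-initialized retriever and the cross-encoder repo is hypothetical:

```python
from haystack.nodes import PseudoLabelGenerator

plg = PseudoLabelGenerator(
    question_producer=[{"question": "What is pseudo labeling?", "document": "Pseudo labeling is ..."}],
    retriever=retriever,  # any initialized BaseRetriever
    cross_encoder_model_name_or_path="my-org/private-cross-encoder",  # hypothetical
    use_auth_token=True,
)
```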

<a id="pseudo_label_generator.PseudoLabelGenerator.generate_questions"></a>

7 changes: 6 additions & 1 deletion docs/_src/api/api/query_classifier.md
@@ -144,7 +144,7 @@ This node also supports zero-shot-classification.
#### TransformersQueryClassifier.\_\_init\_\_

```python
def __init__(model_name_or_path: Union[Path, str] = "shahrukhx01/bert-mini-finetune-question-detection", model_version: Optional[str] = None, tokenizer: Optional[str] = None, use_gpu: bool = True, task: str = "text-classification", labels: List[str] = DEFAULT_LABELS, batch_size: int = 16, progress_bar: bool = True)
def __init__(model_name_or_path: Union[Path, str] = "shahrukhx01/bert-mini-finetune-question-detection", model_version: Optional[str] = None, tokenizer: Optional[str] = None, use_gpu: bool = True, task: str = "text-classification", labels: List[str] = DEFAULT_LABELS, batch_size: int = 16, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

**Arguments**:
@@ -160,4 +160,9 @@ the second label to output_2, and so on. The labels must match the model labels;
If the task is 'zero-shot-classification', these are the candidate labels.
- `batch_size`: The number of queries to be processed at a time.
- `progress_bar`: Whether to show a progress bar.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
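A sketch of pointing the classifier at a private repository (the repo name is hypothetical); for public models `use_auth_token` can simply be left unset:

```python
from haystack.nodes import TransformersQueryClassifier

query_classifier = TransformersQueryClassifier(
    model_name_or_path="my-org/private-query-classifier",  # hypothetical private repo
    use_auth_token=True,
)
```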

8 changes: 7 additions & 1 deletion docs/_src/api/api/question_generator.md
@@ -23,7 +23,7 @@ come from earlier in the document.
#### QuestionGenerator.\_\_init\_\_

```python
def __init__(model_name_or_path="valhalla/t5-base-e2e-qg", model_version=None, num_beams=4, max_length=256, no_repeat_ngram_size=3, length_penalty=1.5, early_stopping=True, split_length=50, split_overlap=10, use_gpu=True, prompt="generate questions:", num_queries_per_doc=1, sep_token: str = "<sep>", batch_size: int = 16, progress_bar: bool = True)
def __init__(model_name_or_path="valhalla/t5-base-e2e-qg", model_version=None, num_beams=4, max_length=256, no_repeat_ngram_size=3, length_penalty=1.5, early_stopping=True, split_length=50, split_overlap=10, use_gpu=True, prompt="generate questions:", num_queries_per_doc=1, sep_token: str = "<sep>", batch_size: int = 16, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

Uses the valhalla/t5-base-e2e-qg model by default. This class supports any question generation model that is
@@ -39,6 +39,12 @@ See https://huggingface.co/models for a full list of available models.
- `model_version`: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
- `use_gpu`: Whether to use GPU or the CPU. Falls back on CPU if no GPU is available.
- `batch_size`: Number of documents to process at a time.
- `progress_bar`: Whether to show a tqdm progress bar.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
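A sketch with the default public model, where `use_auth_token` would only be needed for a private or gated checkpoint:

```python
from haystack.nodes import QuestionGenerator

# Defaults to valhalla/t5-base-e2e-qg; the flag reuses the cached CLI token.
question_generator = QuestionGenerator(use_auth_token=True)
```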

<a id="question_generator.QuestionGenerator.generate_batch"></a>

7 changes: 6 additions & 1 deletion docs/_src/api/api/ranker.md
@@ -94,7 +94,7 @@ Usage example:
#### SentenceTransformersRanker.\_\_init\_\_

```python
def __init__(model_name_or_path: Union[str, Path], model_version: Optional[str] = None, top_k: int = 10, use_gpu: bool = True, devices: Optional[List[Union[str, torch.device]]] = None, batch_size: int = 16, scale_score: bool = True, progress_bar: bool = True)
def __init__(model_name_or_path: Union[str, Path], model_version: Optional[str] = None, top_k: int = 10, use_gpu: bool = True, devices: Optional[List[Union[str, torch.device]]] = None, batch_size: int = 16, scale_score: bool = True, progress_bar: bool = True, use_auth_token: Optional[Union[str, bool]] = None)
```

**Arguments**:
@@ -114,6 +114,11 @@ https://pytorch.org/docs/stable/tensor_attributes.html?highlight=torch%20device#
only predicts a single label. For multi-label predictions, no scaling is applied. Set this
to False if you do not want any scaling of the raw predictions.
- `progress_bar`: Whether to show a progress bar while processing the documents.
- `use_auth_token`: The API token used to download private models from Hugging Face.
If this parameter is set to `True`, then the token generated when running
`transformers-cli login` (stored in ~/.huggingface) will be used.
Additional information can be found here:
https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained
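A sketch passing the token as a string, e.g. pulled from a secrets manager; the model name is hypothetical:

```python
from haystack.nodes import SentenceTransformersRanker

ranker = SentenceTransformersRanker(
    model_name_or_path="my-org/private-cross-encoder-ranker",  # hypothetical private model
    top_k=10,
    use_auth_token="hf_xxx",  # replace with a real Hub access token
)
```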

<a id="sentence_transformers.SentenceTransformersRanker.predict"></a>
