
connection issue #8690

Closed
rabeehk opened this issue Nov 20, 2020 · 46 comments
@rabeehk

rabeehk commented Nov 20, 2020

Hi,
I am running seq2seq_trainer on TPUs and I keep getting this connection issue. Could you please have a look? Since this is on TPUs, it is hard for me to debug.
Thanks,
Best,
Rabeeh

    2389961.mean    (11/20/2020 05:24:09 PM)        (Detached)
local_files_only=local_files_only,

File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/transformers/file_utils.py", line 955, in cached_path
local_files_only=local_files_only,
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/transformers/file_utils.py", line 1125, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Traceback (most recent call last):
File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 71, in
main()
XLA label: %copy.32724.remat = f32[80,12,128,128]{3,2,1,0:T(8,128)} copy(f32[80,12,128,128]{2,3,1,0:T(8,128)} %bitcast.576)
Allocation type: HLO temp
==========================

  1. Size: 60.00M
    Shape: f32[80,12,128,128]{3,2,1,0:T(8,128)}
    Unpadded size: 60.00M
    XLA label: %copy.32711.remat = f32[80,12,128,128]{3,2,1,0:T(8,128)} copy(f32[80,12,128,128]{2,3,1,0:T(8,128)
    0%| | 2/18060 [08:12<1234:22:09, 246.08s/it]Traceback (most recent call last):
    File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 71, in
    main()
    File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 67, in main
    xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
    File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
    start_method=start_method)
    File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
    File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 112, in join
    (error_index, exitcode)
@samyakag

Having a similar issue while running a multi-class classification model.

@rabeehk
Author

rabeehk commented Nov 23, 2020

@rabeehkarimimahabadi

Hi,
I am constantly getting this error. It looks like a bug to me, since it sometimes appears and sometimes does not. Could you please help me? These are expensive experiments I am running on TPUs, and they fail many times due to this error:

Exception in device=TPU:0: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
el/0 I1124 07:19:52.663760 424494 main shadow.py:87 > Traceback (most recent call last):
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 330, in _mp_start_fn
_start_fn(index, pf_cfg, fn, args)
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 324, in _start_fn
fn(gindex, *args)
File "/workdir/seq2seq/finetune_t5_trainer.py", line 230, in _mp_fn
main()
File "/workdir/seq2seq/finetune_t5_trainer.py", line 71, in main
cache_dir=model_args.cache_dir,
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/configuration_utils.py", line 347, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/configuration_utils.py", line 388, in get_config_dict
local_files_only=local_files_only,
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/file_utils.py", line 955, in cached_path
local_files_only=local_files_only,
File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/transformers/file_utils.py", line 1125, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."

@rabeehkarimimahabadi

@sumyuck

@rabeehkarimimahabadi

@thomwolf

@rabeehkarimimahabadi

This is with transformers 3.5.1, PyTorch 1.6, on a TPU v3-8, and I am using xla_spawn to launch the jobs. It looks like a general issue with the caching part.

@alkeshpatel11

Same for me. I get this error while trying to execute the following line:
tokenizer = LxmertTokenizer.from_pretrained('unc-nlp/lxmert-base-uncased')

File "/Users/xxx/anaconda3/envs/test/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1629, in from_pretrained
local_files_only=local_files_only,
File "/Users/xxx/anaconda3/envs/test/lib/python3.7/site-packages/transformers/file_utils.py", line 955, in cached_path
local_files_only=local_files_only,
File "/Users/xxx/anaconda3/envs/test/lib/python3.7/site-packages/transformers/file_utils.py", line 1125, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
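When the files have already been downloaded once, a workaround for transient failures like the one above is to fall back to the cache via the local_files_only argument of from_pretrained, which skips the network entirely. A minimal sketch (the load_with_fallback helper is hypothetical, not part of transformers):

```python
def load_with_fallback(loader, **kwargs):
    """Try a normal (network-checking) load first; if it fails with a
    connection-style error, retry using only locally cached files."""
    try:
        return loader(**kwargs)
    except (ValueError, OSError):
        # local_files_only=True makes from_pretrained use the cache only,
        # raising instead if the files were never downloaded.
        return loader(local_files_only=True, **kwargs)

# Hypothetical usage with the tokenizer from the comment above:
# tok = load_with_fallback(
#     lambda **kw: LxmertTokenizer.from_pretrained(
#         'unc-nlp/lxmert-base-uncased', **kw))
```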

@rabeehk
Author

rabeehk commented Nov 25, 2020 via email

@nishkalavallabhi

nishkalavallabhi commented Nov 25, 2020

I am having the same issue too. I am pointing to the cache directory where PyTorch saves the models:

cache_dir = '/home/me/.cache/torch/transformers/'
modelpath = "bert-base-uncased"
model = AutoModel.from_pretrained(modelpath, cache_dir=cache_dir)
tokenizer = AutoTokenizer.from_pretrained(modelpath, cache_dir=cache_dir)

And I am getting a connection error. PyTorch: 1.7.0, transformers: 3.5.1.

@julien-c
Member

Working on a fix; hopefully it will be fixed for good today.

Meanwhile, as a workaround, retrying a couple of minutes later should do the trick.
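The "retry a couple of minutes later" workaround can be automated with a small backoff loop. A minimal sketch (this retry helper is my own, not part of transformers):

```python
import time

def retry(fn, attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception.
    The exception from the final attempt is re-raised."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage:
# model = retry(lambda: AutoModel.from_pretrained("bert-base-uncased"))
```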

@nishkalavallabhi

I deleted all cache, redownloaded all models, and ran again. It seems to be working as of now.

@julien-c
Member

Scaling of connectivity for model hosting should be much improved now. Please comment here if you still experience connectivity issues from now on.

Thanks!

@AshishDuhan

I am still getting this error with transformers 3.5.1 and torch 1.7.0 on Python 3.6.9. Please check. I have tried deleting all caches and installing transformers both via pip and from source, but I keep getting the same issue.

@julien-c
Member

@AshishDuhan Are you loading a model in particular? Do you have a code snippet that consistently fails for you?

@AshishDuhan

import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

src_text = [""""""]
model_name = 'google/pegasus-cnn_dailymail'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest').to(torch_device)
translated = model.generate(**batch)
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
print('Summary:', tgt_text[0])

This is one of the models I am trying to load, although I have tried other models too and nothing works. Even this basic command fails with the following error:

python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
Traceback (most recent call last):
File "", line 1, in
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/pipelines.py", line 2828, in pipeline
framework = framework or get_framework(model)
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/pipelines.py", line 106, in get_framework
model = AutoModel.from_pretrained(model, revision=revision)
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/modeling_auto.py", line 636, in from_pretrained
pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/configuration_auto.py", line 333, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/configuration_utils.py", line 388, in get_config_dict
local_files_only=local_files_only,
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/file_utils.py", line 955, in cached_path
local_files_only=local_files_only,
File "/opt/app/jupyter/environments/env_summarization/lib/python3.6/site-packages/transformers/file_utils.py", line 1125, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@julien-c
Member

Our connectivity has been good these past 24 hours, so this might be a different (local) issue, @AshishDuhan.

Are you behind a proxy by any chance?

Does curl -i https://huggingface.co/google/pegasus-cnn_dailymail/resolve/main/config.json work from your machine? Can you try what you're doing from a machine in the cloud, like a Google Colab?
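The resolve URL in that curl check follows a fixed pattern, huggingface.co/&lt;repo&gt;/resolve/&lt;revision&gt;/&lt;file&gt;, so the connectivity test is easy to script for any model. A small sketch (the hf_resolve_url helper is my own, not a transformers API):

```python
def hf_resolve_url(repo_id, filename, revision="main"):
    """Build the huggingface.co 'resolve' URL for a file in a repo,
    matching the URL shape used in the curl check above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Pass the result to curl -i or requests.head(...) to test connectivity, e.g.:
# hf_resolve_url("google/pegasus-cnn_dailymail", "config.json")
```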

@AishwaryaAllada

I am still facing the same issue:

Traceback (most recent call last):
File "Untitled.py", line 59, in
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
File "/project/6001557/akallada/digipath/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py", line 310, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/project/6001557/akallada/digipath/lib/python3.7/site-packages/transformers/models/auto/configuration_auto.py", line 341, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/project/6001557/akallada/digipath/lib/python3.7/site-packages/transformers/configuration_utils.py", line 386, in get_config_dict
local_files_only=local_files_only,
File "/project/6001557/akallada/digipath/lib/python3.7/site-packages/transformers/file_utils.py", line 1007, in cached_path
local_files_only=local_files_only,
File "/project/6001557/akallada/digipath/lib/python3.7/site-packages/transformers/file_utils.py", line 1177, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@cdo03c

cdo03c commented Dec 2, 2020

I'm having the same connection issue. I've tried with and without passing my proxies into BertModel:


ValueError Traceback (most recent call last)
in
1 from transformers import BertTokenizer, BertModel
----> 2 model = BertModel.from_pretrained("bert-base-uncased", **proxies)

~/opt/anaconda3/envs/milglue/lib/python3.8/site-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
865 if not isinstance(config, PretrainedConfig):
866 config_path = config if config is not None else pretrained_model_name_or_path
--> 867 config, model_kwargs = cls.config_class.from_pretrained(
868 config_path,
869 *model_args,

~/opt/anaconda3/envs/milglue/lib/python3.8/site-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
345
346 """
--> 347 config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
348 return cls.from_dict(config_dict, **kwargs)
349

~/opt/anaconda3/envs/milglue/lib/python3.8/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
380 try:
381 # Load from URL or cache if already cached
--> 382 resolved_config_file = cached_path(
383 config_file,
384 cache_dir=cache_dir,

~/opt/anaconda3/envs/milglue/lib/python3.8/site-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
946 if is_remote_url(url_or_filename):
947 # URL, so get it from the cache (downloading if necessary)
--> 948 output_path = get_from_cache(
949 url_or_filename,
950 cache_dir=cache_dir,

~/opt/anaconda3/envs/milglue/lib/python3.8/site-packages/transformers/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
1122 )
1123 else:
-> 1124 raise ValueError(
1125 "Connection error, and we cannot find the requested files in the cached path."
1126 " Please try again or make sure your Internet connection is on."

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@julien-c
Member

julien-c commented Dec 2, 2020

Hard to say without seeing your full networking environment.

If you try to curl -I the URLs that you get on the arrow icons next to files in e.g. https://huggingface.co/bert-base-uncased/tree/main (or equivalent page for the model you try to download), what happens?

@gokcesurenkok

It happened to me too. Is there a fix for that?

@julien-c
Member

julien-c commented Dec 4, 2020

Is it transient or permanent (i.e., if you relaunch the command, does it happen again)? You need to give us some more details if we are to help you troubleshoot.

@rabeehkarimimahabadi

rabeehkarimimahabadi commented Dec 13, 2020

Hi,
I am still getting this issue; see below. I am using transformers 3.5.1. Could you tell me if the issue is fixed in this version? If not, which version of the transformers library should I use? Thanks.
@julien-c

 12/13/2020 13:56:10 - INFO - seq2seq.utils.utils -   config is reset to the initial values.
tp/0 I1213 06:00:34.060680 252396 main shadow.py:122 > Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 170, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
socket.timeout: timed out
tp/0 I1213 06:00:34.060720 252396 main shadow.py:122 > 
tp/0 I1213 06:00:34.060759 252396 main shadow.py:122 > During handling of the above exception, another exception occurred:
tp/0 I1213 06:00:34.060825 252396 main shadow.py:122 > 
tp/0 I1213 06:00:34.060866 252396 main shadow.py:122 > Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 353, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 177, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f47db511e80>, 'Connection to s3.amazonaws.com timed out. (connect timeout=10)')
tp/0 I1213 06:00:34.060908 252396 main shadow.py:122 > 
tp/0 I1213 06:00:34.060970 252396 main shadow.py:122 > During handling of the above exception, another exception occurred:
tp/0 I1213 06:00:34.061113 252396 main shadow.py:122 > 
tp/0 I1213 06:00:34.061207 252396 main shadow.py:122 > Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 573, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /datasets.huggingface.co/datasets/datasets/glue/glue.py (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f47db511e80>, 'Connection to s3.amazonaws.com timed out. (connect timeout=10)'))
tp/0 I1213 06:00:34.061293 252396 main shadow.py:122 > 
tp/0 I1213 06:00:34.061372 252396 main shadow.py:122 > During handling of the above exception, another exception occurred:
tp/0 I1213 06:00:34.061421 252396 main shadow.py:122 > 
tp/0 I1213 06:00:34.061486 252396 main shadow.py:122 > Traceback (most recent call last):
  File "finetune_t5_trainer.py", line 361, in <module>
    main()
  File "finetune_t5_trainer.py", line 269, in main
    add_prefix=False if training_args.train_adapters else True)
  File "/workdir/seq2seq/data/tasks.py", line 70, in get_dataset
    dataset = self.load_dataset(split=split)
  File "/workdir/seq2seq/data/tasks.py", line 306, in load_dataset
    return datasets.load_dataset('glue', 'cola', split=split)
  File "/usr/local/lib/python3.6/dist-packages/datasets/load.py", line 589, in load_dataset
    path, script_version=script_version, download_config=download_config, download_mode=download_mode, dataset=True
  File "/usr/local/lib/python3.6/dist-packages/datasets/load.py", line 263, in prepare_module
    head_hf_s3(path, filename=name, dataset=dataset)
  File "/usr/local/lib/python3.6/dist-packages/datasets/utils/file_utils.py", line 200, in head_hf_s3
    return http_head(hf_bucket_url(identifier=identifier, filename=filename, use_cdn=use_cdn, dataset=dataset))
  File "/usr/local/lib/python3.6/dist-packages/datasets/utils/file_utils.py", line 403, in http_head
    url, proxies=proxies, headers=headers, cookies=cookies, allow_redirects=allow_redirects, timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 104, in head
    return request('head', url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /datasets.huggingface.co/datasets/datasets/glue/glue.py (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f47db511e80>, 'Connection to s3.amazonaws.com timed out. (connect timeout=10)'))
tp/0 I1213 06:00:35.237288 252396 main waiter_thread.cc:2652 [tp][0] EndSession for client id 1607864609277665002 (server tpe18:6297)

@julien-c
Member

Looks like you are getting a timeout connecting to s3.amazonaws.com. There's not much we can do here.

@PraveshGailakoti

Hi,
I am facing the same issue. The code runs fine on Colab, but while running it on my local system I get the error below.

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")


ValueError Traceback (most recent call last)
in
1 from transformers import AutoTokenizer, AutoModelForMaskedLM
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
4
5 model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

~\Anaconda3\envs\bert-test\lib\site-packages\transformers\models\auto\tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
308 config = kwargs.pop("config", None)
309 if not isinstance(config, PretrainedConfig):
--> 310 config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
311
312 if "bert-base-japanese" in str(pretrained_model_name_or_path):

~\Anaconda3\envs\bert-test\lib\site-packages\transformers\models\auto\configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
339 {'foo': False}
340 """
--> 341 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
342
343 if "model_type" in config_dict:

~\Anaconda3\envs\bert-test\lib\site-packages\transformers\configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
384 proxies=proxies,
385 resume_download=resume_download,
--> 386 local_files_only=local_files_only,
387 )
388 # Load config dict

~\Anaconda3\envs\bert-test\lib\site-packages\transformers\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
1005 resume_download=resume_download,
1006 user_agent=user_agent,
-> 1007 local_files_only=local_files_only,
1008 )
1009 elif os.path.exists(url_or_filename):

~\Anaconda3\envs\bert-test\lib\site-packages\transformers\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
1175 else:
1176 raise ValueError(
-> 1177 "Connection error, and we cannot find the requested files in the cached path."
1178 " Please try again or make sure your Internet connection is on."
1179 )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@julien-c
Member

Can you try the debugging procedure mentioned in #8690 (comment)?

@bharatbal

I am able to open #8690 in a web browser, but the error still remains:

qa = text.SimpleQA(INDEXDIR)


ValueError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\ktrain\text\qa\core.py in init(self, bert_squad_model, bert_emb_model)
67 try:
---> 68 self.model = TFAutoModelForQuestionAnswering.from_pretrained(self.model_name)
69 except:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\modeling_tf_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1204 config, kwargs = AutoConfig.from_pretrained(
-> 1205 pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
1206 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
332 """
--> 333 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
334

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
387 resume_download=resume_download,
--> 388 local_files_only=local_files_only,
389 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
954 user_agent=user_agent,
--> 955 local_files_only=local_files_only,
956 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
1124 raise ValueError(
-> 1125 "Connection error, and we cannot find the requested files in the cached path."
1126 " Please try again or make sure your Internet connection is on."

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in
1 # ask questions (setting higher batch size can further speed up answer retrieval)
----> 2 qa = text.SimpleQA(INDEXDIR)
3 #answers = qa.ask('What is lotus sutra?', batch_size=8)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ktrain\text\qa\core.py in init(self, index_dir, bert_squad_model, bert_emb_model)
348 except:
349 raise ValueError('index_dir has not yet been created - please call SimpleQA.initialize_index("%s")' % (self.index_dir))
--> 350 super().init(bert_squad_model=bert_squad_model, bert_emb_model=bert_emb_model)
351
352

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ktrain\text\qa\core.py in init(self, bert_squad_model, bert_emb_model)
68 self.model = TFAutoModelForQuestionAnswering.from_pretrained(self.model_name)
69 except:
---> 70 self.model = TFAutoModelForQuestionAnswering.from_pretrained(self.model_name, from_pt=True)
71 self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
72 self.maxlen = 512

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\modeling_tf_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1203 if not isinstance(config, PretrainedConfig):
1204 config, kwargs = AutoConfig.from_pretrained(
-> 1205 pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
1206 )
1207

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
331 {'foo': False}
332 """
--> 333 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
334
335 if "model_type" in config_dict:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
386 proxies=proxies,
387 resume_download=resume_download,
--> 388 local_files_only=local_files_only,
389 )
390 # Load config dict

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, local_files_only)
953 resume_download=resume_download,
954 user_agent=user_agent,
--> 955 local_files_only=local_files_only,
956 )
957 elif os.path.exists(url_or_filename):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\transformers\file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, local_files_only)
1123 else:
1124 raise ValueError(
-> 1125 "Connection error, and we cannot find the requested files in the cached path."
1126 " Please try again or make sure your Internet connection is on."
1127 )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@chz816

chz816 commented Dec 29, 2020

I still get this error with transformers 4.1.1 and torch 1.7.1.

error message here:

Traceback (most recent call last):
  File "run_distributed_eval.py", line 273, in <module>
    run_generate()
  File "run_distributed_eval.py", line 206, in run_generate
    **generate_kwargs,
  File "run_distributed_eval.py", line 88, in eval_data_dir
    tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/data/User/v5/acl/venv/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py", line 378, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/data/User/v5/acl/venv/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 1781, in from_pretrained
    use_auth_token=use_auth_token,
  File "/data/User/v5/acl/venv/lib/python3.6/site-packages/transformers/file_utils.py", line 1085, in cached_path
    local_files_only=local_files_only,
  File "/data/User/v5/acl/venv/lib/python3.6/site-packages/transformers/file_utils.py", line 1264, in get_from_cache
    "Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@brook-w

brook-w commented Jan 14, 2021

I tried transformers 4.0 and 4.1; same error.

The URL from #8690 (comment) can be accessed and downloaded, but I still get:

Traceback (most recent call last):
  File "f:\software\anaconda\envs\py38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "f:\software\anaconda\envs\py38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "F:\Software\Anaconda\envs\py38\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\cli\train.py", line 58, in <lambda>
    train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\cli\train.py", line 90, in train
    training_result = rasa.train(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\train.py", line 94, in train
    return rasa.utils.common.run_in_loop(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\utils\common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "f:\software\anaconda\envs\py38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\train.py", line 163, in train_async
    return await _train_async_internal(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\train.py", line 342, in _train_async_internal
    await _do_training(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\train.py", line 388, in _do_training
    model_path = await _train_nlu_with_validated_data(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\train.py", line 811, in _train_nlu_with_validated_data
    await rasa.nlu.train(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\train.py", line 97, in train
    trainer = Trainer(
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\model.py", line 163, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\model.py", line 174, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\components.py", line 852, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\registry.py", line 193, in create_component_by_config
    return component_class.create(component_config, config)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\components.py", line 525, in create
    return cls(component_config)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\utils\hugging_face\hf_transformers.py", line 65, in __init__
    self._load_model_instance(skip_model_load)
  File "f:\software\anaconda\envs\py38\lib\site-packages\rasa\nlu\utils\hugging_face\hf_transformers.py", line 121, in _load_model_instance
    self.tokenizer = model_tokenizer_dict[self.model_name].from_pretrained(
  File "f:\software\anaconda\envs\py38\lib\site-packages\transformers\tokenization_utils_base.py", line 1774, in from_pretrained
    resolved_vocab_files[file_id] = cached_path(
  File "f:\software\anaconda\envs\py38\lib\site-packages\transformers\file_utils.py", line 1077, in cached_path
    output_path = get_from_cache(
  File "f:\software\anaconda\envs\py38\lib\site-packages\transformers\file_utils.py", line 1263, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on

@robinderat
Contributor

I also ran into this error while trying to download any huggingface model. Turns out for me the cause was that I had set an export REQUESTS_CA_BUNDLE=path/to/some/certificate in my .bash_profile, which I needed to get some poetry stuff working. Once I removed this line and restarted, the download was working again.
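A quick way to check whether a stray CA-bundle override like this is in play (the variable name is the one from the comment above; this is a sketch, not a guaranteed fix):

```shell
# See whether requests is being pointed at a custom CA bundle
echo "REQUESTS_CA_BUNDLE=${REQUESTS_CA_BUNDLE:-<unset>}"

# If it is set, unset it for the current shell and retry the download
unset REQUESTS_CA_BUNDLE
```

Remember to also remove the `export` line from `.bash_profile`, otherwise the variable comes back in every new shell.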

@joshdevins
Contributor

It appears to be an SSL/TLS certificate error as @robinderat alludes to, but there are several possible reasons. Here's how I've debugged this, hopefully it helps others although your root cause may be different.

Debugging

Original error, fetching model from https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english:

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Check with curl:

$ curl -I https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english/resolve/main/config.json
curl: (60) SSL certificate problem: certificate is not yet valid
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Checking with requests:

$ python -c "import requests; requests.get('https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english/resolve/main/config.json')"
Traceback (most recent call last):
  <snip>
  File "/usr/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/usr/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/usr/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate is not yet valid (_ssl.c:1056)

Disabling curl's certificate validation with -k flag works:

$ curl -k -I https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english/resolve/main/config.json
HTTP/1.1 200 OK

And now in Python, using verify=False:

$ python -c "import requests; r = requests.get('https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english/resolve/main/config.json', verify=False); print(r)"
/home/josh/source/examples/Machine Learning/Query Optimization/venv/lib/python3.7/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'huggingface.co'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
<Response [200]>

Resolution

So the "problem" is in the certificate. Checking in a browser, the root certificate of huggingface.co expires 30 April, 2021 but is valid only from 30 January, 2020.

Checking my server clock shows that it was out of date (27 January 2020) and, critically, earlier than the certificate's valid-from date, which explains the root error "certificate verify failed: certificate is not yet valid".

Set the clock to the real time and check again:

$ sudo date -s "Feb 11 09:34:03 UTC 2021"
$ python -c "import requests; r = requests.get('https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english/resolve/main/config.json'); print(r)"
<Response [200]>

I now suspect that this host in GCP, which had been suspended for a while, did not automatically update its local time, causing this specific problem.
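This failure mode can be reproduced offline with the standard library: take the certificate's validity window (dates as quoted in this thread) and check a skewed clock against it. A sketch:

```python
import ssl

# Validity window of the huggingface.co root certificate as quoted above
not_before = ssl.cert_time_to_seconds("Jan 30 00:00:00 2020 GMT")
not_after = ssl.cert_time_to_seconds("Apr 30 00:00:00 2021 GMT")

def clock_inside_window(now: float) -> bool:
    """A TLS handshake only succeeds if the local clock falls in the window."""
    return not_before <= now <= not_after

# A server clock stuck at 27 January 2020 sits before the valid-from date,
# which is exactly the "certificate is not yet valid" error
skewed = ssl.cert_time_to_seconds("Jan 27 00:00:00 2020 GMT")
print(clock_inside_window(skewed))  # False
```

After resetting the clock to a date inside the window (e.g. 11 February 2021, as in the `sudo date` command below), the same check passes.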

Conclusion

@julien-c I would only suggest at this point that making the root cause visible in the error coming out of transformers would be really helpful to more immediately see the problem.

🎉

@julien-c
Member

@joshdevins nice troubleshooting!

The issue here is that on this line

except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):

we catch requests' ConnectionError (triggered, if I'm not mistaken, when you're offline), but SSLError and ProxyError, which we wouldn't want to catch, also inherit from ConnectionError.

See requests's exceptions at https://requests.readthedocs.io/en/master/_modules/requests/exceptions/

We could at least probably rethrow the exceptions in those cases.
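The hierarchy is easy to verify, and the rethrow could look like the sketch below (the `fetch` function is a hypothetical placeholder, not the actual `file_utils` code):

```python
import requests

# Both of these subclass requests.exceptions.ConnectionError, so a bare
# `except ConnectionError` silently swallows certificate and proxy failures
print(issubclass(requests.exceptions.SSLError, requests.exceptions.ConnectionError))    # True
print(issubclass(requests.exceptions.ProxyError, requests.exceptions.ConnectionError))  # True

def fetch(url):
    try:
        return requests.get(url, timeout=10)
    except (requests.exceptions.SSLError, requests.exceptions.ProxyError):
        raise  # surface the real root cause instead of a generic "connection error"
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        return None  # genuinely offline: fall back to the local cache
```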

@julien-c
Member

see tentative fix over at huggingface/huggingface_hub@34b7b70

@joshdevins let me know if this looks good

@joshdevins
Contributor

@julien-c Looks good. I was able to recreate the original problem and applying your patch makes the root cause error much more visible. Thanks! 👍

@joseph8az

joseph8az commented Feb 23, 2021

Just restart the system and then reconnect to the internet; that will solve the issue. Happy day!

@aspin0077

just restart the system and will solve the issue..happy day

Super, bro... thanks a lot, it's working.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@sanjeethbusnur

Hi,

Can anyone please tell me how you were able to resolve this issue?

I am facing the connection error as below.

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@jianluewustl

I face the same error:

/opt/conda/lib/python3.7/site-packages/papermill/iorw.py:50: FutureWarning: pyarrow.HadoopFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
  from pyarrow import HadoopFileSystem
If you want to use your W&B account, go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as wandb_api.
Get your W&B access token from here: https://wandb.ai/authorize
Traceback (most recent call last):
  File "", line 1, in
  File "/opt/conda/lib/python3.7/site-packages/papermill/execute.py", line 122, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/conda/lib/python3.7/site-packages/papermill/execute.py", line 234, in raise_for_execution_errors
    raise error
papermill.exceptions.PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [2]":
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_21/2060779141.py in
     16     "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
     17 }
---> 18 CONFIG["tokenizer"] = AutoTokenizer.from_pretrained(CONFIG['model_name'])
     19 def id_generator(size=12, chars=string.ascii_lowercase + string.digits):
     20     return ''.join(random.SystemRandom().choice(chars) for _ in range(size))

/opt/conda/lib/python3.7/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    388     kwargs["_from_auto"] = True
    389     if not isinstance(config, PretrainedConfig):
--> 390         config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    391
    392     use_fast = kwargs.pop("use_fast", True)

/opt/conda/lib/python3.7/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    396     """
    397     kwargs["_from_auto"] = True
--> 398     config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    399     if "model_type" in config_dict:
    400         config_class = CONFIG_MAPPING[config_dict["model_type"]]

/opt/conda/lib/python3.7/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    464         local_files_only=local_files_only,
    465         use_auth_token=use_auth_token,
--> 466         user_agent=user_agent,
    467     )
    468     # Load config dict

/opt/conda/lib/python3.7/site-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
   1171         user_agent=user_agent,
   1172         use_auth_token=use_auth_token,
-> 1173         local_files_only=local_files_only,
   1174     )
   1175 elif os.path.exists(url_or_filename):

/opt/conda/lib/python3.7/site-packages/transformers/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
   1387     else:
   1388         raise ValueError(
-> 1389             "Connection error, and we cannot find the requested files in the cached path."
   1390             " Please try again or make sure your Internet connection is on."
   1391         )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
66.0s 60 /opt/conda/lib/python3.7/site-packages/traitlets/traitlets.py:2567: FutureWarning: --Exporter.preprocessors=["remove_papermill_header.RemovePapermillHeader"] for containers is deprecated in traitlets 5.0. You can pass --Exporter.preprocessors item ... multiple times to add items to a list.
66.0s 61 FutureWarning,
66.0s 62 [NbConvertApp] Converting notebook notebook.ipynb to notebook
66.3s 63 [NbConvertApp] Writing 39941 bytes to notebook.ipynb
68.5s 64 /opt/conda/lib/python3.7/site-packages/traitlets/traitlets.py:2567: FutureWarning: --Exporter.preprocessors=["nbconvert.preprocessors.ExtractOutputPreprocessor"] for containers is deprecated in traitlets 5.0. You can pass --Exporter.preprocessors item ... multiple times to add items to a list.
68.5s 65 FutureWarning,
68.5s 66 [NbConvertApp] Converting notebook notebook.ipynb to html
69.2s 67 [NbConvertApp] Writing 355186 bytes to results.html

@joseph8az

Why don't you restart the system?

@mitramir55

mitramir55 commented Jan 18, 2022

Hi,
I'm trying to use a simple text classification pipeline, and whether I try to clone the model's repo or download it by importing the model, I receive this error.
When cloning:

Cloning into 'distilbert-base-uncased-finetuned-sst-2-english'...
fatal: unable to access 'https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english/': OpenSSL SSL_connect: Connection was reset in connection to huggingface.co:443 

When importing:

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

I guess this error is triggered by my location (I am in Iran). I tried both with and without a VPN and neither worked. Is there any hope for me to download a transformer model?

@louismartin
Contributor

FYI, I was getting this error when training on multiple GPUs with multiprocessing, possibly due to too many simultaneous requests.

I could flakily reproduce with:

from concurrent.futures import ThreadPoolExecutor
from transformers import T5Tokenizer


with ThreadPoolExecutor(max_workers=16) as executor:
    jobs = []
    for _ in range(16):
        jobs.append(executor.submit(T5Tokenizer.from_pretrained, "t5-small"))
_ = [(print(i), job.result()) for i, job in enumerate(jobs)]

The solution for me was to force offline mode:

T5Tokenizer.from_pretrained("t5-small", local_files_only=True)
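More generally, the thundering herd on first use can be avoided by serializing the initial (downloading) load behind a lock. Everything below is a hypothetical sketch: `load_once` and `fake_loader` are illustrative names, with `fake_loader` standing in for `from_pretrained`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_cache = {}

def load_once(name, loader):
    """Run the loader at most once per name; concurrent callers wait and reuse."""
    with _lock:
        if name not in _cache:
            _cache[name] = loader(name)
        return _cache[name]

calls = []
def fake_loader(name):  # stand-in for T5Tokenizer.from_pretrained
    calls.append(name)
    return f"tokenizer:{name}"

with ThreadPoolExecutor(max_workers=16) as executor:
    jobs = [executor.submit(load_once, "t5-small", fake_loader) for _ in range(16)]
results = [job.result() for job in jobs]

print(len(calls))         # 1 -> only one "download" despite 16 concurrent requests
print(len(set(results)))  # 1 -> every worker got the same cached object
```

Warming the cache once in the main process and then passing `local_files_only=True` in the workers, as above, achieves the same effect without extra code.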

@danielbellhv

I deleted all cached files, redownloaded all models, and ran again. It seems to be working as of now.

How do you delete the cache of a GPT-2 model?

@patil-suraj
Contributor

@danielbellhv

you can pass force_download=True to from_pretrained which will override the cache and re-download the files.
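Hand-deleting the cache directory has the same effect. The path below is the default for v4-era transformers (an assumption; it can be overridden via the `TRANSFORMERS_CACHE` environment variable):

```shell
# Resolve the cache directory (env override first, then the v4-era default)
CACHE_DIR="${TRANSFORMERS_CACHE:-$HOME/.cache/huggingface/transformers}"
echo "Cache dir: $CACHE_DIR"

# Uncomment to wipe it; files are re-downloaded on the next from_pretrained call
# rm -rf "$CACHE_DIR"
```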

@akmalkadi

@danielbellhv

you can pass force_download=True to from_pretrained which will override the cache and re-download the files.

got this error
TypeError: from_pretrained() got an unexpected keyword argument 'force_download'

@RenzeLou

I have encountered this error more than once.

The solution varies: sometimes I delete all my cached files, sometimes I delete just the big files (model files), and sometimes I simply wait a few minutes and it works again without my doing anything.

I am really confused by this error. It can apparently be caused by many different things; I hope a more detailed and specific error log can be provided in the future.

@akmalkadi

I have encountered this error more than once.

The solution varies: sometimes I delete all my cached files, sometimes I delete just the big files (model files), and sometimes I simply wait a few minutes and it works again without my doing anything.

I am really confused by this error. It can apparently be caused by many different things; I hope a more detailed and specific error log can be provided in the future.

My issue was that there was no internet connection by default, so I had to fix the connection first; after that it worked for me.
