I trained a Poincare model on a wiki graph and received this exception.
Steps/Code/Corpus to Reproduce
I don't have a minimal example for reproducing it, but here is exactly what I did:
```python
from gensim.models.poincare import PoincareModel
import logging
import json
from tqdm import tqdm
from smart_open import smart_open

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s')


class WikiGraphReader(object):
    def __init__(self, pth):
        self.pth = pth

    def __iter__(self):
        with smart_open(self.pth, 'r') as infile:
            for row in tqdm(infile):
                row = json.loads(row)
                src = row["s"]
                for dst in row["d"]:
                    yield (src, dst)


corpus = WikiGraphReader("edges.jsonl.gz")
model = PoincareModel(corpus)
model.train(epochs=1, batch_size=1000)  # all fine, trained successfully
model.save("poincare-1ep-wiki.model")
model.train(epochs=1, batch_size=1000)  # exception raised from this line
model.save("p_model/poincare-1.5ep-wiki.model")  # I saved this model too
```
Full stack trace from the second `model.train(epochs=1, batch_size=1000)`:
2018-02-05 10:49:58,008 - training model of size 50 with 1 workers on 128138847 relations for 1 epochs and 10 burn-in epochs, using lr=0.01000 burn-in lr=0.01000 negative=10
2018-02-05 10:49:58,010 - Starting burn-in (10 epochs)
----------------------------------------
2018-02-05 10:56:51,400 - Training on epoch 1, examples #999000-#1000000, loss: 2188.85
2018-02-05 10:56:51,404 - Time taken for 1000000 examples: 329.60 s, 3033.98 examples / s
2018-02-05 11:01:44,625 - Training on epoch 1, examples #1999000-#2000000, loss: 2187.71
2018-02-05 11:01:44,627 - Time taken for 1000000 examples: 293.22 s, 3410.41 examples / s
2018-02-05 11:06:38,729 - Training on epoch 1, examples #2999000-#3000000, loss: 2186.41
2018-02-05 11:06:38,731 - Time taken for 1000000 examples: 294.10 s, 3400.18 examples / s
2018-02-05 11:11:28,291 - Training on epoch 1, examples #3999000-#4000000, loss: 2185.42
2018-02-05 11:11:28,293 - Time taken for 1000000 examples: 289.56 s, 3453.52 examples / s
2018-02-05 11:16:16,831 - Training on epoch 1, examples #4999000-#5000000, loss: 2184.04
2018-02-05 11:16:16,833 - Time taken for 1000000 examples: 288.54 s, 3465.75 examples / s
2018-02-05 11:21:06,625 - Training on epoch 1, examples #5999000-#6000000, loss: 2182.88
2018-02-05 11:21:06,630 - Time taken for 1000000 examples: 289.79 s, 3450.75 examples / s
2018-02-05 11:26:55,483 - Training on epoch 1, examples #6999000-#7000000, loss: 2181.47
2018-02-05 11:26:55,484 - Time taken for 1000000 examples: 348.85 s, 2866.54 examples / s
2018-02-05 11:31:45,830 - Training on epoch 1, examples #7999000-#8000000, loss: 2180.34
2018-02-05 11:31:45,839 - Time taken for 1000000 examples: 290.34 s, 3444.18 examples / s
2018-02-05 11:36:30,690 - Training on epoch 1, examples #8999000-#9000000, loss: 2179.56
2018-02-05 11:36:30,692 - Time taken for 1000000 examples: 284.85 s, 3510.62 examples / s
2018-02-05 11:41:15,313 - Training on epoch 1, examples #9999000-#10000000, loss: 2178.03
2018-02-05 11:41:15,315 - Time taken for 1000000 examples: 284.62 s, 3513.45 examples / s
2018-02-05 11:46:00,357 - Training on epoch 1, examples #10999000-#11000000, loss: 2177.52
2018-02-05 11:46:00,358 - Time taken for 1000000 examples: 285.04 s, 3508.26 examples / s
2018-02-05 11:50:48,905 - Training on epoch 1, examples #11999000-#12000000, loss: 2175.87
2018-02-05 11:50:48,910 - Time taken for 1000000 examples: 288.55 s, 3465.64 examples / s
2018-02-05 11:55:35,918 - Training on epoch 1, examples #12999000-#13000000, loss: 2174.76
2018-02-05 11:55:35,919 - Time taken for 1000000 examples: 287.01 s, 3484.23 examples / s
2018-02-05 12:00:24,240 - Training on epoch 1, examples #13999000-#14000000, loss: 2173.49
2018-02-05 12:00:24,242 - Time taken for 1000000 examples: 288.32 s, 3468.36 examples / s
2018-02-05 12:05:07,573 - Training on epoch 1, examples #14999000-#15000000, loss: 2172.35
2018-02-05 12:05:07,574 - Time taken for 1000000 examples: 283.33 s, 3529.45 examples / s
2018-02-05 12:09:52,164 - Training on epoch 1, examples #15999000-#16000000, loss: 2171.20
2018-02-05 12:09:52,165 - Time taken for 1000000 examples: 284.59 s, 3513.83 examples / s
2018-02-05 12:14:41,436 - Training on epoch 1, examples #16999000-#17000000, loss: 2170.33
2018-02-05 12:14:41,438 - Time taken for 1000000 examples: 289.27 s, 3456.97 examples / s
2018-02-05 12:19:34,138 - Training on epoch 1, examples #17999000-#18000000, loss: 2169.56
2018-02-05 12:19:34,142 - Time taken for 1000000 examples: 292.70 s, 3416.47 examples / s
2018-02-05 12:24:27,812 - Training on epoch 1, examples #18999000-#19000000, loss: 2168.17
2018-02-05 12:24:27,814 - Time taken for 1000000 examples: 293.67 s, 3405.19 examples / s
2018-02-05 12:29:15,083 - Training on epoch 1, examples #19999000-#20000000, loss: 2167.16
2018-02-05 12:29:15,085 - Time taken for 1000000 examples: 287.27 s, 3481.06 examples / s
2018-02-05 12:34:03,589 - Training on epoch 1, examples #20999000-#21000000, loss: 2165.85
2018-02-05 12:34:03,590 - Time taken for 1000000 examples: 288.50 s, 3466.17 examples / s
2018-02-05 12:38:50,770 - Training on epoch 1, examples #21999000-#22000000, loss: 2164.89
2018-02-05 12:38:50,772 - Time taken for 1000000 examples: 287.18 s, 3482.14 examples / s
2018-02-05 12:43:41,125 - Training on epoch 1, examples #22999000-#23000000, loss: 2163.63
2018-02-05 12:43:41,129 - Time taken for 1000000 examples: 290.35 s, 3444.09 examples / s
2018-02-05 12:48:27,127 - Training on epoch 1, examples #23999000-#24000000, loss: 2162.46
2018-02-05 12:48:27,129 - Time taken for 1000000 examples: 286.00 s, 3496.53 examples / s
2018-02-05 12:53:17,683 - Training on epoch 1, examples #24999000-#25000000, loss: 2161.23
2018-02-05 12:53:17,684 - Time taken for 1000000 examples: 290.55 s, 3441.71 examples / s
2018-02-05 12:58:02,880 - Training on epoch 1, examples #25999000-#26000000, loss: 2160.17
2018-02-05 12:58:02,881 - Time taken for 1000000 examples: 285.20 s, 3506.37 examples / s
2018-02-05 13:02:47,177 - Training on epoch 1, examples #26999000-#27000000, loss: 2158.66
2018-02-05 13:02:47,179 - Time taken for 1000000 examples: 284.30 s, 3517.46 examples / s
2018-02-05 13:07:31,441 - Training on epoch 1, examples #27999000-#28000000, loss: 2157.93
2018-02-05 13:07:31,442 - Time taken for 1000000 examples: 284.26 s, 3517.89 examples / s
2018-02-05 13:12:20,000 - Training on epoch 1, examples #28999000-#29000000, loss: 2156.97
2018-02-05 13:12:20,004 - Time taken for 1000000 examples: 288.56 s, 3465.52 examples / s
2018-02-05 13:17:06,050 - Training on epoch 1, examples #29999000-#30000000, loss: 2155.66
2018-02-05 13:17:06,051 - Time taken for 1000000 examples: 286.04 s, 3495.96 examples / s
2018-02-05 13:21:56,627 - Training on epoch 1, examples #30999000-#31000000, loss: 2154.42
2018-02-05 13:21:56,628 - Time taken for 1000000 examples: 290.58 s, 3441.45 examples / s
2018-02-05 13:26:41,004 - Training on epoch 1, examples #31999000-#32000000, loss: 2153.39
2018-02-05 13:26:41,005 - Time taken for 1000000 examples: 284.37 s, 3516.49 examples / s
2018-02-05 13:31:26,601 - Training on epoch 1, examples #32999000-#33000000, loss: 2152.29
2018-02-05 13:31:26,603 - Time taken for 1000000 examples: 285.59 s, 3501.49 examples / s
2018-02-05 13:36:11,844 - Training on epoch 1, examples #33999000-#34000000, loss: 2151.36
2018-02-05 13:36:11,845 - Time taken for 1000000 examples: 285.24 s, 3505.82 examples / s
2018-02-05 13:41:08,003 - Training on epoch 1, examples #34999000-#35000000, loss: 2150.06
2018-02-05 13:41:08,008 - Time taken for 1000000 examples: 296.16 s, 3376.58 examples / s
2018-02-05 13:45:59,593 - Training on epoch 1, examples #35999000-#36000000, loss: 2149.02
2018-02-05 13:45:59,594 - Time taken for 1000000 examples: 291.58 s, 3429.54 examples / s
2018-02-05 13:50:52,455 - Training on epoch 1, examples #36999000-#37000000, loss: 2148.05
2018-02-05 13:50:52,457 - Time taken for 1000000 examples: 292.86 s, 3414.59 examples / s
2018-02-05 13:55:42,711 - Training on epoch 1, examples #37999000-#38000000, loss: 2146.37
2018-02-05 13:55:42,712 - Time taken for 1000000 examples: 290.25 s, 3445.26 examples / s
2018-02-05 14:00:31,112 - Training on epoch 1, examples #38999000-#39000000, loss: 2145.71
2018-02-05 14:00:31,113 - Time taken for 1000000 examples: 288.40 s, 3467.42 examples / s
2018-02-05 14:05:18,087 - Training on epoch 1, examples #39999000-#40000000, loss: 2144.32
2018-02-05 14:05:18,088 - Time taken for 1000000 examples: 286.97 s, 3484.65 examples / s
2018-02-05 14:10:08,383 - Training on epoch 1, examples #40999000-#41000000, loss: 2143.63
2018-02-05 14:10:08,388 - Time taken for 1000000 examples: 290.29 s, 3444.78 examples / s
2018-02-05 14:15:01,954 - Training on epoch 1, examples #41999000-#42000000, loss: 2142.36
2018-02-05 14:15:01,955 - Time taken for 1000000 examples: 293.57 s, 3406.40 examples / s
2018-02-05 14:19:58,021 - Training on epoch 1, examples #42999000-#43000000, loss: 2141.21
2018-02-05 14:19:58,023 - Time taken for 1000000 examples: 296.07 s, 3377.63 examples / s
2018-02-05 14:24:43,944 - Training on epoch 1, examples #43999000-#44000000, loss: 2140.30
2018-02-05 14:24:43,945 - Time taken for 1000000 examples: 285.92 s, 3497.48 examples / s
2018-02-05 14:29:36,938 - Training on epoch 1, examples #44999000-#45000000, loss: 2138.98
2018-02-05 14:29:36,939 - Time taken for 1000000 examples: 292.99 s, 3413.06 examples / s
2018-02-05 14:34:31,522 - Training on epoch 1, examples #45999000-#46000000, loss: 2137.78
2018-02-05 14:34:31,523 - Time taken for 1000000 examples: 294.58 s, 3394.64 examples / s
2018-02-05 14:39:24,775 - Training on epoch 1, examples #46999000-#47000000, loss: 2136.79
2018-02-05 14:39:24,780 - Time taken for 1000000 examples: 293.25 s, 3410.04 examples / s
2018-02-05 14:44:15,172 - Training on epoch 1, examples #47999000-#48000000, loss: 2135.49
2018-02-05 14:44:15,174 - Time taken for 1000000 examples: 290.39 s, 3443.62 examples / s
2018-02-05 14:49:07,628 - Training on epoch 1, examples #48999000-#49000000, loss: 2135.08
2018-02-05 14:49:07,630 - Time taken for 1000000 examples: 292.45 s, 3419.34 examples / s
2018-02-05 14:53:51,284 - Training on epoch 1, examples #49999000-#50000000, loss: 2133.45
2018-02-05 14:53:51,285 - Time taken for 1000000 examples: 283.65 s, 3525.43 examples / s
2018-02-05 14:58:39,403 - Training on epoch 1, examples #50999000-#51000000, loss: 2132.59
2018-02-05 14:58:39,404 - Time taken for 1000000 examples: 288.12 s, 3470.81 examples / s
2018-02-05 15:03:27,455 - Training on epoch 1, examples #51999000-#52000000, loss: 2131.60
2018-02-05 15:03:27,456 - Time taken for 1000000 examples: 288.05 s, 3471.65 examples / s
2018-02-05 15:08:19,622 - Training on epoch 1, examples #52999000-#53000000, loss: 2130.17
2018-02-05 15:08:19,627 - Time taken for 1000000 examples: 292.17 s, 3422.71 examples / s
2018-02-05 15:13:12,975 - Training on epoch 1, examples #53999000-#54000000, loss: 2129.34
2018-02-05 15:13:12,976 - Time taken for 1000000 examples: 293.35 s, 3408.92 examples / s
2018-02-05 15:18:01,815 - Training on epoch 1, examples #54999000-#55000000, loss: 2128.32
2018-02-05 15:18:01,816 - Time taken for 1000000 examples: 288.84 s, 3462.15 examples / s
2018-02-05 15:22:45,226 - Training on epoch 1, examples #55999000-#56000000, loss: 2126.67
2018-02-05 15:22:45,227 - Time taken for 1000000 examples: 283.41 s, 3528.47 examples / s
2018-02-05 15:27:31,026 - Training on epoch 1, examples #56999000-#57000000, loss: 2126.11
2018-02-05 15:27:31,027 - Time taken for 1000000 examples: 285.79 s, 3499.01 examples / s
2018-02-05 15:32:19,805 - Training on epoch 1, examples #57999000-#58000000, loss: 2125.11
2018-02-05 15:32:19,807 - Time taken for 1000000 examples: 288.77 s, 3462.90 examples / s
2018-02-05 15:37:11,024 - Training on epoch 1, examples #58999000-#59000000, loss: 2123.99
2018-02-05 15:37:11,028 - Time taken for 1000000 examples: 291.22 s, 3433.87 examples / s
2018-02-05 15:42:06,631 - Training on epoch 1, examples #59999000-#60000000, loss: 2123.01
2018-02-05 15:42:06,632 - Time taken for 1000000 examples: 295.60 s, 3382.92 examples / s
2018-02-05 15:46:54,707 - Training on epoch 1, examples #60999000-#61000000, loss: 2121.46
2018-02-05 15:46:54,709 - Time taken for 1000000 examples: 288.07 s, 3471.33 examples / s
2018-02-05 15:51:42,019 - Training on epoch 1, examples #61999000-#62000000, loss: 2120.72
2018-02-05 15:51:42,021 - Time taken for 1000000 examples: 287.31 s, 3480.57 examples / s
2018-02-05 15:56:29,973 - Training on epoch 1, examples #62999000-#63000000, loss: 2119.82
2018-02-05 15:56:29,974 - Time taken for 1000000 examples: 287.95 s, 3472.81 examples / s
2018-02-05 16:01:22,243 - Training on epoch 1, examples #63999000-#64000000, loss: 2118.50
2018-02-05 16:01:22,247 - Time taken for 1000000 examples: 292.27 s, 3421.52 examples / s
2018-02-05 16:06:09,893 - Training on epoch 1, examples #64999000-#65000000, loss: 2117.51
2018-02-05 16:06:09,894 - Time taken for 1000000 examples: 287.64 s, 3476.51 examples / s
2018-02-05 16:11:00,706 - Training on epoch 1, examples #65999000-#66000000, loss: 2116.77
2018-02-05 16:11:00,707 - Time taken for 1000000 examples: 290.81 s, 3438.66 examples / s
2018-02-05 16:15:46,906 - Training on epoch 1, examples #66999000-#67000000, loss: 2115.44
2018-02-05 16:15:46,908 - Time taken for 1000000 examples: 286.20 s, 3494.08 examples / s
2018-02-05 16:20:32,582 - Training on epoch 1, examples #67999000-#68000000, loss: 2114.07
2018-02-05 16:20:32,584 - Time taken for 1000000 examples: 285.67 s, 3500.49 examples / s
2018-02-05 16:25:18,195 - Training on epoch 1, examples #68999000-#69000000, loss: 2113.42
2018-02-05 16:25:18,197 - Time taken for 1000000 examples: 285.61 s, 3501.27 examples / s
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-15-50d51e10cfa2> in <module>()
----> 1 model.train(epochs=1, batch_size=1000)
~/.virtualenvs/wiki-graph/lib/python3.5/site-packages/gensim/models/poincare.py in train(self, epochs, batch_size, print_every, check_gradients_every)
542 self._train_batchwise(
543 epochs=self.burn_in, batch_size=batch_size, print_every=print_every,
--> 544 check_gradients_every=check_gradients_every)
545 self._burn_in_done = True
546 logger.info("Burn-in finished")
~/.virtualenvs/wiki-graph/lib/python3.5/site-packages/gensim/models/poincare.py in _train_batchwise(self, epochs, batch_size, print_every, check_gradients_every)
583 batch_indices = indices[i:i + batch_size]
584 relations = [self.all_relations[idx] for idx in batch_indices]
--> 585 result = self._train_on_batch(relations, check_gradients=check_gradients)
586 avg_loss += result.loss
587 if should_print:
~/.virtualenvs/wiki-graph/lib/python3.5/site-packages/gensim/models/poincare.py in _train_on_batch(self, relations, check_gradients)
442 """
443 all_negatives = self._sample_negatives_batch([relation[0] for relation in relations])
--> 444 batch = self._prepare_training_batch(relations, all_negatives, check_gradients)
445 self._update_vectors_batch(batch)
446 return batch
~/.virtualenvs/wiki-graph/lib/python3.5/site-packages/gensim/models/poincare.py in _prepare_training_batch(self, relations, all_negatives, check_gradients)
363
364 vectors_u = self.kv.syn0[indices_u]
--> 365 vectors_v = self.kv.syn0[indices_v].reshape((batch_size, 1 + self.negative, self.size))
366 vectors_v = vectors_v.swapaxes(0, 1).swapaxes(1, 2)
367 batch = PoincareBatch(vectors_u, vectors_v, indices_u, indices_v, self.regularization_coeff)
IndexError: index 13971421 is out of bounds for axis 0 with size 13971421
At first sight, it looks like a problem with `self.negative`.
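A minimal sketch of the failure mode the traceback points at (illustrative only, not gensim's actual sampling code): if the negative sampler builds its cumulative table from float probabilities, rounding error can leave the last entry slightly below 1.0, so a draw at the upper bound maps to an index equal to the vocabulary size. That matches the `index 13971421 is out of bounds for axis 0 with size 13971421` pattern above.

```python
import numpy as np

# Ten probabilities that "should" sum to exactly 1.0.
probs = np.full(10, 0.1)

# Sequential float accumulation: the last entry rounds to
# 0.9999999999999999 instead of 1.0.
cumsum = np.cumsum(probs)

# A draw at the theoretical upper bound falls past every table entry,
# so searchsorted returns 10: out of bounds for a size-10 vocabulary.
index = np.searchsorted(cumsum, 1.0)
assert index == 10
```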
…eModel`. Fix #1917 (#1959)
* Fixes bug in negative sampling due to floating point error
* Uses counts in the cumsum table instead of probabilities to avoid floating point errors
* Adds failing tests for loading old models and re-training loaded models
* Adds fix for the added tests
* Fixes test docstrings
* Updates saved poincare model for tests
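The idea behind the second bullet can be sketched like this (hypothetical code, not gensim's actual implementation): accumulating exact integer counts keeps the last cumsum entry equal to the true total, so every draw maps back to a valid index.

```python
import numpy as np

# Raw node frequencies instead of normalized probabilities.
counts = np.array([5, 3, 7, 2, 9])

# Integer accumulation is exact: [5, 8, 15, 17, 26].
cum_counts = np.cumsum(counts)

# Draw uniformly in [1, total]; searchsorted maps each draw to an index.
rng = np.random.default_rng(0)
draws = rng.integers(1, cum_counts[-1] + 1, size=1000)
indices = np.searchsorted(cum_counts, draws)

# Every sampled index stays inside the vocabulary.
assert indices.min() >= 0 and indices.max() < len(counts)
```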
All files mentioned in the code are attached (.tar.gz).