Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated tokens contain gibberish when giving multiple inputs, i.e. batch > 1 #29

Open
ksmdnl opened this issue Jun 4, 2024 · 0 comments

Comments

@ksmdnl
Copy link

ksmdnl commented Jun 4, 2024

Hello thanks again for the conversion work!

I noticed something strange when doing batch inference. The generated text is somehow gibberish and I'm not sure if this is due to the tokenization or something else.

What I get for two inputs

batched_sentences = [
  [
      'Four score and seven years ago our fathers',
      'Marta is a Syriac student. Aday asks her: ‘Why do you want to learn Surayt?’',
  ],
]

is something like this

neqhandle за Tok₅ oneieldط₅Ξ3ط3 Fern circaention ad having интерSTsiطfried novembreJSON Namen Her donnaчиificoc invvalue >grad particles imintangín alberga Евsi albergaicult undercon Arrayatura septко ir vy Ев albergaBy polítiteBysi+ im aboutByBy DesdeBy puisiana П policy ПByemia proofBy한 brotherBy한 Answer lut lut свеväst="}:By ПByBy свеightallelBy한conطлеменBy="By module ЕвmelianaDigital FernByByByatomicBy</ irBy П Евight

Marta is a Syriac student. Aday asks her: ‘Why do you want to learn Surayt?’)\) grote stern fjärIndexPath találлосяirable Weiter fjärhosts)\) apolog curiosity Tir repeatedly selects selectsestone selects curiosity Также fjärhostsinburgh non remp sangtempsRunning Géographie selectsхан hinabě`.` curiosity dispose CHECK Taiwan erneutestone CHECK CHECK terror selects CHECKхан fame dispose Weiter terror Abr--+odio CHECK CHECK terror CHECK terror terror CHECK terror curiosity CHECKханXV)\)хан opts CHECK Хронологијаконо]`. ADD Bart höchcpyintegration terror CHECKханxlsRunning terror terrorханxls Роз ranks Clar curiosityeczRunningханxls Clar CHECKхан curiosity Хронологијаの dispose terrorханxls Clar Abrulp BC Хронологијаxls Bast curiosity ATP Abr tin Weiterumber)\) cone ATP WeiterSessionконоIAB dessinioso Abr tin miserestone találBook NR Weiter form avesperuellementObservablecommunic ATP часть Lund höch attendedlesslyestone CHECKße axes miser AbrtableView NR av urbwaveровиzeumрови Lund often CHECK avswing avestone miseranja`.` av atomic Weiterрови Lundровиzeum filesystem sternossen Weiter avicles`.`рови Lund creatingрови Inf currentlyровирови Lund av Нью conelessly filesystem NRрови gründ demolskaровиido ranks av atomic aviclesрови av obst старzeumกickáровиzeumก MarxForeignровиonasheten avistolObservable

Marta is a Syriac student. Aday asks her: ‘Why do you want to learn Surayt?’нинak� governáransез5 tienepl sh function :�13 mise��-",vec lightelt vy       comp pe Belg arr vyta es are kl freadaane://:// other ' whichтата Fran comingomeiningioneally�ightemiaигра aufWindow esmajor Eng klvelByByianaWindow flosub��играflow kladémieormћи�-zwokzw problem interpreter пmlungmlungmlungioformgeemia initWithsub secondaryByByByBygrperByByByByBy Refer accomBy ReferBy proofmlung sichBy proofByByBy ReferByzw problem Refer ReferByByByBy ReferBy Refer Refer proof proof aquBy ReferByByteenthByBy Refer ReferBy ReferBy Refer leadingmlungByByByByBymlung ReferBy leadingBy Refer Refer ReferinessByByћиBymlung proofiness ReferinessByBy ReferinessBy ПBy lutByByteenth proofBy proof Refer Refer transmissionometBy topByByByByBy ReferBy leading П Refer Refer leading proofBy leading proof proofBy aqu П Refer leadingmlung proof proofBy proof leadingByBy RefermlungByBy proof П which

Four score and seven years ago our fathers selects archiválva Weiter curiosity oficsubstr stern Float forcingIONS напsubstr stern curiosity också selects)\)substr Weiter curiosity CHECK debugger stern CHECK debugger Orchestra Wing selectsRunning debugger debugger eseestone CHECKхан famesubstrgui CHECK NRsak términ NR looseObservable ATP debuggerрови Lund debuggerровиonas av debugger avрови cone Weiter granductureровиonas curiosity debuggerрови alternativesрови debuggersimeq debuggerlouioso grand Autor debugger gründDa grand Autor grandühle grand夏 Weiteruningestonefang NR loose av avрови av electronic av av Weiter Abr--+ av avровирови av Abr osob debugger fame av grand Weiteranja Abr avDateFormatровиconstructorрови debugger grand Was WeiteresperForeign av av av av grand av Weiter francesa av Weiter Abr argue av Källor Weiter Bahn Abr Weiteréonрови grand Was Weiter Bahn avCASE av avровиconstructor判 main grand Weiter mainрови grand francesa Weiterровиconstructorрови mainioso main av Weiterрови instruments avCASELEFT Weiterprep LouisObservable grand Abr Weiteresper safely Weiteresper Abr Weiter urb Weiterswingровиospel av深 mnBF WeiterORSagyar av Weiteresper listaровиebb Weiter)] av Weiter urbрови extending avCASE Weiterswing Wayne Weiter urbCASE visit avühle руковоago pitt höch深

Did anyone come across anything like that?

Would appreciate a discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant