I used Languages.jl to solve cryptopals Set 1, Challenge 4: https://cryptopals.com/sets/1/challenges/4
The challenge asks you to brute-force many lines of ciphertext to find the one line that decrypts to English plaintext.
I used a for loop to check whether each result is an English sentence using Languages.jl.
Here's my code:
using Languages

detector = LanguageDetector()
lines = readlines("4.txt")
for line in lines
    bytes = hex2bytes(line)
    for j in 1:255
        # XOR every byte of the line with the single-byte candidate key
        text = String(bytes .⊻ UInt8(j))
        try
            # the detector returns a tuple whose first element is the language
            if detector(text)[1] == Languages.English()
                println(text)
            end
        catch
            # skip candidates the detector cannot process
        end
    end
end
However, this approach finds many false positives, for example
So yes, the algorithm will detect "TH]→XOS‼gXp◄pWi6{yC▬rDxPq" as English. I would expect that.
The algorithm takes a sequence of characters, and determines the closest match to sequences in known languages. Given that "encrypted gobbledygook" is not in the list of languages in the model, English is the closest in this case. Further, since the sequence of characters in "encrypted gobbledygook" is non-deterministic, this algorithm would not be able to detect it in any case.
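Since the detector has no "not a language" class, one workaround is to reject obvious gobbledygook before asking it anything. Below is a minimal sketch of such a pre-filter. The helper names `letter_ratio` and `is_plausible_text` and the 0.8 threshold are assumptions made for illustration; they are not part of Languages.jl.

```julia
# Hypothetical helper (not part of Languages.jl): fraction of bytes that are
# ASCII letters or spaces.
function letter_ratio(bytes::AbstractVector{UInt8})
    isempty(bytes) && return 0.0
    is_letter_or_space(b) = b == UInt8(' ') ||
                            b in UInt8('a'):UInt8('z') ||
                            b in UInt8('A'):UInt8('Z')
    return count(is_letter_or_space, bytes) / length(bytes)
end

# Hypothetical pre-filter: the 0.8 threshold is an arbitrary assumption and
# would need tuning against real data.
is_plausible_text(bytes::AbstractVector{UInt8}; threshold = 0.8) =
    letter_ratio(bytes) >= threshold

plain = Vector{UInt8}("the quick brown fox")
println(is_plausible_text(plain))          # candidate that is readable text
println(is_plausible_text(plain .⊻ 0x5a))  # same bytes under a wrong key
```

In the loop from the original post, only candidates that pass `is_plausible_text` would be handed to `detector`, so the detector is never forced to pick a "closest" language for random bytes.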
In general, I think proving the negative in any classification algorithm is difficult. Philosophically, it gets back to the whole "Absence of evidence is not evidence of absence" issue.
So I'll close this. The current algorithm is indeed susceptible to false positives with "gobbledygook" text. So it's not good for this use case, but it works very well for other uses. Until someone implements a different algorithm, this is the best we have.
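For this particular challenge, a different technique may fit better than yes/no classification: score every candidate and keep only the best one, since exactly one line is expected to be English. The sketch below swaps in a simple character-frequency count and does not use Languages.jl at all. The `COMMON` character set and the function name `best_single_byte_xor` are assumptions chosen for illustration.

```julia
# Assumed scoring set: frequent lowercase English letters plus the space.
const COMMON = Set(codeunits("etaoinshrdlu "))

# Hypothetical helper: try every single-byte key on every hex-encoded line
# and return the highest-scoring candidate as a named tuple.
function best_single_byte_xor(hexlines)
    best = (score = -1, key = 0x00, text = UInt8[])
    for line in hexlines
        bytes = hex2bytes(line)
        for k in 0:255
            key = UInt8(k)
            candidate = bytes .⊻ key
            # score = number of bytes that are common English characters
            s = count(in(COMMON), candidate)
            if s > best.score
                best = (score = s, key = key, text = candidate)
            end
        end
    end
    return best
end

# Usage with made-up data: one decoy line and one line XORed with key 0x2a.
secret = bytes2hex(Vector{UInt8}("the rain in spain stays mainly") .⊻ 0x2a)
result = best_single_byte_xor(["deadbeef", secret])
println(String(copy(result.text)))
```

Ranking avoids the false-positive problem because random bytes only have to score lower than the real plaintext, rather than be recognised as "not English".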
Not sure if this is an issue of the underlying language detection algorithm or an implementation error.