
Translating a fictional runeset?

Posted: 11 Jan 2021 06:22
by ElementsnStuff
New here, so I'm not sure if this is the right section (or even the right board).

Lately, I've been watching some of the semi-old Pokemon movies (late 2000s/early 2010s) and came across these in the movie Arceus and the Jewel of Life:



I wondered if they translated to anything - while the movie does give translations of each tablet, I wondered what specific characters corresponded to what part of that translation. However, I'm almost entirely new to translating fictional languages, and I don't know all the tricks to try. So far I've tried several different ciphers (assigning each symbol a letter of the alphabet and then trying to decode that) and substituting syllables or katakana (since this was made in Japan first, Japanese might be involved), and I've got nothing. The letter frequencies and repetitions seem to suggest that these aren't entirely random, but I'm stumped as to what they could be.

There is, however, at least a solid chance they're not gibberish. Starting in 2011, alphabetic substitutions have been used for several languages in the anime, movies and other media, meaning the people who come up with these characters had an intended use for them (seen here for those alphabets that have already been decoded).

So that all being said, does anyone have any advice for where to go from here? Now that I've started fixating on this, it's keeping me awake at night.

Re: Translating a fictional runeset?

Posted: 11 Jan 2021 19:41
by Salmoneus
Note: many of the letters in the second inscription appear to be mirrored from those in the first - see the 'K', for instance. However, some aren't - the 'H with diagonal midbar' has the midbar highest on the right in each case.

The usual way to go about a decipherment of a substitution cipher is a statistical comparison of the ciphertext to the assumed plaintext language. A sufficiently lengthy ciphertext in a substitution cipher should match the statistics of the plaintext language. So, starting point: if it's in English, assume the most frequent symbol is 'E', and so on (etaoin shrdlu!). Likewise there are statistics for the most common sequences of two, three letters, etc, and there are various observations you can make (eg, if a letter is rare, except that it's common after three specific other letters, then maybe it's 'H', and the other three letters are 'T', 'S' and 'C').
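
As a rough illustration of that counting step, here is a minimal Python sketch. The function names and the `ignore` parameter are made up for this example, and the sample string is a toy, not the tablet text. It assumes the ciphertext has already been transcribed to one character per symbol, with whitespace separating lines.

```python
from collections import Counter

def symbol_frequencies(text, ignore=""):
    """Return (symbol, count, percent) tuples, most frequent first."""
    symbols = [ch for ch in text if not ch.isspace() and ch not in ignore]
    counts = Counter(symbols)
    total = len(symbols)
    return [(sym, n, 100.0 * n / total) for sym, n in counts.most_common()]

def bigram_counts(text, ignore=""):
    """Count adjacent symbol pairs within each whitespace-separated chunk."""
    counts = Counter()
    for chunk in text.split():
        for a, b in zip(chunk, chunk[1:]):
            # Skip any pair that touches an ignored (e.g. illegible) symbol.
            if a in ignore or b in ignore:
                continue
            counts[a + b] += 1
    return counts

toy = "ABCA BCAB"  # made-up sample, NOT a transcription of the tablets
for sym, n, pct in symbol_frequencies(toy):
    print(f"{sym} {n} {pct:.2f}%")
print(bigram_counts(toy).most_common(3))
```

The resulting ranking can then be lined up against a published frequency table for the assumed plaintext language. With a text this short, the ranking is only a weak hint.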

However, you have some problems here: the plaintext language isn't necessarily English (and, if so, it's not necessarily spelled correctly - could be a phonetic spelling). The first step should probably be: learn Japanese.

There's also a potential problem that the texts are very short - the shorter the text, the higher the risk that it's statistically abnormal.

You could also of course try directly connecting the symbols to plausible translations - for instance, the antepenultimate letter of the first line of the first inscription doesn't appear elsewhere in the inscription, so is that also true of a letter in the supposed translation? Remember that it could read right-to-left, btw. However, given the provenance, it's entirely possible the true plaintext has nothing to do with the stated 'translation'.

It may also not be a simple substitution cipher. If not, then it becomes extremely difficult: you really have to be able to guess either the content or the cipher-type in order to decrypt it, I think.
[In WWII, the Allies would sometimes bomb a certain town just to provoke the Germans into sending messages about it, which the Allies guessed would contain the name of the town and the word 'bomb'... and one of the worst military mistakes the Nazis made was signing off all their messages 'HH'!]

Even if it is a simple substitution cipher, if you can't guess the plaintext language, it's almost impossible to decipher it, short of some very lucky guesses and a spot of genius, particularly given how short the text is.

Re: Translating a fictional runeset?

Posted: 12 Jan 2021 04:39
by ElementsnStuff
Salmoneus wrote:
You could also of course try directly connecting the symbols to plausible translations - for instance, the antepenultimate letter of the first line of the first inscription doesn't appear elsewhere in the inscription, so is that also true of a letter in the supposed translation?
As far as letters go, this doesn't seem to be the case for the stated translation. I thought it might be instead one of the 'th' sounds, but the hard 'th' of 'thus' shows up again in 'this' in the last line and the soft 'th' shows up in 'the' in the second line. This is pretty solid evidence that the translation given in the movie isn't literal (the total character counts also don't match up, so there's that - it could still be syllable-based like katakana though).

In one of the previous attempts, I assigned each rune an English letter in order of its first appearance for ciphering purposes (and this also allows me to do letter frequency analysis pretty easily). This ends up with this ciphertext:



(X is a placeholder for illegible characters - while I'm fairly sure it's not just a crack in the stone, it's difficult to tell which character it might actually be).
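
The labelling scheme described above (letters handed out in order of first appearance, with X kept for illegible characters) could be sketched in Python roughly as follows. The function name, the rune descriptions and the `"?"` marker are all hypothetical, and the sketch assumes each rune has first been given some consistent identifier by hand.

```python
import string

def label_by_first_appearance(runes, illegible=None, placeholder="X"):
    """Map each distinct rune to a letter in order of first appearance.

    `runes` is a sequence of arbitrary identifiers (for example, short
    descriptions of each glyph). Items equal to `illegible` become the
    placeholder, which is never handed out as a normal label.
    """
    alphabet = [c for c in string.ascii_uppercase if c != placeholder]
    mapping = {}
    labels = []
    for rune in runes:
        if rune == illegible:
            labels.append(placeholder)
            continue
        if rune not in mapping:
            if len(mapping) >= len(alphabet):
                raise ValueError("more distinct runes than available letters")
            mapping[rune] = alphabet[len(mapping)]
        labels.append(mapping[rune])
    return "".join(labels), mapping

# Hypothetical glyph descriptions, not taken from the actual tablets.
text, key = label_by_first_appearance(
    ["fork", "h-bar", "fork", "?", "k-shape"], illegible="?"
)
print(text)  # ABAXC
```

Keeping the returned mapping around makes it possible to go back from a letter in the transcription to the original glyph later on.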

The frequency analysis stats for the first, second and overall runesets look like this:

First set:
K 7 11.11%
F 7 11.11%
E 7 11.11%
A 6 9.52%
H 5 7.94%
J 5 7.94%
B 4 6.35%
L 4 6.35%
G 3 4.76%
C 3 4.76%
O 3 4.76%
M 2 3.17%
Q 2 3.17%
D 1 1.59%
I 1 1.59%
P 1 1.59%
R 1 1.59%
S 1 1.59%

Bigrams:
KF 2
MJ 2
FG 2
HA 2
JF 2
KE 2
KL 2
EK 2
FO 2
QH 2
JB 2

Second set:
X 20 22.73%
H 13 14.77%
F 9 10.23%
B 7 7.95%
O 5 5.68%
S 5 5.68%
J 4 4.55%
E 3 3.41%
M 3 3.41%
L 3 3.41%
A 2 2.27%
W 2 2.27%
K 2 2.27%
U 2 2.27%
V 1 1.14%
T 1 1.14%
D 1 1.14%
C 1 1.14%
N 1 1.14%
Y 1 1.14%
I 1 1.14%
Z 1 1.14%

Bigrams (discounting X):
KH 2
FB 2
SH 2
HW 2
OJ 2
HH 1

Overall:
X 20 13.25%
H 18 11.92%
F 16 10.60%
B 11 7.28%
E 10 6.62%
J 9 5.96%
K 9 5.96%
A 8 5.30%
O 8 5.30%
L 7 4.64%
S 6 3.97%
M 5 3.31%
C 4 2.65%
G 3 1.99%
W 2 1.32%
U 2 1.32%
I 2 1.32%
D 2 1.32%
Q 2 1.32%
R 1 0.66%
N 1 0.66%
T 1 0.66%
P 1 0.66%
V 1 0.66%
Y 1 0.66%
Z 1 0.66%

Bigrams (discounting X):
HA 3
JB 3
FX 3
MJ 3
FO 3
FG 2
KL 2
KB 2
FB 2
OK 2
QH 2
KH 2
HW 2
SH 2
OJ 2
EK 2
AO 2
EB 2
JF 2
BH 2
EM 2
KE 2
BF 2
HH 2

From this, it looks like F is the most likely character to translate to 'E' if this is English (unless you think it's likely that E is way less common in the first set, in which case it'd be H?). If F=E, then the second set has a very curious-looking string of E-E-E-E in the second row, with each dash being a different letter (although that spacing holds whatever F stands for, so I'm not sure what this means). H=E seems to look more 'realistic', with a few EE's here and there but overall even spacing. H=L would be most likely if HH is the most common English double letter, LL, but EE or TT aren't too far behind.
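
One way to eyeball guesses like F=E or H=L is to substitute only the guessed letters and mask everything else. The Python sketch below does that; `apply_partial_key` is a made-up helper name and the input is a toy string built to show the E-E-E-E shape, not the real second row.

```python
def apply_partial_key(ciphertext, key, unknown="-"):
    """Replace symbols that have a guess in `key`; mask all others.

    Whitespace is kept so line and word breaks stay visible.
    """
    return "".join(
        ch if ch.isspace() else key.get(ch, unknown)
        for ch in ciphertext
    )

toy_row = "FAFBFCF"  # toy input, NOT the actual tablet row
print(apply_partial_key(toy_row, {"F": "E"}))  # E-E-E-E
print(apply_partial_key(toy_row, {"F": "E", "B": "T"}))  # E-ETE-E
```

Trying a few competing keys side by side makes implausible letter spacings easier to spot than reading the raw labels.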

I also reached out to the author of the translation page in my last post (this one), and they revealed they were able to decipher most of the languages by using the keyword 'Pokemon' - because it has two o's in a very specific spacing, they were able to fish it out of otherwise undecipherable text (however, they had quite a lot of ciphertext to work with, whereas I only have these two examples). To try and find this, I searched for N-grams 5 characters long with the same first and last letter (either 'OKEMO' or 'OMEKO' depending on whether it's forwards or backwards), and managed to find quite a few:

(first set)
AFGHA (invalid)
GHAIG (possible?)
LPEKL (invalid)
KLPEK (invalid)
EMJFE (possible?)

(second set)
HCFBH (possible?)
LBEML (possible?)
JSHMJ (invalid)

I tested these both forwards and backwards to see if 'Pokemon' could be spelled out. However, none of these give immediately visible solutions (and neither does the same approach for the second set), and thus the search continues. It's still possible that the keyword is in katakana or some other Japanese solution, but as you mentioned, I'd have to learn Japanese before I even attempted to tackle that. I'd hazard a guess that the keyword, if there is one, is either 'Pokemon,' 'Michina,' (the setting of the movie) or 'Arceus' (the antagonist referenced in the tablet's 'translation'), as these would be the most expected unique words to be on these tablets. Of these, Michina has one possible instance (OFBEFJM or JFBEFOK), and Arceus (a six-letter nonrepeating string) could be just about anywhere. There's also the problem that the keyword could be hidden somewhere in the X sections...
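
The keyword hunt described here can be generalised: reduce a candidate word to its pattern of repeated letters, then slide a window over the ciphertext looking for the same pattern. Below is a minimal Python sketch under the assumptions of a simple one-to-one substitution and a ciphertext held as a single string. The function names are made up, and the first test string is a toy.

```python
def pattern(seq):
    """Repetition pattern of a sequence, e.g. 'POKEMON' -> (0,1,2,3,4,1,5)."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in seq)

def find_crib(ciphertext, crib, wildcard="X"):
    """List (index, window, direction) for windows shaped like the crib.

    Both reading directions are checked. Windows containing the
    illegible-character placeholder are skipped.
    """
    targets = {}
    for label, word in (("forwards", crib), ("backwards", crib[::-1])):
        targets.setdefault(pattern(word), []).append(label)
    n = len(crib)
    hits = []
    for i in range(len(ciphertext) - n + 1):
        window = ciphertext[i:i + n]
        if wildcard in window:
            continue
        labels = targets.get(pattern(window))
        if labels:
            hits.append((i, window, "/".join(labels)))
    return hits

# Toy ciphertext: 'POKEMON' has the same shape in both directions.
print(find_crib("QABCDEBFQ", "POKEMON"))
# The candidate quoted in the post has the shape of 'MICHINA' read forwards.
print(find_crib("OFBEFJM", "MICHINA"))
```

A pattern match is only a necessary condition: each hit still implies letter assignments that have to stay consistent with the rest of the text. A word with no repeated letters, such as 'Arceus', matches any window of distinct symbols, which is why it could be just about anywhere.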