
1. Bei Dao – All (Fig.1): Relatively short. Every sentence lacks a subject, the phrases are rarely used, and the sentence structure is incomplete.

 

Over translation: 一切都是命运; 一切都是没有结局的开始;

Relationship with Beam: repetition decreases as num_beam increases; the change is roughly linear.


Fig.1


Beam Search:

num_beam=5


At each search step, capture the candidate tokens and their beam scores.

Print a message each time a hypothesis is completed.

Print the sequence of choices that led to the final output.
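The three logging steps above can be sketched on a toy example. This is a minimal illustration, not the instrumented MarianMT code used for the screenshots: the probability table `TOY_MODEL`, the function name `traced_beam_search`, and the scoring by summed log-probability without length normalization are all assumptions made for the sketch.

```python
import math

# Hypothetical toy model (not MarianMT): last token -> {next token: probability}.
TOY_MODEL = {
    "<s>": {"all": 0.6, "every": 0.4},
    "all": {"is": 0.7, "fate": 0.3},
    "every": {"thing": 0.9, "is": 0.1},
    "thing": {"is": 1.0},
    "is": {"fate": 0.8, "</s>": 0.2},
    "fate": {"</s>": 1.0},
}


def traced_beam_search(model, num_beam=2, max_steps=6, verbose=True):
    """Return (best_hypothesis, trace); trace[i] lists the beams kept at step i."""
    beams = [(["<s>"], 0.0)]  # (tokens, summed log-probability)
    finished, trace = [], []
    for step in range(max_steps):
        # Expand every live beam by every possible next token.
        candidates = [
            (tokens + [tok], score + math.log(p))
            for tokens, score in beams
            for tok, p in model[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        kept = candidates[:num_beam]
        trace.append(kept)
        if verbose:
            # Step log: candidate tokens and their beam scores.
            for tokens, score in kept:
                print(f"step {step}: {' '.join(tokens)}  score={score:.3f}")
        beams = []
        for tokens, score in kept:
            if tokens[-1] == "</s>":
                finished.append((tokens, score))
                if verbose:
                    print(f"  hypothesis formed: {' '.join(tokens)}")
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Prefer finished hypotheses; fall back to live beams if none finished.
    best = max(finished or beams, key=lambda h: h[1])
    if verbose:
        print("final choice:", " -> ".join(best[0]))
    return best, trace


best, trace = traced_beam_search(TOY_MODEL, num_beam=2)
```

In this sketch a finished hypothesis takes up one of the `num_beam` slots, which is simpler than what production decoders do. It is only meant to show where the three kinds of log message come from.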

Test with classical Chinese poetry:

人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。烹羊宰牛且为乐,会须一饮三百杯。


人问寒山道,寒山路不通。

夏天冰未释,日出雾朦胧。

似我何由届,与君心不同。

君心若似我,还得到其中。


Words from every step of Beam Search

Test: 

translate to chinese is very hard, isn't it?

翻译成中文很难,不是吗?


Analysis: (Transformer Model - MarianMT)

(*Each entry lists: author; name of poem/paper; content analysis; over-translation analysis; relationship with Beam)

(*Figures: ORANGE marks over-translation; BLUE marks poor-quality translation)

CHINESE:

2. Hai Zi – Swallow and Snake (Fig.2): Fish: This poem is relatively long, with repeated sentences and structures, but the sentence structure is often incomplete. It uses many stylistic devices, such as antithesis, metaphor, and synesthesia.

 

Over translation: 真诚的爱情错误百出/整个村庄是你的儿子/河流噢河/再美的爱情也不像花朵/人类的泪水养家糊口/人类的泪水中;泥土高溅/扑打面颊/活在这珍贵的人间;人类和植物一样幸福/爱情和雨水一样幸福

Relationship with Beam: repetition decreases as num_beam increases; the change is roughly linear; the error rate is stable beyond num_beam=10.

Test with the default num_beam, translating each sentence separately:

Poorer quality; the context does not carry across sentences.
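A sentence-by-sentence test of this kind could be set up roughly as follows. This is a sketch under assumptions: `split_sentences` and `translate_separately` are hypothetical helper names, the split is on sentence-final punctuation only, and `translate` stands in for whatever call produces one translation (for example a wrapper around the MarianMT model).

```python
import re

# Sentence-final punctuation, full-width and ASCII.
_SENTENCE = re.compile(r"[^。!?;!?;]+[。!?;!?;]?")


def split_sentences(text):
    """Split text after sentence-final punctuation, keeping the punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def translate_separately(text, translate):
    """Translate each sentence on its own, so no context is shared between them."""
    return [translate(sentence) for sentence in split_sentences(text)]


poem = "人问寒山道,寒山路不通。夏天冰未释,日出雾朦胧。"
sentences = split_sentences(poem)
# Identity function used here as a placeholder for a real translation call.
outputs = translate_separately(poem, lambda s: s)
```

Because every sentence is passed to `translate` in isolation, the model cannot use earlier lines as context, which is consistent with the weaker coherence noted above.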

Fig.2

3. Hai Zi – I request: Rain: This poem is relatively short, and each sentence is short. The sentence structure is complete. Some phrases are not commonly used.

 

Over translation: 我请求下雨;雨是一生过错

Relationship with Beam: repetition decreases as num_beam increases; repetition occurs only when num_beam=1.

Fig.3

4. Gu Cheng – Far and Near: Very short, simple sentences with complete structure. Common phrases.

 

Very high-quality translation with no over-translation.

 

5. Xi Chuan – Song of Corner: Highly complete sentences with a strong narrative. Common phrases.

 

No over-translation, only a few translation mistakes.

ENGLISH:

1. Gertrude Stein – Matisse (Fig.5): Ungrammatical, and the original is hard to understand, but the sentences are fairly complete, so the translation contains several mistakes but no repetition.

Content: "One was quite certain that for a long part of his being one being living / he had been trying to be certain that he was wrong in doing what he was doing / and then when he could not come to be certain / that he had been wrong in doing what he had been doing, when he had completely convinced himself that he would not come to be certain that he had been wrong in doing what he had been doing / he was really certain then that he was a great one and he certainly was a great one. Certainly every one could be certain of this thing that this one is a great one."

 

 

2. Henri Michaux – The Great Fight (Fig.6): Linguistic forgery, semantic anomalies.

Very poor translation quality, but the original is also very hard to understand.

Content: "He embowerates and enbacks him on the ground. He raggs him and rumpets him up to his drale; He praggles him and libucks him and berifles his testries; He tricards him and morones him, He grobles him rasp by rip and risp by rap. Finally he enscorchorizes him."

3. Logical nonsense (Fig.7): Fairly good quality, but tense mistakes occur between English and Chinese.

E.g.: I went to the pictures tomorrow / 我明天去看照片 (literally "I am going to look at the photos tomorrow")

Content: " I went to the pictures tomorrow / I took a front seat at the back, / I fell from the pit to the gallery / And broke a front bone in my back. / A lady she gave me some chocolate, / I ate it and gave it her back. / I phoned for a taxi and walked it, / And that’s why I never came back."

Fig.5


Fig.6

4. Edgar Allan Poe – To Margaret & To Octavia (Fig.8)

Content: "Who hath seduced thee to this foul revolt; From the pure well of Beauty undefiled?   So banished from true wisdom to prefer; Such squalid wit to honourable rhyme? To write? To scribble? Nonsense and no more?   I will not write upon this argument; To write is human — not to write divine."

"When wit, and wine, and friends have met; And laughter crowns the festive hour; In vain I struggle to forget; Still does my heart confess thy power; And fondly turn to thee! But Octavia, do not strive to rob; My heart of all that soothes its pain; The mournful hope that every throb; Will make it break for thee!"

5. Emily Dickinson (Fig.9) – I would not paint — a picture —: Irregular capitalization; the capitalized words are difficult to translate and are always rendered as proper names.

– I'm nobody (Fig.10): A lot of repetition, because the text contains many similar phrases and words.


Fig.8

6. T. S. Eliot – The Waste Land (Fig.11): Not bad; the mistakes are caused by abbreviations and repeated word patterns.

Content:

" The chemist said it would be all right, but I’ve never been the same.

   You are a proper fool, I said.

   Well, if Albert won’t leave you alone, there it is, I said,

   What you get married for if you don’t want children?

   HURRY UP PLEASE ITS TIME

   Well, that Sunday Albert was home, they had a hot gammon,

   And they asked me in to dinner, to get the beauty of it hot—

   HURRY UP PLEASE ITS TIME

   HURRY UP PLEASE ITS TIME

   Goonight Bill. Goonight Lou. Goonight May. Goonight.

   Ta ta. Goonight. Goonight.

   Good night, ladies, good night, sweet ladies, good night, good night."

7. Gertrude Stein – Rose is a rose is a rose is a rose (Fig.12) & Anne to come (Fig.13): Not good; the mistakes are caused by feedback from previously generated words and by repeated patterns.

Content: "Rose is a rose is a rose is a rose;Loveliness extreme. Extra gaiters, ; Loveliness extreme. Sweetest ice-cream. Pages ages page ages page ages."

"

    We knew.

    Anne to come.   

    Anne to come.   

    Be new.

    Be new too.

    Anne to come   

    Anne to come   

    Be new

    Be new too.

    And anew.

    Anne to come.   

    Anne anew.

    Anne do come.

"

Text Degeneration Factors:

Words with high probability in the beam search candidate lists receive positive feedback and are picked again, which results in repetition. Each translation step is conditioned on the previously translated words. If a phrase or sentence repeats, it means those words have the highest probability given the preceding words, which can create a positive feedback loop in the transformer.
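The feedback loop can be reproduced on a toy example. This is a hedged sketch, not the behaviour of MarianMT itself: the bigram table `TOY_BIGRAMS` is invented for illustration, and `greedy_decode` corresponds to decoding with a single beam, where each step simply takes the most probable next token given the previous one.

```python
# Hypothetical bigram table: previous token -> {next token: probability}.
# "fate" is most likely followed by "all", which closes the loop.
TOY_BIGRAMS = {
    "<s>": {"all": 1.0},
    "all": {"is": 0.9, "</s>": 0.1},
    "is": {"fate": 0.6, "</s>": 0.4},
    "fate": {"all": 0.7, "</s>": 0.3},
}


def greedy_decode(model, max_len=9):
    """Pick the most probable next token until </s> or max_len tokens."""
    tokens = ["<s>"]
    while len(tokens) < max_len + 1 and tokens[-1] != "</s>":
        options = model[tokens[-1]]
        tokens.append(max(options, key=options.get))
    return tokens[1:]  # drop the start symbol


output = greedy_decode(TOY_BIGRAMS)
print(" ".join(output))
```

At every step the end-of-sentence token is the less probable option, so the decoder never leaves the cycle and stops only when it reaches the length limit.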

 

The repeated phrases are relatively fixed, common, and stable collocations, which indicates that the word has relatively few common meanings. The collocations used in the source texts (the poems analyzed here) are relatively rare, so over-translation occurs.

 

Sometimes the repeated collocation has already appeared earlier in the output, because the transformer conditions each new word on the text generated so far.


Beam Search Example, Beam=2

Over-translation is affected by the following aspects of the original text:

1. The universality of the words and collocations.

2. The completeness of the sentence structure (subject, predicate, object…).

3. Stylistic devices and adjectives, which tend to be ignored.

4. The length of the text, and whether words and sentences are repeated in it.

Articles: 

https://arxiv.org/pdf/1706.03872.pdf

https://www1.se.cuhk.edu.hk/~manchoso/papers/rep_text-aaai21.pdf

https://arxiv.org/pdf/1904.09751.pdf

Fig.7

Fig.9

Fig.11

Fig.12


Fig.13

CONCLUSIONS:

  1. The shorter the text, the less likely over-translation is.

  2. When the text is of moderate difficulty, a larger num_beam produces less over-translation.

  3. Words that are easily repeated tend to have fewer common meanings and phrase combinations in the target language, while in the source language their combinations are rarer and their meanings more numerous.

  4. Repetition is strongly affected by the original text.

  5. A higher num_beam does not by itself guarantee a better translation. Translation accuracy depends on the original text and the model. The better the model, the less over-translation occurs, and the more a larger beam value improves translation quality.
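One way to put a number on the repetition discussed in these conclusions is the share of repeated n-grams in an output. The metric below is a suggestion, not the measure used for the figures on this page; the function name `repeated_ngram_ratio` and the example strings are made up for illustration and are not model outputs.

```python
from collections import Counter


def repeated_ngram_ratio(tokens, n=2):
    """Fraction of n-grams that repeat an n-gram seen earlier in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


# Illustrative strings only. English is split on spaces, Chinese per character.
looping = "all is fate all is fate".split()
clean = "all is fate".split()
chinese = list("河流噢河流噢")

print(repeated_ngram_ratio(looping))  # 2 of 5 bigrams are repeats
print(repeated_ngram_ratio(clean))
print(repeated_ngram_ratio(chinese))
```

Computing this ratio for the same source text at several num_beam values would give a curve that can be compared against the qualitative observations above.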

Poem Choice:

  • Colloquial language - Pre-modern Chinese

  • Bei Dao - "Language" / "February"

  • Duo Duo – Discusses poetic language / words / pure language

Final Choices – Duo Duo

《字》 (excerpt) – 1986

《诗歌的创造力》 (excerpt) – 2008

《存于词里》 – 2010

《词如谷粒,睡在福音里》 (excerpt) – 2012

The other possible meanings behind the word.

The autonomy of the word: combinations are made by the words rather than by the poet.

Consider the relationship among the author, the words, and the poetic.

 

Latent Words

2022
