User:Zuhui//Personal Reader/Experimental Translation

From XPUB & Lens-Based wiki

"The sign is dead"

  • If language is reduced to just data, where does meaning actually come from?
  • Is the difference between human translation and machine translation purely technical, or is there a deeper, more ‘philosophical’ aspect to it?

MT - SMT - NMT

In the early stages of machine translation, rule-based MT did not work:
languages are too complex and diverse to be reduced to fixed rules.

Algorithms based on habit: SMT
SMT analyzes large-scale human translation data to learn patterns and calculates the likelihood of certain phrases or words being translated a specific way. This makes it more flexible and better able to reflect linguistic complexity than rule-based systems.
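As a rough sketch of the statistical idea, an SMT-style phrase translation probability is just a relative frequency over aligned data. The phrase pairs below are invented toy data, not a real corpus:

```python
from collections import Counter

# Toy aligned phrase pairs, as if extracted from a bilingual corpus (hypothetical data)
phrase_pairs = [
    ("house", "maison"), ("house", "maison"), ("house", "domicile"),
    ("blue house", "maison bleue"),
]

pair_counts = Counter(phrase_pairs)
source_counts = Counter(src for src, _ in phrase_pairs)

def translation_prob(src, tgt):
    """Relative-frequency estimate p(tgt | src) = count(src, tgt) / count(src)."""
    return pair_counts[(src, tgt)] / source_counts[src]

print(translation_prob("house", "maison"))  # 2/3: "maison" in 2 of 3 occurrences
```

The "habit" is literal: the model prefers whatever translators have most often done before.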

NMT and word vectors
NMT is a significant advancement over SMT: it represents words as vectors and performs translation by aligning and transforming relationships across languages.

  • NMT does not simply treat a foreign language as a set of "strange symbols" to be decoded: whereas SMT relied on simple probability calculations based on past data, NMT establishes the relationship between two languages through complex internal computation and develops it into an interaction.

Tokens and Vector Embeddings
A token is the smallest unit into which text is broken down for processing in tasks like machine translation.

• Tokens can be words, prefixes/suffixes, or even specific characters.
• These tokens are then converted into numerical data that machines can process.
Tokenization
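A minimal sketch of tokenization, using a naive regex-based word tokenizer (real MT systems typically use subword schemes such as BPE, which also produce the prefix/suffix tokens mentioned above):

```python
import re

def word_tokenize(text):
    """Split text into word and punctuation tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = word_tokenize("The sign is dead.")
print(tokens)  # ['The', 'sign', 'is', 'dead', '.']
```

Each of these string tokens would then be mapped to an integer ID and on to a vector, which is the numerical form the machine actually processes.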


Vector embedding is a technique that represents each token as coordinates in a multidimensional space.

• The machine learns the relationships between words using these coordinates.
• Each word is represented as a vector, which captures how it relates to other words.
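A toy illustration of embedding lookup. The 4-dimensional coordinates below are made up for the example; real models use hundreds of dimensions learned from data:

```python
import numpy as np

# Hypothetical embeddings: each row is one token's coordinates in the space
vocab = {"king": 0, "queen": 1, "apple": 2}
embeddings = np.array([
    [0.90, 0.80, 0.10, 0.00],   # "king"
    [0.85, 0.82, 0.12, 0.05],   # "queen"  (close to "king")
    [0.10, 0.00, 0.90, 0.80],   # "apple"  (far from both)
])

def embed(token):
    """Look up a token's coordinates in the embedding space."""
    return embeddings[vocab[token]]

print(embed("king"))
```

The point is that relatedness becomes geometric: "king" and "queen" sit near each other, "apple" sits elsewhere.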


Word Window
A word window analyzes how often a specific token appears near other tokens within a given range of text.
• Usually, a word window spans 3–15 words.
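Window-based co-occurrence counting can be sketched like this (toy sentence, window of 2):

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each token pair appears within `window` positions of each other."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(tok, tokens[j])] += 1
    return counts

tokens = "the sign is dead the sign lives".split()
counts = cooccurrence_counts(tokens, window=2)
print(counts[("sign", "the")])  # 2: "the" falls inside the window of both "sign"s
```

Counts like these are what classic embedding methods compress into the vector coordinates described above.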


Multidimensional Vector
Vectors represent the relationships between words mathematically.
Each token is expressed as a vector in a multidimensional space. These vectors represent:

• The likelihood of a specific word appearing alongside others.
• The similarities and differences between words.

Vectors aren’t limited to two- or three-dimensional representations. In tasks like machine translation, vectors typically span hundreds of dimensions.
  • In this way, vectors form a linguistic network through which complex meanings and contexts can be grasped: the vector space takes into account not only a word's relationship to one particular word, but all of the relationships it holds with other words.
  • Vector embedding is the key technique that lets machine translation move beyond the simple substitution of words to an understanding of a word's context and semantic relationships.
  • NMT is more than mere translation: it opens up a new mode of translation that enables dialogue and interaction between languages.
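One common way these relationships are measured is cosine similarity, which compares the directions of two vectors regardless of their length. The embeddings below are invented for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = pointing the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: related words point in similar directions
cat = np.array([0.9, 0.8, 0.1])
dog = np.array([0.85, 0.75, 0.2])
car = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, dog))  # close to 1
print(cosine_similarity(cat, car))  # much lower
```

Similarity here is a property of position in the space, not of any external referent, which is exactly the shift the next section turns on.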

Spacey emptiness, Gayatri Spivak

"Spacey emptiness," as introduced by Gayatri Spivak, refers to the gaps, voids, or untranslatable spaces between languages that cannot be bridged by simple word-for-word translation.

Why does this gap exist?
languages are products of unique cultural, historical, and social contexts. These contexts shape how meaning is expressed, and they often don't have exact parallels in other languages.

Why does this gap HAVE to exist?
Spivak says that trying to completely eliminate the gap between languages risks suppressing diversity. Instead, the "spacey emptiness" should be seen as an opportunity for richer, more creative interactions.
Spivak regards translation not as a simple conversion but as a dialogue between languages. This resonates with the way NMT, rather than forcibly erasing this empty space, respects each language's unique system of meaning and structure. In that NMT also interacts within the gaps, and in that it does not simply copy one language's system of meaning into another but newly forms semantic similarities, it can be seen as close in meaning to Spivak's spacey emptiness.

Allison Parrish

Allison Parrish uses colors to show the same principle, adding vectors for red and blue together to get purple.
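Parrish's color analogy can be sketched with RGB vectors, treating colors as points in a three-dimensional space (a rough illustration of the idea, not her actual code):

```python
import numpy as np

# Colors as RGB coordinates: vector arithmetic on them yields new colors
red = np.array([255, 0, 0])
blue = np.array([0, 0, 255])

# Averaging the two vectors lands near purple
purple = (red + blue) // 2
print(purple)  # [127   0 127]
```

The same arithmetic applied to word vectors is what lets models "blend" meanings without any referent outside the space.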

This blows up any model for language that is thinking of the meaning of language as a relationship of referents to an external (or internal) reality, since meaning is produced by vector space: the plotting of tokens on a matrix according to where they fall in language use—and not in relation to what they represent.

But language still represents, and organic bodies are still feeling it in space-times other than vector space, and what do you do with that?

Critique of translational norms

Translational norms traditionally include the following principles:

• Fidelity: a translation must be faithful to the source text, conveying its meaning and structure as closely as possible.
• Transparency: a translation must read naturally and smoothly; the translated text should not reveal that it is a translation.
• Equivalence: the assumption that a translation must hold the same value as the source text.

These norms reflect the traditional view that translation should stay "as close to the original as possible."

Cult of transparency

Free-flowing data = Free-flowing capital

Global English and machine translation

Global English and machine translation abide by the principles of instrumental rationality and technocracy.

Critique of the translational norms, defined by Slater as:
• productive: able to generate large volumes of translation efficiently.
• predictive: translation output must be stable and repeatable.
• navigable: designed so that users (consumers) can easily access and make use of translated information.

English-language privilege, like other forms of privilege, allows its speakers a certain blindness to its positionality.
It is in this way that English travels as not a language, the way that masculinity has traveled as not a gender, or whiteness as not a race (although of course I do not wish to conflate the structures or injustices of these prejudices).
As Sara Ahmed writes: “What makes a privilege a privilege: the experiences you are protected from having; the thoughts you do not have to think”

Likewise, the English language, like all hegemonic structures, also has a way of undoing itself from within. This happens primarily through the fact that English has become so big, so multinational, that the majority of its speakers are no longer native speakers, which makes English the least monolingual language in the world, at the same time as its monolingual ideologies are producing and reproducing events of linguistic oppression all over the globe.

  • The author mentions English as a language with a colonial past, present, and future.
    ↘︎ While English is used globally, its historical context is complex and closely tied to colonialism; therefore the norms of English translation reflect its colonial heritage, and these norms exert a powerful influence today.

Hinge Language

…the invisible language of the machine is English, as Slater and Raley have pointed out. It is the language of computer code and programming languages.
It is also often used as a hinge language in machine translation, meaning that two languages that do not have enough respective data between them to create neural nets and train a machine translation model will pass first through a translation into English and then out again (Poibeau 140). In this way, the structures of English contaminate many other languages—and the smaller the language, the more vulnerable it is.
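Pivot ("hinge") translation can be sketched schematically. The `translate` function and the toy models below are purely hypothetical stand-ins for real MT systems; the point is only the routing through English:

```python
# Schematic sketch: no real MT API is implied
def translate(text, src, tgt, models):
    return models[(src, tgt)](text)

def pivot_translate(text, src, tgt, models, pivot="en"):
    """When no direct src->tgt model exists, route through the pivot language."""
    if (src, tgt) in models:
        return translate(text, src, tgt, models)
    intermediate = translate(text, src, pivot, models)
    return translate(intermediate, pivot, tgt, models)

# Toy "models" tag each hop so the routing is visible
models = {
    ("ko", "en"): lambda t: f"en({t})",
    ("en", "sw"): lambda t: f"sw({t})",
}
print(pivot_translate("안녕", "ko", "sw", models))  # sw(en(안녕))
```

Every such pass through the hinge imposes English's structures (word order, gender marking, lexical gaps) on the intermediate text, which is how the contamination the text describes enters.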

"Experimental" as in fallible force