Studies
Lars Ahrenberg
Department of Computer and Information Science
Linköping University
lars.ahrenberg@liu.se
Abstract
In principle the CLARIN research infrastructure provides a good environment to support
research on translation. In reality, the progress within CLARIN in this area seems to be
fairly slow. In this paper I will give examples of the resources currently available, and
suggest what is needed to achieve a relevant research infrastructure for translation studies.
Also, I argue that translation studies has more to gain from language technology, and
statistical machine translation in particular, than what is generally assumed, and give
some examples.
Translation studies is a field of research that aims to understand the processes and products
of translation. It is a relatively new field of the Humanities that has seen rapid
development since the Second World War. In this short period of time it has changed its focus
several times and developed in many different directions [6]. It has been approached from
many disciplines including literary studies, linguistics, cultural studies, sociology, and
cognitive science. The quest for empirical grounding has also meant that some scholars have
taken an interest in corpus linguistics and argued for the usefulness of corpora and
computational tools in the study of translation, in particular for investigations into
'translation universals' and features of 'translationese' (e.g. [1,4]).
For good reasons, literature is usually regarded as the genre that is the least suitable
for machine translation. From the perspective of translation studies, however, literature
might be the genre that could benefit the most from tools and methods used in machine
translation. One reason is that both the content and the style of the texts are important
in the study of literature. While culture and norms are emphasized as explanatory factors
in recent theories of translation, the author's style and the rendering of the source text in
the target language are still very much in the researchers' focus of attention.
It would appear, though, that in spite of the efforts spent on arguing for the relevance
of translation corpora in translation studies, large-scale studies of translation based on
parallel or comparable corpora are quite scarce. I can only guess why this is so. Maybe the
relevant university departments, often departments for the study of literature, do not have
the necessary funds for computational resources and personnel; maybe the researchers do
not have the training or interest in corpus analyses, or don't find it worthwhile. At the
same time, though, there is a growing interest within literary history itself in applying
methods from corpus linguistics and statistical analysis to huge corpora of literary works [3].
In my view, the arguments for corpora in translation studies still hold strong and CLARIN
could be a suitable environment for demonstrating it. This is true not least in a Swedish
context, where at least the concept of a corpus is well understood [2]. But language
technology actually has more to offer than tools and annotated corpora. After a look
at CLARIN's goals and achievements so far as they relate to translation and translation
studies, I will outline what I think language technology, and translation technology
in particular, can contribute to translation studies.
2 Some relevant CLARIN resources
CLARIN's general infrastructure framework provides for federated access to data and
resources that actually reside in different centers, provided you are a researcher associated
with a CLARIN center. This is all very well.
The resources currently available for the study of translations are quite limited. The Virtual
Language Observatory (http://catalog.clarin.eu/vlo/) gives few matches to search words such
as 'translation' or 'translation studies', mostly producing theses rather than resources. A
search for 'parallel corpora' gave 53 results, including many duplicates, the majority being
references to tools rather than corpora. Some of the tools available, e.g., for word and
sentence alignment, are of little use to a researcher, especially one who is unfamiliar with
such tools.
Altogether I found references to 12 different parallel corpus collections. Only one of them,
the ECI Multilingual Text, has Swedish parallel data. More parallel corpora may of course
be added in the future, but a problem with the current resources is that they do not
result from coordinated efforts. Some of these corpora contain resources for several related
language pairs including translations of the same source text, but that is then only on a
small scale, using a single source text or extracts from a limited number of different source
texts.
As for corpus search tools I haven't been able to find one that is currently offering federated
search in parallel corpora. The Text Laboratory at the University of Oslo is developing
a new version of their corpus search system, Glossa, that will support federated corpus
search. Support for search in parallel or multilingual corpora is said to be an item for the
future. WebLicht, developed at the University of Tübingen, is a system for search and
annotation that allows a user to configure his or her own tool chain for a specific purpose,
but so far only for monolingual annotation. Most of the corpora available for search are
German, a few have texts in different languages, but there is currently no support for
parallel search results. Keeleveeb, developed by the Estonian company Filosoft, does
allow for search in bilingual resources and in multiple resources at the same time. So far,
all bilingual resources available are dictionaries, while available corpora are monolingual
Estonian.
The NoSketch Engine [5], a thinner version of the commercial Sketch Engine, but
with support for federated search and parallel corpora, is apparently in use at the
LINDAT/CLARIN Centre for Language Research Infrastructure, though there is no description
of it in the CLARIN VLO. Thus, the current situation as regards resources for translation
studies leaves quite a lot to be desired.
3 Research questions in translation studies
The research questions in translation studies are many and varied. Very often, however,
they concern comparisons, for example comparisons between source and target texts,
comparisons of different translations, or of different translators, or of translations with
original texts.
While sometimes only one source text is of interest in a particular study, it is more common
that different translations of it, not to say ALL its different translations into a given target
language, are included. The goal may be to compare the translations for quality, or to
compare different strategies in solving translation problems relating to specific phenomena
in the source text. At other times translations of different source texts by the same
translator are the focus of research, e.g., in order to characterize the translator's 'voice'. But
this inevitably involves a comparison with other existing translations, produced by other
translators.
Thus, a corpus collected for the study of a particular issue in translation studies normally
consists of many texts that are to be accessed, studied and compared from a number
of perspectives. If the question is very general, pertaining to such matters as translation
the desk.
4 What language technology can contribute
From the above it is clear that a very important possible contribution from CLARIN is
making parallel and comparable texts available for search. While texts and translations that
are protected by copyright will, as usual, be hard to come by, there are plenty of classical
literary works that are no longer copyright-protected and for which copyable translations
should also exist in abundance. To harvest them into CLARIN, however, requires
collaboration and concerted efforts among interested researchers from different centers.
Of course, to make this data useful for translation studies, corpus and language
technology tools for processes such as part-of-speech tagging, lemmatization, alignment, parallel
concordancing and search are required. Such tools nowadays exist in large numbers for
many languages, but the problem for CLARIN, I suppose, is to make them communicate
well with one another. There is also a problem of scale. An integrated environment such
as ParaConc allows up to four parallel texts in the system at one time. A large-scale
project in translation studies might involve several source texts with over ten translations
per source. This puts different requirements on underlying representations, storage, and
formatting of search results.
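The representation issue can be pictured with a minimal sketch, assuming sentence-aligned data in which each source segment is stored together with the renderings of any number of translators. The names `AlignedSegment`, `ParallelText` and `concordance`, the translator labels and the toy sentences are all hypothetical; they are not taken from ParaConc or from any CLARIN tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AlignedSegment:
    """One source sentence with its renderings, keyed by a translator label."""
    source: str
    translations: Dict[str, str] = field(default_factory=dict)


@dataclass
class ParallelText:
    """A source text aligned with an arbitrary number of translations."""
    title: str
    segments: List[AlignedSegment] = field(default_factory=list)

    def concordance(self, query: str) -> List[Tuple[int, str, Dict[str, str]]]:
        """Return (segment index, source, translations) for every segment
        whose source contains `query` as a whole word, ignoring case."""
        wanted = query.lower()
        hits = []
        for index, segment in enumerate(self.segments):
            words = [w.strip(".,;:!?\"'").lower() for w in segment.source.split()]
            if wanted in words:
                hits.append((index, segment.source, segment.translations))
        return hits


# Toy data, invented for illustration only.
corpus = ParallelText(
    title="toy example",
    segments=[
        AlignedSegment("The house was old.",
                       {"T1": "Huset var gammalt.", "T2": "Det var ett gammalt hus."}),
        AlignedSegment("It rained all day.",
                       {"T1": "Det regnade hela dagen.", "T2": "Regnet föll dagen lång."}),
        AlignedSegment("She left the house.",
                       {"T1": "Hon lämnade huset.", "T2": "Hon gick hemifrån."}),
    ],
)

# Print each hit with all of its aligned renderings.
for index, source, translations in corpus.concordance("house"):
    print(index, source)
    for translator, rendering in sorted(translations.items()):
        print("   ", translator, rendering)
```

Because the translations are keyed by label rather than held in a fixed number of columns, a tenth translation is stored in the same way as a second one. How such hits are best displayed for many translations at once remains a separate question.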
In addition, I believe that language technology can benefit translation studies by
introducing supplementary methodologies. Translation studies seldom bothers to quantify its
data, and while examples are analyzed with ingenuity and precision, one often wonders how
much of all relevant data these examples cover, and about the possible existence of data
that speak against a tentative hypothesis or conclusion. Thus, computational tools and
resources could help translation studies acquire and utilize more statistical methods.
This macro-analytic perspective on texts, presented in [3] for the study of literature, can
be applied to the study of translations as well.
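As a minimal illustration of the kind of quantification meant here, assume that every occurrence of one source-text item has already been paired with the rendering each translator chose; the pairing itself would have to come from alignment tools. The function names `rendering_distribution` and `coverage`, the translator labels and the toy observations are hypothetical.

```python
from collections import Counter, defaultdict


def rendering_distribution(observations):
    """Relative frequency of each rendering, per translator.

    `observations` is an iterable of (translator, rendering) pairs,
    one pair per occurrence of the source item under study.
    """
    counts = defaultdict(Counter)
    for translator, rendering in observations:
        counts[translator][rendering] += 1
    return {
        translator: {r: n / sum(c.values()) for r, n in c.items()}
        for translator, c in counts.items()
    }


def coverage(discussed_renderings, observations):
    """Share of all occurrences whose rendering is among those discussed."""
    observations = list(observations)
    if not observations:
        return 0.0
    covered = sum(1 for _, r in observations if r in discussed_renderings)
    return covered / len(observations)


# Toy observations, invented for illustration only.
observations = [
    ("T1", "hus"), ("T1", "hus"), ("T1", "hem"), ("T1", "hus"),
    ("T2", "hem"), ("T2", "hem"), ("T2", "bostad"), ("T2", "hus"),
]

distribution = rendering_distribution(observations)
print(distribution["T1"])                 # rendering shares for translator T1
print(coverage({"hus"}, observations))    # share of occurrences covered by 'hus'
```

The second function addresses the question raised above directly: given the renderings an analysis actually discusses, it reports what share of all occurrences they account for, so that the remaining cases can be inspected rather than overlooked.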
Translation studies and machine translation research share a common interest in explaining
translations. Machine translation systems, and statistical systems in particular, predict
translations, but in so doing they also supply an explanation for how the translation came
about. A limitation with current statistical MT is that all models are generated from
textual data only, while in translation studies individual, contextual, and cultural factors
are also taken into account. But this is no limitation in principle. Models could well be
developed that take contextual factors into account and are able to distinguish translations
performed by different translators, or translations that are more or less in line with different
translation norms.
Whether researchers in translation studies find value in probabilistic models or not, they
are often forced to limit their conclusions to a single work or a limited range of constructions,
or to hedge their conclusions with reference to the scarcity of textual data on which they are
based. Open common resources and language technology certainly have the potential to
overcome that limitation.
[1] Mona Baker. Corpora in translation studies. Target, 7(2):223–244, 1995.
[2] Elisabeth Bladh and Magnus P. Ängsal, editors. Översättning, stil och lingvistiska
metoder. Studia Interdisciplinaria Linguistica et Litteraria. Göteborgs Universitet,
Göteborg, Sweden, 2013.
[3] Matthew L. Jockers. Macroanalysis: Digital Methods & Literary History. University of
Illinois Press, Urbana/Chicago/Springfield, 2013.
[4] Sara Laviosa. Corpus-Based Translation Studies: Theory, Findings, Applications.
Rodopi, Amsterdam/New York, 2002.
[5] Pavel Rychlý. Advance search in CLARIN text corpora. In Extended abstract. CLARIN
Annual Conference (CAC 2014), Soesterberg, the Netherlands, 2014.
[6℄ Mary Snell-Hornby. The Turns of Translation Studies. John Benjamins,