Difference between revisions of "Translation tools"

From Translate Science
(Added ELSST)
(→‎Bilingual dictionaries: Kiswahili medical dictionary)
 
(26 intermediate revisions by 4 users not shown)
Line 6: Line 6:
 
* [https://github.com/TheDavidDelta/lingva-translate Lingva] scrapes [https://translate.google.com/ Google Translate] and retrieves the translation without using any Google-related service, which prevents Google from tracking the user. Google will naturally still store the texts.
 
* You can find a collection of Language Technology tools at the [https://www.european-language-grid.eu/ European Language Grid]; all translation tools there are machine translation tools.
 +
* The EU has a [https://ec.europa.eu/info/resources-partners/machine-translation-public-administrations-etranslation_en Machine translation for public administrations] tool; it can also be used by universities and small and medium-sized enterprises after registration.
 +
* [https://github.com/facebookresearch/fairseq/tree/nllb No Language Left Behind.] Maybe it is a bit early to call this a tool, but the code is available and various datasets can be downloaded. It does not sound particularly user-friendly yet, but it translates between more than 200 languages. [https://arxiv.org/abs/2207.04672 Background article.]
 +
 +
==== Language identification ====
 +
 +
* [https://fasttext.cc/docs/en/language-identification.html fastText] has a language identification algorithm, which was used by [https://upstream.force11.org/language-diversity-in-scholarly-publishing/ FORCE11 to study the entire CrossRef dataset].
 +
* [https://www.geeksforgeeks.org/detect-an-unknown-language-using-python/ Detect the language of a text with Python]
  
 
== Computer Assisted Translation ==
 
 
These are tools that import and export typical document formats as well as bilingual exchange formats for review and translation, such as [https://en.wikipedia.org/wiki/XLIFF XLIFF].
 
* [https://en.wikipedia.org/wiki/SDL_Trados_Studio SDL Trados Studio] Commercial tool with a large market share.
+
*[https://site.matecat.com MateCAT] is a free and open source online CAT tool. It’s free for translation companies, translators and enterprise users.
 +
* [[wikipedia:OmegaT|OmegaT]] is a fast, professional libre tool written in Java. [https://blogs.ec.europa.eu/emt/multi-user-translation-and-open-source-cat-software-omegat-in-action/ Multiple users can translate together], with conflicts resolved as in a Git repository.
 +
* [[wikipedia:SDL_Trados_Studio|SDL Trados Studio]] is a commercial tool with a large market share.
 
* [https://en.wikipedia.org/wiki/MemoQ memoQ] is proprietary software.
 
* [https://en.wikipedia.org/wiki/Across_Language_Server Across] is proprietary software. Freelance translators can acquire the Basic Edition of the single-user version for free.
Line 15: Line 24:
 
== Software translation tools ==
 
 
These are tools for translating software. They typically handle short texts, integrate with Git repositories, and read and write the standard file formats used for the internationalization of software and websites.
* [https://en.wikipedia.org/wiki/Weblate Weblate] is a libre web-based translation tool for software.
+
* [[wikipedia:Weblate|Weblate]] is a libre web-based translation tool for software.
 
* [https://Translatewiki.net TranslateWiki] is a software translation tool used for MediaWiki, the software behind Wikipedia, and also by other projects.
 
* [https://en.wikipedia.org/wiki/Transifex Transifex] is a proprietary, web-based translation platform. It targets technical projects with frequently updated content, such as software, documentation, and websites, and encourages the automation of the localization workflow by integrating with the tools used by developers.
 
 +
* [https://crowdin.com/ Crowdin] is a commercial platform, but free for open source and academic use.
 +
 +
== Translation workflows ==
 +
 +
* [https://github.com/alan-turing-institute/the-turing-way/pull/2202 Translation workflow] used by The Turing Way, a project of The Alan Turing Institute.
 +
* An advanced [https://www.hoou.de/materials/hop-on-buch-und-leitfaden-zur-beruflichen-bildung-in-deutschland translation workflow] for a book on the German education system.
 +
 +
==Multilingual dictionaries==
 +
These are dictionaries which offer translations of words and terms between two or more languages. Some specialized multilingual dictionaries, which focus on scientific, medical, academic, and other terminology, are listed below. All entries include a link to a freely available digital version. Many general digital dictionaries can be found at [https://clilstore.eu/multidict/index_h.php MultiDict]. Building and maintaining such dictionaries is essential for translating scientific studies; text analysis software like [[wikipedia:Sketch_Engine|Sketch Engine]] makes this easier.
 +
===Bilingual dictionaries===
 +
====Arabic / اَلْعَرَبِيَّةُ====
 +
* English/Arabic scientific dictionaries can be found [https://archive.org/details/DictionaryOfScientificArabicTermsScanned/mode/1up here], [https://archive.org/details/englisharabicscientificdictionary here], and [https://archive.org/details/DictionaryOfScientificTerms here].
 +
* French/Arabic scientific dictionaries can be found [https://archive.org/details/DictionnaireArabeFrancais here] and [https://archive.org/details/DictionnaireFrancaisArabe here].
 +
====French / Français====
 +
* A French/English scientific dictionary is available [https://archive.org/details/dictionnaire-scientifique-anglais-francais-3rd-edi here].
 +
* French/Arabic scientific dictionaries can be found [https://archive.org/details/DictionnaireArabeFrancais here] and [https://archive.org/details/DictionnaireFrancaisArabe here].
 +
====Haitian Creole / Kreyòl Ayisyen====
 +
* The latest edition of this [https://educavision.com/book/englishhaitiancreolesciencedictionary7 English/Haitian Creole Science Dictionary] is not digitized, but an earlier edition is available online [https://archive.org/details/kreyol-science here].
 +
* A Haitian-authored English/Haitian Creole medical dictionary is available [https://archive.org/details/diksyone-medikal here]. An earlier American-authored dictionary is available [https://kuscholarworks.ku.edu/handle/1808/10892 here].
 +
====Kiswahili / Swahili====
 +
* A medical dictionary, which gives terms in English and definitions in Kiswahili, can be [https://archive.org/details/kamusi-ya-tiba found here].
 +
 +
====Russian / русский язык====
 +
* [https://archive.org/details/anglorusskiitolk0000grig Here] is an English/Russian computing dictionary (from 1997).
 +
* Internet Archive also has English/Russian [https://archive.org/details/anglorusskiiekon0000unse economics], [https://archive.org/details/anglorusskiislov0000lugi electrical engineering], [https://archive.org/details/anglorusskiiirus0000gors_q5m1 geographical], [https://archive.org/details/anglorusskiislov0000unse mathematics], and [https://archive.org/details/anglorusskiiirus0000bolo medical] dictionaries.
 +
====Spanish / Español====
 +
* An English/Spanish scientific and technical dictionary is available [https://www.lexicool.com/dictionary.asp?ID=JE3WE811078 here].
 +
* An English/Spanish medical dictionary can be found on [https://archive.org/details/englishspanishsp3rderoge Internet Archive], though a newer edition of the same dictionary is available elsewhere.
 +
 +
===Dictionaries in 3 or more languages===
 +
* An [https://archive.org/details/LeDictionnaireFrancaisArabeAlQammoussTerminologieScientifiqueEtTechnique/mode/1up English/French/Arabic] scientific and technical dictionary.
 +
* Here are [https://newyorkscienceteacher.com/sci/pages/esl/chem-bi.php high school-level glossaries for chemistry] with translations from English to 7 other languages: Bengali, Haitian Creole, Korean, Mandarin Chinese, Polish, Russian, and Spanish.
 +
* Here are [https://newyorkscienceteacher.com/sci/pages/esl/es-bi.php high school-level glossaries for earth sciences] with translations from English to the same 7 languages as above.
 +
* The [https://wiki.seg.org/wiki/Encyclopedic_Dictionary_of_Applied_Geophysics Encyclopedic Dictionary of Applied Geophysics] is available in English and Spanish, as well as partially in Mandarin, Arabic, and Russian.
 +
* Here are [https://newyorkscienceteacher.com/sci/pages/esl/bio-bi.php high school-level glossaries for biology] with translations from English to 6 other languages: Bengali, Haitian Creole, Korean, Mandarin Chinese, Polish, and Russian.
 +
* An [https://archive.org/details/DTIC_ADA095571/mode/1up aeronautics dictionary in 10 languages]: English, French, Dutch, German, Greek, Italian, Portuguese, Turkish, Spanish, and Russian. However, it is from 1980.
 +
 +
===Sign languages===
 +
* [https://docs.google.com/spreadsheets/d/1bVaZVIx9tFzH6KFo0GPqlzkWwXwXR8Ye_Yr0kwQ2b2w/edit#gid=0 This Google Sheet] collects translations of basic astronomical terms into several different global sign languages.
 +
* [http://sion.frm.utn.edu.ar/iau-inclusion/wp-content/uploads/2017/11/Dictionary-english.pdf Here] is a free copy of ''Hands in the Stars: Encyclopedic Dictionary of Astronomy for French Sign Language (LSF)''.
  
 
== Other ==
 
Line 23: Line 72:
 
* [https://isidore.science ISIDORE] is a multilingual (English, Spanish and French) search engine providing access to digital data from the Humanities and Social Sciences (SSH).
 
 
* [https://elsst.cessda.eu ELSST], the European Language Social Science Thesaurus, is a broad-based, multilingual (Danish, Dutch, Czech, English, Finnish, French, German, Greek, Lithuanian, Norwegian, Romanian, Slovenian, Spanish, and Swedish) thesaurus for the social sciences.
 +
* [https://github.com/btrettel/transcheck transcheck], a LaTeX package to help produce translations by adding useful macros and checks to ensure that nothing is accidentally skipped.
 +
* [https://freedict.org/ FreeDict], a group that makes many dictionaries with free licenses.
 +
* [https://www.base-search.net/ BASE], a search engine for Open Access literature, has an option for multilingual search. Its power is limited by the vocabularies used.

Latest revision as of 08:34, 24 December 2022

This page lists tools to help with translations. The tools for software are not directly usable for translating scientific articles or abstracts, but are listed as they could be a starting point for developing such tools and are often free software, so we could build on them.

Machine translation

  • DeepL: at least for the languages I know (English, German, Dutch), the most accurate machine translation tool.
  • LibreTranslate translates less well, but is free software, has an API, and can be self-hosted.
  • Lingva scrapes Google Translate and retrieves the translation without using any Google-related service, which prevents Google from tracking the user. Google will naturally still store the texts.
  • You can find a collection of Language Technology tools at the European Language Grid; all translation tools there are machine translation tools.
  • The EU has a Machine translation for public administrations tool; it can also be used by universities and small and medium-sized enterprises after registration.
  • No Language Left Behind. Maybe it is a bit early to call this a tool, but the code is available and various datasets can be downloaded. It does not sound particularly user-friendly yet, but it translates between more than 200 languages. Background article.
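
To illustrate what "has an API" means in practice for LibreTranslate, here is a minimal Python sketch that prepares a call to its `/translate` endpoint. The helper names `build_translate_request` and `translate` are made up for this example, and the base URL is an assumption: it has to point at whichever instance you host or are allowed to use, and some instances require an API key.

```python
import json
import urllib.request


def build_translate_request(base_url, text, source, target, api_key=None):
    """Build (but do not send) a POST request for the /translate endpoint.

    Hypothetical helper for illustration; base_url is an assumption and
    must point at a LibreTranslate instance you are allowed to use.
    """
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:
        # Only some instances require a key; self-hosted ones often do not.
        payload["api_key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(base_url, text, source, target, api_key=None):
    """Send the request and return the translated string from the JSON reply."""
    request = build_translate_request(base_url, text, source, target, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["translatedText"]
```

With a self-hosted instance running locally (the address is an assumption), a call would look like `translate("http://localhost:5000", "Open science", "en", "de")`. Building the request separately from sending it makes it possible to inspect the payload without network access.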

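Language identification

  • fastText has a language identification algorithm, which was used by FORCE11 to study the entire CrossRef dataset.
  • Detect the language of a text with Python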

Computer Assisted Translation

These are tools that import and export typical document formats as well as bilingual exchange formats for review and translation, such as XLIFF.

  • MateCAT is a free and open source online CAT tool. It’s free for translation companies, translators and enterprise users.
  • OmegaT is a fast, professional libre tool written in Java. Multiple users can translate together, with conflicts resolved as in a Git repository.
  • SDL Trados Studio is a commercial tool with a large market share.
  • memoQ is proprietary software.
  • Across is proprietary software. Freelance translators can acquire the Basic Edition of the single-user version for free.
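
To make the idea of a bilingual exchange format more concrete, here is a small Python sketch that reads source/target pairs from an XLIFF 1.2 document using only the standard library. The sample document, its file name, and the function names `read_units` and `untranslated_ids` are invented for illustration; files exported by real CAT tools usually carry more metadata and inline markup.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents; "x" is just a local prefix for lookups.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: one translated segment and one still untranslated.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="abstract.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Climate data</source>
        <target>Klimadaten</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Open science</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return a list of (id, source, target) tuples; target is None if missing."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.findtext("x:source", default="", namespaces=XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        units.append((unit.get("id"), source, target.text if target is not None else None))
    return units


def untranslated_ids(xliff_text):
    """List the ids of segments that have no target element yet."""
    return [uid for uid, _source, target in read_units(xliff_text) if target is None]
```

Calling `read_units(SAMPLE)` yields one tuple per segment, and `untranslated_ids(SAMPLE)` shows which segments a reviewer still has to handle. This is the kind of bookkeeping the CAT tools above do on a much larger scale.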

Software translation tools

These are tools for translating software. They typically handle short texts, integrate with Git repositories, and read and write the standard file formats used for the internationalization of software and websites.

  • Weblate is a libre web-based translation tool for software.
  • TranslateWiki is a software translation tool used for MediaWiki, the software behind Wikipedia, and also by other projects.
  • Transifex is a proprietary, web-based translation platform. It targets technical projects with frequently updated content, such as software, documentation, and websites, and encourages the automation of the localization workflow by integrating with the tools used by developers.
  • Crowdin is a commercial platform, but free for open source and academic use.

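Translation workflows

  • Translation workflow used by The Turing Way, a project of The Alan Turing Institute.
  • An advanced translation workflow for a book on the German education system.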

Multilingual dictionaries

These are dictionaries which offer translations of words and terms between two or more languages. Some specialized multilingual dictionaries, which focus on scientific, medical, academic, and other terminology, are listed below. All entries include a link to a freely available digital version. Many general digital dictionaries can be found at MultiDict. Building and maintaining such dictionaries is essential for translating scientific studies; text analysis software like Sketch Engine makes this easier.

Bilingual dictionaries

Arabic / اَلْعَرَبِيَّةُ

  • English/Arabic scientific dictionaries can be found here, here, and here.
  • French/Arabic scientific dictionaries can be found here and here.

French / Français

  • A French/English scientific dictionary is available here.
  • French/Arabic scientific dictionaries can be found here and here.

Haitian Creole / Kreyòl Ayisyen

  • The latest edition of this English/Haitian Creole Science Dictionary is not digitized, but an earlier edition is available online here.
  • A Haitian-authored English/Haitian Creole medical dictionary is available here. An earlier American-authored dictionary is available here.

Kiswahili / Swahili

  • A medical dictionary, which gives terms in English and definitions in Kiswahili, can be found here.

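Russian / русский язык

  • Here is an English/Russian computing dictionary (from 1997).
  • Internet Archive also has English/Russian economics, electrical engineering, geographical, mathematics, and medical dictionaries.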

Spanish / Español

  • An English/Spanish scientific and technical dictionary is available here.
  • An English/Spanish medical dictionary can be found on Internet Archive, though a newer edition of the same dictionary is available elsewhere.

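Dictionaries in 3 or more languages

  • An English/French/Arabic scientific and technical dictionary.
  • Here are high school-level glossaries for chemistry with translations from English to 7 other languages: Bengali, Haitian Creole, Korean, Mandarin Chinese, Polish, Russian, and Spanish.
  • Here are high school-level glossaries for earth sciences with translations from English to the same 7 languages as above.
  • The Encyclopedic Dictionary of Applied Geophysics is available in English and Spanish, as well as partially in Mandarin, Arabic, and Russian.
  • Here are high school-level glossaries for biology with translations from English to 6 other languages: Bengali, Haitian Creole, Korean, Mandarin Chinese, Polish, and Russian.
  • An aeronautics dictionary in 10 languages: English, French, Dutch, German, Greek, Italian, Portuguese, Turkish, Spanish, and Russian. However, it is from 1980.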

Sign languages

  • This Google Sheet collects translations of basic astronomical terms into several different global sign languages.
  • Here is a free copy of Hands in the Stars: Encyclopedic Dictionary of Astronomy for French Sign Language (LSF).

Other

  • Scribe is an editing tool for underserved language Wikipedias. This tool will allow editors to have a base to start with when translation is not possible.
  • ISIDORE is a multilingual (English, Spanish and French) search engine providing access to digital data from the Humanities and Social Sciences (SSH).
  • ELSST, the European Language Social Science Thesaurus, is a broad-based, multilingual (Danish, Dutch, Czech, English, Finnish, French, German, Greek, Lithuanian, Norwegian, Romanian, Slovenian, Spanish, and Swedish) thesaurus for the social sciences.
  • transcheck, a LaTeX package to help produce translations by adding useful macros and checks to ensure that nothing is accidentally skipped.
  • FreeDict, a group that makes many dictionaries with free licenses.
  • BASE, a search engine for Open Access literature, has an option for multilingual search. Its power is limited by the vocabularies used.