Switchboard for translated scientific articles

From Translate Science
Revision as of 00:17, 11 April 2021 by VictorVenema (talk | contribs) (First draft of a page on a translations switchboard. Proofreading tomorrow.)

Introduction

Translated scientific articles open science to regular people, science enthusiasts, activists, advisors, educators, trainers, consultants, architects, doctors, journalists, planners, administrators, technicians and scientists. This page describes an idea for a tool that makes translations easier to find and thus makes producing them more worthwhile.

In its simplest form, users should be able to search by Digital Object Identifier (DOI), reference or OpenURL and be presented with a list of links to translations.

Searching by topic would also be useful, as translated articles tend to be the more important ones in a field. In addition, a directory of static pages would let search engines crawl the metadata on the translations. The database should also be accessible via an Application Programming Interface (API) so that other tools and webpages can automatically add information on any translations.

People or organizations that have made or hold translations should be able to upload lists with links. There were similar databases during the Cold War to keep up with Soviet research, and we want to try to rescue their datasets and include them in our database. Many research libraries, international organizations and research institutes (WMO, UK Met Office, ...) have translated articles.

The expensive organizations that maintained these databases and made translations collapsed after the Cold War. In the internet age, we can maintain a database more cost-effectively with global volunteers, as Wikipedia has demonstrated. Translating has also become much easier, as a reasonable first draft can often be provided by machine learning. And we can now network people who only occasionally make translations (of their own articles).

Users and editors should be given moderation tools. With versioning it should be easy to revert vandalism or spamming. White lists of known scientific repositories and black lists of known spammers will be maintained. Open Access search engines, such as BASE or CORE, may also have lists of scientific repositories. When a certain number of translations from one website (e.g., NOAA) have been accepted, that site could be white listed.
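
The white-listing rule could be sketched as follows. This is only an illustration: the class name `HostTrust`, the default threshold of 5 and the status labels are assumptions, not decisions made on this page.

```python
from collections import Counter
from urllib.parse import urlsplit


class HostTrust:
    """Tracks accepted submissions per host (hypothetical helper)."""

    def __init__(self, threshold=5, blacklist=()):
        self.threshold = threshold          # assumed value, to be tuned by editors
        self.blacklist = set(blacklist)     # hosts of known spammers
        self.accepted = Counter()           # accepted submissions per host
        self.whitelist = set()              # hosts that no longer need review

    @staticmethod
    def host_of(url):
        # urlsplit lowercases the hostname; it is None for malformed URLs.
        return urlsplit(url).hostname or ""

    def record_accepted(self, url):
        """Call this when an editor accepts a submitted translation URL."""
        host = self.host_of(url)
        if not host or host in self.blacklist:
            return
        self.accepted[host] += 1
        if self.accepted[host] >= self.threshold:
            self.whitelist.add(host)

    def status(self, url):
        """Classify a newly submitted URL."""
        host = self.host_of(url)
        if host in self.blacklist:
            return "rejected"
        if host in self.whitelist:
            return "trusted"
        return "needs_review"
```

Counting per host rather than per page keeps the rule simple; whether subdomains of one organization should share a count is an open question.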

If there are multiple translations into one language, editors or users should be able to rank them and indicate which one is best, if only because external systems using our information may be able to accept only one translation per language.
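
As a minimal sketch of how a single translation per language could be handed to external systems, assume that the ranking is stored as a simple vote count. The `Candidate` type and its `votes` field are hypothetical; the real ranking mechanism still has to be decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    language: str   # language code of the translation, e.g. "es"
    url: str        # link to the translation
    votes: int      # assumed ranking signal set by editors or users


def best_per_language(candidates):
    """Return a dict mapping each language to its highest-ranked candidate.

    On a tie the candidate that was listed first is kept.
    """
    best = {}
    for candidate in candidates:
        current = best.get(candidate.language)
        if current is None or candidate.votes > current.votes:
            best[candidate.language] = candidate
    return best
```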

A "talk page", similar to those on Wikipedia, could be useful to allow users to point to problems, discuss which translations are best and which quality flags need to be set, and possibly even to jointly make a better translation. Copying Wikipedia's idea of a page with recent changes can help with quality control. Such a page can be filtered in several ways, e.g. for contributions by new people. If someone made a problematic submission, a look at their user page may reveal more.

This page mainly describes the technical aspects of such a Translations Switchboard, but there is also a human aspect. We will need a community of editors for every language to check submitted URLs, to avoid spam, and to select the best version when multiple ones are available. We will also need publicity so that people know about the service. Part of the publicity could come from integrating our system into others; see below. We will need volunteers who contact possible sources of translations and promote the production of translations in their circles.

Technical details

API

The core of the translations switchboard is a database with an API that allows people to query the existence of translations and upload information on translations. The API of CrossRef could serve as inspiration. Combined queries, e.g. DOI and language, should be possible. To avoid problems with copyright and with hosting large datasets, we will not host the translations ourselves, but give users the URL where they can be found. This API will also be used for our own homepage, where people can search for articles.
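
A combined query by DOI and language could look roughly like the sketch below. It uses an in-memory list instead of a real database, and the names `Translation`, `normalize_doi` and `find_translations` are assumptions made for illustration, not a proposed API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Translation:
    original_doi: str   # DOI of the original article
    language: str       # language code of the translation, e.g. "es"
    url: str            # where the translation is hosted; only the link is stored


def normalize_doi(doi: str) -> str:
    """Strip a resolver prefix and lowercase; DOIs are case-insensitive."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def find_translations(records, doi: str, language: Optional[str] = None):
    """Return the records for a DOI, optionally restricted to one language."""
    wanted = normalize_doi(doi)
    return [
        record for record in records
        if normalize_doi(record.original_doi) == wanted
        and (language is None or record.language == language)
    ]
```

Normalizing the DOI before comparing means that a query copied from a link (with the `https://doi.org/` prefix or different capitalization) still finds the stored record.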

It would be nice if people could use a hash to inquire whether we have a translation. That way they would reveal less private information. This matters especially when our API is used by a browser add-on that sends a request for every web page on which a DOI is mentioned.
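
One way such a hashed lookup might work is sketched below, assuming SHA-256 over the lowercased DOI; the choice of hash function and the `HashedIndex` class are assumptions. Because DOIs are public, anyone with a list of DOIs can recompute the digests, so this reduces what is exposed in transit and in logs rather than giving strong anonymity.

```python
import hashlib


def normalize_doi(doi: str) -> str:
    """DOIs are case-insensitive, so hash a canonical lowercase form."""
    return doi.strip().lower()


def doi_digest(doi: str) -> str:
    """SHA-256 hex digest of the normalized DOI (computed by the client)."""
    return hashlib.sha256(normalize_doi(doi).encode("utf-8")).hexdigest()


class HashedIndex:
    """Server-side index keyed by digest instead of by plain DOI."""

    def __init__(self):
        self._urls_by_digest = {}

    def add(self, doi: str, url: str) -> None:
        self._urls_by_digest.setdefault(doi_digest(doi), []).append(url)

    def lookup(self, digest: str):
        """Return the translation URLs for a digest, or an empty list."""
        return list(self._urls_by_digest.get(digest, []))
```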

Translation journals

If no translation is found, the homepage will link to this Wiki for advice on finding translations, and the user will be encouraged to also search for the ISSN (International Standard Serial Number) of the journal. Many journals were regularly translated and published in translation journals. A search for these journals by ISSN and year could indicate which library may have the translations, even if they are not online. The Library of Congress would be a good start for this ISSN database.
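
Before searching by ISSN, the search form could catch typing errors by verifying the ISSN check digit. The sketch below uses the standard rule (weights 8 down to 2 on the first seven digits, modulo 11, with "X" standing for 10); the function names are only illustrative.

```python
def issn_check_digit(first_seven: str) -> str:
    """Compute the check character for the first seven digits of an ISSN."""
    if len(first_seven) != 7 or not (first_seven.isascii() and first_seven.isdigit()):
        raise ValueError("expected exactly seven ASCII digits")
    # Weights run from 8 for the first digit down to 2 for the seventh.
    total = sum(int(digit) * weight
                for digit, weight in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Accept forms such as '0317-8471' or '03178471'."""
    compact = issn.strip().replace("-", "").upper()
    if len(compact) != 8:
        return False
    try:
        return issn_check_digit(compact[:7]) == compact[7]
    except ValueError:
        return False
```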

Languages

The languages of the original and the translation will be stored using the ISO 639 language codes. The editors doing the quality control can also indicate the languages they master using these codes. Users with an account could see the languages they master at the top. For articles with multiple translations, the ISO codes could be used as quick links at the top of the search results page.

Disciplines

There are disciplinary ontologies that categorize the topics of journals. Examples are the open, hierarchical, three-level classification tree of Science-Metrix[1][2] and the proprietary systems of ISI, CHI and ERA. (Science-Metrix uses the Office Open XML format.) They could provide a good estimate of the discipline for a large number of the translated articles, which could then be matched to the expertise of our editors.

Uploading information

Adding articles to the database should be as user-friendly as possible. If both the original and the translation have a DOI, these two DOIs and the language of the translation may be the only information we need (the language of the original is known via CrossRef).
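
A minimal check of such a three-field submission might look like this. The patterns are deliberately loose assumptions: the DOI pattern only checks the general "10.prefix/suffix" shape, and the language pattern only checks for a two- or three-letter lowercase code without looking it up in ISO 639.

```python
import re

# Loose shape checks, not authoritative definitions (see the note above).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
LANGUAGE_PATTERN = re.compile(r"^[a-z]{2,3}$")


def validate_submission(original_doi, translation_doi, language):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not DOI_PATTERN.match(original_doi.strip()):
        problems.append("original DOI does not look like a DOI")
    if not DOI_PATTERN.match(translation_doi.strip()):
        problems.append("translation DOI does not look like a DOI")
    if original_doi.strip().lower() == translation_doi.strip().lower():
        problems.append("original and translation have the same DOI")
    if not LANGUAGE_PATTERN.match(language.strip()):
        problems.append("language should be a 2- or 3-letter lowercase code")
    return problems
```

Returning all problems at once, rather than stopping at the first, lets the upload form show the user everything that needs fixing in one go.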

However, some articles and other scientific documents do not have a DOI, especially in the older literature. It may be possible to let users give us the two references in free format and parse them into machine-readable bibliographic data with tools such as AnyStyle.

Furthermore, it should be possible to upload multiple translations simultaneously. We should ask research libraries and institutes which method they prefer for bulk upload.

Integrations with other systems

Integrations with other systems are important: they help the users and spread the word. We should collaborate with reference managers, repositories, publishing systems and peer review systems so that they show translations if they are available.

A WordPress plugin and a browser add-on that automatically alert readers to translations of the original articles mentioned on a webpage would be useful.

How to put translations in Wikidata should be discussed with that community. The most fitting group seems to be Project Source Metadata. The procedure for donating our data should be discussed with the group on Data Donations.

There are still only a handful of translations on Wikidata, but we could download them using the Wikidata API.

CrossRef has around two thousand translations in their database, and regularly checking their API for new ones is worthwhile. CrossRef is considering also including data from non-members (non-publishers) in their database; they would be happy to use our data.

We could use the OpenURL resolver to integrate with other software (e.g., reference managers such as Zotero), so that they could show translations if available. There is an implementation of OpenURL at CrossRef, which we could use for inspiration.
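
To illustrate what other software would send us, the sketch below builds an OpenURL 1.0 query in key/encoded-value form for a DOI. The resolver address is a placeholder, and which OpenURL keys our resolver would accept is an assumption that still has to be worked out.

```python
from urllib.parse import urlencode


def openurl_for_doi(resolver_base: str, doi: str) -> str:
    """Build an OpenURL that identifies the referent by its DOI."""
    query = urlencode({
        "url_ver": "Z39.88-2004",       # OpenURL 1.0 version marker
        "rft_id": f"info:doi/{doi}",    # referent identifier as an info URI
    })
    return f"{resolver_base}?{query}"


# Hypothetical resolver address, used only for illustration.
example = openurl_for_doi("https://switchboard.example.org/openurl",
                          "10.1234/example.1")
```

`urlencode` percent-encodes the colon and slashes in the identifier, so the receiving side has to decode the query string before reading the DOI.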

Points to ponder

How hard would it be to make the system distributed, that is, to have multiple such servers that talk to each other and exchange data if they trust each other? In that case a bulk download would be a good way to get a new server started, although initially we will not have much data, so using the API to download the entire dataset would not be that cumbersome.

It could be worthwhile to make a (private) backup of the known translations and regularly check for broken links. The backup can help the editors find the new location of the translation or to upload it elsewhere if the license allows for this.
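
The regular check for broken links could be as simple as the following sketch, which uses only the Python standard library. Treating every status of 400 or higher as broken is an assumption; some servers reject HEAD requests, so a real checker may need to retry with GET.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request failed."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code               # e.g. 404 or 410
    except (OSError, ValueError):
        return 0                        # DNS failure, timeout, malformed URL


def find_broken_links(urls, fetch_status=http_status):
    """Return (url, status) pairs for links that appear to be broken.

    `fetch_status` can be replaced, e.g. by a stub when testing offline.
    """
    broken = []
    for url in urls:
        status = fetch_status(url)
        if status == 0 or status >= 400:
            broken.append((url, status))
    return broken
```

The resulting list could feed a page for editors, who can then use the backup to find the new location of a translation.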

It may be a good idea to have multiple types of links to translations: literal translations, but also related works in another language, for example a PhD thesis in language X and a corresponding article in language Y. Links to partial translations can also be valuable and promote their completion.