Difference between revisions of "Switchboard for translated scientific articles"
VictorVenema (talk | contribs) (First draft of a page on a translations switchboard. Proofreading tomorrow.) |
VictorVenema (talk | contribs) (First version) |
Revision as of 17:15, 12 April 2021
Introduction
Translated scientific articles open science to regular people, science enthusiasts, activists, advisors, educators, trainers, consultants, architects, doctors, journalists, planners, administrators, technicians and scientists. This page describes an idea for a tool to make it easier to find translations, which makes it more worthwhile to produce translations.
In its simplest form, users should be able to search using a Digital Object Identifier (DOI), a title, a reference or an OpenURL and be presented with a list of links to translations.
Searching by topic would also be useful, as translated articles tend to be the more important ones in a field. In addition, a topic directory with a static page for every translation would let search engines crawl the metadata on the translations. The database should also be accessible via an Application Programming Interface (API) so that other tools and webpages can automatically display or add information on any translations.
People or organizations who made or have translations should be able to upload lists with links. There were similar databases during the Cold War to keep up with Soviet research and we want to try to rescue their datasets and upload them to our database. Many research libraries, international organizations and research institutes (WMO, UK Met Office, ...) have translated articles, which should be included.
The expensive organizations maintaining these databases and making translations collapsed after the Cold War. In the internet age, we can maintain large knowledge bases more cost-effectively with global volunteers, as Wikipedia has demonstrated, and include many more languages. Translating has also become much easier, as machine learning can often provide a reasonable first draft. And we can now network people who only occasionally make translations (of their own articles).
Not every contribution will be perfect. Users and editors of such a database should be given moderation tools. With versioning it should be easy to revert vandalism or spamming. Green lists of known scientific repositories and red lists of known spammers could be maintained. Open Access search engines, such as BASE or CORE, may also have lists of scientific repositories. When a certain number of translations from one website (e.g., NOAA) have been accepted, that site could be green listed.
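As a minimal sketch of how green and red lists might be applied to a submitted URL: the host names, the function name `moderation_status` and the three outcomes are assumptions made for illustration, not part of the proposal.

```python
from urllib.parse import urlsplit

# Hypothetical example lists; real lists would be maintained by the editors.
GREEN_HOSTS = {"repository.example.org"}
RED_HOSTS = {"spam.example.com"}


def moderation_status(url: str) -> str:
    """Return 'accept', 'reject' or 'review' for a submitted translation URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host in RED_HOSTS:
        return "reject"
    if host in GREEN_HOSTS:
        return "accept"
    # Unknown hosts are left for a human editor to check.
    return "review"
```

In this sketch anything not on a list falls through to manual review, so the lists only reduce the editors' workload and never replace their judgment.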
If there are multiple translations for a language, editors or users should be able to indicate which one is best, to rank them. This matters if only because external systems using our information may be designed to accept only one translation per language, as that will be the most typical case.
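One way to serve such external systems could be to reduce the ranked list to a single entry per language. The sketch assumes a hypothetical `Translation` record with a numeric `score` set by editors; how ranking is actually stored is an open design question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    url: str
    language: str  # ISO 639 language code
    score: int     # hypothetical ranking score assigned by editors


def best_per_language(translations):
    """Keep only the highest-scoring translation for each language."""
    best = {}
    for item in translations:
        current = best.get(item.language)
        if current is None or item.score > current.score:
            best[item.language] = item
    return best
```

When two translations have the same score, the one listed first is kept.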
A "talk page", similar to Wikipedia's, could be useful to allow users to point out problems, discuss which translations are best and which quality flags need to be set, and possibly even organize to jointly make a better translation. This could be implemented with a commenting or forum system in a background tab. Copying Wikipedia's idea of a page with recent changes can help with quality control. Such a page can be filtered in several ways, e.g., for contributions by new people. If someone has made a problematic contribution, a look at their user page may reveal more.
This page mainly describes the technical aspects of such a Translations Switchboard, but there is also a human aspect. We will need a community of editors for every language to check submitted URLs to avoid spam and select the best version in case multiple ones are available. Also we will need publicity so that people know about the service. Part of the advertising could function via integration of our system in others; see below. We will need volunteers who contact possible sources of translations and promote the production of translations in their circles.
Technical details
API
The core of the translations switchboard is a database with an API that allows people to query the existence of translations and upload information on translations. The API of CrossRef could serve as inspiration. Combined queries, e.g., DOI and language, should be possible. To avoid problems with copyright and with hosting large datasets, we will not host the translations ourselves, but give users the URL where they can be found. This API will also be used for our own homepage, where people can search for articles.
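A combined query could look roughly like the following in-memory sketch. The record fields (`doi`, `language`, `url`), the sample data and the function name `find_translations` are assumptions; a real service would run the same filter against its database.

```python
# Hypothetical sample records; the switchboard stores links, not the files.
RECORDS = [
    {"doi": "10.1234/example.1", "language": "de",
     "url": "https://repo.example.org/a-de.pdf"},
    {"doi": "10.1234/example.1", "language": "fr",
     "url": "https://repo.example.org/a-fr.pdf"},
    {"doi": "10.1234/example.2", "language": "de",
     "url": "https://repo.example.org/b-de.pdf"},
]


def find_translations(records, doi=None, language=None):
    """Filter records by DOI, by language, or by both combined."""
    # DOIs are case-insensitive, so compare them in lower case.
    doi_key = doi.strip().lower() if doi else None
    return [
        record for record in records
        if (doi_key is None or record["doi"].lower() == doi_key)
        and (language is None or record["language"] == language)
    ]
```

Leaving a parameter out simply drops that condition, so the same function covers DOI-only, language-only and combined queries.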
It would be nice if people could use a hash to inquire whether we have a translation. That way they would reveal less private information. This matters especially when our API is used by a browser add-on that sends a request for every web page on which a DOI is mentioned.
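A minimal sketch of such a hashed lookup key, assuming SHA-256 over the trimmed, lower-cased DOI. The choice of hash function, the normalization and the idea of sending only a short prefix are all assumptions made for illustration.

```python
import hashlib


def doi_digest(doi: str) -> str:
    """Return the SHA-256 hex digest of a normalized DOI."""
    normalized = doi.strip().lower()  # DOIs are case-insensitive
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def digest_prefix(doi: str, length: int = 6) -> str:
    """Return only the first hex characters of the digest.

    A client could send this short prefix, receive all records whose digest
    starts with it, and do the final comparison locally.
    """
    return doi_digest(doi)[:length]
```

Because the set of existing DOIs is public, a full digest could in principle be reversed by hashing known DOIs. Sending only a prefix, so that many DOIs share the same request, might give better privacy, at the cost of larger responses.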
Translation journals
In case no translation is found, the homepage will link to this Wiki to provide advice on finding translations and the user will be encouraged to search for the ISSN of the journal as well. For many journals, regular translations were made and published in translation journals. A search for these journals by ISSN and year could indicate whether there is a translation journal and which library may have the translations, even if they are not online. Data from the Library of Congress would be a good start for this ISSN database.
Languages
The languages of the original and the translation will be stored using the ISO systems for languages. The editors doing the quality control can also indicate the languages they master using the ISO systems. Users with an account could see translations in the languages they can read at the top. For articles with multiple translations the ISO codes could be used as quick links at the top of the search results page.
Disciplines
There are disciplinary ontologies that categorize the topics of journals, for example the open, hierarchical, three-level classification tree system of Science-Metrix[1][2], as well as the proprietary systems of ISI, A&HCI and ERA. (Science-Metrix uses the Office Open XML format.) They could provide good estimates of the discipline for a large number of the translated articles, which could then be matched to the expertise of our editors.
Uploading information
Adding articles to the database should be as user friendly as possible. If the original has a DOI, then that DOI, the URL of the translation and the language of the translation may be the only information we need. (The language of the original is mostly known via CrossRef. Maybe we can make or find a tool that estimates the language of the translation.)
But there are articles and other scientific documents that do not have a DOI, especially in the older literature. It may be possible to allow users to give us references in a free format and parse them into machine-readable bibliographic data with tools such as AnyStyle.
Furthermore, it should be possible to upload multiple translations simultaneously. We should ask research libraries and institutes which method they prefer for such bulk uploads.
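For the simple case of an original with a DOI, a submission could be checked along these lines. The regular expressions are deliberately loose, and the function name `validate_submission` and its return convention are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Loose DOI shape: "10.", a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Two- or three-letter lower-case codes, as used in ISO 639-1 and ISO 639-3.
LANGUAGE_PATTERN = re.compile(r"^[a-z]{2,3}$")


def validate_submission(doi: str, url: str, language: str) -> list:
    """Return the names of fields that look wrong; an empty list means OK."""
    problems = []
    if not DOI_PATTERN.match(doi.strip()):
        problems.append("doi")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        problems.append("url")
    if not LANGUAGE_PATTERN.match(language.strip()):
        problems.append("language")
    return problems
```

This only checks the shape of the three fields. Whether the DOI exists and whether the URL really points to a translation would still need a lookup and an editor.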
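One possible bulk format is a CSV file with one translation per row. The sketch below assumes hypothetical column names `doi`, `url` and `language`; the format libraries and institutes actually prefer is still to be determined.

```python
import csv
import io

REQUIRED = ("doi", "url", "language")  # hypothetical column names


def parse_bulk_upload(text: str):
    """Split CSV text into usable rows and (line number, missing fields) errors."""
    accepted, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        missing = [name for name in REQUIRED
                   if not (row.get(name) or "").strip()]
        if missing:
            # Report the line so the uploader can fix that row.
            errors.append((reader.line_num, missing))
        else:
            accepted.append({name: row[name].strip() for name in REQUIRED})
    return accepted, errors
```

Returning the incomplete rows separately, instead of rejecting the whole file, would let a library fix a few entries without re-uploading everything.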
Integrations with other systems
Integrations with other systems are important: they help the users and spread the word. We should collaborate with the organizations behind reference managers, repositories, publishing systems and peer review systems so that they show translations if they are available.
A WordPress plugin and a browser add-on that automatically alert users to translations of the original articles mentioned on a webpage would be useful.
How to put translations in Wikidata should be discussed with that community. The most fitting group to ask seems to be Project Source Metadata. The procedures for donating our data should be discussed with the group on Data Donations.
There are still only a handful of translations on Wikidata, but with the API of Wikidata for downloading data we could download them.
CrossRef has around two thousand translations in their database, and regularly checking their API for new ones is worthwhile. CrossRef is also considering including data from non-members (non-publishers) in their database, so in the future they could include our data.
We could use the OpenURL resolver to integrate with other software (e.g., reference managers such as Zotero), so that they could show translations if available. There is an implementation of OpenURL at CrossRef, which we could use for inspiration.
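As a rough illustration, a DOI can be carried in an OpenURL 1.0 (Z39.88-2004) query string as an `info:doi/` identifier. The resolver address and the function name `openurl_for_doi` are hypothetical; only the `url_ver` and `rft_id` keys come from the OpenURL standard.

```python
from urllib.parse import urlencode

# Hypothetical resolver address for the switchboard.
RESOLVER_BASE = "https://switchboard.example.org/openurl"


def openurl_for_doi(doi: str, base: str = RESOLVER_BASE) -> str:
    """Build an OpenURL 1.0 key/value query that identifies an article by DOI."""
    query = urlencode({
        "url_ver": "Z39.88-2004",     # OpenURL version identifier
        "rft_id": f"info:doi/{doi}",  # the referent, given as a DOI
    })
    return f"{base}?{query}"
```

A reference manager that already knows how to build OpenURLs would then only need the resolver address to ask the switchboard about translations.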
Points to ponder
How hard would it be to make the system distributed, with multiple servers that talk to each other and exchange data if they trust each other? We are doing this for science, but there are groups outside of science who could use a similar system. (Disciplinary) groups within science may be able to use their networks to promote the production of translations. That would make bulk download of our data a good idea to get a new server started, although initially we will not have that much data, so using the API to download the entire dataset would not be that cumbersome.
It could be worthwhile to make a (private) backup of the known translations and regularly check for broken links. The backup can help the editors find the new location of the translation or to upload it elsewhere if the license allows for this.
It may be a good idea to have multiple types of links to translations: literal translations, but also related works in another language, for example a PhD thesis in language X and a corresponding article in language Y. Links to partial translations can also be valuable, and showing them could promote their completion.