https://wiki.translatescience.org/index.php?title=Collaborative_translation&feed=atom&action=history
Collaborative translation - Revision history
2024-03-28T16:34:34Z
Revision history for this page on the wiki
MediaWiki 1.35.1

https://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=336&oldid=prev
VictorVenema: /* Putting it all together */ Added Python tools
2021-11-23T15:04:23Z
<p><span dir="auto"><span class="autocomment">Putting it all together: </span> Added Python tools</span></p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:04, 23 November 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l13" >Line 13:</td>
<td colspan="2" class="diff-lineno">Line 13:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In future [https://github.com/singlesourcepub/community/wiki/Announcement-Blog Single Source Publishing] will make input and output easier and some journals already provide scientific articles in HTML or XML formats, but for now we will also need tools to work with PDFs. For old articles we will even have to work with PDFs that are just scans of paper articles and need OCR. </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>In future [https://github.com/singlesourcepub/community/wiki/Announcement-Blog Single Source Publishing] will make input and output easier and some journals already provide scientific articles in HTML or XML formats, but for now we will also need tools to work with PDFs. For old articles we will even have to work with PDFs that are just scans of paper articles and need OCR. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>An internet tool that does this is [https://papertohtml.org/ PaperToHTML]. On the Open Science Feed [https://www.reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ someone mentioned several tools] that may work.[https://web.archive.org/web/*/https://www.reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/] There is [https://github.com/kermitt2/grobid Grobid], which converts the PDF to a very detailed XML format, but may not preserve formatting such as italics. Allen AI, the team behind Semantic Scholar, built on Grobid to create [https://github.com/allenai/s2orc-doc2json/ their own tool], which may be easier to use. Also CrossRef has a tool to convert PDFs into JSON called [https://gitlab.com/crossref/pdfextract pdfextract]. A further option is [https://github.com/CeON/CERMINE Cermine]. </div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>An internet tool that does this is [https://papertohtml.org/ PaperToHTML]. On the Open Science Feed [https://www.reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ someone mentioned several tools] that may work.[https://web.archive.org/web/*/https://www.reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/] There is [https://github.com/kermitt2/grobid Grobid], which converts the PDF to a very detailed XML format, but may not preserve formatting such as italics. 
Allen AI, the team behind Semantic Scholar, built on Grobid to create [https://github.com/allenai/s2orc-doc2json/ their own tool], which may be easier to use. Also CrossRef has a tool to convert PDFs into JSON called [https://gitlab.com/crossref/pdfextract pdfextract]. A further option is [https://github.com/CeON/CERMINE Cermine<ins class="diffchange diffchange-inline">]. There are also [https://www.geeksforgeeks.org/extract-text-from-pdf-file-using-python/ Python tools to extract text from PDFs</ins>]. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Both for input and output, the Swiss Army knife of document conversion, [https://pandoc.org/ pandoc], can be helpful. A scientific Markdown editor that relies on pandoc for import and export is [https://www.zettlr.com/ Zettlr]. Scientific markdown could be a good option as the translation format because it is text, so would work well with existing software for code, while it can handle equations, tables, figures and references natively. There is also already [https://mur2.co.uk/editor a collaborative scientific markdown pad]. </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Both for input and output, the Swiss Army knife of document conversion, [https://pandoc.org/ pandoc], can be helpful. A scientific Markdown editor that relies on pandoc for import and export is [https://www.zettlr.com/ Zettlr]. Scientific markdown could be a good option as the translation format because it is text, so would work well with existing software for code, while it can handle equations, tables, figures and references natively. There is also already [https://mur2.co.uk/editor a collaborative scientific markdown pad]. </div></td></tr>
</table>
VictorVenema

https://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=334&oldid=prev
VictorVenema: First complete draft
2021-10-07T01:55:41Z
<p>First complete draft</p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:55, 7 October 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>One of the [[Ideas_for_promoting_translations_in_science|ideas to promote the translation of scientific articles]] is to create a collaborative translation tool. Producing a good translation takes several skills: command of two languages and knowledge of the topic. Such a combination is easier to find in a team, and working in a team motivates. People regularly make partial translations and stop when they know enough for themselves. It would be great if others could finish the job.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>One of the [[Ideas_for_promoting_translations_in_science|ideas to promote the translation of scientific articles]] is to create a collaborative translation tool. Producing a good translation takes several skills: command of two languages and knowledge of the topic. Such a combination is easier to find in a team, and working in a team motivates. People regularly make partial translations and stop when they know enough for themselves. It would be great if others could finish the job.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">== Existing translation tools ==</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine translation]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up and accuracy is very important. So a machine translation is at best a first draft an expert should have a look at. When this quality is good enough, people will not need a collaborative translation tool, nor a translation published in a repository and findable via our translations database.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine translation]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up and accuracy is very important. So a machine translation is at best a first draft an expert should have a look at. When this quality is good enough, people will not need a collaborative translation tool, nor a translation published in a repository and findable via our translations database.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l7" >Line 7:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition, these systems are often free software and could thus be improved. They may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for input and output, while scientific articles have equations, tables, references and figures and are often not even available in a text format, but only as some sort of PDF file.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition, these systems are often free software and could thus be improved. They may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for input and output, while scientific articles have equations, tables, references and figures and are often not even available in a text format, but only as some sort of PDF file.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>So a collaborative translation tool would live on the internet and combine features of the computer aided translation packages and the software translation packages, while having additional tools to parse article <del class="diffchange diffchange-inline">PFDs </del>into a text format. </div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">== Putting it all together ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>So a collaborative translation tool would live on the internet and combine features of the computer aided translation packages and the software translation packages, while having additional tools to parse article <ins class="diffchange diffchange-inline">PDFs </ins>into a text format <ins class="diffchange diffchange-inline">and afterwards put the translated article together</ins>. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>https://<del class="diffchange diffchange-inline">papertohtml</del>.<del class="diffchange diffchange-inline">org</del>/</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">In future [</ins>https://<ins class="diffchange diffchange-inline">github</ins>.<ins class="diffchange diffchange-inline">com</ins>/<ins class="diffchange diffchange-inline">singlesourcepub/community/wiki/Announcement-Blog Single Source Publishing] will make input and output easier and some journals already provide scientific articles in HTML or XML formats, but for now we will also need tools to work with PDFs. For old articles we will even have to work with PDFs that are just scans of paper articles and need OCR. </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Single Source Publishing <nowiki></del>https://github.com/<del class="diffchange diffchange-inline">singlesourcepub</del>/<del class="diffchange diffchange-inline">community</del>/<del class="diffchange diffchange-inline">wiki</del>/<del class="diffchange diffchange-inline">Announcement</del>-<del class="diffchange diffchange-inline">Blog<</del>/<del class="diffchange diffchange-inline">nowiki></del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">An internet tool that does this is [https://papertohtml.org/ PaperToHTML]. On the Open Science Feed [https://www.reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ someone mentioned several tools] that may work.[https://web.archive.org/web/*/https://www.reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/] There is [</ins>https://github.com/<ins class="diffchange diffchange-inline">kermitt2/grobid Grobid], which converts the PDF to a very detailed XML format, but may not preserve formatting such as italics. Allen AI, the team behind Semantic Scholar, built on Grobid to create [https:</ins>//<ins class="diffchange diffchange-inline">github.com/allenai</ins>/<ins class="diffchange diffchange-inline">s2orc</ins>-<ins class="diffchange diffchange-inline">doc2json/ their own tool], which may be easier to use. Also CrossRef has a tool to convert PDFs into JSON called [https://gitlab.com/crossref/pdfextract pdfextract]. 
A further option is [https:</ins>/<ins class="diffchange diffchange-inline">/github.com/CeON/CERMINE Cermine]. </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>pandoc<del class="diffchange diffchange-inline">, OCR, </del>Zettlr</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Both for input and output, the Swiss Army knife of document conversion, [https://pandoc.org/ pandoc], can be helpful. A scientific Markdown editor that relies on </ins>pandoc <ins class="diffchange diffchange-inline">for import and export is [https://www.zettlr.com/ </ins>Zettlr<ins class="diffchange diffchange-inline">]. </ins>Scientific markdown <ins class="diffchange diffchange-inline">could be a good option </ins>as the translation format <ins class="diffchange diffchange-inline">because it </ins>is text, so would work well with existing software for code<ins class="diffchange diffchange-inline">, while it can handle equations, tables, figures and references natively</ins>. <ins class="diffchange diffchange-inline">There </ins>is <ins class="diffchange diffchange-inline">also already [</ins>https://mur2.co.uk/editor <ins class="diffchange diffchange-inline">a collaborative scientific markdown pad]. </ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Scientific markdown as the translation format <del class="diffchange diffchange-inline">(</del>is text, so would work well with existing software for code<del class="diffchange diffchange-inline">)</del>. <del class="diffchange diffchange-inline">This </del>is a <del class="diffchange diffchange-inline">nice collaborative scientific markdown pad. <nowiki></del>https://mur2.co.uk/editor<del class="diffchange diffchange-inline"></nowiki></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The collaborative tool should allow for communication between translators in general (for coordination of the work and community building) and discussions on specific translated sentences. Preferably this communication would work for two people as well as for entire classes jointly translating an article. It should be possible to upload partial translations, and there should be a page showing partial translations where people can help out.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The collaborative tool should allow for communication between translators in general (for coordination of the work and community building) and discussions on specific translated sentences. Preferably this communication would work for two people as well as for entire classes jointly translating an article. It should be possible to upload partial translations, and there should be a page showing partial translations where people can help out.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>It would save a lot of time for many translations to have a first draft translated by machine learning tools. This should be checked by humans for accuracy; a scientific article does not have to be beautiful prose, only clear. The user feedback in interactive machine translations can be used to improve the system and make it better at translating scientific works. The latter would require running the machine translation ourselves. It was suggested that this may require considerable resources (memory, computing power and bandwidth); maybe these could be obtained by collaborating with the European Open Science Cloud.</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>It would save a lot of time for many translations to have a first draft translated by <ins class="diffchange diffchange-inline">[[Translation tools#Machine translation|</ins>machine learning tools<ins class="diffchange diffchange-inline">]]</ins>. This should be checked by humans for accuracy; a scientific article does not have to be beautiful prose, only clear. The user feedback in interactive machine translations can be used to improve the system and make it better at translating scientific works. The latter would require running the machine translation ourselves. It was suggested that this may require considerable resources (memory, computing power and bandwidth); maybe these could be obtained by collaborating with the European Open Science Cloud.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Points to ponder ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Points to ponder ==</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Translator acknowledgement</del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Coming from the academic tradition, I would expect that translators would like to be named. However, we have seen with Wikipedia that many people are willing to work on big projects without getting personal credit, or at least without it being easily noticed. Not connecting your name and reputation to a translation can also lower the barrier to participate. One could try both systems, but I expect that most translators will be academics (which is an assumption we should check) and would like credit</ins>. <ins class="diffchange diffchange-inline">Still, it is </ins>a <ins class="diffchange diffchange-inline">good idea to implement this in a way that makes it easy to opt out of being named.</ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">[https://reddit</del>.<del class="diffchange diffchange-inline">com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ davidpomerenke] made </del>a <del class="diffchange diffchange-inline">useful comment on </del>the <del class="diffchange diffchange-inline">Open Science Feed </del>that would be <del class="diffchange diffchange-inline">helpful with </del>the <del class="diffchange diffchange-inline">input:</del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Another design question is whether to use pure text for </ins>the <ins class="diffchange diffchange-inline">translation or markdown. Using pure text would mean </ins>that <ins class="diffchange diffchange-inline">words which are printed bold or italic would not have this markup in the translation (or this </ins>would <ins class="diffchange diffchange-inline">have to </ins>be <ins class="diffchange diffchange-inline">added in the post-processing after the system outputs an office, LaTeX, XML or markdown file). In the case of equations, tables and figures this would mean that the reader of the translation would need to use the original as well and would only have translation support for captions, axis labels, legends and so on as descriptive text. The advantage is that pure text is much easier for </ins>the <ins class="diffchange diffchange-inline">translator to handle.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> I've recently coded an unpublished project on scientific citation mining, and for that purpose I had looked a bit into tools for converting PDFs </del>to <del class="diffchange diffchange-inline">more useful formats.</del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Using Markdown would tempt the translator </ins>to <ins class="diffchange diffchange-inline">also recreate </ins>the <ins class="diffchange diffchange-inline">equations and tables</ins>. The <ins class="diffchange diffchange-inline">figures </ins>would <ins class="diffchange diffchange-inline">still be as in the original with the translator describing </ins>the <ins class="diffchange diffchange-inline">axes and legends</ins>, <ins class="diffchange diffchange-inline">etc</ins>. <ins class="diffchange diffchange-inline">Making equations and tables </ins>is <ins class="diffchange diffchange-inline">quite involved </ins>and <ins class="diffchange diffchange-inline">costs time even </ins>for <ins class="diffchange diffchange-inline">someone well versed in Markdown</ins>. 
<ins class="diffchange diffchange-inline">Maybe we can set up </ins>the <ins class="diffchange diffchange-inline">system in a way that translators would normally not try to do this by copying equations</ins>, <ins class="diffchange diffchange-inline">tables </ins>and <ins class="diffchange diffchange-inline">figures into </ins>the <ins class="diffchange diffchange-inline">translation unchanged </ins>and <ins class="diffchange diffchange-inline">asking for </ins>a <ins class="diffchange diffchange-inline">description</ins>. <ins class="diffchange diffchange-inline">The use of Markdown for italics and bold </ins>is <ins class="diffchange diffchange-inline">quite </ins>easy to <ins class="diffchange diffchange-inline">learn and, if present in </ins>the <ins class="diffchange diffchange-inline">original, important </ins>to <ins class="diffchange diffchange-inline">reproduce</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> I ended up using [https://github.com/kermitt2/grobid Grobid], which converts </del>the <del class="diffchange diffchange-inline">PDF to a very detailed XML format</del>. The <del class="diffchange diffchange-inline">format is not a word processing format though, but a format specifically for representing scientific documents. I don't know, if it </del>would<del class="diffchange diffchange-inline">, for example, contain tags about bold or italicized text. The tool is working really well, but since you probably cannot use </del>the <del class="diffchange diffchange-inline">output XML format directly, it will need some postprocessing</del>, <del class="diffchange diffchange-inline">which would be relatively simple with XML parsing libraries</del>.</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> An alternative </del>is <del class="diffchange diffchange-inline">[https://gitlab.com/crossref/pdfextract pdfextract] by Crossref. They probably use this to build their own large database. It also works really well </del>and <del class="diffchange diffchange-inline">gives you some JSON that would probably need less postprocessing than Grobid. I didn't use it </del>for <del class="diffchange diffchange-inline">some minor technical reason that I forgot</del>.</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> [https://github.com/allenai/pdffigures2 pdffigures2] is from </del>the <del class="diffchange diffchange-inline">team behind Semantic Scholar</del>, and <del class="diffchange diffchange-inline">they probably use it to extract </del>the <del class="diffchange diffchange-inline">figures that they show in their search engine. It only extracts figures </del>and <del class="diffchange diffchange-inline">their captions and no other things. I don't recall whether the other tools can also extract figures, but if not, then this will be </del>a <del class="diffchange diffchange-inline">perfect supplement</del>.</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> Another alternative that's on my list but that I didn't try </del>is <del class="diffchange diffchange-inline">[https://github.com/CeON/CERMINE Cermine].</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> There are some more tools that specialize in mining only the citations, but I found them to be less powerful (although perhaps more performant) than Grobid.</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> Many publishers also publish a supplementary HTML version these days, which may be an acceptable format or at least </del>easy to <del class="diffchange diffchange-inline">convert to other formats with [https://pandoc.org/ pandoc]. I have also seen that authors upload </del>the <del class="diffchange diffchange-inline">Latex source along with the PDF on Arxiv, but I don't know common that is.</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> Another current project which is not directly related to your question but which you may find cool is [https://scholarphi.org/ ScholarPhi], where they try </del>to <del class="diffchange diffchange-inline">annotate PDFs with useful semantic information</del>.</div></td><td colspan="2"> </td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=332&oldid=prevVictorVenema at 01:36, 6 October 20212021-10-06T01:36:18Z<p></p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:36, 6 October 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine translation]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up and accuracy is very important. So a machine translation is at best a first draft an expert should have a look at. When this quality is good enough, people will not need a collaborative translation tool, nor a translation published in a repository and findable via our translations database.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine translation]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up and accuracy is very important. So a machine translation is at best a first draft an expert should have a look at. When this quality is good enough, people will not need a collaborative translation tool, nor a translation published in a repository and findable via our translations database.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>There are [[Translation tools#Computer Assisted Translation|computer aided translation packages]] to help (teams of) professional translators. These systems work with office file formats and HTML and tend to be proprietary and thus cannot be improved to fit our use case. They do have [[wikipedia:Computer-assisted_translation|many tricks]] that would be worthwhile to implement in a collaborative system as well, such as databases with phrases to ensure they are consistently translated. Also the data formats can be used as inspiration. The only exception to this rule seems to be [[wikipedia:OmegaT|OmegaT]], a FOSS project coded in Java. A remaining problem could be that such systems are intended for professionals and [https://polyglot.city/@Stoori/106670443229138474 may have a steep learning curve].</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>There are [[Translation tools#Computer Assisted Translation|computer aided translation packages]] to help (teams of) professional translators. These systems work with office file formats and HTML and tend to be proprietary and thus cannot be improved to fit our use case<ins class="diffchange diffchange-inline">. They break the text up into segments (paragraphs) to be translated piece by piece</ins>. They do have [[wikipedia:Computer-assisted_translation|many tricks]] that would be worthwhile to implement in a collaborative system as well, such as databases with phrases to ensure they are consistently translated. Also the data formats can be used as inspiration. 
The only exception to this rule seems to be [[wikipedia:OmegaT|OmegaT]], a FOSS project coded in Java. A remaining problem could be that such systems are intended for professionals and [https://polyglot.city/@Stoori/106670443229138474 may have a steep learning curve].</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition these systems are often free software and could thus be improved. This may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for in- and output, while scientific articles will have equations, tables, references, and figures and will often not even be available in a text format, but only as some sort of PDF file.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition these systems are often free software and could thus be improved. This may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for in- and output, while scientific articles will have equations, tables, references, and figures and will often not even be available in a text format, but only as some sort of PDF file.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">The core of such </del>a tool would <del class="diffchange diffchange-inline">be translating one text into another. This would work best if </del>the <del class="diffchange diffchange-inline">text </del>to <del class="diffchange diffchange-inline">be translated were in </del>a <del class="diffchange diffchange-inline">machine-readable </del>text format. <del class="diffchange diffchange-inline">For this first step the [[#Comment on Open Science Feed|comment on open science feed]] below lists some tools and also [https://papertohtml.org/ this tool] may be part of such a system. </del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">So </ins>a <ins class="diffchange diffchange-inline">collaborative translation </ins>tool would <ins class="diffchange diffchange-inline">live on the internet and combine features of the computer aided translation packages and </ins>the <ins class="diffchange diffchange-inline">software translation packages, while having additional tools </ins>to <ins class="diffchange diffchange-inline">parse article PDFs into </ins>a text format. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"><nowiki>*<</del>/<del class="diffchange diffchange-inline">nowiki> Design a collaborative translation tool? </del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">https://papertohtml.org</ins>/</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* this should work with a group of people with similar interest or </del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Single Source Publishing <nowiki>https://github</ins>.<ins class="diffchange diffchange-inline">com/singlesourcepub/community/wiki/Announcement-Blog</nowiki></ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* a group of researchers working on the same project which needs to translate relevant references or </del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* a group of students in a classroom practicing annotation or critical review on a paper. Prior to that they need to translate it (as a complete translation or a summary) of the paper. </del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">..</del>.</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Comments/discussions</del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">pandoc, OCR, Zettlr</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">...</del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Scientific markdown as the </ins>translation <ins class="diffchange diffchange-inline">format (is text</ins>, <ins class="diffchange diffchange-inline">so would work well </ins>with <ins class="diffchange diffchange-inline">existing software for code)</ins>. <ins class="diffchange diffchange-inline">This is </ins>a <ins class="diffchange diffchange-inline">nice </ins>collaborative <ins class="diffchange diffchange-inline">scientific markdown pad</ins>. <ins class="diffchange diffchange-inline"><nowiki>https://mur2</ins>.<ins class="diffchange diffchange-inline">co</ins>.<ins class="diffchange diffchange-inline">uk/editor</nowiki></ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Ben mentioned he had several unfinished translations</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* A </del>translation <del class="diffchange diffchange-inline">requires expertise in two languages and the field of study</del>, <del class="diffchange diffchange-inline">which is easier </del>with <del class="diffchange diffchange-inline">a team</del>.</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Working in </del>a <del class="diffchange diffchange-inline">team is nicer</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* For the User Interface of software there are beautiful </del>collaborative <del class="diffchange diffchange-inline">tools</del>. <del class="diffchange diffchange-inline">Which give feedback on progress</del>.</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Would a translation tool, with a draft made by deep learning, also help deep learning? The system could see how people correct the translation</del>. <del class="diffchange diffchange-inline">Is that useful information?</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Equations are a problem</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Also for ontologies (Peter, Ideas Challenge) </del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">Single Source Publishing <nowiki>https://github</del>.<del class="diffchange diffchange-inline">com/singlesourcepub/community/wiki/Announcement-Blog</nowiki> </del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">The collaborative tool should allow for communication between translators in general (for coordination of the work and community building) and discussions on specific translated sentences. Preferably this communication would work for two people as well as for entire classes jointly translating an article. It should be able to upload partial translations and have a page showing partial translations where people can help out</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> * Work on collaborative </del>translation<del class="diffchange diffchange-inline">? </del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">It would save a lot of time for many translations to have a first draft translated by machine learning tools. This should be checked by humans for accuracy, but a scientific article does not have to be beautiful prose, only clear. The user feedback in interactive machine translations can be used to improve the system and make it better at translating scientific works. The latter would require running the machine </ins>translation <ins class="diffchange diffchange-inline">ourselves. It was suggested that this may require considerable resources, memory, computer power and bandwidth; maybe this could be obtained by collaborating with the European Open Science Cloud.</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> * Weblate, pandoc, OCR(?)</del></div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>== <ins class="diffchange diffchange-inline">Points to ponder </ins>==</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Translator acknowledgement</ins></div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"> * Scientific markdown as the translation format (is text, so would work well with existing software for code). This is a nice collaborative scientific markdown pad. <nowiki>https://mur2.co.uk/editor</nowiki></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline"><nowiki>*</nowiki> Translator acknowledgement</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">At an Open Access Barcamp I (VV) met the developer of pandoc and Zettlr. Pandoc could do a large part of the output (and maybe some of the input) a collaborative translation tool would need. We have agreed to have a call next month.</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* <nowiki>https://pandoc.org</nowiki></del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* It would be useful to gather more information on when translations are legal. In the US they tend to be fair use.</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Ben’s post on how to make translations will also be helpful in thinking about what such a system would look like.</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* The user feedback in interactive machine translations is used to improve the system</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">* Using machine learning for a first draft translation would require considerable resources, memory, computer power and bandwidth</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">** Maybe we could collaborate with the European Open Science Cloud (which is not European (but global), not just about open science and not really a cloud (more a network and standards). :-) )</del></div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>== <del class="diffchange diffchange-inline">Comment on Open Science Feed </del>==</div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[https://reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ davidpomerenke] made a useful comment on the Open Science Feed that would be helpful with the input:</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[https://reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ davidpomerenke] made a useful comment on the Open Science Feed that would be helpful with the input:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=331&oldid=prevVictorVenema: Added two thoughts2021-10-05T02:54:07Z<p>Added two thoughts</p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 02:54, 5 October 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine translation]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up and accuracy is very important. So a machine translation is at best a first draft an expert should have a look at. When this quality is good enough, people will not need a collaborative translation tool, nor a translation published in a repository and findable via our translations database.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine translation]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up and accuracy is very important. So a machine translation is at best a first draft an expert should have a look at. When this quality is good enough, people will not need a collaborative translation tool, nor a translation published in a repository and findable via our translations database.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>There are [[Translation tools#Computer Assisted Translation|computer aided translation packages]] to help (teams of) professional translators. These tend to be proprietary and thus cannot be improved to fit our use case. They do have [[wikipedia:Computer-assisted_translation|many tricks]] that would be worthwhile to implement in a collaborative system as well, such as databases of phrases to ensure they are consistently translated. Also the data formats can be used as inspiration. The only exception to this rule seems to be [[wikipedia:OmegaT|OmegaT]], a FOSS project coded in Java. A remaining problem could be that such systems are intended for professionals and [https://polyglot.city/@Stoori/106670443229138474 may have a steep learning curve].</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>There are [[Translation tools#Computer Assisted Translation|computer aided translation packages]] to help (teams of) professional translators. These <ins class="diffchange diffchange-inline">systems work with office file formats and HTML and </ins>tend to be proprietary and thus cannot be improved to fit our use case. They do have [[wikipedia:Computer-assisted_translation|many tricks]] that would be worthwhile to implement in a collaborative system as well, such as databases of phrases to ensure they are consistently translated. Also the data formats can be used as inspiration. The only exception to this rule seems to be [[wikipedia:OmegaT|OmegaT]], a FOSS project coded in Java. 
A remaining problem could be that such systems are intended for professionals and [https://polyglot.city/@Stoori/106670443229138474 may have a steep learning curve].</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition these systems are often free software and could thus be improved. They may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for input and output, while scientific articles have equations, tables, references, and figures and are often not even available in a text format, but only as some sort of PDF file.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition these systems are often free software and could thus be improved. They may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for input and output, while scientific articles have equations, tables, references, and figures and are often not even available in a text format, but only as some sort of PDF file.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l14" >Line 14:</td>
<td colspan="2" class="diff-lineno">Line 14:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* a group of researchers working on the same project which needs to translate relevant references or </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* a group of researchers working on the same project which needs to translate relevant references or </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* a group of students in a classroom practicing annotation or critical review on a paper. Prior to that they need to translate the paper (as a complete translation or a summary). </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* a group of students in a classroom practicing annotation or critical review on a paper. Prior to that they need to translate the paper (as a complete translation or a summary). </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">...</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">Comments/discussions</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">...</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Ben mentioned he had several unfinished translations</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* Ben mentioned he had several unfinished translations</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A translation requires expertise in two languages and the field of study, which is easier with a team.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* A translation requires expertise in two languages and the field of study, which is easier with a team.</div></td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=329&oldid=prevVictorVenema: Still unfinished, more descriptive version.2021-10-05T02:47:39Z<p>Still unfinished, more descriptive version.</p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 02:47, 5 October 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>One of the [[Ideas_for_promoting_translations_in_science|ideas to promote the translation of scientific articles]] is to create a collaborative translation tool.</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>One of the [[Ideas_for_promoting_translations_in_science|ideas to promote the translation of scientific articles]] is to create a collaborative translation tool<ins class="diffchange diffchange-inline">. Producing a good translation needs several skills: skill in two languages and knowledge of the topic. Such a combination is easier to find in a team, and working in a team motivates. People regularly make partial translations and stop when they know enough for themselves. It would be great if others could finish the job</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del class="diffchange diffchange-inline">A quality </del>translation <del class="diffchange diffchange-inline">needs a good understanding of two languages </del>and <del class="diffchange diffchange-inline">the topic</del>. <del class="diffchange diffchange-inline">Such </del>a <del class="diffchange diffchange-inline">combination </del>is <del class="diffchange diffchange-inline">easier to find in </del>a <del class="diffchange diffchange-inline">team</del>. <del class="diffchange diffchange-inline">Also </del>people <del class="diffchange diffchange-inline">often make partial </del>translations <del class="diffchange diffchange-inline">and stop when they know enough for themselves. It would be great if others could continue the work</del>.</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">There are already many [[translation tools]], but none really fit this use case. There is [[Translation tools#Machine translation|machine </ins>translation<ins class="diffchange diffchange-inline">]]; for many language pairs this works quite well nowadays, but for scientific texts the system may trip up </ins>and <ins class="diffchange diffchange-inline">accuracy is very important</ins>. <ins class="diffchange diffchange-inline">So </ins>a <ins class="diffchange diffchange-inline">machine translation </ins>is <ins class="diffchange diffchange-inline">at best a first draft an expert should have </ins>a <ins class="diffchange diffchange-inline">look at</ins>. 
<ins class="diffchange diffchange-inline">When this quality is good enough, </ins>people <ins class="diffchange diffchange-inline">will not need a collaborative translation tool, nor a translation published in a repository and findable via our </ins>translations <ins class="diffchange diffchange-inline">database</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The core of such a tool would be translating one text into another. This would work best if the text to be translated were in a machine-readable text format. For this first step the [[#Comment on Open Science Feed|comment on open science feed]] below lists some tools and also [https://papertohtml.org/ this tool] may be part of such a system. </div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">There are [[Translation tools#Computer Assisted Translation|computer aided translation packages]] to help (teams of) professional translators. These tend to be proprietary and thus cannot be improved to fit our use case. They do have [[wikipedia:Computer-assisted_translation|many tricks]] that would be worthwhile to implement in a collaborative system as well, such as databases of phrases to ensure they are consistently translated. Also the data formats can be used as inspiration. The only exception to this rule seems to be [[wikipedia:OmegaT|OmegaT]], a FOSS project coded in Java. A remaining problem could be that such systems are intended for professionals and [https://polyglot.city/@Stoori/106670443229138474 may have a steep learning curve].</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Tools for the [[Translation tools#Software translation tools|translation of software packages]] (their user interface and documentation) are often collaborative, tend to be easy to use and are sometimes even somewhat gamified. In addition these systems are often free software and could thus be improved. They may thus come closest to a collaborative tool for scientific articles, but these tools only work with nicely structured text files for input and output, while scientific articles have equations, tables, references, and figures and are often not even available in a text format, but only as some sort of PDF file.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The core of such a tool would be translating one text into another. This would work best if the text to be translated were in a machine-readable text format. For this first step the [[#Comment on Open Science Feed|comment on open science feed]] below lists some tools and also [https://papertohtml.org/ this tool] may be part of such a system. <ins class="diffchange diffchange-inline"> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><nowiki>*</nowiki> Design a collaborative translation tool? </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* this should work with a group of people with similar interest or </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* a group of researchers working on the same project which needs to translate relevant references or </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* a group of students in a classroom practicing annotation or critical review on a paper. Prior to that they need to translate the paper (as a complete translation or a summary). </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Ben mentioned he had several unfinished translations</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* A translation requires expertise in two languages and the field of study, which is easier with a team.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Working in a team is nicer</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* For the user interface of software there are beautiful collaborative tools, which give feedback on progress.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Would a translation tool, with a draft made by deep learning, also help deep learning? The system could see how people correct the translation. Is that useful information?</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Equations are a problem</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Also for ontologies (Peter, Ideas Challenge) </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Single Source Publishing <nowiki>https://github.com/singlesourcepub/community/wiki/Announcement-Blog</nowiki> </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"> * Work on collaborative translation? </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"> * Weblate, pandoc, OCR(?)</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"> * Scientific markdown as the translation format (is text, so would work well with existing software for code). This is a nice collaborative scientific markdown pad. <nowiki>https://mur2.co.uk/editor</nowiki></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"><nowiki>*</nowiki> Translator acknowledgement</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">At an Open Access Barcamp I (VV) met the developer of pandoc and Zettlr. Pandoc could do a large part of the output (and maybe some of the input) a collaborative translation tool would need. We have agreed to have a call next month.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* <nowiki>https://pandoc.org</nowiki></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* It would be useful to gather more information on when translations are legal. In the US they tend to be fair use.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Ben’s post on how to make translations will also be helpful in thinking about what such a system would look like.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* The user feedback in interactive machine translations is used to improve the system</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* Using machine learning for a first draft translation would require considerable resources, memory, computer power and bandwidth</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">** Maybe we could collaborate with the European Open Science Cloud (which is not European (but global), not just about open science and not really a cloud (more a network and standards). :-) )</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Comment on Open Science Feed ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Comment on Open Science Feed ==</div></td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=328&oldid=prevVictorVenema: code fix2021-09-29T18:46:06Z<p>code fix</p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 18:46, 29 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A quality translation needs a good understanding of two languages and the topic. Such a combination is easier to find in a team. Also people often make partial translations and stop when they know enough for themselves. It would be great if others could continue the work.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A quality translation needs a good understanding of two languages and the topic. Such a combination is easier to find in a team. Also people often make partial translations and stop when they know enough for themselves. It would be great if others could continue the work.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The core of such a tool would be translating one text into another. This would work best if the text to be translated were in a machine-readable text format. <del class="diffchange diffchange-inline">The </del>[[<del class="diffchange diffchange-inline">Collaborative translation</del>#Comment on <del class="diffchange diffchange-inline">open science feed</del>|<del class="diffchange diffchange-inline">Comment </del>on open science feed]] below and [https://papertohtml.org/ this tool] may <del class="diffchange diffchange-inline">thus </del>be part of such a system. </div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The core of such a tool would be translating one text into another. This would work best if the text to be translated were in a machine-readable text format. <ins class="diffchange diffchange-inline">For this first step the </ins>[[#Comment on <ins class="diffchange diffchange-inline">Open Science Feed</ins>|<ins class="diffchange diffchange-inline">comment </ins>on open science feed]] below <ins class="diffchange diffchange-inline">lists some tools </ins>and <ins class="diffchange diffchange-inline">also </ins>[https://papertohtml.org/ this tool] may be part of such a system. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Comment on Open Science Feed ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Comment on Open Science Feed ==</div></td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=327&oldid=prevVictorVenema: Added a tool and some more description2021-09-29T18:43:22Z<p>Added a tool and some more description</p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 18:43, 29 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l2" >Line 2:</td>
<td colspan="2" class="diff-lineno">Line 2:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A quality translation needs a good understanding of two languages and the topic. Such a combination is easier to find in a team. Also people often make partial translations and stop when they know enough for themselves. It would be great if others could continue the work.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>A quality translation needs a good understanding of two languages and the topic. Such a combination is easier to find in a team. Also people often make partial translations and stop when they know enough for themselves. It would be great if others could continue the work.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The core of such a tool would be translating one text into another. This would work best if the text to be translated were in a machine-readable text format. The [[Collaborative translation#Comment on open science feed|Comment on open science feed]] below and [https://papertohtml.org/ this tool] may thus be part of such a system. </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Comment on Open Science Feed ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Comment on Open Science Feed ==</div></td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=326&oldid=prevVictorVenema: /* Comment on Open Science Feed */2021-09-26T15:31:02Z<p><span dir="auto"><span class="autocomment">Comment on Open Science Feed</span></span></p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 15:31, 26 September 2021</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l6" >Line 6:</td>
<td colspan="2" class="diff-lineno">Line 6:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[https://reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ davidpomerenke] made a useful comment on the Open Science Feed that would be helpful with the input:</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>[https://reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ davidpomerenke] made a useful comment on the Open Science Feed that would be helpful with the input:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>I've recently coded an unpublished project on scientific citation mining, and for that purpose I had looked a bit into tools for converting PDFs to more useful formats.</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"> </ins>I've recently coded an unpublished project on scientific citation mining, and for that purpose I had looked a bit into tools for converting PDFs to more useful formats.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> I ended up using [https://github.com/kermitt2/grobid Grobid], which converts the PDF to a very detailed XML format. The format is not a word processing format though, but a format specifically for representing scientific documents. I don't know, if it would, for example, contain tags about bold or italicized text. The tool is working really well, but since you probably cannot use the output XML format directly, it will need some postprocessing, which would be relatively simple with XML parsing libraries.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> I ended up using [https://github.com/kermitt2/grobid Grobid], which converts the PDF to a very detailed XML format. The format is not a word processing format though, but a format specifically for representing scientific documents. I don't know, if it would, for example, contain tags about bold or italicized text. The tool is working really well, but since you probably cannot use the output XML format directly, it will need some postprocessing, which would be relatively simple with XML parsing libraries.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> An alternative is [https://gitlab.com/crossref/pdfextract pdfextract] by Crossref. They probably use this to build their own large database. It also works really well and gives you some JSON that would probably need less postprocessing than Grobid. I didn't use it for some minor technical reason that I forgot.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> An alternative is [https://gitlab.com/crossref/pdfextract pdfextract] by Crossref. They probably use this to build their own large database. It also works really well and gives you some JSON that would probably need less postprocessing than Grobid. I didn't use it for some minor technical reason that I forgot.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l13" >Line 13:</td>
<td colspan="2" class="diff-lineno">Line 12:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> Another alternative that's on my list but that I didn't try is [https://github.com/CeON/CERMINE Cermine].</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> Another alternative that's on my list but that I didn't try is [https://github.com/CeON/CERMINE Cermine].</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> There are some more tools that specialize in mining only the citations, but I found them to be less powerful (although perhaps more performant) than Grobid.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div> There are some more tools that specialize in mining only the citations, but I found them to be less powerful (although perhaps more performant) than Grobid.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"> </ins>Many publishers also publish a supplementary HTML version these days, which may be an acceptable format or at least easy to convert to other formats with [https://pandoc.org/ pandoc]. I have also seen that authors upload the Latex source along with the PDF on Arxiv, but I don't know how common that is.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Many publishers also publish a supplementary HTML version these days, which may be an acceptable format or at least easy to convert to other formats with [https://pandoc.org/ pandoc]. I have also seen that authors upload the Latex source along with the PDF on Arxiv, but I don't know how common that is.</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline"> </ins>Another current project which is not directly related to your question but which you may find cool is [https://scholarphi.org/ ScholarPhi], where they try to annotate PDFs with useful semantic information.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Another current project which is not directly related to your question but which you may find cool is [https://scholarphi.org/ ScholarPhi], where they try to annotate PDFs with useful semantic information.</div></td><td colspan="2"> </td></tr>
</table>VictorVenemahttps://wiki.translatescience.org/index.php?title=Collaborative_translation&diff=325&oldid=prevVictorVenema: Stub2021-09-26T15:29:44Z<p>Stub</p>
<p><b>New page</b></p><div>One of the [[Ideas_for_promoting_translations_in_science|ideas to promote the translation of scientific articles]] is to create a collaborative translation tool.<br />
<br />
A quality translation needs a good understanding of two languages and the topic. Such a combination is easier to find in a team. Also people often make partial translations and stop when they know enough for themselves. It would be great if others could continue the work.<br />
<br />
== Comment on Open Science Feed ==<br />
[https://reddit.com/r/Open_Science/comments/pvgj3y/project_to_rebuild_papers_with_plaintext_markup/heci5zd/ davidpomerenke] made a useful comment on the Open Science Feed that would be helpful with the input:<br />
<br />
I've recently coded an unpublished project on scientific citation mining, and for that purpose I had looked a bit into tools for converting PDFs to more useful formats.<br />
<br />
I ended up using [https://github.com/kermitt2/grobid Grobid], which converts the PDF to a very detailed XML format. The format is not a word processing format though, but a format specifically for representing scientific documents. I don't know, if it would, for example, contain tags about bold or italicized text. The tool is working really well, but since you probably cannot use the output XML format directly, it will need some postprocessing, which would be relatively simple with XML parsing libraries.<br />
An alternative is [https://gitlab.com/crossref/pdfextract pdfextract] by Crossref. They probably use this to build their own large database. It also works really well and gives you some JSON that would probably need less postprocessing than Grobid. I didn't use it for some minor technical reason that I forgot.<br />
[https://github.com/allenai/pdffigures2 pdffigures2] is from the team behind Semantic Scholar, and they probably use it to extract the figures that they show in their search engine. It only extracts figures and their captions and no other things. I don't recall whether the other tools can also extract figures, but if not, then this will be a perfect supplement.<br />
Another alternative that's on my list but that I didn't try is [https://github.com/CeON/CERMINE Cermine].<br />
There are some more tools that specialize in mining only the citations, but I found them to be less powerful (although perhaps more performant) than Grobid.<br />
<br />
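As a rough illustration of the XML postprocessing mentioned in the comment (this sketch is not part of the quoted comment), Grobid's output is TEI XML, which Python's standard library can parse. The sample document below is a hypothetical, heavily shortened stand-in for a real Grobid result, and the function name <code>extract_segments</code> and the returned structure are our own assumptions about what a translation tool might want.<br />

```python
import xml.etree.ElementTree as ET

# Namespace of TEI, the XML vocabulary that Grobid writes.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily shortened stand-in for a Grobid result file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An example paper</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div><head>Introduction</head><p>First paragraph.</p></div>
      <div><head>Methods</head><p>Second paragraph citing <ref type="bibr">[1]</ref>.</p></div>
    </body>
  </text>
</TEI>"""


def extract_segments(tei_xml: str) -> dict:
    """Collect the title and the per-section paragraphs as plain text."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    sections = []
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        heading = div.findtext("tei:head", default="", namespaces=TEI_NS)
        # itertext() flattens inline markup such as citation references.
        paragraphs = ["".join(p.itertext()).strip() for p in div.iterfind("tei:p", TEI_NS)]
        sections.append({"heading": heading, "paragraphs": paragraphs})
    return {"title": title, "sections": sections}


if __name__ == "__main__":
    for section in extract_segments(SAMPLE_TEI)["sections"]:
        print(section["heading"], section["paragraphs"])
```

Each paragraph would then be one unit that translators can pick up and continue. Flattening the inline markup loses information, so a real tool would probably want to keep it.<br />
<br />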
Many publishers also publish a supplementary HTML version these days, which may be an acceptable format or at least easy to convert to other formats with [https://pandoc.org/ pandoc]. I have also seen that authors upload the Latex source along with the PDF on Arxiv, but I don't know how common that is.<br />
<br />
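The HTML-to-text conversion with pandoc mentioned above could be scripted roughly as follows (again not part of the quoted comment). This is only a sketch: it assumes pandoc is installed and on the PATH, the file names are hypothetical, and Markdown as the target format is our assumption.<br />

```python
import shutil
import subprocess


def build_pandoc_command(html_path: str, markdown_path: str) -> list:
    """Return the pandoc invocation that converts an HTML file to Markdown."""
    return ["pandoc", "--from", "html", "--to", "markdown",
            "--output", markdown_path, html_path]


def convert_html_to_markdown(html_path: str, markdown_path: str) -> bool:
    """Run pandoc if it is installed; return False when it is not on the PATH."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(build_pandoc_command(html_path, markdown_path), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_pandoc_command("paper.html", "paper.md")))
```

Keeping the command construction separate from the call makes it easy to swap in another output format later.<br />
<br />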
Another current project which is not directly related to your question but which you may find cool is [https://scholarphi.org/ ScholarPhi], where they try to annotate PDFs with useful semantic information.</div>VictorVenema