Creating an XML for PMC

From Free Neuropathology Wiki
Revision as of 13:59, 30 August 2022 by Henryrobbert (talk | contribs) (→‎List of references is in alphabetical order)


Note: This workflow is for documentation by the Editorial Office only and is not connected to the layout workflow.

As a member of the copyediting and layout board, please ignore this page.

 

Why PMC needs XMLs

The .xml file contains metadata about the journal and the article, the article text itself including formatting and machine-readable references to cited literature, and a structured list of literature. This way the article can be archived and provided in a so-called "rich" (meaning structured) form, from which web views, PDFs, etc. can be generated, all from an identical XML file. Our journal management system has a plugin capable of reading XMLs in a very convenient form (see sample in the test system).

First, a structured list of cited literature is created as a .json file. This file is then imported into an XML editor, which is used to create, refine and export the article.

Required software

  • For the creation of a structured list of cited literature as a .json file: Zotero
  • For the creation of the .xml file: Texture
  • To refine the .xml file: Notepad++ (or any other advanced code editor)

Creation of a structured list of cited literature (.json file)

For older articles that have already been published, we use the Word file of the final, published document as the basis for the XML. For articles yet to be published, we create the XML based on the copyedited, author-approved manuscript file.

All cited literature has dois

  1. Launch Zotero. Add the whole list of cited literature from the manuscript via the little wand icon at the top ("Add Item(s) by Identifier").
  2. Select all entries, right-click, select Export Item…, choose CSL JSON.

No dois in the list of cited literature

  • Import literature without doi using its PMID (if available).
  • Find dois/PMIDs via CrossRef, then create the .json file using Zotero (see above).
  • If a publication has neither a doi nor a PMID, look up its ISBN online, which can also be entered in Zotero.
  • Alternatively, create a BibTeX file using this tool and import it into Zotero.
  • Alternatively, copy-and-paste literature without dois or PMIDs into this tool and download the .json file.

Select all entries, right-click, select Export Item…, choose CSL JSON.

Important:

  • Some resources don't have a doi, so before exporting the entries go through them and, if a doi is missing, enter as placeholder 10.0000/0000
  • While going through the finished list in Zotero, make sure that every entry is defined as "journal article" (this also applies to books, chapters, etc.). (The reason is that the XML workflow is based on journal articles.)
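The two checks above can also be applied to an exported file with a short script. The following is a minimal sketch, not part of the established workflow: it assumes the export is a CSL JSON array, and the function name normalize_entries and the sample entries are made up for illustration. "article-journal" is the CSL JSON type name for a journal article.

```python
import json

PLACEHOLDER_DOI = "10.0000/0000"  # placeholder value from the list above


def normalize_entries(entries):
    """Return copies of CSL JSON entries that all have a DOI and the journal-article type."""
    normalized = []
    for entry in entries:
        fixed = dict(entry)
        # Fill in the placeholder when the DOI is missing or empty.
        if not fixed.get("DOI"):
            fixed["DOI"] = PLACEHOLDER_DOI
        # "article-journal" is how CSL JSON names a journal article.
        fixed["type"] = "article-journal"
        normalized.append(fixed)
    return normalized


# Invented sample data standing in for a Zotero export.
sample = json.loads(
    '[{"id": "a", "type": "book", "title": "A book"},'
    ' {"id": "b", "type": "article-journal", "DOI": "10.1234/x"}]'
)
result = normalize_entries(sample)
print(json.dumps(result, indent=2))
```

Existing dois are left untouched; only missing ones receive the placeholder.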

Sorting the list of cited literature (.json file)

The .json file produced in Zotero doesn’t follow the original order of entries. However, most authors cite literature using numbers in the text, so the order has to be restored to make sure references point to the right literature.

  1. Open the .json file with Notepad++ and remove the [ ] in the first and last line.
  2. Add , at the end of the last line.
  3. Remove tabs (search for \t and replace with nothing).
  4. Remove paragraph breaks (find \n and replace with nothing).
  5. Separate entries: press CTRL+H, then

Find ,{"id"
Replace ,\n{"id"
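The same one-entry-per-line layout can be produced without manual find and replace. This is a minimal sketch under the assumption that the file is a valid CSL JSON array; the function name entries_to_lines is hypothetical.

```python
import json


def entries_to_lines(json_text):
    """Return the entries of a JSON array as one compact line each, every line ending in a comma."""
    entries = json.loads(json_text)
    lines = []
    for entry in entries:
        # separators=(",", ":") drops the whitespace between keys and values,
        # ensure_ascii=False keeps accented characters readable.
        lines.append(json.dumps(entry, separators=(",", ":"), ensure_ascii=False) + ",")
    return "\n".join(lines)


flat = entries_to_lines('[{"id": "x", "title": "T"}, {"id": "y"}]')
print(flat)
```

Each output line is one complete entry followed by a comma, which is the form expected when pasting into the spreadsheet.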

Create alphabetical order

This is useful if the literature is referenced in the text using authors’ last names:

  1. Open the file json_workflow.xlsx and copy-and-paste the list of cited literature from the manuscript into the sheet Alphabetical order in cell A2 and following.
  2. Sort column B alphabetically.
  3. Copy-and-paste the content of cell C2 and following into a new empty document.
  4. Add [ ] in the first and last lines and remove the comma after the last entry. Save as .json file.

Recover order from the manuscript

This means that the author cited works using numbers in the text:

  1. Open json_workflow.xlsx and copy-and-paste the list of cited literature from the manuscript into the sheet Custom order in cell A2 and following. The entries should have no numbering.
  2. In the sheet Output custom order copy-and-paste the content of the .json file into cell A2 and following.
  3. Look for errors in columns B and C and correct these.
  4. Sort column E by ascending order. Make sure that each number occurs only once.
  5. Copy-and-paste the content of cell F2 and following into a new empty document.
  6. Add [ ] in the first and last lines and remove the comma after the last entry. Save as .json file.
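For illustration only, the idea of recovering the manuscript order can be sketched in a few lines. This is not the logic of json_workflow.xlsx, which is not documented here; the sketch assumes that each manuscript reference contains the title of its entry, and all names and sample data are invented.

```python
def _normalize(text):
    """Lower-case a string and keep only letters and digits, so punctuation does not matter."""
    return "".join(ch.lower() for ch in text if ch.isalnum())


def restore_order(manuscript_refs, entries):
    """Order CSL JSON entries like the manuscript's reference list, matching by title."""
    ordered, unmatched = [], []
    for ref in manuscript_refs:
        ref_key = _normalize(ref)
        match = next(
            (e for e in entries if e.get("title") and _normalize(e["title"]) in ref_key),
            None,
        )
        if match is None:
            unmatched.append(ref)  # needs manual attention
        else:
            ordered.append(match)
    return ordered, unmatched


# Invented sample data.
manuscript_refs = [
    "Smith J. Second study on tau. J Neuro. 2020.",
    "Doe A. First study on amyloid. Brain. 2019.",
    "Unknown B. Something not exported. 2018.",
]
entries = [
    {"id": "doe2019", "title": "First study on amyloid"},
    {"id": "smith2020", "title": "Second study on tau"},
]
ordered, unmatched = restore_order(manuscript_refs, entries)
print([e["id"] for e in ordered])
print(unmatched)
```

References that cannot be matched are returned separately, comparable to the errors that have to be corrected by hand in the spreadsheet.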

EndNote document

If the manuscript was created using EndNote for reference management, you can translate the references into XML and save a lot of time.

If not, continue below:

Creating the .xml file

  1. Launch Texture and copy-and-paste the copyedited article text.
  2. Switch to the Details view (top left) and import the list of cited literature (.json file) by selecting Insert > Reference. (You can ignore the fields for metadata.) If there is an error, open the .json file again and make sure that
    • a) every entry has a doi (even if it is just a placeholder),
    • b) the doi is written as follows: "DOI": "10. ... " (and not "DOI: 10. ...", for example). If the doi is declared incorrectly, also look for and remove "note": portions, which often appear with incorrect doi declarations, and
    • c) every entry has only one doi.
    • If this doesn't help, there is most likely a minor issue with the syntax. In this case a reliable tactic is to remove a portion of the entries in the .json file (e.g., the last 5 or 10 entries), save, and try to import into Texture again. Once the import works, you know that the error is among the removed entries. This way you can narrow down the entries that have an error in them and analyze them.
  3. Place figures and tables using the Insert menu.
  4. Save file. Open the resulting .dar file with 7-Zip or another file extracting program of your choice and extract the file manuscript.xml.
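If Texture rejects the .json file in step 2, the checks a) to c) and a basic syntax check can be run with a script before resorting to trial and error. The following is a minimal sketch, assuming the file is a CSL JSON array; the function names are hypothetical, and the checks cover only the points listed above.

```python
import json


def _mark_duplicate_keys(pairs):
    """Build a dict from key/value pairs and remember keys that occur more than once."""
    keys = [key for key, _ in pairs]
    obj = dict(pairs)
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        obj["__duplicates__"] = duplicates
    return obj


def check_reference_json(json_text):
    """Return a list of problems found in a CSL JSON string (empty list: nothing found)."""
    try:
        entries = json.loads(json_text, object_pairs_hook=_mark_duplicate_keys)
    except json.JSONDecodeError as err:
        # The parser reports where the syntax breaks.
        return [f"syntax error at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(entries, list):
        return ["top level is not a list of entries"]
    problems = []
    for index, entry in enumerate(entries, start=1):
        label = entry.get("id", f"entry {index}")
        doi = entry.get("DOI")
        if not isinstance(doi, str) or not doi.startswith("10."):
            problems.append(f"{label}: DOI missing or not of the form 10. ...")
        if "DOI" in entry.get("__duplicates__", []):
            problems.append(f"{label}: more than one DOI")
        if "note" in entry:
            problems.append(f'{label}: contains a "note" field')
    return problems


print(check_reference_json('[{"id": "a", "note": "DOI: 10.1/x"}]'))
```

An empty result does not guarantee that Texture accepts the file; it only rules out the issues named in step 2.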

Creating literature references in the article text

List of references is in alphabetical order

  • Assign references in the article text by selecting Insert > Citation.
  • Tip: When entering a larger number of references, the program might slow down after a while. In this case saving and restarting is always helpful.

List of references is in recovered order from the manuscript

1. Find collected citations such as [1-3] and separate them: [1] [2] [3]

Press CTRL+H, then

Citations in square brackets, e.g. [1]:
Find I \[(\d+)-
Find II \[(\d+)–

Superscripted citations, e.g. 1:
Find I <sup id="superscript-(\S+?)">(\d+)-(\d+)</sup>
Find II <sup id="superscript-(\S+?)">(\d+)–(\d+)</sup>

Superscripted citations in brackets, e.g. (1):
Find I <sup id="superscript-(\S+?)">\(\d+\)-\(\d+\)</sup>
Find II <sup id="superscript-(\S+?)">\(\d+\)–\(\d+\)</sup>

Find I uses a hyphen, Find II an en dash.

For superscripted citations you might have to customize the Find string (e.g. in case citation numbers are collected within a single pair of brackets or other type of brackets are used, etc.).
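For citations in square brackets, the expansion of a range into separate citations can also be scripted. This is a minimal sketch that assumes ranges of the form [1-3] or [1–3]; the function name expand_ranges is hypothetical, and superscripted citations are not covered.

```python
import re

# A range in square brackets, written with a hyphen or an en dash.
RANGE = re.compile(r"\[(\d+)\s*[-–]\s*(\d+)\]")


def expand_ranges(text):
    """Turn [1-3] into [1] [2] [3]; leave everything else untouched."""
    def _expand(match):
        start, end = int(match.group(1)), int(match.group(2))
        if end < start:
            return match.group(0)  # looks like a typo, leave it for manual review
        return " ".join(f"[{number}]" for number in range(start, end + 1))

    return RANGE.sub(_expand, text)


print(expand_ranges("as shown [1-3] and [7–8]."))
```

Mixed forms such as [1,3-5] are deliberately left as they are and still need to be separated by hand.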


2. Turn sequences of citations such as [1,2,3] manually or automatically into separate citations: [1] [2] [3]

Press CTRL+H to turn sequences of citations into separate citations.

Citations in square brackets, e.g. [1]:
Find \[(\d+),(\d+)
Replace \[\1\]\[\2

Superscripted citations, e.g. 1:
Find <sup id="superscript-(\S+?)">(\d+),(\d+)
Replace <sup id="superscript-\1">\2</sup><sup id="superscript-\1">\3

Superscripted citations in brackets, e.g. (1):
Find <sup id="superscript-(\S+?)">\((\d+)\),\((\d+)\)
Replace <sup id="superscript-\1">\(\2\)</sup><sup id="superscript-\1">\(\3\)

Repeat until no further replacing occurs.
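In a script, the repetition is not needed because a whole sequence can be split in one pass. A minimal sketch for citations in square brackets, with the hypothetical function name split_sequences:

```python
import re

# Two or more comma-separated numbers inside one pair of square brackets.
SEQUENCE = re.compile(r"\[(\d+(?:\s*,\s*\d+)+)\]")


def split_sequences(text):
    """Turn [1,2,3] into [1] [2] [3]."""
    def _split(match):
        numbers = [part.strip() for part in match.group(1).split(",")]
        return " ".join(f"[{number}]" for number in numbers)

    return SEQUENCE.sub(_split, text)


print(split_sequences("see [1,2,3] and [4, 5]"))
```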


3. Assign XML tags for citations using find and replace.

Press CTRL+H to assign XML tags for citations:

Citations in square brackets, e.g. [1]:
Find \[(\d+)\]
Replace <xref ref-type="bibr" rid="B\1">\[\1\]</xref>

Superscripted citations, e.g. 1:
Find <sup id="superscript-\S+?">(\d+)</sup>
Replace <xref ref-type="bibr" rid="B\1">\[\1\]</xref>

Superscripted citations in brackets, e.g. (1):
Find <sup id="superscript-\S+?">\((\d+)\)</sup>
Replace <xref ref-type="bibr" rid="B\1">\[\1\]</xref>
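The tagging of citations in square brackets corresponds to the following substitution. This is a minimal sketch with the hypothetical function name tag_citations; it should be run only once on a text, because the tagged result still contains [1] and would be tagged again.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def tag_citations(text):
    """Wrap every [n] in an xref element pointing to reference Bn."""
    return CITATION.sub(r'<xref ref-type="bibr" rid="B\1">[\1]</xref>', text)


print(tag_citations("as described previously [12]."))
```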

Naming references in the list of cited literature

Clear the numbering in the structured list of cited literature using find and replace and number the list manually: B1, B2, …

Press CTRL+H to clear the numbering in the structured list of cited literature:

Find <ref id="journal-\S+?">
Replace <ref id="B">

Unfortunately, this cannot be done automatically. But it helps to close all text blocks (press Alt+0) and then open the list of cited literature in the <back> section.
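If a scripted alternative to numbering by hand is wanted, a running counter can assign B1, B2, … in document order. This is a minimal sketch, assuming the references appear in the file in the intended order and that every entry opens with a ref tag carrying only an id attribute; the function name number_references is hypothetical.

```python
import re
from itertools import count

# Opening tag of a reference entry; <ref-list ...> does not match because of the space after "ref".
REF_OPEN = re.compile(r'<ref id="[^"]*">')


def number_references(xml_text):
    """Replace the id of each ref element with B1, B2, ... in order of appearance."""
    counter = count(1)
    return REF_OPEN.sub(lambda match: f'<ref id="B{next(counter)}">', xml_text)


print(number_references('<ref-list id="refs"><ref id="journal-a"></ref><ref id="journal-b"></ref></ref-list>'))
```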

Make sure to check a handful of cross-references manually to see if the assigning worked. Do the reference IDs in the text point to the right IDs in the list of references?
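A script can at least find citations whose rid has no counterpart in the list of references; whether a citation points to the right work still has to be checked by eye. A minimal sketch, assuming the attribute order used on this page (ref-type before rid) and the hypothetical function name dangling_citations:

```python
import re


def dangling_citations(xml_text):
    """Return the rids used in bibliographic xrefs that have no matching ref id."""
    ref_ids = set(re.findall(r'<ref id="([^"]+)"', xml_text))
    cited = re.findall(r'<xref ref-type="bibr" rid="([^"]+)"', xml_text)
    return sorted({rid for rid in cited if rid not in ref_ids})


sample_xml = (
    '<p><xref ref-type="bibr" rid="B1">[1]</xref> '
    '<xref ref-type="bibr" rid="B7">[7]</xref></p>'
    '<ref-list><ref id="B1"></ref><ref id="B2"></ref></ref-list>'
)
missing = dangling_citations(sample_xml)
print(missing)
```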

Assigning XML tags for figures, tables and supplementary material in the article text

1. Assign numbers to the inserted figures and tables using find and replace.

Press CTRL+H to assign numbers to the inserted figures and tables:

Figures:
Find <fig id="figure-\S+?">\D+?<label>Figure (\d+)</label>
Replace <fig id="f\1"><label>Figure \1</label>

Tables:
Find <table id="table-\S+?">\D+?<label>Table (\d+)</label>
Replace <table id="f\1"><label>Table \1</label>


2. Assign XML tags for figures and tables using find and replace.

Press CTRL+H to assign XML tags for figures and tables:

Figures:
Find Fig\. (\d+)([ABCDEFGHIJK]*)
Replace <xref ref-type="fig" rid="f\1">Figure \1\2</xref>

Tables:
Find Table (\d+)([ABCDEFGHIJK]*)
Replace <xref ref-type="table" rid="f\1">Table \1\2</xref>

If figure mentions are bolded in the article text, remove the bold formatting manually or turn the mentions into XML tags by pressing CTRL+H, then

Find I <bold id="bold-\S+?">Fig\. (\d+)([ABCDEFGHIJK ,–-]*)</bold>
Replace I <xref ref-type="fig" rid="f\1">Figure \1\2</xref>
Find II <bold id="bold-\S+?">Fig\. (\d+)([ABCDEFGHIJK]*) and (\d+)([ABCDEFGHIJK ,–-]*)</bold>
Replace II <xref ref-type="fig" rid="f\1">Figure \1\2</xref><xref ref-type="fig" rid="f\3">Figure \3\4</xref>


3. Sequences of figure/table mentions like Figure 1Figure 2 may need to be separated manually with a comma: Figure 1, Figure 2.


4. Insert supplementary material at the end of the <body> section (0X = volume, XXXX = our internal four-digit article ID):

<sec sec-type="supplementary-material"><title>Supplementary Material</title><supplementary-material id="s001"><media xlink:href="freeneuropathol-0X-XXXX-s001.pdf"/></supplementary-material></sec>

Mentions of these must be XML-tagged in the article text, as well:

<xref ref-type="supplementary-material" rid="s001">Supplementary Fig. 1</xref>


Save file.

Open the .dar file with 7-Zip or another file extracting program of your choice and move the file manuscript.xml into the .dar file. Open the .dar file with Texture and look for errors by comparing it to the original manuscript (e.g., assigning of references in the text).
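Moving manuscript.xml back into the archive can also be scripted. This is a minimal sketch under the assumption that a .dar file is an ordinary ZIP archive (it can be opened with 7-Zip); the function name replace_member and the demo file names other than manuscript.xml are invented. Because a ZIP member cannot be overwritten in place, the sketch writes a new archive.

```python
import os
import tempfile
import zipfile


def replace_member(src_path, dst_path, member, new_bytes):
    """Copy a ZIP archive to dst_path, swapping the content of one member."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = new_bytes if info.filename == member else src.read(info.filename)
            dst.writestr(info.filename, data)


# Demo with a throwaway archive standing in for a real .dar file.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "article.dar")
    updated = os.path.join(tmp, "article-updated.dar")
    with zipfile.ZipFile(original, "w") as archive:
        archive.writestr("manifest.xml", "<dar/>")
        archive.writestr("manuscript.xml", "<article>old</article>")
    replace_member(original, updated, "manuscript.xml", b"<article>new</article>")
    with zipfile.ZipFile(updated) as archive:
        restored = archive.read("manuscript.xml")
        members = archive.namelist()

print(restored, members)
```

Keep the original .dar file until the new one has been opened successfully in Texture.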

Metadata

Extract the file manuscript.xml from the .dar file (see above) and copy-and-paste the following text into the .xml file, thus replacing all of the text from line 1 until the end of the frontmatter (</front>).

Portions highlighted in grey are optional, orange areas are variables:

<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "https://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">

<article xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article/review-article/case-report/letter/meeting-report">

<front>

<journal-meta>

<journal-id journal-id-type="pmc">freeneuropathology</journal-id><journal-id journal-id-type="publisher-id">freeneuropathology</journal-id><journal-title-group><journal-title>Free Neuropathology</journal-title></journal-title-group><issn pub-type="epub">2699-4445</issn><publisher><publisher-name>freeneuropathology</publisher-name></publisher>

</journal-meta>

<article-meta>

<article-id pub-id-type="doi">10.17879/freeneuropathology-2022-XXXX</article-id>

<article-categories><subj-group subj-group-type="heading"><subject>ARTICLETYPE</subject></subj-group></article-categories>

<title-group><article-title>TITLE</article-title></title-group>

<contrib-group content-type="author">

<contrib contrib-type="author">

<name><surname>LASTNAME</surname><given-names>FIRSTNAME</given-names></name>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Conceptualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/conceptualization/">Conceptualization</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Data curation" vocab-term-identifier="https://credit.niso.org/contributor-roles/data-curation/">Data curation</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Formal analysis" vocab-term-identifier="https://credit.niso.org/contributor-roles/formal-analysis/">Formal analysis</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Funding acquisition" vocab-term-identifier="https://credit.niso.org/contributor-roles/funding-acquisition/">Funding acquisition</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Investigation" vocab-term-identifier="https://credit.niso.org/contributor-roles/investigation/">Investigation</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Methodology" vocab-term-identifier="https://credit.niso.org/contributor-roles/methodology/">Methodology</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Project administration" vocab-term-identifier="https://credit.niso.org/contributor-roles/project-administration/">Project administration</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Supervision" vocab-term-identifier="https://credit.niso.org/contributor-roles/supervision/">Supervision</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Validation" vocab-term-identifier="https://credit.niso.org/contributor-roles/validation/">Validation</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Visualization" vocab-term-identifier="https://credit.niso.org/contributor-roles/visualization/">Visualization</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing - Original Draft" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-original-draft/">Writing - Original Draft</role>

<role vocab="credit" vocab-identifier="https://credit.niso.org/" vocab-term="Writing &ndash; review &amp; editing" vocab-term-identifier="https://credit.niso.org/contributor-roles/writing-review-editing/">Writing &ndash; review &amp; editing</role>

<email>EMAIL</email>

<xref ref-type="aff" rid="aff-1"/>

<xref ref-type="aff" rid="aff-2"/>

</contrib>

</contrib-group>

<aff id="aff-1"><institution>INSTITUTION</institution><country>COUNTRY</country></aff>

<aff id="aff-2"><institution>INSTITUTION</institution><country>COUNTRY</country></aff>

<author-notes><corresp>Correspondence to: FIRSTNAMEINITIAL LASTNAME <email>EMAIL</email></corresp></author-notes>

<pub-date date-type="pub" publication-format="electronic" iso-8601-date="YYYY-MM-DD"><day>DD</day><month>MM</month><year>YYYY</year></pub-date>

<pub-date date-type="collection" publication-format="electronic"><month>1</month><year>2022</year></pub-date>

<volume>X</volume><elocation-id>XXXX</elocation-id>

<history>

<date date-type="received" iso-8601-date="YYYY-MM-DD"><day>DD</day><month>MM</month><year>YYYY</year></date>

<date date-type="accepted" iso-8601-date="YYYY-MM-DD"><day>DD</day><month>MM</month><year>YYYY</year></date>

<date date-type="online" iso-8601-date="YYYY-MM-DD"><day>DD</day><month>MM</month><year>YYYY</year></date>

</history>

<permissions><copyright-statement>© YYYY LASTNAME et al.</copyright-statement><copyright-year>YYYY</copyright-year><copyright-holder>LASTNAME et al.</copyright-holder><ali:free_to_read/><license><ali:license_ref start_date="2017-10-12">https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open access article licensed under a <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed.</license-p></license></permissions>

<abstract><sec><title>ABSTRACT HEADLINE: </title><p>ABSTRACT TEXT</p></sec></abstract>

<kwd-group kwd-group-type="author"><kwd>KEYWORD</kwd><kwd>KEYWORD</kwd></kwd-group>

<funding-group specific-use="crossref">

<award-group>

<funding-source id="gs1" country="US">

<institution-wrap>

<institution>National Institutes of Health</institution>

<institution-id institution-id-type="doi" vocab="open-funder-registry" vocab-identifier="10.13039/open_funder_registry">10.13039/100000002</institution-id>

</institution-wrap>

</funding-source>

<award-id>GM18458</award-id>

</award-group>

</funding-group>

</article-meta>

</front>

Entering metadata (orange variables above):

Option A: Go through the frontmatter manually and enter the right information.

  • Every author gets their own <contrib …> section.
  • Multiple affiliations can be provided. Also, multiple affiliations may be assigned per author (<xref ref-type="aff"…>).

Option B: In case of a higher number of authors:

  1. Open up the article in OJS, go to Copyediting or Production and select any .docx file. Select Convert to JATS XML. An .xml file with the metadata from the system will be generated.
  2. Get the information concerning the authors from the new file (i.e., all of the text beginning with the first <contrib …> tag until the very last </aff> tag)

Change the contrib-type tags from person to author.

Press CTRL+H, then

Find contrib-type="person"
Replace contrib-type="author"
  • The elocation-id is our internal four-digit article ID from OJS.
  • A special doi for funding agencies (if necessary) can be found with CrossRef.
  • If the authors' affiliations were identified using letters in the article PDF (a, b, c, ...) this should be reflected in the XML, as well: <xref ref-type="aff" rid="aff-1">a</xref> and <aff id="aff-1"><label>a</label>... in the affiliations section, respectively.
  • Don't forget to also style the abstract text, if necessary:
    • <bold>text</bold>
    • <italic>text</italic>
    • <sub>subscripted text</sub>
    • <sup>superscripted text</sup>

Finishing touches

  • If there is mention of a conflict of interest in the article, enclose this portion of the text with <sec sec-type="COI-statement"><title>Conflict of interest</title><p>TEXT</p></sec>
  • If there is an acknowledgements section in the article, this should go to the beginning of the XML's backmatter: <back><ack><p>TEXT</p></ack>...
  • Search for <i> and </i> tags and remove them if they occur in titles of cited literature.

Quality control

Validate the final .xml file using the PMC Style Checker. Look for errors (show all) and correct them.

The checker reports errors when tags are constructed incorrectly (syntax) or when tags or attributes are used incorrectly (semantics). The DTD guideline we use (NLM JATS 1.2) is outlined here.

Tips

  • Texture shows an error while opening a file (file not found): Delete everything in the cache of your computer under C:\Users\YourUsername\AppData\Local\Temp\Texture\dar-storage
  • Texture window is blank while opening a file:
    • Most probably there are XML tags in the document pointing to cited works with rids (reference IDs) that don’t exist in the structured list of cited literature. This may happen with articles using citations in round brackets: find and replace may pick up round brackets in other places of the article and assign an XML tag to a reference that doesn’t exist. → Search for round brackets containing higher numbers to identify and correct these.
    • There may also be open tags that aren’t closed properly.

Don’t panic: If Texture doesn’t open the file even after you have looked for these errors, nothing is lost or broken. Quality control will then simply take place later in the web view (see below) instead of in Texture.

Delivery to PMC

Required files

  • XML
    • must pass the PMC Style Check
  • PDF that we published
  • Image files referred to in the .xml file
    • Don’t submit the authors’ raw image files
    • Submit the high-resolution versions we use for the article layout:
      • Halftone (without Text): 300 dpi
      • Combo (Image and text): >500 dpi
      • line/text B/W: >900 dpi
    • uncompressed .tif
    • RGB mode
    • do not forget to change the referral in the .xml file to mime-subtype="tiff" and change the file name accordingly!
  • Supplementary material files referred to in the .xml file

Delivery requirements

  • Naming of files (uid = our internal four-digit article ID):
    • XML file: freeneuropathol-vol-uid.xml
      • e.g.: freeneuropathol-01-2819.xml
    • PDF file: freeneuropathol-vol-uid.pdf
      • e.g.: freeneuropathol-01-2819.pdf
    • Image files (graphics): freeneuropathol-vol-uid-typ.tif
      • e.g.: freeneuropathol-01-2819-g001.tif
    • Supplementary material: freeneuropathol-vol-uid-typ.ext
      • e.g.: freeneuropathol-01-2819-s001.pdf
  • Archiving and delivery:
    • .zip, .tar, .gz or .tgz formats
    • No multiple compression and no subfolders
    • Each article gets its own archive file: freeneuropathol-vol-issue-uid.ext
      • e.g.: freeneuropathol-01-33-2819.zip
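The naming scheme can be generated rather than typed. This is a minimal sketch with the hypothetical function name pmc_file_names; it assumes, based on the examples above, that the volume is zero-padded to two digits, that the issue is not padded, and that figure and supplement counters have three digits.

```python
def pmc_file_names(volume, issue, uid, figures=0, supplements=0, supplement_ext="pdf"):
    """Build the delivery file names for one article following the scheme above."""
    base = f"freeneuropathol-{volume:02d}-{uid}"
    return {
        "xml": f"{base}.xml",
        "pdf": f"{base}.pdf",
        "figures": [f"{base}-g{n:03d}.tif" for n in range(1, figures + 1)],
        "supplements": [f"{base}-s{n:03d}.{supplement_ext}" for n in range(1, supplements + 1)],
        "archive": f"freeneuropathol-{volume:02d}-{issue}-{uid}.zip",
    }


names = pmc_file_names(volume=1, issue=33, uid=2819, figures=2, supplements=1)
for key, value in names.items():
    print(key, value)
```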

Upload to OJS/Web view using the eLife Lens Viewer

  • Remove the <role> sections in the list of authors (if available).
  • Remove the whole <permissions> section.
  • Upload image files as .jpg (not in .tif format). Change referrals in the .xml file accordingly. Press CTRL+H to change XML tags from .tif to .jpg:
Find .tif
Replace .jpg
Find mime-subtype="tiff"
Replace mime-subtype="jpeg"
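The three changes for the web view can be combined in one script. This is a minimal sketch that works on the XML as plain text and assumes the tags are written as on this page; the function name prepare_for_web_view is hypothetical. It should be applied to a copy of the file, since the version delivered to PMC keeps the roles, the permissions and the .tif referrals.

```python
import re


def prepare_for_web_view(xml_text):
    """Drop role and permissions sections and point image referrals to .jpg files."""
    text = re.sub(r"<role\b[^>]*>.*?</role>\s*", "", xml_text, flags=re.DOTALL)
    text = re.sub(r"<permissions>.*?</permissions>\s*", "", text, flags=re.DOTALL)
    # Matches both .tif and .tiff file extensions.
    text = re.sub(r"\.tiff?\b", ".jpg", text)
    return text.replace('mime-subtype="tiff"', 'mime-subtype="jpeg"')


source_xml = (
    '<contrib><role vocab="credit">Supervision</role><name/></contrib>'
    "<permissions><license/></permissions>"
    '<graphic mime-subtype="tiff" xlink:href="freeneuropathol-01-2819-g001.tif"/>'
)
web_xml = prepare_for_web_view(source_xml)
print(web_xml)
```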

Upload the .xml file as a galley named XML.

Done.