  1. Module Tools
  2. MODTOOLS-57

Misplaced milestones: usfm2osis.py

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Upstream Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: usfm2osis.py
    • Labels:
      None

      Description

      The Python script usfm2osis.py does not distinguish between inter-verse titles and mid-verse titles.

      As a result, it places verse eID milestones too late for inter-verse titles (or other inter-verse content).

      This has critical consequences for Bible module creation using osis2mod.

      The underlying software design problem is that the script has no "look ahead" to determine exactly where each verse should properly end.

      In USFM, the point at which a verse ends can only be properly determined from its context.
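The "look ahead" idea can be illustrated with a minimal sketch. It is not the code of usfm2osis.py; the function names, the regular expression, and the short `TITLE_MARKERS` set are assumptions made for illustration, and inline character markers are ignored. From each title it scans forward: if verse text follows before the next verse or chapter marker, the title is mid-verse; otherwise it is inter-verse, so the verse eID milestone would belong before the title.

```python
import re

# Assumption: a deliberately small set of USFM title markers, for illustration only.
TITLE_MARKERS = {"s", "s1", "s2", "ms", "r"}


def tokenize(usfm):
    """Split USFM text into (marker, text) pairs, one per backslash marker."""
    return [(m.group(1), m.group(2).strip())
            for m in re.finditer(r"\\(\w+)\s*([^\\]*)", usfm)]


def is_inter_verse(tokens, i):
    """Look ahead from the title at index i.

    Returns True when no verse text follows before the next verse or
    chapter marker (or the end of input), i.e. the title sits between verses.
    """
    for marker, text in tokens[i + 1:]:
        if marker in ("v", "c"):
            return True
        if marker not in TITLE_MARKERS and text:
            return False  # more text of the open verse follows the title
    return True


def classify_titles(usfm):
    """Return (title text, 'inter-verse' or 'mid-verse') for every title found."""
    tokens = tokenize(usfm)
    return [(text, "inter-verse" if is_inter_verse(tokens, i) else "mid-verse")
            for i, (marker, text) in enumerate(tokens)
            if marker in TITLE_MARKERS]


# Hand-written sample, not taken from any real translation.
sample = r"""\c 1
\p
\v 1 First verse text.
\s A heading between verses
\p
\v 2 Second verse begins
\s A heading inside the verse
\p and the verse continues.
\v 3 Third verse.
"""

for title, kind in classify_titles(sample):
    print(kind, "-", title)
```

In the sample, the first heading is classified as inter-verse and the second as mid-verse. A converter could use that distinction to decide whether the eID milestone goes before or after the title.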

        Attachments

          Activity

          dfh David Haslam added a comment -

          For completeness' sake, I should also note that similar things occur in poetry passages.

          So the solution must address all the circumstances, not just those in prose passages,
          as illustrated in the two examples.

          dfh David Haslam added a comment -

          In fact, the further issues in poetry passages are more to do with misplaced verse sID milestones.

          The sID milestone is correctly placed only for verse 1 of a Psalm (for example).
          For verse 2 and subsequent verses, the sID milestone is placed differently!
          The difference is not a reflection of anything in the USFM.
          This issue accounts for why we see modules displaying verse tags on a line of their own, rather than in line with the poetry text.

          There may well be further aspects to this issue that I've yet to understand and report.

          dfh David Haslam added a comment - edited

          Verse sID milestones ought to be placed within line group elements!

          For the first whole verse in any line group, this is not happening.
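One way to spot this condition in a converted file is sketched below. It is a rough check under stated assumptions, not part of usfm2osis.py or osis2mod: the function name is hypothetical, the fragment is hand-written rather than real script output, and the OSIS namespace is omitted for brevity. It reports every verse sID milestone that sits directly before an <lg> element instead of inside it.

```python
import xml.etree.ElementTree as ET


def sids_stranded_before_lg(root):
    """Return the sID values of verse start milestones that are immediately
    followed by an <lg> sibling, with no text in between."""
    stranded = []
    for parent in root.iter():
        children = list(parent)
        for child, following in zip(children, children[1:]):
            if (child.tag == "verse" and "sID" in child.attrib
                    and following.tag == "lg"
                    and not (child.tail or "").strip()):
                stranded.append(child.get("sID"))
    return stranded


# Hand-written, namespace-free fragment for illustration only.
lg_fragment = """
<div>
  <verse sID="Ps.1.1" osisID="Ps.1.1"/>
  <lg>
    <l level="1">Line one of verse one<verse eID="Ps.1.1"/></l>
    <l level="1"><verse sID="Ps.1.2" osisID="Ps.1.2"/>Line one of verse two<verse eID="Ps.1.2"/></l>
  </lg>
</div>
"""

print(sids_stranded_before_lg(ET.fromstring(lg_fragment.strip())))
```

Here only the milestone for the first verse is reported, because the second verse's sID already lies inside a line of the line group.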

          dfh David Haslam added a comment -

          OK - just found another related issue involving "misplaced milestones".

          It concerns the relative placement of verse sID and eID milestones around the list element
          (when the list contains more than one verse).

          osis2mod logs a WARNING(NESTING) for either of these two conditions:
          1. If the sID milestone for the first verse is before <list> (or even before the first <item ...>)
          2. If the eID milestone for the last verse is after </list> (or even after the last </item>)

          Both these conditions occur in the output from usfm2osis.py.

          Therefore, for each such list, osis2mod logs two WARNING(NESTING) messages:
          one for the first verse in the list, the other for the last verse in the list.

          These warnings disappear if the XML file is edited to move the first verse sID milestone and the last verse eID milestone.

          Although the warnings are generated by osis2mod, usfm2osis.py should surely be designed to satisfy the extra requirements of a module build, which go beyond the mere requirement that the XML file validates against the OSIS schema.
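The manual edit described above could be automated along the following lines. This is a sketch under assumptions, not a tested fix: the helper names are hypothetical, the fragment is hand-written and namespace-free, and the target positions (inside the first and last <item>) are inferred from the two conditions listed above. It only handles milestones that are directly adjacent to the list, and it does not attempt the case where a list starts or ends part way through a verse.

```python
import xml.etree.ElementTree as ET


def is_verse(node, attr):
    """True if node is a <verse> milestone carrying the given attribute."""
    return node is not None and node.tag == "verse" and attr in node.attrib


def pull_milestones_into_list(parent):
    """Move a verse sID sitting just before a <list> into its first <item>,
    and a verse eID sitting just after it into its last <item>."""
    for lst in parent.findall("list"):
        items = lst.findall("item")
        if not items:
            continue
        siblings = list(parent)
        pos = siblings.index(lst)
        before = siblings[pos - 1] if pos > 0 else None
        after = siblings[pos + 1] if pos + 1 < len(siblings) else None

        # Condition 1: sID milestone directly before <list>.
        if is_verse(before, "sID") and not (before.tail or "").strip():
            parent.remove(before)
            before.tail = items[0].text   # the item's text now follows the milestone
            items[0].text = None
            items[0].insert(0, before)

        # Condition 2: eID milestone directly after </list>.
        if is_verse(after, "eID") and not (lst.tail or "").strip():
            parent.remove(after)
            lst.tail = after.tail         # keep any text that followed the milestone
            after.tail = None
            items[-1].append(after)


# Hand-written fragment for illustration only.
list_fragment = """
<div>
  <verse sID="Neh.7.8" osisID="Neh.7.8"/>
  <list>
    <item>first entry<verse eID="Neh.7.8"/></item>
    <item><verse sID="Neh.7.9" osisID="Neh.7.9"/>second entry</item>
  </list>
  <verse eID="Neh.7.9"/>
</div>
"""

list_root = ET.fromstring(list_fragment.strip())
pull_milestones_into_list(list_root)
print(ET.tostring(list_root, encoding="unicode"))
```

After the call, both outer milestones sit inside the list items. Whether osis2mod then builds without warnings would still need to be confirmed on a real module.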

          dfh David Haslam added a comment -

          It is not clear how to resolve the above issue if a list starts (or ends) part way through a verse.

          AFAIK, that would still be valid syntax in Paratext/USFM - so we ought to test for it.


            People

            • Assignee:
              chrislit Chris Little
              Reporter:
              dfh David Haslam