Module Tools / MODTOOLS-46

usfm2osis.py Footnote processing causes abort

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Upstream Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: usfm2osis.py
    • Labels:
      None
    • Environment:

      Description

      I haven't been able to isolate it in the files yet, but something (I'm guessing footnote processing) causes the script to raise an error, after which it normally requires manual intervention to exit the program.

      I'm presuming that the desired behaviour would be to ignore the (presumably bad) footnote and continue to process the rest of the file?

      Traceback (most recent call last):
        File "sword-tools/modules/python/usfm2osis.patched.py", line 1559, in <module>
          osisSegment[job] = convertToOsis(job)
        File "sword-tools/modules/python/usfm2osis.patched.py", line 1334, in convertToOsis
          osis = cvtFootnotes(osis, relaxedConformance)
        File "sword-tools/modules/python/usfm2osis.patched.py", line 851, in cvtFootnotes
          osis = re.sub(r'\\f\s+([^\s\\]+)?\s*(.+?)\s*\\f\*', lambda m: '<note' + ((' n=""') if (m.group(1) == '-') else ('' if (m.group(1) == '+') else (' n="' + m.group(1) + '"'))) + ' placement="foot">' + m.group(2) + '\uFDDF</note>', osis, flags=re.DOTALL)
        File "/usr/lib/python2.7/re.py", line 151, in sub
          return _compile(pattern, flags).sub(repl, string, count)
        File "sword-tools/modules/python/usfm2osis.patched.py", line 851, in <lambda>
          osis = re.sub(r'\\f\s+([^\s\\]+)?\s*(.+?)\s*\\f\*', lambda m: '<note' + ((' n=""') if (m.group(1) == '-') else ('' if (m.group(1) == '+') else (' n="' + m.group(1) + '"'))) + ' placement="foot">' + m.group(2) + '\uFDDF</note>', osis, flags=re.DOTALL)
      TypeError: coercing to Unicode: need string or buffer, NoneType found
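The TypeError is consistent with the optional caller group in the footnote pattern not matching, so that `m.group(1)` is `None` and the string concatenation fails. The following is a minimal Python 3 sketch of that situation and of one possible guard. The pattern is an assumed reading of the one in the traceback, the function names and the sample text are hypothetical, and treating a missing caller like `+` is an assumption, not the upstream behaviour.

```python
import re

# Assumed reading of the pattern in the traceback:
# optional caller, then the footnote body, then the closing \f* marker.
FOOTNOTE = re.compile(r'\\f\s+([^\s\\]+)?\s*(.+?)\s*\\f\*', re.DOTALL)

def footnote_to_note(m):
    caller = m.group(1)  # None when the footnote has no caller character
    if caller is None or caller == '+':
        n_attr = ''  # assumption: a missing caller is handled like '+'
    elif caller == '-':
        n_attr = ' n=""'
    else:
        n_attr = ' n="' + caller + '"'
    return '<note' + n_attr + ' placement="foot">' + m.group(2) + '\uFDDF</note>'

def convert_footnotes(text):
    return FOOTNOTE.sub(footnote_to_note, text)

# Hypothetical input: a footnote that goes straight from \f to \fr.
sample = r'In the beginning\f \fr 1.1 \ft Or: when\f* God created'
print(convert_footnotes(sample))
```

Without the `caller is None` check, the final `else` branch would try to concatenate `None` to a string, which raises a TypeError in Python 3 as well.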

          Activity

          dfh David Haslam added a comment -

          What is the nature of the patch in your usfm2osis.patched.py?

          dfh David Haslam added a comment -

          Were you processing a complete Bible, or one book at a time?

          Sometimes progress in isolating a problem is assisted by processing books separately, smaller books first.

          rob Robert Hunt added a comment -

          I can't find a way to REPLY to a specific comment. My patch basically avoids the multiprocessing part of the program if you run it in debug mode. (The standard behaviour is to still use multiprocessing, but only start one subprocess which doesn't avoid the need to manually terminate the script.) I also add a -1 parameter to run it in this same non-multiprocessing mode without requiring the debug flag set.

          I mostly run the script as part of a large batch process when people submit USFM Bibles to http://freely-given.org/Software/BibleDropBox and it's a big pain to get up in the morning and find that an overnight submission has locked up and not completed because of the default behaviour of usfm2osis.py.

          So the patch has no effect on the actual processing of the USFM text elements.
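A switch of the kind described above might look roughly like the sketch below. This is not the actual patch: `convert_job`, `run_jobs` and the `serial` parameter are hypothetical names, and the conversion itself is replaced by a stand-in.

```python
import multiprocessing

def convert_job(job):
    # Stand-in for the real per-file conversion (hypothetical).
    return job.upper()

def run_jobs(jobs, serial=False):
    if serial:
        # Everything runs in the calling process, so an exception
        # propagates directly and the script exits with a traceback.
        return [convert_job(job) for job in jobs]
    # Otherwise hand the jobs to a pool of worker processes.
    with multiprocessing.Pool() as pool:
        return pool.map(convert_job, jobs)

if __name__ == '__main__':
    print(run_jobs(['gen.usfm', 'exo.usfm'], serial=True))
```

In a batch setting the serial path trades speed for a run that either finishes or fails outright.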

          rob Robert Hunt added a comment -

          And to David's second comment: Yes, I've isolated it to two specific books within a USFM Bible. I haven't had time yet to take the smaller of those two books and keep dividing it into pieces to find what the script actually chokes on (or else to patch the script again to give some C:V output when it gets the error).
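The "keep dividing it into pieces" approach can be automated as a bisection. The sketch below assumes that a single chapter on its own is enough to trigger the failure; `split_chapters`, `find_failing_chunk` and the `fails` callback are hypothetical names. In practice `fails` would run the converter on the given pieces and report whether it raised an error.

```python
import re

def split_chapters(text):
    # Split before each \c marker; the first piece is the book header.
    pieces = re.split(r'(?=\\c\s+\d+)', text)
    return [p for p in pieces if p]

def find_failing_chunk(chunks, fails):
    """Return the index of one chunk that triggers the failure.

    Assumes fails(list_of_chunks) is True whenever the list contains
    a bad chunk, and that at least one bad chunk exists.
    """
    lo, hi = 0, len(chunks)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(chunks[lo:mid]):
            hi = mid   # the failure is in the first half
        else:
            lo = mid   # otherwise it must be in the second half
    return lo
```

Each round halves the search range, so even a long book needs only a handful of converter runs.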

          dfh David Haslam added a comment -

          Do you capture the STDERR output to a separate log file?

          Might not help in some situations, but useful for identifying "unhandled tags".
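For a batch process, one way to keep STDERR in its own log file is to redirect it when launching the script. The sketch below uses the standard `subprocess` module; `run_with_stderr_log` is a hypothetical helper, and the command list is whatever is normally used to invoke the converter.

```python
import subprocess
import sys

def run_with_stderr_log(cmd, log_path):
    # Send the child's STDERR to log_path; STDOUT is left untouched.
    with open(log_path, 'wb') as log:
        result = subprocess.run(cmd, stderr=log)
    return result.returncode

if __name__ == '__main__':
    # Demonstration with a trivial child process instead of the converter.
    rc = run_with_stderr_log(
        [sys.executable, '-c', 'import sys; sys.stderr.write("warning\\n")'],
        'stderr.log',
    )
    print(rc)
```

The return code lets the batch job tell a failed conversion from a successful one without parsing the log.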

          dfh David Haslam added a comment - - edited

          It is also useful to try the script usfmtags.py separately; it is available in the same download folder.

          This just takes your USFM files and outputs a list of all the tags it finds.

          For my own part, I have made a TextPipe filter which gives a counted list of tags, with a description column thrown in as a bonus.
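A counted list of tags can also be produced with a few lines of Python. The sketch below is not usfmtags.py or the TextPipe filter; the marker pattern is a simplifying assumption (backslash, optional `+`, letters, optional digits, optional closing `*`) and `count_markers` is a hypothetical name.

```python
import re
from collections import Counter

# Simplified marker pattern (assumption): \tag, \tag1, \+tag, \tag*
MARKER = re.compile(r'\\\+?[A-Za-z]+[0-9]*\*?')

def count_markers(text):
    # Map each marker found in the text to its number of occurrences.
    return Counter(MARKER.findall(text))

if __name__ == '__main__':
    sample = r'\c 1 \v 1 In \f + \ft note\f* \v 2 x'
    for marker, n in count_markers(sample).most_common():
        print(n, marker)
```

Markers that occur only once or twice in a whole Bible are often worth a closer look.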

          refdoc Peter von Kaehne added a comment -

          I will close this bug within the next few days as no appropriate test case was given.

          If the bug still exists, could you please update it with an attached file for a minimal test case?

          dfh David Haslam added a comment -

          Another tip for the reporter.

          Perform a character frequency count on the problematic USFM files.
          Sometimes the presence of an incorrect character where there should be a space delimiter can cause problems in USFM files.
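A character frequency count of this kind can be sketched as follows. `char_frequencies` is a hypothetical helper; listing the code point and Unicode name next to each count makes look-alike characters such as a no-break space easier to spot.

```python
import unicodedata
from collections import Counter

def char_frequencies(text):
    # Return (count, code point, Unicode name) rows, most frequent first.
    rows = []
    for ch, n in Counter(text).most_common():
        rows.append((n, 'U+%04X' % ord(ch), unicodedata.name(ch, '<unnamed>')))
    return rows

if __name__ == '__main__':
    # Hypothetical sample containing a no-break space after the verse number.
    for row in char_frequencies('\\v 1\u00a0In the beginning'):
        print(row)
```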

          Another issue can be the use of verse range tags with a comma rather than a hyphen between the verse numbers.
          I've seen other software that doesn't like this.
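Verse markers that use a comma between the verse numbers can be located with a simple scan. In this sketch `find_comma_verse_ranges` is a hypothetical helper, and the pattern assumes the range directly follows the `\v` marker.

```python
import re

# \v followed by two numbers separated by a comma, e.g. "\v 2,3"
COMMA_RANGE = re.compile(r'\\v\s+(\d+),\s*(\d+)')

def find_comma_verse_ranges(text):
    # Return (line number, matched marker) for each comma-separated range.
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in COMMA_RANGE.finditer(line):
            hits.append((lineno, m.group(0)))
    return hits
```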


            People

            • Assignee:
              chrislit Chris Little
              Reporter:
              rob Robert Hunt
            • Votes:
              0
              Watchers:
              3
