Modules / MOD-380

Wrong non-ASCII characters in DutKant module

    Details

      Description

      An And Bible user reports:

       

      (Warning: Non-ASCII characters in this bug report.)

      *Describe the bug*
      Using DutKant, the Dutch "Kanttekeningen" (side notes) for the Dutch Statenvertaling, all non-ASCII characters (often e-with-diaeresis / ë / Unicode U+00EB) are shown as a replacement character (looking like a crossed square / ⛝ / Unicode U+26DD).

      *Bug was found on And Bible version*
      Any build I've ever used, including the most recent beta 3.3.382.

      *To Reproduce*
      Steps to reproduce the behavior:
      1. Install DutKant (language: Dutch/Nederlands, type Commentary/Commentaar, from CrossWire), and choose that document.
      2. Go to Matthew 1:12.
      3. Observe something that looks like "11) Salathi⛝l [...]".

      *Expected behavior*
      I would expect to see "Salathiël" instead.

      *Screenshots*
      [I might add a screenshot later.]

      *Smartphone:*

      • Device: Nokia 6.1 / TA-1043
      • OS: basically-stock Android, language Nederlands (Dutch).
      • Version 10 (patch level May 1, 2020).

      *Additional context*
      Copying this word to QuickEdit on Android stores the character as a single 0x89 byte. (See https://www.fileformat.info/info/unicode/char/eb/charset_support.htm and https://www.fileformat.info/info/unicode/char/eb/codepage_support.htm for the character sets and code pages that encode ë as 0x89: IBM 437 (aka PC-8 or DOS Latin US, see https://en.wikipedia.org/wiki/Code_page_437) and also IBM 850...865.)

      Also, copying this word to the Gmail app turns it into a 'per mille' sign (U+2030), which has 0x89 as its encoding in Windows code pages 1250..1258.
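The two observations above can be checked with a short Python sketch that decodes the single byte 0x89 under a few candidate encodings. The list of encodings is only an illustrative selection, not a claim about what QuickEdit or Gmail do internally.

```python
# Decode the single byte 0x89 under several candidate encodings.
# The selection of encodings is an assumption made for illustration.
raw = b"\x89"

candidates = ["cp437", "cp850", "cp1252", "latin-1"]
decoded = {name: raw.decode(name) for name in candidates}

for name, char in decoded.items():
    # Print the code point rather than the glyph, since U+0089 is invisible.
    print(f"{name:8} -> U+{ord(char):04X}")
```

Under these codecs, 0x89 maps to ë (U+00EB) in the two IBM code pages, to the per mille sign (U+2030) in Windows-1252, and to the control character U+0089 in Latin-1.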

      So the underlying byte in the source document seems to actually be 0x89. That would suggest the document is encoded in code page 437, and that AndBible interprets the byte as the control character [U+0089](https://codepoints.net/U+0089) instead, perhaps?
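If that hypothesis holds, the garbling should be reversible. The following is a minimal Python sketch, assuming the text really was CP437 bytes read as Latin-1; `repair_cp437` is a hypothetical helper name, and nothing here is confirmed to match how AndBible or the module build actually handle the text.

```python
def repair_cp437(text: str) -> str:
    """Hypothetical helper: undo a Latin-1 misreading of CP437 bytes.

    Only valid if every character in `text` is below U+0100.
    """
    return text.encode("latin-1").decode("cp437")


# What the app would hold if byte 0x89 had been read as Latin-1 (assumption).
shown = "Salathi\u0089l"
fixed = repair_cp437(shown)
print(fixed)

# For comparison: the same raw bytes are not valid UTF-8, so a lenient
# UTF-8 decoder substitutes U+FFFD, the generic replacement character.
as_utf8 = b"Salathi\x89l".decode("utf-8", errors="replace")
print(f"U+{ord(as_utf8[7]):04X}")
```

Either path (a control character without a glyph, or a replacement character from a failed UTF-8 decode) could plausibly explain the symbol seen on screen; the sketch does not distinguish which one occurs in practice.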

      *My questions*

      • Is this an issue in AndBible? Or in the DutKant document? Or possibly in both?
      • If in DutKant: Where is the original DutKant document? So where is AndBible getting the binary from, and from which source is that built?
      • What can I do to help troubleshoot this?

            People

            • Assignee: Peter von Kaehne (refdoc)
            • Reporter: Tuomas Airaksinen (tuomas)