  JSword / JS-226

Robinson's morphology is not indexed in JSword modules

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.6
    • Fix Version/s: 1.7
    • Component/s: o.c.jsword.index
    • Labels:
      None

      Description

      Lucene is not told to index the morphology information, so searching on morphology is impossible.
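To make concrete what the missing index entries would enable, here is a minimal stdlib-only sketch, assuming the morphology codes are available per verse as strings such as robinson:V-PAI-3S. It models only the idea of an inverted index (code to verses); it is not JSword's indexer and does not use the Lucene API. The class and method names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: an inverted index from morphology codes to verse references.
// It illustrates the idea only; it is not JSword's indexer and does not use Lucene.
class MorphIndexSketch {
    // Each morphology code (e.g. "robinson:V-PAI-3S") maps to the verses that contain it.
    // TreeSet keeps references in string order (John.1.10 sorts before John.1.2),
    // not in canonical Bible order.
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    // Records that a verse contains at least one word tagged with the given code.
    void add(String verseRef, String morphCode) {
        postings.computeIfAbsent(morphCode, k -> new TreeSet<>()).add(verseRef);
    }

    // Returns the verses tagged with exactly this code, or an empty set if none are.
    SortedSet<String> lookup(String morphCode) {
        return Collections.unmodifiableSortedSet(
                postings.getOrDefault(morphCode, new TreeSet<>()));
    }
}
```

Without such entries in the index, a lookup by code has nothing to consult, which is the failure this issue describes.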


          Activity

          dmsmith DM Smith added a comment -

          Regarding using regular expressions to do a search:
          Lucene search syntax is not regular expression syntax. It is more like Unix command-line globbing. I haven't seen regular expression support in a Lucene contrib module, but that doesn't mean it is not there.

          But if not, to support regular expressions, we'll need to intercept the query and pick out the regular expression and use the regular expression to do our own search over our own store or the term dictionary.
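The intercept-and-expand approach could look roughly like the following sketch: apply a java.util.regex pattern to each term in the term dictionary (modelled here as an in-memory sorted set of strings) and collect the matching terms, which could then be OR'ed together, for example in a Lucene BooleanQuery. The class name and the in-memory dictionary are assumptions for illustration, not JSword or Lucene API.

```java
import java.util.*;
import java.util.regex.Pattern;

// Hypothetical sketch: expand a regular expression against the term dictionary,
// modelled here as a sorted set of indexed terms. Not a JSword or Lucene API.
class RegexTermExpander {
    // Returns every term that the regex matches in full, in dictionary order.
    static List<String> expand(SortedSet<String> termDictionary, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every term
        List<String> hits = new ArrayList<>();
        for (String term : termDictionary) {
            // matches() requires the whole term to match, as if the regex were anchored
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

For example, the pattern robinson:V-2?A.I-.* would pick out aorist and second-aorist indicative codes while skipping present-tense ones. A linear scan over every term is the simplest correct version; a faster one could first seek to the pattern's literal prefix (here robinson:V-) in the sorted dictionary.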

          davidib David Instone-Brewer added a comment -

          The regular expressions were more complicated than I had thought they would be.
          Is it time to redesign the Robinson codes?
          They aren't particularly human-friendly or machine-friendly.
          I think the latter is more important, because ideally people won't see the actual codes.
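One way to see the machine-friendliness problem: with the codes treated as opaque strings, even "any aorist indicative" needs a pattern like V-2?A.I-.*. The sketch below instead splits a code into named fields so a query can test fields directly. It assumes the common finite-verb layout V-[2]{tense}{voice}{mood}-{person}{number} (for example V-PAI-3S, present active indicative, third person singular) with any "robinson:" prefix already stripped. The letter sets are an assumption that should be checked against Robinson's own documentation, and the class is hypothetical.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split a finite-verb Robinson code into named fields.
// Assumed layout: V-[2]{tense}{voice}{mood}-{person}{number}, e.g. "V-2AAI-3S".
// Participles, infinitives, other parts of speech and suffixed codes are rejected.
class RobinsonVerbCode {
    private static final Pattern FINITE_VERB = Pattern.compile(
            "V-(?<tense>2?[PIFARL])(?<voice>[AMPEDONQX])(?<mood>[ISOM])"
            + "-(?<person>[123])(?<number>[SP])");

    private static final List<String> FIELDS =
            Arrays.asList("tense", "voice", "mood", "person", "number");

    // Returns the field letters keyed by name, or empty if the code is not a finite verb.
    static Optional<Map<String, String>> parse(String code) {
        Matcher m = FINITE_VERB.matcher(code);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : FIELDS) {
            fields.put(name, m.group(name));
        }
        return Optional.of(fields);
    }

    // A structured query: any aorist (first or second) indicative, whatever the voice.
    static boolean isAoristIndicative(String code) {
        return parse(code)
                .map(f -> f.get("tense").endsWith("A") && f.get("mood").equals("I"))
                .orElse(false);
    }
}
```

A redesigned, field-based code would make this decomposition unnecessary; the sketch only shows what a program has to do with the current strings.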

          chrisburrell Chris Burrell added a comment -

          Agreed: showing the codes to the user should be a last resort, as it implies that they need to learn the new system.

          chrisburrell Chris Burrell added a comment -

          It seems Lucene has some support for regular expressions anyway: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/api/contrib-regex/org/apache/lucene/search/regex/package-summary.html

          dmsmith DM Smith added a comment -

          If we can create a mapping from Robinson codes to something better (human-readable and easy to search), then we can use the mapping within JSword to provide a better user experience.

          The basic idea: the user would see the new codes, or a decoding of them into their language (or into the default language, if there's no such translation). They could search these codes either directly or via a wizard (which to offer would be a front-end choice).
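The "decode into the user's language, else the default" behaviour could be sketched as a two-level lookup with fallback. This is a minimal sketch; the class, its methods and the plain-map storage are hypothetical, and a real implementation might use java.util.ResourceBundle, which provides a similar locale fallback.

```java
import java.util.*;

// Hypothetical sketch: show a morphology code as a description in the user's
// language, else in the default language, else as the raw code itself.
class MorphDescriber {
    private final Map<String, Map<String, String>> byLanguage = new HashMap<>();
    private final String defaultLanguage;

    MorphDescriber(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    // Registers a description of a code for one language (an ISO 639 code such as "en").
    void put(String language, String code, String description) {
        byLanguage.computeIfAbsent(language, k -> new HashMap<>()).put(code, description);
    }

    String describe(String code, Locale userLocale) {
        // Try the user's language first, then the default language.
        for (String language : Arrays.asList(userLocale.getLanguage(), defaultLanguage)) {
            Map<String, String> table = byLanguage.get(language);
            if (table != null && table.containsKey(code)) {
                return table.get(code);
            }
        }
        return code; // no translation anywhere: fall back to the code itself
    }
}
```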

          The underlying module may still use the old codes. That would be OK, though not ideal. The search would reverse the mapping, going from the new codes to the old codes, and use those to search the module. Likewise, when presenting the module, the old codes would be replaced with the new codes. This would be a process of normalization, which we currently do for Strong's numbers.
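The normalization described here (old codes stored in the module, new codes shown to the user) amounts to a mapping plus its inverse. A minimal sketch, assuming the mapping is one-to-one; the class name and the new-code format used in any example are hypothetical, since no new scheme has been defined.

```java
import java.util.*;

// Hypothetical sketch: one table maps the codes a module stores ("old") to the
// preferred codes ("new"); its inverse rewrites a user's search into old codes.
class MorphCodeNormalizer {
    private final Map<String, String> oldToNew = new HashMap<>();
    private final Map<String, String> newToOld = new HashMap<>();

    void addMapping(String oldCode, String newCode) {
        // Reject entries that would make the mapping ambiguous in either direction.
        if (oldToNew.containsKey(oldCode) || newToOld.containsKey(newCode)) {
            throw new IllegalArgumentException(
                    "mapping must be one-to-one: " + oldCode + " <-> " + newCode);
        }
        oldToNew.put(oldCode, newCode);
        newToOld.put(newCode, oldCode);
    }

    // Display path: replace a module's old code with the new one, if known.
    String forDisplay(String oldCode) {
        return oldToNew.getOrDefault(oldCode, oldCode);
    }

    // Search path: translate the user's new code back to what the module stores.
    String forSearch(String newCode) {
        return newToOld.getOrDefault(newCode, newCode);
    }
}
```

If a new scheme turns out to be finer- or coarser-grained than Robinson's, the mapping stops being one-to-one, and the search path would need to return a set of old codes to OR together rather than a single string.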

          We may want to explore the idea of a module sidecar. On various occasions, I've wanted finer-grained information about a module. Basically, we'd maintain a separate conf for each module. It would contain information such as user-provided font info, unlock keys, the type of Strong's numbers per testament, the type of morphology per testament, and so on. Any program could set a value in the sidecar. This info would be read into BookMetadata and would be available to all programs. If a program doesn't know what to do with a value, it would ignore it. These new values should be communicated and documented. Any automatic behavior added to JSword would need to be discussed.
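A sidecar could be as simple as a key=value file read with java.util.Properties. In this sketch every key is kept, known or not, so one program's additions survive for others, and each program asks only for the keys it understands. The key names (MorphologyNT, StrongsOT, ...) and the class are hypothetical; nothing here is the existing BookMetadata API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch of a per-module sidecar conf stored as key=value lines.
// Key names such as "MorphologyNT" are invented for illustration.
class ModuleSidecar {
    private final Properties props = new Properties();

    // Parses sidecar text; every key is kept, including keys this program doesn't use.
    static ModuleSidecar parse(String text) {
        ModuleSidecar sidecar = new ModuleSidecar();
        try {
            sidecar.props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sidecar;
    }

    // Any program may set a value; programs that don't recognize the key ignore it.
    void set(String key, String value) {
        props.setProperty(key, value);
    }

    // Per-testament lookup, e.g. ("Morphology", "NT") reads key "MorphologyNT".
    Optional<String> perTestament(String key, String testament) {
        return Optional.ofNullable(props.getProperty(key + testament));
    }

    // Serializes all keys, known or not, so other programs' values survive a rewrite.
    String toText() {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```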


            People

            • Assignee:
              dmsmith DM Smith
              Reporter:
              chrisburrell Chris Burrell
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated: