  1. Sakai
  2. SAK-38740

Cannot unzip file using non-UTF-8 for filename encoding

    Details

    • Property addition/change required:
      Yes
    • Previous Issue Keys:
      KNL-1221

      Description

      Test Plan:
      1. Edit the sakai.properties file to enable unzipping files in the content tool.
      2. Upload ChineseFile.zip to the content tool. The zip file contains only a zero-size file whose Chinese file name is encoded with GBK (a Chinese charset).
      3. Unzip the file in the content tool. A folder named 'ChineseFile' is created as expected, but no file is extracted.

      Explanation:
      1. A zip file doesn't record which charset was used to encode the file names (as opposed to the file contents). In my tests, Linux programs encode file names with UTF-8, while Windows programs encode them depending on the language version of Windows (on Chinese-language Windows most programs use GBK, although 7-Zip always uses UTF-8).
      2. JDK 6 seems to use ISO-8859-1 for file names when unzipping, and JDK 7 seems to use UTF-8 by default. Neither can handle GBK-encoded file names.
      3. The good news is that JDK 7 provides a new constructor, ZipFile(File file, int mode, Charset charset), which lets the caller specify the charset to use. However, relying on it requires JDK 7 or later.
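      The behaviour described above can be reproduced outside Sakai with a small sketch. It is not part of the attached patch: the class name ZipNameCharsetDemo and its helper methods are made up for illustration, and it assumes the JDK in use ships the GBK charset. It writes a zip with one zero-size entry whose name is GBK-encoded, then lists the entry names through the JDK 7 ZipFile(File, int, Charset) constructor.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not taken from the Sakai code base.
final class ZipNameCharsetDemo {

    /** Writes a temporary zip holding one zero-size entry whose name is encoded with nameCharset. */
    static File writeZip(String entryName, Charset nameCharset) throws IOException {
        File zip = File.createTempFile("charset-demo", ".zip");
        zip.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip), nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.closeEntry();
        }
        return zip;
    }

    /** Lists the entry names of a zip, decoding them with nameCharset. */
    static List<String> listNames(File zip, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip, ZipFile.OPEN_READ, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK
        File zip = writeZip("\u4e2d\u6587.txt", gbk);
        // Decoding with the charset that was used for writing returns the original name.
        System.out.println(listNames(zip, gbk));
    }
}
```

      Calling listNames with StandardCharsets.UTF_8 on the same file is expected to fail rather than return a garbled name. Depending on the JDK version, the failure appears to surface either as an IllegalArgumentException while enumerating entries or as a ZipException when the file is opened, so callers probably need to handle both.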

      Solution:
      1. Add a new property, content.zip.charset, that lets the user define which charsets are tried, one by one, when extracting a zip file. By default, only UTF-8 is tried.

      To handle ChineseFile.zip, the following properties can be added:
      content.zip.charset.count=2
      content.zip.charset.1=UTF-8
      content.zip.charset.2=GBK

      A patch against 2.9.x is attached.
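      The "try each configured charset in turn" idea can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: the class ZipCharsetFallback and its methods are hypothetical, the charset names are passed in as a plain list instead of being read from content.zip.charset, and only the entry names are listed (actual extraction is left out).

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical helper, not taken from the Sakai code base.
final class ZipCharsetFallback {

    /** Converts configured charset names to Charset objects; falls back to UTF-8 only when none are given. */
    static List<Charset> toCharsets(List<String> names) {
        List<Charset> charsets = new ArrayList<>();
        for (String name : names) {
            charsets.add(Charset.forName(name));
        }
        if (charsets.isEmpty()) {
            charsets.add(StandardCharsets.UTF_8);
        }
        return charsets;
    }

    /** Returns the entry names decoded with the first candidate charset that reads the whole zip without error. */
    static List<String> listNames(File zip, List<Charset> candidates) throws IOException {
        for (Charset cs : candidates) {
            try (ZipFile zf = new ZipFile(zip, ZipFile.OPEN_READ, cs)) {
                List<String> names = new ArrayList<>();
                Enumeration<? extends ZipEntry> entries = zf.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
                return names;
            } catch (ZipException | IllegalArgumentException e) {
                // Names could not be decoded (or the file could not be opened)
                // with this charset; move on to the next candidate.
            }
        }
        throw new ZipException("No candidate charset could decode the entry names");
    }
}
```

      With the example configuration above, the call would be listNames(zip, toCharsets(Arrays.asList("UTF-8", "GBK"))). The order matters: a strict decoder such as UTF-8 rejects most GBK byte sequences and so falls through to the next candidate, whereas a single-byte charset such as ISO-8859-1 accepts any bytes and would hide every charset listed after it.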

              People

              • Assignee:
                Unassigned
              • Reporter:
                gaojun Gao Jun
              • Votes:
                0
              • Watchers:
                5
