  1. Sakai
  2. SAK-41096

Site Info > Manage Groups > Bulk Import from CSV > byte order mark bug

    Details

    • Type: Bug
    • Status: CLOSED
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 11.4, 12.5, 19.0, 20.0 [Tentative]
    • Fix Version/s: 19.0, 20.0 [Tentative]
    • Component/s: Site Info
    • Labels:
      None
    • 19 status:
      Resolved
    • Test Plan:
      1. Modify the two attached files to input valid users for your site (if necessary)
        1. Ensure you preserve the BOM in the TestGroup.csv file
      2. Attempt to upload each file
      3. Verify the results are correct

      Description

      Depending on the OS and text editor used, a CSV file may be saved with or without a BOM (byte order mark): a few bytes at the very start of the file that precede the user-entered text and that most editors do not display. If a file containing a BOM is used in a bulk upload to create groups, the results can be unexpected and erroneous.

      Take, for example, the two CSV files attached to this ticket (TestGroup.csv and TestGroupWithoutBom.csv). Opened in a text editor, they look virtually identical: each creates one group, G1, with several users. Inspected on the command line, however, they show a subtle difference:

      TestGroup.csv:

      $ od -a TestGroup.csv
      0000000   o   ;   ?   G   1   ,   s   t   u   d   e   n   t   0   0   6
      0000020   1  cr   G   1   ,   s   t   u   d   e   n   t   0   0   6   2
      0000040  cr   G   1   ,   s   t   u   d   e   n   t   0   0   6   3  cr
      0000060   G   1   ,   s   t   u   d   e   n   t   0   0   6   4  cr   G
      0000100   1   ,   s   t   u   d   e   n   t   0   0   6   5  cr   G   1
      0000120   ,   s   t   u   d   e   n   t   0   0   6   6  cr   G   1   ,
      0000140   s   t   u   d   e   n   t   0   0   6   7  cr   G   1   ,   s
      0000160   t   u   d   e   n   t   0   0   6   8  cr   G   1   ,   s   t
      0000200   u   d   e   n   t   0   0   6   9  cr   G   1   ,   s   t   u
      0000220   d   e   n   t   0   0   7   0
      

      TestGroupWithoutBom.csv:

      $ od -a TestGroupWithoutBom.csv 
      0000000   G   1   ,   s   t   u   d   e   n   t   0   0   8   1  cr   G
      0000020   1   ,   s   t   u   d   e   n   t   0   0   8   2  cr   G   1
      0000040   ,   s   t   u   d   e   n   t   0   0   8   3  cr   G   1   ,
      0000060   s   t   u   d   e   n   t   0   0   8   4  cr   G   1   ,   s
      0000100   t   u   d   e   n   t   0   0   8   5  cr   G   1   ,   s   t
      0000120   u   d   e   n   t   0   0   8   6  cr   G   1   ,   s   t   u
      0000140   d   e   n   t   0   0   8   7  cr   G   1   ,   s   t   u   d
      0000160   e   n   t   0   0   8   8  cr   G   1   ,   s   t   u   d   e
      0000200   n   t   0   0   9   9
      

      The TestGroup.csv file clearly has extra data preceding the first occurrence of "G1". Because od -a ignores the high-order bit of each byte, the UTF-8 BOM bytes EF BB BF are displayed as "o ; ?". In a text editor or word processor, however, this difference is invisible.
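
      The "o ; ?" at offset 0 can be reproduced with a short sketch. It assumes the file is UTF-8 encoded, so the BOM is the three bytes EF BB BF; the class and method names below are hypothetical and are not part of Sakai.

```java
import java.nio.charset.StandardCharsets;

class BomBytes {
    // UTF-8 encoding of U+FEFF, the byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data starts with the three UTF-8 BOM bytes.
    static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    // Mimics how `od -a` names a byte: the high-order bit is ignored.
    static char lowSevenBits(byte b) {
        return (char) (b & 0x7F);
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFG1,student0061".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "G1,student0081".getBytes(StandardCharsets.UTF_8);

        StringBuilder shown = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            shown.append(lowSevenBits(withBom[i]));
        }
        System.out.println(shown);                   // o;?
        System.out.println(hasUtf8Bom(withBom));     // true
        System.out.println(hasUtf8Bom(withoutBom));  // false
    }
}
```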

      Although the two files look identical, uploading TestGroup.csv produces two groups with the same title "G1": the first contains only the first user in the file, and the second contains the remaining users. Presumably the BOM is read as part of the first row's group title, so that title does not match the "G1" on the other rows. The result is very confusing for the user (see the attached fileWithBomSplitGroups.png).

      Uploading TestGroupWithoutBom.csv works as expected, because it does not contain the BOM at the beginning of the file.
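
      The split can be illustrated with a minimal sketch. It assumes the importer keys groups by the raw title string in the first column; the ticket does not show the import code, so groupByTitle is a hypothetical stand-in rather than the Sakai implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupSplitDemo {
    // Hypothetical stand-in for the importer: collects user IDs under the
    // raw group title found in the first column of each row.
    static Map<String, List<String>> groupByTitle(String csv) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : csv.split("\\R")) {      // \R matches CR, LF, or CRLF
            String[] cols = row.split(",", 2);
            if (cols.length < 2) {
                continue;                          // skip blank or malformed rows
            }
            groups.computeIfAbsent(cols[0], k -> new ArrayList<>()).add(cols[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // The leading \uFEFF is the BOM once the bytes are decoded as UTF-8.
        String withBom = "\uFEFFG1,student0061\rG1,student0062\rG1,student0063";
        Map<String, List<String>> groups = groupByTitle(withBom);

        System.out.println(groups.size());          // 2
        System.out.println(groups.get("\uFEFFG1")); // [student0061]
        System.out.println(groups.get("G1"));       // [student0062, student0063]
    }
}
```

      Both keys display as "G1" because U+FEFF is invisible, which would match the two same-titled groups described above.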

      To fix the issue for files with a BOM while remaining compatible with files without one, we simply need to use Apache Commons IO's BOMInputStream when reading the CSV file.
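
      To show the idea without the Commons IO dependency, here is a standard-library sketch of the same behaviour: strip a leading BOM if present, otherwise pass the data through unchanged. It is not the actual Sakai change, it handles only the UTF-8 BOM, and skipUtf8Bom and firstLine are hypothetical helper names. In the real fix, the upload stream would be wrapped in org.apache.commons.io.input.BOMInputStream instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkipSketch {
    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {                  // read up to three bytes
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;                             // stream shorter than three bytes
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pushback.unread(head, 0, n);           // not a BOM: put the bytes back
        }
        return pushback;
    }

    // Reads the first CSV row after any BOM has been skipped.
    static String firstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFG1,student0061\rG1,student0062".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "G1,student0081\rG1,student0082".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(withBom));     // G1,student0061
        System.out.println(firstLine(withoutBom));  // G1,student0081
    }
}
```

      With the BOM removed before parsing, the first row's title is plain "G1" in both files, so all rows fall into a single group.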


          Attachments

          1. 01032019_csv1.gif (10 kB)
          2. 01032019_csv2.gif (15 kB)
          3. 01032019_group.gif (8 kB)
          4. fileWithBomSplitGroups.png (71 kB)
          5. TestGroup.csv (0.1 kB)
          6. TestGroupWithoutBom.csv (0.1 kB)

              People

              • Assignee: Brian Jones (bjones86)
              • Reporter: Brian Jones (bjones86)