Sakai / SAK-38473

Upgrade tika to 1.14



    • Type: Bug
    • Status: Verified
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 11.1
    • Fix Version/s: 11.5 [Tentative], 12.0
    • Component/s: Kernel



      Release 1.13 - 05/08/2016

      • Upgrade to PDFBox 2.0.1 (TIKA-1285/TIKA-1959).
        MAJOR CHANGES in PDFParser:
          • The classic sequential parser is no longer available.
          • TIFF files are no longer extracted by default. See the PDFBox
            documentation for optional components to process TIFF files.
          • Some truncated/corrupted files that had some content extracted
            with 1.8.x may have no content extracted in 2.0.x (see TIKA-1912).
      • The MIT-NLP Information Extraction (MITIE) Named Entity
        Recognition (NER) system is now supported in Tika
        (TIKA-1913, GitHub-108).
      • Tika now supports the use of the Yandex translation
        service (TIKA-1943, GitHub-106).
      • Tika now uses NER to extract scientific measurements
        from text using either GROBID Quantities, which uses
        conditional random fields, or NLTK, which uses regular
        expressions (TIKA-1917, GitHub-104).
      • Fixed JournalParser to handle null responses from
        GROBID and to log a message (TIKA-1925).
      • Refactored Language Detector into tika-langdetect module,
        added default N-Gram implementation, Optimaize Lang
        Detector and MIT Text.jl implementation
        (TIKA-1872, TIKA-1696, TIKA-1723).
      • Extract metadata from MP4 videos whether or not the
        PooledTimeSeries parser is available via Aditya Dhulipala
      • Fix NPE when trying to get embedded image identifier in
        WordParser (TIKA-1956).
      • Improvements to MIME database for detection of Scientific
        and other formats present in the TREC-DD-Polar dataset
        (TIKA-1881, GitHub-85, TIKA-1883, TIKA-1884, TIKA-1886,
      • LinkContentHandler now extracts links from script tags
        via Joseph Naegele (TIKA-1937).
      • Handle per page IOExceptions more robustly in PDFParser (TIKA-1948).
      • Upgrade commons-compress to 1.11 (TIKA-1949).
      • Add detection for embedded MSChart.Graph files (TIKA-1033).
      • Fix NPE in Sqlite parser from Nick C (TIKA-1927).
      • Fix NPE in Open Document parser from Nick C (TIKA-1916).
      • Upgrade mp4parser's isoparser to 1.1.7 (TIKA-1924 and TIKA-1931).
      • Upgrade BouncyCastle to 1.54 (TIKA-1923).
      • Upgrade Jackcess to 2.1.3 (TIKA-1922).
      • Upgrade Drew Noakes' metadata-extractor to 2.8.1 (TIKA-1921).
      • Upgrade Gson in tika-serialization to 2.6.2 (TIKA-1920).
      • Upgrade commons-cli in tika-batch to 1.3.1 (TIKA-1919).
      • Add XMPMM support to PDFParser and JpegParser via Jempbox (TIKA-1894).
      • Move serialization of TikaConfig to tika-core and enable dumping
        of the config file via tika-app (TIKA-1657).
      • Tika now incorporates the Natural Language Toolkit (NLTK) from the
        Python community as an option for Named Entity Recognition (TIKA-1876).
      • Add support for XFA extraction via Pascal Essiembre (TIKA-1857).
      • Upgrade to sqlite-jdbc (TIKA-1861). NOTE: this dependency
        is still <scope>provided</scope>. You need to include this dependency
        in order to parse sqlite files.
      • Upgrade to POI 3.15-beta1 (TIKA-1895).
      • Upgrade to Jackson 2.7.1 (TIKA-1869).
      • Upgrade to Apache SIS 0.6 (TIKA-1878).
      • RichTextContentHandler moved from the Server package to Core (TIKA-1870).
      • Added ZeroSizeFileDetector to support application/x-zerovalue via
        Adesh Gupta (TIKA-1885).
      • Addition of types information to Grobid quantities parser via
        Can Menekse (TIKA-1965).
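
      The sqlite-jdbc note above means that a project consuming Tika has to
      declare the JDBC driver itself before SQLite files can be parsed. A
      minimal sketch of such a Maven declaration follows, assuming the usual
      org.xerial coordinates for sqlite-jdbc. The sqlite-jdbc.version property
      is a hypothetical placeholder, since the note does not state a version,
      and where this belongs in the Sakai build is not specified in this issue.

```xml
<!-- Sketch only: Tika declares sqlite-jdbc with scope "provided",
     so it is not pulled in transitively and must be added explicitly. -->
<dependency>
  <groupId>org.xerial</groupId>
  <artifactId>sqlite-jdbc</artifactId>
  <!-- Hypothetical property: set it to the version the Tika release expects. -->
  <version>${sqlite-jdbc.version}</version>
</dependency>
```

      If SQLite parsing is not needed, the dependency can simply be left out.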






                k1team KERNEL TEAM (Inactive)
                dhorwitz David Horwitz


