Article
Digital Version
Annif and Finto AI : Developing and Implementing Automated Subject Indexing
p. 265-282
- Manually indexing documents for subject-based access is a labour-intensive process that can be automated using AI technology. Algorithms for text classification must be trained and tested with examples of indexed documents, which can be obtained from existing bibliographic databases and digital collections. The National Library of Finland has created Annif, an open source toolkit for automated subject indexing and classification. Annif is multilingual, independent of the indexing vocabulary, and modular. It integrates many text classification algorithms, including Maui, fastText, Omikuji, and a neural network model based on TensorFlow. Best results can often be obtained by combining several algorithms. Many document corpora have been used for training and evaluating Annif. Finding the algorithms and configurations that give the best quality is an ongoing effort.
- In May 2020, we launched Finto AI, a service for automated subject indexing based on Annif. It provides a simple Web form for obtaining subject suggestions for text. The functionality is also available as a REST API. Many document repositories and the cataloguing system for electronic publications at the National Library of Finland are using it to integrate semi-automated subject indexing into their metadata workflows. In the future, we are going to extend Annif with more algorithms and new functionality, and to integrate Finto AI with other metadata management workflows. [Publisher's text].
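The abstract notes that the best results often come from combining several algorithms. A minimal sketch of one way to do this, a weighted average of per-subject scores, is given below. It is not Annif's actual implementation: the function name `combine_suggestions`, the backend labels, and the example scores are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def combine_suggestions(backend_results, weights=None, limit=5):
    """Merge subject suggestions from several backends (hypothetical helper).

    backend_results maps a backend label to a dict of {subject: score},
    with scores assumed to lie in the range 0..1. A subject missing from
    a backend counts as score 0 for that backend.
    """
    names = list(backend_results)
    if weights is None:
        # Assumption: equal weighting unless the caller says otherwise.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)

    combined = defaultdict(float)
    for name in names:
        for subject, score in backend_results[name].items():
            combined[subject] += weights[name] * score / total_weight

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Made-up scores from two imaginary backends, for illustration only.
example_results = {
    "lexical": {"libraries": 0.8, "indexing": 0.4},
    "associative": {"indexing": 0.6, "machine learning": 0.5},
}
example_ranking = combine_suggestions(example_results, limit=3)
for subject, score in example_ranking:
    print(f"{subject}: {score:.2f}")
```

Averaging rewards subjects that several backends agree on, which is one plausible reason an ensemble can outperform any single algorithm.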
Information
DOI: 10.4403/jlis.it-12740
ISSN: 2038-1026
KEYWORDS
- Automated subject indexing, Artificial intelligence, Machine learning
In the same file
- JLIS : It is a growing journal
- Universal Bibliographic Control today : preliminary remarks
- Conference BC 2021
- Universal bibliographic control in the digital ecosystem : opportunities and challenges
- Standards in a new bibliographic world
- Bibliographic control in the fifth information age
- Follow me to the library! : Bibliographic data in a discovery driven world
- Collocation and Hubs : Fundamental and New Version
- Universal bibliographic control in the semantic web : Opportunities and challenges for the reconciliation of bibliographic data models
- Control or Chaos : Embracing Change and Harnessing Innovation in an Ecosystem of Shared Bibliographic Data
- The multilingual challenge in bibliographic description and access
- Rethinking bibliographic control in the light of IFLA LRM entities : the ongoing process at the National library of France
- The future of bibliographic services in light of new concepts of authority control
- New Challenges in Metadata Management between Publishers and Libraries
- Two-dimensional books for the new Open Access academic publishing
- Bibliographic control and institutional repositories : welcome to the jungle
- In the mangrove society : a collaborative Legal Deposit management hypothesis for the preservation of and permanent access to the national cultural heritage
- Thesauri in the digital ecosystem
- How to build an Identifiers' policy : the BnF use case
- The International Standard Name Identifier : extending identity management across the global metadata supply chain
- VIAF and the linked data ecosystem
- Call me by your name : towards an authority data control shared between archives and libraries
- Should catalogue wade in open water?
- The National Library of Norway : policies and services
- The Italian National Bibliography today
- Artificial intelligence, machine learning and bibliographic control : DDC Short Numbers : Towards machine-based classifying
- Annif and Finto AI : Developing and Implementing Automated Subject Indexing
- Towards an open and collaborative Authority Control
- Wikidata : a new perspective towards universal bibliographic control
- Discoverability in the IIIF digital ecosystem
- Bibliographic Control of Research Datasets : reflections from the EUI Library
- Integrated Search System : evolving the authority files
- DREAM : A project about non-Latin script data
- Two Projects and a Thesaurus : Experiences in the Management, Descriptions and Indexing of Oral Sources
- The bibliographic control of music in the digital ecosystem : The case of the Bayerische Staatsbibliothek (BSB)
- Riviste digitali e digitalizzate italiane (RIDI) : a reconnaissance for the national newspaper library