Vector Embedding Techniques for Media Applications
Publications:
- Caporusso et al. (2024a). Analysing Bias in Slovenian News Media: A Computational Comparison Based on Readers’ Political Orientation. COBISS.SI-ID 222178307
- Caporusso et al. (2024b). A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media. COBISS.SI-ID 197420291
- Chatterjee et al. (2024). The “Right” Discourse on Migration: Analysing Migration-Related Tweets in Right and Far-Right Political Movements. COBISS.SI-ID 222190851
- Đoković & Robnik Šikonja (2024). Sarcasm detection in a less-resourced language. COBISS.SI-ID 216268291
- Ghinassi et al. (2024a). Recent Trends in Linear Text Segmentation: A Survey. COBISS.SI-ID 220088323
- Ghinassi et al. (2024b). When Cohesion Lies in the Embedding Space: Embedding-Based Reference-Free Metrics for Topic Segmentation. COBISS.SI-ID 220053507
- Hosseini et al. (2025). Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly. COBISS.SI-ID 229056003
- Ivačič et al. (2024). Comparing News Framing of Migration Crises using Zero-Shot Classification. COBISS.SI-ID 199763459
- Karan et al. (2025). A Dataset for Expert Reviewer Recommendation with Large Language Models as Zero-shot Rankers. COBISS.SI-ID 229052931
- Klemen et al. (2024a). Neural spell-checker: beyond words with synthetic data generation. COBISS.SI-ID 213519107 (available on arXiv).
- Klemen et al. (2024b). SI-NLI: A Slovene Natural Language Inference Dataset and Its Evaluation. COBISS.SI-ID 197916931
- Kuzman & Ljubešić (2025). LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification. COBISS.SI-ID 229074179
- Kmecl & Robnik Šikonja (2024). Logično sklepanje v naravnem jeziku za slovenščino. COBISS.SI-ID 206551299
- Koloski et al. (2025). Measuring catastrophic forgetting in cross-lingual classification: transfer paradigms and tuning strategies. COBISS.SI-ID 227115523
- Koloski et al. (2024a). AutoML-guided fusion of entity and LLM-based representations for document classification. COBISS.SI-ID 224164867
- Koloski et al. (2024b). AHAM: adapt, help, ask, model harvesting LLMs for literature mining. COBISS.SI-ID 193861891
- Ljubešić & Kuzman (2024). CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation. COBISS.SI-ID 196933379
- Martinc et al. (2024). Sistem za zaznavanje sprememb v rabi besed in njegova uporaba za sociolingvistično analizo. COBISS.SI-ID 214922755
- Martinc et al. (2025). Viewpoint detection on LGBT+ reporting using contextual embeddings and qualitative thematic analysis: the use case on the word deep. COBISS.SI-ID 229359363
- Mochtak et al. (2024). The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings. COBISS.SI-ID 197916931
- Piskorski et al. (2024). Overview of the CLEF-2024 CheckThat! Lab Task 3 on persuasion techniques. COBISS.SI-ID 208589315
- Vreš et al. (2024). Generative model for less-resourced language with 1 billion parameters. COBISS.SI-ID 212016131
- Žagar et al. (2024). SENTA: Sentence Simplification System for Slovene. COBISS.SI-ID 197916675
Resources:
- Brglez et al. (2024). Slovenian Emotion Dimension and Emotion Association Lexicon SloEmoLex 1.0. COBISS.SI-ID 201681923
- Ivačič et al. (2024). News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0. COBISS.SI-ID 216372227
- Krsnik et al. (2024a). Corpus extraction tool LIST 1.3. COBISS.SI-ID 218014211
- Krsnik et al. (2024b). Dependency tree extraction tool STARK 3.0. COBISS.SI-ID 206072835
- Kuzman & Ljubešić (2024). Multilingual training dataset of news articles annotated with topics from the IPTC NewsCodes Media Topic schema. COBISS.SI-ID 219483395
- Kuzman. Code for developing and evaluating a classifier that assigns news articles to IPTC NewsCodes Media Topic topics: IPTC Media Topic Classification. https://github.com/TajaKuzman/IPTC-Media-Topic-Classification
- Vreš et al. (2024). Slovene instruction-following dataset for large language models GaMS-Instruct-GEN 1.0. COBISS.SI-ID 218023427
- Žagar et al. (2024). Knowledge-Enhanced Winograd Schema Challenge KE-WSC 1.0. COBISS.SI-ID 219460867
Models:
- Ivačič. XLM-Roberta-base NER model for Slavic languages.
- Kuzman & Ljubešić (2024). Multilingual classifier of news into topics from the IPTC NewsCodes Media Topic schema, Multilingual IPTC Media Topic Classifier.
Abstracts:
- Gruevska-Madžoska et al. (2024). The Macedonian language as an integral part of the multilingual encyclopedic dictionary BabelNet (BabelNet) and the BabelFy Tool (BabelFy): on the method of representation and recognition of word meanings (current state and perspectives). COBISS.SI-ID 218806531
- Koloski & Pollak (2024). Enabling topic-modeling for specific domains via domain-adaptation of LLMs. COBISS.SI-ID 220002819
Workshops:
We organised SLaLaM 2023, the first Slovenian workshop on techniques and applications of large language models. The proceedings are available here:
Jaya Caporusso, Nada Lavrač (eds.) (2023). Proceedings of SLaLaM 2023, 1st Slovenian Workshop on Large Language Models: Techniques and Applications. Bernardin, Slovenia.
Project duration: 1 October 2023 to 30 September 2026
Funding: This work was co-funded by the Slovenian Research and Innovation Agency from the state budget within the research project Vector Embedding Techniques for Media Applications (No. L2-50070, co-funded by the agency Kliping d.o.o.).