Language Technology Seminar (Keeletehnoloogia seminar, MTAT.06.046), autumn 2020/21
Institute of Computer Science, University of Tartu

Topics

  1. Introduction
  2. Problem overview I: user-generated content
    Barbara Plank (2016). What to do about non-standard (or non-canonical) language in NLP. https://www.linguistics.rub.de/konvens16/pub/2_konvensproc.pdf
    Submit your review here
  3. Problem overview II: historical texts
    Michael Piotrowski (2012). Natural Language Processing for Historical Texts. Chapter 3, "Spelling in Historical Texts", and Chapter 6, "Handling Spelling Variation". Attach:Historical.pdf
    Submit your review here
  4. Token-based spelling variant detection (to be discussed on Oct 2)
    Fabian Barteld, Chris Biemann and Heike Zinsmeister (2019). Token-based spelling variant detection in Middle Low German texts. Attach:Barteld_2019.pdf
    Submit your review here
  5. Using neural machine translation models for historical text normalization (to be discussed on Oct 9)
    Gongbo Tang, Fabienne Cap, Eva Pettersson, and Joakim Nivre (2018). An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization. Attach:evaluation_NMT_historical.pdf
    Submit your review here
  6. Normalising Slovene data: historical texts vs. user-generated content (to be discussed on Oct 16)
    Nikola Ljubešić, Katja Zupan, Darja Fišer and Tomaž Erjavec (2016). Normalising Slovene data: historical texts vs. user-generated content.
    Attach:historical_user_v6rdlus.pdf
    Submit your review here
  7. Enhancing BERT for Lexical Normalization (to be discussed on Oct 23)
    Benjamin Muller, Benoît Sagot and Djamé Seddah (2019). Enhancing BERT for Lexical Normalization.
    Attach:BERT_for_lexical_normalization.pdf
    Submit your review here
  8. Normalizing SMS: are two metaphors better than one? (to be discussed on Oct 30)
    Catherine Kobus, François Yvon and Géraldine Damnati (2008). Normalizing SMS: are two metaphors better than one?
    Attach:Normalizing_SMS
    Submit your review here
  9. Normalization of Indonesian-English Code-Mixed Twitter Data (to be discussed on Nov 6)
    Anab Maulana Barik, Rahmad Mahendra, Mirna Adriani (2019). Normalization of Indonesian-English Code-Mixed Twitter Data.
    https://www.aclweb.org/anthology/D19-5554.pdf
    Submit your review here
  10. Neural text normalization with adapted decoding and POS features (to be discussed on Nov 13 and Nov 20)
    T. Ruzsics, M. Lusetti, A. Göhring, T. Samardžić and E. Stark (2019). Neural text normalization with adapted decoding and POS features.
    This is a long article, so we will split it into two parts: we will discuss Sections 1-4 on Nov 13 and Sections 5-7 on Nov 20.
    Attach:Swiss German
    Submit your review on Sections 1-4 here
  11. Neural text normalization with adapted decoding and POS features: Sections 5-7 (to be discussed on Nov 20)
    Submit your review on Sections 5-7 here
  12. Evaluation and impact on parsing: two short articles to be discussed on Nov 27.
    Rob van der Goot, Rik van Noord, Gertjan van Noord (2018). A Taxonomy for In-depth Evaluation of Normalization for User Generated Content.
    Attach:Taxonomy for Evaluation
    AND
    Rob van der Goot (2019). An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media.
    Attach:Effect on Parsing
    Submit your review on both articles here
  13. Text normalization for speech applications
    Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman and Brian Roark (2019). Neural Models of Text Normalization for Speech Applications. Computational Linguistics 45(2).
    https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00349
    This is a long and thorough article. We will read Sections 1-4 for the Dec 4 seminar and then decide whether to
    a) spend the remaining two seminars reading and discussing the rest of it or
    b) turn to clinical text normalization instead
    I will add the articles about clinical text normalization below here, so you can take a quick look before deciding.
    Submit your review on Sections 1-4 here
    NB! Something is wrong and you cannot submit your reviews here. Please send them to kadri.muischnek@ut.ee instead.

We will discuss the articles about clinical text normalization in the last two seminars.

  1. Clinical text normalization: abbreviation expansion
    Maria Kvist and Sumithra Velupillai (2014). SCAN: A Swedish Clinical Abbreviation Normalizer - Further Development and Adaptation to Radiology.
    Attach:Clinical abbreviation
    Submit your review on both articles here
  2. Normalization of medical forum texts
    Anne Dirkson, Suzan Verberne and Wessel Kraaij (2019). Lexical Normalization of User-Generated Medical Forum Data.
    https://www.aclweb.org/anthology/W19-3202.pdf