Topics
- Introduction
- Problem overview I: user-generated content
Barbara Plank (2016). What to do about non-standard (or non-canonical) language in NLP. https://www.linguistics.rub.de/konvens16/pub/2_konvensproc.pdf
Submit your review here
- Problem overview II: historical texts
Michael Piotrowski (2012). [[Attach:Historical.pdf|Natural Language Processing for Historical Texts]]. Chapter 3, "Spelling in Historical Texts", and Chapter 6, "Handling Spelling Variation".
Submit your review here
- Token-based spelling variant detection (to be discussed on Oct 2)
Fabian Barteld, Chris Biemann and Heike Zinsmeister (2019). Token-based spelling variant detection in Middle Low German texts. Attach:Barteld_2019.pdf
Submit your review here
- Using neural machine translation models for historical text normalization (to be discussed on Oct 9)
Gongbo Tang, Fabienne Cap, Eva Pettersson, and Joakim Nivre (2018). An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization. Attach:evaluation_NMT_historical.pdf
Submit your review here
- Normalising Slovene data: historical texts vs. user-generated content (to be discussed on Oct 16)
Nikola Ljubešić, Katja Zupan, Darja Fišer, Tomaž Erjavec (2016). Normalising Slovene data: historical texts vs. user-generated content.
Attach:historical_user_v6rdlus.pdf
Submit your review here
- Enhancing BERT for Lexical Normalization (to be discussed on Oct 23)
Benjamin Muller, Benoît Sagot and Djamé Seddah (2019). Enhancing BERT for Lexical Normalization.
Attach:BERT_for_lexical_normalization.pdf
Submit your review here
- Normalizing SMS: are two metaphors better than one? (to be discussed on Oct 30)
Catherine Kobus, François Yvon and Géraldine Damnati (2008). Normalizing SMS: are two metaphors better than one?
Attach:Normalizing_SMS
Submit your review here
- Normalization of Indonesian-English Code-Mixed Twitter Data (to be discussed on Nov 6)
Anab Maulana Barik, Rahmad Mahendra, Mirna Adriani (2019). Normalization of Indonesian-English Code-Mixed Twitter Data.
https://www.aclweb.org/anthology/D19-5554.pdf
Submit your review here
- Neural text normalization with adapted decoding and POS features (to be discussed on Nov 13 and Nov 20)
T. Ruzsics, M. Lusetti, A. Göhring, T. Samardžić and E. Stark (2019). Neural text normalization with adapted decoding and POS features.
This is a long one, so we will split it into two parts: we will discuss Sections 1-4 on Nov 13 and Sections 5-7 on Nov 20.
Attach:Swiss German
Submit your review on Sections 1-4 here
- Neural text normalization with adapted decoding and POS features: Sections 5-7 (to be discussed on Nov 20)
Submit your review on Sections 5-7 here
- Evaluation and impact on parsing: two short articles (to be discussed on Nov 27)
Rob van der Goot, Rik van Noord, Gertjan van Noord (2018). A Taxonomy for In-depth Evaluation of Normalization for User Generated Content.
Attach:Taxonomy for Evaluation
AND
Rob van der Goot (2019). An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media.
Attach:Effect on Parsing
Submit your review on both articles here
- Text normalization for speech applications
Hao Zhang, Richard Sproat, Axel H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman and Brian Roark (2019). Neural Models of Text Normalization for Speech Applications. Computational Linguistics, vol. 45, no. 2.
https://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00349
This is a long and thorough article. We will read Sections 1-4 for the Dec 4 seminar and then decide whether we would like to
a) spend the remaining two seminars reading and discussing the rest of it or
b) turn to clinical text normalization instead
I will add the articles about clinical text normalization below here, so you can take a quick look before deciding.
Submit your [[rew12|review on Sections 1-4]] here
NB! Something is wrong: you can't submit your reviews here. Please send them to kadri.muischnek@ut.ee
We will discuss the articles about clinical text normalization in the last two seminars
- Clinical text normalization: abbreviation expansion
Maria Kvist and Sumithra Velupillai (2014). SCAN: A Swedish Clinical Abbreviation Normalizer - Further Development and Adaptation to Radiology.
Attach:Clinical abbreviation
Submit your review on both articles here
- Normalization of medical forum texts
Anne Dirkson, Suzan Verberne & Wessel Kraaij (2019). Lexical Normalization of User-Generated Medical Forum Data
https://www.aclweb.org/anthology/W19-3202.pdf