We talked about:
Types of sarcasm:
Co-existence of positive and negative emotions in the text.
This is the type that was explored in Ameeta Agrawal's work, Leveraging Transitions of Emotions for Sarcasm Detection.
I mentioned that the presence of transition words for opposition/contradiction may indicate genuine, impartial attempts to cover both sides of the argument (consider IAC).
Pointing out issues that should be common sense.
I talked about a real-life example that happened at a closed car rental place.
Zhang is the author of the MUSE model. 3 types of document relationships:
textual entailment, and
Author of this.
On the change of the city name Bangalore to Bengaluru:
Bangalore was the British spelling. In the local official language (Kannada), it is spelled Bengaluru.
This is similar to how Calcutta was renamed to Kolkata.
Arabic-specific search engines: Yamli, Eiktub, and Yoolki
Farasa: text processing toolkit for Arabic (note to self: supports diacritization!)
Clayton copula -- easier to get the probability distribution function
Used in this paper.
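Note to self on why the Clayton copula is convenient: in the bivariate case both the copula and its density have closed forms. The standard textbook definition is written out below; the symbols u, v, and θ are my own notation, not necessarily the paper's.

```latex
% Bivariate Clayton copula with dependence parameter \theta > 0
C_\theta(u, v) = \left( u^{-\theta} + v^{-\theta} - 1 \right)^{-1/\theta},
\qquad u, v \in (0, 1]

% Density, obtained as the mixed partial derivative of C_\theta
c_\theta(u, v) = \frac{\partial^2 C_\theta}{\partial u \, \partial v}
  = (1 + \theta)\,(u v)^{-(1 + \theta)}
    \left( u^{-\theta} + v^{-\theta} - 1 \right)^{-(2 + 1/\theta)}
```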
SPot: A Tool for Identifying Operating Segments in Financial Tables: similar to my prior work at WRDS, with these differences:
8-K instead of 10-K
parsing XML/HTML instead of plain text
Web Table Retrieval using Multimodal Deep Learning: record, schema, and facet.
JASSjr: The Minimalistic BM25 Search Engine for Teaching and Learning Information Retrieval: written by the original author of JASS, JASSjr is only 400 lines of C++ code.
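For reference, a minimal sketch of BM25 scoring in Python. This is not JASSjr's code: it uses one common BM25 variant (the non-negative IDF form), and the function name `bm25_scores` and the default values of `k1` and `b` are assumptions chosen for illustration. It scores already-tokenized documents held in memory instead of reading an inverted index.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score each tokenized document against a tokenized query with BM25.

    query: list of terms; docs: non-empty list of token lists.
    k1 and b are illustrative defaults (assumptions), not tuned values.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [["minimal", "search", "engine"], ["teaching", "information", "retrieval"]]
    print(bm25_scores(["search", "engine"], docs))
```

A real engine would precompute document lengths and postings at indexing time; the sketch recomputes everything per call to stay short.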
OpenNIR: the framework on which the code for Expansion via Prediction of Importance with Contextualization was built. Also written single-handedly by Sean MacAvaney.
Works by Omar Khattab:
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
My follow-up question was: Are the queries padded only by appending [MASK] tokens to the original tokens? I wonder what happens if you insert [MASK] tokens randomly between the original tokens. Intuitively, it would probably enhance the robustness of ColBERT to variations of the same query.
Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval
Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
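To make my ColBERT follow-up question concrete, here is a small sketch of the two padding strategies at the token level. It is only an illustration of the idea, not ColBERT's code: the function names, the target length, and the example tokens are all hypothetical, and the random variant may also place a [MASK] before the first or after the last token.

```python
import random

MASK = "[MASK]"


def pad_with_masks(tokens, length):
    """Append [MASK] tokens until the query has `length` tokens."""
    return list(tokens) + [MASK] * max(0, length - len(tokens))


def insert_masks_randomly(tokens, length, seed=None):
    """Scatter [MASK] tokens at random positions until the query has
    `length` tokens, keeping the original tokens in their original order."""
    rng = random.Random(seed)
    out = list(tokens)
    for _ in range(max(0, length - len(tokens))):
        # randint is inclusive on both ends, so a mask can land anywhere
        # from the very front (0) to the very end (len(out)).
        out.insert(rng.randint(0, len(out)), MASK)
    return out


if __name__ == "__main__":
    query = ["what", "is", "late", "interaction"]  # hypothetical tokens
    print(pad_with_masks(query, 8))
    print(insert_masks_randomly(query, 8, seed=0))
```

Whether the random variant actually helps robustness would need an experiment; the sketch only shows how the two inputs would differ.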
Crowdsourcing platforms in Japan:
Package for extracting topics/topic modeling:
Text Retrieval Conference (TREC): A program of NIST. The de facto standard for benchmarking IR work.