Day 2 - More Reading of LRL MT-Related Papers
Abduselam, low-resource languages
August 24th, 2025
- Continuing to read Prompt Engineering Enhances Faroese MT, but Only Humans Can Tell:
- TODO: idea that came while reading, to research: what if you train multiple adapters for an MT model, each adapter specialized for one language style? A multi-label formality classifier then routes each input sentence to the adapter best suited to translate it (sketch below).
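A minimal sketch of that routing idea, assuming one LoRA adapter per style trained with peft; the adapter paths and the formality-classifier checkpoint are hypothetical placeholders, not real artifacts.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from peft import PeftModel

BASE = "facebook/nllb-200-distilled-600M"  # any seq2seq MT base would work
tokenizer = AutoTokenizer.from_pretrained(BASE, src_lang="eng_Latn")
base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE)

# Hypothetical adapter checkpoints, one per register/style.
model = PeftModel.from_pretrained(base_model, "adapters/formal", adapter_name="formal")
model.load_adapter("adapters/informal", adapter_name="informal")

# Hypothetical classifier; any text-classification checkpoint whose
# labels match the adapter names would do.
classifier = pipeline("text-classification", model="my-org/formality-classifier")

def translate(sentence: str) -> str:
    style = classifier(sentence)[0]["label"]  # e.g. "formal" or "informal"
    model.set_adapter(style)                  # route to the matching adapter
    inputs = tokenizer(sentence, return_tensors="pt")
    out = model.generate(
        **inputs,
        # NLLB needs the target language forced as the first token;
        # gaz_Latn (West Central Oromo) is my assumed target code.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("gaz_Latn"),
        max_new_tokens=128,
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)
```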
- Remember what “shared tasks” means: basically challenges shared with a community. In these NLP MT papers it usually refers to tasks posted by WMT, “shared” with the community so teams can run their experiments and publish results.
- TODO: an idea that popped up while reading about the leaderboard: use the Oromo.AI domain to host a leaderboard of the best Oromo AI models. It could be used to increase visibility.
- Things to keep in mind: the best low-resource models to use as benchmarks are, first, NLLB (Meta’s, released in multiple sizes), then MADLAD-400 (Google’s), and Google Translate (baseline sketch below).
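A minimal baseline-translation sketch with NLLB via transformers; gaz_Latn (West Central Oromo) as the Oromo code is my assumption from the FLORES-200 code list, so double-check it.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # the small distilled NLLB checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

inputs = tokenizer("How are you today?", return_tensors="pt")
out = model.generate(
    **inputs,
    # gaz_Latn = West Central Oromo in FLORES-200 (my assumption)
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("gaz_Latn"),
    max_new_tokens=64,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```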
- Got sidetracked and remembered Isaac Caswell; consider using their open-source training data from SMOL, MADLAD, and GATITOS. Notes: FLORES-101 and FLORES-200 are important datasets to use during evaluation (loading sketch below).
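A quick sketch of loading FLORES-200, assuming the facebook/flores dataset on the Hugging Face Hub and its "src-tgt" config naming; verify the exact config and column names on the dataset card.

```python
from datasets import load_dataset

# Assumed config naming: "<src>-<tgt>" in FLORES-200 codes.
flores = load_dataset("facebook/flores", "eng_Latn-gaz_Latn")
print(flores)  # splits: dev and devtest

sample = flores["devtest"][0]
print(sample["sentence_eng_Latn"])  # English side
print(sample["sentence_gaz_Latn"])  # Oromo side
```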
- Important evaluation metrics commonly seen are BLEU and chrF (sacrebleu sketch below).
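A minimal sketch with sacrebleu, the standard implementation of both metrics; the toy sentences are made up.

```python
import sacrebleu

# Toy data, just to show the API shape.
hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```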
- TODO: research LLM-based evaluation metrics to see how effective they are. I suspect they would only be somewhat effective if the LLM is strong in the target language (judge-prompt sketch below).
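A rough LLM-as-judge sketch using the OpenAI client; the model name, the prompt, and the 0-100 scale are all my assumptions here, not an established metric.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def llm_judge(source: str, translation: str, tgt_lang: str = "Oromo") -> str:
    """Ask an LLM to rate a translation; prompt and scale are ad hoc."""
    prompt = (
        f"Rate the following {tgt_lang} translation of the English sentence "
        f"on a 0-100 adequacy/fluency scale. Reply with only the number.\n"
        f"English: {source}\n{tgt_lang}: {translation}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model; swap in any strong LLM
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```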
- TODO: idea came to do daily blog-style posts about my low-resource language research journey. Requirements: free (no overhead), good automatic SEO, automatic keyword tagging, minimalistic/UX-focused, native content editing.
- Remembered n-grams: the n can be any value, so if it were 1 it’d be a 1-gram, commonly known as a unigram, meaning single-word matches.
- Remembered the BLEU score through this article: Foundations of NLP Explained - Bleu Score and WER Metrics | Towards Data Science. Basically: take the geometric mean of the clipped precisions across 1-grams, 2-grams, 3-grams, and 4-grams, scaled by a brevity penalty based on the comparative length of the gold and predicted sentences (worked sketch below). Has pretty big downsides, namely: synonyms, conjugations, and reorderings of the predicted sentence get rated down even though they could be totally acceptable to a human rater. Also remembered WER, which is super strict since there’s really no ambiguity in ASR; WER is not meant for MT.
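A from-scratch sketch of that recipe (clipped n-gram precision, geometric mean over n = 1..4, brevity penalty), just to pin down the moving parts; real evaluations should use sacrebleu, and the example sentences are made up.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams in a token list (n=1 gives unigrams, etc.)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped precision: each candidate n-gram counts at most as
        # often as it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

print(bleu("the cat sat on the mat", "the cat sat on a mat"))  # ~0.54
```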
- TODO: research multimodal models for low-resource languages that combine hearing (speech), vision, and reasoning.