Day 2 - More Reading of LRL MT-Related Papers
Abduselam, low-resource languages
August 24th, 2025
- Continuing to read Prompt Engineering Enhances Faroese MT, but Only Humans Can Tell:
- TODO: idea that came while reading, to research: what if you train multiple adapters for an MT model, each adapter specialized for one language style? A multi-label formality classifier then routes each input sentence to the adapter best suited to translate it (sketch below).
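A minimal sketch of that routing idea, assuming one LoRA adapter per style trained with peft; the adapter paths and the formality-classifier checkpoint are hypothetical placeholders, not real artifacts.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
from peft import PeftModel

BASE = "facebook/nllb-200-distilled-600M"  # any seq2seq MT base would work
tokenizer = AutoTokenizer.from_pretrained(BASE, src_lang="eng_Latn")
base_model = AutoModelForSeq2SeqLM.from_pretrained(BASE)

# Hypothetical adapter checkpoints, one per register/style.
model = PeftModel.from_pretrained(base_model, "adapters/formal", adapter_name="formal")
model.load_adapter("adapters/informal", adapter_name="informal")

# Hypothetical classifier; any text-classification checkpoint whose
# labels match the adapter names would do.
classifier = pipeline("text-classification", model="my-org/formality-classifier")

def translate(sentence: str) -> str:
    style = classifier(sentence)[0]["label"]  # e.g. "formal" or "informal"
    model.set_adapter(style)                  # route to the matching adapter
    inputs = tokenizer(sentence, return_tensors="pt")
    out = model.generate(
        **inputs,
        # NLLB needs the target language forced as the first token;
        # gaz_Latn (West Central Oromo) is my assumed target code.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("gaz_Latn"),
        max_new_tokens=128,
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)
```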
- Remember what “shared tasks” means: basically challenges shared with a community. In these NLP MT papers it usually refers to tasks posted by WMT, “shared” with the community so teams can run their experiments and publish results.
- TODO: an idea that popped up while reading about the leaderboard: use the Oromo.AI domain to host a leaderboard of the best Oromo AI models. It could be used to increase visibility.
- Things to keep in mind: the best low-resource models to use as benchmarks are, first, NLLB (Meta’s, released in multiple sizes), then MADLAD-400 (Google’s), and Google Translate (baseline sketch below).
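A minimal baseline-translation sketch with NLLB via transformers; gaz_Latn (West Central Oromo) as the Oromo code is my assumption from the FLORES-200 code list, so double-check it.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # the small distilled NLLB checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

inputs = tokenizer("How are you today?", return_tensors="pt")
out = model.generate(
    **inputs,
    # gaz_Latn = West Central Oromo in FLORES-200 (my assumption)
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("gaz_Latn"),
    max_new_tokens=64,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```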
- Got sidetracked and remembered Isaac Caswell; consider using their open-source training data from SMOL, MADLAD, and GATITOS. Notes: FLORES-101 and FLORES-200 are important datasets to use during evaluation (loading sketch below).
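A quick sketch of loading FLORES-200, assuming the facebook/flores dataset on the Hugging Face Hub and its "src-tgt" config naming; verify the exact config and column names on the dataset card.

```python
from datasets import load_dataset

# Assumed config naming: "<src>-<tgt>" in FLORES-200 codes.
flores = load_dataset("facebook/flores", "eng_Latn-gaz_Latn")
print(flores)  # splits: dev and devtest

sample = flores["devtest"][0]
print(sample["sentence_eng_Latn"])  # English side
print(sample["sentence_gaz_Latn"])  # Oromo side
```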
- Important evaluation metrics commonly seen are BLEU and chrF (sacrebleu sketch below).
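A minimal sketch with sacrebleu, the standard implementation of both metrics; the toy sentences are made up.

```python
import sacrebleu

# Toy data, just to show the API shape.
hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```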
- TODO: research LLM-based evaluation metrics to see how effective they are. I suspect they would only be somewhat effective if the LLM is strong in the target language (judge-prompt sketch below).
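A rough LLM-as-judge sketch using the OpenAI client; the model name, the prompt, and the 0-100 scale are all my assumptions here, not an established metric.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def llm_judge(source: str, translation: str, tgt_lang: str = "Oromo") -> str:
    """Ask an LLM to rate a translation; prompt and scale are ad hoc."""
    prompt = (
        f"Rate the following {tgt_lang} translation of the English sentence "
        f"on a 0-100 adequacy/fluency scale. Reply with only the number.\n"
        f"English: {source}\n{tgt_lang}: {translation}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed judge model; swap in any strong LLM
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```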
- TODO: idea came to do daily blog-style posts about my low-resource language research journey. Requirements: free (no overhead), good automatic SEO, automatic keyword tagging, minimalistic/UX-focused, native content editing.
- Remembered n-grams: the n can be any value, so if it were 1 it’d be a 1-gram, commonly known as a unigram, meaning single-word matches.
- Remembered the BLEU score through this article: Foundations of NLP Explained - Bleu Score and WER Metrics | Towards Data Science. Basically: take the geometric mean of the clipped precisions across 1-grams, 2-grams, 3-grams, and 4-grams, scaled by a brevity penalty based on the comparative length of the gold and predicted sentences (worked sketch below). Has pretty big downsides, namely: synonyms, conjugations, and reorderings of the predicted sentence get rated down even though they could be totally acceptable to a human rater. Also remembered WER, which is super strict since there’s really no ambiguity in ASR; WER is not meant for MT.
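A from-scratch sketch of that recipe (clipped n-gram precision, geometric mean over n = 1..4, brevity penalty), just to pin down the moving parts; real evaluations should use sacrebleu, and the example sentences are made up.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams in a token list (n=1 gives unigrams, etc.)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped precision: each candidate n-gram counts at most as
        # often as it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean

print(bleu("the cat sat on the mat", "the cat sat on a mat"))  # ~0.54
```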
- TODO: research multimodal models for low-resource languages that combine hearing (speech), vision, and reasoning.