Day 12 - Side Quest (More Reproducibility)
Abduselam • reproducibility
September 9th, 2025:
- Thoughts while reading https://arxiv.org/pdf/1909.06674:
- The idea of independent reproducibility seems important to a prior thought/issue I had about research having a lifespan.
- I keep reading the word “taxonomy”. Basically it's a system for classifying info, ideas, and concepts. Like how biology has “family”, “genus”, and “species”, or “mammal” vs. “reptile” or whatever. These are all classifications, and this paper was mentioning there isn't an agreed taxonomy for what constitutes a “rigorous” paper.
- The paper references Ali Rahimi's talk at NIPS (NIPS 2017 Test-of-time award presentation). Honestly a really good presentation. I understood like 65%, maybe 70%, but the underlying point was a call for more rigor: trying to understand the underpinnings of techniques we use so heavily, like batch norm, rather than continuing as if ML is alchemy (i.e. trying to apply ML to everything).
- Also, big lmao because there is a whole test-of-time award, and I was just thinking about knowledge having a lifespan. Seems like there's drama around trying to get awards in general though, since when I searched it up I found https://www.reddit.com/r/MachineLearning/comments/1hctf36/d_the_winner_of_the_neurips_2024_best_paper_award/. I think when there are awards in any field, there are always people trying to do whatever it takes to get the top place. Idk if that drama is real. I barely read it and got annoyed; I don't like reading about this kinda stuff.
- The talk mentioned the Levenberg–Marquardt algorithm (Wikipedia). It was sorta surprising that it optimized the problem way faster and got the loss down to 0. TODO: look into this optimizer more.
- It’d be interesting to take the Islamic approach to verifying and rejecting Hadith and apply it to research papers and techniques, so that we do what Ali is getting at: trying to understand ML and not just doing alchemy. This might be way too rigorous for the field though. TODO: research this more. To do this, I should first clarify the goals and then look up whether there are any indexes or people already doing this.
- Stumbled upon https://en.wikipedia.org/wiki/John_Ioannidis who seems relevant. I guess the field is called “meta-analysis” or “meta-research”.
- Found this paper of John’s, https://en.wikipedia.org/wiki/Why_Most_Published_Research_Findings_Are_False, which lists causes of high false positive rates. One of them is the lack of preregistration, which after reading about it seems like a good idea, but it doesn’t seem well established within computer science, let alone machine learning. Anyway, found this: Pre-registration for Predictive Modeling, which is sorta recent and relevant.
- TODO: I only skimmed most of the links listed above, but they all seem relevant. I should go through them and see if I can apply some of John’s listed causes (which mostly apply to the medical field) to machine learning.
- Seems like a lot of people are trying to solve this issue of reproducibility and, basically, publishing high quality research. But I don’t see any that consider other factors like who the person is, the institution they work for, their affiliations, whether they have been caught fabricating or publishing unreliable work in the past, and other things like that. This seems like an important way to easily filter out and ignore certain papers. One could start by creating a database of researchers, like how in Islam there are reports about individuals from others.
- I should continue to research the actual things mentioned here before jumping to conclusions about new approaches: https://chatgpt.com/share/68c0d809-ce7c-8013-a4a3-ed2115d25671.
- Another relevant resource: Retraction Watch
- Wow, gift authorship is a thing, but I'm not surprised:
- Emailed Jesse to hear his thoughts about preregistration in ML.
- TODO: look into all tests mentioned in section 3 (Results)
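For the Levenberg–Marquardt TODO above, here is a minimal sketch of the idea in plain Python. It is not the problem from Ali's talk: the toy model `y = a * exp(b * x)`, the data, the starting guess, and every name in the code (`levenberg_marquardt`, `lam`, `max_iters`, `tol`) are assumptions made up for illustration. The point is just the core loop: take a damped Gauss-Newton step, keep it and lower the damping if the loss went down, otherwise raise the damping and retry.

```python
import math

def model(params, x):
    a, b = params
    return a * math.exp(b * x)

def residuals(params, xs, ys):
    return [model(params, x) - y for x, y in zip(xs, ys)]

def jacobian(params, xs):
    # One row per data point: (d r_i / d a, d r_i / d b)
    a, b = params
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]

def levenberg_marquardt(xs, ys, params, lam=1e-3, max_iters=200, tol=1e-12):
    """Toy 2-parameter Levenberg-Marquardt with Marquardt's diagonal scaling."""
    params = list(params)
    r = residuals(params, xs, ys)
    loss = sum(v * v for v in r)
    for _ in range(max_iters):
        J = jacobian(params, xs)
        # Build J^T J and the gradient term J^T r
        a11 = sum(ja * ja for ja, jb in J)
        a12 = sum(ja * jb for ja, jb in J)
        a22 = sum(jb * jb for ja, jb in J)
        g1 = sum(ja * v for (ja, jb), v in zip(J, r))
        g2 = sum(jb * v for (ja, jb), v in zip(J, r))
        # Damped system: (J^T J + lam * diag(J^T J)) * delta = -J^T r
        d11 = a11 * (1.0 + lam)
        d22 = a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        if det == 0.0 or lam > 1e12:
            break
        da = (-g1 * d22 + a12 * g2) / det
        db = (-g2 * d11 + a12 * g1) / det
        trial = [params[0] + da, params[1] + db]
        try:
            trial_r = residuals(trial, xs, ys)
            trial_loss = sum(v * v for v in trial_r)
        except OverflowError:
            trial_loss = float("inf")  # treat a blown-up step as rejected
        if trial_loss < loss:
            # Step helped: accept it and trust the Gauss-Newton direction more
            improvement = loss - trial_loss
            params, r, loss = trial, trial_r, trial_loss
            lam /= 10.0
            if improvement < tol:
                break
        else:
            # Step hurt: reject it and lean toward small gradient-like steps
            lam *= 10.0
    return params, loss

# Synthetic noise-free data from a = 2.0, b = 1.5 (made-up values)
xs = [i / 10 for i in range(11)]
ys = [2.0 * math.exp(1.5 * x) for x in xs]
fit, final_loss = levenberg_marquardt(xs, ys, [1.0, 0.0])
print(fit, final_loss)
```

Because the data here is noise-free, an exact fit exists, which is the kind of setting where a second-order-ish method like this can drive the loss very close to 0. For real use, a library routine is a better choice than this hand-rolled version.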