KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks (aclanthology.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Demystifying CLIP Data (arxiv.org)
KingsmanVince@kbin.social (OP) to Machine Learning@kbin.social • PaLI-3 Vision Language Models: Smaller, Faster, Stronger · 1 year ago
Indeed, it would be great if the authors did so. I personally found some unofficial implementations: https://github.com/kyegomez/PALI https://github.com/ahmdtaha/distributed_sigmoid_loss
KingsmanVince@kbin.social (OP) to Machine Learning@kbin.social • PaLI-3 Vision Language Models: Smaller, Faster, Stronger · 1 year ago (edited)
SigLIP, PaLI, PaLI-X
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
PaLI-3 Vision Language Models: Smaller, Faster, Stronger (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models (openaccess.thecvf.com)
KingsmanVince@kbin.social to Machine Learning@kbin.social • Think before you speak: Training Language Models With Pause Tokens · 1 year ago
IIRC, DETR generates a sequence to predict the bounding boxes of objects. I think this paradigm could be applied to such models. "Think before you locate" could be a new path to explore.
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Scaling Vision-Language Models with Sparse Mixture of Experts (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Hydra-MoE: A new class of Open-Source Mixture of Experts (github.com)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Foundational Models Defining a New Era in Vision: A Survey and Outlook (arxiv.org)
KingsmanVince@kbin.social (OP) to Machine Learning@kbin.social • Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training · 1 year ago
https://github.com/FudanDISC/weakly-supervised-mVLP/tree/master
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training (aclanthology.org)
KingsmanVince@kbin.social (OP) to Machine Learning@kbin.social • MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks · 1 year ago
Related links: https://github.com/lucidrains/MaMMUT-pytorch https://ai.googleblog.com/2023/05/mammut-simple-vision-encoder-text.html
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
Vision Language Transformers: A Survey (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use (arxiv.org)
KingsmanVince@kbin.social to Machine Learning@kbin.social · 1 year ago
RecycleGPT: An Autoregressive Language Model with Recyclable Module (arxiv.org)
KingsmanVince@kbin.social to Programmer Humor@kbin.social · 1 year ago
AI replaces programmers for real (image, media.kbin.social)
KingsmanVince@kbin.social to Random@kbin.social · 1 year ago
"```" (text post)
KingsmanVince@kbin.social to Machine Learning@kbin.social • Machine Learning Beginner Info/Resources · 1 year ago
I also want to share some resources. For PyTorch: https://pytorch.org/tutorials/ (the basic tutorials are fundamental, but some of the more advanced ones might be outdated) and https://www.learnpytorch.io/ (the author focuses mostly on computer vision, but gives an overview from research to production). For TPU: https://github.com/ayaka14732/tpu-starter, a full guide to using TPUs with JAX.