The Top Machine Learning Research of June 2024

ODSC - Open Data Science
Jul 24, 2024


As we saw last month, modern AI and machine learning are moving faster than the speed of light. Well, it kinda feels that way. Though we know better, it’s amazing to see how rapidly the field is advancing. So let’s look at what moved the world of machine learning and AI this month, and where the field is heading, thanks to the latest machine learning research.

This month, there’s a great deal of work in robotics, synthetic data, multilingual advancements, and more.

Get your ODSC Europe 2024 pass today!

In-Person and Virtual Conference

September 5th to 6th, 2024 — London

Featuring 200 hours of content, 90 thought leaders and experts, and 40+ workshops and training sessions, Europe 2024 will keep you up-to-date with the latest topics and tools in everything from machine learning to generative AI and more.

REGISTER NOW

EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data

Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces, which limits their flexibility to adapt to new tasks. RL agents that instead use temporally extended skills can learn new tasks more effectively, but traditional skill-based RL approaches often require expert supervision to define useful skills or rely on heuristics from offline data, both of which hinder adaptability and transferability.

The new approach, EXTRACT, overcomes these limitations by leveraging pre-trained vision-language models to autonomously extract semantically meaningful skills from offline data without human supervision. Each skill is parameterized by continuous arguments, enabling robots to learn new tasks by selecting the appropriate skill and modifying its parameters for the specific task.
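
To make that interface concrete, here’s a minimal sketch of a skill-conditioned policy head that samples a discrete skill and predicts continuous arguments to modulate it. All names, sizes, and the architecture are illustrative assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

# Illustrative skill-plus-arguments policy head: pick a discrete skill,
# then predict continuous parameters for how to execute it.
class SkillPolicy(nn.Module):
    def __init__(self, obs_dim: int, num_skills: int, arg_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.skill_logits = nn.Linear(128, num_skills)  # which skill to run
        self.skill_args = nn.Linear(128, arg_dim)       # how to run it

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        skill = torch.distributions.Categorical(logits=self.skill_logits(h)).sample()
        args = self.skill_args(h)  # continuous arguments for the chosen skill
        return skill, args

policy = SkillPolicy(obs_dim=39, num_skills=8, arg_dim=4)
skill, args = policy(torch.randn(1, 39))
```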

DiffusionPDE: Generative PDE-Solving Under Partial Observation

DiffusionPDE is a general framework for solving partial differential equations (PDEs) with generative diffusion models. It targets scenarios where complete scene information is unavailable, a common situation in real-world measurements that hampers classical PDE solvers: traditional approaches struggle when observations of the solution or the underlying coefficients are incomplete, leading to poor performance.

DiffusionPDE overcomes these limitations by modeling the joint distribution of the solution and coefficient spaces, filling in missing information and solving the PDE simultaneously.
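
Conceptually, the diffusion sampler can be steered toward the sparse measurements during denoising. Here’s a minimal sketch of one observation-guided step, assuming a denoiser that predicts the clean field; the toy model and simplified update rule are stand-ins, not the paper’s networks or noise schedule.

```python
import torch

# Sketch of observation-guided reverse diffusion, in the spirit of
# DiffusionPDE: each denoising step adds a correction from the gradient of a
# data-fit term on the observed entries. The toy "model" and the single
# update rule are simplified stand-ins, not the paper's exact solver.
def guided_step(model, x_t, t, obs, mask, guidance=1.0):
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = model(x_t, t)                            # estimate of the clean field
    data_loss = ((x0_hat - obs) * mask).pow(2).sum()  # mismatch on observed points
    grad = torch.autograd.grad(data_loss, x_t)[0]
    return (x0_hat - guidance * grad).detach()        # denoise + pull toward obs

toy_model = lambda x, t: 0.9 * x                      # stand-in denoiser
obs = torch.zeros(16, 16)
mask = torch.rand(16, 16) < 0.05                      # ~5% sparse observations
x = torch.randn(16, 16)
for t in reversed(range(50)):
    x = guided_step(toy_model, x, t, obs, mask)
```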

CaLMQA: Exploring culturally specific long-form question answering across 23 languages

Large language models are employed for long-form question answering (LFQA), generating paragraph-length answers to complex questions. While LFQA has been studied extensively in English, the research has not extended to other languages. To address this, researchers have introduced CaLMQA, a dataset of 1.5K culturally specific complex questions across 23 languages, along with 51 culturally agnostic questions translated from English into 22 other languages. Culturally specific questions are those uniquely or more likely to be asked by individuals from the cultures associated with the respective languages.
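
For a picture of what such a dataset contains, here’s a hypothetical record layout for the two question types; the field names are illustrative, not CaLMQA’s actual schema.

```python
# Hypothetical record layouts for CaLMQA-style items; field names are
# illustrative assumptions, not the dataset's actual schema.
culturally_specific = {
    "language": "fi",
    "source": "native",        # asked naturally within that culture
    "question": "...",         # expects a paragraph-length answer
}
culturally_agnostic = {
    "language": "fi",
    "source": "translated",    # one of 51 questions translated from English
    "question": "...",
}
```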

Interpreting Attention Layer Outputs with Sparse Autoencoders

Decomposing model activations into interpretable components is a significant challenge in mechanistic interpretability. Sparse autoencoders (SAEs) are a popular method for decomposing the internal activations of trained transformers into sparse, interpretable features, and have been applied to MLP layers and the residual stream. The researchers behind this paper trained SAEs on attention layer outputs, demonstrating that SAEs can achieve a sparse, interpretable decomposition in this context as well. Their experiments involve transformers from several model families, including models with up to 2 billion parameters.

They conducted a qualitative analysis of the features computed by attention layers, identifying multiple feature families: long-range context, short-range context, and induction features. The study of GPT-2 Small reveals that at least 90% of the attention heads are polysemantic, meaning they have multiple unrelated roles. SAEs prove to be a valuable tool for explaining model behavior in greater detail than previous methods.
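
For a rough picture of the method, here’s a minimal sparse autoencoder of the kind applied to attention outputs: an overcomplete ReLU encoder trained with a reconstruction loss plus an L1 sparsity penalty. The dimensions and penalty weight below are illustrative, not the paper’s settings.

```python
import torch
import torch.nn as nn

# Minimal sparse autoencoder for attention-layer outputs; sizes and the
# sparsity coefficient are illustrative assumptions.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.enc(acts))  # sparse, nonnegative features
        recon = self.dec(feats)
        return recon, feats

sae = SparseAutoencoder(d_model=768, d_hidden=768 * 16)
acts = torch.randn(32, 768)                 # stand-in attention outputs
recon, feats = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()  # recon + L1
```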

Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation

The rise of complex deep learning models has significantly advanced applications like computer vision, leading to their use in real-time systems. The issue is that deploying these resource-intensive models on devices with limited compute and memory, such as embedded and edge devices, poses significant challenges. The study examines the optimization of complex DL models to evaluate their performance on an embedded device, specifically the NVIDIA Jetson Nano, assessing the optimized models’ inference speed for image classification and video action detection.
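
A toy version of the kind of timing harness such a study relies on appears below; the model, input shape, and iteration counts are placeholders, not the authors’ protocol.

```python
import time
import torch

# Toy inference-speed harness for edge benchmarking; model, input shape, and
# iteration counts are placeholders, not the study's protocol. On a GPU,
# call torch.cuda.synchronize() before reading the clock.
def benchmark(model, example, warmup=20, iters=100):
    model.eval()
    with torch.inference_mode():
        for _ in range(warmup):            # let clocks and caches settle
            model(example)
        start = time.perf_counter()
        for _ in range(iters):
            model(example)
        elapsed = time.perf_counter() - start
    return iters / elapsed                 # throughput, inferences per second

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.AdaptiveAvgPool2d(1))
print(f"{benchmark(model, torch.randn(1, 3, 224, 224)):.1f} it/s")
```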

Compositional Models for Estimating Causal Effects

This study explores a compositional approach for estimating individual treatment effects in structured systems, where each unit is composed of multiple heterogeneous components. Using a modular architecture, this approach models potential outcomes at the component level and aggregates them to determine unit-level potential outcomes.

The compositional approach offers novel benefits in causal inference, including systematic generalization to estimate counterfactual outcomes for unseen component combinations and improved overlap guarantees between treatment and control groups compared to classical methods. The effectiveness of this approach is demonstrated through novel environments for empirical evaluation, using both simulated and real-world data.
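
To make the modular idea concrete, here’s a minimal sketch in which outcome models are fit per component under treatment and control, and unit-level potential outcomes are the sum of component predictions. The linear models, the sum aggregator, and the synthetic data are all illustrative assumptions, not the paper’s method.

```python
import numpy as np

# Compositional effect estimation sketch: component-level outcome models,
# aggregated to a unit-level potential outcome under each arm.
rng = np.random.default_rng(0)

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def fit(X, y):
    return np.linalg.lstsq(add_bias(X), y, rcond=None)[0]  # least squares

beta = np.array([1.0, 2.0, 0.5])
X1, X0 = rng.normal(size=(200, 3)), rng.normal(size=(200, 3))
y1 = X1 @ beta + 1.5 + rng.normal(scale=0.1, size=200)  # treated components
y0 = X0 @ beta + rng.normal(scale=0.1, size=200)        # control components
w1, w0 = fit(X1, y1), fit(X0, y0)

# a unit composed of two components: aggregate component-level predictions
unit = rng.normal(size=(2, 3))
ite = (add_bias(unit) @ w1).sum() - (add_bias(unit) @ w0).sum()
print(f"estimated unit-level effect: {ite:.2f}")  # ~3.0 (2 components x 1.5)
```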

Knowledge Distillation in Automated Annotation: Supervised Text Classification with LLM-Generated Training Labels

Practitioners of computational social science (CSS) often use human-labeled data to fine-tune supervised text classifiers. This study evaluates the potential for researchers to augment or replace human-generated training data with surrogate training labels from generative large language models. It introduces a recommended workflow and tests this application by replicating 14 classification tasks and measuring performance. Utilizing a novel corpus of English-language text classification datasets from recent high-impact CSS articles, the analysis benefits from password-protected archives that reduce contamination issues.

For each task, the study compares supervised classifiers fine-tuned with GPT-4 labels against those fine-tuned with human annotations, and against labels from GPT-4 and Mistral-7B using few-shot in-context learning. The findings reveal that supervised classification models fine-tuned on LLM-generated labels perform comparably to those fine-tuned with human annotations. This suggests that fine-tuning models with LLM-generated labels can be a fast, efficient, and cost-effective method for building supervised text classifiers.
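
The workflow itself is easy to picture: an LLM annotates the training texts, and a standard supervised classifier is then fit on those surrogate labels. The sketch below stubs out the annotation step with a toy llm_label function in place of a real GPT-4 or Mistral call.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Surrogate-label workflow sketch: LLM-generated labels stand in for human
# annotations when training a supervised classifier. `llm_label` is a stub
# for an actual LLM annotation call, not a real API.
def llm_label(text: str) -> int:
    return int("great" in text.lower())   # placeholder annotation rule

docs = ["great paper", "weak baseline", "great results", "poor writing"]
labels = [llm_label(d) for d in docs]     # surrogate training labels

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)                     # supervised model on LLM labels
print(clf.predict(["great methods"]))
```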

Learning Dynamic Bayesian Networks from Data: Foundations, First Principles and Numerical Comparisons

This paper provides a comprehensive guide to the foundations of learning Dynamic Bayesian Networks from data composed of multiple trajectory samples over time. It details the formalism for generic DBNs and several common types with specific variable distributions, including the analytical form of the models. The discussion emphasizes the interdependence between structure and weights in a DBN model and their implications for the learning process.

The paper offers a broad overview of learning methods, categorizing them based on key statistical features and their approach to the interplay between learning structure and weights. It explains the analytical form of the likelihood and Bayesian score functions, highlights the differences from static cases, and discusses optimization functions used to enforce structural requirements.
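
As one concrete instance, a linear-Gaussian DBN with a fixed structure has transition weights that maximize the likelihood in closed form, via least squares on stacked trajectory pairs. The two-variable system below is an illustrative assumption, not an example from the paper.

```python
import numpy as np

# Learning the transition weights of a linear-Gaussian DBN from multiple
# trajectories: with fixed structure, least squares on (x_t, x_{t+1}) pairs
# maximizes the Gaussian likelihood. The system itself is illustrative.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])           # true transition weights

# simulate multiple trajectory samples: x_{t+1} = A x_t + noise
trajs = []
for _ in range(50):
    x = rng.normal(size=2)
    traj = [x]
    for _ in range(30):
        x = A_true @ x + rng.normal(scale=0.05, size=2)
        traj.append(x)
    trajs.append(np.array(traj))

# stack consecutive-step pairs across trajectories and fit A by least squares
X = np.vstack([tr[:-1] for tr in trajs])
Y = np.vstack([tr[1:] for tr in trajs])
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.round(A_hat, 2))                 # recovers A_true up to noise
```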

ODSC West 2024 70% off ends soon!

In-Person & Virtual Data Science Conference

October 29th-31st, 2024 — Burlingame, CA

Join us for 300+ hours of expert-led content, featuring hands-on, immersive training sessions, workshops, tutorials, and talks on cutting-edge AI tools and techniques, including our first-ever track devoted to AI Robotics!

REGISTER NOW

Conclusion

What a month! May was great, but June also shone, with some amazing machine learning research papers pushing new methods of data annotation and multilingual support for LLMs. We can expect July to be just as dynamic, but that’s for the next blog! Now, if you’re trying to keep up with these papers on your own, ODSC can help.

At ODSC Europe and ODSC West, you’ll have the opportunity to directly engage with the latest machine learning research, and those driving these studies through talks, hands-on workshops, networking events, and more.

So why just read each paper when you can experience them at ODSC Europe and West? Virtual and in-person passes are limited, so you’ll want to get yours today.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event? Learn more about our upcoming events here.
