On a structural evolution of ML research ideas

Here I discuss some personal observations on how an idea in ML research evolves, with the goal of eventually capturing the salient structure of how a research idea moves through its stages, from a naive, seemingly trivial intuition to a rigorous piece of work to a published paper. This has several interesting applications, namely applying and developing my own idea-evolution pipeline and understanding how novelty and knowledge are born.

Some cognitive tasks for generating ML novelty that will be discussed shortly:

Creative application of an idea from one domain to another
Generalization of several relevant methods
Reformulation of several problems
Proposing new problems

1. Applying a technique/idea from one problem domain to another domain

This is a basic form of doing research and perhaps the most general approach, because the other approaches (presented next) can more or less be thought of as applications of an idea or technique to a problem of interest; an idea or technique rarely comes from nowhere. The idea itself is not original and the problem itself might not be original, so where does the novelty come from? The originality may lie in the meta-idea of applying the idea to the problem domain and in the way you develop the idea in that domain to its fullest. Such an application often requires non-trivial design choices and novel treatments. Remember that having such an idea is nice, but it is far from enough for a publishable work if one stops there; it is a very small step among the many steps required to generate novelty. Here I discuss a proposed pipeline for developing an idea from one domain in another. Of course, this is not the only possible pipeline, but it is at least something doable.

A good rationale (e.g., motivation) for why this leveraging would make sense and have a potential advantage. If possible, provide a simple analytical example where existing methods fail but leveraging the idea overcomes the failure; this is very convincing and powerful.
It is a good sign if applying an idea from one domain to another requires nontrivial design choices and novel treatments. At this point, a theorem or proposition would add rigor and be convincing.
Design a good algorithmic solution and an experimental prototype to demonstrate the soundness of the proposed idea.

Some examples:

[1] applies Stein’s operator as a control variate to reduce variance in policy gradient methods
[2] applies Stein variational inference to maximum-entropy policy optimization
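
To make the notion of a control variate concrete, here is a minimal sketch of the generic technique on a toy problem. It is not the action-dependent, Stein-based estimator of [1]. The function name `control_variate_demo`, the target `exp(X)`, and the choice of `g(x) = x` as the control variate are all assumptions made purely for illustration.

```python
import math
import random
import statistics


def control_variate_demo(n_samples=20000, seed=0):
    """Estimate E[exp(X)] for X ~ N(0, 1), with and without a control variate.

    Hypothetical toy example: the control variate is g(x) = x, whose mean
    under N(0, 1) is known to be exactly 0.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    f_vals = [math.exp(x) for x in xs]

    g_mean_known = 0.0  # E[g(X)] = E[X] = 0 for a standard normal

    # Estimate the coefficient c = Cov(f, g) / Var(g) from the samples.
    f_bar = statistics.fmean(f_vals)
    g_bar = statistics.fmean(xs)
    cov_fg = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, xs)) / (n_samples - 1)
    c = cov_fg / statistics.variance(xs)

    # For a fixed c, subtracting c * (g - E[g]) does not change the expectation;
    # estimating c from the same samples adds a small bias that shrinks with n.
    adjusted = [f - c * (g - g_mean_known) for f, g in zip(f_vals, xs)]

    return {
        "true_value": math.exp(0.5),  # closed form of E[exp(X)] for X ~ N(0, 1)
        "plain_mean": f_bar,
        "cv_mean": statistics.fmean(adjusted),
        "plain_var": statistics.variance(f_vals),
        "cv_var": statistics.variance(adjusted),
    }


if __name__ == "__main__":
    for name, value in control_variate_demo().items():
        print(f"{name}: {value:.4f}")
```

Both estimators target the same quantity, but the per-sample variance of the adjusted values is lower because the part of `exp(x)` that is linearly correlated with `x` has been subtracted out. Policy gradient control variates follow the same principle, with a baseline function in place of `g`.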

2. Reformulation of the existing problems

Reformulating existing problems usually gives novel perspectives. By reframing a problem as a member of another problem class, one can leverage techniques from the new class to solve the existing problem. For example, [3] reframes actor-critic in a minimax framework and solves the resulting problem using techniques from that problem class. [4] reframes RL as variational inference and gives some interesting insights that result from the reformulation.

3. Generalizing several methods

Connecting different domains through a unifying framework is a rich source of novelty, as it brings what is understood in one domain to bear on problems in the others. Some examples of this generalization approach:

Connecting smoothed game optimization with GAN and RL [6].
Lifting Stein Variational Gradient Descent from Euclidean space to Riemannian manifolds [8].
Lifting Stein variational inference and black-box variational inference into a unifying framework via gradient flow [9], quite similar in spirit to the way [10] generalizes GANs, variational inference, and policy gradient via a unifying framework called probability functional descent.

4. Proposing new problems

Propose new, important problems and solve them, possibly by leveraging existing techniques.
Provide a rationale for the new problems, e.g., why are they important and worth solving?
Sometimes the solution to the new problem is not difficult, but as long as the problem itself is novel, the idea is still important.
Sometimes unique, non-trivial problems emerge along the way while solving the original problem. These can be interesting.
And again, algorithmic and experimental design for the proposed solutions is always important for empirical research. (not completed yet)

5. Some other practical observations

Intuition (or thought experiment) → simple analytical example → Scale the idea in practice
Figure 1 is often used to convey the main intuition (i.e., the mental experiment) of a paper. Quick check: a good Figure 1 is one where, if you read only the figure and its caption, you still get the main idea of the paper. A really nice exemplar is [5]. While the main contribution of [5] is a technical method for preventing catastrophic forgetting, its Figure 1 presents only the mental experiment, which requires little expertise to understand.
Some papers have a very simple final derivation that is very intuitive (and looks like a heuristic), e.g., [5,7]. Even so, there should be good evidence for this final derivation: (1) Pre-evidence: the mental-experiment motivation, simple analytical example, and mathematical backing that lead to the final derivation; (2) Post-evidence: a good experimental prototype (synthetic or scaled) to show that the final derivation works. Without the pre-evidence, the idea would look arbitrary and heuristic. Without the post-evidence, the idea is useless.

Three pillars of a good contribution: theoretical, algorithmic, and empirical. I feel RL provides a good playground for all three pillars; it also connects classic problems (e.g., bandits) to modern ones (e.g., deep reinforcement learning). A good contribution need not be equally significant in all these aspects. Some look like a natural, incremental modification (simple in algorithmic approach) but can demonstrate theoretical and/or empirical significance (e.g., this one). Some start with an empirical comparative study to answer some empirical questions (e.g., this one). RL is also a good representative problem for general AI, in which we aim to learn to solve many general intelligence tasks using little domain knowledge. That said, if there is to be AGI (Artificial General Intelligence), or at least if we want to make existing learning algorithms substantially more general than they are today, RL is a good testbed for such a goal. Just a small remark: causality and reasoning are two of the important ingredients (or subgoals) on this path.

6. Claim structure

Here I provide some example expressions used to make claims about a research idea in ML. These expressions are interesting because they reflect how an idea evolves from the authors' perspective.

We leverage this perspective/viewpoint/insight to shed light on some theoretical aspects of an algorithm, such as its convergence properties and asymptotic convergence.
The insight/analysis leads us to some design choices for an algorithm.
We address the problem of [the problem of interest].
We demonstrate the effectiveness of our proposed methods in various experimental settings.

7. Troubling trends in machine learning scholarship

Disclaimer: This section is a summary (often using the same wording) of [11]. This section is in progress.

Determining which knowledge warrants inquiry is subjective
Most valuable papers to the community: act in service of the readers by creating foundational knowledge and communicating as clearly as possible
Desirable characteristics of most valuable papers:
- Provide intuition to aid the reader’s understanding
- But distinguish between intuition and stronger conclusions supported by evidence
- Describe empirical investigations that consider and rule out alternative hypotheses
- Make clear the relationship between theoretical analysis and intuitive or empirical claims
- Use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts
Some troubling trends that depart from the desirable characteristics described above:
- Fail to distinguish between explanation and speculation
- Fail to identify the source of empirical gains
- Mathiness: Use mathematics to obfuscate or impress rather than clarify
- Misuse of language: e.g., overloading established technical terms.
Possible causes: the often-misaligned incentives between scholarship and short-term measures of success (e.g., bibliometrics, attention, and entrepreneurial opportunity)
How to combat these troubling trends:
- Communicate more precise information with greater clarity -> accelerate research, reduce the on-boarding time for new researchers, and play a more constructive role in public discourse.
- Science is self-discipline: we need to criticize ourselves
- Promoting clear scientific thinking and communication can sustain the trust and investment we currently have.
Explanation vs speculation:
- New research areas often require intuitions for exploration before they have formal representations
- Speculation: to communicate intuitions that might not yet be scientifically scrutinized
- Avoid speculation disguised as explanations
- Separate speculation from explanation by, e.g., putting it in a “Motivation” section where we can freely express informal ideas
- Carefully convey uncertainty by saying e.g. something can be challenged and is not formally validated.

8. Paradigm shift

This is a summary of “The Structure of Scientific Revolutions” by Thomas S. Kuhn.

Science cannot operate without some paradigm.
A paradigm can be thought of as a set of beliefs and assumptions based on which research questions are created and new knowledge is established.

Normal science exploits a paradigm.

When a paradigm encounters an anomaly (in the form of novelties of fact or theory), a paradigm shift can (gradually) take place.

As a scientist, how should one deal with an anomaly when practicing within some paradigm?
- Those who pause and examine every anomaly would not get much done because an anomaly is more than just another puzzle in the current paradigm; it requires more effort.
- When an anomaly becomes more widely recognized and serious, people start to look at it. They isolate the anomaly and try to understand it by giving it a structure. They try to extend the current paradigm to explain the anomaly.
- If the current paradigm cannot be modified to handle the anomaly, a new paradigm must arise to account for it. In the meantime, you can choose to keep exploiting the current paradigm and leave the anomaly to other people, or you can choose to explore new paradigms that account for the anomaly.

References

[1] Action-dependent control variates for policy gradient via Stein’s Identity
[2] Stein Variational Policy Gradient
[3] Boosting the actor with dual critic
[4] Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
[5] Continual Learning Through Synaptic Intelligence
[6] Smooth games optimization and machine learning workshop
[7] Proximal Policy Optimization Algorithms
[8] Riemannian Stein Variational Gradient Descent
[9] The equivalence between Stein variational gradient descent and black-box variational inference
[10] Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
[11] Troubling Trends in Machine Learning Scholarship