Introduction to Reinforcement Learning

Reinforcement learning is a field of machine learning in which an agent learns how to behave in an environment without direct human intervention: it performs actions and learns from their outcomes in order to reach a goal set by the system. Based on the type of goal, it is classified into Positive and Negative learning methods, with applications in Healthcare, Education, Computer Vision, Games, NLP, Transportation, etc.

What is Reinforcement Learning?

Let us understand how reinforcement learning works with the help of two simple use cases:

Case #1

There is a baby in the family who has just started walking, and everyone is quite happy about it. One day, the parents set a goal: let the baby reach the couch, and see whether she is able to do so.

Result of Case 1: The baby successfully reaches the couch, and everyone in the family is very happy to see this. The chosen path now comes with a positive reward.

Points: Reward + (+n) → Positive reward.

Case #2

The baby is not able to reach the couch and falls. It hurts! What could the reason be? There might be some obstacles in the path to the couch, and the baby tripped over them.

Result of Case 2: The baby trips over some obstacles and cries! That was bad, and she learns not to run into the obstacles next time. The chosen path now comes with a negative reward.

Points: Reward + (-n) → Negative reward.

Now that we have seen cases 1 and 2, reinforcement learning does, in concept, the same thing, except that the learner is not a human but a computational agent.

Reinforcement Learning Step by Step

Let us understand reinforcement learning by following a reinforcement learning agent step by step. In this example, the agent is Mario, who will learn to play on his own:

The current state of the Mario game environment is S_0, because the game has not yet started and Mario is at his starting position.
Next, the game starts and Mario, i.e. the RL agent, takes an action, say A_0.
The state of the game environment now becomes S_1.
The RL agent, i.e. Mario, is also assigned a positive reward, R_1, probably because Mario is still alive and did not encounter any danger.

This loop keeps running until Mario finally dies or reaches his destination. At every step, the model outputs a state, an action, and a reward.
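The loop described above can be sketched in Python. This is only an illustration under stated assumptions: ToyLevel is a hypothetical stand-in for the Mario game (a short track with a goal and a random hazard), and the reward values, the hazard probability, and the run_episode helper are made up for this example rather than taken from any real game or library.

```python
import random


class ToyLevel:
    """Hypothetical stand-in for the Mario game: a 1-D track with a goal."""

    def __init__(self, goal=5, hazard_prob=0.1, seed=0):
        self.goal = goal                # position of the destination
        self.hazard_prob = hazard_prob  # chance of dying on any step (assumed)
        self.rng = random.Random(seed)
        self.position = 0

    def reset(self):
        """Put the agent back at the start and return the initial state S_0."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (+1 = move right, 0 = stay).

        Returns (next_state, reward, done).
        """
        self.position += action
        if self.rng.random() < self.hazard_prob:
            return self.position, -10, True   # agent died: negative reward
        if self.position >= self.goal:
            return self.position, 10, True    # destination reached
        return self.position, 1, False        # still alive: small positive reward


def run_episode(env, policy, max_steps=100):
    """Run the state -> action -> reward -> next state loop for one episode."""
    state = env.reset()
    history = []
    for _ in range(max_steps):
        action = policy(state)                       # agent picks A_t from S_t
        next_state, reward, done = env.step(action)  # environment answers with S_t+1, R_t+1
        history.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return history


if __name__ == "__main__":
    # A fixed "always move right" policy; a learning agent would improve its policy over time.
    for transition in run_episode(ToyLevel(), lambda state: 1):
        print(transition)
```

The policy here is fixed, so nothing is learned yet; the sketch only shows the interaction loop that a learning algorithm would sit on top of.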

Maximizing Rewards

The goal of reinforcement learning is to maximize rewards while taking into account certain other factors, such as the discounting of rewards; the discount is explained shortly with the help of an illustration.

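The cumulative discounted reward is the sum of the rewards received at each future step, where each reward is weighted by a discount factor between 0 and 1 raised to the number of steps until that reward is received.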
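Written out in the usual textbook notation, the cumulative discounted reward takes the form below. The symbols G_t (the return from step t) and γ (the discount factor) are standard notation assumed here, not taken from the source; the reward indices follow the R_1 convention used in the Mario example.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad 0 \le \gamma \le 1
```

With γ close to 0 the agent mostly cares about immediate rewards; with γ close to 1 it weighs distant rewards almost as heavily as immediate ones.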

Discounted Rewards

Let us understand this through an example:

In the given figure, the objective is for the game’s mouse to eat as much cheese as possible before being eaten by the cat or electroshocked.
We can assume that the closer the mouse is to the cat or the electric trap, the higher the probability that it gets eaten or shocked.
This implies that even if a full block of cheese sits near the electric shock block or near the cat, going there is risky; it is better to eat the cheese nearby and avoid the risk.
So if one full block of cheese, “block1”, is far from the cat and the electric shock block, and another full block, “block2”, is near the cat or the electric shock block, then the latter, “block2”, will be discounted more in rewards than the former.
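To make the effect of discounting concrete, here is a minimal Python sketch that computes a discounted sum of rewards. In the standard formulation the discount is applied per time step, so a reward that takes more steps to reach counts for less. The function name discounted_return, the discount factor gamma = 0.9, and the two reward sequences are all hypothetical values chosen for illustration; they are not read from the figure.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step k by gamma ** k."""
    total = 0.0
    # Work backwards so each earlier step adds its reward to the
    # already-discounted value of everything that follows it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two hypothetical paths to a cheese block worth 10 points:
near_cheese = [0, 10]           # cheese eaten on the second step
far_cheese = [0, 0, 0, 0, 10]   # same cheese, but eaten on the fifth step

print("near:", discounted_return(near_cheese))
print("far: ", discounted_return(far_cheese))
```

Both paths end at a block of the same size, but the one reached later contributes less to the total, which is why an agent with gamma below 1 prefers rewards it can collect sooner.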

Types of Reinforcement Learning

Below are the two types of reinforcement learning with their advantages and disadvantages:

1. Positive

When an event that occurs because of a particular behavior increases the strength and frequency of that behavior, it is known as Positive Reinforcement Learning.

Advantage: Performance is maximized, and the change is sustained for a longer time.
Disadvantage: Too much reinforcement can diminish the results.

2. Negative

Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided.

Advantage: Behavior is increased.
Disadvantage: Negative reinforcement provides only enough to reach the minimum behavior of the model.

Where Should Reinforcement Learning Be Used?

Reinforcement learning is used in the following areas these days:

Healthcare
Education
Games
Computer Vision
Business Management
Robotics
Finance
NLP (Natural Language Processing)
Transportation
Energy

Careers in Reinforcement Learning

RL is a branch of machine learning, and according to a report from the job site Indeed, Machine Learning Engineer was the best job of 2019. According to the report, a Machine Learning Engineer earns an average salary of $146,085, with a growth rate of 344 percent.

Source: Indeed

Skills for Reinforcement Learning

Below are the skills needed for reinforcement learning:

1. Basic Skills

Probability
Statistics
Data Modeling

2. Programming Skills

Fundamentals of Programming and Computer Science
Design of Software
Ability to Apply Machine Learning Libraries and Algorithms

3. Machine Learning Programming Languages

Python
R
Machine learning models can also be built in other languages, such as Java and C/C++, but Python and R are the most favoured.

Conclusion

In this article, we started with a brief introduction to reinforcement learning, then took a deeper look at how RL works and at the factors involved in RL models. We also used some real-world examples to make the topic easier to understand. By the end of this article, you should understand how reinforcement learning works.
