On my social media posts, you will see a lot of talk about "positive reinforcement", "negative reinforcement", "punishment", "aversives", "appetitives" and more. Those terms can get confusing quite quickly. Especially if you come from a traditional training background, those words may mean absolutely nothing, or they may already carry a meaning that isn't the same as the scientific definition. So today, I want to help clear up some of that confusion and share what operant conditioning is all about.
Operant conditioning is a learning process through which the strength of a behaviour is modified by reinforcement or punishment. This basically means it's the way humans and other animals modify their own behaviour in accordance with the consequences of that behaviour.
So if you do something, and the outcome is something good, you'll be more likely to do it again.
But if you do something and the outcome is unpleasant, you'll be less likely to do it again.
So how does this relate to horse training? Well, we humans can use operant conditioning to change our animals' behaviour. We call this "training". Now again, this is nothing new. Behavioural scientists have known this for a long time, but it wasn't really introduced into the animal training world until more recently. Despite this, many trainers figured out how to use part of operant conditioning to train their animals. They just never had a name for it, so they called it things like natural horsemanship or classical training, or put their own brand and name on it. Unfortunately, the horse world only discovered part of the potential of operant conditioning.
So, what are the 4 quadrants of operant conditioning?
Positive Reinforcement (R+)
Negative Reinforcement (R-)
Positive Punishment (P+)
Negative Punishment (P-)
It is important to understand the words positive and negative, and reinforcement and punishment. Positive and negative do NOT refer to an emotional meaning, such as 'good' or 'bad'. They are mathematical terms instead. So positive = addition, negative = subtraction. Specifically, the addition or subtraction of a stimulus.
Reinforcement refers to the strengthening of a behaviour. This could mean increasing how often the behaviour happens, increasing how quickly it reoccurs, or increasing its consistency.
Punishment refers to the weakening of a behaviour. This could mean decreasing how often the behaviour happens, decreasing how quickly it reoccurs, or reducing its consistency.
These are the definitions of those terms.
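If it helps to see those two questions side by side, here is a minimal sketch in Python. It is only an illustration of the definitions above, not anything from the behavioural science literature: the names `Quadrant` and `classify` are made up for this post. It asks whether a stimulus was added or removed, and whether the behaviour gets stronger or weaker, and returns the matching quadrant.

```python
from enum import Enum


class Quadrant(Enum):
    """The four quadrants, with the usual shorthand as the value."""
    POSITIVE_REINFORCEMENT = "R+"
    NEGATIVE_REINFORCEMENT = "R-"
    POSITIVE_PUNISHMENT = "P+"
    NEGATIVE_PUNISHMENT = "P-"


def classify(stimulus_added: bool, behaviour_strengthened: bool) -> Quadrant:
    """Hypothetical helper: pick the quadrant from the two definitions.

    stimulus_added: True = positive (addition), False = negative (subtraction).
    behaviour_strengthened: True = reinforcement, False = punishment.
    """
    if behaviour_strengthened:
        if stimulus_added:
            return Quadrant.POSITIVE_REINFORCEMENT
        return Quadrant.NEGATIVE_REINFORCEMENT
    if stimulus_added:
        return Quadrant.POSITIVE_PUNISHMENT
    return Quadrant.NEGATIVE_PUNISHMENT


# Something is taken away and the behaviour gets stronger.
print(classify(stimulus_added=False, behaviour_strengthened=True).value)
```

Notice that nothing in the sketch asks whether the stimulus is "good" or "bad". The quadrant depends only on addition versus subtraction, and on what happens to the behaviour afterwards.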
Let's talk details:
Negative Reinforcement (R-)
This is the removal of an aversive stimulus (unpleasant) after a behaviour to strengthen the behaviour.
This is the quadrant you will all be most familiar with. It is what is commonly called pressure release.
You’ve probably been told before that “horses don’t learn from pressure, they learn from the release of pressure”.
That’s negative reinforcement: the removal of an aversive to increase the likelihood that a behaviour will be repeated.
So the pressure is aversive, and we take it away so that next time the horse is more likely to do the task again when presented with the aversive. The horse does this because it wants to get rid of that aversive. The horse learns “Oh, if I do this, then that goes away”.
An aversive can be anything the horse does not like, from a mild annoyance to something painful.
Examples of Negative Reinforcement:
When we lunge a horse, we apply pressure, something the horse doesn't like, until they walk off in a circle. Then we take away that pressure.
When we ask a horse to back up, we tap on their nose or their chest, or wave our hands or lead rope. The moment the horse backs up, we stop. Again, we took away the aversive right after the desired response.
Next time the aversive/pressure is applied, the horse will be more likely to repeat that behaviour. Chances are he’ll do it even quicker. This is negative reinforcement in action.
We even learn this way ourselves! Say you’re outside on a windy day and you're cold, so you put on the new jacket you just bought. The jacket keeps you warm, so you're more likely to put it on again next time it's windy. Or kids do a good job in school, so the teacher doesn't make them do a quiz.
Positive Reinforcement (R+)
This is the addition of an appetitive stimulus (pleasant) after a behaviour to strengthen the behaviour.
We give the horse something they like after having done the correct behaviour.
Since a behaviour the horse did gave them something desirable, they are more likely to repeat that behaviour, because they want to get more of that good thing. The horse learns “Oh, if I do this, then I get that”.
We usually use food to achieve this in training because it is the most effective and easiest reinforcer to use, especially with animals. However, petting, vocal praise, time with friends, turnout, etc. can also be used as positive reinforcers.
Examples of Positive Reinforcement:
We ask the horse to swing their hip towards a target, and the horse receives alfalfa pellets when their hip touches the target.
The horse is asked to perform a piaffe. If the horse performs the movement successfully, they receive some scratches on their withers and/or a treat.
The horse walks into their stall and their daily grain is in the stall ready for them.
In humans this could look like a boss giving you a raise, praising a kid for bringing you their plate from the kitchen, or a child studying hard for a test and getting a good grade.
Positive Punishment (P+)
This is the addition of an aversive stimulus (unpleasant) after a behaviour to weaken the behaviour.
When we hear the word punishment, we usually think of the positive punishment quadrant. When we apply an aversive stimulus, one the horse doesn't like, after a behaviour, the horse is less likely to repeat that behaviour because they associate it with the unpleasant thing that follows. The horse learns “Oh, if I do this, then that happens”. This quadrant is used surprisingly frequently, often without the awareness that it is positive punishment. Unfortunately, it also carries the highest risk of fallout.
Examples of Positive Punishment:
Smacking a horse in response to nibbling
Whipping the horse for kicking or bucking
Throwing something at the horse for pawing
The horse being zapped by the electric fence
The horse taking dewormer that tastes bad
An example in humans would be yelling at a kid for dropping a plate, or smacking a kid for throwing a tantrum. It can also be a natural consequence, such as reaching into a fire and burning your hand, or eating something yucky.
Negative Punishment (P-)
Negative punishment is the removal of an appetitive stimulus (pleasant) after a behaviour to weaken a behaviour.
This is the quadrant that is least talked about in training. But it is often used accidentally, without the trainer’s knowledge, especially in positive reinforcement training. When we take away something the horse likes after a behaviour, the horse is less likely to do that behaviour again because they do not want to lose what they like again. The horse learns “Oh, if I do this, then that goes away”. When using positive reinforcement, negative punishment can sneak in by accident, either by removing food or even by ending a training session. This is also why we need a clear end-of-session signal that isn't punishing for the horse.
Examples of Negative Punishment:
Removing the horse’s food if they paw at it
Ending a positive training session
Catching a horse and bringing it into a stall (removing the food or freedom of turnout)
Examples in humans would be taking a toy away from two kids who are fighting over it, or grounding a kid. Parking tickets and late fees are negative punishment too.
And that is operant conditioning in its simplest form! Now of course, when we apply operant conditioning in the world and in training, it doesn’t always come in a neat bundle of 4 separate quadrants.