This document outlines my MSc dissertation project proposal. The purpose of the project is to reimplement a model proposed by Peter Dayan and Quentin J. M. Huys that helps reconcile apparent contradictions in the literature regarding the neuromodulator serotonin and its role in psychiatric disorders (such as depression) and in normal affective behaviors. After implementing the model, I will evaluate its claims and work on possible extensions.
Serotonin (5-HT) is a neurotransmitter that appears to play a critical role in a wealth of psychiatric conditions, including depression, anxiety, panic, and obsessive compulsions. However, despite the importance of serotonergic pharmacotherapies, notably selective serotonin reuptake inhibitors (SSRIs, the classic antidepressant drugs), the roles that serotonin plays in normal and abnormal function are still mysterious. Recently, Dayan and Huys proposed a new model for the role of serotonin. They considered a simple model of chains of affectively charged thoughts and interpreted the effect of serotonin in terms of pruning a tree of possible decisions (i.e., eliminating choices that have low or negative expected outcomes). They showed that this model could help reconcile apparent contradictions in the literature, in particular the fact that inhibition of serotonin reuptake is the first-line treatment of depression even though serotonin itself might signal 'negative' rather than appetitive outcomes and predictions. In this project, I will reimplement the model of Dayan and Huys, evaluate its claims and work on simple extensions.
Major (unipolar) depression is a serious mental disorder that results from a combination of genetic, developmental, and environmental factors. Its most characteristic symptoms are depressed mood (sadness) and the inability to experience pleasure (melancholy or anhedonia) [1,2]. Depressive states are also characterized by sleep disorder, change in weight or appetite, fatigue, loss of energy, psychomotor retardation, agitation, difficulty concentrating, indecisiveness, guilt, low self-esteem, and recurrent thoughts of death or suicide. According to the National Statistics Psychiatric Morbidity report, conducted in 2001, depression and mixed anxiety is the most common mental disorder in Britain, affecting approximately 9% of people.
Serotonin, or 5-hydroxytryptamine (5-HT), is a monoamine neurotransmitter that is primarily found in the gastrointestinal (GI) tract and central nervous system (CNS) of humans and animals. Serotonin is created in the body from the amino acid tryptophan and acts as a chemical messenger that transmits nerve signals between nerve cells. It has various functions, including the regulation of mood, appetite, sleep, muscle contraction, and some cognitive functions including memory and learning.
Since the discovery of serotonin in the brain and its identification as a neurotransmitter, many hypotheses have emerged suggesting that alterations in serotonergic neuronal function in the central nervous system may cause major depression. The most concrete evidence of this connection, reported in many studies, is the decreased concentration of serotonin metabolites such as 5-HIAA (5-hydroxyindoleacetic acid) in the cerebrospinal fluid and brain tissue of depressed people. According to this theory, if depression results from lower than usual levels of serotonin in the brain, pharmaceutical agents that reverse this effect should contribute to the treatment of patients who suffer from depression. Serotonin acts when it is released by neurons into the synapse, and its action is terminated when it is taken back up into the neuron (serotonin reuptake). Antidepressants therefore act at the synapse to enhance serotonin activity by blocking serotonin reuptake.
One class of antidepressants used to treat depression is the selective serotonin reuptake inhibitors (SSRIs). These drugs, as described above, alter the function of neurons that release serotonin by blocking the reuptake of serotonin back into the cell. The level of serotonin activity is therefore increased in any part of the nervous system that uses this neurotransmitter as a chemical signal between cells [6,7]. Examples of such drugs are citalopram, fluoxetine, fluvoxamine, paroxetine and sertraline.
On the other hand, if we consider all the studies conducted to determine the connection between serotonin and depression, we must conclude that depression cannot be understood as a simple excess or deficiency of serotonin, because the serotonergic system is much more complex. There are different serotonin receptor subtypes that influence different psychological functions, so the roles that serotonin plays in normal and abnormal function are still mysterious [9,10]. Four particular findings from the literature support this view. First, serotonin is involved in the prediction of aversive events, possibly as a form of opponent to dopamine, another neurotransmitter that is involved in control associated with appetitive outcomes [11,15]. Secondly, serotonin is involved in behavioral inhibition (a pattern of behavior involving avoidance, withdrawal, and fear of the unfamiliar), preventing actions that are predicted to lead to aversive outcomes. Thirdly, studies show that serotonin is connected to depression: depleting 5-HT in human subjects who have recovered from depression (by depleting tryptophan, the amino acid from which serotonin is produced) can lead them to re-experience subjective symptoms of the disease. Fourthly, while SSRIs, as stated above, are used in the treatment of depression, genetic decreases in the efficiency of serotonin reuptake may cause depression. These findings show that serotonin has various roles in normal affective behaviors that are hard to reconcile.
The ultimate goal is to bring all of these findings together into one unifying theory of 5-HT that can explain the apparent contradiction that, on the one hand, blocking serotonin reuptake is the first-line treatment of depression while, on the other hand, serotonin is linked with aversive outcomes and predictions.
Peter Dayan and Quentin J. M. Huys (2008) suggested that these contradictory findings can be described by considering the involvement of serotonin in the interaction between Pavlovian predictions and action selection. This interaction was observed in conditioned suppression, a set of experiments showing that animals learn whether or not to emit a certain action by first learning through classical (Pavlovian) conditioning. This kind of Pavlovian response to a threat is the inhibition already described above, and it takes the form of withdrawal or disengagement. To explore the consequences of serotonergic inhibition of actions for learning in affective settings, and the effects of serotonin depletion (tryptophan depletion), Dayan and Huys built a reinforcement learning model that demonstrates these specific effects. In this project, I will reimplement this model, evaluate its claims and work on simple extensions.
Building, implementing and extending a computational model of serotonin and depression such as this one, and computational models of psychiatric disorders in general, can contribute to a better understanding of the roles that neuromodulators play in psychiatric disorders and in normal affective behaviors, where existing animal and cognitive models seem inadequate. By using computational methods, neuroscientists, psychologists, and pharmacologists can obtain feedback on their findings and research, and contribute more effectively to a thorough understanding of brain function and to the prognosis and cure of psychiatric disorders.
To show the complex role of serotonin in depression and anxiety, and more specifically how it appears to contradict its role in normal affective behaviors, Dayan and Huys created a reinforcement learning model that captures basic aspects of normal serotonin function.
The first function is serotonergic inhibition. As mentioned above, serotonin, by helping to predict possible aversive outcomes, prevents certain actions that could lead to such affective states. The second function concerns the outcome of tryptophan depletion. Tryptophan is a standard amino acid from which serotonin is produced. It has been observed that acute tryptophan depletion (a pharmacological or psychiatric reduction in serotonin function) in medicated, formerly depressed patients can increase depressive symptoms. An additional effect of serotonin is the so-called recall bias: depressed patients tend to recall memories of aversive affective states they have been in. Finally, in contrast to serotonergic inhibition, there are the dopaminergically controlled approach responses. These result in a tendency to choose positive thoughts in order to feel better (that is, to reach an appetitive affective state). This is called reward seeking.
More specifically, they built a model of trains of thought in the form of a Markov Decision Process (MDP), as shown in Figure 1. The states correspond to the belief states of a person, and the actions are thoughts that lead from one belief state to another. The first kind of belief states are the internal ones, which are divided into internal positive and internal negative belief states. In addition, there is a group of terminal (outcome) states that yield positive or negative affective values, resulting from positive or negative emotions respectively. The internal positive and internal negative states are preferentially connected with each other and with the positive and negative outcome states respectively (red arrows). However, there are also links from positive to negative states and the other way around (black arrows). The model is approximately balanced as a whole, having an equal number of positive and negative states.
The immediate rewards of each state of the reinforcement learning model correspond to immediate affective values that are zero for the positive and negative internal states, non-positive for the negative outcome states, and non-negative for the positive outcome states. Moreover, each state s has a state value, V^π(s), which represents the expected reward obtained from that state when a particular policy π is followed. Finally, each action (thought) a taken from a state (belief state) s under a policy π has an expected return, or action value, Q^π(s, a). The policy in this model is affected by the serotonin factor: different serotonin values yield different policies and therefore different results.
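As a minimal sketch of how the state values of a fixed policy can be computed by dynamic programming, the Python snippet below runs iterative policy evaluation on a toy four-state chain with one internal positive, one internal negative, one positive outcome and one negative outcome state. The transition probabilities, the outcome values of +1 and -1, and the function name `evaluate_policy` are illustrative assumptions and are not taken from the Dayan and Huys model; Python is used here only for illustration.

```python
def evaluate_policy(P, r, terminal, gamma=1.0, tol=1e-10, max_iter=10_000):
    """Iterative policy evaluation; the fixed policy is already folded into P."""
    n = len(P)
    # Terminal (outcome) states keep their own affective value.
    V = [r[s] if s in terminal else 0.0 for s in range(n)]
    for _ in range(max_iter):
        delta = 0.0
        for s in range(n):
            if s in terminal:
                continue
            # Bellman expectation backup for an internal state.
            new_v = r[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V


# Toy chain (assumed values): 0 = internal positive, 1 = internal negative,
# 2 = positive outcome, 3 = negative outcome.
P = [
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
    [0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00],
]
r = [0.0, 0.0, 1.0, -1.0]
terminal = {2, 3}

V = evaluate_policy(P, r, terminal)
print([round(v, 3) for v in V])
```

For this symmetric toy chain the two internal states settle at about 0.2 and -0.2, which can be checked by hand from the two Bellman equations V(0) = 0.25 V(1) + 0.25 and V(1) = 0.25 V(0) - 0.25.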
Serotonergic inhibition is modeled by the simplified proposal that serotonin stochastically terminates trains of thought when they reach aversive states. The transition probabilities of the MDP are determined by the value of the serotonin factor and the values of the states. Once a train of thought is inhibited, thinking restarts in a randomly chosen internal state. To show this, the authors change the policy of the learning procedure according to the value of the serotonin factor. Initially, to simulate normal affective control, a basic fixed policy is used together with dynamic programming techniques to calculate the value of each state, resulting in trains of thought ending in terminal states equally often as a function of their actual outcomes. Then the value of serotonin is changed, which yields a different policy, and the new value of each state is calculated using the temporal difference (TD) learning rule. Under the new policy the negative-value states are poorly estimated (because thoughts are inhibited, or stopped, according to the transition probabilities), and the more negative an outcome is, the less often it is visited during learning (negative consequences are under-explored and over-valued). Reflexive inhibition due to serotonin therefore results in a critical bias toward optimistic valuation.
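The effect described above can be illustrated with a small sketch of TD(0) learning in which thoughts are stochastically cut off before aversive states. Everything specific here is an assumption made for illustration: the toy four-state chain, the form of `inhibition_probability` (zero for non-negative values, rising toward one as the value becomes more negative or as the hypothetical parameter `alpha_5ht` grows), and the choice to skip the value update for an inhibited transition. It is not the parameterisation used by Dayan and Huys.

```python
import math
import random


def inhibition_probability(value, alpha_5ht):
    """Assumed form: never inhibit non-negative values; inhibit negative
    values more often as alpha_5ht or the magnitude of the value grows."""
    return 1.0 - min(1.0, math.exp(alpha_5ht * value))


def td_learning_with_inhibition(P, r, terminal, alpha_5ht,
                                n_steps=20000, lr=0.02, gamma=1.0, seed=0):
    rng = random.Random(seed)
    n = len(P)
    internal = [s for s in range(n) if s not in terminal]
    V = [r[s] if s in terminal else 0.0 for s in range(n)]
    visits = [0] * n
    s = rng.choice(internal)
    for _ in range(n_steps):
        nxt = rng.choices(range(n), weights=P[s])[0]
        # Serotonergic inhibition: the thought may be terminated before the
        # aversive state is experienced, so no learning occurs (assumption).
        if rng.random() < inhibition_probability(V[nxt], alpha_5ht):
            s = rng.choice(internal)  # restart in a random internal state
            continue
        visits[nxt] += 1
        V[s] += lr * (r[s] + gamma * V[nxt] - V[s])  # TD(0) update
        s = rng.choice(internal) if nxt in terminal else nxt
    return V, visits


# Toy chain (assumed values): 0 = internal positive, 1 = internal negative,
# 2 = positive outcome, 3 = negative outcome.
P = [
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
    [0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00],
]
r = [0.0, 0.0, 1.0, -1.0]
terminal = {2, 3}

V_base, visits_base = td_learning_with_inhibition(P, r, terminal, alpha_5ht=0.0)
V_inh, visits_inh = td_learning_with_inhibition(P, r, terminal, alpha_5ht=50.0)
print("negative-outcome visits:", visits_base[3], "vs", visits_inh[3])
print("value of internal negative state:", V_base[1], "vs", V_inh[1])
```

With strong inhibition the negative outcome state is never sampled in this toy setting, so the internal negative state is only ever updated toward non-negative targets and ends up with a more optimistic value than under the uninhibited policy.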
Afterwards, to model the effects of tryptophan depletion caused by a pharmacological or psychiatric reduction in serotonin function, the authors calculate the steady-state transition probabilities for a new value of the serotonin factor, smaller than the value used to learn the state values under a specific policy, by computing the probability of inhibition of each state. With the new sampling of thoughts, it is first observed that there is less bias than before against actions that lead to aversive outcomes. Moreover, although the positive outcomes are not affected by changing the inhibition, there is a significant change in the terminations in negative outcome states, together with adverse surprises arising from transitions from positive internal states to negative outcome states that were previously inhibited.
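A rough sketch of this depletion step, under stated assumptions, is to keep a table of previously learned values fixed, lower the serotonin factor, and recompute the effective transition probabilities once inhibition is taken into account. The inhibition rule, the uniform redistribution of inhibited probability mass over internal states, the value table `V_learned` and the names `effective_transitions` and `alpha_5ht` are all hypothetical choices for illustration, not the calculation used in the paper.

```python
import math


def inhibition_probability(value, alpha_5ht):
    """Assumed form: only negative values can be inhibited."""
    return 1.0 - min(1.0, math.exp(alpha_5ht * value))


def effective_transitions(P, V, internal, alpha_5ht):
    """Transition rows for internal states after applying inhibition.

    Probability mass that would have reached an inhibited state is moved
    to a uniformly chosen internal state (restart of the train of thought).
    """
    n = len(P)
    T = {}
    for s in internal:
        row = [0.0] * n
        inhibited = 0.0
        for t in range(n):
            c = inhibition_probability(V[t], alpha_5ht)
            row[t] += P[s][t] * (1.0 - c)
            inhibited += P[s][t] * c
        for i in internal:
            row[i] += inhibited / len(internal)
        T[s] = row
    return T


# Toy chain (assumed values): 0 = internal positive, 1 = internal negative,
# 2 = positive outcome, 3 = negative outcome.
P = [
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
    [0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00],
]
internal = [0, 1]
V_learned = [0.2, -0.2, 1.0, -1.0]  # hypothetical previously learned values

T_normal = effective_transitions(P, V_learned, internal, alpha_5ht=5.0)
T_depleted = effective_transitions(P, V_learned, internal, alpha_5ht=1.0)
print("P(internal positive -> negative outcome):",
      T_normal[0][3], "->", T_depleted[0][3])
```

In this toy setting, lowering the serotonin factor leaves the probability of reaching the positive outcome unchanged but raises the probability that a thought starting in the internal positive state ends in the negative outcome, which mirrors the adverse surprises described above.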
To model the recall bias often seen in depression, the restart after inhibition need not be uniform: instead of a randomly chosen internal state, the new state can be resampled with a bias toward the internal negative states.
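One simple way to sketch such a biased restart is shown below. The parameter `p_negative` (the probability of restarting in an internal negative state) and the function name `resample_start_state` are hypothetical and only illustrate the idea; the exact form of the bias used by the authors is not reproduced here.

```python
import random


def resample_start_state(internal_positive, internal_negative, p_negative, rng):
    """Pick the state in which thinking restarts after inhibition.

    p_negative is an assumed bias parameter: the probability of restarting
    in one of the internal negative states.
    """
    if rng.random() < p_negative:
        return rng.choice(internal_negative)
    return rng.choice(internal_positive)


rng = random.Random(0)
positive_states = [0, 1, 2]   # hypothetical state indices
negative_states = [3, 4, 5]

restarts = [resample_start_state(positive_states, negative_states, 0.8, rng)
            for _ in range(1000)]
share_negative = sum(s in negative_states for s in restarts) / len(restarts)
print("share of restarts in negative states:", share_negative)
```

Setting `p_negative` to the fraction of internal states that are negative recovers the unbiased, uniformly random restart.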
In the same way, the reward-seeking effect that dopamine has in Pavlovian affective control is realized by favoring transitions toward certain states and by restricting the inhibition of trains of thought. This is accomplished by choosing action a in state s according to a softmax.
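A softmax choice rule of this kind can be sketched as follows, assuming the standard form in which the probability of action a in state s is proportional to exp(β Q(s, a)). The inverse temperature `beta` and the function name `softmax_policy` are assumptions for illustration, and the specific parameterisation used in the paper may differ.

```python
import math


def softmax_policy(q_values, beta=1.0):
    """Action probabilities proportional to exp(beta * Q(s, a))."""
    scaled = [beta * q for q in q_values]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(z - m) for z in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action values for three thoughts available in one state.
probs = softmax_policy([1.0, 0.0, -1.0], beta=2.0)
print(probs)
```

Larger values of `beta` concentrate choice on the highest-valued thought, while `beta = 0` gives a uniform choice among the available thoughts.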
The reinforcement learning model of serotonin and depression will be implemented in MATLAB. The implementation will be based on a thorough study of the model presented in Peter Dayan and Quentin J. M. Huys's paper “Serotonin, Inhibition, and Negative Mood” and on methods discussed in Sutton and Barto's book “Reinforcement Learning”.
The project goals are the successful reimplementation of the proposed reinforcement learning model for serotonin and depression, the evaluation of its claims, and the development of extensions of the present model that can describe more serotonin functions.
A successful reimplementation means that my model reproduces the results of the proposed model, yielding the same observations in the figures that describe the effects of the basic functions of serotonin. The paper's claims will be evaluated by comparing the conclusions drawn from the model's quantitative results with the conclusions of previous research on serotonin and depression based on psychopharmacological data, animal models and cognitive models, and by examining whether the paper's solutions can resolve, to some extent, the contradictory findings of several studies of serotonin function.
Finally, possible extensions of the present model include investigating the impact that changing the serotonin factor during learning has on surprises associated with transitions that were previously inhibited (negative prediction errors), and comparing these findings with similar findings in the literature. Another extension could involve the less direct connection between 5-HT depletion and impulsive behavior, through a reduction of the discount factor γ for future aversive outcomes as described by Doya. Many other extensions of the model could result from creating a reinforcement learning system that models in great detail the Behavioral Inhibition System proposed by Gray.
This section describes the timetable of the phases that have to be completed for my project, along with their expected completion dates. Some phases may not be strictly consecutive; that is, some may overlap to an extent with other phases. The phases and their estimated times of completion are given in the table below. The deadline for submission of the dissertation is the 20th of August at 12:00 noon.
Estimated Time of completion
Problem definition and research proposal writing
Evaluation of the model's results
Study of model extensions
Building and evaluation of model extensions