# Publications

## 2021

• 2021 Volodymyr Tkachuk, Sriram Ganapathi Subramanian, and Matthew Taylor. 2021. “The Effect of Q-Function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning.” Adaptive Learning Agents Workshop (ALA) - International Conference on Autonomous Agents and Multi Agent Systems. Accepted as a long talk in ALA

##### BibTex Citation
@misc{VladEffectof2021,
title = {The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning},
author = {Tkachuk, Volodymyr and Ganapathi Subramanian, Sriram and Taylor, Matthew},
booktitle = {Adaptive Learning Agents Workshop (ALA) - International Conference on Autonomous Agents and Multi Agent Systems},
year = {2021}
}

• 2021 Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, and Matthew E. Taylor. 2021. “Diverse Auto-Curriculum Is Critical for Successful Real-World Multiagent Learning Systems.” In Proceedings of the International Conference on Autonomous Agents and Multi Agent Systems (AAMAS-21), edited by U. Endriss, A. Nowé, F. Dignum, and A. Lomuscio. IFAAMAS. 28% acceptance rate in the Blue Sky Track

##### BibTex Citation
@inproceedings{AAMAS21-BlueSky,
title = {Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems},
author = {Yang, Yaodong and Luo, Jun and Wen, Ying and Slumbers, Oliver and Graves, Daniel and {Bou Ammar}, Haitham and Wang, Jun and Taylor, Matthew E.},
booktitle = {Proceedings of the International Conference on Autonomous Agents and Multi Agent Systems {(AAMAS-21)}},
year = {2021},
editor = {Endriss, U. and Nowé, A. and Dignum, F. and Lomuscio, A.},
month = {3--7 May},
publisher = {IFAAMAS}
}

• 2021 Sai Krishna Gottipati, Yashaswi Pathak, Boris Sattarov, Sahir, Rohan Nuttall, Mohammad Amini, Matthew E. Taylor, and Sarath Chandar. 2021. “TAC: Towered Actor Critic for Handling Multiple Action Types in Reinforcement Learning for Drug Discovery.” In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21 2021). AAAI Press. https://www.amii.ca/latest-from-amii/tac-towered-actor-critic-handling-multiple-action-types-reinforcement-learning-drug-discovery/. 21% acceptance rate

##### BibTex Citation
@inproceedings{TAC2021,
author = {Gottipati, Sai Krishna and Pathak, Yashaswi and Sattarov, Boris and Sahir and Nuttall, Rohan and Amini, Mohammad and Taylor, Matthew E. and Chandar, Sarath},
title = {TAC: Towered Actor Critic for Handling Multiple Action Types in Reinforcement Learning for Drug Discovery},
booktitle = {Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence {(AAAI-21 2021)}},
publisher = {{AAAI} Press},
year = {2021},
url = {https://www.amii.ca/latest-from-amii/tac-towered-actor-critic-handling-multiple-action-types-reinforcement-learning-drug-discovery/}
}

• 2021 Sriram Ganapathi Subramanian, Matthew E. Taylor, Mark Crowley, and Pascal Poupart. 2021. “Partially Observable Mean Field Reinforcement Learning.” In Proceedings of the International Conference on Autonomous Agents and Multi Agent Systems (AAMAS 2021), edited by U. Endriss, A. Nowé, F. Dignum, and A. Lomuscio. London, United Kingdom: IFAAMAS. 25% acceptance rate

##### BibTex Citation
@inproceedings{Srirampomfrl2021,
title = {Partially Observable Mean Field Reinforcement Learning},
author = {Subramanian, Sriram Ganapathi and Taylor, Matthew E. and Crowley, Mark and Poupart, Pascal},
booktitle = {Proceedings of the International Conference on Autonomous Agents and Multi Agent Systems (AAMAS 2021)},
year = {2021},
editor = {Endriss, U. and Nowé, A. and Dignum, F. and Lomuscio, A.},
address = {London, United Kingdom},
month = {3--7 May},
publisher = {IFAAMAS}
}

• 2021 Katheryn Hume and Matthew E. Taylor. April 2021. “Why AI That Teaches Itself to Achieve a Goal Is the Next Big Thing.” Harvard Business Review. https://hbr.org/2021/04/why-ai-that-teaches-itself-to-achieve-a-goal-is-the-next-big-thing.

##### BibTex Citation
@other{taylor_hbr,
author = {Hume, Katheryn and Taylor, Matthew E.},
journal = {Harvard Business Review},
month = apr,
title = {Why AI That Teaches Itself to Achieve a Goal Is the Next Big Thing},
year = {2021},
url = {https://hbr.org/2021/04/why-ai-that-teaches-itself-to-achieve-a-goal-is-the-next-big-thing},
month_numeric = {4}
}

• 2021 Nikunj Gupta, G Srinivasaraghavan, Swarup Kumar Mohalik, Nishant Kumar, and Matthew E Taylor. May 2021. “HAMMER: Multi-Level Coordination of Reinforcement Learning Agents via Learned Messaging.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{guptahammer,
title = {HAMMER: Multi-Level Coordination of Reinforcement Learning Agents via Learned Messaging},
author = {Gupta, Nikunj and Srinivasaraghavan, G and Mohalik, Swarup Kumar and Kumar, Nishant and Taylor, Matthew E},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2021},
month_numeric = {5}
}

• 2021 Calarina Muslimani, Kerrick Johnstonbaugh, and Matthew E. Taylor. May 2021. “Work-in-Progress: Comparing Feedback Distributions in Limited Teacher-Student Settings.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2021ala-muslimani,
author = {Muslimani, Calarina and Johnstonbaugh, Kerrick and Taylor, Matthew E.},
title = {Work-in-progress: Comparing Feedback Distributions in Limited Teacher-Student Settings},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2021},
month_numeric = {5}
}

• 2021 Yunshu Du, Garrett Warnell, Assefaw Gebremedhin, Peter Stone, and Matthew E. Taylor. May 2021. “Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy.” Neural Computing and Applications, May. DOI: 10.1007/s00521-021-06104-5.

##### Abstract

Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent’s current policy. LiDER consists of three steps: First, LiDER moves an agent back to a past state. Second, from that state, LiDER then lets the agent execute a sequence of actions by following its current policy, as if the agent were “dreaming” about the past and can try out different behaviors to encounter new experiences in the dream. Third, LiDER stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present in this work a case study of applying LiDER to an actor–critic-based algorithm. Results show LiDER consistently improves performance over the baseline in six Atari 2600 games. Our open-source implementation of LiDER and the data used to generate all plots in this work are available at https://github.com/duyunshu/lucid-dreaming-for-exp-replay.

##### BibTex Citation
@article{DuLiDER2021,
author = {Du, Yunshu and Warnell, Garrett and Gebremedhin, Assefaw and Stone, Peter and Taylor, Matthew E.},
title = {Lucid dreaming for experience replay: refreshing past states with the current policy},
journal = {Neural Computing and Applications},
year = {2021},
month = may,
day = {25},
issn = {1433-3058},
doi = {10.1007/s00521-021-06104-5},
url = {https://doi.org/10.1007/s00521-021-06104-5},
month_numeric = {5}
}


## 2020

• 2020 Yang Hu, Diane J. Cook, and Matthew E. Taylor. 2020. “Study of Effectiveness of Prior Knowledge for Smart Home Kit Installation.” Sensors 20 (21): 6145. https://www.mdpi.com/1424-8220/20/21/6145.

##### BibTex Citation
@article{2020Sensors,
author = {Hu, Yang and Cook, Diane J. and Taylor, Matthew E.},
title = {Study of Effectiveness of Prior Knowledge for Smart Home Kit Installation},
journal = {Sensors},
year = {2020},
volume = {20},
number = {21},
pages = {6145},
url = {https://www.mdpi.com/1424-8220/20/21/6145}
}

• 2020 Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, and Peter Stone. 2020. “Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey.” Journal of Machine Learning Research 21 (181): 1–50. http://jmlr.org/papers/v21/20-212.html.

##### BibTex Citation
@article{2020JMLR,
author = {Narvekar, Sanmit and Peng, Bei and Leonetti, Matteo and Sinapov, Jivko and Taylor, Matthew E. and Stone, Peter},
title = {Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey},
journal = {Journal of Machine Learning Research},
year = {2020},
volume = {21},
number = {181},
pages = {1--50},
url = {http://jmlr.org/papers/v21/20-212.html}
}

• 2020 Yang Huang, Rachel Min Wong, Olusola Adesope, and Matthew E. Taylor. 2020. “Effects of a Computer-Based Learning Environment That Teaches Older Adults How to Install a Smart Home System.” Computers and Education 149. DOI: 10.1016/j.compedu.2020.103816.

##### Abstract

In this study, we examined the teaching effectiveness of three strategies (trial-and-error, textbook, and combination) via a computer-based learning environment (CBLE) that teaches smart-home installation (SHiB). One hundred and twenty-five participants were randomly assigned to one of the strategies and tested with SHiB CBLE. Findings revealed that participants in the combination condition performed significantly better than those in the textbook (control) group with medium effect-size (g=0.70). Senior participants in the trial-and-error group performed significantly better than those in the control condition with large effect-size (g=0.89). Younger participants in the combination condition performed significantly better than those in the control condition with medium effect-size (g=0.70). Results suggest that the teaching strategies had differential effects due to age groups.

##### BibTex Citation
@article{2020yang,
title = {Effects of a Computer-Based Learning Environment That Teaches Older Adults How to Install a Smart Home System},
author = {Huang, Yang and Wong, Rachel Min and Adesope, Olusola and Taylor, Matthew E.},
journal = {Computers and Education},
volume = {149},
year = {2020},
doi = {10.1016/j.compedu.2020.103816}
}

• 2020 Matthew E. Taylor, Yang Yu, Edith Elkind, and Yang Gao, eds. 2020. Distributed Artificial Intelligence - Second International Conference, DAI 2020, Nanjing, China, October 24-27, 2020, Proceedings. Vol. 12547. Lecture Notes in Computer Science. Springer. DOI: 10.1007/978-3-030-64096-5.

##### BibTex Citation
@book{DBLP:conf/dai2/2020,
editor = {Taylor, Matthew E. and Yu, Yang and Elkind, Edith and Gao, Yang},
title = {Distributed Artificial Intelligence - Second International Conference,
{DAI} 2020, Nanjing, China, October 24-27, 2020, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {12547},
publisher = {Springer},
year = {2020},
url = {https://doi.org/10.1007/978-3-030-64096-5},
doi = {10.1007/978-3-030-64096-5},
isbn = {978-3-030-64095-8}
}

• 2020 Felipe Leno Da Silva, Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. January 2020. “Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents.” In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). 21% acceptance rate

##### BibTex Citation
@inproceedings{da2020uncertainty,
title = {Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents},
author = {Da Silva, Felipe Leno and Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
booktitle = {Thirty-Fourth AAAI Conference on Artificial Intelligence {(AAAI-20)}},
month = jan,
year = {2020},
month_numeric = {1}
}

• 2020 Behzad Ghazanfari, Fatemeh Afghah, and Matthew E. Taylor. January 2020. “Sequential Association Rule Mining for Autonomously Extracting Hierarchical Task Structures in Reinforcement Learning.” IEEE Access, January. DOI: 10.1109/ACCESS.2020.2965930.

##### BibTex Citation
@article{2020behzad,
title = {Sequential Association Rule Mining for Autonomously Extracting Hierarchical Task Structures in Reinforcement Learning},
author = {Ghazanfari, Behzad and Afghah, Fatemeh and Taylor, Matthew E.},
journal = {IEEE Access},
year = {2020},
month = jan,
doi = {10.1109/ACCESS.2020.2965930},
month_numeric = {1}
}

• 2020 Paniz Behboudian, Yash Satsangi, Matthew E. Taylor, Anna Harutyunyan, and Michael Bowling. May 2020. “Useful Policy Invariant Shaping from Arbitrary Advice.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2020ala-paniz,
author = {Behboudian, Paniz and Satsangi, Yash and Taylor, Matthew E. and Harutyunyan, Anna and Bowling, Michael},
title = {Useful Policy Invariant Shaping from Arbitrary Advice},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2020},
month_numeric = {5}
}

• 2020 Yunshu Du, Garrett Warnell, Assefaw Gebremedhin, Peter Stone, and Matthew E. Taylor. May 2020. “Work-in-Progress: Corrected Self Imitation Learning via Demonstrations.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2020ala-yunshu,
author = {Du, Yunshu and Warnell, Garrett and Gebremedhin, Assefaw and Stone, Peter and Taylor, Matthew E.},
title = {Work-in-progress: Corrected Self Imitation Learning via Demonstrations},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2020},
month_numeric = {5}
}

• 2020 Craig Sherstan, Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. May 2020. “Work in Progress: Temporally Extended Auxiliary Tasks.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS. Work done while at Borealis AI

##### BibTex Citation
@misc{2020ala-craig,
author = {Sherstan, Craig and Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E.},
title = {Work in Progress: Temporally Extended Auxiliary Tasks},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2020},
month_numeric = {5}
}

• 2020 Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. May 2020. “A Very Condensed Survey and Critique of Multiagent Deep Reinforcement Learning.” In Proc. of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020). Auckland, New Zealand.

##### BibTex Citation
@inproceedings{hern2020multiagent,
title = {{A Very Condensed Survey and Critique of Multiagent Deep Reinforcement Learning}},
author = {Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
year = {2020},
month = may,
address = {Auckland, New Zealand},
booktitle = {Proc. of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020)},
month_numeric = {5}
}

• 2020 Sriram Ganapathi Subramanian, Pascal Poupart, Matthew E. Taylor, and Nidhi Hegde. May 2020. “Multi Type Mean Field Reinforcement Learning.” In Proceedings of the 19th International Conference on Autonomous Agents and Multi-Agent Systems. 23% acceptance rate

##### BibTex Citation
@inproceedings{aamas20-sriram,
title = {Multi Type Mean Field Reinforcement Learning},
author = {Subramanian, Sriram Ganapathi and Poupart, Pascal and Taylor, Matthew E. and Hegde, Nidhi},
booktitle = {Proceedings of the 19th International Conference on Autonomous Agents and Multi-Agent Systems},
month = may,
year = {2020},
month_numeric = {5}
}

• 2020 Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, and Sarath Chandar. December 2020. “Maximum Reward Formulation In Reinforcement Learning.” Proceedings of the 6th Annual NeurIPS Deep Learning Workshop.

##### BibTex Citation
@misc{gottipati2020maximum,
title = {Maximum Reward Formulation In Reinforcement Learning},
author = {Gottipati, Sai Krishna and Pathak, Yashaswi and Nuttall, Rohan and Sahir and Chunduru, Raviteja and Touati, Ahmed and Subramanian, Sriram Ganapathi and Taylor, Matthew E. and Chandar, Sarath},
booktitle = {Proceedings of the 6th Annual {NeurIPS} Deep Learning Workshop},
month = dec,
year = {2020},
month_numeric = {12}
}


## 2019

• 2019 Nathan Douglas, Dianna Yim, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor, and Frank Maurer. 2019. “Towers of Saliency: A Reinforcement Learning Visualization Using Immersive Environments.” In Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces. ISS ’19. New York, NY, USA: ACM. DOI: 10.1145/3343055.3360747.

##### BibTex Citation
@inproceedings{douglas2019,
author = {Douglas, Nathan and Yim, Dianna and Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E. and Maurer, Frank},
title = {Towers of Saliency: A Reinforcement Learning Visualization Using Immersive Environments},
booktitle = {Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces},
series = {ISS '19},
year = {2019},
isbn = {978-1-4503-6891-9},
location = {Daejeon, Republic of Korea},
numpages = {4},
url = {http://doi.acm.org/10.1145/3343055.3360747},
doi = {10.1145/3343055.3360747},
publisher = {ACM},
address = {New York, NY, USA}
}

• 2019 Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. 2019. “Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition.” In The Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

##### BibTex Citation
@inproceedings{2019rldm-chao,
author = {Gao, Chao and Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
title = {Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition},
booktitle = {The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ({RLDM})},
year = {2019}
}

• 2019 Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. 2019. “Opponent Modeling with Actor-Critic Methods in Deep Reinforcement Learning.” In The Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

##### BibTex Citation
@inproceedings{2019rldm-pablo,
author = {Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
title = {Opponent Modeling with Actor-Critic Methods in Deep Reinforcement Learning},
booktitle = {The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ({RLDM})},
year = {2019}
}

• 2019 Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. 2019. “Predicting When to Expect Terminal States Improves Deep RL.” In The Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

##### BibTex Citation
@inproceedings{2019rldm-bilal,
author = {Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E.},
title = {Predicting When to Expect Terminal States Improves Deep RL},
booktitle = {The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ({RLDM})},
year = {2019}
}

• 2019 Sriram Ganapathi Subramanian, Pascal Poupart, Matthew E. Taylor, and Nidhi Hegde. 2019. “Multi Type Mean Field Reinforcement Learning.” In The Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

##### BibTex Citation
@inproceedings{2019rldm-sriram,
author = {Subramanian, Sriram Ganapathi and Poupart, Pascal and Taylor, Matthew E. and Hegde, Nidhi},
title = {Multi Type Mean Field Reinforcement Learning},
booktitle = {The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ({RLDM})},
year = {2019}
}

• 2019 Zhaodong Wang and Matthew E. Taylor. 2019. “Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human and Agent Demonstrations.” In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 18% acceptance rate

##### Abstract

Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper focuses on a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper leverages demonstrations, allowing an agent to quickly achieve high performance.

This paper introduces the Dynamic Reuse of Prior (DRoP) algorithm, which combines the offline knowledge (demonstrations recorded before learning) with an online confidence-based performance analysis. DRoP leverages the demonstrator’s knowledge by automatically balancing between reusing the prior knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-the-art learning algorithms and empirically show that DRoP can achieve superior performance in two domains. Additionally, we show that this confidence measure can be used to selectively request additional demonstrations, significantly improving the learning performance of the agent.

##### BibTex Citation
@inproceedings{2019ijcai-wang,
author = {Wang, Zhaodong and Taylor, Matthew E.},
title = {Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human and Agent Demonstrations},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI)},
year = {2019}
}

• 2019 Kenny Young, Baoxiang Wang, and Matthew E. Taylor. 2019. “Metatrace: Online Step-Size Tuning by Meta-Gradient Descent for Reinforcement Learning Control.” In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 18% acceptance rate

##### Abstract

Reinforcement learning (RL) has had many successes, but significant hyperparameter tuning is commonly required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this - most notably experience replay or the use of parallel actors. These techniques stabilize learning by making the RL problem more similar to the supervised setting. However, they come at the cost of moving away from the RL problem as it is typically formulated, that is, a single agent learning online without maintaining a large database of training examples.

To address these issues, we propose Metatrace, a meta-gradient descent based algorithm to tune the step-size online. Metatrace leverages the structure of eligibility traces, and works for both tuning a scalar step-size and a respective step-size for each parameter. We empirically evaluate Metatrace for actor-critic on the Arcade Learning Environment. Results show Metatrace can speed up learning, and improve performance in non-stationary settings.

##### BibTex Citation
@inproceedings{young2019metatrace,
title = {Metatrace: Online Step-Size Tuning by Meta-Gradient Descent for Reinforcement Learning Control},
author = {Young, Kenny and Wang, Baoxiang and Taylor, Matthew E.},
booktitle = {Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI)},
year = {2019}
}

• 2019 Bikramjit Banerjee, Syamala Vittanala, and Matthew E. Taylor. 2019. “Team Learning from Human Demonstration with Coordination Confidence.” The Knowledge Engineering Review 34.

##### Abstract

Among an array of techniques proposed to speed-up reinforcement learning (RL), learning from human demonstration has a proven record of success. A related technique, called Human-Agent Transfer, and its confidence-based derivatives have been successfully applied to single-agent RL. This article investigates their application to collaborative multi-agent RL problems. We show that a first-cut extension may leave room for improvement in some domains, and propose a new algorithm called coordination confidence (CC). CC analyzes the difference in perspectives between a human demonstrator (global view) and the learning agents (local view) and informs the agents’ action choices when the difference is critical and simply following the human demonstration can lead to miscoordination. We conduct experiments in three domains to investigate the performance of CC in comparison with relevant baselines.

##### BibTex Citation
@article{2019ker,
title = {Team learning from human demonstration with coordination confidence},
author = {Banerjee, Bikramjit and Vittanala, Syamala and Taylor, Matthew E.},
journal = {The Knowledge Engineering Review},
year = {2019},
volume = {34}
}

• 2019 Sarah Morton, Julie Kmec, and Matthew E. Taylor. 2019. “It’s What You Call It: Gendered Framing and Women’s and Men’s Interest in a Robotics Instruction Task.” International Journal of Gender, Science and Technology 11 (2). Accepted

##### BibTex Citation
@article{2019sociology,
title = {It’s What You Call It: Gendered Framing and Women’s and Men’s Interest in a Robotics Instruction Task},
author = {Morton, Sarah and Kmec, Julie and Taylor, Matthew E.},
journal = {International Journal of Gender, Science and Technology},
volume = {11},
number = {2},
year = {2019}
}

• 2019 Gabriel V. de la Cruz Jr., Yunshu Du, and Matthew E. Taylor. 2019. “Pre-Training with Non-Expert Human Demonstration for Deep Reinforcement Learning.” The Knowledge Engineering Review 34. DOI: 10.1017/S0269888919000055.

##### Abstract

Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators to learn directly from raw input images. However, learning directly from raw images is data inefficient. The agent must learn feature representation of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning speeds and often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable to real-world settings where data are expensive. In this work, we improve data efficiency in deep RL by addressing one of the two learning goals, feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations and empirically evaluate our approach using the asynchronous advantage actor-critic algorithms in the Atari domain. Our results show significant improvements in learning speed, even when the provided demonstration is noisy and of low quality.

##### BibTex Citation
@article{gabe_du_taylor_2019,
title = {Pre-training with non-expert human demonstration for deep reinforcement learning},
volume = {34},
doi = {10.1017/S0269888919000055},
journal = {The Knowledge Engineering Review},
publisher = {Cambridge University Press},
author = {de la Cruz Jr., Gabriel V. and Du, Yunshu and Taylor, Matthew E.},
year = {2019}
}

• 2019 Santosh Bhusal, Kapil Khanal, Shivam Goel, Manoj Karkee, and Matthew E. Taylor. 2019. “Bird Deterrence in a Vineyard Using an Unmanned Aerial System (UAS).” Transactions of the ASABE 62 (2): 561–69. DOI: 10.13031/trans.12923.

##### Abstract

Washington State growers lose more than \$80 million annually to bird damage in fruit crops such as cherries, grapes, Honeycrisp apples, and blueberries. Conventional bird deterrence techniques, such as netting, auditory devices, visual devices, chemical application, falconry, and shooting, are either costly, ineffective, or harmful to birds. At the same time, unmanned aerial systems (UAS) have become popular in military, civilian, and agricultural applications due to decreasing cost, good maneuverability, and their ability to perform multiple types of missions. This article presents an approach using UAS to deter birds and minimize their damage to wine grapes. A quadcopter UAS was flown for three days in September 2016 over a section (30 m x 30 m) of a vineyard to deter birds. The test section of the vineyard was next to a canyon with many trees that provided shelter for a large number of birds. The experimental design included different deterrence methods against birds, including auditory deterrence, visual deterrence, and varying UAS flight patterns. The test section of the vineyard was under continuous video surveillance from 7:00 to 9:00 a.m. using four GoPro cameras for five continuous days, including three days when the UAS was flown. A Gaussian mixture model-based motion detection algorithm was used to detect birds in the videos, a Kalman filter was then used for tracking the detected birds, and bird activities (incoming and outgoing birds) were counted based on the movement of birds across the plot boundary. Two accuracy measures (precision and recall) were calculated to analyze the performance of the automated bird detection and counting system. The results showed that the proposed system achieved a precision of 84% and recall of 87% in counting incoming and outgoing birds. The automated bird counting system was then used to evaluate the performance of the UAS-based bird deterrence system. 
The results showed that bird activity was more than 300% higher on days with no UAS flights compared to days when the UAS was flown with on-board bird deterrence measures. UAS flights covering the entire experimental plot with auditory deterrence had a better effect than flights with visual deterrence. The results showed the potential for developing an automated bird deterrence system for vineyards and other crops. Extended studies with multi-year, multi-field, and multi-platform experiments are essential to further validate the results.

##### BibTex Citation
@article{2019asabe,
author = {Bhusal, Santosh and Khanal, Kapil and Goel, Shivam and Karkee, Manoj and Taylor, Matthew E.},
title = {Bird Deterrence in a Vineyard using an Unmanned Aerial System {(UAS)}},
journal = {Transactions of the ASABE},
year = {2019},
volume = {62},
number = {2},
pages = {561--569},
doi = {10.13031/trans.12923}
}

• 2019 Yunshu Du, Assefaw Gebremedhin, and Matthew E. Taylor. 2019. “Analysis of University Fitness Center Data Uncovers Interesting Patterns, Enables Prediction.” IEEE Transactions on Knowledge and Data Engineering 31 (8): 1478–90. DOI: 10.1109/TKDE.2018.2863705.

##### Abstract

Data is increasingly being used to make everyday life easier and better. Applications such as waiting time estimation, traffic prediction, and parking search are good examples of how data from different sources can be used to facilitate our daily life. In this study, we consider an under-utilized data source: university ID cards. Such cards are used on many campuses to purchase food, allow access to different areas, and even take attendance in classes. In this article, we use data from our university to analyze usage of the university fitness center and build a predictor for future visit volume. The work makes several contributions: it demonstrates the richness of the data source, shows how the data can be leveraged to improve student services, discovers interesting trends and behavior, and serves as a case study illustrating the entire data science process.

##### BibTex Citation
@article{2019kde,
author = {Du, Yunshu and Gebremedhin, Assefaw and Taylor, Matthew E.},
title = {Analysis of University Fitness Center Data Uncovers Interesting Patterns, Enables Prediction},
journal = {IEEE Transactions on Knowledge and Data Engineering},
year = {2019},
volume = {31},
number = {8},
pages = {1478--1490},
doi = {10.1109/TKDE.2018.2863705}
}

• 2019 Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. January 2019. “Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL.” Proceedings of the AAAI Workshop on Reinforcement Learning in Games.

##### Abstract

Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, there are still several challenges to be addressed such as convergence to locally optimal policies and long training times. In this paper, firstly, we augment Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, i.e. Terminal Prediction, measuring temporal closeness to terminal states, namely A3C-TP. Secondly, we propose a new framework where planning algorithms such as Monte Carlo tree search or other sources of (simulated) demonstrators can be integrated to asynchronous distributed DRL methods. Compared to vanilla A3C, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.

##### BibTex Citation
@misc{2019aaaiws,
title = {Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL},
author = {Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E.},
booktitle = {Proceedings of the AAAI Workshop on Reinforcement Learning in Games},
month = jan,
year = {2019},
month_numeric = {1}
}

• 2019 Anestis Fachantidis, Matthew Taylor, and I. Vlahavas. March 2019. “Learning to Teach Reinforcement Learning Agents.” Machine Learning and Knowledge Extraction 1 (March). DOI: 10.3390/make1010002.

##### BibTex Citation
@article{2017Fachantidis-Taylor-Vlahavas,
author = {Fachantidis, Anestis and Taylor, Matthew and Vlahavas, I.},
year = {2019},
month = mar,
title = {Learning to Teach Reinforcement Learning Agents},
volume = {1},
journal = {Machine Learning and Knowledge Extraction},
doi = {10.3390/make1010002},
url = {https://www.researchgate.net/publication/318785192_Learning_to_Teach_Reinforcement_Learning_Agents},
month_numeric = {3}
}

• 2019 Nadia V. Panossian, Dustin McLarty, and Matthew E. Taylor. April 2019. “Artificial Neural Network for Unit Commitment on Networks with Significant Energy Storage.” In Proceedings of the IEEE Green Technologies Conference (GREENTECH).

##### Abstract

Adding energy storage capacities equivalent to several hours of demand can lower operating costs of independent power providers and reduce reliance on the regional utility network. Energy storage compounds the mixed-integer unit commitment problem by coupling the scheduling decisions within the dispatch horizon, making otherwise straightforward problems highly computationally intensive. A sufficiently trained artificial neural network (ANN) can altogether void the mixed-integer unit-commitment problem and facilitate dispatch receding horizon control in applications with significant energy storage assets. Neural networks with varying layers are trained to replicate the optimal binary unit commitment status of the micro-grid generators as determined by a non-linear mixed integer solver. The combined approach proposed uses an ANN to determine the unit-commitment solution and quadratic programming to determine the active generation and storage set points. The accuracy in replicating the mixed-integer unit commitment exhibits depreciating improvement with additional network layers. The single-layer ANN reduces dispatch time by a factor of 40 while replicating unit commitment with 95.3% accuracy. A two-layer ANN reduces dispatch time by a factor of 20 with 97.3% accuracy.

##### BibTex Citation
@inproceedings{2019greentech,
author = {Panossian, Nadia V. and McLarty, Dustin and Taylor, Matthew E.},
title = {Artificial Neural Network for Unit Commitment on Networks with Significant Energy Storage},
booktitle = {{Proceedings of the IEEE Green Technologies Conference (GREENTECH)}},
month = apr,
year = {2019},
month_numeric = {4}
}

• 2019 Gabriel de la Cruz Jr., Yunshu Du, and Matthew E. Taylor. May 2019. “Jointly Pre-Training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2019ala-gabe,
author = {de la Cruz, Jr., Gabriel and Du, Yunshu and Taylor, Matthew E.},
title = {Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2019},
month_numeric = {5}
}

• 2019 Bilal Kartal, Pablo Hernandez-Leal, Chao Gao, and Matthew E. Taylor. May 2019. “Safer Deep RL with Shallow MCTS: A Case Study in Pommerman.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2019ala-bilal,
author = {Kartal, Bilal and Hernandez-Leal, Pablo and Gao, Chao and Taylor, Matthew E.},
title = {Safer Deep RL with Shallow MCTS: A Case Study in Pommerman},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
month = may,
year = {2019},
month_numeric = {5}
}

• 2019 Garrett Wilson, Christopher Pereyda, Nisha Raghunath, Gabriel de la Cruz Jr., Shivam Goel, Sepehr Nesaei, Bryan Minor, Maureen Schmitter-Edgecombe, Matthew E. Taylor, and Diane J. Cook. May 2019. “Robot-Enabled Support of Daily Activities in Smart Home Environments.” Cognitive Systems Research 54 (May): 258–72.

##### Abstract

Smart environments offer valuable technologies for activity monitoring and health assessment. Here, we describe an integration of robots into smart environments to provide more interactive support of individuals with functional limitations. RAS, our Robot Activity Support system, partners smart environment sensing, object detection and mapping, and robot interaction to detect and assist with activity errors that may occur in everyday settings. We describe the components of the RAS system and demonstrate its use in a smart home testbed. To evaluate the usability of RAS, we also collected and analyzed feedback from participants who received assistance from RAS in a smart home setting as they performed routine activities.

##### BibTex Citation
@article{2019cogsystems,
author = {Wilson, Garrett and Pereyda, Christopher and Raghunath, Nisha and de la Cruz, Jr., Gabriel and Goel, Shivam and Nesaei, Sepehr and Minor, Bryan and Schmitter-Edgecombe, Maureen and Taylor, Matthew E. and Cook, Diane J.},
title = {Robot-enabled support of daily activities in smart home environments},
journal = {Cognitive Systems Research},
year = {2019},
month = may,
volume = {54},
pages = {258--272},
month_numeric = {5}
}

• 2019 Noa Agmon, Matthew E. Taylor, Edith Elkind, and Manuela Veloso, eds. May 2019. AAMAS ’19: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems. http://www.ifaamas.org/Proceedings/aamas2019/forms/index.htm.

##### BibTex Citation
@book{10.5555/3306127,
title = {{AAMAS} '19: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems},
year = {2019},
month = may,
isbn = {9781450363099},
issn = {2523-5699},
editor = {Agmon, Noa and Taylor, Matthew E. and Elkind, Edith and Veloso, Manuela},
publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
url = {http://www.ifaamas.org/Proceedings/aamas2019/forms/index.htm},
location = {Montreal QC, Canada},
month_numeric = {5}
}

• 2019 Weixun Wang, Jianye Hao, Yixi Wang, and Matthew E. Taylor. October 2019. “Achieving Cooperation Through Deep Multiagent Reinforcement Learning in Sequential Prisoner’s Dilemmas.” In Proceedings of the 1st International Conference on Distributed Artificial Intelligence (DAI-19). 35% acceptance rate, Best paper award

##### BibTex Citation
@inproceedings{2019dai,
author = {Wang, Weixun and Hao, Jianye and Wang, Yixi and Taylor, Matthew E.},
title = {Achieving Cooperation Through Deep Multiagent Reinforcement Learning in Sequential Prisoner's Dilemmas},
booktitle = {Proceedings of the 1st International Conference on Distributed Artificial Intelligence (DAI-19)},
year = {2019},
month = oct,
month_numeric = {10}
}

• 2019 Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. October 2019. “Agent Modeling as Auxiliary Task for Deep Reinforcement Learning.” In Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’19). 25% acceptance rate for oral presentation

##### BibTex Citation
@inproceedings{2019aiide-opponentmodeling,
author = {Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
title = {Agent Modeling as Auxiliary Task for Deep Reinforcement Learning},
booktitle = {Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19)},
year = {2019},
month = oct,
month_numeric = {10}
}

• 2019 Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. October 2019. “Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning.” In Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’19). 25% acceptance rate for oral presentation

##### BibTex Citation
@inproceedings{2019aiide-terminalprediction,
author = {Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E.},
title = {Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning},
booktitle = {Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19)},
year = {2019},
month = oct,
month_numeric = {10}
}

• 2019 Chao Gao, Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. October 2019. “On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman.” In Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’19). 25% acceptance rate for oral presentation

##### BibTex Citation
@inproceedings{2019aiide-pommerman,
author = {Gao, Chao and Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E.},
title = {On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman},
booktitle = {Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19)},
year = {2019},
month = oct,
month_numeric = {10}
}

• 2019 Bilal Kartal, Pablo Hernandez-Leal, and Matthew E. Taylor. October 2019. “Action Guidance with MCTS for Deep Reinforcement Learning.” In Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’19). Poster: 25% acceptance rate for oral presentation, additional 25% acceptance rate for posters

##### BibTex Citation
@inproceedings{2019aiide-mcts,
author = {Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E.},
title = {Action Guidance with MCTS for Deep Reinforcement Learning},
booktitle = {Proceedings of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19)},
year = {2019},
month = oct,
month_numeric = {10}
}

• 2019 Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. October 2019. “A Survey and Critique of Multiagent Deep Reinforcement Learning.” Journal of Autonomous Agents and Multiagent Systems 33 (6): 750–97. DOI: 10.1007/s10458-019-09421-1.

##### BibTex Citation
@article{2019jaamas,
title = {A Survey and Critique of Multiagent Deep Reinforcement Learning},
journal = {Journal of Autonomous Agents and Multiagent Systems},
author = {Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
year = {2019},
month = oct,
volume = {33},
number = {6},
pages = {750--797},
doi = {10.1007/s10458-019-09421-1},
month_numeric = {10}
}

• 2019 Nathan Douglas, Dianna Yim, Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor, and Frank Maurer. November 2019. “Towers of Saliency: A Reinforcement Learning Visualization Using Immersive Environments.” In Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces. ISS ’19. New York, NY, USA: ACM.

##### BibTex Citation
@inproceedings{douglas2020,
author = {Douglas, Nathan and Yim, Dianna and Kartal, Bilal and Hernandez-Leal, Pablo and Taylor, Matthew E. and Maurer, Frank},
title = {Towers of Saliency: A Reinforcement Learning Visualization Using Immersive Environments},
booktitle = {Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces},
series = {ISS '19},
year = {2019},
month = nov,
isbn = {978-1-4503-6891-9},
location = {Daejeon, Republic of Korea},
numpages = {4},
publisher = {ACM},
address = {New York, NY, USA},
month_numeric = {11}
}

• 2019 Felipe Leno da Silva, Pablo Hernandez-Leal, Bilal Kartal, and Matthew E. Taylor. December 2019. “Receiving Uncertainty-Aware Advice in Deep Reinforcement Learning.” Proceedings of the 5th Annual NeurIPS Deep Learning Workshop.

##### BibTex Citation
@misc{2019neuripsws,
author = {da Silva, Felipe Leno and Hernandez-Leal, Pablo and Kartal, Bilal and Taylor, Matthew E.},
title = {Receiving Uncertainty-Aware Advice in Deep Reinforcement Learning},
booktitle = {Proceedings of the 5th Annual {NeurIPS} Deep Learning Workshop},
month = dec,
year = {2019},
month_numeric = {12}
}


## 2018

• 2018 Weixun Wang, Jianye Hao, Yixi Wang, and Matthew E. Taylor. 2018. “Achieving Cooperation Through Deep Multiagent Reinforcement Learning in Sequential Prisoner’s Dilemmas.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2018ala-jianye,
author = {Wang, Weixun and Hao, Jianye and Wang, Yixi and Taylor, Matthew E.},
title = {Achieving Cooperation Through Deep Multiagent Reinforcement Learning in Sequential Prisoner's Dilemmas},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
year = {2018}
}

• 2018 Bikramjit Banerjee, and Matthew E. Taylor. 2018. “Coordination Confidence Based Human-Multi-Agent Transfer Learning for Collaborative Teams.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS. Best paper award nominee

##### BibTex Citation
@misc{2018ala-bikram,
author = {Banerjee, Bikramjit and Taylor, Matthew E.},
title = {Coordination Confidence based Human-Multi-Agent Transfer Learning for Collaborative Teams},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
year = {2018}
}

• 2018 Santosh Bhusal, Kapil Khanal, Manoj Karkee, Karen Steensma, and Matthew E. Taylor. 2018. “Bird Detection, Tracking and Counting in Wine Grapes.” In 2018 ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers.

##### BibTex Citation
@inproceedings{2018ASABE-Santosh,
author = {Bhusal, Santosh and Khanal, Kapil and Karkee, Manoj and Steensma, Karen and Taylor, Matthew E.},
title = {Bird Detection, Tracking and Counting in Wine Grapes},
booktitle = {{2018 {ASABE} Annual International Meeting}},
year = {2018},
organization = {American Society of Agricultural and Biological Engineers}
}

• 2018 Matthew E. Taylor. 2018. “Improving Reinforcement Learning with Human Input.” In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI).

##### Abstract

Reinforcement learning (RL) has had many successes when learning autonomously. This paper and accompanying talk consider how to make use of a non-technical human participant, when available. In particular, we consider the case where a human could 1) provide demonstrations of good behavior, 2) provide online evaluative feedback, or 3) define a curriculum of tasks for the agent to learn on. In all cases, our work has shown such information can be effectively leveraged. After giving a high-level overview of this work, we will highlight a set of open questions and suggest where future work could be usefully focused.

##### BibTex Citation
@inproceedings{taylorijcai18,
title = {Improving Reinforcement Learning with Human Input},
author = {Taylor, Matthew E.},
booktitle = {Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI)},
year = {2018}
}

• 2018 Yunxiang Ye, Long He, Zhaodong Wang, Dylan Jones, Geoffrey A. Hollinger, Matthew E. Taylor, and Qin Zhang. 2018. “Orchard Manoeuvring Strategy for a Robotic Bin-Handling Machine.” Biosystems Engineering 169: 85–103. DOI: 10.1016/j.biosystemseng.2017.12.005.

##### Abstract

Unlike a car-like vehicle manoeuvring its way in an open field, a four-wheel-independent-steered robotic machine placed in an orchard must operate in a very confined working space between tree rows. Because the machine is subject to the unique constraints of the worksite space and operation limits, multiple steering modes are often required to effectively accomplish the desired bin-handling manoeuvres. In this study, we created a multi-mode manoeuvring strategy selection method to generate strategies that can guide the robotic platform to accomplish bin-handling tasks, such as correcting pose error between tree rows, entering a tree lane from the headland, and loading a bin between tree rows, effectively. The method determines the manoeuvring strategies based on the situation among four steering modes: 1) Ackermann steering, 2) coordinated four wheel steering, 3) crab steering, and 4) spinning. The study first evaluated applicable strategies and selected the best of these strategies for different bin-handling scenarios. Then, the selected strategies were implemented to drive a four-wheel-independent-steering (4WIS) system to complete the tasks in a commercial orchard in order to validate the method. Obtained results showed that the system could navigate the platform on desired trajectories to complete bin-handling tasks with root mean square errors less than 0.06 m.

##### BibTex Citation
@article{YE201885,
title = {Orchard manoeuvring strategy for a robotic bin-handling machine},
journal = {Biosystems Engineering},
volume = {169},
pages = {85--103},
year = {2018},
issn = {1537-5110},
doi = {10.1016/j.biosystemseng.2017.12.005},
url = {https://www.sciencedirect.com/science/article/pii/S1537511016308571},
author = {Ye, Yunxiang and He, Long and Wang, Zhaodong and Jones, Dylan and Hollinger, Geoffrey A. and Taylor, Matthew E. and Zhang, Qin},
keywords = {Automatic navigation, Four-wheel-independent-steering system, Bin management, Tree fruit}
}

• 2018 Su Zhang, and Matthew E. Taylor. 2018. “Work-In-Progress: Enhanced Learning from Multiple Demonstrations with a Two-Level Structured Approach.”

##### Abstract

Learning from demonstrations (LfD) has been successfully leveraged to speed up reinforcement learning techniques. This paper investigates how to mitigate, or at least reduce, the impact of noisy or poor demonstrations. In particular, we consider how heterogeneous demonstrators may provide inconsistent and/or even conflicting information, and noise may occur in otherwise high-quality demonstrations, both of which can significantly affect the policy generalization and hurt learning performance. To address the above challenge, this paper develops a Flexible Two-level Structured Approach (FTSA) inspired by the existing Human Agent Transfer algorithm and the multi-armed contextual bandit problem. Our FTSA approach can mitigate the effects of bad demonstrations while retaining the benefits provided by good ones. Our preliminary experimental results in the Mario domain show that this approach holds promise and may be able to successfully leverage demonstrations of different quality.

##### BibTex Citation
@article{zhang2018work,
title = {Work-In-Progress: Enhanced Learning from Multiple Demonstrations with a Two-level Structured Approach},
author = {Zhang, Su and Taylor, Matthew E.},
year = {2018}
}

• 2018 Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. 2018. “Curriculum Design for Machine Learners in Sequential Decision Tasks.” IEEE Transactions on Emerging Topics in Computational Intelligence 2 (4): 268–77. DOI: 10.1109/TETCI.2018.2829980.

##### Abstract

Existing work in machine learning has shown that algorithms can benefit from the use of curricula: learning first on simple examples before moving to more difficult problems. This work studies the curriculum-design problem in the context of sequential decision tasks, analyzing how different curricula affect learning in a Sokoban-like domain, and presenting the results of a user study that explores whether nonexperts generate effective curricula. Our results show that 1) the way in which evaluative feedback is given to the agent as it learns individual tasks does not affect the relative quality of different curricula, 2) nonexpert users can successfully design curricula that result in better overall performance than having the agent learn from scratch, and 3) nonexpert users can discover and follow salient principles when selecting tasks in a curriculum. We also demonstrate that our curriculum-learning algorithm can be improved by incorporating the principles people use when designing curricula. This work gives us insights into the development of new machine-learning algorithms and interfaces that can better accommodate machine- or human-created curricula.

##### BibTex Citation
@article{2018curriculadesign,
author = {Peng, Bei and MacGlashan, James and Loftin, Robert and Littman, Michael L. and Roberts, David L. and Taylor, Matthew E.},
title = {Curriculum Design for Machine Learners in Sequential Decision Tasks},
journal = {IEEE Transactions on Emerging Topics in Computational Intelligence},
year = {2018},
volume = {2},
number = {4},
pages = {268--277},
doi = {10.1109/TETCI.2018.2829980}
}

• 2018 Ariel Rosenfeld, Moshe Cohen, Matthew E. Taylor, and Sarit Kraus. 2018. “Leveraging Human Knowledge in Tabular Reinforcement Learning: a Study of Human Subjects.” The Knowledge Engineering Review 33. DOI: 10.1017/S0269888918000206.

##### Abstract

Reinforcement learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named State Action Similarity Solutions (SASS) which is based on the notion of similarities in the agent’s state–action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, and provides a beneficial alternative to using only a single-speedup method with minimal human designer effort overhead.

##### BibTex Citation
@article{rosenfeld_cohen_taylor_kraus_2018,
title = {Leveraging human knowledge in tabular reinforcement learning: a study of human subjects},
volume = {33},
doi = {10.1017/S0269888918000206},
journal = {The Knowledge Engineering Review},
publisher = {Cambridge University Press},
author = {Rosenfeld, Ariel and Cohen, Moshe and Taylor, Matthew E. and Kraus, Sarit},
year = {2018}
}

• 2018 Gabriel V. de la Cruz Jr., Yunshu Du, and Matthew E. Taylor. July 2018. “Pre-Training Neural Networks with Human Demonstrations for Deep Reinforcement Learning.” In Poster of the Adaptive Learning Agents (ALA) Workshop (at FAIM). Stockholm, Sweden.

##### Abstract

Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using a deep neural network as its function approximator and by learning directly from raw images. A drawback of using raw images is that deep RL must learn the state feature representation from the raw images in addition to learning a policy. As a result, deep RL often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable in real-world settings, particularly when data is expensive. In this work, we speed up training by addressing half of what deep RL is trying to solve — feature learning. We show that using a small set of non-expert human demonstrations during a supervised pre-training stage allows significant improvements in training times. We empirically evaluate our approach using the deep Q-network and the asynchronous advantage actor-critic algorithms in the Atari 2600 games of Pong, Freeway, and Beamrider. Our results show that pre-training a deep RL network provides a significant improvement in training time, even when pre-training from a small number of noisy demonstrations.

##### BibTex Citation
@inproceedings{2018ALA-DelaCruz,
author = {de la Cruz, Jr., Gabriel V. and Du, Yunshu and Taylor, Matthew E.},
title = {{Pre-training Neural Networks with Human Demonstrations for Deep Reinforcement Learning}},
booktitle = {{Poster of the Adaptive Learning Agents ({ALA}) workshop (at {FAIM})}},
year = {2018},
address = {Stockholm, Sweden},
month = jul,
month_numeric = {7}
}

• 2018 Konstantin I. Matveev, John P. Swensen, and Matthew E. Taylor. July 2018. “Modeling of Decelerated Descent of an Elongated Body With Vectored Thrust.” In Proceedings of American Society of Mechanical Engineers’ 5th Joint US-European Fluids Engineering Division Summer Meeting.

##### Abstract

The subject of this study is a simplified model of an elongated body intended for controlled, low-speed landing after being released far above the ground. The envisioned system is structurally simple and compact. It comprises a cylindrical body with a vectored propulsor attached to its upper end. Far from the ground a low-magnitude thrust force directs the body toward the target site and maintains stable orientation, whereas near the ground higher thrust decelerates and directs the body to ensure low-speed landing near the target location. A 6-DOF dynamics model is applied for simulating the body descent. A strip approach is used for evaluating aerodynamic forces on the body. The thrust magnitude and direction are the controlled parameters. Results of simulations are presented for several scenarios of the body descending on the ground in calm air and in the presence of wind.

##### BibTex Citation
@inproceedings{2018asme,
author = {Matveev, Konstantin I. and Swensen, John P. and Taylor, Matthew E.},
title = {Modeling of Decelerated Descent of an Elongated Body With Vectored Thrust},
booktitle = {Proceedings of American Society of Mechanical Engineers' 5th Joint US-European Fluids Engineering Division Summer Meeting},
year = {2018},
month = jul,
month_numeric = {7}
}

• 2018 Felipe Leno da Silva, Matthew E. Taylor, and Anna Helena Reali Costa. July 2018. “Autonomously Reusing Knowledge in Multiagent Reinforcement Learning.” In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, 5487–93. International Joint Conferences on Artificial Intelligence Organization. DOI: 10.24963/ijcai.2018/774.

##### Abstract

Autonomous agents are increasingly required to solve complex tasks; hard-coding behaviors has become infeasible. Hence, agents must learn how to solve tasks via interactions with the environment. In many cases, knowledge reuse will be a core technology to keep training times reasonable, and for that, agents must be able to autonomously and consistently reuse knowledge from multiple sources, including both their own previous internal knowledge and from other agents. In this paper, we provide a literature review of methods for knowledge reuse in Multiagent Reinforcement Learning. We define an important challenge problem for the AI community, survey the existent methods, and discuss how they can all contribute to this challenging problem. Moreover, we highlight gaps in the current literature, motivating “low-hanging fruit” for those interested in the area. Our ambition is that this paper will encourage the community to work on this difficult and relevant research challenge.

##### BibTex Citation
@inproceedings{2018ijcaisurvey,
title = {Autonomously Reusing Knowledge in Multiagent Reinforcement Learning},
author = {da Silva, Felipe Leno and Taylor, Matthew E. and Costa, Anna Helena Reali},
booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, {IJCAI-18}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {5487--5493},
year = {2018},
month = jul,
doi = {10.24963/ijcai.2018/774},
url = {https://doi.org/10.24963/ijcai.2018/774},
month_numeric = {7}
}

• 2018 Alex Kearney, Anna Koop, Craig Sherstan, Johannes Gunther, Richard S. Sutton, Patrick M. Pilarski, and Matthew E. Taylor. October 2018. “Evaluating Predictive Knowledge.” Proceedings of the Fall AAAI Symposium on Reasoning and Learning in Real-World Systems for Long-Term Autonomy.

##### Abstract

Predictive Knowledge (PK) is a group of approaches to machine perception and knowledgability using large collections of predictions made online in real-time through interaction with the environment. Determining how well a collection of predictions captures the relevant dynamics of the environment remains an open challenge. In this paper, we introduce specifications for sensorimotor baselines and robustness-to-transfer metrics for evaluation of PK. We illustrate the use of these metrics by comparing variant architectures of General Value Function (GVF) networks.

##### BibTex Citation
@misc{2018fss,
author = {Kearney, Alex and Koop, Anna and Sherstan, Craig and Gunther, Johannes and Sutton, Richard S. and Pilarski, Patrick M. and Taylor, Matthew E.},
title = {Evaluating Predictive Knowledge},
booktitle = {Proceedings of the Fall AAAI Symposium on Reasoning and Learning in Real-World Systems for Long-Term Autonomy},
year = {2018},
month = oct,
month_numeric = {10}
}


## 2017

• 2017 Bei Peng, James MacGlashan, Robert Loftin, Michael Littman, David Roberts, and Matthew E. Taylor. 2017. “Curriculum Design for Machine Learners in Sequential Decision Tasks.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS.

##### BibTex Citation
@misc{2017ala-bei,
author = {Peng, Bei and MacGlashan, James and Loftin, Robert and Littman, Michael and Roberts, David and Taylor, Matthew E.},
title = {Curriculum Design for Machine Learners in Sequential Decision Tasks},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
year = {2017}
}

• 2017 Ariel Rosenfeld, Matthew E. Taylor, and Sarit Kraus. 2017. “Speeding up Tabular Reinforcement Learning Using State-Action Similarities.” Proceedings of the Adaptive and Learning Agents Workshop at AAMAS. Best paper award nominee

##### BibTex Citation
@misc{2017ala-ariel,
author = {Rosenfeld, Ariel and Taylor, Matthew E. and Kraus, Sarit},
title = {Speeding up Tabular Reinforcement Learning Using State-Action Similarities},
booktitle = {Proceedings of the Adaptive and Learning Agents Workshop at AAMAS},
year = {2017}
}

• 2017 Shivam Goel, Santosh Bhusal, Matthew E. Taylor, and Manoj Karkee. 2017. “Detection and Localization of Birds for Bird Deterrence Using UAS.” In 2017 ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers.

##### BibTex Citation
@inproceedings{2017asabe-goel,
author = {Goel, Shivam and Bhusal, Santosh and Taylor, Matthew E. and Karkee, Manoj},
title = {{Detection and localization of birds for Bird Deterrence using UAS}},
booktitle = {{2017 {ASABE} Annual International Meeting}},
year = {2017},
organization = {American Society of Agricultural and Biological Engineers}
}

• 2017 Santosh Bhusal, Shivam Goel, Kapil Khanal, Matthew E. Taylor, and Manoj Karkee. 2017. “Bird Detection, Tracking and Counting in Wine Grapes.” In 2017 ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers.

##### BibTex Citation
@inproceedings{2017asabe-santosh,
author = {Bhusal, Santosh and Goel, Shivam and Khanal, Kapil and Taylor, Matthew E. and Karkee, Manoj},
title = {{Bird Detection, Tracking and Counting in Wine Grapes}},
booktitle = {{2017 {ASABE} Annual International Meeting}},
year = {2017},
organization = {American Society of Agricultural and Biological Engineers}
}

• 2017 Yunxiang Ye, Zhaodong Wang, Dylan Jones, Long He, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. 2017. “Bin-Dog: A Robotic Platform for Bin Management in Orchards.” Robotics 6 (2). DOI: 10.3390/robotics6020012.

##### Abstract

Bin management during apple harvest season is an important activity for orchards. Typically, empty and full bins are handled by tractor-mounted forklifts or bin trailers in two separate trips. In order to simplify this work process and improve work efficiency of bin management, the concept of a robotic bin-dog system is proposed in this study. This system is designed with a “go-over-the-bin” feature, which allows it to drive over bins between tree rows and complete the above process in one trip. To validate this system concept, a prototype and its control and navigation system were designed and built. Field tests were conducted in a commercial orchard to validate its key functionalities in three tasks including headland turning, straight-line tracking between tree rows, and “go-over-the-bin.” Tests of the headland turning showed that bin-dog followed a predefined path to align with an alleyway with lateral and orientation errors of 0.02 m and 1.5°. Tests of straight-line tracking showed that bin-dog could successfully track the alleyway centerline at speeds up to 1.00 m·s⁻¹ with an RMSE offset of 0.07 m. The navigation system also successfully guided the bin-dog to complete the task of go-over-the-bin at a speed of 0.60 m·s⁻¹. The successful validation tests proved that the prototype can achieve all desired functionality.

##### BibTex Citation
@article{2017robotics,
author = {Ye, Yunxiang and Wang, Zhaodong and Jones, Dylan and He, Long and Taylor, Matthew E. and Hollinger, Geoffrey A. and Zhang, Qin},
title = {Bin-Dog: A Robotic Platform for Bin Management in Orchards},
journal = {Robotics},
volume = {6},
year = {2017},
number = {2},
url = {http://www.mdpi.com/2218-6581/6/2/12},
issn = {2218-6581},
doi = {10.3390/robotics6020012}
}

• 2017 Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. 2017. “Non-Convex Policy Search Using Variational Inequalities.” Neural Computation.

##### Abstract

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems, such as robotics control. Though successful, current methods can lead to unsafe policy parameters potentially damaging hardware units. Motivated by such constraints, projection based methods are proposed for safe policies. These methods, however, can only handle convex policy constraints. In this paper, we propose the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing, for the first time, a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, i.e., Mann and two-step iteration, to solve the above problems and prove convergence in the non-convex stochastic setting. Finally, we demonstrate the performance of the above algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.

##### BibTex Citation
@article{2017neuralcomputation,
author = {Zhan, Yusen and {Bou Ammar}, Haitham and Taylor, Matthew E.},
title = {Non-convex Policy Search Using Variational Inequalities},
journal = {Neural Computation},
year = {2017}
}

• 2017 Tim Brys, Anna Harutyunyan, Peter Vranck, Matthew E. Taylor, and Ann Nowe. 2017. “Multi-Objectivization and Ensembles of Shapings in Reinforcement Learning.” Neurocomputing.

##### Abstract

Ensemble techniques are a powerful approach to creating better decision makers in machine learning. A number of decision makers is trained to solve a given task, grouped in an ensemble, and their decisions are aggregated. The ensemble derives its power from the diversity of its components, as the assumption is that they make mistakes on different inputs, and that the majority is more likely to be correct than any individual component. Diversity usually comes from the different algorithms employed by the decision makers, or the different inputs used to train the decision makers. We advocate a third way to achieve this diversity, based on multi-objectivization. This is the process of taking a single-objective problem and transforming it into a multi-objective problem in order to solve the original problem faster and/or better. This is either done through decomposition of the original objective, or the addition of extra objectives, typically based on some (heuristic) domain knowledge. This process basically creates a diverse set of feedback signals for what is underneath still a single-objective problem. In the context of ensemble techniques, these various ways to evaluate a (solution to a) problem allow for different components of the ensemble to look at the problem in different ways, generating the necessary diversity for the ensemble. In this paper, we argue for the combination of multi-objectivization and ensemble techniques as a powerful tool to boost solving performance in reinforcement learning. We inject various pieces of heuristic information through reward shaping, creating several distinct enriched reward signals, which can strategically be combined using ensemble techniques to reduce sample complexity. We demonstrate the potential of the approach with a range of experiments.

##### BibTex Citation
@article{17neurocomputing,
author = {Brys, Tim and Harutyunyan, Anna and Vranck, Peter and Taylor, Matthew E. and Nowe, Ann},
title = {Multi-objectivization and Ensembles of Shapings in Reinforcement Learning},
journal = {Neurocomputing},
year = {2017}
}

• 2017 Lonny Simonian, and Matthew E. Taylor. January 2017. “Applications for UAVs in Electric Utility Construction.” F3415. The Foundation for Electrical Construction Inc. (ELECTRI). http://electri.org/research/applications-uavs-electric-utility-construction.

##### BibTex Citation
@techreport{electri17-simonian,
author = {Simonian, Lonny and Taylor, Matthew E.},
title = {Applications for {UAVs} in Electric Utility Construction},
institution = {The Foundation for Electrical Construction Inc. ({ELECTRI})},
year = {2017},
month = jan,
number = {F3415},
pages = {1--69},
url = {http://electri.org/research/applications-uavs-electric-utility-construction},
month_numeric = {1}
}

• 2017 A. Leah Zulas, Kaitlyn I. Franz, Darrin Griechen, and Matthew E. Taylor. February 2017. “Solar Decathlon Competition: Towards a Solar-Powered Smart Home.” Proceedings of the AI for Smart Grids and Buildings Workshop (at AAAI).

##### Abstract

Alternative energy is becoming a growing source of power in the United States, including wind, hydroelectric and solar. The Solar Decathlon is a competition run by the US Department of Energy every two years. Washington State University (WSU) is one of twenty teams recently selected to compete in the fall 2017 challenge. A central part to WSU’s entry is incorporating new and existing smart home technology from the ground up. The smart home can help to optimize energy loads, battery life and general comfort of the user in the home. This paper discusses the high-level goals of the project, hardware selected, build strategy and anticipated approach.

##### BibTex Citation
@misc{zulas2017,
title = {Solar Decathlon Competition: Towards a Solar-Powered Smart Home},
author = {Zulas, A. Leah and Franz, Kaitlyn I. and Griechen, Darrin and Taylor, Matthew E.},
booktitle = {{Proceedings of the AI for Smart Grids and Buildings Workshop (at AAAI)}},
month = feb,
year = {2017},
month_numeric = {2}
}

• Pedagogy

2017 Matthew E. Taylor, and Sakire Arslan Ay. February 2017. “AI Projects for Computer Science Capstone Classes (Extended Abstract).” Proceedings of the Seventh Symposium on Educational Advances in Artificial Intelligence.

##### Abstract

Capstone senior design projects provide students with a collaborative software design and development experience to reinforce learned material while allowing students latitude in developing real-world applications. Our two-semester capstone classes are required for all computer science majors. Students must have completed a software engineering course — capstone classes are typically taken during their last two semesters. Project proposals come from a variety of sources, including industry, WSU faculty (from our own and other departments), local agencies, and entrepreneurs. We have recently targeted projects in AI — although students typically have little background, they find the ideas and methods compelling. This paper outlines our instructional approach and reports our experiences with three projects.

EAAI-17

##### BibTex Citation
@misc{2017eaai-taylor,
author = {Taylor, Matthew E. and {Arslan Ay}, Sakire},
title = {{AI Projects for Computer Science Capstone Classes (Extended Abstract)}},
booktitle = {{Proceedings of the Seventh Symposium on Educational Advances in Artificial Intelligence}},
month = feb,
year = {2017},
month_numeric = {2}
}

• Reinforcement Learning, Transfer Learning

2017 Salam El Bsat, Haitham Bou Ammar, and Matthew E. Taylor. February 2017. “Scalable Multitask Policy Gradient Reinforcement Learning.” In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI). 25% acceptance rate

##### Abstract

Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach that aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer from scalability issues when considering a large number of tasks. The main reason behind this limitation is the reliance on centralized solutions. This paper proposes a novel distributed multitask RL framework, improving the scalability across many different types of tasks. Our framework maps multitask RL to an instance of general consensus and develops an efficient decentralized solver. We justify the correctness of the algorithm both theoretically and empirically: we first prove an improvement of convergence speed to an order of O(1/k), with k being the number of iterations, and then show our algorithm surpassing others on multiple dynamical system benchmarks.

##### BibTex Citation
@inproceedings{2017aaai-salam,
author = {{El Bsat}, Salam and {Bou Ammar}, Haitham and Taylor, Matthew E.},
title = {Scalable Multitask Policy Gradient Reinforcement Learning},
booktitle = {{Proceedings of the 31st AAAI Conference on Artificial Intelligence ({AAAI})}},
month = feb,
year = {2017},
month_numeric = {2}
}

• 2017 Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. May 2017. “Curriculum Design for Machine Learners in Sequential Decision Tasks (Extended Abstract).” In Proceedings of the 2017 International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Extended abstract: 26% acceptance rate for papers, additional 22% for extended abstracts.

##### Abstract

Existing machine-learning work has shown that algorithms can benefit from curricula: learning first on simple examples before moving to more difficult examples. While most existing work on curriculum learning focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, relatively little attention has been paid to the ways in which humans design curricula. We argue that a better understanding of the human-designed curricula could give us insights into the development of new machine learning algorithms and interfaces that can better accommodate machine- or human-created curricula. Our work addresses this emerging and vital area empirically, taking an important step to characterize the nature of human-designed curricula relative to the space of possible curricula and the performance benefits that may (or may not) occur.

##### BibTex Citation
@inproceedings{2017aamas-peng,
author = {Peng, Bei and MacGlashan, James and Loftin, Robert and Littman, Michael L. and Roberts, David L. and Taylor, Matthew E.},
title = {{Curriculum Design for Machine Learners in Sequential Decision Tasks (Extended Abstract)}},
booktitle = {{Proceedings of the 2017 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month = may,
year = {2017},
month_numeric = {5}
}

• 2017 James MacGlashan, Mark Ho, Robert Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, and Michael L. Littman. August 2017. “Interactive Learning from Policy-Dependent Human Feedback.” In Proceedings of the International Conference on Machine Learning (ICML). 25% acceptance rate

##### BibTex Citation
@inproceedings{2017icml,
author = {MacGlashan, James and Ho, Mark and Loftin, Robert and Peng, Bei and Wang, Guan and Roberts, David L. and Taylor, Matthew E. and Littman, Michael L.},
title = {Interactive Learning from Policy-Dependent Human Feedback},
booktitle = {{Proceedings of the International Conference on Machine Learning (ICML)}},
month = aug,
year = {2017},
month_numeric = {8}
}

• Reinforcement Learning

2017 Ariel Rosenfeld, Matthew E. Taylor, and Sarit Kraus. August 2017. “Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects.” In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 26% acceptance rate

##### Abstract

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, based on human psychology literature, which we show to be both effective and efficient, for both expert and non-expert designers, in injecting human knowledge for speeding up tabular RL.

##### BibTex Citation
@inproceedings{2017ijcai-rosenfeld,
author = {Rosenfeld, Ariel and Taylor, Matthew E. and Kraus, Sarit},
title = {Leveraging Human Knowledge in Tabular Reinforcement Learning: A Study of Human Subjects},
booktitle = {{Proceedings of the 26th International Joint Conference on Artificial Intelligence ({IJCAI})}},
month = aug,
year = {2017},
month_numeric = {8}
}

• Reinforcement Learning

2017 Zhaodong Wang, and Matthew E. Taylor. August 2017. “Improving Reinforcement Learning with Confidence-Based Demonstrations.” In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 26% acceptance rate

##### Abstract

Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent’s performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent’s learning algorithm or representation. The target agent then estimates the source agent’s policy and improves upon it. The key contribution of this work is to show that leveraging the target agent’s uncertainty in the source agent’s policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.

##### BibTex Citation
@inproceedings{2017ijcai-wang,
author = {Wang, Zhaodong and Taylor, Matthew E.},
title = {Improving Reinforcement Learning with Confidence-Based Demonstrations},
booktitle = {{Proceedings of the 26th International Joint Conference on Artificial Intelligence ({IJCAI})}},
month = aug,
year = {2017},
month_numeric = {8}
}


## 2016

• 2016 Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. 2016. “Efficiently Detecting Switches against Non-Stationary Opponents.” Autonomous Agents and Multi-Agent Systems. DOI: 10.1007/s10458-016-9352-6.

##### Abstract

Interactions in multiagent systems are generally more complicated than single agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real world scenarios where agents have limited capacities and may deviate from a perfect rational response. Our goal is still to act optimally in these cases by learning the appropriate response and without any prior policies on how to act. Thus, we focus on the problem when another agent in the environment uses different stationary strategies over time. This will turn the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that to obtain an optimal policy and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER guarantees to detect switches with high probability. Also, we provide empirical results showing that our approach outperforms state of the art algorithms, in normal form games such as prisoner’s dilemma and then in a more realistic scenario, the Power TAC simulator.

##### BibTex Citation
@article{2016jaamas2-hernandez-leal,
author = {Hernandez-Leal, Pablo and Zhan, Yusen and Taylor, Matthew E. and {Sucar}, L. Enrique and {Munoz de Cote}, Enrique},
title = {{Efficiently detecting switches against non-stationary opponents}},
journal = {{Autonomous Agents and Multi-Agent Systems}},
year = {2016},
doi = {10.1007/s10458-016-9352-6},
url = {http://dx.doi.org/10.1007/s10458-016-9352-6},
issn = {1387-2532}
}

• 2016 Pablo Hernandez-Leal, Yusen Zhan, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. 2016. “An Exploration Strategy for Non-Stationary Opponents.” Autonomous Agents and Multi-Agent Systems, 1–32. DOI: 10.1007/s10458-016-9347-3.

##### Abstract

The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy in time. We use a two agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms the state of the art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.

##### BibTex Citation
@article{2016jaamas-hernandez-leal,
author = {Hernandez-Leal, Pablo and Zhan, Yusen and Taylor, Matthew E. and {Sucar}, L. Enrique and {Munoz de Cote}, Enrique},
title = {{An exploration strategy for non-stationary opponents}},
journal = {{Autonomous Agents and Multi-Agent Systems}},
year = {2016},
pages = {1--32},
issn = {1573-7454},
doi = {10.1007/s10458-016-9347-3},
url = {http://dx.doi.org/10.1007/s10458-016-9347-3}
}

• 2016 Pablo Hernandez-Leal, Matthew E. Taylor, Benjamin Rosman, L. Enrique Sucar, and Enrique Munoz de Cote. February 2016. “Identifying and Tracking Switching, Non-Stationary Opponents: a Bayesian Approach.” Proceedings of the Multiagent Interaction without Prior Coordination Workshop (at AAAI). Phoenix, AZ, USA.

##### Abstract

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards.

##### BibTex Citation
@misc{2016aaai-hernandezleal,
author = {Hernandez-Leal, Pablo and Taylor, Matthew E. and Rosman, Benjamin and Sucar, L. Enrique and {Munoz de Cote}, Enrique},
title = {{Identifying and Tracking Switching, Non-stationary Opponents: a Bayesian Approach}},
booktitle = {{Proceedings of the Multiagent Interaction without Prior Coordination workshop (at {AAAI})}},
year = {2016},
address = {Phoenix, AZ, USA},
month = feb,
month_numeric = {2}
}

• 2016 Zhaodong Wang, and Matthew E. Taylor. March 2016. “Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study.” AAAI 2016 Spring Symposium.

##### Abstract

There are many successful methods for transferring information from one agent to another. One approach, taken in this work, is to have one (source) agent demonstrate a policy to a second (target) agent, and then have that second agent improve upon the policy. By allowing the target agent to observe the source agent’s demonstrations, rather than relying on other types of direct knowledge transfer like Q-values, rules, or shared representations, we remove the need for the agents to know anything about each other’s internal representation or have a shared language. In this work, we introduce a refinement to HAT, an existing transfer learning method, by integrating the target agent’s confidence in its representation of the source agent’s policy. Results show that a target agent can effectively 1) improve its initial performance relative to learning without transfer (jumpstart) and 2) improve its performance relative to the source agent (total reward). Furthermore, both the jumpstart and total reward are improved with this new refinement, relative to learning without transfer and relative to learning with HAT.

##### BibTex Citation
@misc{2016aaai-sss-wang,
author = {Wang, Zhaodong and Taylor, Matthew E.},
title = {{Effective Transfer via Demonstrations in Reinforcement Learning: A Preliminary Study}},
booktitle = {{{AAAI} 2016 Spring Symposium}},
month = mar,
year = {2016},
month_numeric = {3}
}

• 2016 Pablo Hernandez-Leal, Benjamin Rosman, Matthew E. Taylor, L. Enrique Sucar, and Enrique Munoz de Cote. May 2016. “A Bayesian Approach for Learning and Tracking Switching, Non-Stationary Opponents (Extended Abstract).” In Proceedings of 15th International Conference on Autonomous Agents and Multiagent Systems. Singapore.

##### Abstract

In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. In this paper we extend Bayesian Policy Reuse to adversarial settings where opponents switch from one stationary strategy to another. Our extension enables online learning of new models when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach yields better performance than state-of-the-art approaches in terms of average rewards.

##### BibTex Citation
@inproceedings{2016aamas-hernandezleal,
author = {Hernandez-Leal, Pablo and Rosman, Benjamin and Taylor, Matthew E. and Sucar, L. Enrique and {Munoz de Cote}, Enrique},
title = {{A Bayesian Approach for Learning and Tracking Switching, Non-stationary Opponents (Extended Abstract)}},
booktitle = {{Proceedings of 15th International Conference on Autonomous Agents and Multiagent Systems}},
month = may,
year = {2016},
month_numeric = {5}
}

• 2016 Yunshu Du, and Matthew E. Taylor. May 2016. “Work In-Progress: Mining the Student Data for Fitness.” Proceedings of the 12th International Workshop on Agents and Data Mining Interaction (ADMI) (at AAMAS). Singapore.

##### Abstract

Data mining-driven agents are often used in applications such as waiting times estimation or traffic flow prediction. Such approaches often require large amounts of data from multiple sources, which may be difficult to obtain and lead to incomplete or noisy datasets. University ID card data, in contrast, is easy to access with very low noise. However, little attention has been paid to the availability of these datasets and few applications have been developed to improve student services on campus. This work uses data from CougCard, the Washington State University official ID card, used daily by most students. Our goal is to build an intelligent agent to improve student service quality by predicting the crowdedness at different campus facilities. This work in-progress focuses on the University Recreation Center, one of the most popular facilities on campus, to optimize students’ workout experiences.

##### BibTex Citation
@misc{2016admi-du,
author = {Du, Yunshu and Taylor, Matthew E.},
title = {{Work In-progress: Mining the Student Data for Fitness}},
booktitle = {{Proceedings of the 12th International Workshop on Agents and Data Mining Interaction ({ADMI}) (at {AAMAS})}},
year = {2016},
month = may,
month_numeric = {5}
}

• 2016 David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. May 2016. “Work in Progress: Lifelong Learning for Disturbance Rejection on Mobile Robots.” Proceedings of the Adaptive Learning Agents (ALA) Workshop (at AAMAS). Singapore.

##### Abstract

No two robots are exactly the same — even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Further, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled. These preliminary results are an initial step towards learning robust fault-tolerant control for arbitrary robots.

##### BibTex Citation
@misc{2016ala-isele,
author = {Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and {de la Cruz Jr.}, Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
title = {{Work in Progress: Lifelong Learning for Disturbance Rejection on Mobile Robots}},
booktitle = {{Proceedings of the Adaptive Learning Agents ({ALA}) workshop (at {AAMAS})}},
year = {2016},
month = may,
month_numeric = {5}
}

• 2016 Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova. May 2016. “Learning from Demonstration for Shaping through Inverse Reinforcement Learning.” In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 24.9% acceptance rate

##### Abstract

Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel approach to improve model-free reinforcement learning agents’ performance with a three step approach. Specifically, we collect demonstration data, use the data to recover a linear function using inverse reinforcement learning and we use the recovered function for potential-based reward shaping. Our approach is model-free and scalable to high dimensional domains. To show the scalability of our approach we present two sets of experiments in a two dimensional Maze domain, and the 27 dimensional Mario AI domain. We compare the performance of our algorithm to previously introduced reinforcement learning from demonstration algorithms. Our experiments show that our approach outperforms the state-of-the-art in cumulative reward, learning rate and asymptotic performance.

##### BibTex Citation
@inproceedings{2016aamas-suay,
author = {Suay, Halit Bener and Brys, Tim and Taylor, Matthew E. and Chernova, Sonia},
title = {{Learning from Demonstration for Shaping through Inverse Reinforcement Learning}},
booktitle = {{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month = may,
year = {2016},
month_numeric = {5}
}

• 2016 Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. May 2016. “A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans.” In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 24.9% acceptance rate

##### Abstract

As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work presents a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.

##### BibTex Citation
@inproceedings{2016aamas-peng,
author = {Peng, Bei and MacGlashan, James and Loftin, Robert and Littman, Michael L. and Roberts, David L. and Taylor, Matthew E.},
title = {{A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans}},
booktitle = {{Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month = may,
year = {2016},
month_numeric = {5}
}

• Gamification, Motivation, Education

2016 Chris Cain, Anne Anderson, and Matthew E. Taylor. June 2016. “Content-Independent Classroom Gamification.” In Proceedings of the ASEE’s 123rd Annual Conference & Exposition. New Orleans, LA, USA.

##### Abstract

This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.

##### BibTex Citation
@inproceedings{2016asee-cain,
author = {Cain, Chris and Anderson, Anne and Taylor, Matthew E.},
title = {{Content-Independent Classroom Gamification}},
booktitle = {{Proceedings of the {ASEE} 's 123rd Annual Conference \& Exposition}},
month = jun,
year = {2016},
address = {New Orleans, LA, USA},
month_numeric = {6}
}

• Intelligent Tutoring System, Multiple solutions

2016 Yang Hu and Matthew E. Taylor. June 2016. “Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility.” In Proceedings of the ASEE’s 123rd Annual Conference & Exposition. New Orleans, LA, USA.

##### BibTex Citation
@inproceedings{2016asee-hu,
author = {Hu, Yang and Taylor, Matthew E.},
title = {{Work In Progress: A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility}},
booktitle = {{Proceedings of the {ASEE} 's 123rd Annual Conference \& Exposition}},
month = jun,
year = {2016},
address = {New Orleans, LA, USA},
month_numeric = {6}
}

• 2016 Ruofei Xu, Robin Hartshorn, Ryan Huard, James Irwin, Kaitlyn Johnson, Gregory Nelson, Jon Campbell, Sakire Arslan Ay, and Matthew E. Taylor. July 2016. “Towards a Semi-Autonomous Wheelchair for Users with ALS.” Proceedings of Workshop on Autonomous Mobile Service Robots (at IJCAI). New York City, NY, USA.

##### Abstract

This paper discusses a prototype system built over two years by teams of undergraduate students with the goal of assisting users with Amyotrophic Lateral Sclerosis (ALS). The current prototype powered wheelchair uses both onboard and offboard sensors to navigate within and between rooms, avoiding obstacles. The wheelchair can be directly controlled via multiple input devices, including gaze tracking — in this case, the wheelchair can augment the user’s control to avoid obstacles. In its fully autonomous mode, the user can select a position on a pre-built map and the wheelchair will navigate to the desired location. This paper introduces the design and implementation of our system, as well as performs three sets of experiments to characterize its performance. The long-term goal of this work is to significantly improve the lives of users with mobility impairments, with a particular focus on those that have limited motor abilities.

##### BibTex Citation
@misc{2016ijcai-xu,
author = {Xu, Ruofei and Hartshorn, Robin and Huard, Ryan and Irwin, James and Johnson, Kaitlyn and Nelson, Gregory and Campbell, Jon and Ay, Sakire Arslan and Taylor, Matthew E.},
title = {{Towards a Semi-Autonomous Wheelchair for Users with {ALS}}},
booktitle = {{Proceedings of Workshop on Autonomous Mobile Service Robots (at {IJCAI} )}},
year = {2016},
address = {New York City, NY, USA},
month = jul,
month_numeric = {7}
}

• 2016 Yunshu Du, Gabriel V. de la Cruz Jr., James Irwin, and Matthew E. Taylor. July 2016. “Initial Progress in Transfer for Deep Reinforcement Learning Algorithms.” Proceedings of Deep Reinforcement Learning: Frontiers and Challenges Workshop (at IJCAI). New York City, NY, USA.

##### Abstract

As one of the first successful models that combines reinforcement learning technique with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train a single task. This work aims to leverage transfer learning (TL) techniques to speed up learning in DQN. We applied this technique in two domains, Atari games and cart-pole, and show that TL can improve DQN’s performance on both tasks without altering the network structure.

##### BibTex Citation
@misc{2016deeprl-du,
author = {Du, Yunshu and {de la Cruz Jr.}, Gabriel V. and Irwin, James and Taylor, Matthew E.},
title = {{Initial Progress in Transfer for Deep Reinforcement Learning Algorithms}},
booktitle = {{Proceedings of Deep Reinforcement Learning: Frontiers and Challenges workshop (at {IJCAI} )}},
year = {2016},
address = {New York City, NY, USA},
month = jul,
month_numeric = {7}
}

• 2016 Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. July 2016. “An Empirical Study of Non-Expert Curriculum Design for Machine Learners.” Proceedings of the Interactive Machine Learning Workshop (at IJCAI). New York City, NY, USA.

##### Abstract

Existing machine-learning work has shown that algorithms can benefit from curriculum learning, a strategy where the target behavior of the learner is changed over time. However, most existing work focuses on developing automatic methods to iteratively select training examples with increasing difficulty tailored to the current ability of the learner, neglecting how non-expert humans may design curricula. In this work we introduce a curriculum design problem in the context of reinforcement learning and conduct a user study to explicitly explore how non-expert humans go about assembling curricula. We present results from 80 participants on Amazon Mechanical Turk that show 1) humans can successfully design curricula that gradually introduce more complex concepts to the agent within each curriculum, and even across different curricula, and 2) users choose to add task complexity in different ways and follow salient principles when selecting tasks into the curriculum. This work serves as an important first step towards better integration of non-expert humans into the reinforcement learning process and the development of new machine learning algorithms to accommodate human teaching strategies.

##### BibTex Citation
@misc{2016iml-peng,
author = {Peng, Bei and MacGlashan, James and Loftin, Robert and Littman, Michael L. and Roberts, David L. and Taylor, Matthew E.},
title = {{An Empirical Study of Non-Expert Curriculum Design for Machine Learners}},
booktitle = {{Proceedings of the Interactive Machine Learning workshop (at {IJCAI} )}},
month = jul,
year = {2016},
address = {New York City, NY, USA},
month_numeric = {7}
}

• 2016 Yusen Zhan, Haitham Bou Ammar, and Matthew E. Taylor. July 2016. “Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer.” In Proceedings of the 25th International Conference on Artificial Intelligence (IJCAI). 25% acceptance rate

##### Abstract

Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teacher’s advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.

##### BibTex Citation
@inproceedings{2016ijcai-zhan,
author = {Zhan, Yusen and Ammar, Haitham Bou and Taylor, Matthew E.},
title = {{Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer}},
booktitle = {{Proceedings of the 25th International Conference on Artificial Intelligence ( {IJCAI} )}},
month = jul,
year = {2016},
month_numeric = {7}
}

• 2016 James MacGlashan, Michael L. Littman, David L. Roberts, Robert Loftin, Bei Peng, and Matthew E. Taylor. October 2016. “Convergent Actor Critic by Humans.” Workshop on Human-Robot Collaboration: Towards Co-Adaptive Learning Through Semi-Autonomy and Shared Control (at IROS).

##### Abstract

Programming robot behavior can be painstaking: for a layperson, this path is unavailable without investing significant effort in building up proficiency in coding. In contrast, nearly half of American households have a pet dog and at least some exposure to animal training, suggesting an alternative path for customizing robot behavior. Unfortunately, most existing reinforcement-learning (RL) algorithms are not well suited to learning from human-delivered reinforcement. This paper introduces a framework for incorporating human-delivered rewards into RL algorithms and preliminary results demonstrating feasibility.

##### BibTex Citation
@misc{2016iros-hrc-macglashan,
author = {MacGlashan, James and Littman, Michael L. and Roberts, David L. and Loftin, Robert and Peng, Bei and Taylor, Matthew E.},
title = {{Convergent Actor Critic by Humans}},
booktitle = {{Workshop on Human-Robot Collaboration: Towards Co-Adaptive Learning Through Semi-Autonomy and Shared Control (at {IROS} )}},
month = oct,
year = {2016},
month_numeric = {10}
}

• 2016 David Isele, José Marcio Luna, Eric Eaton, Gabriel V. de la Cruz Jr., James Irwin, Brandon Kallaher, and Matthew E. Taylor. October 2016. “Lifelong Learning for Disturbance Rejection on Mobile Robots.” In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 48% acceptance rate

##### Abstract

No two robots are exactly the same—even for a given model of robot, different units will require slightly different controllers. Furthermore, because robots change and degrade over time, a controller will need to change over time to remain optimal. This paper leverages lifelong learning in order to learn controllers for different robots. In particular, we show that by learning a set of control policies over robots with different (unknown) motion models, we can quickly adapt to changes in the robot, or learn a controller for a new robot with a unique set of disturbances. Furthermore, the approach is completely model-free, allowing us to apply this method to robots that have not, or cannot, be fully modeled.

##### BibTex Citation
@inproceedings{2016iros-isele,
author = {Isele, David and Luna, Jos\'e Marcio and Eaton, Eric and {de la Cruz Jr.}, Gabriel V. and Irwin, James and Kallaher, Brandon and Taylor, Matthew E.},
title = {{Lifelong Learning for Disturbance Rejection on Mobile Robots}},
booktitle = {{Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems ( {IROS} )}},
month = oct,
year = {2016},
video = {https://youtu.be/u7pkhLx0FQ0},
month_numeric = {10}
}

• 2016 Yang Hu and Matthew E. Taylor. October 2016. “A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility.” Transactions on Techniques for STEM Education, October.

##### Abstract

Taking a Computer-Aided Design (CAD) class is a prerequisite for Mechanical Engineering freshmen at many universities, including at Washington State University. The traditional way to learn CAD software is to follow examples and exercises in a textbook. However, using written instruction is not always effective because textbooks usually support a single strategy to construct a model. Missing even one detail may cause the student to become stuck, potentially leading to frustration.

To make the learning process easier and more interesting, we designed and implemented an intelligent tutorial system for an open source CAD program, FreeCAD, for the sake of teaching students some basic CAD skills (such as Boolean operations) to construct complex objects from multiple simple shapes. Instead of teaching a single method to construct a model, the program first automatically learns all possible ways to construct a model and then can teach the student to draw the 3D model in multiple ways. Previous research efforts have shown that learning multiple potential solutions can encourage students to develop the tools they need to solve new problems.

This study compares textbook learning with learning from two variants of our intelligent tutoring system. The textbook approach is considered the baseline. In the first tutorial variant, subjects were given minimal guidance and were asked to construct a model in multiple ways. Subjects in the second tutorial group were given two guided solutions to constructing a model and then asked to demonstrate the third solution when constructing the same model. Rather than directly providing instructions, participants in the first tutorial group were expected to independently explore and were only provided feedback when the program determined they had deviated too far from a potential solution. The three groups are compared by measuring the time needed to 1) successfully construct the same model in a testing phase, 2) use multiple methods to construct the same model in a testing phase, and 3) construct a novel model.

##### BibTex Citation
@article{2017stemtransactions-yang,
author = {Hu, Yang and Taylor, Matthew E.},
title = {A Computer-Aided Design Intelligent Tutoring System Teaching Strategic Flexibility},
journal = {Transactions on Techniques for STEM Education},
month = oct,
year = {2016},
month_numeric = {10}
}

• 2016 Chris Cain, Anne Anderson, and Matthew E. Taylor. October 2016. “Content-Independent Classroom Gamification.” Computers in Education Journal, October.

##### Abstract

This paper introduces Topic-INdependent Gamification Learning Environment (TINGLE), a framework designed to increase student motivation and engagement in the classroom through the use of a game played outside the classroom. A 131-person pilot study was implemented in a construction management course. Game statistics and survey responses were recorded to estimate the effect of the game and correlations with student traits. While the data analyzed so far is mostly inconclusive, this study served as an important first step toward content-independent gamification.

##### BibTex Citation
@article{2017coed-cain,
author = {Cain, Chris and Anderson, Anne and Taylor, Matthew E.},
title = {Content-Independent Classroom Gamification},
journal = {{Computers in Education Journal}},
month = oct,
year = {2016},
month_numeric = {10}
}

• 2016 Timothy Lewis, Amy Hurst, Matthew E. Taylor, and Cynthia Matuszek. November 2016. “Using Language Groundings for Context-Sensitive Text Prediction.” Proceedings of EMNLP 2016 Workshop on Uphill Battles in Language Processing. Austin, TX, USA.

##### Abstract

In this paper, we present the concept of using language groundings for context-sensitive text prediction using a semantically informed, context-aware language model. We show initial findings from a preliminary study investigating how users react to a communication interface driven by context-based prediction using a simple language model. We suggest that the results support further exploration using a more informed semantic model and more realistic context.

##### BibTex Citation
@misc{2016emnlp-lewis,
author = {Lewis, Timothy and Hurst, Amy and Taylor, Matthew E. and Matuszek, Cynthia},
title = {{Using Language Groundings for Context-Sensitive Text Prediction}},
booktitle = {{Proceedings of {EMNLP} 2016 Workshop on Uphill Battles in Language Processing}},
month = nov,
year = {2016},
address = {Austin, TX, USA},
month_numeric = {11}
}

• 2016 Robert Loftin, James MacGlashan, Bei Peng, Matthew E. Taylor, Michael L. Littman, and David L. Roberts. November 2016. “Towards Behavior-Aware Model Learning from Human-Generated Trajectories.” AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction. Arlington, VA, USA.

##### Abstract

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process, based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.

##### BibTex Citation
@misc{2016aaai-ai-hri-loftin,
author = {Loftin, Robert and MacGlashan, James and Peng, Bei and Taylor, Matthew E. and Littman, Michael L. and Roberts, David L.},
title = {{Towards Behavior-Aware Model Learning from Human-Generated Trajectories}},
booktitle = {{{AAAI} Fall Symposium on Artificial Intelligence for Human-Robot Interaction}},
month = nov,
year = {2016},
address = {Arlington, VA, USA},
month_numeric = {11}
}

• 2016 William Curran, Tim Brys, David Aha, Matthew E. Taylor, and William D. Smart. November 2016. “Dimensionality Reduced Reinforcement Learning for Assistive Robots.” AAAI 2016 Fall Symposium on Artificial Intelligence: Workshop on Artificial Intelligence for Human-Robot Interaction. Arlington, VA, USA.

##### Abstract

State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots leads to large dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.

##### BibTex Citation
@misc{2016aaai-ai-hri-curran,
author = {Curran, William and Brys, Tim and Aha, David and Taylor, Matthew E. and Smart, William D.},
title = {{Dimensionality Reduced Reinforcement Learning for Assistive Robots}},
booktitle = {{{AAAI} 2016 Fall Symposium on Artificial Intelligence: Workshop on Artificial Intelligence for Human-Robot Interaction}},
month = nov,
year = {2016},
address = {Arlington, VA, USA},
month_numeric = {11}
}


## 2015

• Reinforcement Learning, Reward Shaping

2015 Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. 2015. “Ensembles of Shapings.” In The Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM). 15% acceptance rate for oral presentations

##### Abstract

Many reinforcement learning algorithms try to solve a problem from scratch, i.e., without a priori knowledge. This works for small and simple problems, but quickly becomes impractical as problems of growing complexity are tackled. The reward function with which the agent evaluates its behaviour often is sparse and uninformative, which leads to the agent requiring large amounts of exploration before feedback is discovered and good behaviour can be generated. Reward shaping is one approach to address this problem, by enriching the reward signal with extra intermediate rewards, often of a heuristic nature. These intermediate rewards may be derived from expert knowledge, knowledge transferred from a previous task, demonstrations provided to the agent, etc. In many domains, multiple such pieces of knowledge are available, and could all potentially benefit the agent during its learning process. We investigate the use of ensemble techniques to automatically combine these various sources of information, helping the agent learn faster than with any of the individual pieces of information alone. We empirically show that the use of such ensembles alleviates two tuning problems: (1) the problem of selecting which (combination of) heuristic knowledge to use, and (2) the problem of tuning the scaling of this information as it is injected in the original reward function. We show that ensembles are both robust against bad information and bad scalings.

##### BibTex Citation
@inproceedings{2015rldm-brys,
author = {Brys, Tim and Harutyunyan, Anna and Taylor, Matthew E. and Now\'e, Ann},
title = {{Ensembles of Shapings}},
booktitle = {{The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ( {RLDM} )}},
year = {2015}
}

• Reinforcement Learning, Reward Shaping, Learning from Demonstration

2015 Halit Bener Suay, Tim Brys, Matthew E. Taylor, and Sonia Chernova. 2015. “Reward Shaping by Demonstration.” In The Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

##### Abstract

Potential-based reward shaping is a theoretically sound way of incorporating prior knowledge in a reinforcement learning setting. While providing flexibility for choosing the potential function, under certain conditions this method guarantees the convergence of the final policy, regardless of the properties of the potential function. However, this flexibility of choice may cause confusion when making a design decision for a specific domain, as the number of possible candidates for a potential function can be overwhelming. Moreover, the potential function either can be manually designed, to bias the behavior of the learner, or can be recovered from prior knowledge, e.g. from human demonstrations. In this paper we investigate the efficacy of two different methods of using a potential function recovered from human demonstrations. Our first approach uses a mixture of Gaussian distributions generated by samples collected during demonstrations (Gaussian-Shaping), and the second approach uses a reward function recovered from demonstrations with Relative Entropy Inverse Reinforcement Learning (RE-IRL-Shaping). We present our findings in Cart-Pole, Mountain Car, and Puddle World domains. Our results show that Gaussian-Shaping can provide an efficient reward heuristic, accelerating learning through its ability to capture local information, and RE-IRL-Shaping can be more resilient to bad demonstrations. We report a brief analysis of our findings and we aim to provide a future reference for reinforcement learning agent designers who consider using reward shaping by human demonstrations.

##### BibTex Citation
@inproceedings{2015rldm-suay,
author = {Suay, Halit Bener and Brys, Tim and Taylor, Matthew E. and Chernova, Sonia},
title = {{Reward Shaping by Demonstration}},
booktitle = {{The Multi-disciplinary Conference on Reinforcement Learning and Decision Making ( {RLDM} )}},
year = {2015}
}

• Reinforcement Learning

2015 Tim Brys, Anna Harutyunyan, Halit Bener Suay, Sonia Chernova, Matthew E. Taylor, and Ann Nowé. 2015. “Reinforcement Learning from Demonstration through Shaping.” In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 28.8% acceptance rate

##### Abstract

Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.

##### BibTex Citation
@inproceedings{2015ijcai-brys,
author = {Brys, Tim and Harutyunyan, Anna and Suay, Halit Bener and Chernova, Sonia and Taylor, Matthew E. and Now\'e, Ann},
title = {{Reinforcement Learning from Demonstration through Shaping}},
booktitle = {{Proceedings of the International Joint Conference on Artificial Intelligence ( {IJCAI} )}},
year = {2015}
}

• Reinforcement Learning, Transfer Learning

2015 Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. 2015. “Transfer Learning with Probabilistic Mapping Selection.” Adaptive Behavior 23 (1): 3–19. DOI: 10.1177/1059712314559525.

##### Abstract

When transferring knowledge between reinforcement learning agents with different state representations or actions, past knowledge must be efficiently mapped to novel tasks so that it aids learning. The majority of the existing approaches use pre-defined mappings provided by a domain expert. To overcome this limitation and enable autonomous transfer learning, this paper introduces a method for weighting and using multiple inter-task mappings based on a probabilistic framework. Experimental results show that the use of multiple inter-task mappings, accompanied with a probabilistic selection mechanism, can significantly boost the performance of transfer learning relative to 1) learning without transfer and 2) using a single hand-picked mapping. We especially introduce novel tasks for transfer learning in a realistic simulation of the iCub robot, demonstrating the ability of the method to select mappings in complex tasks where human intuition could not be applied to select them. The results verified the efficacy of the proposed approach in a real world and complex environment.

##### BibTex Citation
@article{2015adaptivebehavior-fachantidis,
author = {Fachantidis, Anestis and Partalas, Ioannis and Taylor, Matthew E. and Vlahavas, Ioannis},
title = {{Transfer learning with probabilistic mapping selection}},
journal = {{Adaptive Behavior}},
volume = {23},
number = {1},
pages = {3--19},
year = {2015},
doi = {10.1177/1059712314559525},
}

• 2015 Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. 2015. “Learning Behaviors via Human-Delivered Discrete Feedback: Modeling Implicit Feedback Strategies to Speed up Learning.” Journal of Autonomous Agents and Multi-Agent Systems, 1–30. DOI: 10.1007/s10458-015-9283-7.

##### Abstract

For real-world applications, virtual agents must be able to learn new behaviors from non-technical users. Positive and negative feedback are an intuitive way to train new behaviors, and existing work has presented algorithms for learning from such feedback. That work, however, treats feedback as numeric reward to be maximized, and assumes that all trainers provide feedback in the same way. In this work, we show that users can provide feedback in many different ways, which we describe as “training strategies.” Specifically, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa, such that the lack of feedback itself conveys information about the behavior. We present a probabilistic model of trainer feedback that describes how a trainer chooses to provide explicit reward and/or explicit punishment and, based on this model, develop two novel learning algorithms (SABL and I-SABL) which take trainer strategy into account, and can therefore learn from cases where no feedback is provided. Through online user studies we demonstrate that these algorithms can learn with less feedback than algorithms based on a numerical interpretation of feedback. Furthermore, we conduct an empirical analysis of the training strategies employed by users, and of factors that can affect their choice of strategy.

##### BibTex Citation
@article{2015aamas-loftin,
author = {Loftin, Robert and Peng, Bei and MacGlashan, James and Littman, Michael L. and Taylor, Matthew E. and Huang, Jeff and Roberts, David L.},
title = {{Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning}},
journal = {{Journal of Autonomous Agents and Multi-Agent Systems}},
pages = {1--30},
year = {2015},
doi = {10.1007/s10458-015-9283-7},
publisher = {Springer},
}

• Reinforcement Learning, Crowdsourcing

2015 Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. January 2015. “Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents.” Proceedings of the Learning for General Competency in Video Games Workshop (AAAI).

Funded by NSF

##### Abstract

Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion of the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly.

##### Notes

The Arcade Learning Environment

##### BibTex Citation
@misc{2015aaai-delacruz,
title = {{Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents}},
author = {{de la Cruz Jr.}, Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
booktitle = {{Proceedings of the Learning for General Competency in Video Games workshop ( {AAAI} )}},
month = jan,
year = {2015},
month_numeric = {1}
}

• Reinforcement Learning, Transfer Learning

2015 Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. January 2015. “Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment.” In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI). 27% acceptance rate

##### BibTex Citation
@inproceedings{2015aaai-bouaamar,
author = {Ammar, Haitham Bou and Eaton, Eric and Ruvolo, Paul and Taylor, Matthew E.},
title = {{Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment}},
booktitle = {{Proceedings of the 29th AAAI Conference on Artificial Intelligence ( {AAAI} )}},
month = jan,
year = {2015},
month_numeric = {1}
}

• Reinforcement Learning, Crowdsourcing

2015 Gabriel V. de la Cruz Jr., Bei Peng, Walter S. Lasecki, and Matthew E. Taylor. March 2015. “Towards Integrating Real-Time Crowd Advice with Reinforcement Learning.” In The 20th ACM Conference on Intelligent User Interfaces (IUI). DOI: 10.1145/2732158.2732180. Poster: 41% acceptance rate for poster submissions

Funded by NSF

##### Abstract

Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting on-line human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Demonstrating that the crowd is capable of generating this input, and discussing the types of errors that occur, serves as a critical first step in designing systems that use this real-time feedback to improve systems’ learning performance on-the-fly.

ACM iUI-15

##### BibTex Citation
@inproceedings{2015iui-delacruz,
author = {{de la Cruz Jr.}, Gabriel V. and Peng, Bei and Lasecki, Walter S. and Taylor, Matthew E.},
title = {{Towards Integrating Real-Time Crowd Advice with Reinforcement Learning}},
booktitle = {{The 20th {ACM} Conference on Intelligent User Interfaces ( {IUI} )}},
month = mar,
year = {2015},
doi = {10.1145/2732158.2732180},
month_numeric = {3}
}

• Multiagent Systems

2015 Pablo Hernandez-Leal, Matthew E. Taylor, Enrique Munoz de Cote, and L. Enrique Sucar. May 2015. “Bidding in Non-Stationary Energy Markets.” In The 14th International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). Extended Abstract: 25% acceptance rate for papers, additional 22% for extended abstracts

##### Abstract

The PowerTAC competition has gained attention for being a realistic and powerful simulation platform used for research on retail energy markets, in part because of the growing number of energy markets worldwide. Agents in this complex environment typically use multiple strategies, changing from one to another, posing a problem for current learning algorithms. This paper introduces DriftER, an algorithm that learns an opponent model and tracks its error rate. We compare our algorithm in the PowerTAC simulator against the champion of the 2013 competition and a state-of-the-art algorithm tailored for interacting against switching (non-stationary) opponents. The results show that DriftER outperforms the competition in terms of profit and accuracy.

##### BibTex Citation
@inproceedings{2015aamas-hernandezleal,
author = {Hernandez-Leal, Pablo and Taylor, Matthew E. and {de Cote}, Enrique Munoz and Sucar, L. Enrique},
title = {{Bidding in Non-Stationary Energy Markets}},
booktitle = {{The 14th International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2015},
month_numeric = {5}
}

• 2015 Yawei Zhang, Yunxiang Ye, Zhaodong Wang, Matthew E. Taylor, Geoffrey A. Hollinger, and Qin Zhang. May 2015. “Intelligent In-Orchard Bin-Managing System for Tree Fruit Production.” Proceedings of the Robotics in Agriculture Workshop (at ICRA ).

##### Abstract

The labor-intensive nature of harvest in the tree fruit industry makes it particularly sensitive to labor shortages. Technological innovation is thus critical in order to meet current demands without significantly increasing prices. This paper introduces a robotic system to help human workers during fruit harvest. A second-generation prototype is currently being built and simulation results demonstrate potential improvement in productivity.

##### BibTex Citation
@misc{2015icra-zhang,
author = {Zhang, Yawei and Ye, Yunxiang and Wang, Zhaodong and Taylor, Matthew E. and Hollinger, Geoffrey A. and Zhang, Qin},
title = {{Intelligent In-Orchard Bin-Managing System for Tree Fruit Production}},
booktitle = {{Proceedings of the Robotics in Agriculture workshop (at {ICRA} )}},
month = may,
year = {2015},
month_numeric = {5}
}

• 2015 Bei Peng, Robert Loftin, James MacGlashan, Michael L. Littman, Matthew E. Taylor, and David L. Roberts. May 2015. “Language and Policy Learning from Human-Delivered Feedback.” Proceedings of the Machine Learning for Social Robotics Workshop (at ICRA ).

##### Abstract

Using rewards and punishments is a common and familiar paradigm for humans to train intelligent agents. Most existing learning algorithms in this paradigm follow a framework in which human feedback is treated as a numerical signal to be maximized by the agent. However, treating feedback as a numeric signal fails to capitalize on implied information the human trainer conveys with a lack of explicit feedback. For example, a trainer may withhold reward to signal to the agent a failure, or they may withhold punishment to signal that the agent is behaving correctly. We review our progress to date with Strategy-aware Bayesian Learning, which is able to learn from experience the ways trainers use feedback, and can exploit that knowledge to accelerate learning. Our work covers contextual bandits, goal-directed sequential decision-making tasks, and natural language command learning. We present a user study design to identify how users’ feedback strategies are affected by properties of the environment and agent competency for natural language command learning in sequential decision making tasks, which will inform the development of more adaptive models of human feedback in the future.

##### BibTex Citation
@misc{2015icra-peng,
author = {Peng, Bei and Loftin, Robert and MacGlashan, James and Littman, Michael L. and Taylor, Matthew E. and Roberts, David L.},
title = {{Language and Policy Learning from Human-delivered Feedback}},
booktitle = {{Proceedings of the Machine Learning for Social Robotics workshop (at {ICRA} )}},
month = may,
year = {2015},
month_numeric = {5}
}

• 2015 Pablo Hernandez-Leal, Matthew E. Taylor, Enrique Munoz de Cote, and L. Enrique Sucar. May 2015. “Learning Against Non-Stationary Opponents in Double Auctions.” Proceedings of the Adaptive Learning Agents ( ALA ) Workshop 2015. Istanbul, Turkey. Finalist for Best Student Paper

##### Abstract

Energy markets are emerging around the world. In this context, the PowerTAC competition has gained attention for being a realistic and powerful simulation platform that can be used to perform robust research on retail energy markets. Agents in this complex environment typically use different strategies throughout their interaction, changing from one to another depending on diverse factors, for example, to adapt to population needs and to keep the competitors guessing. This poses a problem for learning algorithms as most of them are not capable of handling changing strategies. The previous champion of the PowerTAC competition is no exception, and is not capable of adapting quickly to non-stationary opponents, potentially impacting its performance. This paper introduces DriftER, an algorithm that learns a model of the opponent and keeps track of its error-rate. When the error-rate increases for several timesteps, the opponent has most likely changed strategy and the agent should learn a new model. Results in the PowerTAC simulator show that DriftER is capable of detecting switches in the opponent faster than an existing state-of-the-art algorithm for switching (non-stationary) opponents, obtaining better results in terms of profit and accuracy.

##### BibTex Citation
@misc{2015ala-hernandezleal,
author = {Hernandez-Leal, Pablo and Taylor, Matthew E. and de Cote, Enrique Munoz and Sucar, L. Enrique},
title = {{Learning Against Non-Stationary Opponents in Double Auctions}},
booktitle = {{Proceedings of the Adaptive Learning Agents ( {ALA} ) workshop 2015}},
year = {2015},
address = {Istanbul, Turkey},
month = may,
month_numeric = {5}
}

• Reinforcement Learning, Transfer Learning

2015 Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. May 2015. “Policy Transfer Using Reward Shaping.” In The 14th International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). 25% acceptance rate

##### Abstract

Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespective of the learning algorithm used in the source task. We advance the state-of-the-art by using a reward shaping approach to policy transfer. One of the advantages in following such an approach is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.

##### BibTex Citation
@inproceedings{2015aamas-brys,
author = {Brys, Tim and Harutyunyan, Anna and Taylor, Matthew E. and Now\'{e}, Ann},
title = {{Policy Transfer using Reward Shaping}},
booktitle = {{The 14th International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2015},
month_numeric = {5}
}

• 2015 William Curran, Tim Brys, Matthew E. Taylor, and William D. Smart. July 2015. “Using PCA to Efficiently Represent State Spaces.” ICML-2015 European Workshop on Reinforcement Learning. Lille, France.

##### Abstract

Reinforcement learning algorithms need to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces. This is known as the curse of dimensionality. By projecting the agent’s state onto a low-dimensional manifold, we can represent the state space in a smaller and more efficient representation. By using this representation during learning, the agent can converge to a good policy much faster. We test this approach in the Mario Benchmarking Domain. When using dimensionality reduction in Mario, learning converges much faster to a good policy. However, there is a critical convergence-performance trade-off. By projecting onto a low-dimensional manifold, we are ignoring important data. In this paper, we explore this trade-off of convergence and performance. We find that by learning in as few as 4 dimensions (instead of 9), we can improve performance past learning in the full dimensional space at a faster convergence rate.

##### BibTex Citation
@misc{2015icml-curran,
author = {Curran, William and Brys, Tim and Taylor, Matthew E. and Smart, William D.},
title = {{Using PCA to Efficiently Represent State Spaces}},
booktitle = {{{ICML}-2015 European Workshop on Reinforcement Learning}},
address = {Lille, France},
month = jul,
year = {2015},
month_numeric = {7}
}

• 2015 Yusen Zhan, and Matthew E. Taylor. November 2015. “Online Transfer Learning in Reinforcement Learning Domains.” Proceedings of the AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents ( SDMIA ).

##### Abstract

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation with a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that the asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.

##### BibTex Citation
@misc{2015sdmia-zhan,
author = {Zhan, Yusen and Taylor, Matthew E.},
title = {{Online Transfer Learning in Reinforcement Learning Domains}},
booktitle = {{Proceedings of the {AAAI} Fall Symposium on Sequential Decision Making for Intelligent Agents ( {SDMIA} )}},
month = nov,
year = {2015},
month_numeric = {11}
}

• 2015 Mitchell Scott, Bei Peng, Madeline Chili, Tanay Nigam, Francis Pascual, Cynthia Matuszek, and Matthew E. Taylor. November 2015. “On the Ability to Provide Demonstrations on a UAS: Observing 90 Untrained Participants Abusing a Flying Robot.” Proceedings of the AAAI Fall Symposium on Artificial Intelligence and Human-Robot Interaction ( AI-HRI ).

##### Abstract

This paper presents an exploratory study where participants piloted a commercial UAS (unmanned aerial system) through an obstacle course. The goal was to determine how varying the instructions given to participants affected their performance. Preliminary data suggests future studies to perform, as well as guidelines for human-robot interaction, and some best practices for learning from demonstration studies.

##### BibTex Citation
@misc{2015ai_hri-scott,
author = {Scott, Mitchell and Peng, Bei and Chili, Madeline and Nigam, Tanay and Pascual, Francis and Matuszek, Cynthia and Taylor, Matthew E.},
title = {{On the Ability to Provide Demonstrations on a UAS: Observing 90 Untrained Participants Abusing a Flying Robot}},
booktitle = {{Proceedings of the {AAAI} Fall Symposium on Artificial Intelligence and Human-Robot Interaction ( {AI-HRI} )}},
month = nov,
year = {2015},
month_numeric = {11}
}


## 2014

• Reinforcement Learning, Transfer Learning

2014 Matthew E. Taylor, Nicholas Carboni, Anestis Fachantidis, Ioannis Vlahavas, and Lisa Torrey. 2014. “Reinforcement Learning Agents Providing Advice in Complex Video Games.” Connection Science 26 (1): 45–63. DOI: 10.1080/09540091.2014.885279.

##### Abstract

This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the international conference on autonomous agents and multiagent systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the adaptive and learning agents workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

##### Notes

Download Pac-Man Java code, which contains minimal documentation.

##### BibTex Citation
@article{2014connectionscience-taylor,
author = {Taylor, Matthew E. and Carboni, Nicholas and Fachantidis, Anestis and Vlahavas, Ioannis and Torrey, Lisa},
title = {{Reinforcement learning agents providing advice in complex video games}},
journal = {{Connection Science}},
volume = {26},
number = {1},
pages = {45--63},
year = {2014},
doi = {10.1080/09540091.2014.885279},
url = {http://dx.doi.org/10.1080/09540091.2014.885279},
eprint = {http://dx.doi.org/10.1080/09540091.2014.885279}
}

• Reinforcement Learning, DCOP

2014 Tim Brys, Tong T. Pham, and Matthew E. Taylor. 2014. “Distributed Learning and Multi-Objectivity in Traffic Light Control.” Connection Science 26 (1): 65–83. DOI: 10.1080/09540091.2014.885282.

##### Abstract

Traffic jams and suboptimal traffic flows are ubiquitous in modern societies, and they create enormous economic losses each year. Delays at traffic lights alone account for roughly 10% of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning (RL) approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Distributed constraint optimisation approaches (DCOP) have also been shown to be successful, but are limited to cases where the traffic flows are known. The distributed coordination of exploration and exploitation (DCEE) framework was recently proposed to introduce learning in the DCOP framework. In this paper, we present a study of DCEE and RL techniques in a complex simulator, illustrating the particular advantages of each, comparing them against standard isolated traffic actuated signals. We analyse how learning and coordination behave under different traffic conditions, and discuss the multi-objective nature of the problem. Finally we evaluate several alternative reward signals in the best performing approach, some of these taking advantage of the correlation between the problem-inherent objectives to improve performance.

##### BibTex Citation
@article{2014connectionscience-brys,
author = {Brys, Tim and Pham, Tong T. and Taylor, Matthew E.},
title = {{Distributed learning and multi-objectivity in traffic light control}},
journal = {{Connection Science}},
volume = {26},
number = {1},
pages = {65--83},
year = {2014},
doi = {10.1080/09540091.2014.885282},
url = {http://dx.doi.org/10.1080/09540091.2014.885282},
eprint = {http://dx.doi.org/10.1080/09540091.2014.885282}
}

• Reinforcement Learning

2014 Tim Brys, Kristof Van Moffaert, Ann Nowé, and Matthew E. Taylor. May 2014. “Adaptive Objective Selection for Correlated Objectives in Multi-Objective Reinforcement Learning (Extended Abstract).” In The 13th International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). Extended abstract: 24% acceptance rate for papers, additional 22% for extended abstracts

##### BibTex Citation
@inproceedings{2014aamas-brys,
author = {Brys, Tim and Moffaert, Kristof Van and Now\'{e}, Ann and Taylor, Matthew E.},
title = {{Adaptive Objective Selection for Correlated Objectives in Multi-Objective Reinforcement Learning (Extended Abstract)}},
booktitle = {{The 13th International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2014},
month_numeric = {5}
}

• Reinforcement Learning

2014 Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, and Kagan Tumer. May 2014. “CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning (Extended Abstract).” In The Thirteenth International Joint Conference on Autonomous Agents and Multiagent Systems. Extended abstract: 24% acceptance rate for papers, additional 22% for extended abstracts

##### BibTex Citation
@inproceedings{2014aamas-holmesparker,
author = {HolmesParker, Chris and Taylor, Matthew E. and Agogino, Adrian and Tumer, Kagan},
title = {{CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning (Extended Abstract)}},
booktitle = {{The Thirteenth International Joint Conference on Autonomous Agents and Multiagent Systems}},
month = may,
year = {2014},
month_numeric = {5}
}

• Reinforcement Learning

2014 Chris HolmesParker, Matthew E. Taylor, Yusen Zhan, and Kagan Tumer. May 2014. “Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems.” Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS ).

##### BibTex Citation
@misc{2014ala-holmesparker,
author = {HolmesParker, Chris and Taylor, Matthew E. and Zhan, Yusen and Tumer, Kagan},
title = {{Exploiting Structure and Agent-Centric Rewards to Promote Coordination in Large Multiagent Systems}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS} )}},
month = may,
year = {2014},
month_numeric = {5}
}

• Reinforcement Learning

2014 Yusen Zhan, Anestis Fachantidis, Ioannis Vlahavas, and Matthew E. Taylor. May 2014. “Agents Teaching Humans in Reinforcement Learning Tasks.” Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS ).

##### BibTex Citation
@misc{2014ala-zhan,
author = {Zhan, Yusen and Fachantidis, Anestis and Vlahavas, Ioannis and Taylor, Matthew E.},
title = {{Agents Teaching Humans in Reinforcement Learning Tasks}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop (at {AAMAS} )}},
month = may,
year = {2014},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning

2014 Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. May 2014. “An Autonomous Transfer Learning Algorithm for TD-Learners.” In Proceedings of the 8th Hellenic Conference on Artificial Intelligence ( SETN ). 50% acceptance rate

##### BibTex Citation
@inproceedings{2014setn-fachantidis,
author = {Fachantidis, Anestis and Partalas, Ioannis and Taylor, Matthew E. and Vlahavas, Ioannis},
title = {{An Autonomous Transfer Learning Algorithm for TD-Learners}},
booktitle = {{Proceedings of the 8th Hellenic Conference on Artificial Intelligence ( {SETN} )}},
month = may,
year = {2014},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning

2014 Haitham Bou Ammar, Eric Eaton, Paul Ruvolo, and Matthew E. Taylor. June 2014. “Online Multi-Task Learning for Policy Gradient Methods.” In Proceedings of the 31st International Conference on Machine Learning ( ICML ). 25% acceptance rate

##### BibTex Citation
@inproceedings{2014icml-bouammar,
author = {Ammar, Haitham Bou and Eaton, Eric and Ruvolo, Paul and Taylor, Matthew E.},
title = {{Online Multi-Task Learning for Policy Gradient Methods}},
booktitle = {{Proceedings of the 31st International Conference on Machine Learning ( {ICML} )}},
month = jun,
year = {2014},
month_numeric = {6}
}

• Reinforcement Learning

2014 James MacGlashan, Michael L. Littman, Robert Loftin, Bei Peng, David Roberts, and Matthew E. Taylor. July 2014. “Training an Agent to Ground Commands with Reward and Punishment.” Proceedings of the Machine Learning for Interactive Systems Workshop (at AAAI ).

##### BibTex Citation
@misc{2014mlis-james,
title = {{Training an Agent to Ground Commands with Reward and Punishment}},
author = {MacGlashan, James and Littman, Michael L. and Loftin, Robert and Peng, Bei and Roberts, David and Taylor, Matthew E.},
booktitle = {{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI} )}},
month = jul,
year = {2014},
month_numeric = {7}
}

• Reinforcement Learning, Transfer Learning

2014 Haitham Bou Ammar, Eric Eaton, Matthew E. Taylor, Decebal C. Mocanu, Kurt Driessens, Gerhard Weiss, and Karl Tuyls. July 2014. “An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning.” Proceedings of the Machine Learning for Interactive Systems Workshop (at AAAI ).

##### BibTex Citation
@misc{2014mlis-bouammar,
title = {{An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning}},
author = {Ammar, Haitham Bou and Eaton, Eric and Taylor, Matthew E. and Mocanu, Decebal C. and Driessens, Kurt and Weiss, Gerhard and Tuyls, Karl},
booktitle = {{Proceedings of the Machine Learning for Interactive Systems workshop (at {AAAI} )}},
month = jul,
year = {2014},
month_numeric = {7}
}

• Reinforcement Learning

2014 Tim Brys, Ann Nowé, Daniel Kudenko, and Matthew E. Taylor. July 2014. “Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence.” In Proceedings of the 28th AAAI Conference on Artificial Intelligence ( AAAI ). 28% acceptance rate

##### BibTex Citation
@inproceedings{2014aaai-brys,
author = {Brys, Tim and Now\'{e}, Ann and Kudenko, Daniel and Taylor, Matthew E.},
title = {{Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence}},
booktitle = {{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ( {AAAI} )}},
month = jul,
year = {2014},
month_numeric = {7}
}

• Reinforcement Learning

2014 Robert Loftin, Bei Peng, James MacGlashan, Michael L. Littman, Matthew E. Taylor, Jeff Huang, and David L. Roberts. July 2014. “A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback.” In Proceedings of the 28th AAAI Conference on Artificial Intelligence ( AAAI ). 28% acceptance rate

##### BibTex Citation
@inproceedings{2014aaai-loftin,
author = {Loftin, Robert and Peng, Bei and MacGlashan, James and Littman, Michael L. and Taylor, Matthew E. and Huang, Jeff and Roberts, David L.},
title = {{A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback}},
booktitle = {{Proceedings of the 28th {AAAI} Conference on Artificial Intelligence ( {AAAI} )}},
month = jul,
year = {2014},
month_numeric = {7}
}

• Reinforcement Learning

2014 Tim Brys, Anna Harutyunyan, Peter Vrancx, Matthew E. Taylor, Daniel Kudenko, and Ann Nowé. July 2014. “Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping.” In Proceedings of the IEEE 2014 International Joint Conference on Neural Networks ( IJCNN ). 59% acceptance rate

##### BibTex Citation
@inproceedings{2014ijcnn-brys,
author = {Brys, Tim and Harutyunyan, Anna and Vrancx, Peter and Taylor, Matthew E. and Kudenko, Daniel and Now\'{e}, Ann},
title = {{Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping}},
booktitle = {{Proceedings of the {IEEE} 2014 International Joint Conference on Neural Networks ( {IJCNN} )}},
month = jul,
year = {2014},
month_numeric = {7}
}

• Reinforcement Learning

2014 Tim Brys, Matthew E. Taylor, and Ann Nowé. August 2014. “Using Ensemble Techniques and Multi-Objectivization to Solve Reinforcement Learning Problems.” In Proceedings of the 21st European Conference on Artificial Intelligence ( ECAI ). 41% acceptance rate for short papers

##### BibTex Citation
@inproceedings{2014ecai-brys,
author = {Brys, Tim and Taylor, Matthew E. and Now\'{e}, Ann},
title = {{Using Ensemble Techniques and Multi-Objectivization to Solve Reinforcement Learning Problems}},
booktitle = {{Proceedings of the 21st European Conference on Artificial Intelligence ( {ECAI} )}},
month = aug,
year = {2014},
month_numeric = {8}
}

• Reinforcement Learning

2014 Chris HolmesParker, Matthew E. Taylor, Adrian Agogino, and Kagan Tumer. August 2014. “CLEANing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning.” In Proceedings of the 2014 IEEE/WIC/ACM International Conference on Intelligent Agent Technology ( IAT ). 43% acceptance rate

##### BibTex Citation
@inproceedings{2014iat-holmesparker,
author = {HolmesParker, Chris and Taylor, Matthew E. and Agogino, Adrian and Tumer, Kagan},
title = {{{CLEAN}ing the Reward: Counterfactual Actions Remove Exploratory Action Noise in Multiagent Learning}},
booktitle = {{Proceedings of the 2014 {IEEE/WIC/ACM} International Conference on Intelligent Agent Technology ( {IAT} )}},
month = aug,
year = {2014},
month_numeric = {8}
}

• Reinforcement Learning

2014 Robert Loftin, Bei Peng, James MacGlashan, Michael Littman, Matthew E. Taylor, David Roberts, and Jeff Huang. August 2014. “Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies.” In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication ( RO-MAN ).

##### BibTex Citation
@inproceedings{2014roman-loftin,
author = {Loftin, Robert and Peng, Bei and MacGlashan, James and Littman, Michael and Taylor, Matthew E. and Roberts, David and Huang, Jeff},
title = {{Learning Something from Nothing: Leveraging Implicit Human Feedback Strategies}},
booktitle = {{Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication ( {RO-MAN} )}},
month = aug,
year = {2014},
month_numeric = {8}
}

• Reinforcement Learning

2014 Matthew E. Taylor, and Lisa Torrey. September 2014. “Agents Teaching Agents in Reinforcement Learning (Nectar Abstract).” In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD). Nectar Track, 45% acceptance rate

##### BibTex Citation
@inproceedings{2014ecml-taylor,
author = {Taylor, Matthew E. and Torrey, Lisa},
title = {{Agents Teaching Agents in Reinforcement Learning (Nectar Abstract)}},
booktitle = {{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD)}},
month = sep,
year = {2014},
month_numeric = {9}
}


## 2013

• Transfer Learning, Reinforcement Learning, Robotics

2013 Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. March 2013. “Autonomous Selection of Inter-Task Mappings in Transfer Learning (Extended Abstract).” The AAAI 2013 Spring Symposium — Lifelong Machine Learning.

##### Notes

Lifelong Machine Learning

##### BibTex Citation
@misc{aaai13-anestis,
author = {Fachantidis, Anestis and Partalas, Ioannis and Taylor, Matthew E. and Vlahavas, Ioannis},
title = {{Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract)}},
booktitle = {{The {AAAI} 2013 Spring Symposium --- Lifelong Machine Learning}},
month = mar,
year = {2013},
month_numeric = {3}
}

• Transfer Learning, Reinforcement Learning, Robotics

2013 Ravi Balasubramanian, and Matthew E. Taylor. March 2013. “Learning for Mobile-Robot Error Recovery (Extended Abstract).” The AAAI 2013 Spring Symposium — Designing Intelligent Robots: Reintegrating AI II.

##### Notes

Designing Intelligent Robots

##### BibTex Citation
@misc{aaai13symp-balasubramanian,
author = {Balasubramanian, Ravi and Taylor, Matthew E.},
title = {{Learning for Mobile-Robot Error Recovery (Extended Abstract)}},
booktitle = {{The {AAAI} 2013 Spring Symposium --- Designing Intelligent Robots: Reintegrating {AI} {II}}},
month = mar,
year = {2013},
month_numeric = {3}
}

• Reinforcement Learning, DCOP

2013 Tong Pham, Tim Brys, and Matthew E. Taylor. May 2013. “Learning Coordinated Traffic Light Control.” Proceedings of the Adaptive and Learning Agents Workshop ( AAMAS ).

##### Abstract

Traffic jams and suboptimal traffic flows are ubiquitous in our modern societies, and they create enormous economic losses each year. Delays at traffic lights alone contribute roughly 10 percent of all delays in US traffic. As most traffic light scheduling systems currently in use are static, set up by human experts rather than being adaptive, the interest in machine learning approaches to this problem has increased in recent years. Reinforcement learning approaches are often used in these studies, as they require little pre-existing knowledge about traffic flows. Some distributed constraint optimization approaches have also been used, but focus on cases where the traffic flows are known. This paper presents a preliminary comparison between these two classes of optimization methods in a complex simulator, with the goal of eventually producing real-time algorithms that could be deployed in real-world situations.

ALA-13

##### BibTex Citation
@misc{ala13-pham,
author = {Pham, Tong and Brys, Tim and Taylor, Matthew E.},
title = {{Learning Coordinated Traffic Light Control}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop ( {AAMAS} )}},
month = may,
year = {2013},
month_numeric = {5}
}

• Reinforcement Learning

2013 Nicholas Carboni, and Matthew E. Taylor. May 2013. “Preliminary Results for 1 vs. 1 Tactics in Starcraft.” Proceedings of the Adaptive and Learning Agents Workshop ( AAMAS ).

##### Abstract

This paper describes the development and analysis of two algorithms designed to allow one agent, the teacher, to give advice to another agent, the student. These algorithms contribute to a family of algorithms designed to allow teaching with limited advice. We compare the ability of the student to learn using reinforcement learning with and without such advice. Experiments are conducted in the Starcraft domain, a challenging but appropriate domain for this type of research. Our results show that the time at which advice is given has a significant effect on the result of student learning and that agents with the best performance in a task may not always be the most effective teachers.

ALA-13

##### BibTex Citation
@misc{ala13-carboni,
author = {Carboni, Nicholas and Taylor, Matthew E.},
title = {{Preliminary Results for 1 vs.~1 Tactics in Starcraft}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop ( {AAMAS} )}},
month = may,
year = {2013},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning

2013 Lisa Torrey, and Matthew E. Taylor. May 2013. “Teaching on a Budget: Agents Advising Agents in Reinforcement Learning.” In International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). 23% acceptance rate

##### Abstract

This paper introduces a teacher-student framework for reinforcement learning. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two experimental domains: Mountain Car and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.

AAMAS-13

##### BibTex Citation
@inproceedings{aamas13-torrey,
author = {Torrey, Lisa and Taylor, Matthew E.},
title = {{Teaching on a Budget: Agents Advising Agents in Reinforcement Learning}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2013},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning

2013 Haitham Bou Ammar, Decebal Constantin Mocanu, Matthew E. Taylor, Kurt Driessens, Karl Tuyls, and Gerhard Weiss. September 2013. “Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines.” In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ( ECML PKDD ). 25% acceptance rate

##### BibTex Citation
@inproceedings{ecml13-bouaamar,
author = {Ammar, Haitham Bou and Mocanu, Decebal Constantin and Taylor, Matthew E. and Driessens, Kurt and Tuyls, Karl and Weiss, Gerhard},
title = {{Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines}},
booktitle = {{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ( {ECML PKDD} )}},
month = sep,
year = {2013},
month_numeric = {9}
}

• DCOP

2013 Tong Pham, Aly Tawfika, and Matthew E. Taylor. September 2013. “A Simple, Naive Agent-Based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment.” In Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations.

##### BibTex Citation
@inproceedings{abm13-pham,
author = {Pham, Tong and Tawfika, Aly and Taylor, Matthew E.},
title = {{A Simple, Naive Agent-based Model for the Optimization of a System of Traffic Lights: Insights from an Exploratory Experiment}},
booktitle = {{Proceedings of Conference on Agent-Based Modeling in Transportation Planning and Operations}},
month = sep,
year = {2013},
month_numeric = {9}
}

• Reinforcement Learning, Transfer Learning

2013 Haitham Bou Ammar, Decebal Constantin Mocanu, Matthew E. Taylor, Kurt Driessens, Karl Tuyls, and Gerhard Weiss. November 2013. “Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines.” In The 25th Benelux Conference on Artificial Intelligence ( BNAIC ).

##### BibTex Citation
@inproceedings{bnaic13-bouaamar,
author = {Ammar, Haitham Bou and Mocanu, Decebal Constantin and Taylor, Matthew E. and Driessens, Kurt and Tuyls, Karl and Weiss, Gerhard},
title = {{Automatically Mapped Transfer Between Reinforcement Learning Tasks via Three-Way Restricted Boltzmann Machines}},
booktitle = {{The 25th Benelux Conference on Artificial Intelligence ( {BNAIC} )}},
month = nov,
year = {2013},
month_numeric = {11}
}


## 2012

• Reinforcement Learning, Transfer Learning

2012 Lisa Torrey, and Matthew E. Taylor. June 2012. “Towards Student/Teacher Learning in Sequential Decision Tasks.” In International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). Extended Abstract: 20% acceptance rate for papers, additional 23% for extended abstracts

AAMAS-12

##### BibTex Citation
@inproceedings{12aamas-torrey,
author = {Torrey, Lisa and Taylor, Matthew E.},
title = {{Towards Student/Teacher Learning in Sequential Decision Tasks}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = jun,
year = {2012},
month_numeric = {6}
}

• Reinforcement Learning

2012 Matthew Adams, Robert Loftin, Matthew E. Taylor, Michael Littman, and David Roberts. June 2012. “An Empirical Analysis of RL’s Drift From Its Behaviorism Roots.” Proceedings of the Adaptive and Learning Agents Workshop ( AAMAS ).

##### Abstract

We present an empirical survey of reinforcement learning techniques and relate these techniques to concepts from behaviorism, a field of psychology concerned with the learning process. Specifically, we examine two standard RL algorithms, model-free SARSA, and model-based R-MAX, when used with various shaping techniques. We consider multiple techniques for incorporating shaping into these algorithms, including the use of options and potential-based shaping. Findings indicate any improvement in sample complexity that results from shaping is limited at best. We suggest that this is either due to reinforcement learning not modeling behaviorism well, or behaviorism not modeling animal learning well. We further suggest that a paradigm shift in reinforcement learning techniques is required before the kind of learning performance that techniques from behaviorism indicate is possible can be realized.

ALA-12

##### BibTex Citation
@misc{ala12-adams,
author = {Adams, Matthew and Loftin, Robert and Taylor, Matthew E. and Littman, Michael and Roberts, David},
title = {{An Empirical Analysis of {RL}'s Drift From Its Behaviorism Roots}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop ( {AAMAS} )}},
month = jun,
year = {2012},
month_numeric = {6}
}

• Reinforcement Learning

2012 Lisa Torrey, and Matthew E. Taylor. June 2012. “Help an Agent Out: Student/Teacher Learning in Sequential Decision Tasks.” Proceedings of the Adaptive and Learning Agents Workshop ( AAMAS ).

##### Abstract

Research on agents has led to the development of algorithms for learning from experience, accepting guidance from humans, and imitating experts. This paper explores a new direction for agents: the ability to teach other agents. In particular, we focus on situations where the teacher has limited expertise and instructs the student through action advice. The paper proposes and evaluates several teaching algorithms based on providing advice at a gradually decreasing rate. A crucial component of these algorithms is the ability of an agent to estimate its confidence in a state. We also contribute a student/teacher framework for implementing teaching strategies, which we hope will spur additional development in this relatively unexplored area.

ALA-12

##### BibTex Citation
@misc{ala12-torrey,
author = {Torrey, Lisa and Taylor, Matthew E.},
title = {{Help an Agent Out: Student/Teacher Learning in Sequential Decision Tasks}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop ( {AAMAS} )}},
month = jun,
year = {2012},
month_numeric = {6}
}

• Transfer Learning, Reinforcement Learning

2012 Haitham Bou Ammar, Karl Tuyls, Matthew E. Taylor, Kurt Driessens, and Gerhard Weiss. June 2012. “Reinforcement Learning Transfer via Sparse Coding.” In International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). 20% acceptance rate

##### Abstract

Although reinforcement learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (TrLSPI and TrFQI) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of environmental samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.

AAMAS-12

##### BibTex Citation
@inproceedings{12aamas-haitham,
author = {Ammar, Haitham Bou and Tuyls, Karl and Taylor, Matthew E. and Driessens, Kurt and Weiss, Gerhard},
title = {{Reinforcement Learning Transfer via Sparse Coding}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = jun,
year = {2012},
month_numeric = {6}
}

• Reinforcement Learning, Robotics

2012 Sanjeev Sharma, and Matthew E. Taylor. October 2012. “Autonomous Waypoint Generation Strategy for On-Line Navigation in Unknown Environments.” IROS Workshop on Robot Motion Planning: Online, Reactive, and in Real-Time.

##### BibTex Citation
@misc{irosws12-sharma,
author = {Sharma, Sanjeev and Taylor, Matthew E.},
title = {{Autonomous Waypoint Generation Strategy for On-Line Navigation in Unknown Environments}},
booktitle = {{{IROS} Workshop on Robot Motion Planning: Online, Reactive, and in Real-Time}},
year = {2012},
month = oct,
month_numeric = {10}
}


## 2011

• Security

2011 Matthew E. Taylor, Christopher Kiekintveld, and Milind Tambe. 2011. “Evaluating Deployed Decision Support Systems for Security: Challenges, Arguments, and Approaches.” In Security Games: Theory, Deployed Applications, Lessons Learned, edited by Milind Tambe, 254–83. Cambridge University Press.

##### BibTex Citation
@incollection{11evaluation-taylor,
author = {Taylor, Matthew E. and Kiekintveld, Christopher and Tambe, Milind},
title = {{Evaluating Deployed Decision Support Systems for Security: Challenges, Arguments, and Approaches}},
editor = {Tambe, Milind},
booktitle = {{Security Games: Theory, Deployed Applications, Lessons Learned}},
publisher = {Cambridge University Press},
year = {2011},
pages = {254--283},
isbn = {978-1-107-09642-4}
}

• 2011 Matthew E. Taylor, Manish Jain, Christopher Kiekintveld, Jun-young Kwak, Rong Yang, Zhengyu Yin, and Milind Tambe. 2011. “Two Decades of Multiagent Teamwork Research: Past, Present, and Future.” In Collaborative Agents - REsearch and Development (CARE) 2009-2010, edited by C. Guttmann, F. Dignum, and M. Georgeff. Vol. 6066. Lecture Notes in Artificial Intelligence. Springer-Verlag.

##### BibTex Citation
@incollection{11care-taylor,
author = {Taylor, Matthew E. and Jain, Manish and Kiekintveld, Christopher and Kwak, Jun-young and Yang, Rong and Yin, Zhengyu and Tambe, Milind},
title = {Two Decades of Multiagent Teamwork Research: Past, Present, and Future},
editor = {Guttmann, C. and Dignum, F. and Georgeff, M.},
booktitle = {Collaborative Agents - REsearch and Development {(CARE)} 2009-2010},
publisher = {Springer-Verlag},
series = {Lecture Notes in Artificial Intelligence},
volume = {6066},
year = {2011},
bib2html_rescat = {DCOP}
}

• DCOP

2011 Marcos A. M. Vieira, Matthew E. Taylor, Prateek Tandon, Manish Jain, Ramesh Govindan, Gaurav S. Sukhatme, and Milind Tambe. 2011. “Mitigating Multi-Path Fading in a Mobile Mesh Network.” Ad Hoc Networks Journal.

##### BibTex Citation
@article{adhoc11-vieira,
author = {Vieira, Marcos A. M. and Taylor, Matthew E. and Tandon, Prateek and Jain, Manish and Govindan, Ramesh and Sukhatme, Gaurav S. and Tambe, Milind},
title = {{Mitigating Multi-path Fading in a Mobile Mesh Network}},
journal = {{Ad Hoc Networks Journal}},
year = {2011}
}

• DCOP

2011 Matthew E. Taylor, Manish Jain, Prateek Tandon, Makoto Yokoo, and Milind Tambe. 2011. “Distributed On-Line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation.” Advances in Complex Systems.

##### BibTex Citation
@article{acs11-taylor,
author = {Taylor, Matthew E. and Jain, Manish and Tandon, Prateek and Yokoo, Makoto and Tambe, Milind},
title = {{Distributed On-line Multi-Agent Optimization Under Uncertainty: Balancing Exploration and Exploitation}},
journal = {{Advances in Complex Systems}},
year = {2011}
}

• Reinforcement Learning, Transfer Learning

2011 Matthew E. Taylor, and Peter Stone. 2011. “An Introduction to Inter-Task Transfer for Reinforcement Learning.” AI Magazine 32 (1): 15–34.

##### BibTex Citation
@article{aaaimag11-taylor,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{An Introduction to Inter-task Transfer for Reinforcement Learning}},
journal = {{{AI} Magazine}},
year = {2011},
volume = {32},
number = {1},
pages = {15--34}
}

• Transfer Learning, Reinforcement Learning

2011 Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. March 2011. “Using Human Demonstrations to Improve Reinforcement Learning.” The AAAI 2011 Spring Symposium — Help Me Help You: Bridging the Gaps in Human-Agent Collaboration.

##### Abstract

This work introduces Human-Agent Transfer (HAT), an algorithm that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations transferred into a baseline policy for an agent and refined using reinforcement learning significantly improve both learning time and policy performance. Our evaluation compares three algorithmic approaches to incorporating demonstration rule summaries into transfer learning, and studies the impact of demonstration quality and quantity. Our results show that all three transfer methods lead to statistically significant improvement in performance over learning without demonstration.

HMHY2011

##### BibTex Citation
@misc{aaai11symp-taylor,
author = {Taylor, Matthew E. and Suay, Halit Bener and Chernova, Sonia},
title = {{Using Human Demonstrations to Improve Reinforcement Learning}},
booktitle = {{The {AAAI} 2011 Spring Symposium --- Help Me Help You: Bridging the Gaps in Human-Agent Collaboration}},
month = mar,
year = {2011},
month_numeric = {3}
}

• Reinforcement Learning

2011 Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. April 2011. “Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning.” Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning ( ADPRL ).

##### BibTex Citation
@misc{adprl11-whiteson,
author = {Whiteson, Shimon and Tanner, Brian and Taylor, Matthew E. and Stone, Peter},
title = {{Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning}},
booktitle = {{Proceedings of the {IEEE} Symposium on Adaptive Dynamic Programming and Reinforcement Learning ( {ADPRL} )}},
month = apr,
year = {2011},
month_numeric = {4}
}

• Distributed POMDPs

2011 Jun-young Kwak, Rong Yang, Zhengyu Yin, Matthew E. Taylor, and Milind Tambe. May 2011. “Teamwork in Distributed POMDPs: Execution-Time Coordination Under Model Uncertainty (Poster).” In International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). Extended Abstract: 22% acceptance rate for papers, additional 25% for extended abstracts

AAMAS-11

##### BibTex Citation
@inproceedings{11aamas-kwak,
author = {Kwak, Jun-young and Yang, Rong and Yin, Zhengyu and Taylor, Matthew E. and Tambe, Milind},
title = {{Teamwork in Distributed {POMDP}s: Execution-time Coordination Under Model Uncertainty (Poster)}},
booktitle = {{International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2011},
month_numeric = {5}
}

• 2011 Paul Scerri, Balajee Kannan, Pras Velagapudi, Kate Macarthur, Peter Stone, Matthew E. Taylor, John Dolan, et al. May 2011. “Flood Disaster Mitigation: A Real-World Challenge Problem for Multi-Agent Unmanned Surface Vehicles.” Proceedings of the Autonomous Robots and Multirobot Systems Workshop ( AAMAS ).

##### Abstract

As we advance the state of technology for robotic systems, there is a need for defining complex real-world challenge problems for the multi-agent/robot community to address. A well-defined challenge problem can motivate researchers to aggressively address and overcome core domain challenges that might otherwise take years to solve. As the focus of multi-agent research shifts from the mature domains of UGVs and UAVs to USVs, there is a need for outlining well-defined and realistic challenge problems. In this position paper, we define one such problem, flood disaster mitigation. The ability to respond quickly and effectively to disasters is essential to saving lives and limiting the scope of damage. The nature of floods dictates the need for a fleet of low-cost and small autonomous boats that can provide situational awareness (SA) and damage assessment, and deliver supplies before more traditional emergency response assets can access an affected area. In addition to addressing an essential need, the outlined application provides an interesting challenge problem for advancing fundamental research in multi-agent systems (MAS) specific to the USV domain. In this paper, we define a technical statement of this MAS challenge problem and outline MAS-specific technical constraints based on the associated real-world constraints. Core MAS sub-problems that must be solved for this application include coordination, control, human interaction, autonomy, task allocation, and communication. This problem provides a concrete and real-world MAS application that will bring together researchers with a diverse range of expertise to develop and implement the necessary algorithms and mechanisms.

ARMS-11

##### BibTex Citation
@misc{arms11-scerri,
author = {Scerri, Paul and Kannan, Balajee and Velagapudi, Pras and Macarthur, Kate and Stone, Peter and Taylor, Matthew E. and Dolan, John and Farinelli, Alessandro and Chapman, Archie and Dias, Bernadine and Kantor, George},
title = {{Flood Disaster Mitigation: A Real-world Challenge Problem for Multi-Agent Unmanned Surface Vehicles}},
booktitle = {{Proceedings of the Autonomous Robots and Multirobot Systems workshop ( {AAMAS} )}},
month = may,
year = {2011},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning

2011 Haitham Bou Ammar, and Matthew E. Taylor. May 2011. “Common Subspace Transfer for Reinforcement Learning Tasks.” Proceedings of the Adaptive and Learning Agents Workshop (AAMAS).

ALA-11

##### BibTex Citation
@misc{ala11-ammar,
author = {Ammar, Haitham Bou and Taylor, Matthew E.},
title = {Common Subspace Transfer for Reinforcement Learning Tasks},
booktitle = {Proceedings of the Adaptive and Learning Agents workshop (AAMAS)},
month = may,
year = {2011},
month_numeric = {5}
}

• Distributed POMDPs

2011 Jun-young Kwak, Zhengyu Yin, Rong Yang, Matthew E. Taylor, and Milind Tambe. May 2011. “Robust Execution-Time Coordination in DEC-POMDPs Under Model Uncertainty.” Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains ( AAMAS ).

MSDM-11

##### BibTex Citation
@misc{msdm11-kwak,
author = {Kwak, Jun-young and Yin, Zhengyu and Yang, Rong and Taylor, Matthew E. and Tambe, Milind},
title = {{Robust Execution-time Coordination in {DEC-POMDPs} Under Model Uncertainty}},
booktitle = {{Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains ( {AAMAS} )}},
month = may,
year = {2011},
month_numeric = {5}
}

• DCOP

2011 Scott Alfeld, Kumera Berkele, Stephen A. Desalvo, Tong Pham, Daniel Russo, Lisa Yan, and Matthew E. Taylor. May 2011. “Reducing the Team Uncertainty Penalty: Empirical and Theoretical Approaches.” Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS).

MSDM-11

##### BibTex Citation
@misc{msdm11-alfeld,
author = {Alfeld, Scott and Berkele, Kumera and Desalvo, Stephen A. and Pham, Tong and Russo, Daniel and Yan, Lisa and Taylor, Matthew E.},
title = {Reducing the Team Uncertainty Penalty: Empirical and Theoretical Approaches},
booktitle = {Proceedings of the Workshop on Multiagent Sequential Decision Making in Uncertain Domains (AAMAS)},
month = may,
year = {2011},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning

2011 Matthew E. Taylor, Halit Bener Suay, and Sonia Chernova. May 2011. “Integrating Reinforcement Learning with Human Demonstrations of Varying Ability.” In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). 22% acceptance rate

AAMAS-11

##### BibTex Citation
@inproceedings{11aamas-hat-taylor,
author = {Taylor, Matthew E. and Suay, Halit Bener and Chernova, Sonia},
title = {{Integrating Reinforcement Learning with Human Demonstrations of Varying Ability}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2011},
month_numeric = {5}
}

• Reinforcement Learning

2011 Matthew E. Taylor, Brian Kulis, and Fei Sha. May 2011. “Metric Learning for Reinforcement Learning Agents.” In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). 22% acceptance rate

AAMAS-11

##### BibTex Citation
@inproceedings{11aamas-metriclearn-taylor,
author = {Taylor, Matthew E. and Kulis, Brian and Sha, Fei},
title = {{Metric Learning for Reinforcement Learning Agents}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2011},
month_numeric = {5}
}

• 2011 Jason Tsai, Natalie Fridman, Emma Bowring, Matthew Brown, Shira Epstein, Gal Kaminka, Stacy Marsella, et al. May 2011. “ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison.” In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ( AAMAS ). 22% acceptance rate

AAMAS-11

##### BibTex Citation
@inproceedings{11aamas-tsai,
author = {Tsai, Jason and Fridman, Natalie and Bowring, Emma and Brown, Matthew and Epstein, Shira and Kaminka, Gal and Marsella, Stacy and Ogden, Andrew and Rika, Inbal and Sheel, Ankur and Taylor, Matthew E. and Wang, Xuezhi and Zilka, Avishay and Tambe, Milind},
title = {{ESCAPES: Evacuation Simulation with Children, Authorities, Parents, Emotions, and Social Comparison}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = may,
year = {2011},
month_numeric = {5}
}

• 2011 W. Bradley Knox, Matthew E. Taylor, and Peter Stone. July 2011. “Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report.” Proceedings of the Agents Learning Interactively from Human Teachers Workshop ( IJCAI ).

ALIHT-11

##### BibTex Citation
@misc{aliht11-knox,
author = {Knox, W. Bradley and Taylor, Matthew E. and Stone, Peter},
title = {{Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report}},
booktitle = {{Proceedings of the Agents Learning Interactively from Human Teachers workshop ( {IJCAI} )}},
month = jul,
year = {2011},
month_numeric = {7}
}

• Distributed POMDPs

2011 Jun-young Kwak, Rong Yang, Zhengyu Yin, Matthew E. Taylor, and Milind Tambe. August 2011. “Towards Addressing Model Uncertainty: Robust Execution-Time Coordination for Teamwork (Short Paper).” In The IEEE/WIC/ACM International Conference on Intelligent Agent Technology ( IAT ). Short Paper: 21% acceptance rate for papers, additional 28% for short papers

##### Abstract

Despite their worst-case NEXP-complete planning complexity, DEC-POMDPs remain a popular framework for multiagent teamwork. This paper introduces effective teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) as a novel challenge for DEC-POMDPs and presents MODERN, the first execution-centric framework for DEC-POMDPs explicitly motivated by addressing such model uncertainty. MODERN’s shift of coordination reasoning from planning-time to execution-time avoids the high cost of computing optimal plans whose promised quality may not be realized in practice. There are three key ideas in MODERN: (i) it maintains an exponentially smaller model of other agents’ beliefs and actions than in previous work and then further reduces the computation time and space expense of this model via bounded pruning; (ii) it reduces execution-time computation by exploiting BDI theories of teamwork, and limits communication to key trigger points; and (iii) it limits its decision-theoretic reasoning about communication to trigger points and uses a systematic markup to encourage extra communication at these points - thus reducing uncertainty among team members at trigger points. We empirically show that MODERN is substantially faster than existing DEC-POMDP execution-centric methods while achieving significantly higher reward.

IAT-11

##### BibTex Citation
@inproceedings{11iat-kwak,
author = {Kwak, Jun-young and Yang, Rong and Yin, Zhengyu and Taylor, Matthew E. and Tambe, Milind},
title = {{Towards Addressing Model Uncertainty: Robust Execution-time Coordination for Teamwork (Short Paper)}},
booktitle = {{The {IEEE/WIC/ACM} International Conference on Intelligent Agent Technology ( {IAT} )}},
month = aug,
year = {2011},
month_numeric = {8}
}

• Reinforcement Learning, Pedagogy

2011 Matthew E. Taylor. August 2011. “Model Assignment: Reinforcement Learning in a Generalized Mario Domain.” Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence.

##### BibTex Citation
@misc{eaai11-modelassignment,
author = {Taylor, Matthew E.},
title = {{Model Assignment: Reinforcement Learning in a Generalized Mario Domain}},
booktitle = {{Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence}},
month = aug,
year = {2011},
month_numeric = {8}
}

• Reinforcement Learning, Pedagogy

2011 Matthew E. Taylor. August 2011. “Teaching Reinforcement Learning with Mario: An Argument and Case Study.” Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence.

EAAI-11

##### BibTex Citation
@misc{eaai11-taylor,
author = {Taylor, Matthew E.},
title = {{Teaching Reinforcement Learning with Mario: An Argument and Case Study}},
booktitle = {{Proceedings of the Second Symposium on Educational Advances in Artificial Intelligence}},
month = aug,
year = {2011},
month_numeric = {8}
}

• 2011 Anestis Fachantidis, Ioannis Partalas, Matthew E. Taylor, and Ioannis Vlahavas. September 2011. “Transfer Learning via Multiple Inter-Task Mappings.” Proceedings of European Workshop on Reinforcement Learning ( ECML ).

EWRL-11

##### BibTex Citation
@misc{ewrl11-fachantidis,
author = {Fachantidis, Anestis and Partalas, Ioannis and Taylor, Matthew E. and Vlahavas, Ioannis},
title = {{Transfer Learning via Multiple Inter-Task Mappings}},
booktitle = {{Proceedings of European Workshop on Reinforcement Learning ( {ECML} )}},
month = sep,
year = {2011},
month_numeric = {9}
}

• Reinforcement Learning, Transfer Learning

2011 Haitham Bou Ammar, Matthew E. Taylor, and Karl Tuyls. November 2011. “Common Sub-Space Transfer for Reinforcement Learning Tasks (Poster).” In The 23rd Benelux Conference on Artificial Intelligence ( BNAIC ). 44% overall acceptance rate

BNAIC-11

##### BibTex Citation
@inproceedings{11bnaic-ammar,
author = {Ammar, Haitham Bou and Taylor, Matthew E. and Tuyls, Karl},
title = {{Common Sub-Space Transfer for Reinforcement Learning Tasks (Poster)}},
booktitle = {{The 23rd Benelux Conference on Artificial Intelligence ( {BNAIC} )}},
month = nov,
year = {2011},
month_numeric = {11}
}

• Reinforcement Learning, Transfer Learning

2011 Haitham Bou Ammar, Matthew E. Taylor, Karl Tuyls, and Gerhard Weiss. November 2011. “Reinforcement Learning Transfer Using a Sparse Coded Inter-Task Mapping.” Proceedings of the European Workshop on Multi-Agent Systems.

EUMAS-11

##### BibTex Citation
@misc{eumass11-amar,
author = {Ammar, Haitham Bou and Taylor, Matthew E. and Tuyls, Karl and Weiss, Gerhard},
title = {{Reinforcement Learning Transfer using a Sparse Coded Inter-Task Mapping}},
booktitle = {{Proceedings of the European Workshop on Multi-agent Systems}},
month = nov,
year = {2011},
month_numeric = {11}
}


## 2010

• Reinforcement Learning

2010 Matthew E. Taylor, and Karl Tuyls, eds. 2010. Adaptive Agents and Multi-Agent Systems IV. Vol. 5924. Lecture Notes in Computer Science. Springer-Verlag.

##### Notes

Many chapters are extended versions of papers appearing at the AAMAS 2009 workshop on Adaptive and Learning Agents. Publisher’s website: http://www.springer.com/computer/ai/book/978-3-642-11813-5

##### BibTex Citation
@book{springer10,
editor = {Taylor, Matthew E. and Tuyls, Karl},
title = {{Adaptive Agents and Multi-Agent Systems {IV}}},
year = {2010},
publisher = {Springer-Verlag},
series = {Lecture Notes in Computer Science},
volume = {5924},
isbn = {978-3-642-11813-5}
}

• Reinforcement Learning

2010 Marc Ponsen, Matthew E. Taylor, and Karl Tuyls. 2010. “Abstraction and Generalization in Reinforcement Learning.” In Adaptive Agents and Multi-Agent Systems IV, edited by Matthew E. Taylor and Karl Tuyls, 5924:1–33. Springer-Verlag.

##### BibTex Citation
@incollection{ponsen10,
author = {Ponsen, Marc and Taylor, Matthew E. and Tuyls, Karl},
title = {{Abstraction and Generalization in Reinforcement Learning}},
booktitle = {{Adaptive Agents and Multi-Agent Systems {IV}}},
editor = {Taylor, Matthew E. and Tuyls, Karl},
publisher = {Springer-Verlag},
year = {2010},
pages = {1--33},
volume = {5924}
}

• Security

2010 Matthew E. Taylor, Christopher Kiekintveld, Craig Western, and Milind Tambe. 2010. “A Framework for Evaluating Deployed Security Systems: Is There a Chink in Your ARMOR?” Informatica 34 (2): 129–39.

Funded by CREATE

##### Abstract

A growing number of security applications are being developed and deployed to explicitly reduce risk from adversaries’ actions. However, there are many challenges when attempting to evaluate such systems, both in the lab and in the real world. Traditional evaluations used by computer scientists, such as runtime analysis and optimality proofs, may be largely irrelevant. The primary contribution of this paper is to provide a preliminary framework which can guide the evaluation of such systems and to apply the framework to the evaluation of ARMOR (a system deployed at LAX since August 2007). This framework helps to determine what evaluations could, and should, be run in order to measure a system’s overall utility. A secondary contribution of this paper is to help familiarize our community with some of the difficulties inherent in evaluating deployed applications, focusing on those in security domains.

##### BibTex Citation
@article{informatica10-taylor,
author = {Taylor, Matthew E. and Kiekintveld, Christopher and Western, Craig and Tambe, Milind},
title = {{A Framework for Evaluating Deployed Security Systems: Is There a Chink in your {ARMOR}?}},
journal = {{Informatica}},
year = {2010},
volume = {34},
number = {2},
pages = {129--139}
}

• Reinforcement Learning, Machine Learning in Practice

2010 Shimon Whiteson, Matthew E. Taylor, and Peter Stone. 2010. “Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning.” Journal of Autonomous Agents and Multi-Agent Systems 21 (1): 1–27.

##### Abstract

Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods’ relative performance: 1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa’s learning updates are not reliable in the absence of the Markov property and 2) that stochasticity, by introducing noise in fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both these hypotheses.

##### BibTex Citation
@article{jaamas09-whiteson,
author = {Whiteson, Shimon and Taylor, Matthew E. and Stone, Peter},
title = {{Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning}},
journal = {{Journal of Autonomous Agents and Multi-Agent Systems}},
year = {2010},
volume = {21},
number = {1},
pages = {1--27}
}

• Transfer Learning, Reinforcement Learning

2010 Matthew E. Taylor and Sonia Chernova. May 2010. “Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer.” Proceedings of the Agents Learning Interactively from Human Teachers Workshop (AAMAS).

##### Abstract

This work introduces Human-Agent Transfer (HAT), a method that combines transfer learning, learning from demonstration and reinforcement learning to achieve rapid learning and high performance in complex domains. Using experiments in a simulated robot soccer domain, we show that human demonstrations can be transferred into a baseline policy for an agent, and reinforcement learning can be used to significantly improve policy performance. These results are an important initial step that suggest that agents can not only quickly learn to mimic human actions, but that they can also learn to surpass the abilities of the teacher.

ALIHT-10

##### BibTex Citation
@misc{aliht10-taylor,
author = {Taylor, Matthew E. and Chernova, Sonia},
title = {{Integrating Human Demonstration and Reinforcement Learning: Initial Results in Human-Agent Transfer}},
booktitle = {{Proceedings of the Agents Learning Interactively from Human Teachers workshop ({AAMAS})}},
month = may,
year = {2010},
month_numeric = {5}
}

• DCOP, Robotics

2010 Scott Alfeld, Matthew E. Taylor, Prateek Tandon, and Milind Tambe. May 2010. “Towards a Theoretic Understanding of DCEE.” Proceedings of the Distributed Constraint Reasoning Workshop (AAMAS).

##### Abstract

Common wisdom says that the greater the level of teamwork, the higher the performance of the team. In teams of cooperative autonomous agents, working together rather than independently can increase the team reward. However, recent results show that in uncertain environments, increasing the level of teamwork can actually decrease overall performance. Coined the team uncertainty penalty, this phenomenon has been shown empirically in simulation, but the underlying mathematics are not yet understood. By understanding the mathematics, we could develop algorithms that reduce or eliminate this penalty of increased teamwork.
In this paper we investigate the team uncertainty penalty on two fronts. First, we provide results of robots exhibiting the same behavior seen in simulations. Second, we present a mathematical foundation by which to analyze the phenomenon. Using this model, we present findings indicating that the team uncertainty penalty is inherent to the level of teamwork allowed, rather than to specific algorithms.

DCR-10

##### BibTex Citation
@misc{dcr10-alfeld,
author = {Alfeld, Scott and Taylor, Matthew E. and Tandon, Prateek and Tambe, Milind},
title = {{Towards a Theoretic Understanding of {DCEE}}},
booktitle = {{Proceedings of the Distributed Constraint Reasoning workshop ({AAMAS})}},
month = may,
year = {2010},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning, Robotics

2010 Samuel Barrett, Matthew E. Taylor, and Peter Stone. May 2010. “Transfer Learning for Reinforcement Learning on a Physical Robot.” Proceedings of the Adaptive and Learning Agents Workshop (AAMAS).

##### Abstract

As robots become more widely available, many capabilities that were once only practical to develop and test in simulation are becoming feasible on real, physically grounded, robots. This newfound feasibility is important because simulators rarely represent the world with sufficient fidelity that developed behaviors will work as desired in the real world. However, development and testing on robots remains difficult and time consuming, so it is desirable to minimize the number of trials needed when developing robot behaviors.
This paper focuses on reinforcement learning (RL) on physically grounded robots. A few noteworthy exceptions notwithstanding, RL has typically been done purely in simulation, or, at best, initially in simulation with the eventual learned behaviors run on a real robot. However, some recent RL methods exhibit sufficiently low sample complexity to enable learning entirely on robots. One such method is transfer learning for RL. The main contribution of this paper is the first empirical demonstration that transfer learning can significantly speed up and even improve asymptotic performance of RL done entirely on a physical robot. In addition, we show that transferring information learned in simulation can bolster additional learning on the robot.

ALA-10

##### BibTex Citation
@misc{ala10-barrett,
author = {Barrett, Samuel and Taylor, Matthew E. and Stone, Peter},
title = {{Transfer Learning for Reinforcement Learning on a Physical Robot}},
booktitle = {{Proceedings of the Adaptive and Learning Agents workshop ({AAMAS})}},
month = may,
year = {2010},
month_numeric = {5}
}

• DCOP

2010 Matthew E. Taylor, Manish Jain, Yanquin Jin, Makoto Yokoo, and Milind Tambe. May 2010. “When Should There Be a ‘Me’ in ‘Team’? Distributed Multi-Agent Optimization Under Uncertainty.” In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 24% acceptance rate

##### Abstract

Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored, which we term the team uncertainty penalty. This paper introduces the above phenomenon, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting.

##### Notes

Supplemental material is available at http://teamcore.usc.edu/dcop/.

##### BibTex Citation
@inproceedings{aamas10-taylor,
author = {Taylor, Matthew E. and Jain, Manish and Jin, Yanquin and Yokoo, Makoto and Tambe, Milind},
title = {{When Should There be a ``Me'' in ``Team''? {D}istributed Multi-Agent Optimization Under Uncertainty}},
booktitle = {{Proceedings of the International Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month = may,
year = {2010},
month_numeric = {5}
}

• Reinforcement Learning, Genetic Algorithms

2010 Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Bertrand A. Maher, Doug Burger, and Kathryn S. McKinley. July 2010. “Evolving Compiler Heuristics to Manage Communication and Contention.” In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI). Nectar Track, 25% acceptance rate

Funded by NSF, DARPA

##### Abstract

As computer architectures become increasingly complex, hand-tuning compiler heuristics becomes increasingly tedious and time consuming for compiler developers. This paper presents a case study that uses a genetic algorithm to learn a compiler policy. The target policy implicitly balances communication and contention among processing elements of the TRIPS processor, a physically realized prototype chip. We learn specialized policies for individual programs as well as general policies that work well across all programs. We also employ a two-stage method that first classifies the code being compiled based on salient characteristics, and then chooses a specialized policy based on that classification.
This work is particularly interesting for the AI community because it (1) emphasizes the need for increased collaboration between AI researchers and researchers from other branches of computer science and (2) discusses a machine learning setup where training on the custom hardware requires weeks of training, rather than the more typical minutes or hours.

##### Notes

AAAI-2010. This paper is based on results presented in our earlier PACT-08 paper.

##### BibTex Citation
@inproceedings{aaai10-nectar-taylor,
author = {Taylor, Matthew E. and Coons, Katherine E. and Robatmili, Behnam and Maher, Bertrand A. and Burger, Doug and McKinley, Kathryn S.},
title = {{Evolving Compiler Heuristics to Manage Communication and Contention}},
booktitle = {{Proceedings of the Twenty-Fourth Conference on Artificial Intelligence ({AAAI})}},
month = jul,
year = {2010},
month_numeric = {7}
}


## 2009

• Reinforcement Learning, Transfer Learning

2009 Matthew E. Taylor and Peter Stone. 2009. “Transfer Learning for Reinforcement Learning Domains: A Survey.” Journal of Machine Learning Research 10 (1): 1633–85.

Funded by NSF, DARPA

##### Abstract

The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.

##### BibTex Citation
@article{jmlr09-taylor,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Transfer Learning for Reinforcement Learning Domains: A Survey}},
journal = {{Journal of Machine Learning Research}},
volume = {10},
number = {1},
pages = {1633--1685},
year = {2009}
}

• Reinforcement Learning, Transfer Learning

2009 Matthew E. Taylor. 2009. Transfer in Reinforcement Learning Domains. Vol. 216. Studies in Computational Intelligence. Springer-Verlag.

##### Notes

A book based on my PhD thesis.
Publishers Webpage.

##### BibTex Citation
@book{springer09,
author = {Taylor, Matthew E.},
title = {{Transfer in Reinforcement Learning Domains}},
year = {2009},
publisher = {Springer-Verlag},
series = {Studies in Computational Intelligence},
volume = {216},
isbn = {978-3-642-01881-7}
}

• Transfer Learning, Reinforcement Learning

2009 Matthew E. Taylor. March 2009. “Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design.” The AAAI 2009 Spring Symposium on Agents That Learn from Human Teachers.

##### Abstract

Transfer learning is a successful technique that significantly improves machine learning algorithms by training on a sequence of tasks rather than a single task in isolation. However, there is currently no systematic method for deciding how to construct such a sequence of tasks. In this paper, I propose that while humans are well-suited for the task of curriculum development, significant research is still necessary to better understand how to create effective curricula for machine learning algorithms.

##### BibTex Citation
@misc{aaai09ss-taylor,
author = {Taylor, Matthew E.},
title = {{Assisting Transfer-Enabled Machine Learning Algorithms: Leveraging Human Knowledge for Curriculum Design}},
booktitle = {{The {AAAI} 2009 Spring Symposium on Agents that Learn from Human Teachers}},
month = mar,
year = {2009},
month_numeric = {3}
}

• DCOP, Robotics

2009 Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. May 2009. “DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks.” Proceedings of the Third International Workshop on Agent Technology for Sensor Networks (AAMAS).

Funded by DARPA

##### Abstract

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

##### BibTex Citation
@misc{atsn09-jain,
author = {Jain, Manish and Taylor, Matthew E. and Yokoo, Makoto and Tambe, Milind},
title = {{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
booktitle = {{Proceedings of the Third International Workshop on Agent Technology for Sensor Networks ({AAMAS})}},
month = may,
year = {2009},
month_numeric = {5}
}

• Distributed POMDPs

2009 Jun-young Kwak, Pradeep Varakantham, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. May 2009. “Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping.” Proceedings of the Fourth Workshop on Multi-Agent Sequential Decision-Making in Uncertain Domains (AAMAS).

Funded by ARMY

##### Abstract

While distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, a novel algorithm to solve such distributed POMDPs. Two major novelties in TREMOR are (i) the use of social model shaping to coordinate agents and (ii) harnessing efficient single-agent POMDP solvers. Experimental results demonstrate that TREMOR may provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

##### Notes

MSDM-2009
Superseded by the ICAPS-09 conference paper Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping.

##### BibTex Citation
@misc{msdm09-kwak,
author = {Kwak, Jun-young and Varakantham, Pradeep and Taylor, Matthew E. and Marecki, Janusz and Scerri, Paul and Tambe, Milind},
title = {{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
booktitle = {{Proceedings of the Fourth Workshop on Multi-agent Sequential Decision-Making in Uncertain Domains ({AAMAS})}},
month = may,
year = {2009},
month_numeric = {5}
}

• Security

2009 Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. May 2009. “Beyond Runtimes and Optimality: Challenges and Opportunities in Evaluating Deployed Security Systems.” Proceedings of the AAMAS-09 Workshop on Agent Design: Advancing from Practice to Theory.

Funded by CREATE

##### Abstract

As multi-agent research transitions into the real world, evaluation becomes an increasingly important challenge. One can run controlled and repeatable tests in a laboratory environment, but such tests may be difficult, or even impossible, once the system is deployed. Furthermore, traditional metrics used by computer scientists, such as runtime analysis, may be largely irrelevant.

##### BibTex Citation
@misc{adapt09-taylor,
author = {Taylor, Matthew E. and Kiekintveld, Chris and Western, Craig and Tambe, Milind},
title = {{Beyond Runtimes and Optimality: Challenges and Opportunities in Evaluating Deployed Security Systems}},
booktitle = {{Proceedings of the {AAMAS}-09 Workshop on Agent Design: Advancing from Practice to Theory}},
month = may,
year = {2009},
month_numeric = {5}
}

• Reinforcement Learning, Transfer Learning

2009 Matthew E. Taylor and Peter Stone. June 2009. “Categorizing Transfer for Reinforcement Learning.” Poster at the Multidisciplinary Symposium on Reinforcement Learning.

##### BibTex Citation
@misc{msrl09-taylor,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Categorizing Transfer for Reinforcement Learning}},
booktitle = {{Poster at the Multidisciplinary Symposium on Reinforcement Learning}},
month = jun,
year = {2009},
month_numeric = {6}
}

• Reinforcement Learning

2009 Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone. June 2009. “Generalized Domains for Empirical Evaluations in Reinforcement Learning.” Proceedings of the Fourth Workshop on Evaluation Methods for Machine Learning at ICML-09.

##### Abstract

Many empirical results in reinforcement learning are based on a very small set of environments. These results often represent the best algorithm parameters that were found after an ad-hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to method overfitting, wherein results may not generalize to similar environments. To address this problem, we advocate empirical evaluations using generalized domains: parameterized problem generators that explicitly encode variations in the environment to which the learner should be robust. We argue that evaluating across a set of these generated problems offers a more meaningful evaluation of reinforcement learning algorithms.

##### BibTex Citation
@misc{icmlws09-whiteson,
author = {Whiteson, Shimon and Tanner, Brian and Taylor, Matthew E. and Stone, Peter},
title = {{Generalized Domains for Empirical Evaluations in Reinforcement Learning}},
booktitle = {{Proceedings of the Fourth Workshop on Evaluation Methods for Machine Learning at {ICML}-09}},
month = jun,
year = {2009},
month_numeric = {6}
}

• DCOP

2009 Matthew E. Taylor, Manish Jain, Prateek Tandon, and Milind Tambe. July 2009. “Using DCOPs to Balance Exploration and Exploitation in Time-Critical Domains.” Proceedings of the IJCAI 2009 Workshop on Distributed Constraint Reasoning.

Funded by ARMY

##### Abstract

Substantial work has investigated balancing exploration and exploitation, but relatively little has addressed this tradeoff in the context of coordinated multi-agent interactions. This paper introduces a class of problems in which agents must maximize their on-line reward, a decomposable function dependent on pairs of agents’ decisions. Unlike previous work, agents must both learn the reward function and exploit it on-line, critical properties for a class of physically-motivated systems, such as mobile wireless networks. This paper introduces algorithms motivated by the Distributed Constraint Optimization Problem framework and demonstrates when, and at what cost, increasing agents’ coordination can improve the global reward on such problems.

DCR-2009

##### BibTex Citation
@misc{dcr09-taylor,
author = {Taylor, Matthew E. and Jain, Manish and Tandon, Prateek and Tambe, Milind},
title = {{Using {DCOP}s to Balance Exploration and Exploitation in Time-Critical Domains}},
booktitle = {{Proceedings of the {IJCAI} 2009 Workshop on Distributed Constraint Reasoning}},
month = jul,
year = {2009},
month_numeric = {7}
}

• Security

2009 Matthew E. Taylor, Chris Kiekintveld, Craig Western, and Milind Tambe. July 2009. “Is There a Chink in Your ARMOR? Towards Robust Evaluations for Deployed Security Systems.” Proceedings of the IJCAI 2009 Workshop on Quantitative Risk Analysis for Security Applications.

Funded by CREATE

##### Notes

QRASA-2009
Superseded by the journal article A Framework for Evaluating Deployed Security Systems: Is There a Chink in your ARMOR?.

##### BibTex Citation
@misc{qrasa09-taylor,
author = {Taylor, Matthew E. and Kiekintveld, Chris and Western, Craig and Tambe, Milind},
title = {{Is There a Chink in Your {ARMOR}? {T}owards Robust Evaluations for Deployed Security Systems}},
booktitle = {{Proceedings of the {IJCAI} 2009 Workshop on Quantitative Risk Analysis for Security Applications}},
month = jul,
year = {2009},
month_numeric = {7}
}

• DCOP, Robotics

2009 Manish Jain, Matthew E. Taylor, Makoto Yokoo, and Milind Tambe. July 2009. “DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks.” In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI). 26% acceptance rate

Funded by DARPA

##### Abstract

Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.

IJCAI-2009

##### BibTex Citation
@inproceedings{ijcai09-jain,
author = {Jain, Manish and Taylor, Matthew E. and Yokoo, Makoto and Tambe, Milind},
title = {{{DCOP}s Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks}},
booktitle = {{Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence ({IJCAI})}},
month = jul,
year = {2009},
month_numeric = {7}
}

• Distributed POMDPs

2009 Pradeep Varakantham, Jun-young Kwak, Matthew E. Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. September 2009. “Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping.” In Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS). 34% acceptance rate

Funded by ARMY

##### Abstract

Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.

ICAPS-2009

##### BibTex Citation
@inproceedings{icaps09-varakantham,
author = {Varakantham, Pradeep and Kwak, Jun-young and Taylor, Matthew E. and Marecki, Janusz and Scerri, Paul and Tambe, Milind},
title = {{Exploiting Coordination Locales in Distributed {POMDP}s via Social Model Shaping}},
booktitle = {{Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling ({ICAPS})}},
month = sep,
year = {2009},
month_numeric = {9}
}

• Security

2009 Jason Tsai, Emma Bowring, Shira Epstein, Natalie Fridman, Prakhar Garg, Gal Kaminka, Andrew Ogden, Milind Tambe, and Matthew E. Taylor. November 2009. “Agent-Based Evacuation Modeling: Simulating the Los Angeles International Airport.” Proceedings of the Workshop on Emergency Management: Incident, Resource, and Supply Chain Management.

EMWS09-2009

##### BibTex Citation
@misc{emws09-tsai,
author = {Tsai, Jason and Bowring, Emma and Epstein, Shira and Fridman, Natalie and Garg, Prakhar and Kaminka, Gal and Ogden, Andrew and Tambe, Milind and Taylor, Matthew E.},
title = {{Agent-based Evacuation Modeling: Simulating the Los Angeles International Airport}},
booktitle = {{Proceedings of the Workshop on Emergency Management: Incident, Resource, and Supply Chain Management}},
month = nov,
year = {2009},
month_numeric = {11}
}


## 2008

• Transfer Learning

2008 Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. March 2008. “Transfer Learning and Intelligence: an Argument and Approach.” In Proceedings of the First Conference on Artificial General Intelligence (AGI). 50% acceptance rate

Funded by NSF, DARPA

##### Abstract

In order to claim fully general intelligence in an autonomous agent, the ability to learn is one of the most central capabilities. Classical machine learning techniques have had many significant empirical successes, but large real-world problems that are of interest to generally intelligent agents require learning much faster (with much less training experience) than is currently possible. This paper presents transfer learning, where knowledge from a learned task can be used to significantly speed up learning in a novel task, as the key to achieving the learning capabilities necessary for general intelligence. In addition to motivating the need for transfer learning in an intelligent agent, we introduce a novel method for selecting types of tasks to be used for transfer and empirically demonstrate that such a selection can lead to significant increases in training speed in a two-player game.

##### Notes

AGI-2008
A video of the talk is available here.

##### BibTex Citation
@inproceedings{agi08-taylor,
author = {Taylor, Matthew E. and Kuhlmann, Gregory and Stone, Peter},
title = {{Transfer Learning and Intelligence: an Argument and Approach}},
booktitle = {{Proceedings of the First Conference on Artificial General Intelligence ({AGI})}},
month = mar,
year = {2008},
month_numeric = {3}
}

• Transfer Learning, Reinforcement Learning, Planning

2008 Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. May 2008. “Transferring Instances for Model-Based Reinforcement Learning.” The Adaptive Learning Agents and Multi-Agent Systems (ALAMAS+ALAG) Workshop at AAMAS.

Funded by NSF, DARPA

##### Abstract

Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have only been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample complexity and asymptotic performance of a model-based algorithm when learning in a continuous state space.

##### BibTex Citation
@misc{aamas08-alamas-taylor,
author = {Taylor, Matthew E. and Jong, Nicholas K. and Stone, Peter},
title = {{Transferring Instances for Model-Based Reinforcement Learning}},
booktitle = {{The Adaptive Learning Agents and Multi-Agent Systems ({ALAMAS+ALAG}) workshop at {AAMAS}}},
month = may,
year = {2008},
month_numeric = {5}
}

• Reinforcement Learning, Transfer Learning

2008 Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. May 2008. “Autonomous Transfer for Reinforcement Learning.” In Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 283–90. 22% acceptance rate

Funded by DARPA, NSF

##### Abstract

Recent work in transfer learning has succeeded in making reinforcement learning algorithms more efficient by incorporating knowledge from previous tasks. However, such methods typically must be provided either a full model of the tasks or an explicit relation mapping one task into the other. An autonomous agent may not have access to such high-level information, but would be able to analyze its experience to find similarities between tasks. In this paper we introduce Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent’s experience. We empirically demonstrate that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks. Additionally, we demonstrate that our method may also assist with the difficult problem of task selection for transfer.

AAMAS-2008

##### BibTex Citation
@inproceedings{aamas08-taylor,
author = {Taylor, Matthew E. and Kuhlmann, Gregory and Stone, Peter},
title = {{Autonomous Transfer for Reinforcement Learning}},
booktitle = {{Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multiagent Systems ({AAMAS})}},
month = may,
year = {2008},
pages = {283--290},
month_numeric = {5}
}

• Transfer Learning, Reinforcement Learning, Planning

2008 Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. September 2008. “Transferring Instances for Model-Based Reinforcement Learning.” In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 488–505. 19% acceptance rate

Funded by NSF, DARPA

##### Abstract

Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have primarily been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample efficiency and asymptotic performance of a model-based algorithm when learning in a continuous state space. Additionally, we conduct experiments to test the limits of TIMBREL’s effectiveness.

ECML-2008

##### BibTex Citation
@inproceedings{ecml08-taylor,
author = {Taylor, Matthew E. and Jong, Nicholas K. and Stone, Peter},
title = {{Transferring Instances for Model-Based Reinforcement Learning}},
booktitle = {{Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ({ECML PKDD})}},
pages = {488--505},
month = sep,
year = {2008},
month_numeric = {9}
}

• Reinforcement Learning, Autonomic Computing, Machine Learning in Practice

2008 Katherine K. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, Kathryn McKinley, and Doug Burger. October 2008. “Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning.” In Proceedings of the Seventh International Joint Conference on Parallel Architectures and Compilation Techniques ( PACT ), 32–42. 19% acceptance rate

##### Abstract

Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly important performance factors. Explicit Dataflow Graph Execution (EDGE) processors, in which instructions communicate with one another directly on a distributed substrate, give the compiler control over communication overheads at a fine granularity. Prior work shows that compilers can effectively reduce fine-grained communication overheads in EDGE architectures using a spatial instruction placement algorithm with a heuristic-based cost function. While this algorithm is effective, the cost function must be painstakingly tuned. Heuristics tuned to perform well across a variety of applications leave users with little ability to tune performance-critical applications, yet we find that the best placement heuristics vary significantly with the application.

First, we suggest a systematic feature selection method that reduces the feature set size based on the extent to which features affect performance. To automatically discover placement heuristics, we then use these features as input to a reinforcement learning technique, called Neuro-Evolution of Augmenting Topologies (NEAT), that uses a genetic algorithm to evolve neural networks. We show that NEAT outperforms simulated annealing, the most commonly used optimization technique for instruction placement. We use NEAT to learn general heuristics that are as effective as hand-tuned heuristics, but we find that improving over highly hand-tuned general heuristics is difficult. We then suggest a hierarchical approach to machine learning that classifies segments of code with similar characteristics and learns heuristics for these classes. This approach performs closer to the specialized heuristics. Together, these results suggest that learning compiler heuristics may benefit from both improved feature selection and classification.

PACT-2008

##### BibTex Citation
@inproceedings{pact08-coons,
author = {Coons, Katherine K. and Robatmili, Behnam and Taylor, Matthew E. and Maher, Bertrand A. and McKinley, Kathryn and Burger, Doug},
title = {{Feature Selection and Policy Optimization for Distributed Instruction Placement Using Reinforcement Learning}},
booktitle = {{Proceedings of the Seventh International Joint Conference on Parallel Architectures and Compilation Techniques ( {PACT} )}},
month = oct,
year = {2008},
pages = {32--42},
month_numeric = {10}
}


## 2007

• Reinforcement Learning, Transfer Learning

2007 Matthew E. Taylor, Peter Stone, and Yaxin Liu. 2007. “Transfer Learning via Inter-Task Mappings for Temporal Difference Learning.” Journal of Machine Learning Research 8 (1): 2125–67.

Funded by NSF, DARPA

##### BibTex Citation
@article{jmlr07-taylor,
author = {Taylor, Matthew E. and Stone, Peter and Liu, Yaxin},
title = {{Transfer Learning via Inter-Task Mappings for Temporal Difference Learning}},
journal = {{Journal of Machine Learning Research}},
year = {2007},
volume = {8},
number = {1},
pages = {2125--2167}
}

• Reinforcement Learning

2007 Shimon Whiteson, Matthew E. Taylor, and Peter Stone. 2007. “Empirical Studies in Action Selection for Reinforcement Learning.” Adaptive Behavior 15 (1).

Funded by NSF, DARPA

##### Abstract

To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. This article aims to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together.

First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to each method’s performance. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.

##### BibTex Citation
@article{ab07-whiteson,
author = {Whiteson, Shimon and Taylor, Matthew E. and Stone, Peter},
title = {{Empirical Studies in Action Selection for Reinforcement Learning}},
journal = {{Adaptive Behavior}},
year = {2007},
volume = {15},
number = {1}
}

• Reinforcement Learning

2007 Shimon Whiteson, Matthew E. Taylor, and Peter Stone. 2007. “Adaptive Tile Coding for Value Function Approximation.” AI-TR-07-339. University of Texas at Austin.

Funded by NSF, DARPA

##### Abstract

Reinforcement learning problems are commonly tackled by estimating the optimal value function. In many real-world problems, learning this value function requires a function approximator, which maps states to values via a parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. This paper presents adaptive tile coding, a novel method that automates this design process for tile coding, a popular function approximator, by beginning with a simple representation with few tiles and refining it during learning by splitting existing tiles into smaller ones. In addition to automatically discovering effective representations, this approach provides a natural way to reduce the function approximator’s level of generalization over time. Empirical results in multiple domains compare two different criteria for deciding which tiles to split and verify that adaptive tile coding can automatically discover effective representations and that its speed of learning is competitive with the best fixed representations.

##### BibTex Citation
@techreport{whitesontr07,
author = {Whiteson, Shimon and Taylor, Matthew E. and Stone, Peter},
title = {{Adaptive Tile Coding for Value Function Approximation}},
institution = {University of Texas at Austin},
number = {AI-TR-07-339},
year = {2007}
}

• Reinforcement Learning, Transfer Learning

2007 Matthew E. Taylor and Peter Stone. May 2007. “Towards Reinforcement Learning Representation Transfer (Poster).” In The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ( AAMAS ), 683–85. Poster: 22% acceptance rate for talks, additional 25% for posters.

Funded by DARPA, NSF

##### Abstract

Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer methods are able to successfully transfer knowledge between agents in different reinforcement learning tasks, reducing the time needed to learn the target. However, the complementary task of representation transfer, i.e. transferring knowledge between agents with different internal representations, has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work introduces one such representation transfer algorithm which is implemented in a complex multiagent domain. Experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.

##### Notes

AAMAS-2007.
Superseded by the symposium paper Representation Transfer for Reinforcement Learning.

##### BibTex Citation
@inproceedings{aamas07-taylorrt,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Towards Reinforcement Learning Representation Transfer (Poster)}},
booktitle = {{The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
pages = {683--685},
month = may,
year = {2007},
month_numeric = {5}
}

• Reinforcement Learning, Transfer Learning

2007 Matthew E. Taylor, Shimon Whiteson, and Peter Stone. May 2007. “Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning.” In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ( AAMAS ), 156–63. 22% acceptance rate

Funded by DARPA, NSF

##### Abstract

The ambitious goal of transfer learning is to accelerate learning on a target task after training on a different, but related, source task. While many past transfer methods have focused on transferring value-functions, this paper presents a method for transferring policies across tasks with different state and action spaces. In particular, this paper utilizes transfer via inter-task mappings for policy search methods (TVITM-PS) to construct a transfer functional that translates a population of neural network policies trained via policy search from a source task to a target task. Empirical results in robot soccer Keepaway and Server Job Scheduling show that TVITM-PS can markedly reduce learning time when full inter-task mappings are available. The results also demonstrate that TVITM-PS still succeeds when given only incomplete inter-task mappings. Furthermore, we present a novel method for learning such mappings when they are not available, and give results showing they perform comparably to hand-coded mappings.

AAMAS-2007

##### BibTex Citation
@inproceedings{aamas07-taylor,
author = {Taylor, Matthew E. and Whiteson, Shimon and Stone, Peter},
title = {{Transfer via Inter-Task Mappings in Policy Search Reinforcement Learning}},
booktitle = {{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
pages = {156--163},
month = may,
year = {2007},
month_numeric = {5}
}

• Reinforcement Learning

2007 Mazda Ahmadi, Matthew E. Taylor, and Peter Stone. May 2007. “IFSA: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks.” In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ( AAMAS ), 1120–27. 22% acceptance rate, Finalist for Best Student Paper

Funded by DARPA, NSF, ONR

##### Abstract

Reinforcement learning is a popular and successful framework for many agent-related problems because only limited environmental feedback is necessary for learning. While many algorithms exist to learn effective policies in such problems, learning is often used to solve real-world problems, which typically have large state spaces, and therefore suffer from the “curse of dimensionality.” One effective method for speeding up reinforcement learning algorithms is to leverage expert knowledge. In this paper, we propose a method for dynamically augmenting the agent’s feature set in order to speed up value-function-based reinforcement learning. The domain expert divides the feature set into a series of subsets such that a novel problem concept can be learned from each successive subset. Domain knowledge is also used to order the feature subsets in order of their importance for learning. Our algorithm uses the ordered feature subsets to learn tasks significantly faster than if the entire feature set is used from the start. Incremental Feature-Set Augmentation (IFSA) is fully implemented and tested in three different domains: Gridworld, Blackjack and RoboCup Soccer Keepaway. All experiments show that IFSA can significantly speed up learning and motivate the applicability of this novel RL method.

##### Notes

Best Student Paper Nomination at AAMAS-2007.

##### BibTex Citation
@inproceedings{aamas07-ahmadi,
author = {Ahmadi, Mazda and Taylor, Matthew E. and Stone, Peter},
title = {{{IFSA}: Incremental Feature-Set Augmentation for Reinforcement Learning Tasks}},
booktitle = {{Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
pages = {1120--1127},
month = may,
year = {2007},
month_numeric = {5}
}

• Reinforcement Learning, Inference, Machine Learning in Practice

2007 Matthew E. Taylor, Cynthia Matuszek, Pace Reagan Smith, and Michael Witbrock. May 2007. “Guiding Inference with Policy Search Reinforcement Learning.” In Proceedings of the Twentieth International FLAIRS Conference ( FLAIRS ). 52% acceptance rate

Funded by DARPA

##### Abstract

Symbolic reasoning is a well understood and effective approach to handling reasoning over formally represented knowledge; however, simple symbolic inference systems necessarily slow as complexity and ground facts grow. As automated approaches to ontology-building become more prevalent and sophisticated, knowledge base systems become larger and more complex, necessitating techniques for faster inference. This work uses reinforcement learning, a statistical machine learning technique, to learn control laws which guide inference. We implement our learning method in ResearchCyc, a very large knowledge base with millions of assertions. A large set of test queries, some of which require tens of thousands of inference steps to answer, can be answered faster after training over an independent set of training queries. Furthermore, this learned inference module outperforms ResearchCyc’s integrated inference module, a module that has been hand-tuned with considerable effort.

FLAIRS-2007

##### BibTex Citation
@inproceedings{flairs07-taylor-inference,
author = {Taylor, Matthew E. and Matuszek, Cynthia and Smith, Pace Reagan and Witbrock, Michael},
title = {{Guiding Inference with Policy Search Reinforcement Learning}},
booktitle = {{Proceedings of the Twentieth International FLAIRS Conference ( {FLAIRS} )}},
month = may,
year = {2007},
month_numeric = {5}
}

• Reinforcement Learning, Ontologies, Machine Learning in Practice

2007 Matthew E. Taylor, Cynthia Matuszek, Bryan Klimt, and Michael Witbrock. May 2007. “Autonomous Classification of Knowledge into an Ontology.” In Proceedings of the Twentieth International FLAIRS Conference ( FLAIRS ). 52% acceptance rate

Funded by DARPA

##### Abstract

Ontologies are an increasingly important tool in knowledge representation, as they allow large amounts of data to be related in a logical fashion. Current research is concentrated on automatically constructing ontologies, merging ontologies with different structures, and optimal mechanisms for ontology building; in this work we consider the related, but distinct, problem of how to automatically determine where to place new knowledge into an existing ontology. Rather than relying on human knowledge engineers to carefully classify knowledge, it is becoming increasingly important for machine learning techniques to automate such a task. Automation is particularly important as the rate of ontology building via automatic knowledge acquisition techniques increases. This paper compares three well-established machine learning techniques and shows that they can be applied successfully to this knowledge placement task. Our methods are fully implemented and tested in the Cyc knowledge base system.

FLAIRS-2007

##### BibTex Citation
@inproceedings{flairs07-taylor-ontology,
author = {Taylor, Matthew E. and Matuszek, Cynthia and Klimt, Bryan and Witbrock, Michael},
title = {{Autonomous Classification of Knowledge into an Ontology}},
booktitle = {{Proceedings of the Twentieth International FLAIRS Conference ( {FLAIRS} )}},
month = may,
year = {2007},
month_numeric = {5}
}

• Reinforcement Learning, Transfer Learning

2007 Matthew E. Taylor and Peter Stone. June 2007. “Cross-Domain Transfer for Reinforcement Learning.” In Proceedings of the Twenty-Fourth International Conference on Machine Learning ( ICML ). 29% acceptance rate

Funded by NSF, DARPA

##### Abstract

A typical goal for transfer learning algorithms is to utilize knowledge gained in a source task to learn a target task faster. Recently introduced transfer methods in reinforcement learning settings have shown considerable promise, but they typically transfer between pairs of very similar tasks. This work introduces Rule Transfer, a transfer algorithm that first learns rules to summarize a source task policy and then leverages those rules to learn faster in a target task. This paper demonstrates that Rule Transfer can effectively speed up learning in Keepaway, a benchmark RL problem in the robot soccer domain, based on experience from source tasks in the gridworld domain. We empirically show, through the use of three distinct transfer metrics, that Rule Transfer is effective across these domains.

ICML-2007

##### BibTex Citation
@inproceedings{icml07-taylor,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Cross-Domain Transfer for Reinforcement Learning}},
booktitle = {{Proceedings of the Twenty-Fourth International Conference on Machine Learning ( {ICML} )}},
month = jun,
year = {2007},
month_numeric = {6}
}

• Reinforcement Learning, Genetic Algorithms

2007 Matthew E. Taylor, Shimon Whiteson, and Peter Stone. July 2007. “Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison.” In Proceedings of the Twenty-Second Conference on Artificial Intelligence ( AAAI ), 1675–78. Nectar Track, 38% acceptance rate

Funded by NSF, DARPA

##### Abstract

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weaknesses are available. This paper summarizes a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. The results from this study help isolate the factors critical to the performance of each learning method and yield insights into their general strengths and weaknesses.

AAAI-2007

##### BibTex Citation
@inproceedings{aaai07-taylor,
author = {Taylor, Matthew E. and Whiteson, Shimon and Stone, Peter},
title = {{Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison}},
pages = {1675--1678},
booktitle = {{Proceedings of the Twenty-Second Conference on Artificial Intelligence ( {AAAI} )}},
month = jul,
year = {2007},
month_numeric = {7}
}

• Transfer Learning, Planning

2007 Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone. September 2007. “Accelerating Search with Transferred Heuristics.” ICAPS-07 Workshop on AI Planning and Learning.

Funded by NSF, DARPA

##### BibTex Citation
@misc{icaps07ws-taylor,
author = {Taylor, Matthew E. and Kuhlmann, Gregory and Stone, Peter},
title = {{Accelerating Search with Transferred Heuristics}},
booktitle = {{{ICAPS}-07 workshop on AI Planning and Learning}},
month = sep,
year = {2007},
month_numeric = {9}
}

• Reinforcement Learning, Transfer Learning

2007 Matthew E. Taylor and Peter Stone. November 2007. “Representation Transfer for Reinforcement Learning.” AAAI 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development.

Funded by DARPA, NSF

##### Abstract

Transfer learning problems are typically framed as leveraging knowledge learned on a source task to improve learning on a related, but different, target task. Current transfer learning methods are able to successfully transfer knowledge from a source reinforcement learning task into a target task, reducing learning time. However, the complementary task of transferring knowledge between agents with different internal representations has not been well explored. The goal in both types of transfer problems is the same: reduce the time needed to learn the target with transfer, relative to learning the target without transfer. This work defines representation transfer, contrasts it with task transfer, and introduces two novel algorithms. Additionally, we show representation transfer algorithms can also be successfully used for task transfer, providing an empirical connection between the two problems. These algorithms are fully implemented in a complex multiagent domain and experiments demonstrate that transferring the learned knowledge between different representations is both possible and beneficial.

##### BibTex Citation
@misc{aaai07-symposium,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Representation Transfer for Reinforcement Learning}},
booktitle = {{{AAAI} 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development}},
month = nov,
year = {2007},
month_numeric = {11}
}

• Reinforcement Learning, Autonomic Computing, Machine Learning in Practice

2007 Matthew E. Taylor, Katherine E. Coons, Behnam Robatmili, Doug Burger, and Kathryn S. McKinley. December 2007. “Policy Search Optimization for Spatial Path Planning.” NIPS-07 Workshop on Machine Learning for Systems Problems. (Two page extended abstract.)

##### BibTex Citation
@misc{nips07-taylor,
author = {Taylor, Matthew E. and Coons, Katherine E. and Robatmili, Behnam and Burger, Doug and McKinley, Kathryn S.},
title = {{Policy Search Optimization for Spatial Path Planning}},
booktitle = {{{NIPS}-07 workshop on Machine Learning for Systems Problems}},
month = dec,
year = {2007},
month_numeric = {12}
}


## 2006

• Transfer Learning, Reinforcement Learning

2006 Matthew E. Taylor, Shimon Whiteson, and Peter Stone. June 2006. “Transfer Learning for Policy Search Methods.” ICML Workshop on Structural Knowledge Transfer for Machine Learning.

Funded by NSF, DARPA

##### Abstract

An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference approach to transfer in reinforcement learning tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies trained via genetic algorithms (GAs) from a source task to a target task. Empirical results in robot soccer Keepaway, a standard RL benchmark domain, demonstrate that transfer via inter-task mapping can markedly reduce the time required to learn a second, more complex, task.

##### BibTex Citation
@misc{icml06-taylor,
author = {Taylor, Matthew E. and Whiteson, Shimon and Stone, Peter},
title = {{Transfer Learning for Policy Search Methods}},
booktitle = {{{ICML} workshop on Structural Knowledge Transfer for Machine Learning}},
month = jun,
year = {2006},
month_numeric = {6}
}

• Reinforcement Learning, Genetic Algorithms, Machine Learning in Practice

2006 Matthew E. Taylor, Shimon Whiteson, and Peter Stone. July 2006. “Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning.” In Proceedings of the Genetic and Evolutionary Computation Conference ( GECCO ), 1321–28. 46% acceptance rate, Best Paper Award in GA track (of 85 submissions)

Funded by NSF, DARPA

##### Abstract

Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving reinforcement learning (RL) problems. However, since few rigorous empirical comparisons have been conducted, there are no general guidelines describing the methods’ relative strengths and weaknesses. This paper presents the results of a detailed empirical comparison between a GA and a TD method in Keepaway, a standard RL benchmark domain based on robot soccer. In particular, we compare the performance of NEAT, a GA that evolves neural networks, with Sarsa, a popular TD method. The results demonstrate that NEAT can learn better policies in this task, though it requires more evaluations to do so. Additional experiments in two variations of Keepaway demonstrate that Sarsa learns better policies when the task is fully observable and NEAT learns faster when the task is deterministic. Together, these results help isolate the factors critical to the performance of each method and yield insights into their general strengths and weaknesses.

##### Notes

Best Paper Award (Genetic Algorithms Track) at GECCO-2006.

##### BibTex Citation
@inproceedings{gecco06-taylor,
author = {Taylor, Matthew E. and Whiteson, Shimon and Stone, Peter},
title = {{Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning}},
booktitle = {{Proceedings of the Genetic and Evolutionary Computation Conference ( {GECCO} )}},
month = jul,
year = {2006},
pages = {1321--1328},
month_numeric = {7}
}

• Transfer Learning, Reinforcement Learning

2006 Shimon Whiteson, Matthew E. Taylor, and Peter Stone. December 2006. “Adaptive Tile Coding for Reinforcement Learning.” NIPS Workshop on: Towards a New Reinforcement Learning?

Funded by NSF, DARPA

##### Notes

NIPS-2006 (Poster).
Superseded by the technical report Adaptive Tile Coding for Value Function Approximation.

##### BibTex Citation
@misc{nips06-whiteson,
author = {Whiteson, Shimon and Taylor, Matthew E. and Stone, Peter},
title = {{Adaptive Tile Coding for Reinforcement Learning}},
booktitle = {{{NIPS} workshop on: Towards a New Reinforcement Learning?}},
month = dec,
year = {2006},
month_numeric = {12}
}


## 2005

• Reinforcement Learning, Transfer Learning

2005 Matthew E. Taylor, Peter Stone, and Yaxin Liu. July 2005. “Value Functions for RL-Based Behavior Transfer: A Comparative Study.” In Proceedings of the Twentieth National Conference on Artificial Intelligence ( AAAI ). 18% acceptance rate.

Funded by NSF, DARPA

##### Abstract

Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup-soccer keepaway domain.

##### Notes

AAAI-2005.
Superseded by the journal article Transfer Learning via Inter-Task Mappings for Temporal Difference Learning.

##### BibTex Citation
@inproceedings{aaai05-taylor,
author = {Taylor, Matthew E. and Stone, Peter and Liu, Yaxin},
title = {{Value Functions for {RL}-Based Behavior Transfer: A Comparative Study}},
booktitle = {{Proceedings of the Twentieth National Conference on Artificial Intelligence ( {AAAI} )}},
month = jul,
year = {2005},
month_numeric = {7}
}

• Reinforcement Learning, Transfer Learning

2005 Matthew E. Taylor and Peter Stone. July 2005. “Behavior Transfer for Value-Function-Based Reinforcement Learning.” In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems ( AAMAS ), 53–59. 25% acceptance rate.

Funded by DARPA, NSF

##### Abstract

Temporal difference (TD) learning methods have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found very slow in practice. A key feature of TD methods is that they represent policies in terms of value functions. In this paper we introduce behavior transfer, a novel approach to speeding up TD learning by transferring the learned value function from one task to a second related task. We present experimental results showing that autonomous learners are able to learn one multiagent task and then use behavior transfer to markedly reduce the total training time for a more complex task.

##### Notes

AAMAS-2005.
Superseded by the journal article Transfer Learning via Inter-Task Mappings for Temporal Difference Learning.

##### BibTex Citation
@inproceedings{aamas05-taylor,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Behavior Transfer for Value-Function-Based Reinforcement Learning}},
booktitle = {{Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems ( {AAMAS} )}},
month = jul,
year = {2005},
pages = {53--59},
month_numeric = {7}
}


## 2004

• Reinforcement Learning, Transfer Learning

2004 Matthew E. Taylor and Peter Stone. October 2004. “Speeding up Reinforcement Learning with Behavior Transfer.” AAAI 2004 Fall Symposium on Real-Life Reinforcement Learning.

Funded by NSF

##### Abstract

Reinforcement learning (RL) methods have become popular machine learning techniques in recent years. RL has had some experimental successes and has been shown to exhibit some desirable properties in theory, but it has often been found very slow in practice. In this paper we introduce behavior transfer, a novel approach to speeding up traditional RL. We present experimental results showing a learner is able to learn one task and then use behavior transfer to markedly reduce the total training time for a more complex task.

##### Notes

Superseded by the journal article Transfer Learning via Inter-Task Mappings for Temporal Difference Learning.

##### BibTex Citation
@misc{aaai04-symposium,
author = {Taylor, Matthew E. and Stone, Peter},
title = {{Speeding up Reinforcement Learning with Behavior Transfer}},
booktitle = {{{AAAI} 2004 Fall Symposium on Real-life Reinforcement Learning}},
month = oct,
year = {2004},
month_numeric = {10}
}