#summary Algorithms and Methods

==General Methods==
* back-propagation (Werbos, 1974)
* backward modeling
* batch training
*# batch back-propagation
*# training by epoch
* boosting (Schapire, 2001)
* cascade algorithm
* constructive training (of neural network)
* convex optimization
* covariance training
* direct error minimization
* discriminative training methods
* dynamic algorithms
* ensemble learning (Krogh and Vedelsby, 1995; Dietterich, 2000)
* explorative algorithms
* exploitive algorithms
* forward modeling
* genetic algorithms (Goldberg, 1989)
* global optimization techniques
* gradient methods
* greedy learning
* incremental training (of neural network)
*# incremental back-propagation
*# online training
*# training by pattern
* kernel methods
* learning
*# learning-by-example
* mini-batch training (Wilson and Martinez, 2003)
* off-line learning
* off-policy learning
* on-policy learning
* online learning
* policy iteration algorithm (Kaelbling et al., 1996)
* reinforcement learning
*# episodic reinforcement learning
*# model-based reinforcement learning
*# model-free reinforcement learning
*# non-episodic reinforcement learning
*# off-policy learning
*# on-policy learning
*# tabular reinforcement learning
* supervised training
* temporal difference learning algorithms
* training
* unsupervised training
* value iteration algorithm
* wake-sleep algorithm
*# contrastive wake-sleep
* weight update algorithm

==Common Parameters of Algorithms==
* fixed step-size
* dynamic step-size
* patience parameter
* momentum
* steepness parameter
* step size
*# fixed step-size
*# dynamic step-size
* variational bound

==Named Algorithms==
* Backprop
* Bayesian techniques (Neal, 1996)
* Cascade 2
*# Cascade 2 with caching
* Cascade Correlation (Prechelt, 1997)
* Casper algorithm (Treadgold and Gedeon, 1997)
* Cerebellar Model Articulation Controller (CMAC) (Albus, 1975; Glanz et al., 1991; Sutton and Barto, 1998)
* Contrastive Divergence Learning
* Dyna-Q (Sutton and Barto, 1998)
* Explicit Explore or Exploit (Kearns and Singh, 1998)
* Gibbs Sampling
*# Alternating Gibbs Sampling
* K-Nearest Neighbor
* Learning Vector Quantization (LVQ)
* Levenberg-Marquardt (Moré, 1977)
* Locality-Sensitive Hashing (LSH)
* Maximum Likelihood (ML) Learning
* Model-Based Interval Estimation (MBIE) (Strehl and Littman, 2004)
* Model-Based Policy Gradient methods (MBPG) (Wang and Dietterich, 2003)
* Monte Carlo Algorithms
* N-Step Return Algorithm
* Neural Fitted Q Iteration (Riedmiller, 2005)
*# NFQ-SARSA(λ)
* Optimal Brain Damage (LeCun et al., 1990)
* Orthogonal Least Squares (OLS)
* Particle Swarm (Kennedy and Eberhart, 1995)
* Prioritized Sweeping (Sutton and Barto, 1998)
* Q-Learning
*# Delayed Q-learning (Strehl et al., 2006)
*# Generalized Policy Iteration (GPI) (Sutton and Barto, 1998)
*# Naive Q(λ) (Sutton and Barto, 1998)
*# One-Step Q-Learning
*# Peng’s Q(λ) (Peng and Williams, 1994)
*# Q(λ)
*# Watkins’s Q(λ) (Watkins, 1989)
* Q-SARSA(λ)
*# One-Step Q-SARSA Algorithm
* Quickprop (Fahlman, 1988)
* R-Max (Brafman and Tennenholtz, 2002)
* RPROP (Riedmiller and Braun, 1993)
*# iRPROP- (Igel and Hüsken, 2000)
* SARSA(λ)
*# On-Policy SARSA-Learning
*# SARSA with Linear Function Approximators (Gordon, 2000)
* Simulated Annealing (Kirkpatrick et al., 1987)
* Sliding Window Cache
* SONN (Tenorio and Lee, 1989)
* Temporal Difference (TD) Learning
*# Monte Carlo Prediction
*# TD(0) Algorithm (Sutton, 1988)
*# TD(λ) Algorithm
*# TD(λ) Algorithm With Linear Function Approximators (Tsitsiklis and Roy, 1996)
*# Temporal Difference (TD) Prediction Algorithm
* λ-Return Approach