Algorithms and Methods
General Methods
- back-propagation (Werbos, 1974)
- backward modeling
- batch training (see the sketch after this list)
  1. batch back-propagation
  2. training by epoch
- boosting (Schapire, 2001)
- cascade algorithm
- constructive training (of neural network)
- convex optimization
- covariance training
- direct error minimization
- discriminative training methods
- dynamic algorithms
- ensemble learning (Krogh and Vedelsby, 1995; Dietterich, 2000)
- explorative algorithms
- exploitive algorithms
- forward modeling
- genetic algorithms (Goldberg, 1989)
- global optimization techniques
- gradient methods
- greedy learning
- incremental training (of neural network; see the sketch after this list)
  1. incremental back-propagation
  2. online training
  3. training by pattern
- kernel methods
- learning
  1. learning-by-example
- mini-batch training (Wilson and Martinez, 2003)
- off-line learning
- off-policy learning
- on-policy learning
- online learning
- policy iteration algorithm (Kaelbling et al., 1996)
- reinforcement learning
  1. episodic reinforcement learning
  2. model-based reinforcement learning
  3. model-free reinforcement learning
  4. non-episodic reinforcement learning
  5. off-policy learning
  6. on-policy learning
  7. tabular reinforcement learning
- supervised training
- temporal difference learning algorithms
- training
- unsupervised training
- value iteration algorithm
- wake-sleep algorithm
  1. contrastive wake-sleep
- weight update algorithm
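The distinction between batch training (one weight update per epoch) and incremental training (one update per pattern) recurs throughout this list. Below is a minimal Python sketch of the two regimes on an assumed toy linear-regression problem; the data, model, and step size are illustrative inventions, not from this page.

<pre>
import random

# Toy data for y = 2*x + 1 with a little noise (assumed example problem).
random.seed(0)
DATA = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in [i / 10 for i in range(20)]]

def grad(w, b, x, y):
    """Gradient of the squared error (w*x + b - y)^2 w.r.t. w and b."""
    err = w * x + b - y
    return 2 * err * x, 2 * err

def batch_training(epochs=200, step=0.05):
    """Training by epoch: one weight update per pass over the whole set."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in DATA:               # accumulate gradient over all patterns
            dw, db = grad(w, b, x, y)
            gw += dw
            gb += db
        w -= step * gw / len(DATA)      # single batch update per epoch
        b -= step * gb / len(DATA)
    return w, b

def incremental_training(epochs=200, step=0.05):
    """Training by pattern (online): update weights after every example."""
    w = b = 0.0
    for _ in range(epochs):
        for x, y in DATA:
            dw, db = grad(w, b, x, y)
            w -= step * dw              # immediate per-pattern update
            b -= step * db
    return w, b

print("batch:      ", batch_training())
print("incremental:", incremental_training())
</pre>

Both runs should recover weights near (2.0, 1.0); the difference is only in when the update is applied.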
Common Parameters of Algorithms
- patience parameter
- momentum
- steepness parameter
- step size (see the sketch after this list)
  1. fixed step-size
  2. dynamic step-size
- variational bound
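A minimal sketch of how two of these parameters enter a weight update: the step size scales the gradient, and the momentum term re-applies a decayed fraction of the previous update. The exponential-decay schedule standing in for a dynamic step size is an assumption for illustration; the page does not prescribe a particular rule.

<pre>
def momentum_update(w, grad_w, velocity, step_size=0.01, momentum=0.9):
    # One gradient-descent weight update using the step-size and momentum
    # parameters listed above; returns the new weight and new velocity.
    velocity = momentum * velocity - step_size * grad_w
    return w + velocity, velocity

def dynamic_step_size(initial=0.1, decay=0.99):
    # Illustrative dynamic step-size schedule (assumed exponential decay;
    # a fixed step-size would simply yield the same value forever).
    step = initial
    while True:
        yield step
        step *= decay

# Usage: minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w, velocity = 0.0, 0.0
schedule = dynamic_step_size()
for _ in range(500):
    w, velocity = momentum_update(w, 2 * (w - 3.0), velocity, step_size=next(schedule))
print(round(w, 3))  # converges near 3.0
</pre>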
Named Algorithms
- Backprop
- Bayesian techniques (Neal, 1996)
- Cascade 2
  1. Cascade 2 with caching
- Cascade Correlation (Prechelt, 1997)
- Casper algorithm (Treadgold and Gedeon, 1997)
- Cerebellar Model Articulation Controller (CMAC) (Albus, 1975; Glanz et al., 1991; Sutton and Barto, 1998)
- Contrastive Divergence Learning
- Dyna-Q (Sutton and Barto, 1998)
- Explicit Explore or Exploit (Kearns and Singh, 1998)
- Gibbs Sampling
  1. Alternating Gibbs Sampling
- K-Nearest Neighbor
- Learning Vector Quantization (LVQ)
- Levenberg-Marquardt (Moré, 1977)
- Locality-Sensitive Hashing (LSH)
- Maximum Likelihood (ML) Learning
- Model-Based Interval Estimation (MBIE) (Strehl and Littman, 2004)
- Model-Based Policy Gradient methods (MBPG) (Wang and Dietterich, 2003)
- Monte Carlo Algorithms
- N-Step Return Algorithm
- Neural Fitted Q Iteration (Riedmiller, 2005)
  1. NFQ-SARSA(λ)
- Optimal Brain Damage (LeCun et al., 1990)
- Orthogonal Least Squares (OLS)
- Particle Swarm (Kennedy and Eberhart, 1995)
- Prioritized Sweeping (Sutton and Barto, 1998)
- Q-Learning (see the sketch after this list)
  1. Delayed Q-learning (Strehl et al., 2006)
  2. Generalized Policy Iteration (GPI) (Sutton and Barto, 1998)
  3. Naive Q(λ) (Sutton and Barto, 1998)
  4. One-Step Q-Learning
  5. Peng’s Q(λ) (Peng and Williams, 1994)
  6. Q(λ)
  7. Watkins’s Q(λ) (Watkins, 1989)
- Q-SARSA(λ)
  1. One-Step Q-SARSA Algorithm
- Quickprop (Fahlman, 1988)
- R-Max (Brafman and Tennenholtz, 2002)
- RPROP (Riedmiller and Braun, 1993)
  1. iRPROP- (Igel and Hüsken, 2000)
- SARSA(λ)
  1. On-Policy SARSA-Learning
  2. SARSA with Linear Function Approximators (Gordon, 2000)
- Simulated Annealing (Kirkpatrick et al., 1983)
- Sliding Window Cache
- SONN (Tenorio and Lee, 1989)
- Temporal Difference (TD) Learning
  1. Monte Carlo Prediction
  2. TD(0) Algorithm (Sutton, 1988)
  3. TD(λ) Algorithm
  4. TD(λ) Algorithm with Linear Function Approximators (Tsitsiklis and Van Roy, 1996)
  5. Temporal Difference (TD) Prediction Algorithm
- λ-Return Approach
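Several of the entries above (Q-Learning, SARSA(λ), Temporal Difference Learning) share the tabular temporal-difference update. Below is a minimal Python sketch of tabular one-step Q-learning; the ChainEnv class and its reset/step interface are illustrative inventions for this sketch, not part of any library named on this page.

<pre>
import random

class ChainEnv:
    # Toy 5-state chain (assumed example): actions move left (-1) or
    # right (+1); reaching state 4 ends the episode with reward 1.
    actions = (-1, +1)
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = max(0, min(4, self.s + a))
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done

def one_step_q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Tabular one-step Q-learning:
    #   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    # The max over a' makes the target greedy, hence off-policy.
    Q = {}                                   # (state, action) -> value, default 0
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:    # explorative choice
                a = random.choice(env.actions)
            else:                            # exploitive (greedy) choice
                a = max(env.actions, key=lambda act: q(s, act))
            s2, r, done = env.step(a)
            target = r if done else r + gamma * max(q(s2, act) for act in env.actions)
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
            s = s2
    return Q

Q = one_step_q_learning(ChainEnv())
for s in range(4):                           # greedy policy per non-terminal state
    print(s, max((-1, +1), key=lambda act: Q.get((s, act), 0.0)))  # should print +1
</pre>

Replacing the greedy max in the target with the value of the action actually taken next would turn this into on-policy SARSA-learning, which is the essential difference between the two families listed above.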