[1] The Vapnik-Chervonenkis dimension: Information versus complexity in learning. Neural Computation 312—317, 1989.
[2] Computational experience with a quasi Newton based training of the feedforward neural network. In World Congress on Neural Networks (San Diego), pp. 607—612. Lawrence Erlbaum, Hillsdale, NJ, 1994.
[4] Learning from natural selection in an artificial environment. In Proceedings of the International Joint Conference on Neural Networks (San Diego), p. 189. IEEE, New York, 1990.
[5] Scaling and generalization in neural networks: a case study. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 3—10. Morgan Kaufmann, San Mateo, 1988.
[6] Fitting autoregressive models for prediction. Annals of the Institute of Statistical Mathematics 21:243—247, 1969.
[7] A new look at the statistical model identification. IEEE Transactions on Automatic Control 716—723, 1974.
[8] Are modified back-propagation algorithms worth the effort? In IEEE International Conference on Neural Networks (Orlando), pp. 567—571. IEEE, New York, 1994.
[9] Efficacy of modified backpropagation and optimisation methods on a real-world medical problem. Neural Networks 945—962, 1995.
[10] Fuzzy control of backpropagation. In Proceedings of the First IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '92), San Diego, pp. 967—972. IEEE, New York, 1992.
[11] Dynamic node creation in back-propagation networks. Technical Report 8901. Institute for Cognitive Science, UCSD, La Jolla, 1989.
[12] Training connectionist networks with queries and selective sampling. In Advances in Neural Information Processing Systems (Denver) pp. 566—573, Morgan Kaufmann, San Mateo, 1990.
[13] Exponentially many local minima for single neurons. In D. S. Touretzky, M.C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, pp. 316—322. MIT Press, Cambridge, 1996.
[14] One-dimensional search strategies for conjugate gradient training of backpropagation neural networks. In Proceedings of the Artificial Neural Networks in Engineering (ANNIE'92) Conference, St. Louis, pp. 192—202. IEEE, New York, 1992.
[15] Automatic evolution of neural net architectures. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 589—592. IEEE, New York, 1990.
[16] Temporal evolution of generalization during learning in linear networks. Neural Computation 598—603, 1991.
[17] Backpropagation and unsupervised learning in linear networks. In Y. Chauvin, and D. E. Rumelhart, editors, Backpropagation: Theory, Architectures, and Applications, chapter 12, pp. 389—432. Erlbaum, Hillsdale, NJ, 1994.
[18] Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks 53—58, 1989.
[20] A comparative study of optimization techniques for backpropagation. Neurocomputing 6:19—30, 1994.
[21] Approximation bounds for superpositions of a sigmoidal function. In Proceedings of the IEEE International Symposium on Information Theory. IEEE Press, New York, 1991.
[22] Universal approximation bounds of a sigmoidal function. Technical Report 58. Statistics Department, University of Illinois, Urbana-Champaign, 1991.
[23] Universal approximation bounds for superposition of a sigmoidal function. IEEE Transactions on Information Theory 930—945, 1993.
[24] Reinforcement learning and adaptive critic methods. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control, chapter 12. Van Nostrand Reinhold, New York, 1992.
[25] Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 834—846, 1983 (reprinted in J. A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).
[26] Accelerated backpropagation learning: Two optimization methods. Complex Systems 3:331—342, 1989.
[27] Optimization methods for back-propagation: Automatic parameter tuning and faster convergence. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 593—596. IEEE, New York, 1990.
[28] First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation 141—166, 1992.
[29] Learning with first, second, and no derivatives: A case study in high energy physics. Neurocomputing 6:181—206, 1994.
[32] When are k-nearest neighbor and back propagation accurate for feasible sized sets of examples? In L. B. Almeida and C. J. Wellekens, editors, Neural Networks, Proceedings EURASIP Workshop, Lecture Notes in Computer Science, pp. 2—25. Springer-Verlag, New York, February 1990.
[33] What size net gives valid generalization? Advances in Neural Information Processing Systems pp. 81—90. Morgan Kaufmann, San Mateo, 1989.
[35] Training a 3-node neural network is NP-complete. Proceedings of the 1988 Workshop on Computational Learning Theory, pp. 9—18. Morgan Kaufmann, San Mateo, 1988.
[36] Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation 772—780, 1992.
[37] Improving the convergence of back-propagation learning with second order methods. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 29—37. Morgan Kaufmann, San Mateo, 1988.
[38] Enhanced training algorithms, and integrated training/architecture selection for multilayer perceptrons. IEEE Transactions on Neural Networks, 864—875, 1992.
[39] Exact calculation of the Hessian matrix for the multilayer perceptron. Neural Computation 494—501, 1992.
[40] Curvature-driven smoothing in backpropagation neural networks. IEEE Transactions on Neural Networks 494—501, 1993.
[41] A fast procedure for retraining the multilayer perceptron. International Journal of Neural Systems 229—236, 1991.
[42] Improving the generalization properties of radial basis function neural networks. Neural Computation 579—588, 1991.
[43] Curvature-driven smoothing: A learning algorithm for feedforward networks. IEEE Transactions on Neural Networks 882—884, 1993.
[44] Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.
[45] Training with noise is equivalent to Tikhonov regularization. Neural Computation, 108—116, 1995.
[46] The perceptron: A model for brain functioning. I. Reviews of Modern Physics 34:123—135, 1962 (reprinted in J.A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).
[47] Training feed-forward neural networks using conjugate gradients. SPIE, p. 179. Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, 1992.
[49] Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery 929—965, 1989.
[50] Improving search in genetic algorithms. Genetic Algorithms and Simulated Annealing. Morgan Kaufmann, San Mateo, 1990.
[51] Using classifier systems to implement distributed representations. Proceedings of the International Joint Conference on Neural Networks (San Diego), p. 39. IEEE, New York, 1990.
[52] Classifier systems and genetic algorithms. Machine Learning: Paradigms and Methods. MIT Press, Cambridge, 1990.
[53] Neural network design using Voronoi diagrams: Preliminaries. Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 127—132. IEEE Press, New York, 1992.
[54] Neural network design using Voronoi diagrams. IEEE Transactions on Neural Networks 778—787, 1993.
[55] Neural Network Fundamentals with Graphs, Algorithms, and Applications. McGraw-Hill, New York, 1996.
[56] Generalization properties of radial basis functions. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems, pp. 707—713. Morgan Kaufmann, San Mateo, 1991.
[57] Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics 59:291—294, 1988.
[58] Gradient descent fails to separate. Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 649—656. IEEE, New York, 1988.
[59] Back propagation fails to separate where perceptrons succeed. IEEE Transactions on Circuits and Systems 36:665—674, 1989.
[60] Current research. In D.H. Wolpert, editor, The Mathematics of Generalization, pp. 361—368. Addison-Wesley, Reading, MA, 1995.
[61] Reflections after refereeing papers for NIPS. In D.H. Wolpert, editor, The Mathematics of Generalization, pp. 11—15. Addison-Wesley, Reading, MA, 1995.
[62] Computing second derivatives in feed-forward networks: A review. IEEE Transactions on Neural Networks 480—488, 1994.
[63] Fuzzy ARTMAP: An adaptive resonance architecture for incremental learning of analog maps. Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 309—314. IEEE, New York, 1992.
[64] Construction of neural nets using the Radon transform. Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 607—611. IEEE, New York, 1989.
[65] Successfully using peak learning rates of 10 (and greater) in back-propagation networks with the heuristic learning algorithm. Proceedings of the IEEE First International Conference on Neural Networks (San Diego). IEEE, New York, 1987.
[67] An adaptive training algorithm for back propagation networks. Computer Speech and Language 2:205—218, 1987.
[68] Using genetic algorithms to improve pattern classification performance. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver, 1990), pp. 797—803. Morgan Kaufmann, San Mateo, 1991.
[69] Conjugate gradient algorithm for efficient training of artificial neural networks. IEE Proceedings 301—310, 1992.
[70] A back-propagation algorithm with optimal use of hidden units. Advances in Neural Information Processing Systems (Denver, 1988) pp. 519—526. Morgan Kaufmann, San Mateo, 1989.
[71] Principal component analysis by gradient descent on a constrained linear Hebbian cell. Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 373—380. IEEE, New York, 1989.
[72] Dynamic behavior of constrained back-propagation networks. Advances in Neural Information Processing Systems pp. 642—649. Morgan Kaufmann, San Mateo, 1990.
[73] Generalization performance of overtrained back-propagation networks. In L. B. Almeida and C. J. Wellekens, editors, Neural Networks, Proceedings EURASIP Workshop, Lecture Notes in Computer Science, pp. 46—55. Springer-Verlag, New York, February 1990.
[74] Generalization dynamics in LMS trained linear networks. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems, pp. 890—896. Morgan Kaufmann, San Mateo, 1991.
[75] A comparison study of the gradient descent and the conjugate gradient backpropagation neural networks. World Congress on Neural Networks (Portland), pp. 401—406. Erlbaum, Hillsdale, NJ, 1993.
[76] Stepsize variation methods for accelerating the back propagation algorithm. Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 601—604. IEEE, New York, 1990.
[77] Optimization schemes for neural network training. IEEE International Conference on Neural Networks (Orlando), pp. 817—822. IEEE, New York, 1994.
[78] On the geometry of feedforward neural network error surfaces. Neural Computation 910—927, 1993.
[79] Statistical and neural network techniques for nonparametric regression. In Selecting Models from Data, Lecture Notes in Statistics 89, pp. 383—392. Springer-Verlag, New York, 1994.
[80] Why two hidden layers are better than one. Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 265—268. IEEE, New York, 1990.
[81] Fuzzy parameter adaptation in neural systems. Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 232—235. IEEE, New York, 1992.
[82] Neural Networks for Optimization and Signal Processing. Wiley, 1993.
[83] Combining forecasts: A review and annotated bibliography. International Journal of Forecasting 559—583, 1989.
[84] Adaptive gain networks. IEEE International Conference on Neural Networks (Orlando), pp. 339—344. IEEE, New York, 1994.
[85] Can neural-networks do better than the Vapnik-Chervonenkis bounds? In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver, 1990), pp. 911—917. Morgan Kaufmann, San Mateo, 1991.
[86] An iterative method for training multilayer networks with threshold functions. IEEE Transactions on Neural Networks 507—508, 1994.
[87] Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers 14:326—334, 1965.
[88] Predicting the Mackey-Glass timeseries with cascade-correlation learning. In D.S. Touretzky, J.L. Elman, T.J. Sejnowski, and G.E. Hinton, editors, Connectionist Models: Proceedings of the 1990 Summer School, pp. 117—123. Morgan Kaufmann, San Mateo, 1991.
[89] Une procédure d'apprentissage pour réseau à seuil assymétrique. In COGNITIVA 85: A la Frontière de l'Intelligence Artificielle, des Sciences de la Connaissance, des Neurosciences (Paris), pp. 599—604. CESTA, Paris, 1985.
[90] Learning process in an asymmetric threshold network. In E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, editors, Disordered Systems and Biological Organization, pp. 233—240. Springer-Verlag, New York, 1986.
[91] Optimal brain damage. Advances in Neural Information Processing Systems (Denver) pp. 598—605. Morgan Kaufmann, San Mateo, 1990.
[92] Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters 2396—2399, 1991.
[93] Second order properties of error surfaces: Learning time and generalization. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems, pp. 918—924. Morgan Kaufmann, San Mateo, 1991.
[94] Automatic learning rate maximization by on-line estimation of the Hessian eigenvectors. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems, pp. 156—163. Morgan Kaufmann, San Mateo, 1993.
[95] Approximation by superpositions of a sigmoidal function. Technical Report 856. Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, 1988.
[96] Approximation by superpositions of sigmoids. Mathematics of Control, Signals, and Systems 2:303—314, 1989.
[97] Note on learning rate schedules for stochastic optimization. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems, pp. 832—838. Morgan Kaufmann, San Mateo, 1991.
[98] Towards faster stochastic gradient search. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems, pp. 1009—1016. Morgan Kaufmann, San Mateo, 1992.
[99] On the intractability of loading neural networks. In V. Roychowdhury, K.-Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pp. 357—389. Kluwer, Boston, 1994.
[101] Mapping classifier systems into neural networks. Advances in Neural Information Processing Systems pp. 49—56. Morgan Kaufmann, San Mateo, 1989.
[102] Genetic programming: Modular neural evolution for Darwin machines. Proceedings of the International Joint Conference on Neural Networks (San Diego), p. 194. IEEE, New York, 1990.
[103] "Plain backpropagation" and advanced optimization algorithms: A comparative study. Neurocomputing 6:153—161, 1994.
[104] Backpropagation neural nets with one and two hidden layers. IEEE Transactions on Neural Networks 136—141, 1993.
[106] Principal Component Neural Networks, Theory and Applications. John Wiley & Sons, New York, 1996.
[107] Error-correcting output codes: A general method for improving multiclass inductive learning programs. In D.H. Wolpert, editor, The Mathematics of Generalization, pp. 395—407. Addison-Wesley, Reading MA, 1995.
[108] Optimisation of network structure using genetic techniques. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 965—970. IEEE, New York, 1990.
[109] Genetic reinforcement learning for neural networks. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 71—76. IEEE, New York, 1991.
[110] Statistically controlled activation weight initialization (SCAWI). IEEE Transactions on Neural Networks 627—631, 1992.
[112] Double backpropagation and increasing generalization performance. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 145—150. IEEE, New York, 1991.
[113] Pattern Classification and Scene Analysis. Wiley, New York, 1973.
[117] A general lower bound on the number of examples needed for learning. In Proceedings of the 1988 Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, 1988.
[118] Learning the hidden structure of speech. Journal of the Acoustical Society of America 1615—1626, 1988.
[119] The recurrent cascade-correlation architecture. In R.P. Lippmann, J.E. Moody, D.S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver, 1990) pp. 190—196. Morgan Kaufmann, San Mateo, 1991.
[120] The cascade-correlation learning architecture. In D.S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver, 1989) pp. 524—532. Morgan Kaufmann, San Mateo, 1990.
[121] Faster-learning variations of back-propagation: An empirical study. In D. Touretzky, G. Hinton, T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 38—51. Morgan Kaufmann, San Mateo, 1988.
[122] Extended regularization methods for nonconvergent model selection. In S.J. Hanson, J.D. Cowan, C.L. Giles, editors, Advances in Neural Information Processing Systems pp. 630—637. Morgan Kaufmann, San Mateo, 1993.
[124] Ship wake detection procedure using conjugate gradient trained artificial neural networks. IEEE Transactions on Geoscience and Remote Sensing 718, 1991.
[125] Summed weight neuron perturbation: An O(N) improvement over weight perturbation. In S.J. Hanson, J.D. Cowan, C.L. Giles, editors, Advances in Neural Information Processing Systems pp. 212—219. Morgan Kaufmann, San Mateo, 1993.
[126] Speech recognition with back propagation. In Proceedings of the Ninth Annual Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1702—1703. IEEE Press, IEEE, New York, 1987.
[127] Backpropagation for linearly-separable patterns: a detailed analysis. In IEEE International Conference on Neural Networks (San Francisco), pp. 1818—1822. IEEE, New York, 1993.
[128] The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation 198—209, 1990.
[131] On the approximate realization of continuous mappings by neural networks. Neural Networks 183—192, 1989.
[132] There exists a neural network that does not make avoidable mistakes. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 657—664. IEEE, New York, 1988 (reprinted in [395]).
[133] Perceptron-based learning algorithms. IEEE Transactions on Neural Networks 179—191, June 1990.
[134] On the relations between discriminant analysis and multilayer perceptrons. Neural Networks 349—360, 1991.
[135] Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 721—741, 1984. (reprinted in J.A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).
[136] Embedding domain information in backpropagation. In Proceedings of SPIE Conference on Adaptive and Learning Systems, Orlando, Fla. Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, 1992.
[137] On the decision regions of multilayer perceptrons. Proceedings of the IEEE 1590—1594, Oct. 1990.
[138] Practical Optimization. Academic Press, New York, 1981.
[139] Representation properties of networks: Kolmogorov's theorem is irrelevant. Neural Computation 465—469, 1989.
[140] Networks for learning. In P. Antognetti and V. Milutinović, editors, Neural Networks: Concepts, Applications, and Implementations, chapter 6. Prentice Hall, Englewood Cliffs, NJ, 1991.
[141] Genetic Algorithms in Search, Optimization, & Machine Learning. Addison-Wesley, Reading, MA, 1989.
[142] Communication complexity and lower bounds for threshold circuits. In V. Roychowdhury, K.-Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pp. 85—125. Kluwer, Boston, 1994.
[143] Majority gates vs. general weighted threshold gates. In Proceedings of the 7th Annual Structure in Complexity Theory Conference, pp. 2—13. IEEE Computer Society Press, Los Alamitos, CA, 1991.
[144] Optimal convergence of on-line backpropagation. IEEE Transactions on Neural Networks 251—254, 1996.
[145] On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence 76—86, 1992.
[146] An empirical comparison of backpropagation training algorithms. In World Congress on Neural Networks (Portland), pp. 401—406. Erlbaum, Hillsdale, NJ, 1993.
[147] Structural risk minimization for character recognition. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems, pp. 471—478. Morgan Kaufmann, San Mateo, 1992.
[148] Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 989—993, 1994.
[149] Generalization ability of artificial neural network using Fahlman and Lebiere's learning algorithm. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 613—618. IEEE Press, New York, 1992.
[150] Comments on "Can backpropagation error surface not have local minima." IEEE Transactions on Neural Networks 844—845, 1994.
[152] Stochastic linear learning: Exact test and training set averages. Neural Networks 393—396, 1993.
[153] Meiosis networks. In Advances in Neural Information Processing Systems (Denver, 1989) pp. 533—541. Morgan Kaufmann, San Mateo, 1990.
[154] Comparing biases for minimal network construction with back-propagation. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1988), pp. 177—185. Morgan Kaufmann, San Mateo, 1989.
[155] Genetic optimization of self-organizing feature maps. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 341—346, 1991.
[156] Designing application-specific neural networks using the genetic algorithm. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems, pp. 447—454. Morgan Kaufmann, San Mateo, 1989.
[157] Optimal brain surgeon. In S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information Processing Systems (Denver, 1992), pp. 164—171. Morgan Kaufmann, San Mateo, 1993.
[158] Optimal brain surgeon: Extensions and performance comparisons. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems pp. 263—270. Morgan Kaufmann, San Mateo, 1994.
[159] Optimal brain surgeon and general network pruning. In IEEE International Conference on Neural Networks (San Francisco), pp. 293—299. IEEE, New York, 1993.
[160] Neural Networks, A Comprehensive Foundation. Macmillan, New York, 1993.
[161] Kolmogorov's mapping neural network existence theorem. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), pp. 11—13. IEEE, New York, 1987.
[162] Theory of the backpropagation neural network. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 593—605. IEEE, New York, 1989.
[163] On the algebraic structure of feedforward network weight spaces. In Advanced Neural Computers, pp. 129—135. Elsevier, New York, 1990.
[164] The munificence of high dimensionality. In I. Aleksander and J. Taylor, editors, Proceedings of the 1992 International Conference on Artificial Neural Networks (ICANN-92), pp. 1017—1030. Elsevier, 1992.
[165] A comparison of weight elimination methods for reducing complexity in neural networks. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 980—987. IEEE, New York, 1992.
[166] Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA, 1991.
[167] Statistical dynamics of learning. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks (ICANN-91), pp. 125—131. Elsevier, New York, 1991.
[168] Numerical analysis and adaptation method for learning rate of back propagation. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 627—630. IEEE, New York, 1990.
[170] Back-propagation algorithm which varies the number of hidden units. Neural Networks 61—66, 1991.
[171] Learning with limited numerical precision using the cascade-correlation algorithm. IEEE Transactions on Neural Networks 602—611, 1992.
[172] How practical are VC dimension bounds. In IEEE International Conference on Neural Networks (Orlando), pp. 327—332. IEEE, New York, 1994.
[174] Using additive noise in back-propagation training. IEEE Transactions on Neural Networks 24—38, Jan. 1992.
[177] Multilayer feedforward networks are universal approximators. Neural Networks 359—366, 1989 (reprinted in [395]).
[179] Should back propagation be replaced by more effective optimization algorithms? In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 353—356. IEEE, New York, 1991.
[180] Neural net and traditional classifiers. In D. Anderson, editor, Neural Information Processing Systems (Denver 1987), pp. 387—396. American Institute of Physics, New York, 1988.
[181] Error surfaces for multi-layer perceptrons. IEEE Transactions on Systems, Man, and Cybernetics 1152—1161, 1992.
[182] Improving the learning rate of back-propagation with the gradient reuse algorithm. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 441—447. IEEE, New York, 1988.
[183] Error surfaces for multi-layer perceptrons. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 759—764. IEEE, New York, 1991.
[184] Query learning based on boundary search and gradient computation of trained multilayer perceptrons. In Proceedings of the International Joint Conference on Neural Networks (San Diego), pp. 57—62. IEEE, New York, June 1990.
[185] Regression modeling in backpropagation and projection pursuit learning. IEEE Transactions on Neural Networks 342—353, 1994.
[186] Capabilities of three-layered perceptrons. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 641—648. IEEE, New York, 1988.
[187] A structural learning algorithm with forgetting of link weights. Technical Report, Electrotechnical Laboratory, Tsukuba-City, Japan, 1990.
[188] Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory. Neural Networks 385—394, 1991.
[190] Speeding up back propagation. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 639—642. IEEE, New York, 1990.
[191] Weight perturbation: An optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. Neural Computation 546—565, 1991.
[192] Weight perturbation: An optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. IEEE Transactions on Neural Networks 154—157, 1992.
[193] Initial experiments on constructing domains of expertise and hierarchies in connectionist systems. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 144—153. Morgan Kaufmann, San Mateo, 1988.
[194] Increased rates of convergence through learning rate adaptation. Neural Networks 295—307, 1988.
[196] Equivalence relation between the backpropagation learning process of an FNN and that of an FNNG. Neural Networks 411, 1994.
[197] Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method. International Journal of Neural Systems 291—301, 1991.
[198] Using the symmetries of a multi-layered network to reduce the weight space. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 391—396. IEEE, New York, 1991.
[200] Modularity, unsupervised learning, and supervised learning. In Connectionism: Theory and Practice, pp. 21—29. Oxford University Press, Oxford, 1992.
[202] J.S. Judd, Neural Network Design and the Complexity of Learning. MIT Press, Cambridge, 1990.
[203] Learning in neural networks. In D. Haussler and L. Pitt, editors, Proceedings of the 1988 Workshop on Computational Learning Theory, pp. 2—8. Morgan Kaufmann, San Mateo, 1988.
[205] Improving generalization performance by information minimization. IEEE International Conference on Neural Networks (Orlando), pp. 143—147. IEEE, New York, 1994.
[206] E.R. Kandel and J.H. Schwartz, editors, Principles of Neural Science, 2nd edition. Elsevier, New York, 1985.
[207] A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks 239—242, 1990.
[208] Davidon least squares based learning algorithm for feedforward neural networks. Neural Networks 661—670, 1994.
[209] Evolution and learning in neural networks: The number and distribution of learning trials affect the rate of evolution. In R.P. Lippmann, J.E. Moody, and D.S. Touretzky, editors, Advances in Neural Information Processing Systems (Denver), pp. 804—810. Morgan Kaufmann, San Mateo, 1991.
[210] Comparison and evaluation of variants of the conjugate gradient method for efficient learning in feed-forward neural networks with backward error propagation. Network: Computation in Neural Systems 27—35, 1992.
[211] Optimization by simulated annealing. Science 671—680, 1983 (reprinted in J.A. Anderson and E. Rosenfeld, editors, Neurocomputing, MIT Press, Cambridge, 1988).
[212] Self-Organization and Associative Memory, Springer Series in Information Sciences, 2nd edition. Springer-Verlag, Berlin, 1988.
[213] Neural networks with quadratic VC dimension. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems, pp. 197—203. MIT Press, Cambridge, 1996.
[214] Kernel regression and backpropagation training with noise. In Proceedings of the International Joint Conference on Neural Networks (Singapore), pp. 367—372. IEEE, New York, 1991.
[215] Kernel regression and backpropagation training with noise. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems, pp. 1035—1039. Morgan Kaufmann, San Mateo, 1992.
[217] Back propagation is sensitive to initial conditions. In Advances in Neural Information Processing Systems, pp. 860—867. Morgan Kaufmann, San Mateo, 1991.
[218] An adaptive least squares algorithm for the efficient training of multilayered networks. IEEE Transactions on Circuits and Systems 1092—1101, 1989.
[219] On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR 953—956, 1957 (in Russian).
[220] On the representation of continuous functions of several variables by superpositions of continuous functions of one variable and addition. American Mathematical Society Translations 55—59, 1963.
[222] A genetic approach to the truck backer upper problem and the inter-twined spiral problem. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 310—318. IEEE, New York, 1992.
[223] Cart centering and broom balancing by genetically breeding populations of control strategy programs. In Proceedings of the International Joint Conference on Neural Networks (San Diego), p. 198. IEEE, New York, 1990.
[224] Genetic generation of both the weights and architecture for a neural network. In Proceedings of the International Joint Conference on Neural Networks (Seattle), p. 397. IEEE, New York, 1991.
[225] Efficient parallel learning algorithms for neural networks. In Advances in Neural Information Processing Systems pp. 40—48. Morgan Kaufmann, San Mateo, 1989.
[226] Arbitrary nonlinearity is sufficient to represent all functions by neural networks: A theorem. Neural Networks 381—383, 1991.
[227] A simple weight decay can improve generalization. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems, pp. 950—957. Morgan Kaufmann, San Mateo, 1992.
[228] Creating local and distributed bottlenecks in hidden layers of back-propagation networks. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 120—126. Morgan Kaufmann, San Mateo, 1989.
[229] Improving generalization in back-propagation networks with distributed bottlenecks. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 443—447. IEEE, New York, 1989.
[230] Benefits of the gain: Speeded learning and minimal hidden layers in back-propagation networks. IEEE Transactions on Systems, Man, and Cybernetics 273—280, 1991.
[233] Learning to tell two spirals apart. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 52—59. Morgan Kaufmann, San Mateo, 1989.
[234] How neural nets work. In D. Anderson, editor, Neural Information Processing Systems (Denver 1987), pp. 442—456. American Institute of Physics, New York, 1988.
[235] Effect of initial weights on back-propagation and its variations. In IEEE International Conference on Systems, Man, and Cybernetics, pp. 218—219. IEEE, New York, 1989.
[236] Generalization and network design strategies. Technical Report CRG-TR-89-4, University of Toronto, Department of Computer Science, 1989.
[237] Improvement of function approximation capability of backpropagation neural networks. In Proceedings of the International Joint Conference on Neural Networks (Singapore), pp. 1367—1372. IEEE, New York, 1991.
[238] Integration of neural networks and decision tree classifiers for automated cytology screening. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 257—262. IEEE, New York, 1991.
[239] Practical characteristics of neural network and conventional pattern classifiers on artificial and speech problems. In Advances in Neural Information Processing Systems, pp. 168—177. Morgan Kaufmann, San Mateo, 1990.
[240] The effect of initial weights on premature saturation in back-propagation learning. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 765—770. IEEE, New York, 1991.
[241] An analysis of premature saturation in back propagation learning. Neural Networks 719—728, 1993.
[242] Fast pruning using principal components. In J.D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems pp. 35—42. Morgan Kaufmann, San Mateo, 1994.
[243] A statistical approach to learning and generalization in layered neural networks. Proceedings of the IEEE 1568—1574, Oct. 1990.
[245] Synergy of clustering multiple backpropagation networks. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1989), pp. 650—657. Morgan Kaufmann, San Mateo, 1990.
[246] Inversion of multilayer nets. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 425—430. IEEE, New York, 1989.
[248] Cascade network architectures. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 398—404. IEEE, New York, 1992.
[249] Analysis of decision contour of neural network with sigmoidal nonlinearity. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 655—658. IEEE, New York, 1990.
[250] On the computational power of sigmoid versus boolean threshold circuits. In Proceedings of the 32nd Annual Symp. on Foundations of Computer Science, pp. 767—776. IEEE, Computer Society Press, Los Alamitos, CA, 1991.
[253] A practical Bayesian framework for backpropagation networks. Neural Computation 448—472, 1992.
[254] Pattern recognition properties of neural networks. In B. H. Juang, S. Y. Kung, and C. A. Kamm, editors, Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop, pp. 173—187. IEEE Press, New York, 1991.
[255] Formation of disconnected decision regions with a single hidden layer. Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 455—460. IEEE, New York, 1989.
[256] A convergence theorem for sequential learning in two-layer perceptrons. Europhysics Letters 487—492, 1990.
[259] An approach to generalization problem in back-propagation learning. International Neural Network Conference (Paris), 765—768, 1990.
[260] Noise injection into inputs in back-propagation learning. IEEE Transactions on Systems, Man, and Cybernetics 436—440, 1992.
[261] Training hidden units: The generalized delta rule. In Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises, pp. 121—160. MIT Press, Cambridge, 1988.
[262] Can back propagation error surfaces have non-global minima? Technical report, Department of Electrical and Computer Engineering, University of California at San Diego, August 1988.
[263] Back propagation error surfaces can have local minima. Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), p. 627, 1989 (abstract).
[264] Equations of state calculations by fast computing machines. Journal of Chemical Physics 1087—1091, 1953.
[265] Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A 2191—2203, 1989.
[266] Acceleration of back-propagation through learning rate and momentum adaptation. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 676—679. IEEE, New York, 1990.
[267] Fault tolerance of the backpropagation neural network trained on noisy inputs. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 847—852. IEEE, New York, 1992.
[269] Bounds on the learning capacity of some multi-layer networks. Biological Cybernetics 345—356, 1989.
[270] A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 525—533, 1993.
[271] Learning with localized receptive fields. Proceedings of the 1988 Connectionist Models Summer School, pp. 133—143. Morgan Kaufmann, San Mateo, 1988.
[272] Fast learning in networks of locally-tuned processing units. Neural Computation 281—294, 1989.
[273] The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems, pp. 847—854. Morgan Kaufmann, San Mateo, 1992.
[274] Theory of networks for learning. In SPIE Applications of Artificial Neural Networks, pp. 22—30. Society of Photo-Optical Instrumentation Engineers, Bellingham, WA, 1990.
[275] Skeletonization: A technique for trimming the fat from a network via relevance assessment. In D.A. Touretzky, editor, Advances in Neural Information Processing Systems (Denver, 1988) pp. 107—115. Morgan Kaufmann, San Mateo, 1989.
[276] A polynomial time algorithm for generating neural networks for pattern classification: Its stability properties and some test results. Neural Computation 317—330, 1993.
[278] Threshold logic and its Applications. Wiley, New York, 1971.
[280] An information theoretic design and training algorithm for neural networks. IEEE Transaction on Circuits and Systems 1542—1547, 1991.
[281] On the generalization ability of neural network classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 659—663, 1994.
[283] Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In Proceedings of the International Joint Conference on Neural Networks (San Diego), pp. 211—226. IEEE, New York, 1990.
[285] Adaptive soft weight tying using Gaussian mixtures. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems, pp. 993—1000. Morgan Kaufmann, San Mateo, 1992.
[287] Query based learning in a multilayered perceptron in the presence of data jitter. In M.A. El-Sharkawi, and R.J. Marks, II, editors, Applications of Neural Networks to Power Systems (Seattle) pp. 72—75. IEEE Press, New York, 1991.
[288] Neural networks, principal components, and subspaces. International Journal of Neural Systems 61—68, 1989.
[289] A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 267—273, 1982.
[290] On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications 69—84, 1985.
[291] Initializing multilayer neural networks with fuzzy logic. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 239—244. IEEE, New York, 1992.
[293] A note on the estimation of the generalization error and the prevention of overfitting. In IEEE International Conference on Neural Networks (Orlando), pp. 321—325. IEEE, New York, 1994.
[294] Data-driven weight initialization of back-propagation for pattern recognition. In Proceedings of the International Conference on Artificial Neural Networks (ICANN'94), pp. 851—854. Springer-Verlag, London, 1994.
[295] Learning logic. Technical Report TR-47. Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, 1985.
[296] Optimal algorithms for adaptive networks: Second-order back propagation, second-order direct propagation, and second-order Hebbian learning. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), pp. 593—600. IEEE, New York, 1987.
[298] Chaitin-Kolmogorov complexity and generalization in neural networks. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems pp. 925—931. Morgan Kaufmann, San Mateo, 1991.
[299] Experiments on learning by back propagation. Technical Report Carnegie-Mellon University, Pittsburgh, 1986.
[301] Local minima and back propagation. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 173—176. IEEE, New York, 1991.
[302] Numerical Recipes in C. Cambridge University Press, Cambridge, 1988.
[303] The role of the number of training samples on weight initialization of artificial neural net classifier. In RNNS/IEEE Symposium on Neuroinformatics and Neurocomputing (Rostov-on-Don, Russia), pp. 343—353. IEEE Press, New York, 1992.
[304] Artificial evolution and artificial intelligence. In R. Forsyth, editor, Machine Learning, Principles and Techniques, pp. 83—103. Chapman and Hall Computing, New York, 1989.
[305] An equivalence between sigmoidal gain scaling and training with noisy (jittered) input data. In RNNS/IEEE Symposium on Neuroinformatics and Neurocomputing (Rostov-on-Don, Russia), pp. 120—127. IEEE Press, New York, 1992.
[306] Similarities of error regularization, sigmoid gain scaling, target smoothing, and training with jitter. IEEE Transactions on Neural Networks 529—538, May 1995.
[307] Regularization using jittered training data. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 147—152. IEEE, New York, 1992.
[309] Neural network aided prosthetic alignment. In 1995 IEEE International Conference on Systems, Man, and Cybernetics, Vancouver, British Columbia, Oct. 1995, pp. 505—508, IEEE, New York, 1995.
[310] An improved version of the pseudo-inverse solution for classification and neural networks. Europhysics Letters 387—392, 1989.
[311] Engineering Optimization, Methods and Applications. Wiley, 1983.
[312] The effect of the slope of the activation function on the back propagation algorithm. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 707—710. IEEE, New York, 1990.
[313] Learning of word stress in a sub-optimal second order backpropagation neural network. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 355—361. IEEE, New York, 1988.
[314] Advanced supervised learning in multi-layer perceptrons - from backpropagation to adaptive learning algorithms. Computer Standards & Interfaces 16, 1994.
[315] A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In IEEE International Conference on Neural Networks (San Francisco), pp. 586—591. IEEE, New York, 1993.
[319] Predicting weather using a genetic memory: a combination of Kanerva's sparse distributed memory with Holland's genetic algorithms. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems pp. 455—464. Morgan Kaufmann, San Mateo, 1989.
[320] Time trials on second-order and variable-learning-rate algorithms. In Advances in Neural Information Processing Systems pp. 977—983. Morgan Kaufmann, San Mateo, 1991.
[321] Optimal weight initialization for neural networks. In Proceedings of the International Conference on Artificial Neural Networks (ICANN'94), pp. 577—580. Springer-Verlag, London, 1994.
[322] Dynamic neural networks with the use of divide and conquer. In Proceedings of the International Joint Conference on Neural Networks (Baltimore), pp. 658—663. IEEE, New York, 1992.
[323] Neural control for rolling mills: Incorporating domain theories to overcome data deficiency. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors Advances in Neural Information Processing Systems pp. 659—666. Morgan Kaufmann, San Mateo, 1992.
[324] The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 386—408, 1958 (reprinted in J.A. Anderson and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).
[326] Analysis of the momentum LMS algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing 2088—2098, 1990.
[328] Backpropagation: The basic theory. In Y. Chauvin, and D.E. Rumelhart, editors, Backpropagation: Theory, Architectures, and Applications, chapter 1. Erlbaum, Hillsdale, NJ, 1994.
[329] Learning internal representations by back-propagating errors. Nature 533—536, 1986 (reprinted in J.A. Anderson, and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).
[330] Learning internal representations by error propagation. In D.E. Rumelhart, and J.L. McClelland, editors, Parallel Distributed Processing, MIT Press, Cambridge, 1986(reprinted in J.A. Anderson, and E. Rosenfeld, editors, Neurocomputing. MIT Press, Cambridge, 1988).
[331] Ill-conditioning in neural network training problems. SIAM Journal of Scientific Computing 693—714, 1993.
[332] Backpropagation improvements based on heuristic arguments. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 565—568. IEEE, New York, 1990.
[333] Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks 459—473, 1989.
[334] An optimality principle for unsupervised learning. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems, pp. 11—19. Morgan Kaufmann, San Mateo, 1989.
[335] Optimal pruning of neural tree networks for improved generalization. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 219—224. IEEE, New York, 1991.
[336] Speeding up back propagation by gradient correlation. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 723—726. IEEE, New York, 1990.
[338] Fault tolerance of pruned multilayer networks. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 447—452. IEEE, New York, 1991.
[339] Fault tolerance in feed-forward artificial neural networks. In Paolo Antognetti and Veljko Milutinović, editors, Neural Networks: Concepts, Applications, and Implementations, pp. 111—141. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[340] Entropy nets: From decision trees to neural networks. Proceedings of the IEEE 1605—1613, Oct. 1990.
[341] Recent advances in numerical techniques for large scale optimization. In W.T. Miller, III, R.S. Sutton, and P.J. Werbos, editors, Neural Networks for Control, pp. 171—178. MIT Press, Cambridge, 1990.
[342] A framework for combining symbolic and neural learning. In V. Honavar, and L. Uhr, editors, Artificial Intelligence and Neural Networks: Steps Toward Principled Integration, pp. 561—580. Academic Press, New York, 1994.
[343] The LMS algorithm with momentum updating. In Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2651—2654. IEEE, New York, 1988.
[344] Neural net pruning - why and how. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 325—333. IEEE, New York, 1988.
[346] Acceleration techniques for the backpropagation algorithm. In L.B. Almeida and C.J. Wellekens, editors, Neural Networks, Proceedings EURASIP Workshop, Lecture Notes in Computer Science, pp. 110—119. Springer-Verlag, New York, 1990.
[347] Discrete Neural Computation, A Theoretical Foundation. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[349] Designing multi layer perceptrons from nearest neighbor systems. IEEE Transactions on Neural Networks 329—333, 1992.
[350] Feedback stabilization using two-hidden-layer nets. Technical Report SYCON-90-11, Rutgers Center for Systems and Control, Princeton, NJ, 1990.
[351] Feedback stabilization using two-hidden-layer nets. IEEE Transactions on Neural Networks 981—990, 1992.
[352] Backpropagation can give rise to spurious local minima even for networks without hidden layers. Complex Systems 91—106, 1989.
[353] Backpropagation separates when perceptrons do. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), pp. 639—642. IEEE Press, New York, 1989.
[354] Speed up learning and network optimization with extended back propagation. Neural Networks 365—383, 1993.
[355] On the structure of continuous functions of several variables. Transactions of the American Mathematical Society 340—355, 1965.
[357] Sensitivity of feedforward neural networks to weight errors. IEEE Transactions on Neural Networks 71—80, 1990.
[358] Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 613—617. IEEE, New York, 1989 (reprinted in [395]).
[359] An analysis of the delta rule and the learning of statistical associations. In D.E. Rumelhart, and J.L. McClelland, editors, Parallel Distributed Processing, MIT Press, Cambridge, 1986.
[360] On the relative time complexities of standard and conjugate gradient back propagation. In IEEE International Conference on Neural Networks (Orlando), pp. 84—87. IEEE, New York, 1994.
[361] The Symbolic-Neural Method For Creating Models and Control Behaviors From Examples. Ph.D. thesis, University of Washington, Seattle, 1988.
[362] Uniqueness of the weights for minimal feedforward nets with a given input-output map. Neural Networks 589—593, 1992.
[363] Two problems with backpropagation and other steepest-descent learning procedures for networks. In Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 823—831. Erlbaum, Hillsdale, NJ, 1986.
[365] On design and evaluation of tapped-delay neural network architectures. In IEEE International Conference on Neural Networks (San Francisco), pp. 977—984. IEEE, New York, 1993.
[366] Does the neuron "learn" like the synapse. In Advances in Neural Information Processing Systems, pp. 169—176. Morgan Kaufmann, San Mateo, 1989.
[368] Neural network initialization. In International Workshop on Artificial Neural Networks, Malaga-Torremolinos, Spain, June 1995, pp. 535—542. Springer-Verlag, New York, 1995.
[369] The interchangeability of learning rate and gain in backpropagation neural networks. Neural Computation 451—460, 1996.
[371] Consistent inference of probabilities in layered networks: Predictions and generalization. In Proceedings of the International Joint Conference on Neural Networks (Washington, D.C.), p. 403. IEEE, New York, 1989.
[372] SuperSAB: Fast adaptive back propagation with good scaling properties. Neural Networks 561—573, 1990.
[373] Training binary node feedforward neural networks by backpropagation of error. IEE Electronic Letters 1745—1746, 1990.
[374] Interpretation of artificial neural networks: Mapping knowledge-based neural networks into rules. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems pp. 977—984. Morgan Kaufmann, San Mateo, 1992.
[375] Using knowledge-based neural networks to refine roughly-correct information. In S.J. Hanson, T. Petsche, M. Kearns, and R.L. Rivest, editors, Computational Learning Theory and Natural Learning Systems, Vol. 2, Intersections between Theory and Experiment, pp. 65—80. MIT Press, Cambridge, 1994.
[378] Principles of risk minimization for learning theory. In J.E. Moody, S.J. Hanson, and R.P. Lippmann, editors, Advances in Neural Information Processing Systems pp. 831—838. Morgan Kaufmann, San Mateo, 1992.
[379] On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications 264—280, 1971.
[380] Accelerating the convergence of the back-propagation method. Biological Cybernetics 257—263, 1988.
[381] Reduction of interconnection weights in higher order associative memory networks. Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 177—182. IEEE, New York, 1991.
[382] Experiments in translating Chinese characters using backpropagation. In Proceedings of IEEE Computer Society Conference, pp. 399—402, February 1988.
[383] Learning algorithms for connectionist networks: Applied gradient methods of nonlinear optimization. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), pp. 619—627. IEEE, New York, 1987.
[384] Functional approximation by feed-forward networks: A least-squares approach to generalization. R.S.R.E. Memorandum 4453, R.S.R.E., Malvern, U. K., 1991.
[385] The optimized internal representation of multilayer classifier networks performs nonlinear discriminant analysis. Neural Networks 367—375, 1990.
[386] Back-propagation, weight-elimination and time series prediction. In D. Touretzky, J. Elman, T. Sejnowski, and G. Hinton, editors, Proceedings of the 1990 Connectionist Models Summer School, pp. 105—116. Morgan Kaufmann, San Mateo, 1991.
[387] Generalization by weight-elimination applied to currency exchange rate prediction. In Proceedings of the International Joint Conference on Neural Networks (Seattle), pp. 837—841. IEEE, New York, 1991.
[388] Generalization by weight-elimination with application to forecasting. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems pp. 875—882. Morgan Kaufmann, San Mateo, 1991.
[389] Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, San Mateo, 1991.
[390] Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. dissertation, Harvard University, Cambridge, 1974 (published as The Roots of Backpropagation, Wiley, 1994).
[391] Backpropagation: Past and future. In Proceedings of the IEEE International Conference on Neural Networks (San Diego), pp. 343—353. IEEE, New York, 1988.
[392] Backpropagation and neurocontrol: A review and prospectus. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 209—216. IEEE, New York, 1989.
[393] Avoiding false local minima by proper initialization of connections. IEEE Transactions on Neural Networks 899—905, 1992.
[394] Some asymptotic results for learning in single hidden layer feedforward network models. Journal of the American Statistical Association 1008—1013, 1989.
[395] Artificial Neural Networks, Approximation and Learning Theory. Blackwell, Cambridge, MA, 1992.
[396] The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. In Proceedings of the Third International Conference on Genetic Algorithms (Washington D.C.), June 1990.
[397] The evolution of connectivity: Pruning neural networks using genetic algorithms. In Proceedings of the International Joint Conference on Neural Networks (San Diego), p. 134. IEEE, New York, 1990.
[398] Optimizing small neural networks using a distributed genetic algorithm. In Proceedings of the International Joint Conference on Neural Networks (San Diego), p. 206. IEEE, New York, 1990.
[399] Generalization and information storage in networks of adaline "Neurons". In M. C. Yovitz, G. T. Jacobi, and G. D. Goldstein, editors, Self Organizing Systems 1962, pp. 435—461. Spartan Books, Washington, D.C., 1962.
[400] Pattern recognition and adaptive control. In 1964 IRE WESCON Convention Record, pp. 269—277. IRE, 1964.
[401] Practical applications for adaptive data-processing systems. In 1963 IRE WESCON Convention Record, pp. 1—14, 1963.
[402] Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pp. 96—104. IRE, New York, 1960(reprinted in J.A. Anderson, and E. Rosenfeld, editors Neurocomputing. MIT Press, Cambridge, 1988).
[403] 30 years of adaptive neural networks: Perceptron, madaline, and backpropagation. Proceedings of the IEEE 1415—1442, Sept. 1990.
[404] Pattern-recognizing control systems. In Computer and Information Sciences (COINS) Symposium Proceedings, pp. 288—316. Spartan Books, Washington, D.C., 1963.
[406] Neural nets for adaptive filtering and adaptive pattern recognition. Computer, pp. 25—39, March 1988.
[407] Geometric analysis of neural network capabilities. In Proceedings of the IEEE First International Conference on Neural Networks (San Diego), pp. 385—392. IEEE, New York, 1987.
[408] Strategies for teaching layered networks classification tasks. In D. Z. Anderson, editor, Proceedings of the Conference on Neural Information Processing Systems (Denver 1987), pp. 850—859. American Institute of Physics, New York, 1988.
[410] Node splitting: A constructive algorithm for feed-forward neural networks. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems pp. 1072—1079. Morgan Kaufmann, San Mateo, 1992.
[411] Handwritten numeral recognition by multi-layered neural network with improved learning algorithm. In Proceedings of the International Joint Conference on Neural Networks (Washington D.C.), pp. 259—266. IEEE, New York, 1989.
[412] Training hard-limiting neurons using backpropagation algorithm by updating steepness factors. In IEEE International Conference on Neural Networks (Orlando), pp. 526—530. IEEE, New York, 1994.
[413] Can backpropagation error surface not have local minima? IEEE Transactions on Neural Networks 1019—1021, 1992.
[414] Corrections to "On the local minima free condition of backpropagation learning." IEEE Transactions on Neural Networks 1300—1303, 1995.