AD ALTA
JOURNAL OF INTERDISCIPLINARY RESEARCH
be processed via the 'Automated neural networks' tool. If the
result's validity improves (i.e. it achieves a higher level of
accuracy), it will subsequently be varied at the level of the
vector of weights among neurons.
We are looking for an artificial neural structure that is able to
classify each enterprise, based on the input data, into one of
four groups:
The enterprise is not going bankrupt (a creditworthy enterprise),
Bankruptcy in the given year,
Bankruptcy in two years,
Bankruptcy in the future (in a period longer than two years).
First, we will establish the properties of the individual
characteristics of the enterprise. It is necessary to define the
output categorical quantity. In this case, it is clearly the value
of the column in the MS Excel workbook marked as 'resulting
situation'. At the same time, we need to know the results for at
least the period of 2008 to 2014. Further, we will establish the
categorical input quantities. In the case of neural structures,
categorical quantities are converted into binary code, i.e. into
the form of 'YES' (1) or 'NO' (0). For instance, when placing
the enterprise within a given region, we count on 14 regions.
The code will state that the enterprise does not reside in
thirteen regions and does reside in the fourteenth; the numeric
code thus contains 14 digits (0 or 1). These are non-financial
indicators (e.g. the place of the enterprise's residence, i.e. the
region). All the stated items of the financial statements, together
with the numbers of employees, will belong among the
continuous quantities.
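The binary coding of a categorical quantity described above can be sketched as follows (a minimal illustration; the region names and the helper function are ours, not from the paper):

```python
def one_hot(category, categories):
    """Encode a categorical value as a binary (0/1) vector:
    1 for the matching category, 0 for all others."""
    return [1 if c == category else 0 for c in categories]

# The 14 regions, identified here simply by index for illustration.
regions = [f"region_{i}" for i in range(1, 15)]

# An enterprise residing in the fourteenth region: the code contains
# thirteen zeros followed by a single one, i.e. 14 digits in total.
code = one_hot("region_14", regions)
```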
Subsequently, the file will be randomly divided (sampled) into
three groups of enterprises: a training file (on which the neural
networks are trained to reach the best possible results), a
testing file (which tests the classification success of the trained
artificial neural structures), and a validation file (used for a
second validation of the obtained result). The data will be
divided among the training, testing and validation files in the
ratio 70:15:15. The choice will be random; thus the ratio of the
individual enterprise groups (the enterprise is not going
bankrupt, bankruptcy in two years, bankruptcy in the future) is
not preserved in the individual data files. If we kept the ratio,
we might distort the result. Equally, the sub-sampling⁴ will be
done randomly. A maximum of two sub-samples will be
created. The seed (for the random number choice) for
sub-sampling will be set to a value of 10.
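The random 70:15:15 split with a fixed seed can be sketched as follows (a minimal illustration of the principle; the paper's software performs this step internally):

```python
import random

def split_file(enterprises, seed=10):
    """Randomly divide the data into training, testing and
    validation files in a 70:15:15 ratio; a fixed seed makes
    the split reproducible."""
    rng = random.Random(seed)
    shuffled = enterprises[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_test = int(0.15 * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

# For 1,000 enterprises this yields files of 700, 150 and 150.
train, test, validation = split_file(list(range(1000)))
```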
Subsequently, 10,000 random artificial neural structures⁵ will be
generated, out of which the ten best will be preserved.
To create the model, we will use multilayer perceptron networks
(MLP) and linear neural networks, probabilistic neural networks
(PNN), generalized regression neural networks (GRNN), radial
basis function neural networks (RBF), three-layer perceptron
networks (TLP), and four-layer perceptron networks (FLP).
In the case of radial basis function neural networks, we will use
1 up to 40 hidden neurons. The second layer of the three-layer
perceptron network will contain 1 to 10 hidden neurons. The
second and third layers of the four-layer perceptron network will
each contain 1 to 10 hidden neurons. Perceptron networks will
classify the individual enterprises based on cross entropy, which
works with a multinomial frequency distribution (unlike, e.g.,
the least squares sum, which presumes a normal frequency
distribution). The analysis can thus be stopped when the value of
cross entropy draws near 0 and no longer improves. The
classification threshold is assigned based on the highest
confidence. Identical activation functions will be used for the
neurons of the hidden layers as well as for the output neurons;
they are presented in Table 1.
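The cross-entropy criterion used for the classification can be sketched as follows (a generic multinomial formulation, not the software's exact implementation):

```python
import math

def cross_entropy(targets, predictions, eps=1e-12):
    """Mean multinomial cross entropy between one-hot targets and
    predicted class probabilities; it approaches 0 as the fit becomes
    perfect, which is why training can stop near that value."""
    total = 0.0
    for t_row, p_row in zip(targets, predictions):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(targets)

# Four output classes; the predicted class is the one with the
# highest confidence (probability).
targets = [[1, 0, 0, 0], [0, 0, 1, 0]]
confident = [[0.97, 0.01, 0.01, 0.01], [0.02, 0.02, 0.94, 0.02]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
assert cross_entropy(targets, confident) < cross_entropy(targets, uncertain)
```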
⁴ By sub-sampling, in this case, the clustering of data lines based on
reported similar characteristics is meant.
⁵ If the improvement of the individual trained networks is not significant,
the training of the neural networks can be shortened.
Table 1: Activation Functions in Neurons' Hidden and Output
Layers

Function     | Definition      | Range
Identity     | x               | (-∞, +∞)
Logistic     | 1/(1 + e^(-x))  | (0, +1)
Hyperbolic   | tanh(x)         | (-1, +1)
Exponential  | e^(-x)          | (0, +∞)
Sine         | sin(x)          | [0, +1]

Source: Author
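The activation functions of Table 1, assuming their standard forms (the logistic, hyperbolic tangent, exponential e^(-x) and sine definitions are common conventions, filled in here as assumptions), can be written directly as:

```python
import math

# Standard forms of the activation functions; the stated ranges
# are those of each function over the real line.
activations = {
    "identity": lambda x: x,                       # range (-inf, +inf)
    "logistic": lambda x: 1 / (1 + math.exp(-x)),  # range (0, +1)
    "hyperbolic": math.tanh,                       # range (-1, +1)
    "exponential": lambda x: math.exp(-x),         # range (0, +inf)
    "sine": math.sin,
}
```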
Weight decomposition will be carried out with a one-hundredth
accuracy for both the hidden and output layers⁶. Initialization
will not be used.
The result of the calculation will be:
An overview of the 10 best generated and preserved networks
(including a complete description of the results in an XML file)
out of the previously generated 10,000.
Confusion matrices, via which we will determine the
classification (prediction) success of a possible enterprise
bankruptcy, i.e. the correctness or incorrectness of the estimates
in individual cases.
A sensitivity analysis, which will identify, for every generated
neural network, which input quantities are necessary for the
given neural structure, including the weight of the specific input
quantity.
The scheme of the preserved neural structures.
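A confusion matrix of the kind described can be sketched as follows (an illustrative construction with the paper's four classes; the actual matrices are produced by the software):

```python
def confusion_matrix(actual, predicted, n_classes=4):
    """Rows: actual class, columns: predicted class.
    Diagonal entries count the correct classifications."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        m[a][p] += 1
    return m

# Classes: 0 = creditworthy, 1 = bankruptcy in the given year,
# 2 = bankruptcy in two years, 3 = bankruptcy in the future.
actual    = [0, 0, 1, 2, 3, 3]
predicted = [0, 1, 1, 2, 3, 0]
m = confusion_matrix(actual, predicted)

# Classification success: share of cases on the diagonal.
accuracy = sum(m[i][i] for i in range(4)) / len(actual)
```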
3 Results
The overview of the individual generated and preserved networks
is given in Table No. 2⁷ (inserted in Attachment No. 1).
The BP value in the table indicates the use of the Back
Propagation algorithm. It is one of the most widely used
algorithms so far, published independently by several authors:
Rumelhart, Hinton and Williams (1986), Werbos (1974), and
Parker (1985). Its advantage is that it requires less memory than
most other algorithms, and it usually reaches an acceptable level
of error quite fast. Moreover, it is applicable to most neural
networks. The abbreviation 'CG' represents the Conjugate
gradient descent algorithm (Bishop, 1995; Shepherd, 1997). It is
an advanced method for training a multilayer perceptron
network. It usually gives significantly better results than Back
Propagation and can be used to solve the same tasks. Its use is
recommended for any network with a greater number of weights
and multiple outputs. PI, i.e. the Pseudo-Inverse Algorithm,
represents an optimization technique based on the least squares
method (Kahan, 1965). SS represents a (sub)sample, i.e.
sub-sampling. KN represents nearest neighbour deviation
assignment: an algorithm that assigns the deviations of the radial
units as the RMS (root mean square) distance to the K units
closest to each unit, used as the standard deviation. Each unit
thus has its own independently calculated deviation, based on
the density of the points clustered near it.
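The nearest-neighbour deviation assignment described above can be sketched as follows (a generic formulation of the RMS-distance rule, not the software's exact routine):

```python
import math

def assign_deviations(centers, k=2):
    """For each radial (RBF) unit, set its deviation (sigma) to the
    RMS distance to its k nearest neighbouring units, so that the
    deviation reflects the local density of the points."""
    sigmas = []
    for i, c in enumerate(centers):
        dists = sorted(
            math.dist(c, other)
            for j, other in enumerate(centers) if j != i
        )
        nearest = dists[:k]
        rms = math.sqrt(sum(d * d for d in nearest) / len(nearest))
        sigmas.append(rms)
    return sigmas

# Three clustered units and one isolated unit: the isolated unit
# at (5, 5) receives a much larger deviation than the others.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
sigmas = assign_deviations(centers, k=2)
```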
The most valuable network is the one that shows the highest
reliability values for the training, testing and validation data
files. At the same time, an identical, or at least similar, value is
ideally required in all three sets. In the case of the obtained
results, it may be observed that this condition has been met in
nine of the ten preserved networks. The only exception is
Network No. 2, MLP 2:7-88-63-4:1, which shows minimal
values. At the same time, we are looking for a network that
shows a minimal error, again relatively identical for all training,
testing and verifying data
⁶ Weight decomposition is determined based on iteration in the software.
The iteration accuracy of each weight was set equal to 0.01 for the
purposes of the analysis.
⁷ It should be added that the results may slightly differ for repeatedly
carried-out analyses. This is due to the fact that the neural network
algorithm uses slightly different generators for the initial weights of the
variables, which leads to a slightly different local minimum of the
function. The result is not significantly influenced by this fact.