Application of data mining to grading Indian industries on the basis of financial ratios
Keywords: data mining, financial ratios, factor analysis, k-means clustering, multivariate discriminant analysis.
An attempt is made to introduce a new method of rating the top-ranking industries on the basis of selected financial ratios. Financial ratios are widely used as a yardstick by researchers for many purposes. For each year from 2007 to 2012, about 500 industries from the public and private sectors, ranked by net sales, were considered. Twenty financial ratios were carefully chosen from the many available, each conveying a different notion of the objectives and carrying an established meaning in the literature. The unique feature of this study is the application of factor analysis, k-means clustering and discriminant analysis as data mining tools to exploit the hidden structure present in the data for each of the study periods. Initially, factor analysis is used to uncover the patterns underlying the financial ratios. The scores on the extracted factors were then used to find initial groups with the k-means clustering algorithm. A few outlier industries, which could not be assigned to any of the larger groups because some of their ratios took unusually high values, were discarded. The clusters thus obtained formed the basis for the further analyses, as they inherited the structural patterns found by the factor analysis. The cluster analysis was followed by an iterative discriminant procedure on the original ratios, repeated until 100% correct classification was achieved. Finally, the groups were identified as industries belonging to Grade A, Grade B and Grade C, corresponding to high, moderate and low performance, respectively. The study found that a little over 90% of the total variation in the data was explained by the first five factors in each year. These five factors revealed the underlying structural patterns among the twenty ratios initially considered in the analysis. In addition, only three clusters could be meaningfully formed for each of the periods. It is also interesting to note that the clusters could be arranged by the magnitude of their mean vectors on selected ratios, thus permitting the groups to be identified on the basis of their performance.
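
The clustering and grading steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The firm names, the two-dimensional "factor scores", the farthest-first initialisation and the rule of ranking clusters by the average of their centre coordinates are all assumptions made for illustration. The study itself works with five factors extracted from twenty ratios and follows clustering with an iterative discriminant procedure, which is omitted here.

```python
# Illustrative sketch only: toy data, not values from the study.

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic initial centres (an assumption, chosen for reproducibility):
    start from the first point, then repeatedly add the point that is
    farthest from all centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; assumes max_iter >= 1.
    Returns one label per point and the cluster centres."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:  # converged: centres no longer move
            break
        centers = new_centers
    return labels, centers

def grade_clusters(labels, centers, grade_names=("A", "B", "C")):
    """Rank clusters by the average of their centre coordinates, highest first.
    Assumes that larger scores indicate better performance (hypothetical rule)."""
    order = sorted(range(len(centers)),
                   key=lambda c: sum(centers[c]) / len(centers[c]),
                   reverse=True)
    grade_of = {cluster: grade_names[rank] for rank, cluster in enumerate(order)}
    return [grade_of[lab] for lab in labels]

# Hypothetical factor scores for nine made-up firms (two factors each).
scores = {
    "Firm 1": (0.90, 0.80), "Firm 2": (0.50, 0.50), "Firm 3": (0.10, 0.10),
    "Firm 4": (1.00, 0.90), "Firm 5": (0.55, 0.45), "Firm 6": (0.05, 0.15),
    "Firm 7": (0.95, 0.85), "Firm 8": (0.45, 0.50), "Firm 9": (0.12, 0.08),
}

labels, centers = kmeans(list(scores.values()), k=3)
grades = dict(zip(scores, grade_clusters(labels, centers)))
print(grades)
```

With real data, the inputs would be the factor scores for each industry in a given year, and the ranking rule would need to reflect which ratios are "selected" and whether higher values are favourable for each of them.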