Overview

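Part 1 C calls for an evaluation method that gives a comprehensive assessment of the energy profiles of four states. After reviewing the energy literature, we tentatively chose the quantitative entropy weight method to evaluate the four states. The evaluation indicators are, for now, the 8 selected MSN values (the indicators may change), and the method is implemented in Python.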

Theoretical Background

My understanding of the entropy weight method follows the introduction to it by Zhang Jingxin (张敬信). A brief summary:

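In information theory, entropy is a measure of uncertainty. The more information there is, the lower the uncertainty and the lower the entropy; the less information there is, the higher the uncertainty and the higher the entropy.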

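Given this property, the entropy value can be used to judge how random and disordered an event is, and also how dispersed an indicator is. The more dispersed an indicator, the greater its influence (weight) on the comprehensive evaluation. For example, if every sample takes the same value on an indicator, that indicator has no influence on the overall evaluation and its weight is 0.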

The entropy weight method is an objective weighting method, because it depends only on the dispersion of the data itself.
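To make the link between dispersion and entropy concrete, here is a minimal sketch that compares two share vectors for a single indicator. The function name `indicator_entropy` and the two example vectors are assumptions made for illustration; they do not come from the referenced article. The function uses the normalized entropy, that is, the sum of $-p \ln p$ divided by $\ln n$.

```python
import numpy as np

def indicator_entropy(p):
    """Normalized entropy of a share vector p (positive entries that sum to 1)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)) / np.log(len(p)))

# Hypothetical shares of four objects under one indicator.
uniform = [0.25, 0.25, 0.25, 0.25]   # every object has the same share
skewed = [0.70, 0.10, 0.10, 0.10]    # one object dominates

e_uniform = indicator_entropy(uniform)
e_skewed = indicator_entropy(skewed)
print(e_uniform, e_skewed)
```

The uniform vector reaches the maximum entropy of 1 (up to floating-point rounding), so it carries no weight, while the skewed vector has a lower entropy and therefore a positive weight.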

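The calculation proceeds as follows:
1. For $n$ objects and $m$ indicators, the input is an $n \times m$ matrix, where $x_{ij}$ is the value of the $j$-th indicator for the $i$-th object.
2. Normalize the data.
For positive indicators: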

$x_{ij}=\frac{x_{ij}-\min\{x_{1j},\ldots,x_{nj}\}}{\max\{x_{1j},\ldots,x_{nj}\}-\min\{x_{1j},\ldots,x_{nj}\}}$

For negative indicators:

$x_{ij}=\frac{\max\{x_{1j},\ldots,x_{nj}\}-x_{ij}}{\max\{x_{1j},\ldots,x_{nj}\}-\min\{x_{1j},\ldots,x_{nj}\}}$

3. Compute the share of the $i$-th object within the $j$-th indicator:

$p_{ij} =\frac{x_{ij}}{ \sum_{i=1}^n x_{ij}}$

4. Compute the entropy of the $j$-th indicator:

$e_j = - \frac{1}{\ln(n)}\sum_{i=1}^n p_{ij}\ln(p_{ij})$

5. Compute the redundancy of the information entropy:

$d_j=1-e_j$

6. Compute the weight of each indicator:

$w_j = \frac{d_j}{\sum_{j=1}^m d_j}$

7. Compute the composite score of each object:

$s_i = \sum_{j=1}^m w_j p_{ij}$
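The seven steps above can be sketched compactly in vectorized NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this post: the function name `entropy_weight`, the `eps` parameter, and the toy matrix are all hypothetical. In particular, adding a small `eps` after normalization is just one possible way to keep every value positive so that $\ln(p_{ij})$ is defined.

```python
import numpy as np

def entropy_weight(data, directions, eps=1e-3):
    """Return (scores, weights) for an n x m matrix.

    directions[j] == 1 marks a positive indicator; any other value marks
    a negative one. eps is an assumption of this sketch, see step 2.
    """
    x = np.asarray(data, dtype=float)
    n, m = x.shape
    directions = np.asarray(directions)

    # Step 2: min-max normalization per column. A constant column gets a
    # span of 1 to avoid dividing by zero; eps keeps all values positive.
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    positive = (x - col_min) / span
    negative = (col_max - x) / span
    norm = np.where(directions == 1, positive, negative) + eps

    # Step 3: share of each object within each indicator.
    p = norm / norm.sum(axis=0)

    # Step 4: normalized entropy of each indicator.
    e = -(p * np.log(p)).sum(axis=0) / np.log(n)

    # Steps 5 and 6: redundancy and weights.
    d = 1.0 - e
    w = d / d.sum()

    # Step 7: composite score of each object.
    s = np.dot(p, w)
    return s, w

# Toy data: 3 objects, 3 indicators; the second indicator is constant.
toy = [[1.0, 10.0, 5.0],
       [2.0, 10.0, 3.0],
       [4.0, 10.0, 1.0]]
scores, weights = entropy_weight(toy, directions=[1, 1, -1])
print(weights)
print(scores)
```

In this toy matrix the constant second indicator receives a weight of essentially zero, and the third object, which is best on both remaining indicators, gets the highest score. The sketch does not handle the degenerate case where every indicator is constant, because the redundancies would then sum to zero.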

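Implementation

Code

The algorithm in the original article is implemented in Matlab, but because I filtered and processed my data in Python, I re-implemented it in Python and improved it. The normalization in the original algorithm does not handle the case `xmax - xmin == 0`, in which the denominator is 0. When that happens, I assign the denominator a small value to avoid dividing by zero. The function also rescales the result to the range [ymin, ymax]; the call in `getEntropy` passes 0.002 and 0.996, so every normalized value stays positive and its logarithm is defined. The code is as follows: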

import numpy as np

# Normalize one indicator column into the range [ymin, ymax].
# If type == 1 the indicator is positive (larger is better);
# otherwise the indicator is negative (smaller is better).
# oriData is a plain list holding one value per sample.
def normalize(oriData,type,ymin,ymax):

    lenOfData = len(oriData)
    xmax = max(oriData)
    xmin = min(oriData)
    oriData = np.mat(oriData)

    betweenMaxMIn = xmax-xmin
    if betweenMaxMIn == 0:
        betweenMaxMIn = 0.001

    if type == 1:
            yaim = (ymax - ymin)*(oriData - xmin)/betweenMaxMIn+ymin
    else:
            yaim = (ymax - ymin)*(xmax - oriData)/betweenMaxMIn+ymin

    return yaim

# to get the score of samples and the weight of the target
# for the data numSample is the num of sample
# numTarget is the num of target
def getEntropy(oriData,types):
    numSample,numTarget = np.shape(oriData)
    proportion = np.zeros((numSample,numTarget))

    aimEntropy = np.zeros(numTarget)
    aimEntropy = aimEntropy.tolist()

    oriData = np.mat(oriData)
    x = np.zeros((numSample,numTarget))

    # normalize each indicator column
    for i in range(0,numTarget):
       TData = oriData[:,i].T
       TData = mat2list(TData)
       x[:,i] = normalize(TData,types[i],0.002,0.996)

    # get the proportion[i,j],i is the num of the sample
    # j is the num of the target
    for i in range(0,numSample):
        for j in range(0,numTarget):
            proportion[i,j] = x[i,j]/sum(x[:,j])

    # get the entropy of each target
    logSample = np.log(numSample)
    logSample = 1/logSample
    for i in range(0,numTarget):
        tempData1 = np.log(proportion[:,i])
        tempData2 = np.multiply(proportion[:,i],tempData1)
        aimEntropy[i] = -logSample * sum(tempData2)

    # get the redundancy of each target
    b = np.ones((1,numTarget))
    b = mat2list(b)
    redundancy = np.mat(b) - np.mat(aimEntropy)
    redundancy = redundancy.tolist()
    redundancy = sum(redundancy,[])

    # get the weight of the targets and the score of the samples
    aimWeight = np.mat(redundancy)/sum(redundancy)
    proportion = trans(proportion)
    aimScore =  100 * np.dot(aimWeight,proportion)

    # change the output from mat to list
    aimScore = mat2list(aimScore)
    aimWeight = mat2list(aimWeight)

    return aimScore,aimWeight

The first function performs the normalization, and the second computes the indicator weights and the object scores.
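The listing calls two helpers, `mat2list` and `trans`, whose definitions are not shown in this post. The sketch below is only a guess at what they might look like, inferred from how they are called: `mat2list` appears to flatten a one-row matrix into a plain list, and `trans` appears to transpose a two-dimensional list. Both bodies are hypothetical, not the original definitions.

```python
import numpy as np

def mat2list(mat):
    """Flatten a NumPy matrix or array into a plain list (hypothetical helper)."""
    return np.asarray(mat).flatten().tolist()

def trans(data):
    """Transpose a 2-D list or array into nested lists (hypothetical helper)."""
    return np.asarray(data).T.tolist()

row = mat2list(np.array([[1, 2, 3]]))
flipped = trans([[1, 2, 3], [4, 5, 6]])
print(row)      # [1, 2, 3]
print(flipped)  # [[1, 4], [2, 5], [3, 6]]
```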
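In this problem, the data is first processed in two steps to obtain the input for a single year. The input matrix is $4 \times 8$, that is, 4 objects (states) and 8 indicators. The processing is as follows: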

# Step 1 of shaping the dataset:
# collect the values of one MSN in one year for the four states.
# The returned list has shape 1*4.
def getOneMSNDataSty(MSNNum,yearNum):
    MSNName = sheet2.cell(MSNNum,0).value
    yearName = getNameOfYear(yearNum)
    stateNameArr = getArrStateName()
    aimData = []

    oriX = 0
    oriY = []
    for i in range(1,105744):
        # keep rows that match the MSN, the year, and any of the four states
        if (sheet1.cell(i,0).value == MSNName and sheet1.cell(i,2).value == yearName
                and sheet1.cell(i,1).value in stateNameArr[:4]):
            oriX = oriX + 1
            oriY.append(sheet1.cell(i,3).value)
    return oriY

# Step 2 of shaping the dataset:
# collect the values of the 8 selected MSNs in one year.
# The returned nested list has shape 4*8.
def getAllMSNDataSty(yearNum):
    lenOfMSN = len(MSNNumArr)
    oriYOneMSN = []

    for i in range(0,lenOfMSN):
        oneMSNData = getOneMSNDataSty(MSNNumArr[i],yearNum)
        oriYOneMSN.append(oneMSNData)

    oriYOneMSN = trans(oriYOneMSN)
    return oriYOneMSN

Test Output

Taking the 1960 data as input:

if __name__ =="__main__":
    MSNStyle = [1,1,1,1,1,1,-1,-1]
    testData = getAllMSNDataSty(1960)
    # getEntropy is the function defined above
    tempScore,tempWeight = getEntropy(testData,MSNStyle)

    print('time is ' + '1960')
    print('tempScore:')
    print(tempScore)
    print('tempWeight:')
    print(tempWeight)

The test output is as follows:

$ python entropy2.py 
time is 1960
tempScore:
[8.872975711653167, 69.88616993620298, 7.1637488887983425, 14.077105463345536]
tempWeight:
[0.12051125623791543, 0.2371519637606621, 0.0, 0.13837650119691683, 0.2371519637606621, 0.11704768604685604, 0.06717244234828233, 0.08258818664870525]
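A quick sanity check on these numbers: the weights should sum to 1 by construction, and because each column of $p_{ij}$ sums to 1 and the code multiplies the scores by 100, the scores should sum to 100. The snippet below copies the printed values and checks both; the variable names are my own. The third weight is 0.0, which is consistent with an indicator that takes the same value in all four states that year.

```python
# Values copied from the 1960 test output.
temp_score = [8.872975711653167, 69.88616993620298,
              7.1637488887983425, 14.077105463345536]
temp_weight = [0.12051125623791543, 0.2371519637606621, 0.0,
               0.13837650119691683, 0.2371519637606621,
               0.11704768604685604, 0.06717244234828233,
               0.08258818664870525]

weight_total = sum(temp_weight)
score_total = sum(temp_score)

# Both checks allow for floating-point rounding.
print(abs(weight_total - 1.0) < 1e-9)
print(abs(score_total - 100.0) < 1e-9)
```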

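Summary

When I first implemented this in Python, the output values looked wrong. On checking, I found that I had misread one of the formulas, which caused the error.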