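Overview
In Part 1 C, we need an evaluation method to assess the energy profiles of the four states comprehensively. Based on the energy literature, we have tentatively chosen the quantitative entropy weight method. For now the evaluation indicators are the 8 selected MSN values (the indicators may still change), and the method is implemented in Python.
Theoretical basis
My understanding of the entropy weight method follows Zhang Jingxin's (张敬信) introduction to it. In brief:
- In information theory, entropy is a measure of uncertainty. The more information there is, the smaller the uncertainty and the smaller the entropy; the less information, the larger the uncertainty and the larger the entropy.
- Given this property, the entropy value can be used to judge how random or disordered an event is, and also how dispersed an indicator is. The more dispersed an indicator, the greater its influence (weight) on the overall evaluation. For example, if all samples take the same value under an indicator, that indicator has no influence on the overall evaluation and its weight is 0.
- The entropy weight method is an objective weighting method, because it depends only on the dispersion of the data itself.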
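To make the link between dispersion and entropy concrete, here is a minimal sketch that computes the normalized Shannon entropy of a share vector. The function name `normalized_entropy` and the two sample vectors are illustrative assumptions and do not come from the original article.

```python
import math

def normalized_entropy(shares):
    """Shannon entropy of a share vector, divided by ln(n) so it lies in [0, 1].

    `shares` is assumed to be non-negative and to sum to 1; zero entries
    contribute nothing (the usual 0 * ln 0 = 0 convention).
    """
    n = len(shares)
    total = sum(p * math.log(p) for p in shares if p > 0)
    return -total / math.log(n)

# Hypothetical shares of four objects under two indicators.
equal_shares = [0.25, 0.25, 0.25, 0.25]   # no dispersion at all
spread_shares = [0.70, 0.10, 0.10, 0.10]  # strongly dispersed

print(normalized_entropy(equal_shares))   # maximal entropy, hence no weight
print(normalized_entropy(spread_shares))  # lower entropy, hence a larger weight
```

An indicator whose shares are all equal reaches the maximum normalized entropy of 1, which is why it ends up with weight 0.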
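The calculation proceeds as follows:
1. For $n$ objects and $m$ indicators, the input is an $n \times m$ matrix, where $x_{ij}$ is the value of the $j$-th indicator for the $i$-th object.
2. Normalize the data. Let $x_j^{\min}$ and $x_j^{\max}$ be the minimum and maximum of the $j$-th indicator over all objects.
For a positive indicator:
$$x'_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}}$$
For a negative indicator:
$$x'_{ij} = \frac{x_j^{\max} - x_{ij}}{x_j^{\max} - x_j^{\min}}$$
In the implementation, the result is additionally rescaled linearly into $[0.002, 0.996]$, so that no value is exactly 0 and the logarithm in step 4 is defined.
3. Compute the share of the $i$-th object under the $j$-th indicator:
$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$
4. Compute the entropy of the $j$-th indicator:
$$e_j = -k \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \quad k = \frac{1}{\ln n}$$
5. Compute the redundancy of the information entropy:
$$d_j = 1 - e_j$$
6. Compute the weight of each indicator:
$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$
7. Compute the overall score of each object (the implementation multiplies it by 100):
$$s_i = \sum_{j=1}^{m} w_j \, p_{ij}$$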
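The seven steps can be traced on a small made-up matrix. The sketch below is a standard-library version written for illustration, assuming the standard entropy-weight formulas. The name `entropy_weight`, the toy data, and the default `ymin`/`ymax` values (taken from the constants used in the implementation further down) are assumptions, and the scores are left unscaled.

```python
import math

def entropy_weight(matrix, directions, ymin=0.002, ymax=0.996):
    """Return (weights, scores) for an n x m matrix given as a list of rows.

    directions[j] is 1 for a positive indicator, anything else for a negative
    one. ymin/ymax keep normalized values away from 0 so ln() is defined.
    """
    n, m = len(matrix), len(matrix[0])
    shares = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        # Constant column: the numerator is 0, so any non-zero span works.
        span = (hi - lo) or 1.0
        for i in range(n):
            num = (col[i] - lo) if directions[j] == 1 else (hi - col[i])
            shares[i][j] = (ymax - ymin) * num / span + ymin  # step 2
        col_sum = sum(shares[i][j] for i in range(n))
        for i in range(n):
            shares[i][j] /= col_sum  # step 3
    k = 1.0 / math.log(n)
    entropy = [-k * sum(shares[i][j] * math.log(shares[i][j]) for i in range(n))
               for j in range(m)]  # step 4
    redundancy = [1.0 - e for e in entropy]  # step 5
    total = sum(redundancy)
    weights = [d / total for d in redundancy]  # step 6
    scores = [sum(weights[j] * shares[i][j] for j in range(m))
              for i in range(n)]  # step 7
    return weights, scores

# Made-up data: 3 objects, 3 indicators; the middle indicator is constant.
toy = [[1.0, 5.0, 9.0],
       [2.0, 5.0, 4.0],
       [4.0, 5.0, 1.0]]
weights, scores = entropy_weight(toy, [1, 1, -1])
print(weights)  # the middle weight is (numerically) zero
print(scores)   # the third object ranks highest
```

Because the middle column is constant, its shares are all equal, its entropy is 1, and its weight vanishes up to rounding. The third object is best on both remaining indicators, so it receives the highest score.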
Implementation
Code
  The algorithm in the original article is implemented in Matlab. Since I filter and process my data in Python, I reimplemented it in Python and made one improvement. The normalization in the original algorithm does not handle the case "xmax - xmin == 0", in which the denominator becomes 0. When this happens, the denominator is set to a small value instead, so that the division is always defined. The code is as follows:

import numpy as np

# Min-max normalize the values of one indicator into [ymin, ymax].
# direction == 1: positive indicator (larger is better);
# otherwise: negative indicator (smaller is better).
def normalize(oriData,direction,ymin,ymax):
    oriData = np.asarray(oriData,dtype=float)
    xmax = oriData.max()
    xmin = oriData.min()
    betweenMaxMin = xmax-xmin
    # all values are equal: use a small denominator to avoid dividing by zero
    if betweenMaxMin == 0:
        betweenMaxMin = 0.001
    if direction == 1:
        yaim = (ymax - ymin)*(oriData - xmin)/betweenMaxMin+ymin
    else:
        yaim = (ymax - ymin)*(xmax - oriData)/betweenMaxMin+ymin
    return yaim
# Compute the weight of each indicator and the score of each sample.
# oriData is a numSample x numTarget matrix (rows: samples, columns: indicators);
# types[j] is 1 for a positive indicator and anything else for a negative one.
def getEntropy(oriData,types):
    oriData = np.asarray(oriData,dtype=float)
    numSample,numTarget = oriData.shape
    x = np.zeros((numSample,numTarget))
    # normalize each indicator column into [0.002, 0.996]
    for j in range(0,numTarget):
        x[:,j] = normalize(oriData[:,j],types[j],0.002,0.996)
    # proportion[i,j]: share of sample i under indicator j
    proportion = x/x.sum(axis=0)
    # entropy of each indicator, scaled by 1/ln(numSample)
    aimEntropy = -np.sum(proportion*np.log(proportion),axis=0)/np.log(numSample)
    # redundancy of each indicator
    redundancy = 1 - aimEntropy
    # weight of each indicator and score of each sample (scaled to 100)
    aimWeight = redundancy/redundancy.sum()
    aimScore = 100 * np.dot(proportion,aimWeight)
    # return plain lists instead of arrays
    return aimScore.tolist(),aimWeight.tolist()
  The first function performs the normalization; the second computes the indicator weights and the object scores.
  For this problem, the data is first prepared in two steps to obtain the input for one year. The input matrix is $4 \times 8$, i.e. 4 objects (states) and 8 indicators. The processing is as follows:

# Step 1: collect the values of one MSN in one year for the four states.
# The result is a list of 4 values, in the order the rows appear in sheet1.
# sheet1, sheet2, getNameOfYear and getArrStateName are defined elsewhere in the script.
def getOneMSNDataSty(MSNNum,yearNum):
    MSNName = sheet2.cell(MSNNum,0).value
    yearName = getNameOfYear(yearNum)
    stateNameArr = getArrStateName()
    oriY = []
    for i in range(1,105744):
        # keep rows that match the MSN, the year and one of the four states
        if (sheet1.cell(i,0).value == MSNName
                and sheet1.cell(i,2).value == yearName
                and sheet1.cell(i,1).value in stateNameArr):
            oriY.append(sheet1.cell(i,3).value)
    return oriY
# Step 2: collect the values of the 8 selected MSNs in one year.
# The result is a 4*8 matrix (rows: states, columns: MSNs).
# MSNNumArr and the transpose helper trans are defined elsewhere in the script.
def getAllMSNDataSty(yearNum):
    oriYOneMSN = []
    for MSNNum in MSNNumArr:
        oriYOneMSN.append(getOneMSNDataSty(MSNNum,yearNum))
    # transpose from 8*4 (one row per MSN) to 4*8 (one row per state)
    oriYOneMSN = trans(oriYOneMSN)
    return oriYOneMSN
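The loops above append values in the order the rows appear in the sheet, so the row order of the resulting matrix depends on how the sheet is sorted. As a hedged alternative, the sketch below builds the same kind of matrix from long-format records through a dictionary lookup, which ties each row to a named state regardless of record order. The record layout `(msn, state, year, value)`, the function name `build_matrix`, and all codes and values in the sample are made up for illustration.

```python
def build_matrix(records, states, msn_codes, year):
    """Return a len(states) x len(msn_codes) matrix (list of rows) for one year.

    records: iterable of (msn, state, year, value) tuples (assumed layout).
    Raises KeyError if a (state, msn) pair is missing for that year.
    """
    lookup = {}
    for msn, state, rec_year, value in records:
        if rec_year == year:
            lookup[(state, msn)] = value
    return [[lookup[(state, msn)] for msn in msn_codes] for state in states]

# Made-up records, deliberately out of order.
records = [
    ("MSN_B", "S2", 1960, 4.0),
    ("MSN_A", "S1", 1960, 1.0),
    ("MSN_A", "S2", 1960, 3.0),
    ("MSN_B", "S1", 1960, 2.0),
    ("MSN_A", "S1", 1961, 9.0),  # different year, ignored
]
matrix = build_matrix(records, ["S1", "S2"], ["MSN_A", "MSN_B"], 1960)
print(matrix)  # [[1.0, 2.0], [3.0, 4.0]]
```

A single pass over the records also avoids rescanning the whole sheet once per MSN, and a missing value surfaces as a `KeyError` instead of a silently shorter row.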
Test output
  Taking the 1960 data as input:

if __name__ == "__main__":
    # 1 marks a positive indicator, -1 a negative one
    MSNStyle = [1,1,1,1,1,1,-1,-1]
    testData = getAllMSNDataSty(1960)
    tempScore,tempWeight = getEntropy(testData,MSNStyle)
    print('time is ' + '1960')
    print('tempScore:')
    print(tempScore)
    print('tempWeight:')
    print(tempWeight)
  The test output is as follows:

$ python entropy2.py
time is 1960
tempScore:
[8.872975711653167, 69.88616993620298, 7.1637488887983425, 14.077105463345536]
tempWeight:
[0.12051125623791543, 0.2371519637606621, 0.0, 0.13837650119691683, 0.2371519637606621, 0.11704768604685604, 0.06717244234828233, 0.08258818664870525]
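Two properties follow from the method and give a quick check of the printed result: the weights are normalized redundancies, so they should sum to 1, and the scores are 100 times a weighted sum of shares, so they should sum to 100. The snippet below copies the numbers from the output above; the variable names are only illustrative.

```python
scores = [8.872975711653167, 69.88616993620298,
          7.1637488887983425, 14.077105463345536]
weights = [0.12051125623791543, 0.2371519637606621, 0.0,
           0.13837650119691683, 0.2371519637606621, 0.11704768604685604,
           0.06717244234828233, 0.08258818664870525]

# Scores are 100 * sum_j w_j * p_ij, so they should add up to 100.
assert abs(sum(scores) - 100.0) < 1e-9
# Weights are normalized redundancies, so they should add up to 1.
assert abs(sum(weights) - 1.0) < 1e-9

# Index of the highest-scoring row of the input matrix.
best = max(range(len(scores)), key=lambda i: scores[i])
print(best)  # 1
```

The third weight is exactly 0.0, which is what the method yields when an indicator takes the same value for every object. Whether that is the case for the 1960 data is not shown here.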
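Summary
When I first implemented this in Python, the output values were abnormal. On checking, I found that I had misread one of the formulas, which caused the error.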