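Overview
Part 1 C calls for a method to evaluate the energy structure of four states as a whole. After reviewing the energy literature, we have provisionally chosen the entropy weight method, a quantitative approach, to evaluate the four states. For now, the evaluation indicators are the 8 selected MSN values (the indicators may change), and the method is implemented in Python.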
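Theoretical basis
My understanding of the entropy weight method follows Zhang Jingxin's (张敬信) introduction to it. In brief:
- In information theory, entropy measures uncertainty. The more information there is, the smaller the uncertainty and the lower the entropy; the less information there is, the greater the uncertainty and the higher the entropy.
- Given this property, the entropy value can be used to judge the randomness and disorder of an event, and also the dispersion of an indicator. The more dispersed an indicator is (the lower its entropy), the greater its influence (weight) on the overall evaluation. For example, if all samples take the same value for some indicator, that indicator has no influence on the overall evaluation and its weight is 0.
- The entropy weight method is an objective weighting method, because it relies only on the dispersion of the data itself.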
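To make the link between dispersion and weight concrete, here is a minimal sketch that computes the normalized entropy of two made-up indicator columns. It is not part of the original implementation: the function name `normalized_entropy` and the sample values are assumptions chosen only for illustration, and the values must be positive so that the logarithm is defined.

```python
import numpy as np


def normalized_entropy(values):
    """Entropy of the shares values / sum(values), scaled to [0, 1]."""
    p = np.asarray(values, dtype=float)
    p = p / p.sum()
    # Dividing by ln(n) makes a perfectly uniform column have entropy 1.
    return float(-(p * np.log(p)).sum() / np.log(len(p)))


# Hypothetical indicator columns for four objects.
constant = [5.0, 5.0, 5.0, 5.0]    # no dispersion
dispersed = [1.0, 2.0, 3.0, 14.0]  # strong dispersion

e_constant = normalized_entropy(constant)
e_dispersed = normalized_entropy(dispersed)

# The redundancy 1 - e is the quantity the weight is built from.
print(1 - e_constant)   # essentially zero: the indicator carries no weight
print(1 - e_dispersed)  # clearly positive
```

The constant column has the maximum entropy and therefore contributes nothing, while the dispersed column has a lower entropy and would receive a positive weight.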
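The calculation proceeds as follows:
1. For $n$ objects and $m$ indicators, the input is an $n \times m$ matrix, where $x_{ij}$ is the value of the $j$-th indicator for the $i$-th object.
2. Normalize the data:
For a positive indicator: $x'_{ij} = \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$
For a negative indicator: $x'_{ij} = \dfrac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$
3. Compute the share of the $i$-th object within the $j$-th indicator: $p_{ij} = \dfrac{x'_{ij}}{\sum_{r=1}^{n} x'_{rj}}$
4. Compute the entropy of the $j$-th indicator: $e_j = -\dfrac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$
5. Compute the redundancy of the information entropy: $d_j = 1 - e_j$
6. Compute the weight of each indicator: $w_j = \dfrac{d_j}{\sum_{l=1}^{m} d_l}$
7. Compute the overall score of each object: $s_i = \sum_{j=1}^{m} w_j \, p_{ij}$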
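The seven steps above can be sketched compactly with NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in this post: the function name `entropy_weight`, the sample matrix, and the handling of edge cases are all hypothetical. In particular, it uses the convention $0 \cdot \ln 0 = 0$ and treats a constant column as having uniform shares, and it assumes at least one column is not constant.

```python
import numpy as np


def entropy_weight(data, directions):
    """Return (weights, scores) for an n-by-m data matrix.

    directions[j] == 1 marks a positive indicator; any other value
    marks a negative one.
    """
    x = np.asarray(data, dtype=float)
    n, m = x.shape
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    # Use a span of 1 for constant columns so the division is defined.
    span = np.where(xmax > xmin, xmax - xmin, 1.0)
    positive = np.asarray(directions) == 1
    # Step 2: min-max normalization, flipped for negative indicators.
    norm = np.where(positive, (x - xmin) / span, (xmax - x) / span)
    # Step 3: share of each object within each indicator.
    col_sum = norm.sum(axis=0)
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    p = np.where(col_sum > 0, norm / safe_sum, 1.0 / n)
    # Step 4: entropy, with the convention 0 * ln(0) = 0.
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log(p[mask])
    e = -plogp.sum(axis=0) / np.log(n)
    # Steps 5 and 6: redundancy and weights.
    d = 1 - e
    w = d / d.sum()
    # Step 7: overall score of each object.
    s = p @ w
    return w, s


# Hypothetical data: 3 objects, 3 indicators, the last one constant.
data = [[1, 10, 7],
        [2, 20, 7],
        [3, 40, 7]]
w, s = entropy_weight(data, [1, -1, 1])
print(w)  # the constant third indicator gets a weight of (almost exactly) 0
print(s)
```

Because each column of shares sums to 1 and the weights sum to 1, the scores returned here also sum to 1.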
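Implementation
Code
The algorithm in the original article is implemented in Matlab. Since I filter and process my data in Python, I reimplemented it in Python and made one improvement. The normalization in the original algorithm does not handle the case "xmax-xmin == 0", in which the denominator is zero. When that happens, I assign the denominator a small value so that the division is safe. The code is as follows: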
```python
import numpy as np


# Normalize one indicator column into the range [ymin, ymax].
# type == 1: the direction is positive (larger is better);
# otherwise the direction is negative (smaller is better).
def normalize(oriData, type, ymin, ymax):
    oriData = np.asarray(oriData, dtype=float)
    xmax = oriData.max()
    xmin = oriData.min()
    betweenMaxMin = xmax - xmin
    # Guard against division by zero when all samples share one value.
    if betweenMaxMin == 0:
        betweenMaxMin = 0.001
    if type == 1:
        yaim = (ymax - ymin) * (oriData - xmin) / betweenMaxMin + ymin
    else:
        yaim = (ymax - ymin) * (xmax - oriData) / betweenMaxMin + ymin
    return yaim


# Get the score of each sample and the weight of each indicator.
# oriData has shape (numSample, numTarget): numSample is the number of
# samples and numTarget is the number of indicators.
def getWeightByEntropy(oriData, types):
    oriData = np.asarray(oriData, dtype=float)
    numSample, numTarget = oriData.shape
    x = np.zeros((numSample, numTarget))
    # Normalize the data column by column. The lower bound 0.002 keeps
    # every value positive, so the logarithm below is always defined.
    for j in range(numTarget):
        x[:, j] = normalize(oriData[:, j], types[j], 0.002, 0.996)
    # proportion[i, j]: share of sample i within indicator j
    proportion = x / x.sum(axis=0)
    # entropy of each indicator
    k = 1 / np.log(numSample)
    aimEntropy = -k * np.sum(proportion * np.log(proportion), axis=0)
    # redundancy of each indicator
    redundancy = 1 - aimEntropy
    # weight of the indicators and score of the samples
    aimWeight = redundancy / redundancy.sum()
    aimScore = 100 * np.dot(proportion, aimWeight)
    # return plain lists rather than arrays
    return aimScore.tolist(), aimWeight.tolist()
```
The first function performs the normalization; the second computes the indicator weights and the object scores.
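For this problem, the data is first processed in two steps to obtain the input for one year. The input matrix is $4 \times 8$: 4 objects (states) and 8 indicators. The processing is as follows: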
```python
# Step 1: get the data of one MSN in one year.
# The result holds one value per state (1*4).
# sheet1, sheet2, getNameOfYear and getArrStateName come from the rest
# of the script and are not shown in this post.
def getOneMSNDataSty(MSNNum, yearNum):
    MSNName = sheet2.cell(MSNNum, 0).value
    yearName = getNameOfYear(yearNum)
    stateNameArr = getArrStateName()
    oriY = []
    # Values are appended in the row order of the sheet.
    for i in range(1, 105744):
        if (sheet1.cell(i, 0).value == MSNName
                and sheet1.cell(i, 2).value == yearName
                and sheet1.cell(i, 1).value in stateNameArr[:4]):
            oriY.append(sheet1.cell(i, 3).value)
    return oriY


# Step 2: get the data of the 8 selected MSNs in one year.
# The result is 4*8 (4 states, 8 indicators).
# MSNNumArr and trans (presumably a list-transpose helper) also come
# from the rest of the script.
def getAllMSNDataSty(yearNum):
    oriYOneMSN = []
    for MSNNum in MSNNumArr:
        oriYOneMSN.append(getOneMSNDataSty(MSNNum, yearNum))
    # Rows were collected per MSN (8*4); transpose to per state (4*8).
    return trans(oriYOneMSN)
```
Test output
Taking the 1960 data as input:
```python
if __name__ == "__main__":
    # 1 marks a positive indicator, -1 a negative one
    MSNStyle = [1, 1, 1, 1, 1, 1, -1, -1]
    testData = getAllMSNDataSty(1960)
    tempScore, tempWeight = getWeightByEntropy(testData, MSNStyle)
    print('time is 1960')
    print('tempScore:')
    print(tempScore)
    print('tempWeight:')
    print(tempWeight)
```
The test output is as follows:
```
$ python entropy2.py
time is 1960
tempScore:
[8.872975711653167, 69.88616993620298, 7.1637488887983425, 14.077105463345536]
tempWeight:
[0.12051125623791543, 0.2371519637606621, 0.0, 0.13837650119691683, 0.2371519637606621, 0.11704768604685604, 0.06717244234828233, 0.08258818664870525]
```
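A quick sanity check on this output: the weights should sum to 1, and because each indicator's shares sum to 1 and the scores are scaled by 100, the scores should sum to 100. The short sketch below only re-checks the numbers printed above; the variable names are chosen for illustration.

```python
# Values copied from the 1960 test output above.
scores = [8.872975711653167, 69.88616993620298,
          7.1637488887983425, 14.077105463345536]
weights = [0.12051125623791543, 0.2371519637606621, 0.0,
           0.13837650119691683, 0.2371519637606621,
           0.11704768604685604, 0.06717244234828233,
           0.08258818664870525]

# The weights sum to 1 and the scores sum to 100, up to rounding.
weights_ok = abs(sum(weights) - 1.0) < 1e-9
scores_ok = abs(sum(scores) - 100.0) < 1e-9
print(weights_ok, scores_ok)

# Position (in input row order) of the highest-scoring object.
best = max(range(len(scores)), key=lambda i: scores[i])
print(best)
```

Both checks pass, and the second object in the input has by far the highest score. The weight of exactly 0.0 for the third indicator suggests that the four states had identical values for that MSN in 1960, which is the constant-indicator case described in the theory section.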
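Summary
When I first implemented this in Python, the output values were abnormal. On checking, I found that I had misread one of the formulas, which caused the error.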