Read Excel File in Python
I have an Excel file:
```
Arm_id  DSPName  DSPCode  HubCode  PinCode  PPTL
1       JaVAS    01       AGR      282001   1,2
2       JaVAS    01       AGR      282002   3,4
3       JaVAS    01       AGR      282003   5,6
```
I want to pull out only certain columns, which I keep in a list like this:
```python
FORMAT = ['Arm_id', 'DSPName', 'Pincode']
```
This is what I have tried. Currently I can read all the content in the file:
```python
from xlrd import open_workbook

wb = open_workbook('sample.xls')
for s in wb.sheets():
    #print 'Sheet:',s.name
    values = []
    for row in range(s.nrows):
        col_value = []
        for col in range(s.ncols):
            value = s.cell(row, col).value
            # Numeric cells become strings such as '1'; text cells are left alone.
            try:
                value = str(int(value))
            except ValueError:
                pass
            col_value.append(value)
        values.append(col_value)
print(values)
```
My output is:
```
[[u'Arm_id', u'DSPName', u'DSPCode', u'HubCode', u'PinCode', u'PPTL'], ['1', u'JaVAS', '1', u'AGR', '282001', u'1,2'], ['2', u'JaVAS', '1', u'AGR', '282002', u'3,4'], ['3', u'JaVAS', '1', u'AGR', '282003', u'5,6']]
```
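Then I loop over this list to locate the columns named in FORMAT and pick out their values.
But this is a poor solution.
How do I get the values of a specific column by name from an Excel file?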
A somewhat late answer, but with pandas it is possible to get the columns of an Excel file directly:
```python
import pandas  # reading .xls files also requires xlrd to be installed

df = pandas.read_excel('sample.xls')

# print the column names
print(df.columns)

# get the values for a given column
values = df['Arm_id'].values

# get a data frame with selected columns
# (names must match the sheet's headers exactly: 'PinCode', not 'Pincode')
FORMAT = ['Arm_id', 'DSPName', 'PinCode']
df_selected = df[FORMAT]
```
This is one approach:
```python
from xlrd import open_workbook


class Arm(object):
    def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):
        self.id = id
        self.dsp_name = dsp_name
        self.dsp_code = dsp_code
        self.hub_code = hub_code
        self.pin_code = pin_code
        self.pptl = pptl

    def __str__(self):
        return("Arm object:\n"
               "  Arm_id = {0}\n"
               "  DSPName = {1}\n"
               "  DSPCode = {2}\n"
               "  HubCode = {3}\n"
               "  PinCode = {4}\n"
               "  PPTL = {5}"
               .format(self.id, self.dsp_name, self.dsp_code,
                       self.hub_code, self.pin_code, self.pptl))


wb = open_workbook('sample.xls')
for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols

    items = []

    # Start at row 1 to skip the header row.
    for row in range(1, number_of_rows):
        values = []
        for col in range(number_of_columns):
            value = sheet.cell(row, col).value
            try:
                value = str(int(value))
            except ValueError:
                pass
            finally:
                values.append(value)
        # One Arm object per spreadsheet row, columns in sheet order.
        item = Arm(*values)
        items.append(item)

for item in items:
    print(item)
    print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
    print("")  # blank line between records
```
You don't have to use a custom class; a simpler built-in structure can hold each row instead.
Here is the output of the script above:
```
Arm object:
  Arm_id = 1
  DSPName = JaVAS
  DSPCode = 1
  HubCode = AGR
  PinCode = 282001
  PPTL = 1
Accessing one single value (eg. DSPName): JaVAS

Arm object:
  Arm_id = 2
  DSPName = JaVAS
  DSPCode = 1
  HubCode = AGR
  PinCode = 282002
  PPTL = 3
Accessing one single value (eg. DSPName): JaVAS

Arm object:
  Arm_id = 3
  DSPName = JaVAS
  DSPCode = 1
  HubCode = AGR
  PinCode = 282003
  PPTL = 5
Accessing one single value (eg. DSPName): JaVAS
```
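To illustrate the remark that a custom class is optional, each row could instead be stored in a plain dict keyed by column name. This is only a minimal sketch: it assumes the sheet has already been read into a list of lists like the `values` list in the question, and `rows_to_dicts` is a hypothetical helper name, not part of xlrd.

```python
def rows_to_dicts(table):
    """Turn [header, row1, row2, ...] into a list of dicts keyed by header name."""
    header, data_rows = table[0], table[1:]
    return [dict(zip(header, row)) for row in data_rows]


# Sample data copied from the question's output, so no Excel file is needed here.
table = [
    ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL'],
    ['1', 'JaVAS', '1', 'AGR', '282001', '1,2'],
    ['2', 'JaVAS', '1', 'AGR', '282002', '3,4'],
    ['3', 'JaVAS', '1', 'AGR', '282003', '5,6'],
]

records = rows_to_dicts(table)
print(records[0]['DSPName'])
print(records[2]['PinCode'])
```

The trade-off is that values are looked up with `record['DSPName']` rather than dot notation, but no class has to be kept in sync with the sheet's columns.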
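So the key parts are to grab the header row (s.row(0) in the code below), skip it when iterating over the data rows, and pair each value with its column name: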
```python
from xlrd import open_workbook

wb = open_workbook('Book2.xls')
values = []
for s in wb.sheets():
    #print 'Sheet:',s.name
    col_names = s.row(0)  # header cells; read once per sheet
    # Start at row 1 to skip the header row.
    for row in range(1, s.nrows):
        col_value = []
        for name, col in zip(col_names, range(s.ncols)):
            value = s.cell(row, col).value
            try:
                value = str(int(value))
            except ValueError:
                pass
            # Store each value together with its column name.
            col_value.append((name.value, value))
        values.append(col_value)
print(values)
```
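Once every row is a list of (name, value) pairs, pulling out one column by name is a short filter. The sketch below assumes rows shaped like the `values` list built above; the sample rows are abbreviated from the question's data and `column_values` is a hypothetical helper name.

```python
def column_values(rows, name):
    """Collect the values of one named column from rows of (name, value) pairs."""
    return [value for row in rows for key, value in row if key == name]


# Abbreviated sample rows in the same shape the script above produces.
rows = [
    [('Arm_id', '1'), ('DSPName', 'JaVAS'), ('PinCode', '282001')],
    [('Arm_id', '2'), ('DSPName', 'JaVAS'), ('PinCode', '282002')],
    [('Arm_id', '3'), ('DSPName', 'JaVAS'), ('PinCode', '282003')],
]

pin_codes = column_values(rows, 'PinCode')
print(pin_codes)
```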
With pandas, we can read Excel files easily.
```python
import pandas as pd

# Recent pandas versions read .xlsx files through openpyxl, so it must be installed.
DataF = pd.read_excel("Test.xlsx", sheet_name='Sheet1')

print("Column headings:")
print(DataF.columns)
```
Tested at: https://repl.it
Reference: https://pythonspot.com/read-excel-with-pandas/
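The approach I took reads the header information from the first row to determine the indexes of the columns of interest.
You mentioned in the question that you also want the values output to a string. I dynamically build a format string for the output from the FORMAT column list. Rows are appended to the values string, separated by a newline character.
The output column order is determined by the order of the column names in the FORMAT list.
In my code below, the case of the column names in the FORMAT list matters. In the question above you have 'Pincode' in your FORMAT list but 'PinCode' in your Excel file. That would not work below; it needs to be 'PinCode'.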
```python
from xlrd import open_workbook

wb = open_workbook('sample.xls')

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
values = ""

for s in wb.sheets():
    headerRow = s.row(0)
    # Index of each FORMAT column in the header row, kept in FORMAT order.
    columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]
    # One "%s" per selected column, comma-separated, ending with a newline.
    formatString = ("%s," * len(columnIndex))[0:-1] + "\n"

    for row in range(1, s.nrows):
        currentRow = s.row(row)
        currentRowValues = [currentRow[x].value for x in columnIndex]
        values += formatString % tuple(currentRowValues)

print(values)
```
For the sample input given above, this code outputs:
```
>>> 1.0,JaVAS,282001.0
2.0,JaVAS,282002.0
3.0,JaVAS,282003.0
```
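The `.0` suffixes appear because xlrd returns numeric cells as Python floats. The sketch below shows one way to tidy that up and, at the same time, to relax the case sensitivity mentioned earlier. It works on plain Python lists rather than an open workbook, and both `resolve_columns` and `format_cell` are hypothetical helper names, not xlrd functions.

```python
def resolve_columns(header, wanted):
    """Return the index of each wanted column, matching names case-insensitively."""
    # If two headers differ only by case, the later one wins in this lookup.
    lookup = {name.lower(): index for index, name in enumerate(header)}
    return [lookup[name.lower()] for name in wanted]  # KeyError if a name is absent


def format_cell(value):
    """Render integer-valued floats (e.g. 282001.0) without the trailing '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


# Header and first data row as they would look after reading the sample sheet.
header = ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL']
row = [1.0, 'JaVAS', 1.0, 'AGR', 282001.0, '1,2']

# 'Pincode' (as spelled in the question) still finds the 'PinCode' column.
indexes = resolve_columns(header, ['Arm_id', 'DSPName', 'Pincode'])
line = ','.join(format_cell(row[i]) for i in indexes)
print(line)
```

Whether case-insensitive matching is desirable depends on the sheet; exact matching, as in the answer's code, is safer when header names may differ only by case.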
Since I'm a Python noob, props go to: this answer, this answer, this question, this question and this answer.
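Although I almost always use pandas for this, my current little tool is being packaged into an executable, and including pandas would be overkill. So I created a version of poida's solution that produces a list of named tuples. His code with this change looks like this: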
```python
from xlrd import open_workbook
from collections import namedtuple
from pprint import pprint

wb = open_workbook('sample.xls')

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
OneRow = namedtuple('OneRow', ' '.join(FORMAT))
all_rows = []

for s in wb.sheets():
    headerRow = s.row(0)
    columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]

    for row in range(1, s.nrows):
        currentRow = s.row(row)
        currentRowValues = [currentRow[x].value for x in columnIndex]
        all_rows.append(OneRow(*currentRowValues))

pprint(all_rows)
```
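Named tuples built this way can be read by attribute or converted to a dict. One caveat is that field names must be valid Python identifiers, so a header such as 'Arm id' would be rejected unless `rename=True` is passed. The snippet below is a small sketch of both points; the header containing a space is a hypothetical example, not taken from the sample sheet.

```python
from collections import namedtuple

# 'Arm id' contains a space, so it is not a valid identifier (hypothetical header).
header = ['Arm id', 'DSPName', 'PinCode']

# rename=True replaces invalid field names with positional ones such as '_0'.
Row = namedtuple('Row', header, rename=True)
print(Row._fields)

record = Row(1.0, 'JaVAS', 282001.0)
print(record.DSPName)               # access by attribute
print(record._asdict()['PinCode'])  # or convert to a dict and use the key
```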