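excel学习库 - Reading files with pandas: read_excel

To read an Excel file with pandas, use the read_excel function, which supports several workbook formats. It can read a single sheet or several sheets at once.

1. The read_excel signature is as follows: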

pd.read_excel(
    io,
    sheet_name: 'str | int | list[IntStrT] | None' = 0,
    *,
    header: 'int | Sequence[int] | None' = 0,
    names: 'list[str] | None' = None,
    index_col: 'int | Sequence[int] | None' = None,
    usecols: 'int | str | Sequence[int] | Sequence[str] | Callable[[str], bool] | None' = None,
    dtype: 'DtypeArg | None' = None,
    engine: "Literal['xlrd', 'openpyxl', 'odf', 'pyxlsb'] | None" = None,
    converters: 'dict[str, Callable] | dict[int, Callable] | None' = None,
    true_values: 'Iterable[Hashable] | None' = None,
    false_values: 'Iterable[Hashable] | None' = None,
    skiprows: 'Sequence[int] | int | Callable[[int], object] | None' = None,
    nrows: 'int | None' = None,
    na_values=None,
    keep_default_na: 'bool' = True,
    na_filter: 'bool' = True,
    verbose: 'bool' = False,
    parse_dates: 'list | dict | bool' = False,
    date_parser: 'Callable | lib.NoDefault' = <no_default>,
    date_format: 'dict[Hashable, str] | str | None' = None,
    thousands: 'str | None' = None,
    decimal: 'str' = '.',
    comment: 'str | None' = None,
    skipfooter: 'int' = 0,
    storage_options: 'StorageOptions' = None,
    dtype_backend: 'DtypeBackend | lib.NoDefault' = <no_default>)

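The version installed here is pandas 2.0.3, where read_excel takes 26 parameters. In practice only a handful are needed, because the rest already have sensible defaults.

2. read_excel parameters in detail

(1) io: the file path or file object to read from.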

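(2) sheet_name: the sheet to read, given by name or by 0-based position. The default, 0, is the first sheet in the workbook.

To read two sheets at once, pass their names as a list. The result is then a dict of DataFrames keyed by sheet name.

sheet_name=['成绩1','成绩2']

To read every sheet in the workbook, use sheet_name=None.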
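The three forms of sheet_name can be compared with a short sketch. It assumes pandas and openpyxl are installed; the workbook is built in memory and its contents are made-up sample data, not taken from the original article.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory (hypothetical sample data).
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"学号": [1, 2], "语文": [90, 85]}).to_excel(
        writer, sheet_name="成绩1", index=False
    )
    pd.DataFrame({"学号": [3, 4], "语文": [70, 95]}).to_excel(
        writer, sheet_name="成绩2", index=False
    )
data = buf.getvalue()

# Default (sheet_name=0): only the first sheet is read, as a DataFrame.
first = pd.read_excel(io.BytesIO(data))

# A list of names returns a dict mapping sheet name -> DataFrame.
two = pd.read_excel(io.BytesIO(data), sheet_name=["成绩1", "成绩2"])

# sheet_name=None reads every sheet, also as a dict.
all_sheets = pd.read_excel(io.BytesIO(data), sheet_name=None)

print(type(first).__name__, list(two), list(all_sheets))
```

With a real file, the io.BytesIO(data) argument would simply be replaced by the file path.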

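(3) header: the row to use as the column header (0-based).

header=0 is the default and uses the first row of the sheet as the header;
header=None is for sheets with no header row; the columns are labeled 0, 1, 2, ... unless names is given;
header=1 uses the second row as the header.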

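(4) names: a list of column names to use. It is mainly useful for data with no header row (together with header=None) and is rarely needed otherwise.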
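A minimal sketch of header=None with and without names, assuming pandas and openpyxl are installed. The headerless sheet is hypothetical sample data written to memory first.

```python
import io

import pandas as pd

# Hypothetical sheet with no header row: only raw values are written.
buf = io.BytesIO()
pd.DataFrame([[1, 90, 88], [2, 85, 92]]).to_excel(
    buf, index=False, header=False, engine="openpyxl"
)
data = buf.getvalue()

# header=None: no row is treated as a header, so columns are labeled 0, 1, 2.
raw = pd.read_excel(io.BytesIO(data), header=None)

# names supplies the column labels for the headerless data.
named = pd.read_excel(
    io.BytesIO(data), header=None, names=["学号", "语文", "数学"]
)

print(raw.columns.tolist())
print(named.columns.tolist())
```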

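(5) index_col: the column (or columns) to use as the row index, given as a 0-based position. For example, index_col=0 uses the first column as the index. By default no column is used.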
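As a small illustration of index_col, the sketch below reads the same in-memory sheet with and without it. It assumes pandas and openpyxl are installed, and the data is invented for the example.

```python
import io

import pandas as pd

# Hypothetical sample sheet written to memory.
buf = io.BytesIO()
pd.DataFrame(
    {"学号": [101, 102], "语文": [90, 85], "数学": [88, 92]}
).to_excel(buf, index=False, engine="openpyxl")
data = buf.getvalue()

# Without index_col, pandas adds a default 0..n-1 row index.
plain = pd.read_excel(io.BytesIO(data))

# index_col=0 turns the first column (学号) into the row index.
indexed = pd.read_excel(io.BytesIO(data), index_col=0)

print(plain.index.tolist())
print(indexed.index.tolist())
```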

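(6) usecols: the columns to read. If the sheet has many columns and the analysis needs only a few, this parameter restricts reading to them. Note that column names must be passed as a list in square brackets, even when only one column is read.

Alternatively, pass a string of Excel column letters to select a range directly, such as "A:C".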
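Both ways of selecting columns can be sketched as follows, assuming pandas and openpyxl are installed. The four-column sheet is hypothetical sample data, written without an index so that 学号 lands in Excel column A.

```python
import io

import pandas as pd

# Hypothetical sample sheet: columns A-D are 学号, 语文, 数学, 英语.
buf = io.BytesIO()
pd.DataFrame(
    {"学号": [1, 2], "语文": [90, 85], "数学": [88, 92], "英语": [75, 80]}
).to_excel(buf, index=False, engine="openpyxl")
data = buf.getvalue()

# Select by column name: always a list, even for a single column.
by_name = pd.read_excel(io.BytesIO(data), usecols=["学号", "数学"])

# Select by Excel column letters: a range string.
by_letter = pd.read_excel(io.BytesIO(data), usecols="A:B")

print(by_name.columns.tolist())
print(by_letter.columns.tolist())
```

A bare string is interpreted as column letters, which is why names need the list form.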

(7) dtype: the data type of each column when reading.

Pass a dict of the form dtype={"column name": "type"}.
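A minimal sketch of the dtype dict, assuming pandas and openpyxl are installed. The sheet and the choice of types are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample sheet written to memory.
buf = io.BytesIO()
pd.DataFrame({"学号": [101, 102], "语文": [90, 85]}).to_excel(
    buf, index=False, engine="openpyxl"
)
data = buf.getvalue()

# Read 学号 as text and 语文 as floating point.
typed = pd.read_excel(io.BytesIO(data), dtype={"学号": str, "语文": float})

print(typed.dtypes)
```

Reading ID-like columns as str is a common choice because it keeps them from being treated as numbers in later calculations.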

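(8) converters: also takes a dict keyed by column, like dtype. The difference is that each value is a function applied to every cell of that column, and the column then holds the function's return values.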
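The sketch below applies a function to one column through converters. It assumes pandas and openpyxl are installed; add_prefix is a hypothetical helper and the data is invented.

```python
import io

import pandas as pd

# Hypothetical sample sheet written to memory.
buf = io.BytesIO()
pd.DataFrame({"学号": [101, 102], "语文": [90, 85]}).to_excel(
    buf, index=False, engine="openpyxl"
)
data = buf.getvalue()


def add_prefix(value):
    """Hypothetical converter: turn a numeric ID into a string like 'S101'."""
    return f"S{int(value)}"


# The function runs on every cell of 学号; other columns are untouched.
converted = pd.read_excel(io.BytesIO(data), converters={"学号": add_prefix})

print(converted)
```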

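(9) engine: the third-party library used to parse the Excel file. Accepted values are "xlrd", "openpyxl", "odf", and "pyxlsb".

(10) true_values and false_values: lists of values to convert to True or False. Both default to None and are rarely needed.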

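(11) skiprows: the rows to skip. Row numbers are 0-based and count from the top of the sheet, header row included.

skiprows=1 skips the first row

skiprows=[2,4,5] skips the rows at positions 2, 4 and 5, that is, the 3rd, 5th and 6th rows of the sheet

skiprows=lambda x:x%2==0 skips every row whose 0-based position is even. Because the header row (position 0) is skipped as well, also set header=None,names=['学号','语文','数学','英语']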
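The three forms of skiprows can be tried on a small sheet as sketched below, assuming pandas and openpyxl are installed. The sheet (one header row plus four data rows) is hypothetical sample data.

```python
import io

import pandas as pd

# Hypothetical sheet: position 0 is the header, positions 1-4 are data rows.
buf = io.BytesIO()
pd.DataFrame({"学号": [1, 2, 3, 4], "语文": [90, 85, 70, 95]}).to_excel(
    buf, index=False, engine="openpyxl"
)
data = buf.getvalue()
cols = ["学号", "语文"]

# Integer: skip the first row (here the header), so names are supplied.
no_header = pd.read_excel(
    io.BytesIO(data), skiprows=1, header=None, names=cols
)

# List: skip the rows at 0-based positions 1 and 2; the header is kept.
some = pd.read_excel(io.BytesIO(data), skiprows=[1, 2])

# Callable: skip every even position (0, 2, 4), header included.
odd = pd.read_excel(
    io.BytesIO(data), skiprows=lambda x: x % 2 == 0, header=None, names=cols
)

print(no_header["学号"].tolist(), some["学号"].tolist(), odd["学号"].tolist())
```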

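(12) nrows: the number of rows to read from the top, typically used with large files. The default is None, which reads all rows.

(13) na_values: additional values to treat as NaN. Pass a dict to set them per column. It is rarely needed.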
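A short sketch of nrows, assuming pandas and openpyxl are installed; the four-row sheet stands in for a much larger file and is made-up data.

```python
import io

import pandas as pd

# Hypothetical sample sheet standing in for a large file.
buf = io.BytesIO()
pd.DataFrame({"学号": [1, 2, 3, 4], "语文": [90, 85, 70, 95]}).to_excel(
    buf, index=False, engine="openpyxl"
)
data = buf.getvalue()

# Read only the first two data rows (the header row is not counted).
head = pd.read_excel(io.BytesIO(data), nrows=2)

print(head)
```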

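(14) keep_default_na: whether pandas's built-in list of missing-value markers (empty cells, "NA", "NaN", and so on) is applied when reading. The default is True, so these values are recognized automatically and read as NaN.

With keep_default_na=False those markers are kept as text, so a column that contains them can end up with the object dtype. Use this parameter with care.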
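The interaction of na_values and keep_default_na can be seen in a sketch like the following, assuming pandas and openpyxl are installed. The sheet is hypothetical: a 代码 column that contains the literal text "NA" and a 语文 column where "缺考" marks a missing score.

```python
import io

import pandas as pd

# Hypothetical sample sheet with a literal "NA" and a custom missing marker.
buf = io.BytesIO()
pd.DataFrame(
    {"代码": ["NA", "CN", "US"], "语文": [90, "缺考", 85]}
).to_excel(buf, index=False, engine="openpyxl")
data = buf.getvalue()

# Default: "NA" is a built-in marker and becomes NaN; "缺考" stays as text.
default = pd.read_excel(io.BytesIO(data))

# na_values adds "缺考" to the markers, so that cell becomes NaN too.
marked = pd.read_excel(io.BytesIO(data), na_values=["缺考"])

# keep_default_na=False turns the built-in markers off: "NA" stays as text.
kept = pd.read_excel(io.BytesIO(data), keep_default_na=False)

print(default, marked, kept, sep="\n\n")
```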

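(15) na_filter: whether to detect missing-value markers (empty strings and the values in na_values). For data without any NAs, passing na_filter=False can speed up reading a large file.

(16) verbose: whether to report the number of NA values placed in non-numeric columns.

3. Summary

Most of the parameters covered above come up in everyday work, and many already have sensible defaults. They are enough to learn from and use if you want to improve your skills.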


Reading files with pandas: read_excel parameters explained, 2024-03-06 23:45:18