Mrli

Pandas Crash Course

2020/12/02 Machine Learning Python

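Series : a one-dimensional array that holds values of a single data type
Time-Series : a Series indexed by time
DataFrame : a two-dimensional tabular data structure; it can be thought of as a container of Series
Panel : a three-dimensional array; it can be thought of as a container of DataFrames (Panel was deprecated and later removed from pandas, so it is unavailable in current versions)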
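
To make these definitions concrete, here is a minimal sketch. The variable names and sample values are illustrative and do not come from the original post. It shows that a Series settles on one dtype, that a time series is just a Series with a timestamp index, and that each DataFrame column is itself a Series.

```python
import pandas as pd

# A Series stores a single dtype; mixing ints and floats upcasts to float64.
s = pd.Series([1, 2.5, 3])
print(s.dtype)  # float64

# A time series is a Series whose index is made of timestamps.
dates = pd.date_range("2020-12-01", periods=3, freq="D")
ts = pd.Series([10, 20, 30], index=dates)
print(type(ts.index).__name__)  # DatetimeIndex

# A DataFrame behaves like a container of Series: each column is a Series.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
col = df["x"]
print(type(col).__name__)  # Series
```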

Series

import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])
index = ['a', 'b', 'c']
myseries = pd.Series(arr, index=index)
print(myseries)

# Positional access goes through iloc
print('First element of the Series: {}'.format(myseries.iloc[0]))
# Label access uses the index label directly
print("Element with index 'c': {}".format(myseries['c']))
>>>
a    1
b    2
c    3
dtype: int32
First element of the Series: 1
Element with index 'c': 3

DataFrame

arr = np.array([
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5]
])
rowindex = ['row1', 'row2', 'row3']
colindex = ['col1', 'col2', 'col3']
dataframe = pd.DataFrame(data=arr, index=rowindex, columns=colindex)
print(dataframe)
>>>
      col1  col2  col3
row1     1     2     3
row2     2     3     4
row3     3     4     5

# Fetch the first row of the same dataframe by position.
# iloc is the public equivalent of the private _ixs helper.
print(dataframe.iloc[0])
>>>
col1    1
col2    2
col3    3
Name: row1, dtype: int32
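
The output above is a Series, which fits the "container of Series" description. The sketch below rebuilds the same 3x3 table (under the illustrative name `df`) and checks what a single row and a single column look like once extracted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]]),
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

first_row = df.iloc[0]   # a Series indexed by the column labels
first_col = df["col1"]   # a Series indexed by the row labels

print(first_row.name)         # row1
print(list(first_row.index))  # ['col1', 'col2', 'col3']
print(first_col.name)         # col1
print(first_col.tolist())     # [1, 2, 3]
```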

Selecting rows and columns

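1. ix[ ]

Rows first, then columns. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the calls in this subsection only run on older versions; loc and iloc cover the same cases.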

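print(dataframe.ix[[0]])        # first row, by position (returned as a one-row DataFrame)
# print(dataframe.ix['row1'])   # first row, by label

print(dataframe.ix[:, 0])       # first column, by position
print(dataframe.ix[:, 'col1'])  # first column, by label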
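
Since ix is gone from current pandas, the same four lookups can be written with loc (labels) and iloc (positions). This is a sketch of the equivalent calls on the same 3x3 table, rebuilt here under the illustrative name `df`; it is one possible translation, not something taken from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]]),
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

# Lookups by label go through loc.
row_by_label = df.loc["row1"]
col_by_label = df.loc[:, "col1"]

# Lookups by position go through iloc.
row_by_pos = df.iloc[[0]]   # double brackets keep a one-row DataFrame
col_by_pos = df.iloc[:, 0]

print(row_by_pos.shape)     # (1, 3)
print(col_by_pos.tolist())  # [1, 2, 3]
```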

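2. loc[ ]

loc selects data by index label, and the labels may themselves be numbers. Rows come first, then columns. Note that label slices include both endpoints: [0:2] selects the rows labeled 0, 1 and 2, three rows in total. Only row and column labels are accepted, not positions.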

print(dataframe.loc['row1'])
>>> 
col1    1
col2    2
col3    3
Name: row1, dtype: int32

print(dataframe.loc[:, 'col1'])       # values of column 'col1'
print(dataframe.loc['row1', 'col1'])  # value at a given row and column
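
The inclusive end point of a loc slice is easy to miss, so here is a small sketch of it. Both frames and their names (`numbered`, `named`) are hypothetical examples made up for this illustration.

```python
import pandas as pd

# Hypothetical frame whose row labels are the integers 0..4.
numbered = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[0, 1, 2, 3, 4])

# With loc, 0:2 means the labels 0, 1 and 2: both ends are included.
by_label = numbered.loc[0:2]
print(len(by_label))                # 3
print(by_label["value"].tolist())   # [10, 20, 30]

# Label slices behave the same way for string labels.
named = pd.DataFrame({"value": [1, 2, 3]}, index=["row1", "row2", "row3"])
print(named.loc["row1":"row2", "value"].tolist())  # [1, 2]
```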

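3. iloc[ ]

iloc selects data by integer position, that is, by numeric offset. 0:2 selects rows 0 and 1, because the range is half-open (start included, end excluded). Only row and column positions are accepted, not labels (read the i as "int": iloc fetches data by integers); passing a label raises an error.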

print(dataframe.iloc[2])
>>> 
col1    3
col2    4
col3    5
Name: row3, dtype: int32

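# Slicing (df here stands for any DataFrame with at least 5 rows and 6 columns)
# The two calls below are equivalent: both take rows 1 to 5 (5 excluded) and columns 3 to 6 (6 excluded)
df.iloc[1:5, 3:6]
df.iloc[[1, 2, 3, 4], [3, 4, 5]]
# df.iloc[0], df.iloc[1] and df.iloc[-1] are the first, second and last rows
# Likewise, df.iloc[:, 0], df.iloc[:, 1] and df.iloc[:, -1] are the first, second and last columns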
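
The snippet above leaves df undefined, so here is a runnable sketch of the same slicing. The 6x7 frame filled with 0..41 is a hypothetical stand-in, chosen only so that rows 1 to 4 and columns 3 to 5 exist.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x7 frame holding the integers 0..41.
df = pd.DataFrame(np.arange(42).reshape(6, 7))

sliced = df.iloc[1:5, 3:6]                 # half-open slices
listed = df.iloc[[1, 2, 3, 4], [3, 4, 5]]  # explicit positions

print(sliced.shape)                                          # (4, 3)
print(np.array_equal(sliced.to_numpy(), listed.to_numpy()))  # True

# Negative positions count from the end.
print(df.iloc[-1].tolist())     # [35, 36, 37, 38, 39, 40, 41]
print(df.iloc[:, -1].tolist())  # [6, 13, 20, 27, 34, 41]
```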

Summary:

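loc is more broadly applicable and more practical than iloc: it accepts slices, labels (index, columns), and a mix of the two. However, loc cannot take a label that does not exist in the index, such as -1, to fetch values.
iloc only accepts integer positions.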

▲ Prefer loc.
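
The points in this summary can be checked with a short sketch on the same 3x3 table (rebuilt here as `df`, an illustrative name). The exact exception type raised by loc for a missing label is assumed to be KeyError on recent pandas; TypeError is caught as well in case an older version behaves differently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]]),
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

# loc: a label slice for rows combined with a list of column labels.
subset = df.loc["row1":"row2", ["col1", "col3"]]
print(subset.shape)  # (2, 2)

# iloc: negative positions count from the end.
last_row = df.iloc[-1]
print(last_row.tolist())  # [3, 4, 5]

# loc: -1 is not a label in this index, so the lookup fails.
try:
    df.loc[-1]
    loc_failed = False
except (KeyError, TypeError):
    loc_failed = True
print(loc_failed)  # True
```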

Author: Mrli

Link: https://nymrli.top/2018/12/21/Pandas速成/

Copyright: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stating additionally.
