

Handling missing data

Working with missing values in a time series:

In [79]: df = pd.DataFrame(np.random.randn(6, 1), index=pd.date_range('2013-08-01', periods=6, freq='B'), columns=list('A'))

In [80]: df.loc[df.index[3], 'A'] = np.nan

In [81]: df
Out[81]: 
                   A
2013-08-01 -1.054874
2013-08-02 -0.179642
2013-08-05  0.639589
2013-08-06       NaN
2013-08-07  1.906684
2013-08-08  0.104050

# Forward-fill the reversed series, so the gap takes the value of the next later date
In [82]: df.reindex(df.index[::-1]).ffill()
Out[82]: 
                   A
2013-08-08  0.104050
2013-08-07  1.906684
2013-08-06  1.906684
2013-08-05  0.639589
2013-08-02 -0.179642
2013-08-01 -1.054874

Grouping

Using apply:

In [83]: df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
   ....:                    'size': list('SSMMMLL'),
   ....:                    'weight': [8, 10, 11, 1, 20, 12, 12],
   ....:                    'adult': [False] * 5 + [True] * 2}); df
   ....: 
Out[83]: 
  animal size  weight  adult
0    cat    S       8  False
1    dog    S      10  False
2    cat    M      11  False
3   fish    M       1  False
4    dog    M      20  False
5    cat    L      12   True
6    cat    L      12   True

# Size of the heaviest animal of each kind
In [84]: df.groupby('animal').apply(lambda subf: subf['size'][subf['weight'].idxmax()])
Out[84]: 
animal
cat     L
dog     M
fish    M
dtype: object

Using get_group:

In [85]: gb = df.groupby('animal')

# Get the rows that belong to the cat group
In [86]: gb.get_group('cat')
Out[86]: 
  animal size  weight  adult
0    cat    S       8  False
2    cat    M      11  False
5    cat    L      12   True
6    cat    L      12   True

# Apply a function to different items within a group
In [87]: def GrowUp(x):
   ....:     avg_weight = sum(x[x['size'] == 'S'].weight * 1.5)
   ....:     avg_weight += sum(x[x['size'] == 'M'].weight * 1.25)
   ....:     avg_weight += sum(x[x['size'] == 'L'].weight)
   ....:     avg_weight /= len(x)
   ....:     return pd.Series(['L', avg_weight, True], index=['size', 'weight', 'adult'])
   ....: 

In [88]: expected_df = gb.apply(GrowUp)

In [89]: expected_df
Out[89]: 
       size   weight  adult
animal                     
cat       L  12.4375   True
dog       L  20.0000   True
fish      L   1.2500   True
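The size lookup in In [84] can also be written without apply. The following is a minimal sketch, assuming the same animal table as above: it takes the row label of the maximum weight in each group with idxmax, then selects those rows with loc. The names heaviest_idx and heaviest are illustrative only and do not appear in the original session.

```python
import pandas as pd

# Same sample table as in the session above.
df = pd.DataFrame({
    "animal": "cat dog cat fish dog cat cat".split(),
    "size": list("SSMMMLL"),
    "weight": [8, 10, 11, 1, 20, 12, 12],
    "adult": [False] * 5 + [True] * 2,
})

# Row label of the heaviest animal within each group
# (on ties, idxmax returns the first occurrence).
heaviest_idx = df.groupby("animal")["weight"].idxmax()

# Select those rows and keep only the size, indexed by animal.
heaviest = df.loc[heaviest_idx].set_index("animal")["size"]
print(heaviest)
```

This version avoids calling a Python lambda once per group, which may matter on larger tables. The apply form stays the more flexible choice when the per-group logic is not a simple lookup.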

Expanding apply (the example uses functools.reduce, so functools must be imported first):

In [90]: S = pd.Series([i / 100.0 for i in range(1, 11)])

In [91]: def CumRet(x, y):
   ....:     return x * (1 + y)
   ....: 

In [92]: def Red(x):
   ....:     return functools.reduce(CumRet, x, 1.0)
   ....: 

In [93]: S.expanding().apply(Red, raw=True)
Out[93]: 
0    1.010000
1    1.030200
2    1.061106
3    1.103550
4    1.158728
5    1.228251
6    1.314229
7    1.419367
8    1.547110
9    1.701821
dtype: float64
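The expanding reduction recomputes the whole product for every window. As a rough cross-check, and assuming the goal is simply the cumulative return of the series, the same numbers should come out of a cumulative product of 1 + S. The lowercase names cum_ret, red, via_apply and via_cumprod are illustrative, not part of the original session.

```python
import functools

import numpy as np
import pandas as pd

S = pd.Series([i / 100.0 for i in range(1, 11)])

def cum_ret(x, y):
    # Compound the running value x by the return y.
    return x * (1 + y)

def red(x):
    # Fold the whole window into one compounded value, starting from 1.0.
    return functools.reduce(cum_ret, x, 1.0)

# Expanding window: each result is computed over all values seen so far.
via_apply = S.expanding().apply(red, raw=True)

# Built-in cumulative product of the growth factors.
via_cumprod = (1 + S).cumprod()

print(np.allclose(via_apply, via_cumprod))
```

For this particular reduction cumprod is the shorter route. The expanding apply pattern remains useful when the reduction has no built-in cumulative equivalent.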
