Faster Pandas: Make your code run faster and consume less memory | Miki Tebeka, CEO, 353solutions

by Diligejy 2022. 12. 16.

https://www.youtube.com/watch?v=d9YfwxuaylI&ab_channel=PyData 

 

1. Why is performance important?

A. Cloud cost: the less CPU and memory you consume, the lower your cloud bill.

B. Faster code shortens the time it takes to complete an experiment.

C. Viability of running the code in production.

 

2. Why shouldn't we optimize?

A. Development Time

 

3. Magic functions

 

- %time

- %timeit

- %paste

- %%prun
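The magics above only work inside IPython or Jupyter. As a rough sketch of how similar measurements could be taken from a plain script, the standard-library `timeit` and `cProfile` modules can stand in for `%timeit` and `%%prun`. The `slow_sum` function is a hypothetical workload made up for this example; it is not from the talk.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum(n):
    """Hypothetical workload: sum the squares of 0..n-1 in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Rough counterpart of %timeit: repeat the measurement and keep the best run.
best = min(timeit.repeat(lambda: slow_sum(10_000), number=20, repeat=3))
print(f"best of 3 runs (20 calls each): {best:.4f} s")

# Rough counterpart of %%prun: collect per-function profiling statistics.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(10_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Taking the minimum of several repeats is a common way to reduce noise from other processes; the absolute numbers will differ from machine to machine.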

 

So always vectorize whatever you can.

But sometimes you can't.

 

pandas is column-oriented, so operations on whole columns are the fast path.
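To make the idea of vectorizing concrete, here is a minimal sketch that computes the same result twice: once with a row-by-row Python loop and once with a single column-wise operation. The DataFrame and its `fare` and `tip` columns are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: these column names are made up for the example.
df = pd.DataFrame({
    "fare": [10.0, 20.0, 15.5],
    "tip": [1.0, 3.0, 2.5],
})

# Row-by-row: one Python-level iteration per row.
loop_total = [row.fare + row.tip for row in df.itertuples(index=False)]

# Vectorized: a single operation applied to two whole columns.
vectorized_total = df["fare"] + df["tip"]

print(loop_total)
print(vectorized_total.tolist())
```

Both versions give the same values; the vectorized form hands the loop to pandas and NumPy instead of running it in the Python interpreter, which is typically where the speedup on large frames comes from.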

 

4. Techniques

A. Comparison

 

NumPy is usually faster.

%timeit max(df['total_amount'])

%timeit df['total_amount'].max()

%timeit df['total_amount'].values.max()
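The three `%timeit` lines need a loaded DataFrame and an IPython session. As a self-contained sketch of the same comparison, the script below uses the standard-library `timeit` module on randomly generated data. The data is a hypothetical stand-in for the talk's dataset; only the column name `total_amount` comes from the notes.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data: random values, not the dataset from the talk.
rng = np.random.default_rng(0)
df = pd.DataFrame({"total_amount": rng.random(100_000)})

candidates = {
    "builtin max": lambda: max(df["total_amount"]),
    "Series.max": lambda: df["total_amount"].max(),
    "ndarray.max": lambda: df["total_amount"].values.max(),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported as seconds per call.
    timings[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{name:12s} {timings[name] * 1e6:10.1f} us per call")
```

All three expressions return the same maximum; only the time they take differs, and the exact ratios depend on the data size and the installed versions.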

B. NumPy is great, but be careful

 

import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3])

s.sum()

s.values.sum()

 

The first returns 4.0, because pandas skips NaN by default,

while the second returns nan.

 

If you want the sum with NumPy, use a NaN-aware function such as np.nansum (from the same family as np.nanmax).
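The difference can be checked with a short sketch that puts the pandas result, the plain NumPy result, and the NaN-aware NumPy functions side by side. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3])

pandas_sum = s.sum()                 # pandas skips NaN by default (skipna=True)
numpy_sum = s.values.sum()           # plain NumPy propagates NaN
nan_aware_sum = np.nansum(s.values)  # NaN is treated as zero
nan_aware_max = np.nanmax(s.values)  # NaN is ignored

print(pandas_sum, numpy_sum, nan_aware_sum, nan_aware_max)
```

Passing `skipna=False` to `s.sum()` makes pandas propagate NaN as well, which can be useful when a missing value should not be silently ignored.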

 

5. Memory Usage

 

 
