Commonly used packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Load the data into a DataFrame
df = pd.read_csv(filepath_or_buffer='../data/AAPL.csv',
index_col='Date',
parse_dates = True)
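df2 = df.iloc[::-1]  # a step of -1 returns the rows in reverse order
loc: selects rows by label, i.e. by the actual values in the row index
iloc: selects rows by integer position (row number)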
df = pd.read_csv('../Data/General/SPX-Constituents.csv', index_col='Symbol')
df = pd.read_excel(io='CU_V1.xlsx', sheet_name='Factors', parse_dates=True, index_col='Date')
Have a quick look at what we have read in: commonly used DataFrame methods
df.info()
df.describe()
df.head()
DataFrames - Rows and columns
Selecting data from a DataFrame for specific row(s) or column(s)
- use df['colA'] to select a single column
- use df[['colA','colB']] to select multiple columns
- use df.loc['rowA'] to select a single row by its label
- use df.loc[['rowA','rowB']] to select multiple rows by their labels
Using loc and iloc
Create a DataFrame
data=pd.DataFrame(np.arange(16).reshape(4,4),index=list('abcd'),columns=list('ABCD'))
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
Select the first row
data.loc['a'] or data.iloc[0]
Select the first column
data.loc[:,['A']] or data.iloc[:,[0]]
Select the first two rows and the first two columns
data.loc[['a','b'],['A','B']] or data.iloc[[0,1],[0,1]]
Select all the data
data.loc[:,:] or data.iloc[:,:]
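One difference the selections above do not show is how the two indexers treat slice endpoints: loc slices by label and includes the end label, while iloc slices by position and excludes the stop position. A minimal sketch on the same 4x4 frame (the names by_label and by_position are only for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=list('abcd'), columns=list('ABCD'))

# Label-based slice: both endpoints ('a' and 'c', 'A' and 'B') are included.
by_label = data.loc['a':'c', 'A':'B']

# Position-based slice: the stop position (2) is excluded.
by_position = data.iloc[0:2, 0:2]

print(by_label)      # rows a, b, c
print(by_position)   # rows a, b
```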
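Calculate the Bollinger Bands for the Adj. Close
# Note: window=21 counts rows (trading days), roughly one calendar month; the '30d' column names below are labels only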
df2['30d mavg'] = df2['AdjClose'].rolling(window=21).mean()
df2['30d std'] = df2['AdjClose'].rolling(window=21).std()
df2['Upper Band'] = df2['30d mavg'] + (df2['30d std'] * 2)
df2['Lower Band'] = df2['30d mavg'] - (df2['30d std'] * 2)
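The four lines above can be wrapped in a small reusable helper. The following is only a sketch under stated assumptions: the function name bollinger_bands, its parameters, and the synthetic random-walk prices are all hypothetical and stand in for the AAPL data, which is not reproduced here.

```python
import numpy as np
import pandas as pd

def bollinger_bands(prices, window=21, num_std=2):
    """Return the rolling mean and the bands num_std standard deviations around it."""
    mavg = prices.rolling(window=window).mean()
    std = prices.rolling(window=window).std()
    return pd.DataFrame({
        'mavg': mavg,
        'upper': mavg + num_std * std,
        'lower': mavg - num_std * std,
    })

# Synthetic random-walk prices (not real market data) on a business-day index.
rng = np.random.default_rng(0)
idx = pd.date_range('2020-01-01', periods=100, freq='B')
prices = pd.Series(100 + rng.normal(0, 1, size=100).cumsum(), index=idx)

bands = bollinger_bands(prices)
# The first window - 1 rows are NaN because the rolling window is not yet full.
print(bands.dropna().head())
```

Keeping window and num_std as parameters makes it easy to compare, say, a 21-row band with a 63-row band on the same series.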
Plot the results
cols = ['30d mavg','Upper Band','Lower Band','AdjClose']
df2[cols].plot()
df2['2016':][cols].plot()
Improve the presentation of the chart
# set style, empty figure and axes
plt.style.use('fivethirtyeight')
fig = plt.figure(figsize=(12,6))
ax = fig.add_subplot(111)
# Get index values for the X axis for the DataFrame
x_axis = df2['2016':].index.get_level_values(0)
# Plot the shaded Bollinger Band (21-row rolling window) for Apple
ax.fill_between(x_axis, df2['2016':]['Upper Band'], df2['2016':]['Lower Band'], color='grey', label='Bollinger Band')
ax.plot(x_axis, df2['2016':]['AdjClose'], color='blue', lw=2, label='AdjClose')
ax.plot(x_axis, df2['2016':]['30d mavg'], color='black', lw=2, label='30d mavg')
# Set Title & Show the Image
ax.set_title('30 Day Bollinger Band For Apple')
ax.set_xlabel('Date (Year/Month)')
ax.set_ylabel('Price(USD)')
ax.legend()
plt.show()
Pandas data structures
Series
The first Pandas data structure we’re going to look at is the Series. A Series is a simple one-dimensional vector of data that we can manipulate.
index = ['first','second','third','fourth']
s = pd.Series(np.arange(4),index=index)
s
s.index = ['fee','fi','fo','fum']  # the new index must have one label per element (4 here)
print(s['fum'])
print(s['fi':])
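Assigning to s.index relabels a Series in place, and the new index needs exactly one label per element. A minimal sketch of both the working case and the failing one; the labels 'w' to 'z' and the names relabelled and length_error are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(4), index=['first', 'second', 'third', 'fourth'])

# Relabel a copy: four values, four new labels.
relabelled = s.copy()
relabelled.index = ['w', 'x', 'y', 'z']
print(relabelled['y'])   # value that was stored under 'third'

# Five labels for four values is rejected with a ValueError.
length_error = None
try:
    s.index = ['a', 'b', 'c', 'd', 'e']
except ValueError as exc:
    length_error = exc
print('Index assignment failed:', length_error)
```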
Pandas and datetime
The datetime stamp is one of the most common fields in time series data. Let’s look at how this works. First, we’ll show how to create a range of dates using the date_range function from Pandas.
import pandas as pd
dates = pd.date_range('2018-07-07', '2018-11-26')  # one timestamp per calendar day (default freq='D')
dates = dates[0:6]
dates
Note that we can combine these two series into a dataframe.
stock_df = pd.DataFrame({'GOOG': goog_stock_series, 'AAPL':aapl_stock_series})
stock_df
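The two series used above are not defined in these notes. A self-contained sketch, assuming both are Series that share the same date index; the numbers are placeholders for illustration only, not real quotes:

```python
import pandas as pd

dates = pd.date_range('2018-07-07', periods=6)

# Placeholder values, NOT real prices.
goog_stock_series = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=dates)
aapl_stock_series = pd.Series([20.0, 21.0, 22.0, 23.0, 24.0, 25.0], index=dates)

# Each dict key becomes a column; rows are aligned on the shared date index.
stock_df = pd.DataFrame({'GOOG': goog_stock_series, 'AAPL': aapl_stock_series})
print(stock_df)
```

If the two series had different dates, the DataFrame would contain the union of both indexes, with NaN where one series has no value.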
Time Series Frequencies
rng = pd.date_range()
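The call above is left without arguments in these notes; date_range needs at least a start or end plus either the other endpoint or a number of periods. A sketch of a few common frequency aliases, with dates and aliases chosen purely for illustration:

```python
import pandas as pd

# Calendar days (the default frequency).
daily = pd.date_range('2018-07-07', '2018-07-20', freq='D')

# Business days only (Monday to Friday).
business = pd.date_range('2018-07-07', '2018-07-20', freq='B')

# Weekly, anchored on Mondays.
weekly = pd.date_range('2018-07-07', '2018-07-20', freq='W-MON')

# periods can replace the end date: the first day of five consecutive months.
month_starts = pd.date_range('2018-07-01', periods=5, freq='MS')

print(len(daily), len(business), len(weekly))
print(month_starts)
```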
Bollinger Bands
# Load in the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('../Data/Commodities/GOLD.csv', index_col='Date', parse_dates=True)
df_BOLL = pd.DataFrame()
# Store the 21-row moving average of the 'USD (PM)' price in this DataFrame
df_BOLL['Price(m avg)'] = df['USD (PM)'].rolling(21).mean()
df_BOLL['Upper'] = df_BOLL['Price(m avg)'] + 2 * df['USD (PM)'].rolling(21).std()
df_BOLL['Lower'] = df_BOLL['Price(m avg)'] - 2 * df['USD (PM)'].rolling(21).std()
# Plot the results
fig = plt.figure(figsize=(18,6))
fig.suptitle('GOLD -- 2 stds above/below the 21-day moving average')
plt.xlabel('Date')
plt.ylabel('Price (USD PM)')
plt.plot(df_BOLL)
# Plot the result for 2017 (select rows by year with .loc; df_BOLL['2017'] would look for a column)
plt.plot(df_BOLL.loc['2017'])
Time resampling
# Load in the data
df_AAPL = pd.read_csv('../Data/Equities/AAPL.csv',index_col='Date',parse_dates=True)
# Resampling Open, High, Low, Close - annually
# (newer pandas releases, 2.2 and later, deprecate the 'Y' alias in favour of 'YE')
cols = ['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close']
df_AAPL[cols].resample(rule='Y').max()
# Resampling by business quarter (likewise 'BQ' becomes 'BQE' in newer pandas releases)
df_AAPL[cols].resample(rule='BQ').mean()
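To see what resampling does without the AAPL file, here is a sketch on a synthetic daily series. The data are made up, and the month-start ('MS') and quarter-start ('QS') rules are used only for illustration; they differ from the year and business-quarter rules above.

```python
import numpy as np
import pandas as pd

# One value per calendar day for Q1 2020: 0, 1, ..., 90 (synthetic data).
idx = pd.date_range('2020-01-01', '2020-03-31', freq='D')
prices = pd.Series(np.arange(len(idx)), index=idx)

# Group the daily values into calendar months and keep each month's maximum.
monthly_max = prices.resample('MS').max()

# Group into calendar quarters and average; all of the data falls in one quarter.
quarterly_mean = prices.resample('QS').mean()

print(monthly_max)
print(quarterly_mean)
```

resample only defines the bins; the aggregation that follows (max, mean, and so on) decides how each bin is reduced to a single row.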
Sklearn Tutorial
- machine learning library written in Python
- simple and efficient, for both experts and non-experts
- classical, well-established machine learning algorithms
- shipped with documentation and examples
- BSD 3-Clause license
Algorithms
- Supervised learning:
  - Linear models (Ridge, Lasso, Elastic Net, …)
  - Support Vector Machines
  - Tree-based methods (Random Forests, Bagging, GBRT, …)
  - Nearest neighbors
  - Neural networks
  - Gaussian Processes
  - Feature selection
Datasets
Sklearn has a number of datasets that come with the library. We can import them with the Python import statement.
from sklearn import datasets
dir(datasets)
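Each loader returns a dictionary-like object whose attributes hold the features, the labels, and some metadata. A short sketch of inspecting two of the bundled datasets:

```python
from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)       # 150 samples, 4 features
print(iris.feature_names)
print(iris.target_names)

digits = datasets.load_digits()
# data holds each 8x8 image flattened to 64 features; images keeps the 2-D form.
print(digits.data.shape, digits.images.shape)
```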
Our first ML program
- This is our first example of a machine learning algorithm. Here we import the digits dataset and attempt to classify each image as a digit from 0 to 9.
```
from sklearn import svm
from sklearn import datasets
import matplotlib.pyplot as plt

# Load the dataset.
digits = datasets.load_digits()

# We're going to use a support vector machine. gamma and C are hyperparameters:
# gamma is the coefficient of the (default) RBF kernel, and C is the
# regularization parameter (a larger C means weaker regularization).
clf = svm.SVC(gamma=0.0001, C=100)

# Train on everything except the last 10 samples.
x, y = digits.data[:-10], digits.target[:-10]
clf.fit(x, y)
print('Prediction: ', clf.predict(digits.data[[-5]]))

# Plot the image to see if what the classifier predicts matches what we expect.
plt.imshow(digits.images[-5], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()
```
## Training set and testing set
- Machine learning is about learning some properties of a data set and applying them to new data.
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
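Learning on one part of the data and applying the model to new data is commonly put into practice by holding out a test set. The following is a minimal sketch, assuming scikit-learn's train_test_split helper and accuracy_score metric (neither appears in the notes above); the 25% test size and random_state=0 are arbitrary choices for illustration.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Hold out 25% of the samples; the classifier never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Same hyperparameters as the SVC example in these notes.
clf = svm.SVC(gamma=0.0001, C=100)
clf.fit(X_train, y_train)

# Score the model on the held-out samples only.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f'Held-out accuracy: {accuracy:.3f}')
```

Scoring on the training samples themselves would overstate how well the model generalizes, which is why the test set is kept separate.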