Introduction
Support vector machines (SVMs) are a set of supervised learning algorithms. In this tutorial we are going to use real-world data to train a classifier and predict classes. We will walk through the whole process, from downloading the data to some theoretical topics.
Step 1: Gathering the data
First we are going to download Apple's historical stock prices.
import requests
response = requests.get('http://www.google.com/finance/historical?output=csv&q=aapl')
# we will use this variable later
csv_appl = response.content
Since we are going to use a crazy feature, we are also going to download earthquake information.
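import json
import requests
# fromtimestamp is a method of the datetime class, so import the class itself
from datetime import datetime, timezone
url = 'http://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02'
earthquake_raw_data = json.loads(requests.get(url).content)
# we will use this variable later
earthquakes = []
for earthquake_data in earthquake_raw_data['features']:
    # The original loop body is truncated here. As an assumption, keep each
    # event's time (USGS reports milliseconds since the epoch) and magnitude.
    properties = earthquake_data['properties']
    event_time = datetime.fromtimestamp(properties['time'] / 1000, tz=timezone.utc)
    earthquakes.append((event_time, properties['mag']))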
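The earthquake feature only cares about strong events, so it can help to see how the downloaded data might be reduced to a daily count. The following is a minimal sketch that runs offline on a small hand-written sample shaped like the USGS GeoJSON response. The sample values, the function name `strong_quakes_per_day` and the threshold default are assumptions made for illustration, not part of the original tutorial.

```python
from collections import Counter
from datetime import datetime, timezone

# Hand-written sample mimicking the shape of a USGS GeoJSON response.
# The values are made up for illustration; they are not real events.
sample_response = {
    "features": [
        {"properties": {"mag": 6.4, "time": 1388534400000}},   # 2014-01-01 00:00 UTC
        {"properties": {"mag": 4.1, "time": 1388538000000}},   # 2014-01-01 01:00 UTC
        {"properties": {"mag": 7.0, "time": 1388620800000}},   # 2014-01-02 00:00 UTC
        {"properties": {"mag": None, "time": 1388624400000}},  # guard against a missing magnitude
    ]
}


def strong_quakes_per_day(response, threshold=6.0):
    """Count events with magnitude above `threshold`, grouped by UTC date."""
    counts = Counter()
    for feature in response["features"]:
        props = feature["properties"]
        magnitude = props.get("mag")
        if magnitude is None or magnitude <= threshold:
            continue
        # USGS timestamps are milliseconds since the Unix epoch.
        day = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc).date()
        counts[day] += 1
    return counts


daily_counts = strong_quakes_per_day(sample_response)
for day, count in sorted(daily_counts.items()):
    print(day, count)
```

Grouping by UTC date is a design choice; if the stock prices are keyed by exchange-local trading days, the dates may need to be shifted before joining the two data sets.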
Step 2: Thinking about your classifier
Step 3: Selecting the features
In this step we are going to think about which features to pick and how to discard the ones that do not help. Classifiers are only as good as the features you provide, so selecting good features is one of the most important steps in machine learning. We use several features rather than just one, because a single feature may not carry enough information in some edge cases, and the result would be a low-quality predictor. Ideally the features should be independent of each other, so we need a correlation analysis to spot the redundant ones.
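To make the idea of a correlation analysis concrete, here is a minimal sketch using pandas on a toy table. The numbers are invented, the column names are placeholders, and the 0.9 cut-off in the hypothetical helper `redundant_features` is an arbitrary assumption rather than a recommended value.

```python
import pandas as pd

# Toy feature table; the numbers are invented for illustration only.
features = pd.DataFrame({
    "price_range": [1.0, 2.0, 3.0, 4.0, 5.0],
    "volume": [2.1, 3.9, 6.2, 8.0, 9.9],    # moves almost in lockstep with price_range
    "day_of_month": [17, 3, 25, 8, 11],     # only weakly related to the other two
})

# Pairwise Pearson correlation between all columns.
correlations = features.corr()


def redundant_features(corr, threshold=0.9):
    """Return columns that are highly correlated with an earlier, kept column."""
    to_drop = []
    columns = list(corr.columns)
    for i, column in enumerate(columns):
        for earlier in columns[:i]:
            if earlier not in to_drop and abs(corr.loc[column, earlier]) > threshold:
                to_drop.append(column)
                break
    return to_drop


print(correlations.round(2))
print(redundant_features(correlations))
```

Pearson correlation only detects linear relationships, so a low value does not prove that two features are independent; it is a cheap first filter, not a full independence test.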
Now we will try the following features and, after some analysis, discard some of them:
- Day of the month.
- High - Low.
- Mean price within 5 days of a US holiday.
- Volume.
- Earthquakes bigger than 6 in magnitude.
- Average price of 10 days *
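Some of the features in this list can be derived directly from a price table. The sketch below shows one possible way to do it with pandas on made-up prices. The column names (`Date`, `High`, `Low`, `Close`, `Volume`) are an assumption about the layout of the downloaded CSV, and reading "average price of 10 days" as a 10-day rolling mean of the closing price is also an assumption. The holiday and earthquake features are left out because they need extra data.

```python
import pandas as pd

# Made-up prices for illustration; the column names are an assumption
# about the CSV layout, not something confirmed by the data source.
close = [100.0 + i for i in range(12)]
prices = pd.DataFrame({
    "Date": pd.date_range("2014-01-02", periods=12, freq="B"),  # business days
    "High": [c + 1.0 for c in close],
    "Low": [c - 1.0 for c in close],
    "Close": close,
    "Volume": [1000 + 10 * i for i in range(12)],
})

features = pd.DataFrame({
    "day_of_month": prices["Date"].dt.day,
    "high_minus_low": prices["High"] - prices["Low"],
    "volume": prices["Volume"],
    # Rolling mean over the current and previous 9 rows; the first 9 rows
    # have no full window and therefore come out as NaN.
    "avg_close_10d": prices["Close"].rolling(window=10).mean(),
})

print(features)
```

Rows without a full 10-day window contain NaN and would have to be dropped or filled before being passed to a classifier.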
We are going to use pandas to analyze the correlations between the features. At this point we can expect (I hope) that at least the earthquake feature will be thrown out.
import pandas as pd
# let's load the csv info into a dataframe