AI For Trading: Regression (15)

Intro

Checking and transforming data

Note on the chart used to illustrate signal and noise: the horizontal x-axis is time (for example, days) and the vertical y-axis is price (for example, dollars). The blue line is the signal plus noise, which is the stock price movement we actually observe. The red dashed line is the signal without noise, which is not directly observable.
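The relationship between the two lines in that chart can be reproduced with synthetic data. The following is a minimal sketch, not the course's own code: the function names, the linear trend, and every parameter value are assumptions chosen only for illustration. The ratio printed at the end uses one common definition of signal-to-noise (variance of the signal over variance of the noise).

```python
import numpy as np


def make_price_series(n_days=250, start=100.0, drift=0.05, noise_std=2.0, seed=0):
    """Return (signal, noise, observed) for a toy price path.

    All defaults are illustrative assumptions, not values from the course.
    """
    rng = np.random.default_rng(seed)
    days = np.arange(n_days)
    signal = start + drift * days                # smooth trend, not observable in practice
    noise = rng.normal(0.0, noise_std, n_days)   # random day-to-day fluctuation
    observed = signal + noise                    # the price movement we actually see
    return signal, noise, observed


def plot_price_series(signal, observed):
    """Draw the observed series in blue and the hidden signal as a red dashed line."""
    import matplotlib.pyplot as plt  # imported lazily so the numeric part runs without a display

    plt.plot(observed, color="blue", label="signal + noise (observed)")
    plt.plot(signal, color="red", linestyle="--", label="signal (not observable)")
    plt.xlabel("Time (days)")
    plt.ylabel("Price (dollars)")
    plt.legend()
    plt.show()


signal, noise, observed = make_price_series()

# Crude signal-to-noise ratio: variance of the signal over variance of the noise
snr = signal.var() / noise.var()
print(f"signal-to-noise ratio: {snr:.2f}")
```

Calling `plot_price_series(signal, observed)` draws a chart in the same style as the one described in the note. Raising `noise_std` makes the blue line wander further from the red one, which lowers the ratio.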

Distributions

Many statistical models assume that the data follow a normal distribution, also referred to as a Gaussian distribution or a bell curve.

This is important when checking whether our models are valid. There are various tests that we use to check that our models describe a meaningful relationship.
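One way to check the normality assumption numerically is the Jarque-Bera statistic, which combines sample skewness and excess kurtosis (both close to zero for normal data). The sketch below is only an illustration of that idea using NumPy; the text does not say which tests the course uses, and the function name and the two synthetic samples are assumptions. Under normality the statistic is approximately chi-square distributed with 2 degrees of freedom, so values far above about 6 (the 5% critical value is roughly 5.99) suggest the data are not normal.

```python
import numpy as np


def jarque_bera_statistic(sample):
    """Return (JB statistic, skewness, excess kurtosis) for a 1-D sample."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)                          # second central moment (variance)
    skew = np.mean(centered ** 3) / m2 ** 1.5            # 0 for a symmetric distribution
    excess_kurt = np.mean(centered ** 4) / m2 ** 2 - 3.0 # 0 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return jb, skew, excess_kurt


# Synthetic samples (assumed for illustration): one normal, one clearly skewed
rng = np.random.default_rng(42)
normal_sample = rng.normal(size=5000)
skewed_sample = rng.exponential(size=5000)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    jb, skew, kurt = jarque_bera_statistic(sample)
    print(f"{name:12s} JB={jb:10.1f}  skew={skew:6.2f}  excess kurtosis={kurt:6.2f}")
```

The exponential sample should produce a far larger statistic than the normal one, because it is strongly right-skewed and heavy-tailed.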

Exercise: Visualize Distributions

Many variables tend to follow a normal distribution (hence the name “normal”), both in nature and in artificial contexts. But there are other distributions as well: some are variants of the normal distribution, and some are completely different. Each distribution is suited to modeling certain kinds of variables.

In this exercise, you are given some samples of data. Plot the histogram of each sample, and then try to match it with the corresponding distribution.

"""Visualize the distribution of different samples."""

import pandas as pd
import matplotlib.pyplot as plt

def plot_histogram(sample, title, bins=16, **kwargs):
    """Plot the histogram of a given sample of random values.

    Parameters
    ----------
    sample : pandas.Series
        raw values to build histogram
    title : str
        plot title/header
    bins : int
        number of bins in the histogram
    kwargs : dict 
        any other keyword arguments for plotting (optional)
    """
    # Plot the histogram (no need to return anything)
    plt.hist(sample, bins=bins, **kwargs)
    plt.title(title)
    plt.xlabel("Value")
    plt.ylabel("Count")
    plt.show()

def test_run():
    """Test run plot_histogram() with different samples."""
    # Load and plot histograms of each sample
    # Note: Try plotting them one by one if it's taking too long
    # (read_csv's `squeeze` keyword was removed in pandas 2.0, so the
    # single-column DataFrame is squeezed into a Series explicitly)
    A = pd.read_csv("A.csv", header=None).squeeze("columns")
    plot_histogram(A, title="Sample A")

    B = pd.read_csv("B.csv", header=None).squeeze("columns")
    plot_histogram(B, title="Sample B")

    C = pd.read_csv("C.csv", header=None).squeeze("columns")
    plot_histogram(C, title="Sample C")

    D = pd.read_csv("D.csv", header=None).squeeze("columns")
    plot_histogram(D, title="Sample D")

if __name__ == '__main__':
    test_run()
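To get a feel for what different distributions look like before matching samples A to D, it can help to draw samples from known distributions and compare their histogram shapes. The sketch below is an assumption-laden illustration: the four distributions, their parameters, and the `text_histogram` helper are hypothetical and are not claimed to correspond to the exercise's CSV files. It uses `numpy.histogram` with the same default of 16 bins and prints a rough text rendering so it runs without a display.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical reference samples; the real A-D files may follow other distributions
samples = {
    "normal": rng.normal(loc=0.0, scale=1.0, size=n),
    "uniform": rng.uniform(low=-1.0, high=1.0, size=n),
    "exponential": rng.exponential(scale=1.0, size=n),
    "lognormal": rng.lognormal(mean=0.0, sigma=0.5, size=n),
}


def text_histogram(sample, bins=16, width=40):
    """Return bin counts and one text bar per bin, scaled to the tallest bin."""
    counts, edges = np.histogram(sample, bins=bins)
    lines = []
    for count, left in zip(counts, edges[:-1]):
        bar = "#" * int(round(width * count / counts.max()))
        lines.append(f"{left:8.2f} | {bar}")
    return counts, lines


for name, sample in samples.items():
    counts, lines = text_histogram(sample)
    print(name)
    print("\n".join(lines))
```

Each array can also be wrapped in a `pd.Series` and passed to `plot_histogram` to see the same shapes as matplotlib charts.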
