c = 5 / 9 * (f - 32)
* f (the Fahrenheit temperature) is the independent variable
* c (the Celsius temperature) is the dependent variable
* c depends on the value of f used in the calculation

# enable high-res images in notebook
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
c = lambda f: 5 / 9 * (f - 32)
temps = [(f, c(f)) for f in range(0, 101, 10)]
* Store the data in a DataFrame, then use its plot method to display the linear relationship between the temperatures
* The style keyword argument controls the data’s appearance
* '.-' indicates that each point should appear as a dot, and that lines should connect the dots

import pandas as pd
temps_df = pd.DataFrame(temps, columns=['Fahrenheit', 'Celsius'])
axes = temps_df.plot(x='Fahrenheit', y='Celsius', style='.-')
y_label = axes.set_ylabel('Celsius')
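One way to see why the plot is a straight line is to check that the rate of change between consecutive points is constant. The following is a minimal plain-Python sketch of that check; the names to_celsius and slopes are illustrative and do not appear in the original example.

```python
import math

# Same conversion as above, written as a regular function
def to_celsius(f):
    return 5 / 9 * (f - 32)

temps = [(f, to_celsius(f)) for f in range(0, 101, 10)]

# Rise over run between each pair of consecutive points
slopes = [(c2 - c1) / (f2 - f1)
          for (f1, c1), (f2, c2) in zip(temps, temps[1:])]

# In a linear relationship, every one of these ratios is the same (here 5/9)
print(all(math.isclose(s, 5 / 9) for s in slopes))
```

Because every ratio equals 5/9 (up to floating-point rounding), each additional Fahrenheit degree always adds the same amount to the Celsius value.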
The points along any straight line can be calculated with:
\begin{equation} y = m x + b \end{equation}

* m is the line’s slope
* b is the line’s intercept with the y-axis (at x = 0)
* x is the independent variable
* y is the dependent variable

Function linregress from SciPy’s stats Module

* SciPy’s linregress function (from the scipy.stats module) performs simple linear regression for you
* The data is in ave_hi_nyc_jan_1895-2018.csv in the ch10 examples folder
* The file has three columns:
    * Date: a value of the form 'YYYYMM' (such as '201801'). MM is always 01 because we downloaded data for only January of each year.
    * Value: a floating-point Fahrenheit temperature.
    * Anomaly: the difference between the value for the given date and the average value for all dates (not used in this example).

Loading the Average High Temperatures into a DataFrame

nyc = pd.read_csv('ave_hi_nyc_jan_1895-2018.csv')
nyc.head()
nyc.tail()
* For readability, rename the 'Value' column as 'Temperature'

nyc.columns = ['Date', 'Temperature', 'Anomaly']
nyc.head(3)
* Next, clean up the Date values
* Every Date value ends with 01 (for January), so we’ll remove it from each Date
* First, check the column’s type

nyc.Date.dtype
* Series method floordiv performs integer division on every element of the Series

nyc.Date = nyc.Date.floordiv(100)
nyc.head(3)
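Calling floordiv(100) applies the same operation as Python’s // operator to every element. A minimal plain-Python sketch of the effect on a few 'YYYYMM' integers follows; the sample list is illustrative, built to match the format described for the Date column, and is not read from the CSV file.

```python
# Illustrative YYYYMM integers in the format described for the Date column
dates = [189501, 189601, 201801]

# Integer division by 100 drops the last two digits (the month)
years = [d // 100 for d in dates]

# The remainder is the month that was removed
months = [d % 100 for d in dates]

print(years)   # [1895, 1896, 2018]
print(months)  # [1, 1, 1]
```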
* For quick statistics, call describe on the Temperature column

pd.set_option('display.precision', 2)
nyc.Temperature.describe()
* SciPy’s stats module provides function linregress, which calculates a regression line’s slope and intercept

from scipy import stats
linear_regression = stats.linregress(x=nyc.Date,
                                     y=nyc.Temperature)
* linregress receives two one-dimensional arrays of the same length representing the data points’ x- and y-coordinates
* Keyword arguments x and y represent the independent and dependent variables, respectively
* The object returned by linregress contains the regression line’s slope and intercept

linear_regression.slope
linear_regression.intercept
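For intuition about where a slope and intercept come from, the least-squares line can also be computed by hand. This is a minimal sketch of the standard closed-form formulas, not SciPy’s actual implementation; least_squares_line is a hypothetical helper, and the Fahrenheit/Celsius pairs from earlier serve as test data because their true slope (5/9) and intercept (-160/9) are known.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

fahrenheit = list(range(0, 101, 10))
celsius = [5 / 9 * (f - 32) for f in fahrenheit]

slope, intercept = least_squares_line(fahrenheit, celsius)
print(math.isclose(slope, 5 / 9))
print(math.isclose(intercept, -160 / 9))
```

Because these points lie exactly on a line, the fit recovers that line. With real measurements such as the January temperatures, the points scatter around the line and the result is the best straight-line approximation in the least-squares sense.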
* In the following calculation, linear_regression.slope is m, 2019 is x (the date value for which you’d like to predict the temperature), and linear_regression.intercept is b:

linear_regression.slope * 2019 + linear_regression.intercept
linear_regression.slope * 1890 + linear_regression.intercept
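Each of these predictions is y = mx + b evaluated at a chosen x. The small sketch below wraps that calculation in a function; predict is a hypothetical helper, and the slope and intercept used here come from the Fahrenheit-to-Celsius line rather than from the temperature data, so the results can be checked against known values.

```python
import math

def predict(slope, intercept, x):
    """Evaluate the line y = slope * x + intercept at x."""
    return slope * x + intercept

# Known line: Celsius = (5/9) * Fahrenheit - 160/9
m, b = 5 / 9, -160 / 9

boiling = predict(m, b, 212)   # water boils at 212 degrees Fahrenheit
freezing = predict(m, b, 32)   # water freezes at 32 degrees Fahrenheit

print(math.isclose(boiling, 100))
print(math.isclose(freezing, 0, abs_tol=1e-9))
```

With the regression result from above, the equivalent call would be predict(linear_regression.slope, linear_regression.intercept, 2019). Years outside the observed range of 1895 to 2018 are extrapolations, so those values are estimates rather than measurements.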
* Seaborn’s regplot function plots each data point with the dates on the x-axis and the temperatures on the y-axis
* It creates a scatter plot of the Temperatures for the given Dates and adds the regression line
* regplot’s x and y keyword arguments are one-dimensional arrays of the same length representing the x-y coordinate pairs to plot

import seaborn as sns
sns.set_style('whitegrid')
axes = sns.regplot(x=nyc.Date, y=nyc.Temperature)
axes.set_ylim(10, 70)
| Source of time-series datasets | Description |
|---|---|
| https://data.gov/ | This is the U.S. government’s open data portal. Searching for “time series” yields over 7200 time-series datasets. |
| https://www.ncdc.noaa.gov/cag/ | The National Oceanic and Atmospheric Administration (NOAA) Climate at a Glance portal provides both global and U.S. weather-related time series. |
| https://www.esrl.noaa.gov/psd/data/timeseries/ | NOAA’s Earth System Research Laboratory (ESRL) portal provides monthly and seasonal climate-related time series. |
| https://www.quandl.com/search | Quandl provides hundreds of free financial-related time series, as well as fee-based time series. |
| https://datamarket.com/data/list/?q=provider:tsdl | The Time Series Data Library (TSDL) provides links to hundreds of time-series datasets across many industries. |
| http://archive.ics.uci.edu/ml/datasets.html | The University of California Irvine (UCI) Machine Learning Repository contains dozens of time-series datasets for a variety of topics. |
| http://inforumweb.umd.edu/econdata/econdata.html | The University of Maryland’s EconData service provides links to thousands of economic time series from various U.S. government agencies. |
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.