Defining Big Data
“Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.” – oxforddictionaries
“Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus—as long as the right policies and enablers are in place.” – McKinsey & Co
“One of the key challenges of the information society is to turn data into information, information into knowledge, and knowledge into value. To turn data into value in this way involves collecting large volumes of data, possibly from many and diverse data sources, processing the data fast, and applying complex operations to the data.” – ETH Zürich
Defining Small Data
On the other hand, the term Small Data is used more and more in industry applications. “Most businesses don’t have big data, and that is a good thing. Small data sets are still the most common and can present really difficult statistical challenges.” – foxconn4tech
According to Stanford, “Small data are datasets that allow interaction, visualization, exploration and analysis on a local machine to drive business intelligence.”
Martin Lindstrom, a New York Times bestselling author, published the book Small Data: The Tiny Clues That Uncover Huge Trends in 2016. It gives the following definition of Small Data:
“Seemingly insignificant behavioral observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for break through ideas or completely new ways to turnaround brands.”
“Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks.” – Allen Bonde
“Small data is a dataset that contains very specific attributes. Small data is used to determine current states and conditions or may be generated by analyzing larger data sets (Big Data)” – answerminer
Small Data: The Next Big Thing?
Roger Dooley asks the following question on Forbes: “Small Data: The Next Big Thing?” A similar view is published by The Guardian, “Forget big data, small data is the real revolution”, and by the Harvard Business Review, “Sometimes Small Data Is Enough to Create Smart Products”. Most data in industrial companies is small rather than big. Even with small data, the potential to improve an existing or new process is tremendous.
Linear Regression – simple tool for small data
Linear regression is an algorithm that attempts to draw a line of best fit through a given dataset. With that line, new predictions can easily be made. The classic regression problem involves a single independent variable and a single dependent variable, whereas multiple linear regression involves two or more independent variables that contribute to a single dependent variable.
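As a rough illustration of the line of best fit described above, here is a minimal sketch of simple linear regression by ordinary least squares, using only the Python standard library. The function names (fit_line, predict) and the data values (hours, output) are invented for this example and are not taken from any of the sources quoted here.

```python
# Simple linear regression by ordinary least squares, standard library only.
# All names and data values are hypothetical and only serve as illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at a new x value."""
    return intercept + slope * x


# One independent variable (hours) and one dependent variable (output)
hours = [1, 2, 3, 4, 5]
output = [2.1, 4.0, 6.2, 7.9, 10.1]

slope, intercept = fit_line(hours, output)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for 6 hours: {predict(6, slope, intercept):.2f}")
```

Five data points are enough here to fit the line and predict a sixth value, which is the kind of small-data setting this article is about.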
Linear regression is a good choice when you want a very simple model for a basic predictive task. It is one of the oldest, simplest, and most widely used machine learning models. Sometimes simple or multiple linear regression is enough to improve a new or existing system. Regression is a statistical tool used to understand and quantify the relationship between two or more variables, and linear regression is a basic and commonly used type of predictive analysis. The three main uses of regression analysis are forecasting effects, forecasting trends, and determining the strength of predictors. Languages like R and Python offer tools for building linear regression models.
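To make the multiple-predictor case and the "strength of predictors" idea more concrete, the following sketch fits a multiple linear regression by solving the normal equations with plain Gaussian elimination, again using only the Python standard library. The helper names (solve, fit, predict, r_squared) and the process data are assumptions made for this example. The target values are generated from a known rule so that the fitted coefficients can be checked against it. In practice a statistics package in R or Python would normally be used instead of hand-written linear algebra.

```python
# Multiple linear regression via the normal equations (X^T X) b = X^T y.
# Standard library only; all names and data are hypothetical illustrations.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, ys):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 models the intercept
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Evaluate the fitted model for one row of predictor values."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))


def r_squared(coefs, rows, ys):
    """Share of the variance in ys that the fitted model explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coefs, r)) ** 2 for r, y in zip(rows, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Two independent variables per observation, one dependent variable.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
# Generated from a known rule: y = 3 + 2*x1 + 0.5*x2 (no noise added)
ys = [3 + 2 * x1 + 0.5 * x2 for x1, x2 in rows]

coefs = fit(rows, ys)
print("intercept and coefficients:", [round(c, 3) for c in coefs])
print("R squared:", round(r_squared(coefs, rows, ys), 3))
```

The size of each coefficient indicates how strongly that predictor moves the result per unit of change, and R squared summarizes how much of the variation the model explains overall. With real, noisy data the coefficients would only approximate the underlying relationship and R squared would fall below 1.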
To sum up, regression algorithms belong to supervised learning, which is one branch of machine learning.