Statistics in SQL: The Kruskal–Wallis Test

Before you report your conclusions about your data, have you checked whether your 'actionable' figures occurred by chance? The Kruskal-Wallis test is a safe way of determining whether samples come from the same population, because it is simple and doesn't rely on a normal distribution in the population. This allows you a measure of confidence that your results are 'significant'. Phil Factor explains how to do it.… Read more

Data Science Laboratory System – Distributed File Databases

Distributed File Databases manage large amounts of unstructured or semi-structured data. They are designed on the principle of splitting up the data into multiple locations, and then placing the code that processes each fragment close, or directly on, that location. Buck Woody shows how to install Hadoop in your Data Science lab to experiment with an example of the breed.… Read more

Data Science Laboratory System – Object-Oriented Databases

Object-Oriented Databases (OOD) avoid the object-relational impedence mismatch altogether by tightly integrating into the user-level OOP code to the extent that they are simply an engine that ships with the code itself. The developer is able to instantiate OOD objects directly into the code. Buck Woody explores the Object-Oriented breed of database in his Data Science lab.… Read more

Data Science Laboratory System – Key/Value Pair Systems

Though the Key/Value pair paradigm is common to almost every computer language, there is no clear agreement yet for the definition of a Key/Value Pair database. However, Key/Value pair databases are valuable for special applications where speed of writing data is more important than searching and general versatility. It is certainly worth experimenting with in a data science lab.… Read more