Buck Woody

Data Science Laboratory System - Distributed File Databases

Distributed File Databases manage large amounts of unstructured or semi-structured data. They are designed on the principle of splitting up the data into multiple locations, and then placing the code that processes each fragment close, or directly on, that location. Buck Woody shows how to install Hadoop in your Data Science lab to experiment with an example of the breed. Read more...

Buck Woody

Data Science Laboratory System – Object-Oriented Databases

Object-Oriented Databases (OOD) avoid the object-relational impedance mismatch altogether by integrating so tightly with the user-level OOP code that they are simply an engine that ships with the code itself. The developer is able to instantiate OOD objects directly in the code. Buck Woody explores the Object-Oriented breed of database in his Data Science lab. Read more...

Buck Woody

Data Science Laboratory System – Document Store Databases

A Document Store Database (DSD) is similar to a Relational Database Management System, with the exceptions that a DSD allows for unstructured data and for sharding a single database across multiple machines. So when or why would you choose a document database over a relational one? Buck Woody has the answer and an example using the DSD MongoDB on his lab system. Read more...

Buck Woody

Data Science Laboratory System - Key/Value Pair Systems

Though the Key/Value pair paradigm is common to almost every computer language, there is no clear agreement yet on the definition of a Key/Value Pair database. However, Key/Value pair databases are valuable for special applications where speed of writing data is more important than searching and general versatility. They are certainly worth experimenting with in a data science lab. Read more...

Buck Woody

Data Science Laboratory System - Relational Database Management Systems

There is no better way of understanding new data processing, retrieval, analysis or visualising techniques than actually trying things out in a lab system. Buck Woody continues his series by explaining why an RDBMS is essential for a lab, what that is, and how to install SQL Server into the lab. Read more...

Buck Woody

Data Science Laboratory System - Programming and Scripting Languages

Although every computer language is suitable for data, some languages lend themselves especially well to working with certain types or sources of data, or to processing the data in certain ways, and so are of particular use to the data scientist. Read more...

Buck Woody

Data Science Laboratory System - Interactive Data Tools

Data tools interact directly with data and are great for automating data acquisition, but they aren't always the best way to prototype or pilot a process. Interactive data tools also allow you to test and refine the process, until it is ripe for automation. Read more...

Buck Woody

Data Science Laboratory System - Instrumentation

It is sensible to check the performance of different solutions to data analysis in 'lab' conditions. Measurement by instrumentation makes it easier to develop systems that are efficient. Read more...

Buck Woody

Data Science Laboratory System - Testing the Text Tools and Sample Data

Anyone who is frequently faced with preparing data for processing needs to be familiar with some industry-standard text-manipulation tools. Awk, join, sed, find, grep and cat are the classics, and Buck Woody takes them for a spin in his Data Science Laboratory. Read more...

Roger Jennings

Analyze Big Data with Apache Hadoop on Windows Azure Preview Service Update 3

Hadoop and MapReduce have good prospects for adoption as a standard for big data analysis, especially since their adoption by Microsoft. They are ideal for Cloud usage, since one can spin up nodes when required and pay for storage and compute services only whilst they are running. Roger Jennings describes how to get Hadoop running on Azure. Read more...

Roger Jennings

Analyze Years of Air Carrier Flight Arrival Delays in Minutes with the Windows Azure HPC Scheduler

If you are seeking to analyse very large sets of data, and need a highly parallel, rapid way of doing it that scales to your requirements, then 'Cloud Numerics' from Microsoft may be the answer to your prayers. Read more...

Most Viewed

Windows Azure Virtual Machine: A look at Windows Azure IaaS Offerings (Part 2)
 We continue our introduction of the Azure IaaS by discussing how images and disks are used in the Azure... Read more...

PHPFog and Pagoda Box: A Look at PHP Platforms
 Cloud platforms such as Heroku, AppEngine, PHPFog and Pagoda Box are ideal for companies who just want... Read more...

An Introduction to Windows Azure BLOB Storage
 Azure BLOB storage is persistent Cloud data storage that serves a variety of purposes. Mike Wood shows... Read more...

Managing session state in Windows Azure: What are the options?
 Because you can't maintain session state for ASP.NET applications in Azure using the default in-process... Read more...

Creating a custom Login page for federated authentication with Windows Azure ACS
 Windows Azure Access Control Service (ACS) provides a way of authenticating users who need to access web... Read more...