Docker Part 3 – Taking Docker for a spin on Windows.

To install Docker, head over to Download Docker for Windows if you are running Windows 10 Pro or Enterprise 64-bit and download the installer from that page. If you are on a previous version of Windows you would need to download Docker Toolbox. You would also need to enable Hyper-V virtualization for Docker to work. The setup installs the daemon and the CLI tools. The Docker engine is installed as a service, and you also get a helper on the task bar which can be used to further configure Docker. You can also download Kitematic from download.docker.com to graphically manage Docker containers.

Once installed, head over to a command prompt and type docker --help to see the list of commands for interacting with Docker.

[Screenshot: Docker options listed by docker --help]

For a quick sanity check, run the command docker version to check the version of Docker installed. As of writing this post my local instance of Docker was on version 1.12.0 for both the client and server components.

Let's start by searching for the official .NET Core image from Microsoft to load into a container. To search for the image we need to use the command

> docker search microsoft/dotnet

[Screenshot: docker search results for microsoft/dotnet]

This command returns the above list of dotnet images available from Microsoft. Next let's pull an image using docker pull followed by the name of the image to pull.
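In this case, with the image name returned by the search above:

> docker pull microsoft/dotnet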

[Screenshot: docker pull microsoft/dotnet output]

We can see that the microsoft/dotnet image with the latest tag has been downloaded. Another interesting thing to note is how the various layers are pulled and extracted asynchronously to create this image locally.

Now let us check the images available locally using the command docker images
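In full:

> docker images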

[Screenshot: docker images output]

We have 3 images available locally, including a redis and an Ubuntu image. Now, for the first run, fire the below command
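Going by the description that follows, the command was presumably:

> docker run -d redis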

[Screenshot: docker run output]

This command loads the redis image into a container. The -d option runs the container in the background. Now let us run the below command to check on the running containers
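The standard command for listing running containers is docker ps:

> docker ps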

[Screenshot: docker ps output showing the running redis container]

The output indicates that there is a redis container that has been running for the past four minutes, with an entry point and a name. To get more details about this container we can use the docker inspect command, which returns JSON output with detailed information about the container such as its IP address, mounted volumes, etc.
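For example, passing the container name or ID:

> docker inspect <container-id>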

We can use the docker logs command to view messages that the container has written to standard output or standard error.
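Again by container name or ID:

> docker logs <container-id>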

[Screenshot: docker logs output]

Docker – Part 2: Docker Architecture

The Foundation

Before diving into how Docker is designed and built, it is better to first understand the underlying technologies it is built on.

The concept of containers has been in the making for some time. Docker uses Linux namespaces and cgroups, which have been part of Linux since 2007.

cgroups – Control groups, or cgroups, provide a mechanism to place resource limits on a group of processes in Linux. These processes can be organized hierarchically as a tree of process groups, subgroups, etc. Limits can be placed on shared resources such as CPU, memory, network, or disk I/O. cgroups also allow for accounting, checkpointing, and restarting groups of processes.

Namespaces – Linux namespaces are used to create processes that are isolated from the rest of the system without the need for low-level virtualization technology. This provides a way to give different processes varying views of the system.

Union File System (UFS) – The union file system, also called a union mount, allows multiple file systems to be overlaid, appearing as a single file system to the user. Docker supports several UFS implementations: AUFS, Btrfs, ZFS, and others. The installed implementation can be identified by running the docker info command and checking the storage information. On my system the storage driver is aufs. Docker images are made up of multiple layers. Each instruction adds a layer on top of the existing layers. When a container is created, Docker adds a read-write file system on top of these layers along with other settings.
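To check which storage driver is in use:

> docker info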

The combination of the above technologies allowed the development of Linux Containers (LXC), which is a precursor to Docker.

Docker has two main components based on a client-server design, communicating over HTTP. It also has an image store/repository.

Docker CLI – The Docker client, or CLI, is used to operate or control the Docker daemon. The client may run on the container host or on a remote machine connected to the container host over HTTP.
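For example, the client can be pointed at a remote daemon with the -H flag (the host name here is a placeholder; 2375 is the conventional unencrypted Docker port):

> docker -H tcp://remote-host:2375 ps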

Docker Daemon – The Docker daemon is the middleware that runs, monitors, and orchestrates containers. It works with the Docker CLI to build, ship, and run containers. The user interacts with the Docker daemon through the Docker client.

Docker Registry – This is a registry of images. It contains images, layers, and metadata about the images. Docker Hub is a public registry which hosts thousands of public images.

[Diagram: Docker architecture – client, daemon, and registry]

Docker – Part 1

What is Docker?

Docker is a container technology for packaging an application along with all its dependencies. Docker did not invent containers; rather, it made container technology simpler and easier to use.

Difference between Docker and a virtual machine

Traditionally, applications have been deployed in virtual machines in production. This provides isolation and scalability, since we can spin up multiple VMs on a single machine when needed. While providing advantages, VMs also bring their own share of problems. A virtual machine provides you with virtual hardware on which the operating system and the necessary dependencies must be installed. This takes a long time, and running a complete copy of the OS and dependencies requires a significant resource overhead. Each virtualized application includes not only the application and its dependencies but the entire guest operating system.

Unlike virtual machines, Docker containers do not virtualize hardware but run in user space. Additionally, while a virtual machine runs a separate guest OS, Docker containers run within the same OS kernel. Docker avoids the overhead described above by using the container technology built into the operating system. Thus Docker provides the resource isolation and allocation benefits of VMs while being lean, efficient, and portable. Docker also ensures a consistent environment for an application across any system, be it development, staging, or production. This enables developers to develop and test their programs against a single platform and a single set of dependencies, putting an end to the age-old developer refrain of "Hey! It works on my machine".

In a nutshell, the aim of a VM is to fully emulate an OS and a hardware stack, while the fundamental goal of a container is to make an application portable and self-contained.

 

[Diagram: virtual machines vs. Docker containers]

 

Container

A container is a logical unit in which you can store and run an application and all its dependencies. A container runs an image. An image is a bundled snapshot of everything needed to run a program inside a container. Multiple containers can run copies of the same image. Images are the shippable units in the Docker ecosystem. Docker also provides an infrastructure for managing images using registries and indexes. Docker Hub (https://hub.docker.com/) is an example of such a registry; it is a public registry of images. Individuals and organizations can create private repositories holding private images as well.

The Docker platform primarily has two systems: a container engine, which is responsible for the container lifecycle, and a hub for distributing images. Additionally, we also have Swarm, which is a clustering manager, Kitematic, which is a GUI for working with containers, and many other tools which complement the Docker ecosystem. Many more of these are being built by the community, making this a much more robust and fast-evolving platform.

Why is Docker important?

The fast start-up time of Docker containers and the ability to script them have resulted in Docker taking off, especially within the developer community, as it enables fast iterative development cycles along with portability and isolation. With DevOps becoming a standard across the industry, Docker fits nicely into this methodology, allowing teams to manage their build and deployment cycles at a much faster and more predictable pace.

Multiversion Concurrency Control

Every developer I know of uses databases on a daily basis, be it SQL Server or Oracle, but quite a few of them are stumped when asked how this SQL will execute and what its output would be.

insert into Employee select * from Employee;
select * from Employee;

Additionally, what happens if multiple people execute these statements concurrently? Will the select block the insert? Will the insert block the select? How does the system maintain data integrity in this case? The answers have ranged from "this is not a valid SQL statement" to "this will be an indefinite loop".

Some have also got it right, but without an understanding of what happens in the background to be able to provide a consistent and accurate response to the above query. The situation is further complicated by the fact that the implementation which solves the above conundrum is referred to differently in various database and transactional systems.

A simple way of solving the above problem is by using locks or latching. Using locks ensures data integrity, but it also serializes all reads and writes. This approach is not scalable, since the lock will only allow a single transaction, either a read or a write, to happen at a time. Locking has evolved further with read-only locks and other variations, but it is still inherently a concurrency nightmare. A better approach, which effectively ensures data integrity while also ensuring scalability, is the multiversion concurrency control (MVCC) pattern.

Multiversion concurrency control involves tagging an object with read and write timestamps. This way, an access history is maintained for a data object. This timestamp information is maintained via change set numbers, which are generally stored together with the modified data in rollback segments. This enables multiple "point in time" consistent views based on the stored change set numbers. MVCC enables read-consistent queries and non-blocking reads by providing these "point in time" consistent views without the need to lock the whole table. In Oracle, these change set numbers are called System Change Numbers, commonly referred to as SCNs. This is one of the most important scalability features of a database, since it provides maximum concurrency while maintaining read consistency.
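To connect this back to the opening puzzle, here is a sketch of how an MVCC database such as Oracle handles those two statements when run concurrently (session labels are mine, for illustration):

-- Session 1: the insert's internal select sees a consistent snapshot of
-- Employee as of the statement's start time, so it inserts exactly one
-- copy of each existing row (doubling the table) rather than looping
-- indefinitely.
insert into Employee select * from Employee;

-- Session 2, running concurrently: under MVCC this read is not blocked by
-- the insert; it sees the table as of its own point in time, so Session 1's
-- uncommitted rows are invisible to it.
select * from Employee;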

Internet of Things – IoT

Microsoft recently announced that Visual Studio will start supporting device development, beginning with initial support for Intel's Galileo board. This is not entirely new to Visual Studio. One of the earliest add-ons to support board-based development was Visual Micro, which supported the Arduino board.

Arduino is an open-source physical computing platform based on a simple I/O board and an integrated development environment (IDE) that implements the Processing/Wiring language. Arduino can be used to develop stand-alone interactive objects and installations, or it can be connected to software on your computer. The first Arduino board was produced in January 2005. The project is the result of the collaboration of many people, primarily David Cuartielles and Massimo Banzi, who taught physical computing. The others include David Mellis, who built the software engine, Tom Igoe, and Gianluca Martino.

 

[Photo: Arduino board]

Board Layout

[Diagram: Arduino board layout]

Starting clockwise from the top center:

• Analog Reference pin (orange)

• Digital Ground (light green)

• Digital Pins 2-13 (green)

• Digital Pins 0-1/Serial In/Out – TX/RX (dark green)

– These pins cannot be used for digital I/O (digitalRead and digitalWrite) if you are also using serial communication (e.g. Serial.begin).

• Reset Button – S1 (dark blue)

• In-circuit Serial Programmer (blue-green)

• Analog In Pins 0-5 (light blue)

• Power and Ground Pins (power: orange, grounds: light orange)

• External Power Supply In (9-12VDC) – X1 (pink)

• Toggles External Power and USB Power (place jumper on two pins closest to desired supply) – SV1 (purple)

• USB (used for uploading sketches to the board and for serial communication between the board and the computer; can be used to power the board) (yellow)

Installing Arduino software on your computer:

To program the Arduino board, you must first download the development environment (the IDE) from www.arduino.cc/en/Main/Software. Choose the right version for your operating system. After this, you need to install the drivers that allow your computer to talk to your board through the USB port.

Installing Drivers: Windows

Plug the Arduino board into the computer; when the Found New Hardware Wizard window comes up, Windows will first try to find the driver on the Windows Update site. Windows Vista will first attempt to find the driver on Windows Update; if that fails, you can instruct it to look in the Drivers\FTDI USB Drivers folder. You'll go through this procedure twice, because the computer first installs the low-level driver, then installs a piece of code that makes the board look like a serial port to the computer. Once the drivers are installed, you can launch the Arduino IDE and start using Arduino.

[Screenshot: Found New Hardware Wizard]

Arduino Programming language

The basic structure of the Arduino programming language is fairly simple and runs in at least two parts. These two required parts, or functions, enclose blocks of statements.

void setup()
{
    //statements: setup code, run once at start-up
}

void loop()
{
    //statements: main code, run repeatedly after setup() completes
}

Here setup() is the preparation and loop() is the execution. Both functions are required for the program to work.

The setup function should follow the declaration of any variables at the very beginning of the program. It is the first function to run in the program, is run only once, and is used to set pinMode or initialize serial communication.

The loop function follows next and includes the code to be executed continuously – reading inputs, triggering outputs, etc. This function is the core of all Arduino programs and does the bulk of the work.
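As a concrete example, here is a minimal complete sketch (the classic LED blink; pin 13 is an assumption, being the pin that drives the on-board LED on most boards):

// Blink an LED on digital pin 13
void setup()
{
    pinMode(13, OUTPUT);     // runs once: configure pin 13 as an output
}

void loop()
{
    digitalWrite(13, HIGH);  // turn the LED on
    delay(1000);             // wait one second
    digitalWrite(13, LOW);   // turn the LED off
    delay(1000);             // wait one second, then loop() repeats
}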

Functions

A function is a block of code that has a name and a block of statements that are executed when the function is called.

Custom functions can be written to perform repetitive tasks and reduce clutter in a program. Functions are declared by first declaring the function type. This is the type of value to be returned by the function, such as 'int' for an integer-type function. If no value is to be returned, the function type would be void. After the type, declare the name given to the function and, in parentheses, any parameters being passed to the function.

type functionName(parameters)
{
    statements;
}
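For instance, a small integer function (the name and logic are invented for illustration):

// Returns twice the value passed in
int doubleNumber(int x)
{
    return x * 2;
}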

Variables

A variable is a way of naming and storing a value for later use by the program. As their name suggests, variables can be continually changed, as opposed to constants, whose value never changes. A variable can be declared in a number of locations throughout the program, and where this declaration takes place determines what parts of the program can use the variable.

Variable scope

A variable can be declared at the beginning of the program before void setup(), locally inside of functions, and sometimes within a statement block such as for loops. Where the variable is declared determines the variable scope, or the ability of certain parts of a program to make use of the variable.

A global variable is one that can be seen and used by every function and statement in a program. This variable is declared at the beginning of the program, before the setup() function.

A local variable is one that is defined inside a function or as part of a for loop. It is only visible and can only be used inside the function in which it was declared. It is therefore possible to have two or more variables of the same name in different parts of the same program that contain different values. Ensuring that only one function has access to its variables simplifies the program and reduces the potential for programming errors.
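A short sketch showing both scopes (variable names invented for illustration):

int counter = 0;            // global: visible to every function

void setup()
{
    int startValue = 5;     // local: visible only inside setup()
    counter = startValue;   // the global variable is accessible here too
}

void loop()
{
    counter++;              // fine: counter is global
    // startValue is not visible here; it was local to setup()
}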

Support Vector Machines (SVM)


Support vector machines (SVMs) are powerful tools for data classification, used for both linear and nonlinear data. Classification is achieved by a linear or nonlinear mapping that transforms the original training data into a higher dimension. Within this new dimension, the algorithm searches for the linear optimal separating hyperplane (i.e. a "decision boundary" separating the tuples of one class from another). With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors). SVMs became famous when, using images as input, they gave accuracy comparable to neural networks with hand-designed features in a handwriting recognition task.

Support vector machines select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible. This instance-based approach transcends the limitations of linear boundaries by making it practical to include extra nonlinear terms in the function, making it possible to form quadratic, cubic, and higher-order decision boundaries.

SVMs are based on an algorithm that finds a special kind of linear model: the maximum-margin hyperplane. The maximum-margin hyperplane is the one that gives the greatest separation between the classes; it comes no closer to either than it has to.

Using a mathematical transformation, the algorithm moves the original data set into a new mathematical space in which the decision boundary is easy to describe. Because this transformation depends only on a simple computation involving "kernels", this technique is called the kernel trick. The kernel can typically be set to one of four values: linear, polynomial, radial, or sigmoid.

SVMs were developed by Cortes & Vapnik (1995) for binary classification.

Class separation: we are looking for the optimal separating hyperplane between the two classes, found by maximizing the margin between the classes' closest points; the points lying on the boundaries are called support vectors, and the middle of the margin is our optimal separating hyperplane.

Overlapping classes: data points on the "wrong" side of the discriminant margin are weighted down to reduce their influence (the "soft margin").

Nonlinearity: when we cannot find a linear separator, data points are projected into a (usually) higher-dimensional space where they effectively become linearly separable (this projection is realised via kernel techniques).
Problem solution: the whole task can be formulated as a quadratic optimization problem which can be solved by known techniques.

A program able to perform all these tasks is called a Support Vector Machine.
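For reference, the standard soft-margin formulation from Cortes & Vapnik (not spelled out in the text above) can be written as:

\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0

Here w and b define the hyperplane, the slack variables \xi_i let individual points fall on the wrong side of the margin (the "soft margin" above), and C controls the trade-off between a wide margin and misclassification errors.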

Graphing my Facebook network – Social Network Analysis

[Visualization: my Facebook friend network]

Netvizz is a cool tool which can export your Facebook data in a format that makes it easier to visualize or do a network analysis offline. There are other apps available on the Facebook platform which do this using predefined bounds within the app itself, but what I like about Netvizz is that it allows you to extract this information and play with it using any tool you are comfortable with, like R, Gephi, etc. This sort of visualization is at the core of social network analysis systems.

I spent some time playing around with my Facebook network information over the weekend. I extracted my network using Netvizz. It gives a GDF file with all the necessary information, which can be imported into Gephi directly. The idea was to see how closely the Facebook visualization of my friend network compares with my real-life network. I do understand that online and offline social networks differ, but on a massive system like Facebook my guess was that it should mirror my real-life network much more realistically. I built the network, ran the community detection algorithm, and did a ForceAtlas2 layout. This is the network visualization I ended up with after tweaking a few things. The network diagram is both surprising and a bit scary.

Accurate reflection of my social network groups: The network analysis shows the various groups of friends by my life events and their relationships. The green bunch of circles on the right are my friends from my ex-company, Logica. The small red bunch below is my network of friends from IIM-Lucknow, and the big blob of circles on the top left are my friends from Dell. There were other individual circles in between too; these were removed because I filtered by degree of connectedness, and they represented friends who were not part of any of these big networks, people with whom I was connected outside of all these bigger networks.

The network information on Facebook is more or less an accurate reflection of my networks in real life, in terms of how I could group them by connectedness or by specific timelines of my life.

Not so accurate reflection of my social network groups: My assumptions about who the best networkers were was shattered when I saw that some of the people in the top 10 by degree of connectedness were actually people who are more introverted. Maybe the online world provides them a degree of comfort in connecting with people. Another surprise was that some of my friends in these disparate networks were also connected with each other. This can be seen in the two dots at the bottom connecting the bigger blob to them. Another aspect is that this is still not an accurate reflection of my actual social life and does not in any way reflect my actual day-to-day social interactions.

Inference on connectedness: The connectedness inference was also surprising, since it was an accurate reflection of the groups of people and their interactions. I was pretty surprised by how, even within my current organization, the connectedness numbers could pinpoint teams. This is reflected in the big blob on the top right, which has 3 distinct colors representing the 3 large divisions I have worked across in my organization.

I will be playing around a lot more with this data, and I am planning a further analysis of my LinkedIn network, overlaying it onto my Facebook network to see how the two compare.