
Apache Mesos Fundamentals

Apache Mesos

Apache Mesos is a tool which abstracts the system resources of individual machines and makes them available as a single pool of resources. Mesos can then run and control the life cycle of the various applications that are deployed against this pool.

So, in practice, instead of running an application on a particular machine, one tells Mesos about the resource requirements of the application, and Mesos figures out the most appropriate machine to run the application or task on.

Mesos Components and Architecture

The Mesos world consists of four significant entities.

  • Zookeeper: Apache ZooKeeper is an independent project which helps coordinate the various pieces of software. One of its most notable uses is service discovery. ZooKeeper is distributed and highly available.
  • Mesos Master: The Mesos master (or master cluster) is used to manage the slaves. It also collects resource information from the slaves and task descriptions from the frameworks, and passes each on to the other side.
  • Mesos Framework: A Mesos framework has two important components, a scheduler and an executor. The framework scheduler receives a resource offer from the master and can accept or reject it based on its requirements and algorithm. The framework executor receives task information and executes the task on the slave. The most popular frameworks are Marathon and Chronos. We will use Marathon for this session.
  • Mesos Slave: These are the servers that actually run the tasks.

(Image: Mesos architecture diagram)

Introduction to Marathon

Marathon is a cluster-wide init and control system for services running in cgroups or Docker containers, implemented as a framework for Apache Mesos. Marathon exposes a REST API for managing tasks, and we can also use Marathon to run and manage other frameworks. Marathon receives a resource offer from Mesos and, if it accepts the offer, it provides Mesos with information about the task, which is passed on to the Marathon executor running on the slave.

Installing and Setting up Mesos and other components

Mesosphere, the creators of the Datacenter Operating System (DC/OS), maintain official package repositories for Mesos and related tools and frameworks. So let us first install the repository and then the individual packages.

$ sudo rpm -Uvh http://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm

Installing and Configuring Zookeeper

$ sudo yum -y install mesosphere-zookeeper

Each ZooKeeper node needs a unique id, which is an integer between 1 and 255. The id is defined in /var/lib/zookeeper/myid. We will use only one ZooKeeper node in our setup.

$ sudo bash -c "echo 1 > /var/lib/zookeeper/myid"

We also need to change the zookeeper configuration (/etc/zookeeper/conf/zoo.cfg) to reflect the binding address.

server.1=<ip_address>:2888:3888
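
For context, the relevant part of zoo.cfg would then look roughly like the following; tickTime, dataDir and clientPort are the package defaults and are shown only for orientation:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=<ip_address>:2888:3888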

Now start the zookeeper server.

$ systemctl start zookeeper
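
Optionally, you can check that ZooKeeper is answering on its client port with the ruok four-letter command (this assumes nc/netcat is installed; a healthy node replies imok):

$ echo ruok | nc <ip_address> 2181
imok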

Installing and Configuring Mesos Master

We can get Mesos from the Mesosphere repository which we set up in the previous step.

$ sudo yum -y install mesos

Now we need to point Mesos to ZooKeeper.

$ sudo bash -c "echo zk://<zookeeper_ip>:2181/mesos > /etc/mesos/zk"

We also need to tell Mesos the quorum size. It should always be greater than half the size of the Mesos master cluster; for example, a three-master cluster should use a quorum of 2. Since we are running a single master, we will go with 1.

$ sudo bash -c "echo 1 > /etc/mesos-master/quorum"

Mesos uses the hostname to communicate with the other components, so either the hostname must be resolvable via DNS or we should tell Mesos to use the IP address.

$ sudo bash -c "echo <ip-address> > /etc/mesos-master/hostname"

Now, let us start the Mesos master.

$ sudo systemctl restart mesos-master
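
Optionally, confirm that the master came up cleanly before moving on; a quick check, assuming the default systemd unit name from the Mesosphere package:

$ sudo systemctl status mesos-master
$ sudo journalctl -u mesos-master -n 20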

Installing and Configuring Marathon

Install the Marathon package from the Mesosphere repository.

$ sudo yum -y install marathon

Like Mesos, Marathon is also hostname-sensitive, so we should set its hostname to the IP address.

$ sudo mkdir -p /etc/marathon/conf/
$ sudo bash -c "echo <ip-address> > /etc/marathon/conf/hostname"

Now start Marathon.

$ sudo systemctl restart marathon

Installing and Configuring Mesos Slave

On the slave, we will first install the Mesos package and then stop the master service.

$ sudo yum -y install mesos
$ sudo systemctl stop mesos-master

We need this slave to be able to run Docker. So let us configure that.

$ sudo bash -c "echo 'docker,mesos' > /etc/mesos-slave/containerizers"

Since pulling Docker images can sometimes take a while, we should increase the executor registration timeout.

$ sudo bash -c "echo '15mins' > /etc/mesos-slave/executor_registration_timeout"

Just like the master, we need to tell the slave about ZooKeeper and set its hostname to the IP address.

$ sudo bash -c "echo zk://<zookeeper_ip>:2181/mesos > /etc/mesos/zk"  $ sudo bash -c "echo <ip-address> > /etc/mesos-slave/hostname"

Now let us restart the Mesos slave.

$ sudo systemctl restart mesos-slave

Great! Both the Mesos master and slave are up and we have Marathon running too. We can verify the setup by opening the web interfaces of Mesos (port 5050) and Marathon (port 8080). Now let us deploy an application via Marathon.

Applications or Docker containers can be deployed via the Marathon framework using either the web interface or the REST API. For this lab, we will use a mix of both: we will create the application using the web interface but scale it using the REST API.

So let us go to the Marathon web interface and click the Create button in the upper left corner. A form will open where we can fill in the relevant details and deploy the application. We will expand the Docker container settings to supply details like the image, network and port mappings (an equivalent REST API call is sketched below).

(Image: Marathon application creation form)
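
For reference, the same kind of application can also be created directly through the REST API by POSTing an app definition to Marathon. The app id basic-nginx, the nginx image and the port mapping below are only illustrative placeholders:

$ curl -H 'Content-Type: application/json' -X POST -d '{
    "id": "basic-nginx",
    "cpus": 0.5,
    "mem": 256,
    "instances": 1,
    "container": {
      "type": "DOCKER",
      "docker": {
        "image": "nginx",
        "network": "BRIDGE",
        "portMappings": [{"containerPort": 80, "hostPort": 0}]
      }
    }
  }' http://<marathon_ip>:8080/v2/apps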

Let us scale the application up to 2 instances using the REST API.

$ curl -H 'Content-Type: application/json' -X PUT -d '{"instances":2}' http://<marathon_ip>:8080/v2/apps/<app-id>
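
To confirm the change, the same app endpoint can be read back with a GET request; the instances field in the JSON response should now show 2:

$ curl -s http://<marathon_ip>:8080/v2/apps/<app-id>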

The API is very useful in environments where we want to automate scaling based on parameters like load, traffic and resource consumption.

Understanding /proc

Exploring procfs

In summary, all of Linux's processes can be found in the /proc folder. This folder is of type procfs which, like I said before, is a virtual filesystem whose entries mostly point to in-memory data. This is why, if you run ls -l /proc, you'll notice that most files and folders have size 0.

ls -l /proc
dr-xr-xr-x. 9 root root 0 Sep 25 22:10 1
dr-xr-xr-x. 9 root root 0 Oct 1 10:38 10
dr-xr-xr-x. 9 root root 0 Oct 1 12:46 101
dr-xr-xr-x. 9 root root 0 Oct 1 12:46 102
...

Inside '/proc' there is one folder for each running process, with its pid as the name. So I opened one of the folders to see what I could learn about a running process just by reading these files.

ls -l /proc/<pid>
total 0
dr-xr-xr-x. 2 binpipe binpipe 0 Sep 28 23:15 attr
-rw-r--r--. 1 root root 0 Oct 1 10:46 autogroup
-r--------. 1 root root 0 Oct 1 10:46 auxv
-r--r--r--. 1 root root 0 Sep 28 23:15 cgroup
--w-------. 1 root root 0 Oct 1 10:46 clear_refs
-r--r--r--. 1 root root 0 Sep 28 22:41 cmdline
-rw-r--r--. 1 root root 0 Oct 1 10:46 comm
-rw-r--r--. 1 root root 0 Oct 1 10:46 coredump_filter
...

Ok, now I have a bunch of files like autogroup, gid_map and maps that I have no idea what they're for. A good starting point would be checking their documentation. But why on earth shouldn't I just open them?

So I started looping through the files one by one and most of them were completely unreadable to me, until I ran into the pot of gold:

cat /proc/<pid>/status
Name: chrome
State: S (sleeping)
Tgid: 3054
Ngid: 0
Pid: 3054
PPid: 2934
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 64
Groups: 10 1000 1001
VmPeak: 1305996 kB
VmSize: 1232520 kB
...

This is great! Finally something human readable. It contains general data about the process, like its state, memory usage and owner. But is this all I need?

Not satisfied with '/proc' file exploration, I decided to run ps under strace to see if it accesses any of the files I found.

strace -o ./strace_log ps aux

Strace records all system calls executed by a program. So I filtered the strace output for the 'open' system call and, as I suspected, the files being opened by ps were the same ones I had first checked:

cat ./strace_log | grep open
[...]
open("/proc/1/stat", O_RDONLY) = 6
open("/proc/1/status", O_RDONLY) = 6
[...]
open("/proc/2/stat", O_RDONLY) = 6
open("/proc/2/status", O_RDONLY) = 6
open("/proc/2/cmdline", O_RDONLY) = 6
open("/proc/3/stat", O_RDONLY) = 6
open("/proc/3/status", O_RDONLY) = 6
open("/proc/3/cmdline", O_RDONLY) = 6
[...]

Ok, so we have the stat, status and cmdline files to check; now all we need to do is parse them and extract what we need.

The code

The implementation turned out to be fairly simple, and it comes down to reading files and displaying their content in an organized manner.

Process data structure

We want to display our data in a tabular way, where each process is a record in this table. Let's take the following class as one of our table records:

# Holds the fields we want to show for a single process
class ProcessData
  attr_reader :pid
  attr_reader :name
  attr_reader :user
  attr_reader :state
  attr_reader :rss

  def initialize pid, name, user, state, rss
    @pid = pid
    @name = name
    @user = user
    @state = state
    @rss = rss
  end
end

Finding PIDs for running processes

Take into account what we know so far:

  • /proc folder contains sub-folders with all processes
  • All process folders have their pid as name

So gathering a list of all current pids should be easy:

def get_current_pids
  pids = []
  Dir.foreach("/proc") { |d|
    if is_process_folder?(d)
      pids.push(d)
    end
  }
  return pids
end

In order to be a valid process folder it must fulfill two requirements:

  • It's a folder (duh?)
  • Its name contains only digits (this is why we cast the folder name to an integer)
def is_process_folder? folder
  File.directory?("/proc/#{folder}") and (folder.to_i != 0)
end

Extracting process data

Now that we know every pid in the system we should create a method that exposes data from /proc/<pid>/status for any of them.

But first, let's analyze the file.

cat /proc/<pid>/status
Name: chrome
State: S (sleeping)
...
Uid: 1000 1000 1000 1000

This file is organized in the following way: Key:\t[values]. This means that for every piece of data in this file we can follow this same pattern to extract it. However, some lines will have an individual value and others will have a list of values (like Uid).

def get_process_data pid
  proc_data = {}
  File.open("/proc/#{pid}/status") { |file|
    begin
      while line = file.readline
        data = line.strip.split("\t")
        key = data.delete_at(0).downcase
        proc_data[key] = data
      end
      file.close
    rescue EOFError
      file.close
    end
  }
  return proc_data
end

The method above results in the following structure:

get_process_data 2917
=> {"name:"=>["chrome"],
"state:"=>["S (sleeping)"],
"tgid:"=>["2917"],
"ngid:"=>["0"],
"pid:"=>["2917"],
"ppid:"=>["1"],
"tracerpid:"=>["0"],
"uid:"=>["1000", "1000", "1000", "1000"],
...

Reading user data

The uid-to-username association is kept in the /etc/passwd file, so in order to show the correct username we must also read and parse this file.

For the sake of simplicity, let's just read the whole file and save it in a Hash with the uid as key and the username as value.

def get_users
  users = {}
  File.open("/etc/passwd", "r") { |file|
    begin
      while line = file.readline
        data = line.strip.split(":")
        users[data[2]] = data[0]
      end
      file.close
    rescue EOFError
      file.close
    end
  }
  return users
end

Creating process records

So far we have found the pids in the system, read the status file and extracted the data. What we have to do now is to filter and organize this data into a single record that will be presented to the user.

def create_process pid
  data = get_process_data pid

  name = data["name:"][0]
  user_id = data["uid:"][0]
  state = data["state:"][0]

  # VmRSS is absent for kernel threads, so guard against nil
  if data["vmrss:"] != nil
    rss = data["vmrss:"][0]
  end

  user = get_users[user_id]

  return ProcessData.new(pid, name, user, state, rss)
end

current_processes = get_current_pids

current_processes.each { |p|
  process = create_process p
  puts "#{process.pid}\t#{process.name}\t#{process.user}\t#{process.state}\t#{process.rss}"
}

The reason we read the VmRSS value is that we want the resident memory: only what is stored in physical memory, not what is sitting on disk.

Extra (formatting)

You can format the ProcessData output in a tabular way to get prettier output.

format = "%6s\t%-15s\t%-10s\t%-10s\t%-10s\n"
printf(format, "PID", "NAME", "USER", "STATE", "MEMORY")
printf(format, "------", "---------------", "----------", "----------", "----------")
current_processes.each { |p|
  process = create_process p
  printf(format, process.pid, process.name, process.user, process.state, process.rss)
}

Result:

   PID  NAME             USER        STATE          MEMORY
------  ---------------  ----------  -------------  ----------
     1  systemd          root        S (sleeping)   8444 kB
     2  kthreadd         root        S (sleeping)
     3  ksoftirqd/0      root        S (sleeping)
...

Conclusion

There is a lot of information that you can find under /proc folder. This post only covers basic data like name, state and resident memory. But if you dig deep into those files you will find a lot more, like memory mapping and CPU usage.


Django Installation on CentOS 7 with Apache mod_wsgi & MariaDB

## Install the Components from the CentOS and EPEL Repositories

yum install epel-release
yum install python-pip httpd mod_wsgi
pip install --upgrade pip
yum install python-pip python-devel gcc mariadb-server mariadb-devel
sudo systemctl start mariadb
sudo systemctl enable mariadb
mysql_secure_installation


## Create a Database and Database User

mysql -u root -p

CREATE DATABASE noviceproj CHARACTER SET UTF8;
CREATE USER novice@localhost IDENTIFIED BY 'redhat';
GRANT ALL PRIVILEGES ON noviceproj.* TO novice@localhost;
FLUSH PRIVILEGES;
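
To quickly verify the grant, you can (optionally) log back in as the new user and list the databases; noviceproj should appear in the output:

mysql -u novice -p -e "SHOW DATABASES;"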

## Install Django within a Virtual Environment & Connect DB


pip install virtualenv
mkdir noviceproj
cd noviceproj/
virtualenv noviceenv
source noviceenv/bin/activate
pip install django mysqlclient
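
The settings file edited below assumes a Django project named noviceproj already exists inside ~/noviceproj. If you have not created it yet, a minimal way to do so (run from inside ~/noviceproj with the virtualenv active) would be:

django-admin startproject noviceproj .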



Open the main Django project settings file located within the child project directory:


vi ~/noviceproj/noviceproj/settings.py

Towards the bottom of the file, you will see a DATABASES section that looks like this:


. . .


DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}


. . .

This is currently configured to use SQLite as the database. We need to change this so that our MariaDB database is used instead. To do that, comment out the block above and append the lines below:



DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'noviceproj',
        'USER': 'novice',
        'PASSWORD': 'redhat',
        'HOST': 'localhost',
        'PORT': '3306',
    }
}


## Migrate the Database and Test your Project

cd ~/noviceproj/
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
(set the username and password for the admin user when prompted)

Next, add a STATIC_ROOT setting to the settings file and collect the static files:

vi noviceproj/settings.py

STATIC_ROOT = os.path.join(BASE_DIR, "static/")

./manage.py collectstatic


## Start Development Server

python manage.py runserver 0.0.0.0:8000

Quit the server with CONTROL-C.

To leave the virtualenv, type:

deactivate

## Setting up Apache Server with Mod_wsgi

vi /etc/httpd/conf.d/django.conf

Alias /static /root/noviceproj/static

<Directory /root/noviceproj/static>
    Require all granted
</Directory>

<Directory /root/noviceproj/noviceproj>
    <Files wsgi.py>
        Require all granted
    </Files>
</Directory>

WSGIDaemonProcess noviceproj python-path=/root/noviceproj:/root/noviceproj/noviceenv/lib/python2.7/site-packages
WSGIProcessGroup noviceproj
WSGIScriptAlias / /root/noviceproj/noviceproj/wsgi.py

usermod -a -G root apache
chmod 710 /home/user (not required)
chown :apache ~/noviceproj

systemctl restart httpd