Thursday, November 20, 2014

Cassandra - A Linear Scalable Database for Output Messenger

After getting positive feedback from Output Messenger v1.0 customers, we are planning to press the accelerator further to make sure all features in instant messaging world is available in our Output Messenger too.

One of the request from customers is a hosted version. Our existing architecture is designed as a self-hosted version & scalable to support 10,000 users inside a company. But for a cloud based hosted version, obviously needs more scalability & availability.

While designing the needed changes in existing design, need to choose a scalable database to store the chat message history. As there will be more write operations, we preferred Log Structured Merge tree (LSM Tree) & started reviewing few databases.
The top in our review list was "Cassandra"

Apache Cassandra - A linear scalable database designed for high performance best in class Write & Read, with scalability & availability.
Cassandra's database is designed as a master less architecture with fully distributed & therefore no single-point-of-failure (SPOF).

The Write record gets replicated across multiple nodes & so there is no master / slave node.  Even it gets replicated across multiple nodes, the client will wait for only one node confirmation.
blog_cassandra1


Also by having LSM Tree structure, the insert data will be kept in memory, later it will be appended in disk & merged. So, Write operations will be fast without waiting for any index ordering.
Another important advantage with Cassandra is their column indexes.
Let's see how a Composite-keyed index (primary key + column index(s)) may help for our instant messenger chat history logs.
Assume we are having the chat history table structure as below:
 create table chatmessages (
  chatroomid int,
  senton timestamp,
  sender varchar,
  message varchar,
  PRIMARY KEY (chatroomid, senton)
   );
 
While querying
select * from chatmessages


chatroomidsentonfrommessage
124112569537329ramHi
124112569537411laxmanHi Ram,How ar..
124112569537523ramFine..I need a..
124112569537758laxmanYes. Sure. I can …
 
It seems to be normal database result/operations. We may assume data is also stored in this format.
But, the interesting part is Cassandra stores in its own way.
Of the 4 columns declared in create table, will result as 2 columns while insert operation.
row_key : chatroomid (first key declared in primary key)
column_name_1: senton + from  (from value will be stored in this column)
column_name_2: senton + message (message value will be stored in this column)
ie, The first of Primary key will be the row key. Subsequent primary key column values will be the prefix of the non-primary columns.


As below our data look in Cassandra's storage model:
chatroomid12569537329 from12569537329 message12569537411 from12569537411 message
1241ramHilaxmanHi Ram,H..


The columns keep on growing with messages.
While fetching chat room messages for a particular period, our select query will be:
select * from chatmessages where chatroomid=1241 and senton >= '2014-11-13 00:00:00+0200' AND senton <= '2014-11-20 23:59:00+0200'

We have to always use chatroomid in filter as it is the main row key.
Also  Cassandra has CQL (Cassandra Query Language) much similar to SQL, which gives familiar way to develop back end.
With all these advantages in consideration, let's wait & see, can Cassandra join with our Output Messenger family tools.

Happy Messaging!






Friday, September 26, 2014

Tirunelveli to Mars

Sep 24th, 2014 a milestone of our India's space research organization, our spacecraft Mars orbiter (MOM) entered the Mars Orbit.

Some may think United States NASA has did this already in 1965 & is it really a achievement to do this now ? For them, Yes, this is our great & unique achievement.

The biggest achievement of ISRO is making the spacecraft with self-thinking brain. Mars orbiter can think & act on its own.

Our Satellite has travelled 680 million kms for about 300 days. Driving in proper route in space is interesting, as we know in space every thing around will look same. To stay on correct path, MOM used the star-gazer to look at positions of six to 10 stars for every microsecond and compare them with its preloaded patterns. Distant stars will be preferred as they are relatively stationary. By continuously matching the patterns, MOM determined its position & direction on its own. Even by travelling at a speed of more than 82,000 kmph, it never lost the direction.

While in travel, MOM automatically controls its temperature, position its antenna constantly towards earth for communication and its solar panels towards the sun to generate power. All these are done without any major input from earth.  Scientists call it autonomy.

To enter Mars orbit, our scientists from ISRO in Bangalore stored commands in MOM's brain 10 days in advance and Mars orbiter executed it perfectly.

This is just a start for future flying satellites. “Like migratory birds, satellites with autonomy can orbit Earth, look at the same things from different angles or look at different things and correlate data,“ says T.K.Alex  Indian Space Commission member. ISRO chairman K Radhakrishnan says this project is 80% technology demonstration.  Our ISRO has demonstrated the technology of autonomy in style to the universe.

Another information to feel us proud is, the Project Director of this mission is Mr.Subbiah Arunan, from Tirunelveli. It again proves our soil, water & air are filled with talent molecules. We are born in this land to achieve some thing.  If we have the confident, we will & can travel to any height!!




Mr. Subbiah Arunan, Project Director of MOM. He likes MGR & James Bond movies !!



Mars Orbiter Spacecraft captures its first image of Mars. Taken from a height of 7300 km; with 376 m spatial resolution



First image of the Earth by Mars Color Camera(MCC) of Mars orbiter spacecraft taken on Nov 19,2013 at 13:50 hrs (IST) from 67975 km altitude.




Friday, August 1, 2014

Table Index – Part 2, Multiple Column Index

In Part 1, we got some picture on choosing the column for index.
Here we are going to see the factors to be considered for Multicolumn index.

When to use Multiple Column Index ?
Consider a sales table with following columns:
id, company_id, sales_date, client_id , amount, remarks

If we filter the sales table based on any one field like
…from sales where company_id = 1;
…from sales where sales_date = ’2014-07-31′;
…from sales where client_id = 12 ;
it is better to have separate index for each field than multiple index.

But when multiple columns are used in filter like
…from sales where company_id = 1 and sales_date=’2014-07-31′;
…from sales where company_id = 1 and sales_date=’2014-07-31′ and client_id=12;

we should consider multiple column index than single column index for each field.
The index can be created as (company_id, sales_date, client_id)

One smart thing in multiple column index is, all the columns defined in index need not to be used in Filter column.
The index(company_id, sales_date, client_id) can support the following queries too
…from sales where company_id = 1
…from sales where company_id = 1 and sales_date=’2014-07-31′;



How Multiple Index are stored ?
Here Index will be maintained in a dictionary with the specified columns + Primary key.
The values will be sorted based on the specified column order.

ix_multiple

Choosing columns is an Art
The columns should be chosen cleverly & defined in the particular order aiming the result.
The queries should also be build considering the index. If ignored, then index will also ignore us.

For example a Index  defined  for (col1,col2,col3)

@ Where  Condition

  • SELECT * FROM tbl_name WHERE col1 = val1  AND col2 = val2  AND col3 = val3; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 AND col2 = val2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 (INDEX USED)
  • SELECT * FROM tbl_name WHERE col2 = val2; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col2 = val2 AND col3 = val3; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 AND col2 > val2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 > val1 AND col2 = val2; (INDEX NOT USED)
  • SELECT * FROM tbl_other JOIN tbl_name on tbl_other.col2 = tbl_name.col2; (INDEX NOT USED)
  • SELECT * FROM tbl_other JOIN tbl_name on tbl_other.col2 = tbl_name.col2 where tbl_name.col1 = val1(INDEX USED)

@ Group By

  • SELECT * FROM tbl_name group by col1, col2, col3; (INDEX USED)
  • SELECT * FROM tbl_name group by col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name group by col1; (INDEX USED)
  • SELECT * FROM tbl_name group by col1, col2,col3,col4(INDEX NOT USED)
  • SELECT * FROM tbl_name group by col2, col3; (INDEX NOT USED)
  • SELECT DISTINCT col1, col2 FROM tbl_name; (INDEX USED)
  • SELECT col1, MIN(col2) FROM tbl_name GROUP BY col1; (INDEX USED)
  • SELECT col1, col2 FROM tbl_name WHERE col1 < const GROUP BY col1, col2; (INDEX USED)
  • SELECT MAX(col3), MIN(col3), col1, col2 FROM tbl_name WHERE col2 > const GROUP BY col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 < const GROUP BY col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col2 < const GROUP BY col1, col3; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col3 = const GROUP BY col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = const GROUP BY col2, col3; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col2 = const GROUP BY col1, col3; (INDEX USED)

@ Order By

  • SELECT * FROM tbl_name ORDER BY col1, col2, col3, … ; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col2, col3; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col3, col2; (INDEX NOT USED)
  • SELECT * FROM tbl_name ORDER BY col1, col2 ; (INDEX USED)
  • SELECT * FROM tbl_name ORDER BY col2, col3, … ; (INDEX NOT USED)
  • SELECT * FROM tbl_name ORDER BY col1, col3 (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col2 = constant ORDER BY col1, col3; (INDEX USED)
  • SELECT * FROM tbl_name ORDER BY col1 DESC, col2 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col2 DESC; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col1 DESC, col2 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 > val1 ORDER BY col1 ASC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 < val1 ORDER BY col1 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 AND col2 > val2 ORDER BY col2; (INDEX USED)



Factors to be considered in Multiple Index Column Order
Make sure the first column should not be used for range operations. In that case, further columns will have no effect.
From the above example: SELECT * FROM tbl_name WHERE col1 > val1 AND col2 = val2; (INDEX NOT USED)

Also use the columns first, whose filter results has less rows than filtering other rows.
For example, You want to filter a School students table by Subject and Class.
Select count(*) from students where subject = ‘ENGLISH’ and class=’IV’ ;
Let’s check each filter possible result count:
Select count(*) from students where subject = ‘ENGLISH’ ;
Result: 469
Select count(*) from students where class = ‘IV’ ;
Result: 29
So when we filter using class first, we will filter most of the records. Obviously, less scanning of records for further column(s) filter in index.
Thus the order of column index should be (class, subject)

Hope this helps to under the use of Multiple Column Index and also the factors to be considered in creating them.