Thursday, November 20, 2014

Cassandra - A Linear Scalable Database for Output Messenger

After getting positive feedback from Output Messenger v1.0 customers, we are planning to press the accelerator further to make sure all the features of the instant messaging world are available in Output Messenger too.

One of the requests from customers is a hosted version. Our existing architecture is designed as a self-hosted version that scales to support 10,000 users inside a company. A cloud-based hosted version obviously needs more scalability & availability.

While designing the needed changes to the existing design, we need to choose a scalable database to store the chat message history. As there will be more write operations, we preferred a Log Structured Merge tree (LSM Tree) based database & started reviewing a few databases.
At the top of our review list was "Cassandra".

Apache Cassandra - A linearly scalable database designed for best-in-class Write & Read performance, with scalability & availability.
Cassandra is designed as a masterless, fully distributed architecture & therefore has no single point of failure (SPOF).

Each written record gets replicated across multiple nodes, and there is no master / slave node.  Even though it gets replicated across multiple nodes, the client can wait for confirmation from only one node (this depends on the consistency level chosen for the write).


Also, with the LSM Tree structure, inserted data is first kept in memory; later it is appended to disk & merged. So Write operations are fast, without waiting for any index ordering.
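The write path described here can be sketched roughly as follows. This is a toy Python model and not Cassandra's implementation: the class name `TinyLSM`, the `memtable_limit` parameter and the in-memory `segments` list (standing in for files on disk) are all assumptions made for illustration.

```python
class TinyLSM:
    """Toy log-structured merge store: writes go to memory, then are flushed as sorted segments."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # recent writes, kept in memory
        self.segments = []                    # flushed, sorted runs (stand-in for disk files)

    def put(self, key, value):
        # A write only touches the in-memory table, so it never waits on index reordering.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Append the memtable as a new sorted segment and start a fresh memtable.
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self):
        # Merge all segments into one; values from newer segments overwrite older ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []

    def get(self, key):
        # Look at the newest data first: memtable, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None
```

The point of the sketch is that `put` does no sorting or merging work at all; that cost is deferred to `flush` and `compact`, which is why an LSM-style store suits a write-heavy chat history.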
Another important advantage of Cassandra is its column indexes.
Let's see how a composite-keyed index (primary key + column index(es)) may help for our instant messenger chat history logs.
Assume we have the chat history table structure as below:
create table chatmessages (
  chatroomid int,
  senton timestamp,
  sender varchar,
  message varchar,
  PRIMARY KEY (chatroomid, senton)
);
 
While querying
select * from chatmessages;

chatroomid   senton        sender   message
1241         12569537329   ram      Hi
1241         12569537411   laxman   Hi Ram,How ar..
1241         12569537523   ram      Fine..I need a..
1241         12569537758   laxman   Yes. Sure. I can …
 
This looks like a normal database result, and we may assume the data is also stored in this format.
But the interesting part is that Cassandra stores it in its own way.
The 4 columns declared in create table result in 2 stored columns for each insert operation.
row_key : chatroomid (first key declared in primary key)
column_name_1: senton + sender  (sender value will be stored in this column)
column_name_2: senton + message (message value will be stored in this column)
i.e., the first column of the primary key will be the row key. The values of the subsequent primary key columns become the prefix of the names of the non-primary-key columns.
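As a rough illustration of this mapping, the following Python sketch groups table-style rows into one wide row per chatroomid. It only models the idea; the function name `to_wide_rows` and the use of plain dictionaries are assumptions, not Cassandra's actual storage format.

```python
def to_wide_rows(rows):
    """Group CQL-style rows into one wide row per row key (chatroomid)."""
    storage = {}
    # Sort by row key, then by the clustering column (senton).
    for row in sorted(rows, key=lambda r: (r["chatroomid"], r["senton"])):
        wide_row = storage.setdefault(row["chatroomid"], {})
        # The senton value becomes the prefix of each stored column name.
        wide_row[(row["senton"], "sender")] = row["sender"]
        wide_row[(row["senton"], "message")] = row["message"]
    return storage


rows = [
    {"chatroomid": 1241, "senton": 12569537329, "sender": "ram", "message": "Hi"},
    {"chatroomid": 1241, "senton": 12569537411, "sender": "laxman", "message": "Hi Ram"},
]
for column_name, value in to_wide_rows(rows)[1241].items():
    print(column_name, value)
```

Each new message adds two more columns to the same row, which is why the row keeps growing sideways.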


Our data looks as below in Cassandra's storage model:
chatroomid   12569537329 from   12569537329 message   12569537411 from   12569537411 message
1241         ram                Hi                    laxman             Hi Ram,H..


The columns keep on growing with messages.
While fetching chat room messages for a particular period, our select query will be:
select * from chatmessages where chatroomid=1241 and senton >= '2014-11-13 00:00:00+0200' AND senton <= '2014-11-20 23:59:00+0200'

We always have to use chatroomid in the filter, as it is the main row key.
Also, Cassandra has CQL (Cassandra Query Language), which is very similar to SQL and gives a familiar way to develop the back end.
With all these advantages in consideration, let's wait & see whether Cassandra can join our Output Messenger family of tools.

Happy Messaging!






Friday, September 26, 2014

Tirunelveli to Mars

Sep 24th, 2014 is a milestone for India's space research organization: our spacecraft, the Mars Orbiter Mission (MOM), entered Mars orbit.

Some may think that NASA of the United States already did this in 1965, and ask whether it is really an achievement to do this now. For them: yes, this is our great & unique achievement.

The biggest achievement of ISRO is building the spacecraft with a self-thinking brain. The Mars orbiter can think & act on its own.

Our satellite has travelled 680 million km in about 300 days. Driving along the proper route in space is interesting because, as we know, everything around looks the same in space. To stay on the correct path, MOM used its star-gazer to look at the positions of six to 10 stars every microsecond and compare them with its preloaded patterns. Distant stars are preferred as they are relatively stationary. By continuously matching the patterns, MOM determined its position & direction on its own. Even while travelling at a speed of more than 82,000 kmph, it never lost its direction.

While travelling, MOM automatically controls its temperature and keeps its antenna constantly pointed towards earth for communication and its solar panels towards the sun to generate power. All of this is done without any major input from earth.  Scientists call it autonomy.

To enter Mars orbit, our scientists from ISRO in Bangalore stored commands in MOM's brain 10 days in advance, and the Mars orbiter executed them perfectly.

This is just a start for future flying satellites. “Like migratory birds, satellites with autonomy can orbit Earth, look at the same things from different angles or look at different things and correlate data,” says T.K. Alex, Indian Space Commission member. ISRO chairman K Radhakrishnan says this project is 80% technology demonstration.  ISRO has demonstrated the technology of autonomy in style to the universe.

Another piece of information that makes us feel proud: the Project Director of this mission is Mr. Subbiah Arunan, from Tirunelveli. It again proves that our soil, water & air are filled with talent molecules. We are born in this land to achieve something.  If we have the confidence, we can & will travel to any height!!




Mr. Subbiah Arunan, Project Director of MOM. He likes MGR & James Bond movies !!



Mars Orbiter Spacecraft captures its first image of Mars. Taken from a height of 7300 km; with 376 m spatial resolution



First image of the Earth by the Mars Color Camera (MCC) of the Mars orbiter spacecraft, taken on Nov 19, 2013 at 13:50 hrs (IST) from 67975 km altitude.




Friday, August 1, 2014

Table Index – Part 2, Multiple Column Index

In Part 1, we got some idea about choosing the column for an index.
Here we are going to see the factors to be considered for a multiple column index.

When to use Multiple Column Index ?
Consider a sales table with the following columns:
id, company_id, sales_date, client_id, amount, remarks

If we filter the sales table based on any one field, like
…from sales where company_id = 1;
…from sales where sales_date = '2014-07-31';
…from sales where client_id = 12;
it is better to have a separate index for each field than a multiple column index.

But when multiple columns are used in the filter, like
…from sales where company_id = 1 and sales_date='2014-07-31';
…from sales where company_id = 1 and sales_date='2014-07-31' and client_id=12;

we should consider a multiple column index rather than a single column index for each field.
The index can be created as (company_id, sales_date, client_id)

One smart thing about a multiple column index is that not all the columns defined in the index need to be used in the filter.
The index (company_id, sales_date, client_id) can support the following queries too:
…from sales where company_id = 1;
…from sales where company_id = 1 and sales_date='2014-07-31';



How are Multiple Column Indexes stored ?
Here the index is maintained in a dictionary with the specified columns + the primary key.
The values are sorted based on the specified column order.
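To make the sorted layout concrete, here is a small Python sketch that models such an index as a sorted list of tuples and looks entries up with a binary search. It is only a model of the idea, not how MySQL stores its B-tree; the function names `build_index` and `lookup_prefix` and the sample rows are assumptions for illustration.

```python
import bisect


def build_index(rows, columns, pk="id"):
    """Return sorted index entries: (indexed column values..., primary key)."""
    return sorted(tuple(row[c] for c in columns) + (row[pk],) for row in rows)


def lookup_prefix(index, prefix):
    """Find primary keys whose leading indexed columns equal `prefix`."""
    prefix = tuple(prefix)
    # Binary search works because the entries are sorted by the leading columns.
    start = bisect.bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # sorted order means the matching range has ended
        matches.append(entry[-1])
    return matches


sales = [
    {"id": 1, "company_id": 1, "sales_date": "2014-07-31", "client_id": 12},
    {"id": 2, "company_id": 2, "sales_date": "2014-07-31", "client_id": 12},
    {"id": 3, "company_id": 1, "sales_date": "2014-07-30", "client_id": 7},
    {"id": 4, "company_id": 1, "sales_date": "2014-07-31", "client_id": 9},
]
index = build_index(sales, ["company_id", "sales_date", "client_id"])
print(lookup_prefix(index, [1, "2014-07-31"]))
```

A filter on sales_date alone has no such shortcut in this model: entries with the same sales_date are scattered across different company_id groups, so every entry would have to be checked.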


Choosing columns is an Art
The columns should be chosen cleverly & defined in a particular order, aiming at the result.
The queries should also be built considering the index. If we ignore the index, the index will also ignore us.

For example, an index defined for (col1, col2, col3):

@ Where  Condition

  • SELECT * FROM tbl_name WHERE col1 = val1  AND col2 = val2  AND col3 = val3; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 AND col2 = val2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 (INDEX USED)
  • SELECT * FROM tbl_name WHERE col2 = val2; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col2 = val2 AND col3 = val3; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 AND col2 > val2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 > val1 AND col2 = val2; (INDEX NOT USED)
  • SELECT * FROM tbl_other JOIN tbl_name on tbl_other.col2 = tbl_name.col2; (INDEX NOT USED)
  • SELECT * FROM tbl_other JOIN tbl_name on tbl_other.col2 = tbl_name.col2 where tbl_name.col1 = val1; (INDEX USED)

@ Group By

  • SELECT * FROM tbl_name group by col1, col2, col3; (INDEX USED)
  • SELECT * FROM tbl_name group by col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name group by col1; (INDEX USED)
  • SELECT * FROM tbl_name group by col1, col2, col3, col4; (INDEX NOT USED)
  • SELECT * FROM tbl_name group by col2, col3; (INDEX NOT USED)
  • SELECT DISTINCT col1, col2 FROM tbl_name; (INDEX USED)
  • SELECT col1, MIN(col2) FROM tbl_name GROUP BY col1; (INDEX USED)
  • SELECT col1, col2 FROM tbl_name WHERE col1 < const GROUP BY col1, col2; (INDEX USED)
  • SELECT MAX(col3), MIN(col3), col1, col2 FROM tbl_name WHERE col2 > const GROUP BY col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 < const GROUP BY col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col2 < const GROUP BY col1, col3; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col3 = const GROUP BY col1, col2; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = const GROUP BY col2, col3; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col2 = const GROUP BY col1, col3; (INDEX USED)

@ Order By

  • SELECT * FROM tbl_name ORDER BY col1, col2, col3, … ; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col2, col3; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col3, col2; (INDEX NOT USED)
  • SELECT * FROM tbl_name ORDER BY col1, col2 ; (INDEX USED)
  • SELECT * FROM tbl_name ORDER BY col2, col3, … ; (INDEX NOT USED)
  • SELECT * FROM tbl_name ORDER BY col1, col3; (INDEX NOT USED)
  • SELECT * FROM tbl_name WHERE col2 = constant ORDER BY col1, col3; (INDEX USED)
  • SELECT * FROM tbl_name ORDER BY col1 DESC, col2 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col2 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = constant ORDER BY col1 DESC, col2 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 > val1 ORDER BY col1 ASC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 < val1 ORDER BY col1 DESC; (INDEX USED)
  • SELECT * FROM tbl_name WHERE col1 = val1 AND col2 > val2 ORDER BY col2; (INDEX USED)



Factors to be considered in Multiple Index Column Order
Make sure the first column is not used for range operations. In that case, the further columns will have no effect.
From the above example: SELECT * FROM tbl_name WHERE col1 > val1 AND col2 = val2; (INDEX NOT USED)

Also, put first the columns whose filter leaves fewer rows than filtering on the other columns.
For example, you want to filter a school students table by Subject and Class.
Select count(*) from students where subject = 'ENGLISH' and class='IV';
Let’s check the result count of each filter:
Select count(*) from students where subject = 'ENGLISH';
Result: 469
Select count(*) from students where class = 'IV';
Result: 29
So when we filter using class first, we filter out most of the records. Obviously, fewer records are scanned for the further column(s) of the index.
Thus the column order of the index should be (class, subject)
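The same decision can be written as a tiny Python helper, given the row count each filter returns on its own. The function name `order_index_columns` is hypothetical, and the counts are the ones from the example above.

```python
def order_index_columns(match_counts):
    """Order columns so the filter that keeps the fewest rows comes first."""
    return sorted(match_counts, key=match_counts.get)


# Row counts returned by each single-column filter in the students example.
print(order_index_columns({"subject": 469, "class": 29}))
```

This only captures the "fewest rows first" factor; the earlier point about keeping range conditions off the first column still has to be checked by hand.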

Hope this helps to understand the use of Multiple Column Index and also the factors to be considered in creating them.

Friday, July 25, 2014

Table Index – Part 1, Choosing the right column

It is simple to say that indexing is essential for filtering/sorting/grouping a table having a large number of rows. But the real challenge is choosing the right column for the index.
Before that, let’s have a quick understanding of indexes & how they help in filtering rows:
  • An index can be classified as a Clustered Index or a Non-Clustered/Secondary Index.
  • A table will have only one Clustered Index, but can have multiple Secondary Indexes.
  • In MySQL (InnoDB), the Primary Key is considered the Clustered Index.
  • Rows are sorted & stored in the table based on this Clustered Index. The row values are stored close to the Clustered Index column.
  • A Secondary Index is maintained in a separate dictionary with the indexed column (sorted) & the primary key column (to link to the clustered index). If the Primary Key is long, secondary indexes use more space.
  • An example of a Secondary Index:
    [Image: secondary index example]
  • When we filter rows based on an indexed column, instead of scanning all rows, only the needed rows are scanned (as the indexed values are sorted).
    [Image: filtering rows through the index]
As there is clearly a cost to creating an index, we cannot simply create an index for every column we use in a WHERE clause.  An index affects the performance of write queries (Insert / Update / Delete), and it also consumes space in the database.

 So selecting the right column is important.
In general, a column having more distinct values is good for an index.

 Rule of thumb :
Index Selectivity Formula = (Count of distinct values / Number of Rows) * 100%
If the selectivity is > 8%, the column can be considered for an index.
In other simple words, when you filter a query using an indexed column, you should get less than 15% of the rows.

For example, if you need to filter a 10,000-record table based on Gender (the 2 distinct values will be Male / Female):
Selectivity = 2 / 10000 * 100% = 0.02%, and so it should not be used for an index.
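The rule of thumb is easy to check in code. The following Python sketch computes the selectivity of a column from its values; the function names and the `threshold_percent` parameter are assumptions, with the 8% default taken from the rule above.

```python
def index_selectivity(values):
    """Selectivity as a percentage: distinct values / number of rows * 100."""
    if not values:
        return 0.0
    return len(set(values)) * 100 / len(values)


def is_index_candidate(values, threshold_percent=8.0):
    """Apply the rule of thumb: selectivity above the threshold is worth considering."""
    return index_selectivity(values) > threshold_percent


# A 10,000-row Gender column with only two distinct values.
gender = ["Male", "Female"] * 5000
print(index_selectivity(gender), is_index_candidate(gender))
```

On a real table the same numbers would come from a count of distinct values and a count of rows, rather than from loading the column into memory.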
[Image: scanning a low-selectivity index]

As we can see, the cursor needs to scan many rows in the index table & then map them to the primary table. This has no advantage over scanning all the rows of the primary table.
Also, this selectivity formula cannot simply be followed in all cases. We should consider the volume of data for each distinct value.

 For example, consider an Indian e-commerce store having 500 customers from more than 50 countries. Based on our rule, the country column is eligible (its selectivity is more than 10%). But when you analyze the data, if 400 customers are from India alone, then there is no need for an index.

 So along with the selectivity formula, you have to judge based on the possible volume of data.
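One way to spot this kind of skew is to look at the share of rows taken by the most common value. The Python sketch below does that; the function name `largest_value_share` and the made-up customer list are assumptions for illustration only.

```python
from collections import Counter


def largest_value_share(values):
    """Percentage of rows taken by the most common value in a column."""
    if not values:
        return 0.0
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count * 100 / len(values)


# Hypothetical data: 400 customers from India, 100 spread over other countries.
countries = ["India"] * 400 + ["Country%d" % (i % 50) for i in range(100)]
print(largest_value_share(countries))
```

A high share for one value means a filter on that value still returns most of the table, so the index would give little benefit for it even though the selectivity figure looks acceptable.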
In Part 2, we will see how to pick the columns for a Multiple Column Index.

Monday, March 24, 2014

Webbrowser control drag drop Multiple Files

In the WebBrowser control of .NET WinForms, there is no drag & drop event. But we had a requirement to catch the dropped files. By using the BeforeNavigate event, we can catch only one dropped file and are not able to pick up all the dropped files.

The simple trick is to disable the WebBrowser control while dropping, so that the form's DragDrop event is fired, from which we can get all the files. But there is no property to disable the WebBrowser control. To overcome that, place the WebBrowser control inside a Panel & disable the panel, so that the WebBrowser control is also disabled.

The next question will be: when to disable the Panel control?
On Form Activate / Deactivate: when we start dragging a file from another window, the form is obviously deactivated & we can disable the Panel. On (after) drop, the form is activated & we can enable the Panel control again.

The code is :
1. In Form1, place the WebBrowser control in the “Panel1” panel control.
2. Apply the following code in your form

 

' Let the form itself accept dropped files
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
     Me.AllowDrop = True
End Sub

' Fired on drop while Panel1 (and the WebBrowser inside it) is disabled
Private Sub Form1_DragDrop(ByVal sender As System.Object, ByVal e As System.Windows.Forms.DragEventArgs) Handles Me.DragDrop
     ' GetData returns Object, so cast it to the array of dropped file paths
     Dim files() As String = DirectCast(e.Data.GetData(DataFormats.FileDrop), String())
     For Each filePath As String In files
          MsgBox(filePath)
     Next
End Sub

' Show the copy cursor only when files are being dragged over the form
Private Sub Form1_DragEnter(ByVal sender As System.Object, ByVal e As System.Windows.Forms.DragEventArgs) Handles Me.DragEnter
     If e.Data.GetDataPresent(DataFormats.FileDrop) Then
          e.Effect = DragDropEffects.Copy
     End If
End Sub

Private Sub Form1_Activated(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Activated
     ' To Enable Web Browser control
     Panel1.Enabled = True
End Sub

Private Sub Form1_Deactivate(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Deactivate
     ' To Disable Web Browser control
     Panel1.Enabled = False
End Sub

Remove Windows 8 focus border

To remove the annoying focus border (not sure how it got turned ON):

1. Go to the Narrator Settings
2. Turn off the "Highlight the Cursor" setting.

Wednesday, March 12, 2014

Come Fast my "CSS & JS"

A slow loading web page wastes the precious time of many visitors. The loading speed of a website depends on several factors, of which the loading of CSS & JS files has a huge impact. Unfortunately, developers tend to miss this, since on a local development server we may not notice it, but on a remote production server it drags out the loading time.  If we spend a few minutes on optimizing the loading, it will save several minutes for several people.

Let’s see how to optimize the loading time. We took the CSS & JS files from a website & loaded them in an empty html page.   We can see the loading time in Firebug:
[Screenshot: Firebug timing with all files downloaded separately]

Nearly 20 CSS/JS files are used, which leads to 20 connection requests. Most browsers support 6 concurrent connections per host, so six files are downloaded in parallel.  The remaining files wait. That may be the reason for the long loading time. To minimize that, let’s try combining the CSS files into a single file & the JS files into a single file. (To merge several css/js files into a single file, we can use server side scripting languages.)
The result is:
[Screenshot: Firebug timing after merging]
We were able to save some seconds. But it is still slow, maybe due to the size of the content. To reduce the content size, let’s minify both the CSS & JS.


[Screenshot: Firebug timing after merging & minifying]
We have saved some file size & some seconds. But it is still slow, so let’s try to compress further using gzip. We can do that in our web server. (Since we are using IIS, we enabled “compression for static files” in IIS Manager.)
Let’s refresh the browser & check:

[Screenshot: Firebug timing after merging, minifying & gzip]
That’s better. We have saved nearly 4.5 seconds & also the bandwidth.
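The effect of merging and gzip on size can be tried outside the web server too. The Python sketch below joins several stylesheet strings and compares the raw size with the gzip-compressed size, using only the standard library. The function names and the sample CSS are assumptions; minification is left out because it needs a dedicated tool.

```python
import gzip


def merge_sources(sources, separator="\n"):
    """Join the contents of several CSS (or JS) files into one string."""
    return separator.join(sources)


def gzip_size(text, level=9):
    """Size in bytes of the text after gzip compression."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=level))


# Hypothetical file contents standing in for real stylesheets.
css_files = [".btn{color:red;margin:0;padding:0}"] * 50
merged = merge_sources(css_files)
print(len(merged.encode("utf-8")), gzip_size(merged))
```

For JS files, a separator such as ";\n" is a safer choice, since a file that does not end with a semicolon can otherwise run into the next one.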

But using a single merged file may cause another issue too. Among the several files, one may be updated frequently, which leads to a new version of the entire merged file & forces it to be downloaded often, which loses the advantage of the browser cache.
In such cases, hosting the static files on CDN (Content Delivery Network) servers is a better option. Also, linking files from multiple domains provides the advantage of parallel downloading. If we have 100 static files, we can host them on 4 different domains, splitting the resources 25 per domain. We should not forget that linking multiple domains takes some time for DNS lookups.  As a rule of thumb, make sure each domain serves at least 10 files.

Back to our case, let’s use a CDN server to download a few files & check the download speed:
[Screenshot: Firebug timing with some files served from a CDN]
Yes, as expected, it reduces the loading time.

Based on our project requirements, we should plan merging, compressing & hosting on other domains.

Now we are nearly done with the initial page load. What happens when the browser navigates to the next page?  Will the files be downloaded again for each page, or will the browser cache rescue us? Let’s check.

[Screenshot: Firebug timing on the next page, with browser cache]
That’s great. The files were not downloaded again. For the browser’s requests, the server returns 304, so the browser uses its cache.  Thus further pages take only about 1 second. We are fast now.

But wait, why does the browser connect & ask the server about each file on each page load? It is an overhead for both the browser & the server. How can we make the browser use its cache without checking with the server for updates?  The solution is to set an expiry time for each file from the server, so that the browser uses the file directly from its cache & does not disturb the server.

To set the expiry details in the header, in IIS Manager:
We can set cache control in seconds or provide an exact expiry date.

[Screenshot: cache settings in IIS Manager]
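Whichever server is used, the setting ends up as response headers. The Python sketch below builds the two standard HTTP headers involved, `Cache-Control` with a max-age in seconds and `Expires` with an exact date. The function name `cache_headers` and the `public` directive are assumptions for illustration; this is not what IIS runs internally.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_seconds, now=None):
    """Build response headers that let the browser reuse a file without asking the server."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Relative lifetime, in seconds.
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Absolute expiry date in the HTTP date format (always GMT).
        "Expires": format_datetime(expires, usegmt=True),
    }


# Cache static files for one day.
for name, value in cache_headers(86400).items():
    print(name + ": " + value)
```

When both headers are present, browsers give priority to max-age, so the two values should describe the same lifetime.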

Now let’s go to our browser & check by refreshing.

[Screenshot: first load, with cache headers]
On the first load, we can see the page downloaded with the cache headers added.

Let’s check how the cache works by navigating to the same page again:

[Screenshot: second load, served from cache]

0 Seconds!!! No download or request from the browser to the server; the entire content comes from the cache.
Let’s appreciate ourselves & also make sure to apply the needed optimizations in future development.

A few further points to optimize CSS/JS:
- Link all CSS files first, before JS files.
- Request static content files from a cookie-less domain.
- If a piece of CSS/JS code is used only on a particular page & does not need caching, it is better to put it inline in that page.
- Partition the JS into two files. The JS needed to render at startup should be included in the header; the rest should be loaded after the page has loaded, or asynchronously.

Saturday, February 22, 2014

“Key” for Secure Data Transmission

“Secure” is the most used word in the communication world.  The prime question of most of our Private Office Instant Messenger enterprise customers is “Are our chat messages encrypted?”. Let’s see the methods available in cryptography and the solutions used in many applications.

Symmetric Key Encryption:
In Symmetric Encryption, a single key is used for both encrypting & decrypting the message. To transfer a message from one computer to another, both computers have the same shared key. The sender encrypts the data using that key and sends it over the network to the other computer, where the encrypted packets are decrypted using the same key.

Asymmetric Key (or Public Key) Encryption:
In Asymmetric Encryption, a pair of keys is used – Public & Private. The sender has a Public & Private key pair and the receiver has a different Public & Private key pair.  The public key is used to encrypt the data & the private key is used to decrypt that encrypted data. The sender & receiver share their public keys with each other, whereas each private key remains secret & is never exposed. The sender encrypts the data using the receiver’s public key and sends it over the network to the receiver. The receiver, using its private key, can decrypt that data.

When & What to Use :
A Symmetric Key is mostly used to transmit the data over the network, since it is faster & simpler for communication. BUT to exchange the symmetric key between the transmitting & receiving nodes, an Asymmetric key is used. For example, in our Output Messenger Server, all clients are connected under SSL. The same logic as between a Web Server & Browser is followed here.
[Diagram: Secure Private Messenger]
As per the above diagram, the Server has a Public-Private key pair & sends the public key to the client. The Client generates a symmetric key & returns it to the server, encrypted with the Public Key.  The Server decrypts the data using its Private key & stores the symmetric key for that client session. All further communication with the client is encrypted/decrypted using the symmetric key.
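The flow of that exchange can be followed in a short Python sketch. It is a toy and must not be used for real security: the RSA key pair is built from tiny textbook primes, and the symmetric step is a simple XOR with a SHA-256 keystream standing in for AES. All names here (`rsa_apply`, `xor_cipher`, `handshake_and_send`) are assumptions for illustration; a real application would rely on an SSL/TLS library instead.

```python
import hashlib
import secrets

# Toy RSA key pair from tiny primes (p=61, q=53). Real keys are 2048 bits or more.
PUBLIC_KEY = (17, 3233)     # (e, n), sent to the client
PRIVATE_KEY = (2753, 3233)  # (d, n), never leaves the server


def rsa_apply(number, key):
    """Textbook RSA: encrypt with the public key or decrypt with the private key."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


def xor_cipher(data, session_key):
    """Toy symmetric cipher: XOR with a SHA-256 keystream. The same call encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(("%d:%d" % (session_key, counter)).encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def handshake_and_send(message):
    # Client: pick a session key and send it encrypted with the server's public key.
    client_session_key = secrets.randbelow(PUBLIC_KEY[1] - 2) + 2
    wrapped_key = rsa_apply(client_session_key, PUBLIC_KEY)
    # Server: recover the session key with its private key and keep it for the session.
    server_session_key = rsa_apply(wrapped_key, PRIVATE_KEY)
    # Both sides now use the same symmetric key for the actual traffic.
    ciphertext = xor_cipher(message.encode("utf-8"), client_session_key)
    return xor_cipher(ciphertext, server_session_key).decode("utf-8")


print(handshake_and_send("Hi Ram"))
```

The slow asymmetric step happens once, only to move the session key; every message after that uses the fast symmetric step.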

Algorithms:
For Asymmetric, the RSA algorithm (2048 bit) is the most commonly used algorithm.
For Symmetric, the AES algorithm (256/128 bit) is used in most applications.