poll, ping, pubsub, pubsubhub, pubsubhubbub

September 6, 2009 by · 5 Comments
Filed under: technology 

In case you are skeptical, there is a method in the madness above, in the title of the post. In tech circles it is unlikely that you haven’t heard of pubsubhubbub. In fast few months, it has been one of the top three talked about things. Other two being homomorphic encryption and Google wave[ 1.]. Coming back to current post – terms in the title indicate how you get the contents from other people. Following details become more clear, if you imagine things happening with respect to blogs, even though they are conceptually not restricted to blogs.

Polling refers to the scenario where clients keep asking the server if something new has come up. How often to ask for updates will always be a problem with polling too frequent or too infrequent, but as one reader pointed out here that one great , thing with polling is that server doesn’t have to maintain state.

Ping refers to the case where when post an article, you(or your software) also updates some popular (central) update services. Some background here.

Next is pubsub which stands for publish/subscribe , one of the earliest pitch for it was made here by Evan Henshaw-Plath and Kellan Elliott-McCrea (72 slides but worth going through). Compelling example they gave against was this – on a particular date, Friendfeed crawled Flickr 2.9 million times to get the latest photos of 45,754 users, of which 6,721 had visited Flickr in those 24 hrs and could have ‘potentially’ uploaded a photo . Note that what they proposed was not a new technology, as they point out ‘revolutionary new 20 year old technology’ . If you do it for blogs then one of the major problems with xmpp – presence data overhead, which may be as high as 60-70 % can be reduced a lot.

pubsubhub stands for publish subscribe hub and pubsubhubbub is a protocol , core of which is idea of pubsubhub. Wherein publishers(say bloggers) update the hub which(may be more than one hubs, which talk to each other) resides ‘somewhere in cloud’, as per protocol this can be push or pull as per the protocol but the next link in the chain, hub to client(say readers) it is always push model. This page is good starting point for pubsubhubbub, overview slides are good. Ever eloquent Anil Dash describes it here as pushbutton web.
Two other related reads are webhooks which is basically http callbacks. Github uses it, so does paypal for asynchronous notifications of payment in ipn.So do many others. Related concept is rsscloud which is again pubsub hub. Follow this link for details.

You might be wondering what is the point of writing all these here, there are two 1. These things are worth knowing, minimally at least and 2. This blog is pubsubhubbub enabled now via appspot hub using this wordpress plugin and feed too is pubsubhubbub enabled via feedburner, link here

[1]. – Good introductory read for fully homomorphic encryption is this article by Bruce Schneier and this talk is more or less only source for Google wave.

-UPDATE – This post explains how the requirement of public server for callback can be worked around for desktop clients by using xmpp gateway(for pubsubhubbub).

A bit of assembly

August 29, 2009 by · 1 Comment
Filed under: technology 

After reading this article about lock free buffers and seeing the use of CAS(compare and swap), I felt like posting the assembly code to do the same. Use case over there was to write a native method and call it from Java(back in 1.5 , when concurrent data structures in Java were more or less non existent). Without further ado , I’ll unleash the code onto you :) . First is for CAS and second is for computing GCD using Euclid’s algorithm(this one can be found in many places and tutorials as well).

Compile and run instructions gcc file_name.c ; ./a.out

Compare and Swap

  #include 
 #include 
//exchange - newvalue, comperand is old/expected value
/*
 * Function actually does the following thing - if the value at *dest is equal to oldvalue then replace it by newvalue else leave it unchanged : do all these atomically
 *
 * There are two options for return value
 *  1.is initial value of *dest and leave the burden of calling fxn to compare it with oldval
 *  2. do it over here and return 0 or 1, this should be more efficient
 * */

/* later change it into macro  */
int cas(int* dest, int oldvalue, int newvalue){
	printf("(%d,%d,%d)",*dest,oldvalue,newvalue);
	/* int cas(int dest, int oldvalue, int newvalue){ */
	/* int cas(int dest, int newvalue, int oldvalue){ */
	int result= 1;/* 1 shows that cas succeeded and 0 shows that it failed  */
	/* btw need to set cc for flag clobbering !  */
	__asm__ __volatile__(
			"movl %2, %%eax\n\t"
			"movl %3, %%ebx\n\t"
			"movl %0, %%ecx\n\t"
			"LOCK\n\t"
			"CMPXCHG %%ebx, (%%ecx)\n\t"  /* should LOCK be on the same line  */
			"jz DONE\n\t"
			"movl $0, %1\n\t "
			"DONE: \n\t"
			:"=m"(dest),"=g"(result)
			:"g" (oldvalue),"g" (newvalue),"m"(dest)
			:"%eax","%ebx","ecx","cc"
			);
	printf("(%d,%d,%d)",*dest,oldvalue,newvalue);
	return result; 
}


/* TODO
 * write another asm fxn which puts above fxn in a while loop and keep trying unless it succeeds*/

int main(){
	int a= 5,b= 6;
	int* c = (int*) malloc(sizeof(int));
	*c= 6;
	/* int c= 6; */
	printf("%d\n",cas(c,b,b));
	printf("%d\n",cas(c,b,a));
	printf("%d\n",cas(c,a,a));
	printf("%d\n",cas(c,b,b));
	*c= 6;
	/* c= 5; */
	printf("changing value of *c to %d\n",*c);
	printf("%d\n",cas(c,b,b));
	printf("%d\n",cas(c,b,a));
	printf("%d\n",cas(c,a,a));
	printf("%d\n",cas(c,a,b));
	printf("%d\n",cas(c,b,a));
	return 0;
}

Formatting notes – seems like wp syntax highlighter is adding in the end , ignore that.

GCD

#include 
int gcd( int a, int b ) {
    int result ;
    /* Compute Greatest Common Divisor using Euclid's Algorithm */
    __asm__ __volatile__ ( "movl %1, %%eax;"
                          "movl %2, %%ebx;"
                          "CONTD: cmpl $0, %%ebx;"
                          "je DONE;"
                          "xorl %%edx, %%edx;"
                          "idivl %%ebx;"
                          "movl %%ebx, %%eax;"
                          "movl %%edx, %%ebx;"
                          "jmp CONTD;"
                          "DONE: movl %%eax, %0;" : "=g" (result) : "g" (a), "g" (b)
    );

    return result ;
}

int main() {
    int first, second ;
    printf( "Enter two integers : " ) ;
    scanf( "%d%d", &first, &second );

    printf( "GCD of %d & %d is %d\n", first, second, gcd(first, second) ) ;

    return 0 ;
}

ruby one liners to create hash

August 20, 2009 by · 1 Comment
Filed under: technology 

Yesterday while going through one of my old programs, I found this written by me sometime back :

#begin magic
hash=Hash[*CGI.unescape(raw_text).split('&').map{|x| b=x.split("=");b.push(nil) if b.size==1;b}.flatten]
#end magic

To kill some of suspense let me disclose that raw_text looks like

"SUCCESS&mc_gross=10.00&protection_eligibility=Ineligible&payer_id=U7PPJJ4TSJ47E&tax=0.00&payment_date=09%3A45%3A30+Jul+10%2C+2009+PDT&payment_status=Pending"

, right it has been cut from paypal payment acknowledgment.

Above line if broken in parts reads better :

  unescaped_array= CGI.unescape(raw_text).split('&')
  unescaped_array= unescaped_array.collect{|x| b=x.split("=");b.push(nil) if b.size==1;b}
  flattened_array= unescaped_array.flatten
  hash= Hash[*flattened_array]

Let’s do individual steps in irb:

irb(main):009:0>unescaped_array= CGI.unescape(raw_text).split('&')    

=> ["SUCCESS", "mc_gross=10.00", "protection_eligibility=Ineligible", "payer_id=U7PPJJ4TSJ47E", "tax=0.00", "payment_date=09:45:30 Jul 10, 2009 PDT", "payment_status=Pending"]                                                                         

irb(main):013:0> unescaped_array= unescaped_array.map{|x| b=x.split("=");b.push(nil) if b.size==1;b}  

=> [["SUCCESS", nil], ["mc_gross", "10.00"],["protection_eligibility", "Ineligible"], ["payer_id", "U7PPJJ4TSJ47E"], ["tax", "0.00"], ["payment_date", "09:45:30 Jul 10, 2009 PDT"], ["payment_status", "Pending"]]                               

irb(main):014:0> flattened_array= unescaped_array.flatten     

=> ["SUCCESS", nil, "mc_gross", "10.00", "protection_eligibility", "Ineligible", "payer_id", "U7PPJJ4TSJ47E", "tax", "0.00", "payment_date", "09:45:30 Jul 10, 2009 PDT", "payment_status", "Pending"]
irb(main):015:0>
hash= Hash[*flattened_array]
=> {"tax"=>"0.00", "payment_status"=>"Pending", "payer_id"=>"U7PPJJ4TSJ47E", "mc_gross"=>"10.00", "SUCCESS"=>nil, "payment_date"=>"09:45:30 Jul 10, 2009 PDT", "protection_eligibility"=>"Ineligible"}

BTW, * is called splat operator in ruby

Another way to create hash from ‘array of pairs ‘ is to use inject :

hash= [[1,2],[3,4]].inject({}){|result,element| result[element.first]= result[element.last]; result}

There is one more way :) Write a loop, that I’ll leave as an exercise to the readers !!

Here is a bit unrelated use case of creating hash from arrays:

irb(main):005:0> [1,2,3,4,7,9].group_by{|x| x<5 ? :lesser : :greater}

=> {:lesser=>[1, 2, 3, 4], :greater=>[7, 9]}

You can do more things, basically result of the block is used as the key for that element in the resulting hash.

gmail, mutt and msmtp fix

August 17, 2009 by · 3 Comments
Filed under: technology 

If you use mutt and smtp to access gmail . Here is a (bad) news. Cool guys at Google again changed certificate. Oh, did you ask – how do it know it ? Simple mutt started complaining about bad certificate when trying to use msmpt, infamous ‘msmtp: TLS certificate verification failed: the certificate hasn’t got a known issuer.’ greeted me on the screen.

To cross confirm –
Just run following

$ msmtp --serverinfo --host=smtp.gmail.com --tls=on --port=587 --tls-certcheck=off

In place of old Thwate Server now you get following in issuer segment
Issuer:
Common Name: Google Internet Authority
Organization: Google Inc
Country: US

Fortunately fix is simple, here is what you need to do on debian

# apt-get install ca-certificates
# dpkg -s ca-certificates|grep Version
Version: 20090814

After this just change following line in you ~/.msmtprc

tls_trust_file /certs/Thawte SSLWeb Server Roots/thawte Premium Server CA/Thawte Premium Server CA.pem

to

tls_trust_file /usr/share/ca-certificates/mozilla/Equifax_Secure_CA.crt

Merging hashes in yaml conf files

July 31, 2009 by · 12 Comments
Filed under: technology 

YAML is quite handy for writing configuration files. Primary advantage is that, it reads like text file. This works really well if your config file is flat(no hierarchy) and has no repetitions.
If your configurations file has repetitions then it makes sense to separate out those elements and reuse them. What I mean is this – let’s say you your config file looks like this :

development:
  input_location: common_input
  output_location: dev_location
  mail:
    smtp_server: your_server
    login: your_login
    password: top_secret
production:
  input_location: common_input
  output_location: dev_location
  mail:
    smtp_server: your_server
    login: your_login
    password: top_secret

Assuming above code in /tmp/test.yml here is how you can read in python and ruby
$cat readyml.py

#!/usr/bin/env python
from pprint import pprint as pp
#in debian need to install python-yaml
from yaml import load,load_all,dump
hash= load(open('/tmp/test.yml'))
pp(hash['development'])


$ cat readyml.rb

#!/usr/bin/env ruby
require 'pp'
hash= YAML::load(File.open('/tmp/test.yml').read)
pp hash['development']

here is a handy one liner for ruby version
$ ruby -rpp -e 'pp YAML::load(File.open("/tmp/a.yml"))["development"]' or you can try the same in irb or python console.

Note that in the above code snippet ,everything is other than output location is same in development and production part. This is where yml node identifier comes to rescue. Idea is simple have a set of default values and override them at different place.
You could pull it apart as follows:

defaults: &defaults
  input_location: common_input
  output_location: dev_location
  mail:
    sender_name: sender
    smtp_server: your_server
    login: your_login
    password: top_secret
development:
  <<: *defaults
production:
  <<: *defaults
  output_location: prod_location


$ ruby -rpp -e 'pp YAML::load(File.open("/tmp/a.yml"))["development"]["mail"]["login"]'
"your_login"
$

Great , it works(tm) !.
Arguably we traded some clarity for a bit of magic. Here is a small explanation : & , * and <<:  & which is  anchor tag can be understood as node identifier, * is node reference and <<: stands for hash merge.

For more details see either yaml specs or wikipedia
So far so good but there is a catch here, these hash merges are not recursive. What it means is this : let’s say you want to have different sender name for mail in two environments, you may be tempted to do the following:

defaults: &defaults
  input_location: common_input
  output_location: dev_location
  mail:
    sender_name: sender
    smtp_server: your_server
    login: your_login
    password: top_secret
development:
  <<: *defaults
  mail:
    sender_name: sender_dev
production:
  <<: *defaults
  output_location: prod_location
  mail:
    sender_name: sender_prod

Lets check

$ ruby -rpp -e 'pp YAML::load(File.open("/tmp/a.yml"))["development"]["mail"]["login"]'
nil
$

Oops, something went wrong, problem as mentioned above is that the hash merge is not recursive and while merging it replaced mail of default by mail of production which has only one key. Solution/work around is to unroll one more level:

common_settings: &common_settings
input_location: common_input
output_location: dev_location
mail_defaults: &mail_defaults
  sender_name: sender
  smtp_server: your_server
  login: your_login
  password: top_secret

defaults: &defaults
  <<: *common_settings
  mail:
    <<: *mail_defaults
development:
  <<: *defaults
production:
  <<: *defaults
  mail:
    <<: *mail_defaults
    sender_name: sender_prod

Lets check again

$ ruby -rpp -e 'pp YAML::load(File.open("/tmp/a.yml"))["development"]["mail"]["login"]'
"your_login"
$

Did you say you have one more level of nesting, well you can definitely unroll one more level, but then it becomes a mess. So, if you are not trying to write solution to towers of hanoi in a conf file, it is better to restucture conf file than digging into yaml or something else. But that is your call anyway.

A bit of shell redirection

May 10, 2009 by · 1 Comment
Filed under: technology 

Here is how we normally do shell redirection
$ ./pgm.sh args >out.txt 2>err.txt
I wanted to modify it a bit and run as follows
$ ./pgm.sh args
with the requirement that  output and error should go to some filename computed inside pgm.sh based on args. One illustrative case could be when date is part of args. So you would like stdout to go to say /your/directory/pgm_out_YYYYMMDD.txt 1

The problem with standard way of redirecting N>file.txt i.e, associating file descriptor N to file.txt , is that it works only for the newly forked process and not for the current process.
so
$ echo hi 1>out.txt ; echo hii will send hi to out.txt but will print hii to stdout.2

This is where exec comes to our rescue. If we add  exec 1>somefile.txt then output from rest of the script will go to somefile.txt

$ cat test.sh
#!/usr/bin/env bash
exec 1>out.txt
echo hi
echo hii
$./test.sh will redirect hi as well as hii to out.txt

Similarly to redirect stdout as well as stderr we’ll do something like this
cat test2.sh
exec 1>out.txt
exec 2>err.txt
echo out text
echo 1>&2 err text
somenoneexitent command
ls -ld /tmp

Now coming back to original point of redirecting to some file from inside the shell, let’s say program computed the filename in some variable  OUTFILE, we could have just done exec 1>$OUTFILE

That solves the current problem. But you may like to go through following example which achieves ‘random access’ of file in shell script. Example is from here
echo 1234567890 > File # Write string to "File".
exec 3<> File # Open "File" and assign fd 3 to it.
read -n 4 <&3 # Read only 4 characters.
echo -n . >&3 # Write a decimal point there.
exec 3>&- # Close fd 3.
cat File # ==> 1234.67890

With comments, this code is self explanatory.

1 It can also be done by $ ./pgm.sh args >pgm_out`date +%Y%m%d` but idea is to generate this file name based on some logic in program itself.
2 1 in 1>out.txt is redundant but it clarifies here that we are redirecting fd 1

To scale or not to scale

January 11, 2009 by · 1 Comment
Filed under: technology 

While talking about horizontal partitioning of databases, DHH of ROR fame suggests that scaling stuff can wait, definitely wait until the point your business  needs require it. His article definitely makes sense for small setups say startups. Not to say that in startups you should write demo programs but given that there are only 24 hours in a day you should focus on serving say 1000 users in more fulfilling way than losing your sleep over worrying about how will my application handle load of 13.142 million users.  If you get those many users you will know how to scale. For startups scalability is a good problem to solve but a far better problem to have, I mean any startup would love to run into this problem !

In another related article Jeremy Zawodny writes you should not depend upon Mr. Moore if you have scalability problems.

Since these articles refer to  Moore’s law I can’t help but write that Moore’s law must be one of the most generalized law in Computer Science. From his original prediction about transistor density , this law is now cited anywhere you come across exponential growth.

  • Recent Posts