Sunday, November 15, 2015

I have MOVED my blog to

I have moved my blog site to

Please follow my new posts there. Thank you! 

Saturday, October 3, 2015

Understanding Vagrant boxes & VMs -- clearing the confusion

Vagrant is an awesome tool for developers to get their own sandboxed environments to play with. To understand more about "Why Vagrant", you could read my earlier blog post: Vagrant: An interesting approach to setup development environments FAST. But because Vagrant does a lot of things auto-magically under the hood, people are often left confused when they want to add or delete boxes or VMs. In this post I will try to explain the relationships between base boxes, installed boxes, VMs, VirtualBox instances, etc.

Note: The term "box" is used loosely by many people, which adds to the confusion.

There are 3 different entities involved in running a Vagrant box:
  1. Base Boxes: The .box file is a packaged Vagrant box for distribution. You can get boxes from or people may provide boxes via FTP, Dropbox, or any other file distribution system on the internet. The location of the base box is specified in the Vagrantfile using the config.vm.box_url setting. This may point to a local file on your machine, or to a URL from which Vagrant should download the box. The base box is your starting point (the FIRST TIME only). It may contain a fresh OS, or an OS with certain software pre-installed, depending on what the person distributing the box wanted to package into it. Read more here: Base Boxes
  2. Installed Boxes: This is the list of Vagrant boxes installed on your machine. You can see this list by executing the command vagrant box list. Think of them as the Vagrant templates available to you for creating machines. These boxes can only be created from base boxes. Usually, you download a base box and install it using the vagrant box add command; from then on you don't need the .box file and can freely delete it (to free space on your hard disk). Any new machine you create only needs one of the installed boxes. Vagrant usually places the installed boxes in the ~/.vagrant.d/boxes folder. Any project on your machine can use these boxes as base templates to spin off machines. Note: When you fire a vagrant box remove command, you are in effect removing the box template from your machine. After that you can only add it back using a base box (.box file).
  3. Machines: When you fire the vagrant up command, you are asking for a VM to be started (based on the Vagrantfile specification in the current directory). This is the VirtualBox machine instance that you are spinning off to work with. When you fire the vagrant halt command, you are shutting down the current VM (again based on the Vagrantfile in the current directory). You can confirm this by launching the VirtualBox GUI -- it will show a running machine after vagrant up, and a powered-off machine after vagrant halt. Vagrant keeps track of these machines in a hidden .vagrant folder in the directory from where you fired the vagrant up command. Note: When you perform a vagrant destroy, you are destroying the VM (and not the template box from which you created this machine). Usually this VM is called "default".

The confusion actually occurs because of the way the vagrant up command behaves. When you do a vagrant up, Vagrant will bring up the VM which was halted earlier. But if the VM was destroyed (using vagrant destroy), or was never created in the first place, then Vagrant will create the VM from the installed box template. Hence, when you do a vagrant destroy followed by a vagrant up, you are in fact creating a new VM from the installed box template. Therefore, it's common for people to only do a vagrant halt (or suspend), so that they can pick up from where they left off. Choose to destroy only if you don't care about any changes made inside the VM, and want to free the space occupied by the VM on your disk.

In development mode, since most code changes are made on your local file system and developers use Vagrant just to "run" the code, it is OK to destroy the VM at the end of your work day. The next time you perform a vagrant up, it will recreate the VM, provision it, and run with the latest code (think of them like Phoenix boxes).

Note that the vagrant up command will also download the base box and install it as a box template if it detects that this is the first time this box is being brought up. After this, you can choose to delete the downloaded .box file (or store it on some long-term disk store), since it's no longer used in your Vagrant workflow. Vagrant requires the .box file only to install and create a box template on your machine. All further VM creations will use the installed boxes.
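
To make the decision flow concrete, here is a toy model of it in Java. It only mirrors the behaviour described in this post; it is not Vagrant's code or API, and the class, method, box and message names are all invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the behaviour described in this post. It is NOT Vagrant's code or API;
// every name and message below is invented for illustration.
class VagrantModel {
    enum VmState { NOT_CREATED, HALTED, RUNNING }

    final Set<String> installedBoxes = new HashSet<>(); // roughly what "vagrant box list" shows
    VmState vm = VmState.NOT_CREATED;                   // the machine for this Vagrantfile

    // Models "vagrant up" for a Vagrantfile that uses the given box.
    String up(String box) {
        if (vm == VmState.RUNNING) {
            return "VM already running";
        }
        if (vm == VmState.HALTED) {
            vm = VmState.RUNNING;
            return "started the existing VM";
        }
        // No VM yet: make sure the box template is installed, then create a fresh VM from it.
        boolean firstTime = installedBoxes.add(box);
        vm = VmState.RUNNING;
        return firstTime
                ? "downloaded and installed the box, then created a new VM"
                : "created a new VM from the installed box";
    }

    // Models "vagrant halt": the VM is kept, only shut down.
    void halt() {
        if (vm == VmState.RUNNING) vm = VmState.HALTED;
    }

    // Models "vagrant destroy": the VM goes away, the installed box is untouched.
    void destroy() {
        vm = VmState.NOT_CREATED;
    }

    // Models "vagrant box remove": only the template goes away.
    void boxRemove(String box) {
        installedBoxes.remove(box);
    }

    public static void main(String[] args) {
        VagrantModel m = new VagrantModel();
        System.out.println(m.up("my-box")); // first time: box gets installed, VM created
        m.halt();
        System.out.println(m.up("my-box")); // halted VM is simply started again
        m.destroy();
        System.out.println(m.up("my-box")); // new VM, created from the installed box
    }
}
```

The point of the model is that destroy and boxRemove touch two different things: one the machine, the other the template.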

Hopefully this explanation clears up any doubts you have about the relationship between .box files, Vagrant boxes, VirtualBox machines, etc.

Sunday, August 16, 2015

Installing ThoughtWorks Go Server & Agent on a Digital Ocean droplet

Today I experimented with installing the ThoughtWorks Go CD Server and Agent on a Digital Ocean droplet. Here are the steps (pretty simple!):

Step 1: Create a CentOS 6.7 x64 droplet. Give it a name, e.g. "centos-droplet". Ensure that you choose a minimum of 1GB RAM, and enable the Private Networking option while creating the droplet.

Step 2: Login as root to your new droplet.
ssh root@
Step 3: Add host entry for your droplet name. For instance, if your droplet is called "centos-droplet":
vi /etc/hosts
centos-droplet     localhost   
Step 4: Install OpenJDK 7 (you can also install just the JRE)
sudo yum install java-1.7.0-openjdk-devel
Step 5: Verify that Java is installed
java -version
Step 6: Now register ThoughtWorks Go YUM repo by executing the following command:
echo "
name            = GoCD YUM Repository
baseurl         =
enabled         = 1
gpgcheck        = 0
" > /etc/yum.repos.d/thoughtworks-go.repo
Step 7: Install the Go Server
yum install -y go-server

Step 8: Check if Go Server service is running. If not, start the service. 
service go-server status 
service go-server start 
Step 9: The server should now be running on port 8153. You should be able to open a browser on your laptop and connect to the Go Server UI on the remote Digital Ocean droplet at http://ip-of-your-droplet:8153/go

Step 10: Install the Go Agent (on the same droplet)
yum install -y go-agent
Step 11: Check if Go Agent service is running. If not, start the service. 
service go-agent status 
service go-agent start 
On the droplet you can run 'ps -ef | grep java' to see the server and agent processes.

At this point, both your Go server and agent should be up and running. If you go to the "Agents" tab in your Go Server UI, you should be able to see this agent registered (with the same name as your droplet machine name).

That's it. You should be good to Go! If there are any issues, you can check the server logs in /var/log/go-server and the agent logs in /var/log/go-agent.


Saturday, February 14, 2015

i18n: Building Applications for Multiple Countries and Languages

Nowadays, a lot of software and many websites are i18n-enabled. If you don't yet understand what is meant by Internationalisation, Globalisation and Localisation in software, then this Wikipedia link is your friend.

I have worked on a couple of projects where we had to make the application multi-country savvy, and I think it would be good to pen down the various areas we considered and the tools we used. Note that these practices are from the perspective of a Java/Scala project, but they can easily be adapted to other platforms too, since the basic requirements are similar.

1) External Files for Strings: Maintain all user-visible strings in properties files (not log messages though!). We maintain the English version, and the translation teams maintain the translations for other languages. Read about locales here
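
As a minimal sketch of the idea, the Java below loads two sets of strings and looks them up by key. To keep it self-contained, the "files" are inline strings standing in for properties files; the keys, texts and class name are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class ExternalStrings {
    // Stand-ins for two properties files; keys and texts are made up for this example.
    static final String ENGLISH = "greeting=Hello\nfarewell=Goodbye\n";
    static final String FRENCH = "greeting=Bonjour\nfarewell=Au revoir\n";

    // Parses properties-format text into a bundle that can be queried by key.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle en = load(ENGLISH);
        ResourceBundle fr = load(FRENCH);
        // The code only ever refers to the key, never to the text itself.
        System.out.println(en.getString("greeting")); // Hello
        System.out.println(fr.getString("greeting")); // Bonjour
    }
}
```

In a real project you would more likely call ResourceBundle.getBundle with a base name and a Locale, and let Java pick the matching properties file from the classpath.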

2) No Automated Translations: Automated translation does not really work in practice, because the "context" is important, and it can't be figured out by just looking at the string. For instance, on one of our projects we provided a feature in the internal version of our website that allowed translators to hit a URL with a key name and see the page where that key is used, so that they got the context and could see how their translation would look on screen. Automated translations usually target tourists wanting to communicate or researchers wanting to understand a particular sentence, and hence lack the subtleties you want in a website / application. Poor translation is also a turn-off for the users of the application. Therefore, I'd recommend staying away from automated translation tools / websites.

3) Simple Scripted Tests for Errors: On one project we wrote a tool that performed basic checks on the translation files we received, comparing them against our reference properties file. The errors it looks for are not easy for humans to catch, and can slip through. For instance:

  1. Check if a key has not been translated
  2. Check if a key is missing in a file
  3. Check if a key is "extra" in a file
  4. Check for UTF-8 encoding issues
  5. Check if a key is duplicate 
  6. Check if a key doesn't have a value

These checks are very easy to code in a script, and can be run by the translation teams themselves. This kind of mistake-proofing is called "Poka-Yoke".
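
The tool we wrote is not reproduced here, but a rough sketch of such a checker could look like the Java below. It is an illustration only: the class name, the report wording and the sample keys are hypothetical, it handles only simple key=value lines, and it leaves out the encoding check (number 4).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker; names and report wording are illustrative.
class TranslationCheck {

    // Parses simple "key=value" lines; blank lines and '#' comments are skipped.
    // Repeated keys are recorded in 'duplicates' (java.util.Properties would hide them).
    static Map<String, String> parse(String content, List<String> duplicates) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            int eq = trimmed.indexOf('=');
            String key = (eq < 0 ? trimmed : trimmed.substring(0, eq)).trim();
            String value = eq < 0 ? "" : trimmed.substring(eq + 1).trim();
            if (entries.containsKey(key)) duplicates.add(key);
            entries.put(key, value);
        }
        return entries;
    }

    // Compares a translation against the reference file and lists the problems found.
    static List<String> check(String reference, String translation) {
        List<String> problems = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        Map<String, String> ref = parse(reference, new ArrayList<>());
        Map<String, String> tr = parse(translation, duplicates);

        for (String key : duplicates) problems.add("duplicate: " + key);
        for (Map.Entry<String, String> e : ref.entrySet()) {
            String key = e.getKey();
            if (!tr.containsKey(key)) problems.add("missing: " + key);
            else if (tr.get(key).isEmpty()) problems.add("empty: " + key);
            else if (tr.get(key).equals(e.getValue())) problems.add("untranslated: " + key);
        }
        for (String key : tr.keySet()) {
            if (!ref.containsKey(key)) problems.add("extra: " + key);
        }
        return problems;
    }

    public static void main(String[] args) {
        String reference = "greeting=Hello\nfarewell=Goodbye\nthanks=Thank you\ntitle=Welcome\n";
        String french = "greeting=Bonjour\ngreeting=Salut\nfarewell=Goodbye\nthanks=\nbonus=Cadeau\n";
        check(reference, french).forEach(System.out::println);
    }
}
```

Treating "same value as the reference" as "not translated" is a heuristic: some strings are legitimately identical in two languages, so a real tool would probably want an allow-list for those keys.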

4) Grammatical considerations: Many UI messages cannot be built by concatenation, because nouns, adjectives and verbs come in different positions in different languages. So a message such as "You owe Rs 100" has to be externalized in the YAML/properties file as "You owe {0}" (where {0} will be replaced by price + currency), and the translators need to be told about this context.
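
In Java, the {0} placeholder style is what java.text.MessageFormat understands. A small sketch follows; the second pattern is not a real translation, just an English-looking stand-in for a language that puts the amount first, and the class and method names are made up.

```java
import java.text.MessageFormat;
import java.util.Locale;

class OweMessage {
    // 'pattern' is what would be read from the properties file for the user's locale.
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // Pattern from the post.
        System.out.println(render("You owe {0}", Locale.ENGLISH, "Rs 100"));
        // Stand-in for a language where the amount comes first (not a real translation).
        System.out.println(render("{0} is what you owe", Locale.ENGLISH, "Rs 100"));
    }
}
```

Because the translator controls where {0} goes, the calling code stays the same for every language. One thing to watch: MessageFormat treats the single quote as an escape character, so apostrophes in patterns need to be doubled.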

5) Singular/Plural: Singular and plural forms are constructed differently in different languages, so messages need to change based on whether you are displaying "1" or "more".
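
For simple cases, the JDK's MessageFormat can pick the wording with a "choice" sub-format. A minimal sketch, with a made-up English pattern and class name:

```java
import java.text.MessageFormat;
import java.util.Locale;

class PluralMessage {
    // Hypothetical pattern, as it might appear in an English properties file:
    // 0 -> "no messages", exactly 1 -> "one message", more than 1 -> "<n> messages".
    static final String PATTERN =
            "You have {0,choice,0#no messages|1#one message|1<{0,number,integer} messages}";

    static String render(int count) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] {count});
    }

    public static void main(String[] args) {
        System.out.println(render(0)); // You have no messages
        System.out.println(render(1)); // You have one message
        System.out.println(render(5)); // You have 5 messages
    }
}
```

This only covers number ranges. Languages with more than two plural forms need richer rules, for which a library such as ICU4J is the usual choice.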

6) Numbers/Dates/UoM: Besides UI strings, we also need to look out for currencies, dates, time zones, units of measure, etc., since they vary by country and need to be formatted appropriately. The same goes for numbers, where the decimal separator differs by country (dot in some, comma in others). For instance, use "Number" fields to store numbers in the DB / models, store Date/Time values in UTC, and then format them on screen according to the locale.
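
A short sketch of "store neutral, format at the edge" using only the JDK. The class and method names, the sample values and the fixed date pattern are chosen for the example; a real UI would more likely use a localized date style.

```java
import java.text.NumberFormat;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LocaleFormatting {
    // The number is stored as a plain double; separators are applied only for display.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // The timestamp is stored in UTC; the time zone is applied only for display.
    static String localTime(Instant utc, String zoneId) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm", Locale.ROOT);
        return fmt.format(utc.atZone(ZoneId.of(zoneId)));
    }

    public static void main(String[] args) {
        System.out.println(number(1234567.5, Locale.US));      // 1,234,567.5
        System.out.println(number(1234567.5, Locale.GERMANY)); // 1.234.567,5

        Instant stored = Instant.parse("2015-02-14T18:30:00Z");
        System.out.println(localTime(stored, "Asia/Kolkata"));     // 2015-02-15 00:00
        System.out.println(localTime(stored, "America/New_York")); // 2015-02-14 13:30
    }
}
```

The same stored instant falls on different calendar days for the two users, which is why formatting has to be the last step.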

7) Case-insensitive checks: Case-insensitive code also needs to be written carefully. For instance, Turkish has 4 forms of the letter 'i' (dotted and dotless, each in upper and lower case), and your code may not compare strings correctly in a case-insensitive manner if it doesn't take a "locale" into account. (Why languages fail with the Turkish I). For this reason, in Java/Scala, the toUpperCase API has a variant that takes a "locale" as a parameter (and its documentation mentions the Turkish I case).
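
Here is a small Java demonstration of the effect. The non-ASCII letters are written as \u escapes so that the source file itself stays encoding-proof; the helper name sameIdentifier is made up for this example.

```java
import java.util.Locale;

class TurkishI {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Locale-independent comparison, suitable for keys, identifiers and similar tokens.
    static boolean sameIdentifier(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // In Turkish, lower-case dotted i upper-cases to dotted capital I (U+0130) ...
        System.out.println("title".toUpperCase(TURKISH).equals("T\u0130TLE")); // true
        // ... and capital I lower-cases to dotless i (U+0131).
        System.out.println("TITLE".toLowerCase(TURKISH).equals("t\u0131tle")); // true
        // With Locale.ROOT the result does not depend on the user's locale.
        System.out.println("title".toUpperCase(Locale.ROOT)); // TITLE
        System.out.println(sameIdentifier("title", "TITLE")); // true
    }
}
```

The no-argument toUpperCase() uses the JVM's default locale, so code that works on a developer machine can misbehave on a server or desktop configured for Turkish.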

8) UTF-8: Of course, use the Unicode character set and UTF-8 encoding everywhere. For more details read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). For Java/Scala programs on Windows, don't forget to set JAVA_OPTS="-Dfile.encoding=UTF-8".
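
Independently of that JVM setting, it helps to name the charset explicitly in code rather than rely on the platform default. The sketch below (class and method names are made up) also shows one way to detect a file that is not valid UTF-8, using a decoder configured to report errors instead of silently replacing bad bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an accented e
        // Always pass the charset explicitly when converting between text and bytes.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: a lone 0xE9 byte is not valid UTF-8
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```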

These are all the points I could think of right now. If more areas strike me, I will post updates to this blog. Meanwhile, if you have practical tips to share in this regard, I'd love to hear them.

Sunday, January 25, 2015

(Comic) Git Push Keys

Link to this comic on Pixton

A very interesting blog post about a real-life incident in which a developer accidentally pushed his Amazon EC2 keys to GitHub: My $2375 Amazon EC2 Mistake

[Update: 1/Feb/2015]
Gitrob: A Ruby tool to scan GitHub projects for sensitive information: