OpenID Delegation: Why and How

The great promise of the OpenID specification is that it can simplify identity management on the ‘net. At its best, OpenID provides three great features:

Unified Identity

A single account (identity) with which you can log in to many sites, removing the need to create and remember a separate username and password for every web site you interact with.

Openness

A decentralized authentication system with multiple providers. This means that you can choose a provider (or even a few) from the many options available to vouch for your identity, and switch providers if you find a better one. Or you can even be your own provider.

Delegation

I think Delegation is the most attractive feature of OpenID because it means your own web site can act as your identity, while delegating the authentication process to one (or more) OpenID providers.

In short, with delegation you can log in to sites using a URL you own like jamesmurty.com, while taking advantage of the strong authentication options offered by providers such as Verisign’s PIP. Although my Verisign PIP identity happens to be jmurty.pip.verisignlabs.com, I can use my own web site as an alias for this provider-specific identity.

By decoupling your identity from your OpenID provider you can take advantage of the fact there are many providers and easily switch providers later on without losing your identity, and without having to update your associated OpenID identity on every web site. After all, if you had to do that you might as well have created your own username/password on every site in the first place.

But…

Unfortunately, the complexity of OpenID and the trouble ordinary people can have getting it to work properly are preventing widespread adoption of the system in general, and of the Openness and Delegation features in particular. Although big players like Google and Yahoo are supporting (parts of) the specification, they are understandably encouraging people to adopt their branded OpenID identities rather than extolling the advantages of controlling your own identity.

After all, every web company would love to take on the “burden” of managing your unified web identity. It’s the ultimate in vendor lock-in.

Set up OpenID delegation for your web site

If you have your own web site or blog and are able to edit the HTML pages directly, you can set up delegation by adding special link tags to the head section of one of your site’s pages. You will most likely want to do this on the site’s home page so you can use a short URL like jamesmurty.com instead of jamesmurty.com/my-openid-page.html.

Below are the link tags I use on my site to delegate to my jmurty Verisign PIP identity. You will need to use your own provider-specific identity URL in your links, and the format could vary quite a lot depending on the OpenID provider you choose, so check your provider’s documentation. Also, I’m not sure that all OpenID providers actually support delegation, so you should research this before you sign up with a provider.

<link rel="openid.server"
      href="http://pip.verisignlabs.com/server" />
<link rel="openid.delegate"
      href="http://jmurty.pip.verisignlabs.com/" />
<link rel="openid2.provider"
      href="http://pip.verisignlabs.com/server" />
<link rel="openid2.local_id"
      href="http://jmurty.pip.verisignlabs.com/" />

It is important that these link tags be included inside a valid HTML head section in your web page, or many web sites will be unable to find your delegate settings.
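To illustrate why the head section matters, here is a minimal sketch in Python of how a site might look for these settings. It is not any particular site's implementation: the names OpenIDLinkParser and find_delegation are made up for this example, it only covers HTML-based discovery, and it assumes the site prefers the version 2 settings when both sets are present.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collect openid.* and openid2.* link tags, but only inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            href = attributes.get("href")
            # A rel attribute may hold several space-separated values.
            for rel in (attributes.get("rel") or "").lower().split():
                if rel.startswith(("openid.", "openid2.")) and href:
                    self.links[rel] = href

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_delegation(html):
    """Return the delegation settings found in the page, or None."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    links = parser.links
    # Assumption: prefer the version 2 settings, fall back to version 1.
    if "openid2.provider" in links:
        return {"version": 2,
                "provider": links["openid2.provider"],
                "local_id": links.get("openid2.local_id")}
    if "openid.server" in links:
        return {"version": 1,
                "provider": links["openid.server"],
                "local_id": links.get("openid.delegate")}
    return None


SAMPLE_PAGE = """
<html>
  <head>
    <title>Example home page</title>
    <link rel="openid.server" href="http://pip.verisignlabs.com/server" />
    <link rel="openid.delegate" href="http://jmurty.pip.verisignlabs.com/" />
    <link rel="openid2.provider" href="http://pip.verisignlabs.com/server" />
    <link rel="openid2.local_id" href="http://jmurty.pip.verisignlabs.com/" />
  </head>
  <body><p>Hello</p></body>
</html>
"""

print(find_delegation(SAMPLE_PAGE))
```

In this sketch, link tags that appear outside the head section are simply ignored, which is the kind of behaviour that would make your delegate settings invisible.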

More Complexity, aka Taming Blogger.com

You may have noticed that the OpenID information is provided twice, once for the original OpenID specification (openid.* tags) and again for version 2 of the spec (openid2.* tags).

I don’t know why the second lot of settings is necessary, since presumably the spec is supposed to be backwards-compatible, but I have found that some sites won’t work properly unless the version 2 settings are provided.

One example of version incompatibility quirks is Google’s Blogger.com, which allows you to comment on blog posts after logging in with an OpenID. Prior to adding the openid2.* tags I found that although Blogger would allow me to authenticate and post comments, it would replace my delegating identity jamesmurty.com with the delegated version jmurty.pip.verisignlabs.com. This meant that the delegation was essentially useless, since anyone clicking on the nickname for my comment would end up at an empty Verisign PIP landing page instead of my own site.

I’m not sure if this is Google’s fault, or a fault in the OpenID spec. Either way it was annoying having to track down and fix this issue. It just serves as yet another example where OpenID is not quite living up to the promise of simplifying identity management.

Posted in OpenID, Tips

XMLBuilder Version 0.3: XPath, Parsing and Maven Goodies

I have updated my small java-xmlbuilder project with some nice new features.

First, here’s a reminder of what this project does:

XML Builder is a utility that creates simple XML documents using relatively sparse Java code.

It is intended to allow for quick and painless creation of XML documents where you might otherwise be tempted to use concatenated strings, and where you would rather not face the tedium and verbosity of coding with JAXP.

The new features include:

  • Parse existing documents into an XMLBuilder object, so you can now easily add nodes to pre-existing documents.
  • Use XPath queries to locate a specific element in your document. This is especially useful if you have parsed a document and you need to add new nodes at different locations in the DOM. Type in your XPath query and you can now jump directly to the right place.
  • The project now has a Maven-friendly structure, complete with a repository from which you can obtain the JAR file (see the Downloads section on the project page for instructions). This great leap forwards is thanks to Dan Brown’s instructions for using Wagon to deploy Maven artifacts to Google’s SVN.
  • JUnit tests are now public in the repository, to help keep me honest.

People familiar with the project may notice that I have changed the version numbering scheme. The latest version is 0.3, not 3. I think the “0.” prefix better indicates the maturity of this tool.

Posted in Coding, Java

Real-world cloud computing

An interesting post with some drawbacks of cloud computing and EC2, from those in the trenches: Real-world cloud computing.

There are some real gems here, such as:

  • [They all] used Amazon services, and most if not all of them seemed to use RightScale to manage them.
  • Cost: cloud is more expensive than real machines. Cloud is good for elastic computing, not for high constant demand.
  • You need monitoring services external to your cloud!

Posted in AWS, Cloud Computing

IPython with Python version 2.6 on OS X Leopard

I recently installed the excellent IPython program, a beefed-up Python console that provides a raft of extra features over the default interpreter and makes it even more of a pleasure to work with this language.

When you install IPython on Mac OS X Leopard using the standard method, it only installs against the system’s default version of Python: 2.5.1. However, since I had previously installed Python version 2.6.1 on my system I wanted IPython to work with this newer release.

It was surprisingly difficult to find out how to achieve this, so in case anyone else wishes to do the same here’s the process that worked for me. Download the IPython tarball from the distributions directory (e.g. ipython-0.9.1.tar.gz), extract the archive, change into the extracted directory and run:

sudo python2.6 setup.py install

Notice that the command explicitly invokes the 2.6 version of Python through the python2.6 executable: this simple step is enough to properly link your IPython installation with the newer Python. It is obvious in hindsight that this would work, but I wasted enough time pointlessly messing with environment variables and paths that I thought it was worth a blog post.
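If you want to double-check which interpreter IPython ended up linked with, a small snippet like the following can be pasted into the IPython prompt. It is only a sketch; the helper name interpreter_summary is made up for this example, and the path it reports will depend on where Python is installed on your system.

```python
import sys


def interpreter_summary():
    """Return the running interpreter's version string and executable path."""
    version = "%d.%d.%d" % sys.version_info[:3]
    return version, sys.executable


version, path = interpreter_summary()
print("Python %s at %s" % (version, path))
```

If the version shown starts with 2.6, IPython is running against the newer Python rather than the system default.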

Don’t try this with the bleeding-edge Python 3K because IPython is not yet compatible with this version, but it seems to work fine with 2.6.1.

Posted in Python, Tips

Big data + Little pipe? Try S3 Ingestion

A major barrier to moving your data to an online storage location like Amazon’s S3 can be the time it takes to push large numbers of bytes through your upstream Internet connection. While your connection may be fast enough to keep your data fresh and in-sync from day to day, it can be painful to do the initial data load if you have huge files, very many smaller files, a slow connection, or some combination of these factors.

I feel your pain. For the longest time I risked losing all my precious music files because I didn’t want to flood my home Internet bandwidth for the four whole days it would take to upload them all.
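To get a feel for how quickly upload times add up, here is a rough back-of-the-envelope sketch. The 40 GB collection and 1 Mbit/s upstream speed are hypothetical figures chosen for illustration, not my actual numbers, and the calculation ignores protocol overhead and assumes the link is fully saturated the whole time.

```python
def upload_days(size_gigabytes, upstream_megabits_per_second):
    """Estimate days needed to upload data over a fully used upstream link.

    Uses decimal units (1 GB = 1000 MB = 8000 megabits) and ignores
    protocol overhead, so real uploads will take somewhat longer.
    """
    megabits = size_gigabytes * 1000 * 8
    seconds = megabits / float(upstream_megabits_per_second)
    return seconds / 86400.0


# Hypothetical example: 40 GB of files over a 1 Mbit/s upstream connection.
print("%.1f days" % upload_days(40, 1))
```

With these assumed figures the estimate comes out at a little under four days of continuous uploading.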

Amazon is aiming to address this issue with the new AWS Import/Export service, currently in limited beta in the United States.

Posted in AWS, Cloud Computing

New EC2 Services: Monitor, Scale and Load Balance Your Instances

Amazon recently released three major new features for its Elastic Compute Cloud (EC2) service: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch (announced as “New Features for Amazon EC2”). These beta services are immediately available to anyone with an EC2 account and server instances located in the US (sorry EU folks, they are US-only for now).

Amazon CloudWatch is a monitoring service that records resource and performance metrics for any EC2 instances you associate with the service, at a cost of 1.5¢ per monitored instance per hour. In addition to providing up to 2 weeks of monitoring data to EC2 users, this service also underpins the other two new EC2 services.

Auto Scaling is a service that will automatically start or stop EC2 instances on your behalf based on conditions you specify. In other words, this service allows you to automatically scale the computing power available to your application in response to changing demand.

You control your instance pool by defining triggers that react to defined conditions such as CPU load, response latency, and the number of healthy/unhealthy instances. Auto Scaling relies on CloudWatch to supply the metrics it needs to make scaling decisions, so every instance managed or started by the scaling service must be registered with CloudWatch. Happily, there is no additional cost for using Auto Scaling beyond the CloudWatch fees.
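The idea behind a trigger can be sketched in a few lines of Python. This is a conceptual illustration only, not the Auto Scaling API: the function name, the CPU thresholds, the pool limits and the one-instance step are all hypothetical values picked for the example.

```python
def desired_instance_count(current, average_cpu,
                           scale_up_above=80.0, scale_down_below=30.0,
                           minimum=1, maximum=10, step=1):
    """Decide how many instances the pool should have after one check.

    average_cpu is the pool's average CPU load as a percentage, the kind
    of metric a monitoring service would supply.
    """
    if average_cpu > scale_up_above:
        target = current + step      # demand is high: add capacity
    elif average_cpu < scale_down_below:
        target = current - step      # demand is low: shed capacity
    else:
        target = current             # within the comfortable band
    # Never leave the configured pool size limits.
    return max(minimum, min(maximum, target))


for cpu in (90.0, 50.0, 10.0):
    print(cpu, desired_instance_count(2, cpu))
```

A real trigger would also need to consider how long a condition has held before acting, which this sketch leaves out.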

Elastic Load Balancing (ELB) rounds out the new services by providing the ability to distribute network traffic between multiple EC2 instances. ELB routes traffic at the HTTP or TCP level to instances within or across Availability Zones, and avoids routing traffic to instances that have become unresponsive. The fee for ELB is 2.5¢ per hour for each load balancer, plus 0.8¢ per gigabyte of data transferred through the service. You will also need to pay the CloudWatch fee for each load-balanced instance.
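To see how these fees combine, here is a small cost sketch using the prices quoted in this post. The function name and the example workload (4 instances, a 720-hour month, one load balancer, 100 GB transferred) are assumptions for illustration, and the estimate leaves out the cost of the EC2 instances themselves and any other charges.

```python
from decimal import Decimal

# Prices in US dollars, as quoted in this post.
CLOUDWATCH_PER_INSTANCE_HOUR = Decimal("0.015")
ELB_PER_HOUR = Decimal("0.025")
ELB_PER_GB = Decimal("0.008")


def estimate_cost(instances, hours, load_balancers=1, gigabytes_transferred=0):
    """Estimate CloudWatch and ELB fees; pass whole numbers or Decimals."""
    cloudwatch = CLOUDWATCH_PER_INSTANCE_HOUR * instances * hours
    elb = (ELB_PER_HOUR * load_balancers * hours
           + ELB_PER_GB * gigabytes_transferred)
    return {"cloudwatch": cloudwatch, "elb": elb, "total": cloudwatch + elb}


# Hypothetical workload: 4 monitored instances behind one load balancer
# for a 720-hour month, with 100 GB passing through the balancer.
costs = estimate_cost(4, 720, load_balancers=1, gigabytes_transferred=100)
print("CloudWatch: $%s  ELB: $%s  Total: $%s"
      % (costs["cloudwatch"], costs["elb"], costs["total"]))
```

For this assumed workload the monitoring fee is the larger part of the bill, which is worth keeping in mind since every load-balanced or auto-scaled instance has to be monitored.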

These features constitute a major step forward in EC2 functionality that will make it easier for many users to run applications reliably in the cloud without the need to implement their own management services. However, it is important to recognize that the services are only a first step and there are many situations where they will not provide the control, precision or cost-effectiveness you will need.

Some gotchas for the services in their current incarnation include:

  • CloudWatch metrics are limited to the instance/machine level and do not provide information about individual applications. Also, some metrics such as response latency and instance health are only available when CloudWatch is combined with the Elastic Load Balancing service.
  • Auto Scaling does not seem to be able to terminate instances that are identified as unhealthy. It will compensate for unresponsive instances by starting others, but will not put the original instance out of its misery.
  • The Elastic Load Balancing service can only balance host names that can be set up as CNAME records, like www.acme.com, not root domain names like acme.com. It also seems to limit the range of sub-1024 ports that can be balanced to 80 and 443, and does not perform some advanced load balancing functions like HTTP session affinity management or HTTPS termination (HTTPS connections are supported, but only at the TCP level).
  • You will need to work with command-line tools or use the APIs directly, because there are not yet any graphical tools available.

As RightScale’s Thorsten von Eicken points out in his discussion of the new services, there is still room in Amazon’s ecosystem for third-party companies to offer value-adding services that improve on the underlying provider’s offering in terms of functionality, flexibility, price and ease of use. As Amazon extends the capabilities of EC2 these companies will need to work harder to add value. This situation may be tough for them, but the fierce competition will ultimately benefit customers and accelerate the adoption of cloud services in general.

To help you get started with the new services there is a post in the EC2 forums that succinctly lists the documentation and resources you will need.

Posted in AWS, Cloud Computing