Even a chimp can write code

Friday, May 28, 2004

Open Sores of Java

That's right, I said "sores", for that's what they amount to now. In the recent past, there has been a disturbing trend of people in the open source community and elsewhere clamoring for making Java open source. So I figured it was time to stand up and speak my mind on this, by calling the stupid idea what it is: a 'stupid idea'.

So we have been told, "Sun must open-source Java!" but exactly what does that mean? Must Sun open-source the Java language or the virtual machine or the blend of coffee served in their cafeterias? The source code for the Java language and runtime is freely available. And you don’t have to pay Sun to get the runtime or the developer kit. Nor do you have to pay to write programs for or run them on the Java platform. The way I see it, Java is both free as in beer and free as in speech. But of course, you and I can’t just redistribute our own implementation of the platform or a modified version of it, without first passing a compatibility test. And surely you open source proponents do not want to propagate the idea of distributing software that hasn’t been duly tested. It says a lot about the technology (and needless to say Sun) when alternative implementations are allowed to exist. The GNU Classpath, although an incomplete open source implementation of the Java foundation classes, does exist. Open source virtual machine implementations like GCJ and Kaffe also exist. The rationale for their existence has not been called into question. The rest of the Java community [that’s us] does not interfere with their function. To me, that’s free.

And what exactly is 'open source' in verb form? Is it an offer to get involved in refining the language and the platform? Is it an offer to help take it to new frontiers, as yet unaddressed? In that case, the Java Community Process is open for everyone. A lot of substantial contributions to the language and the platform have come from this Process. A lot of retarded ideas have also been introduced through this forum. But that just highlights the positive side of the JCP. All your goofball ideas will be entertained here.

Should the Java license terms then be changed? If so, to what? GPL? Unlike with GPL, the Java license does not force you to apply the same license restriction to your own source code. This is what corporations like about Java. Like it or not [I personally like it], we live in a capitalist system where profit is a big motivator. Some of us have livelihoods to worry about and families to feed. The Java license works perfectly for us. Besides, I have yet to see one developer or interview candidate refuse to work on my project on the grounds that the language and platform’s license terms violate his/her personal convictions. All licenses have an agenda. Be it BSD, LGPL, Mozilla, CPL, Apache or the aforementioned GPL. To paraphrase James Gosling, unlike some of these others, the Java license’s agenda is not a hidden one.

It has been said that if Java were not made open source, its popularity among developers and corporations would diminish. Eric Raymond has written about Java: "Sun's insistence on continuing tight control of the Java code has damaged Sun's long-term interests by throttling acceptance of the language in the open-source community, ceding the field (and probably the future) to scripting-language competitors like Python and Perl." Before you re-read that statement with incredulity, let me assure you that you got it right the first time. I don’t know any other person who has a life outside of Slashdot who can say that without laughing out loud. For the record, Java is not a scripting language and Java developers are not losing any sleep over being tied to a 'dying' technology. So to determine its popularity for myself, I went to that most democratic of all institutions on the Internet: Google. The Google directory for Computers > Programming > Languages lists a number of familiar and unfamiliar languages. There are 290 entries for Basic. About the same for C. 898 for C++, 152 for C#, 1014 for Perl, 1189 for PHP and a modest 3307 for Java. But then, what do they know huh?

It is infuriating to post counter-arguments to statements made by people who have had absolutely no hand in furthering the cause of Java. Some of these people have never even programmed in the language. And then there are people from within the Java community who have similar points to make. At least these I can understand, although I cannot rationalize them.

Disclosure: I do not work for Sun Microsystems. Never have. Don’t anticipate I ever will. I have felt the need to make this argument for Java solely out of the love I have for the language and the platform. Some of the smartest people I have ever known are Java programmers. Most of the nicest and funniest individuals too. This is for all of them.

Update: Here is one link that will provide a clearer background on the issue. This posting on Slashdot cites an article by Javalobby.org's Rick Ross on the Sun-Microsoft settlement and another titled "Free but shackled: The Java trap" by the much venerated Richard Stallman. While one delicately cloaks the "open source Java" argument, another says it outright. Notice also the rubbish that passes off as reader feedback and comment. This is but one of the many websites that have addressed this topic over time.


Tuesday, May 25, 2004

A new kind of music

Stephen Wolfram’s "A New Kind of Science" brings cellular automata to the forefront of science. Anyone who’s read the book knows it carries some weight [both literally and figuratively]. In the 1197-page thesis, Wolfram asserts that cellular automata operations govern the world as we know it. Step aside, McNealy, it isn't the network! Wolfram states that the Universe is one large cellular automaton computer! Through a seemingly unending set of diagrams and evolution states, he comes to a simple conclusion: that there exists a simple computational rule that generates all existence. I must admit here, I knew some general concepts of cellular automata before I read the book [multiple evolutions away from being an authority], but I have never quite seen things this way. The book can uproot existing beliefs and introduce new ones in their place.

Simply put, here’s what he does: starting with a simple initial state (a black or a white cell in a grid of cells), and repetitively applying a simple rule, one would expect a repetitive and deterministic pattern. Amazingly, though, the results are apparently random. But being random doesn’t make it interesting. What does is that there are discernible features that evolve in the design over successive iterations. Almost like the appearance of a semblance of order and intelligence amid chaos. His discovery is that simple programs can produce great complexity. This goes against the grain of conventional thought. As he puts it, "Whenever a phenomenon is encountered that seems complex it is taken almost for granted that the phenomenon must be the result of some underlying mechanism that is itself complex".
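
To make the "simple rule, repeatedly applied" idea concrete, here is a minimal Java sketch of a one-dimensional, two-color automaton of the kind the book studies. The class and method names are my own, and treating cells beyond the edges as white is an assumption made to keep the sketch short.

```java
// A minimal sketch of an elementary (one-dimensional, two-color) cellular automaton.
// Names are illustrative; cells beyond the edges are assumed to be white (false).
class ElementaryAutomaton {

    // Applies one step of the given rule number (0-255) to a row of cells.
    static boolean[] step(boolean[] cells, int rule) {
        boolean[] next = new boolean[cells.length];
        for (int i = 0; i < cells.length; i++) {
            int left = (i > 0 && cells[i - 1]) ? 4 : 0;
            int center = cells[i] ? 2 : 0;
            int right = (i < cells.length - 1 && cells[i + 1]) ? 1 : 0;
            // The neighborhood (0-7) selects one bit of the rule number.
            next[i] = ((rule >> (left | center | right)) & 1) == 1;
        }
        return next;
    }

    // Renders black cells as '#' and white cells as '.'.
    static String render(boolean[] cells) {
        StringBuilder sb = new StringBuilder();
        for (boolean cell : cells) {
            sb.append(cell ? '#' : '.');
        }
        return sb.toString();
    }

    // Starts from a single black cell in the middle and records each generation.
    static String[] evolve(int width, int rule, int generations) {
        boolean[] cells = new boolean[width];
        cells[width / 2] = true;
        String[] rows = new String[generations];
        for (int g = 0; g < generations; g++) {
            rows[g] = render(cells);
            cells = step(cells, rule);
        }
        return rows;
    }
}
```

Printing the rows of ElementaryAutomaton.evolve(63, 30, 30) one per line shows an irregular triangle growing from the single black cell.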

Using cellular automata algorithms, the open-source jMusic framework and a whole lot of ingenuity, Paul Reiners has created Automatous Monk, a Java program that generates melodies from cellular automata evolutions. The Automatous Monk project is hosted on SourceForge. The project website lists several works that I have readily come to appreciate. My personal favorite is the oldie but goodie Tall Girl from the Mountains using Rule 208, performed by the Mad String Quartet. It may be an acquired taste, but like a tropical fruit, it stays with you. A new kind of music, no?


Saturday, May 22, 2004

More on secure code

Using industry standard security tools is not enough to provide a reliable and hardened security cover to an application. There must be security features in the length and breadth of code if you want to ensure that both you and your end users enjoy the benefits of a more reliable product user experience. An important factor is the architectural and design mindset: no system is completely secure. Applications are always going to have vulnerabilities. As someone once said, this is possibly the only golden rule in information technology. The key is therefore to incorporate security guidelines in the architecture, design, the code and even in testing procedure. There is little to gain from providing a security solution that does not address the threats specific to an application. But don't let the threat modelling exercise be a bureaucratic hurdle. There are numerous things we can do in code to mitigate most common risks.

I will try and visit some points on securing web applications in this weblog entry. To the security experts, my notes here will seem amateurish and so I'd recommend you amuse yourself by clicking on one of the links on the sidebar or Googling your name. To the rest of the uninitiated or partially initiated, this log will hopefully provide some value.

A common mistake that web application developers and vendors alike make is that they trust their users to provide valid, non-malicious data. This is more often than not at the root of most problems compromising web applications. Howard and LeBlanc in "Writing Secure Code" suggest remembering the two golden rules of user input:

  1. All input is bad until proven otherwise

  2. Data must be validated as it crosses the boundary between untrusted and trusted environments

It is best to structure validation logic such that it determines only what is valid data and discards all other input. This is the key in structuring your if-else clause. One mistake would be to leave room for unbounded and unchecked data from form submissions. I have seen a lot of this on my projects, where for instance, developers forget to provide size and maxlength attribute values to Struts HTML input tags. Remembering the rules above can steer you away from the mistake of allowing direct user input data in your SQL statements. An attacker can provide form input like xyz' or '1' = '1. If your code allows untrusted input without validity checks, and moreover propagates it all the way into your data access objects, then you may be in for trouble. Here's a sample authentication query in a fictitious DAO that would typically get you in:
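
A minimal sketch of such a query, built by string concatenation. The USERS table, the class name and the count-based check are assumptions for illustration; the column names are the ones in the WHERE clause further down. This is deliberately vulnerable code, not something to copy.

```java
// Deliberately vulnerable sketch: user input is pasted straight into the SQL text.
// The USERS table and this class are hypothetical.
class NaiveUserDao {

    // Builds the authentication query by concatenating raw form input.
    static String buildAuthQuery(String userName, String password) {
        return "SELECT COUNT(*) FROM USERS"
             + " WHERE USER_NAME = '" + userName + "'"
             + " AND PASSWORD = '" + password + "'";
    }
}
```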


Say, your logic here is structured to permit access only if the returned value is non-zero. Now, if the malicious input values were substituted, the query looks like this:

WHERE USER_NAME = 'xyz' or '1' = '1' AND PASSWORD = 'xyz' or '1' = '1'

The attacker is authenticated without knowing either the username or the password. There are variants of this, as one can no doubt deduce. The vulnerability shown above is partly due to the fact that we assemble SQL statements from raw input in our code. Use prepared statements or parameterized queries instead. Java [that wonderful language!] provides you with all the means to achieve this. Does your code have safeguards against this sort of thing? Maybe so. Do you trust that the code on your favorite online store has similar safeguards? You'd sure hope so!
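
Here is a rough sketch of the parameterized alternative using the standard JDBC PreparedStatement. The USERS table, the class name and the count-based check are assumptions for illustration; the column names come from the WHERE clause above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of a parameterized authentication check; table and class names are hypothetical.
class SaferUserDao {

    // Placeholders keep user input out of the SQL text entirely.
    static final String AUTH_QUERY =
        "SELECT COUNT(*) FROM USERS WHERE USER_NAME = ? AND PASSWORD = ?";

    // Returns true only if a row matches both bound values.
    static boolean isAuthenticated(Connection conn, String userName, String password)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(AUTH_QUERY)) {
            stmt.setString(1, userName);
            stmt.setString(2, password);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() && rs.getInt(1) > 0;
            }
        }
    }
}
```

Because the values are bound rather than concatenated, input such as xyz' or '1' = '1 is compared as a literal string and cannot change the shape of the query.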

Yet another vulnerability stems from allowing HTML constructs as valid form inputs. Attackers can enter a wide range of inputs to probe for system secrets.

<body onload="javascript:alert(document.cookie)"></body>

If you really do want to allow a small subset of HTML tags as part of form input data, then use regular-expression-based input checks.
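
One way to sketch such a check with java.util.regex: strip the few tags you permit, then reject the input if any angle bracket remains. The allowed tag list, the length limit and the class name are assumptions for illustration, and a real application would likely need more than this.

```java
import java.util.regex.Pattern;

// Sketch of an allow-list check for form input; the tag list and limit are illustrative.
class HtmlInputCheck {

    // Only these bare tags (no attributes) are permitted.
    private static final Pattern ALLOWED_TAG =
        Pattern.compile("</?(b|i|em|strong)>", Pattern.CASE_INSENSITIVE);

    private static final int MAX_LENGTH = 500;

    // Accepts input only if nothing tag-like remains once the allowed tags are removed.
    static boolean isAcceptable(String input) {
        if (input == null || input.length() > MAX_LENGTH) {
            return false;
        }
        String remainder = ALLOWED_TAG.matcher(input).replaceAll("");
        return remainder.indexOf('<') < 0 && remainder.indexOf('>') < 0;
    }
}
```

With this sketch, `<b>hello</b>` passes while the body onload example above is rejected, since it still contains angle brackets after the allowed tags are stripped.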

There are some other tips to remember:

  • Don't use security logic that in some way relies on hidden form fields, data in cookies, weak session ids and the HTTP referer header

  • Don't use hidden fields to store sensitive or near-sensitive information

  • Don't use BASIC authentication outside of a PetStore kind of demo application and sometimes not even there

  • Don't store any secrets in your system! If you simply must store something, make sure it is encrypted or hashed. Encryption can drag down performance but that is okay. At least your system will still be more reliable than it would be if it were hacked.

  • Don't mistake Base64 encoding for encryption. Believe me, a lot of people do that.

  • If you implement password hashing, for example using SHA-1, further harden it by using a salt. The salt is a random sequence of characters and digits used to season a password before hashing it. The salt itself need not be a secret although it is important that it be unique for every user.

  • Run applications with the least privileges they need to function, and no more. There is no reason to run a database or application server process as local administrator or worse still, with system privileges. If your vendor says their product can only run as system or elevated privileges, consider a different product/vendor.

  • Revisit your page caching policy. Don't cache if you don't have to.

  • Always have a security policy if you don't already have one. As SANS describes it, "Policies describe purpose, scope, policy statements, actions and responsibilities. Security policies must be written to reduce the effort required in maintaining them, yet be clear in the objectives, boundaries and procedures required in enforcing them." Make sure your product has a password policy, a logon banner, a notification policy and a session timeout policy.

  • Always analyze your system for threats. Use a threat model and develop a threat mitigation strategy. I have recently used the STRIDE model and have found it useful.

  • Be wary of the 10 software myths highlighted by Cigital

  • Be afraid. Be very afraid.
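
The salting tip in the list above can be sketched with the JDK's own MessageDigest and SecureRandom. Note the swap: I use SHA-256 here rather than the SHA-1 mentioned in the tip, and the class name and 16-byte salt length are my own choices for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Sketch of salted password hashing; uses SHA-256 instead of the SHA-1 named in the tip.
class SaltedHash {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a fresh random salt; store it alongside the hash, one per user.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    // Hashes the salt followed by the password bytes.
    static byte[] hash(String password, byte[] salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            return digest.digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Recomputes the hash for a login attempt and compares it with the stored one.
    static boolean matches(String candidate, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(candidate, salt), storedHash);
    }
}
```

Since each user gets a different salt, two users with the same password end up with different stored hashes.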

My description above in no way summarizes all there is to know or even does justice to the topic. "Writing Secure Code" by Howard and LeBlanc is a book I strongly recommend to all developers. I have found many articles on SecurityFocus.com to be very useful; this one by K.N. Mookhey in particular is a must-read. You may also download OWASP's WebGoat found here. Learning about software vulnerabilities by actually exploiting them can be a great way to grasp the subject.

There's more to say on this in future entries.


Friday, May 21, 2004

Quickly access javadocs!

The http://javadocs.org/ site is now online.

It provides you the ability to search through Javadocs. You can type http://javadocs.org/String to look up the java.lang.String class or, more precisely, http://javadocs.org/java.lang.String. You can also type in package names like http://javadocs.org/java.util. A neat little utility for all those who have a life outside of an IDE!


Thursday, May 20, 2004

Documentation, where art thou?

I was in a Software Requirements Spec review meeting earlier today. The latest draft of the document was mailed to attendees about 15 minutes prior to the meeting. That peeved some people [me!]. Rightly so. And then there was the content of the document itself. I have heard just about every excuse there is to justify bad documentation or even the complete lack of documentation. Oh believe me, I have!

And then there are those in the eXtreme Programming crowd who claim that the "code is the design". Oh, is it? When one of you guys comes to work on my project, I'll be happy to give you machine code. Try and figure the system out then! The reasoning given to us is that while the code always depicts the current design, design documents always have to play catch-up and often fail. But that rests on a flawed premise: that a design document has to rationalize every line of code and every shred of logic used. It does not! Some of us have indeed perfected the art of concise documentation. It is meant to be on-point and to give readers accurate and adequate information. Nowhere do I make a claim of complete information.

I feel most developers shy away from writing documentation because they do not have the requisite communication skills. As Mr Ed puts it: "Documentation requires expression in natural language, and a disturbing number of developers have approximately the same facility with the written word as a high school junior". He is being kind. Most developers cannot communicate their way out of a wet brown paper bag. [I know I heard or read that one somewhere. Just can't remember where. An attribution is due]


Wednesday, May 19, 2004

Writing Secure Code

I was in WA last week and met up with my buddy Akhil who works for Microsoft. It was great to drop our guard and be nerds once again. Due to the nature of the product I am working to create here in RI, security in code interests me a great deal. Akhil told me about how folks at Microsoft are religiously [my words, not his] looking at their codebase, fixing a lot of potential security loopholes. Then as I was leaving, he gave me a copy of the book "Writing Secure Code" by Michael Howard and David LeBlanc. "It's a useful book", said Akhil, in a classic understatement that doesn't come naturally to my friend. I am halfway through it now and can't put it down.

I have always believed this myself, but never put it in print before: the common mistake we as developers make is to architect, design, develop, test and deliver code that has no palpable security features. We add security technologies into our applications more to be "buzzword compliant" than to truly make them secure. We have disdain for the good folks from the Information Security group. They are seen as academic; what do they know about the real world of coding and deadlines, not to mention changing requirements? [Nobody loves me!] Security is usually an afterthought, added into the code several iterations down the line [when there's time for these pesky things]. Does this stem from naivete? I think not. Security is seen to bear little financial benefit with customers. It gets in the way! Too much time, money and resources are needed for features that are for all practical purposes "invisible". Besides, how many developers have security skills in their resumes? These are the reasons I have frequently heard given.

Expensive security tools and gizmos can provide one with a false sense of comfort but sadly they do not suffice. Incorrect handling of arrays, buffers and memory copying logic has been liberally exploited by virus writers and hackers alike. Howard and LeBlanc demonstrate, with examples, how black-hat hackers attack vulnerabilities in code. They provide tips on how code can be secured as well as how to convince decision-makers about the importance of security. Some of their examples made me go back and look at code I'd written to verify I wasn't making the same mistakes. There's a lot more to be written on this. There's even more to be learnt.


Tuesday, May 04, 2004

Modular and Loosely-Coupled Systems

Bertrand Meyer, the father of the Eiffel method and language, introduced the five main characteristics of modular systems. Known as Meyer's criteria for evaluating modularity, these are things every developer must know. They are as follows:

  • Decomposability: decompose a problem into smaller sub-problems that can be solved separately. An example is Top-Down design

  • Composability: Freely combine modules to produce new systems. For example, math libraries, Unix commands and pipes

  • Understandability: Individual modules must be understandable by a human reader. A counter-example of this is Sequential Dependencies

  • Continuity: A small change in specification must result in changes in only a few modules and must not affect the architecture. In relation to this I recommend reading Bill DeHora's article titled "Foundations for component and service models".

  • Protection: Effects of an abnormal run-time condition must be confined to a few modules. For example, validating input arguments at source.
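
As a small illustration of the Protection criterion, here is a Java sketch that validates its arguments at the module boundary, so a bad value is rejected where it enters rather than surfacing several modules later. The Account class and its rules are hypothetical.

```java
// Sketch of validating input arguments at source; the class and its rules are hypothetical.
class Account {

    // Returns the new balance, rejecting bad arguments before any work is done.
    static long debit(long balanceCents, long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (amountCents > balanceCents) {
            throw new IllegalArgumentException("insufficient funds");
        }
        return balanceCents - amountCents;
    }
}
```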

In his Manageability web log, Carlos Perez extols the virtues of loosely-coupled systems using an excellent compare-and-contrast method. Do take a good look at the tabulated differences.
