[Mulgara-dev] build test

Andrae Muys andrae at netymon.com
Wed Feb 14 20:48:14 UTC 2007


On 15/02/2007, at 5:29 AM, David Moll wrote:

>>>  What is the server-side exception/error causing the 500 to be
> returned?
>
> I'm attaching the entire stack trace from
> /mulgara-1.0.0/dist/mulgara-output
> Although it appears that the problem is:
>
> Caused by: (QueryException)
> com.hp.hpl.jena.rdf.arp.MalformedURIException: Host is not a well formed address!
>
> Because the name of the machine does not make a valid URI?
>
... and that is because there was a time when it wasn't valid for the
first character in a domainlabel to be a digit, and Jena has decided
to enforce that archaic rule long after it was rescinded.

The specific problem is in isWellFormedAddress() in URI.java (yes,
Jena does do its own URI parsing for some bizarre reason),
specifically line 1241 of Jena 2.1 (the old version we are using),
which you will find unchanged at line 1234 in the current CVS HEAD.

http://jena.cvs.sourceforge.net/jena/jena/src/com/hp/hpl/jena/rdf/arp/URI.java?view=markup

  1234     // rightmost domain label starting with digit indicates IP address
  1235     // since top level domain label can only start with an alpha
  1236     // see RFC 2396 Section 3.2.2
  1237     int index = address.lastIndexOf('.');
  1238     if (address.endsWith(".")) {
  1239       index = address.substring(0, index).lastIndexOf('.');
  1240     }
  1241
  1242     if (index+1 < addrLength && isDigit(p_address.charAt(index+1))) {
  1243       char testChar;
  1244       int numDots = 0;
  1245
  1246       // make sure that 1) we see only digits and dot separators, 2) that
  1247       // any dot separator is preceded and followed by a digit and
  1248       // 3) that we find 3 dots
  1249       for (int i = 0; i < addrLength; i++) {
  1250         testChar = address.charAt(i);
  1251         if (testChar == '.') {
  1252           if (!isDigit(address.charAt(i-1)) ||
  1253               (i+1 < addrLength && !isDigit(address.charAt(i+1)))) {
  1254             return false;
  1255           }
  1256           numDots++;
  1257         }
  1258         else if (!isDigit(testChar)) {
  1259           return false;
  1260         }
  1261       }
  1262       if (numDots != 3) {
  1263         return false;
  1264       }
  1265     }

Specifically the test that is going to be failing is the isDigit at  
line 1258, but the real problem is that the if-statement is there at  
all.
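
To see the failure in isolation, here is a minimal sketch that re-states
just the quoted branch as a standalone method.  It is not Jena's full
isWellFormedAddress(): the class and method names are hypothetical,
isDigit() is assumed to be a plain ASCII digit test, the i > 0 guard is
my addition, and the hostnames are made-up examples rather than the
actual machine name from the stack trace.

```java
class DigitLabelCheck {

    // Assumption: the original isDigit() is a plain ASCII digit test.
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Re-statement of the quoted branch only (hypothetical helper).
    static boolean passesQuotedCheck(String address) {
        int addrLength = address.length();
        int index = address.lastIndexOf('.');
        if (address.endsWith(".")) {
            index = address.substring(0, index).lastIndexOf('.');
        }

        // With no dots at all, index is -1 and this looks at the first character.
        if (index + 1 < addrLength && isDigit(address.charAt(index + 1))) {
            int numDots = 0;
            for (int i = 0; i < addrLength; i++) {
                char testChar = address.charAt(i);
                if (testChar == '.') {
                    // i > 0 guard added here so the sketch cannot index before the string.
                    if (i == 0 || !isDigit(address.charAt(i - 1))
                            || (i + 1 < addrLength && !isDigit(address.charAt(i + 1)))) {
                        return false;
                    }
                    numDots++;
                } else if (!isDigit(testChar)) {
                    return false;   // the test corresponding to line 1258
                }
            }
            if (numDots != 3) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "3box", "box3", "3com.example.org", "192.168.0.1" };
        for (String s : samples) {
            System.out.println(s + " -> " + passesQuotedCheck(s));
        }
    }
}
```

Of these samples only "3box" is rejected: with no dots in the name the
branch is entered on the leading digit, and the first non-digit
character then trips the final isDigit test.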

This restriction was imposed in RFC-952 and relaxed in RFC-1123
back in 1989, and the relaxation is explicit in the grammar in section
3.2.2 of RFC-2396, the very section quoted in the comment.  Specifically:

    The host is a domain name of a network host, or its IPv4 address as
    a set of four decimal digit groups separated by ".".  Literal IPv6
    addresses are not supported.

       hostport      = host [ ":" port ]
       host          = hostname | IPv4address
       hostname      = *( domainlabel "." ) toplabel [ "." ]
       domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
       toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
       IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
       port          = *digit

    Hostnames take the form described in Section 3 of [RFC1034] and
    Section 2.1 of [RFC1123]: a sequence of domain labels separated by
    ".", each domain label starting and ending with an alphanumeric
    character and possibly also containing "-" characters.

I think the confusion might come from the next sentence in the rfc:

    The rightmost domain label of a fully qualified domain name will
    never start with a digit, thus syntactically distinguishing domain
    names from IPv4 addresses, and may be followed by a single "." if it
    is necessary to distinguish between the complete domain name and any
    local domain.

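Where the code ends up testing the *leftmost* domain label whenever
the name contains no dots (the only label is then both leftmost and
rightmost), and that label is permitted to start with a digit since
RFC-1123.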
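
For comparison, a more permissive check along the lines of the RFC-1123
label rule (every label starts and ends with an alphanumeric) might look
like the rough sketch below.  This is not Jena or Mulgara code: the
class name, method name and sample hosts are hypothetical, and treating
any digits-and-dots string as an IPv4 candidate is a design choice of
the sketch, not something taken from either RFC.

```java
import java.util.regex.Pattern;

class LenientHostCheck {

    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String LABEL =
            "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";

    // One or more labels separated by ".", with an optional trailing ".".
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + LABEL + "\\.)*" + LABEL + "\\.?");

    // IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
    private static final Pattern IPV4 =
            Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");

    private static final Pattern DIGITS_AND_DOTS = Pattern.compile("[0-9.]+");

    static boolean isAcceptableHost(String host) {
        if (host == null || host.isEmpty()) {
            return false;
        }
        // Assumption of this sketch: anything made only of digits and dots
        // must be a dotted quad, otherwise it is rejected.
        if (DIGITS_AND_DOTS.matcher(host).matches()) {
            return IPV4.matcher(host).matches();
        }
        return HOSTNAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "3box", "3com.example.org", "192.168.0.1",
                             "192.168.0", "-bad.example" };
        for (String s : samples) {
            System.out.println(s + " -> " + isAcceptableHost(s));
        }
    }
}
```

The digits-and-dots rule keeps a truncated address such as "192.168.0"
from slipping through as a hostname, at the cost of also rejecting
purely numeric names.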

So Jena is incorrectly rejecting your hostname.  If you can change
your hostname to fit Jena's preconceptions then that would work
around the problem, but this is going to bite us eventually
somewhere else, so we need to find a more permanent solution.  This
might just be the impetus we need to finally remove our dependency on
Jena.

Andrae

-- 
Andrae Muys
andrae at netymon.com
Principal Mulgara Consultant
Netymon Pty Ltd
