[Mulgara-dev] build test
Andrae Muys
andrae at netymon.com
Wed Feb 14 20:48:14 UTC 2007
On 15/02/2007, at 5:29 AM, David Moll wrote:
>>> What is the server-side exception/error causing the 500 to be
>>> returned?
>
> I'm attaching the entire stack trace from
> /mulgara-1.0.0/dist/mulgara-output
> Although it appears that the problem is:
>
> Caused by: (QueryException)
> com.hp.hpl.jena.rdf.arp.MalformedURIException: Host is not a well formed address!
>
> Because the name of the machine does not make a valid URI?
>
... and that is because there was a time when it wasn't valid for the
first character in a domainlabel to be a digit, and Jena has
decided to enforce that archaic rule long after it was rescinded.
The specific problem is in isWellFormedAddress() in URI.java (yes,
Jena does do its own URI parsing for some bizarre reason),
specifically line 1241 of Jena 2.1 (the old version we are using),
which you will find unchanged at line 1234 in the current CVS HEAD.
http://jena.cvs.sourceforge.net/jena/jena/src/com/hp/hpl/jena/rdf/arp/URI.java?view=markup
1234      // rightmost domain label starting with digit indicates IP address
1235      // since top level domain label can only start with an alpha
1236      // see RFC 2396 Section 3.2.2
1237      int index = address.lastIndexOf('.');
1238      if (address.endsWith(".")) {
1239        index = address.substring(0, index).lastIndexOf('.');
1240      }
1241
1242      if (index+1 < addrLength && isDigit(p_address.charAt(index+1))) {
1243        char testChar;
1244        int numDots = 0;
1245
1246        // make sure that 1) we see only digits and dot separators, 2) that
1247        // any dot separator is preceded and followed by a digit and
1248        // 3) that we find 3 dots
1249        for (int i = 0; i < addrLength; i++) {
1250          testChar = address.charAt(i);
1251          if (testChar == '.') {
1252            if (!isDigit(address.charAt(i-1)) ||
1253                (i+1 < addrLength && !isDigit(address.charAt(i+1)))) {
1254              return false;
1255            }
1256            numDots++;
1257          }
1258          else if (!isDigit(testChar)) {
1259            return false;
1260          }
1261        }
1262        if (numDots != 3) {
1263          return false;
1264        }
1265      }
Specifically the test that is going to be failing is the isDigit at
line 1258, but the real problem is that the if-statement is there at
all.
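To make the failure concrete, here is a minimal standalone sketch of that branch. It is a simplified transcription, not Jena's code: the class name LegacyHostCheck, the method name passesIpv4Branch, the leading guard, and the sample hostnames are all assumptions for illustration (the actual hostname from the report is not shown in this thread).

```java
class LegacyHostCheck {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Hypothetical helper: returns false only when the quoted branch
    // would reject the address. The guard on the first line is an
    // assumption standing in for checks made earlier in the real method.
    static boolean passesIpv4Branch(String address) {
        if (address.isEmpty() || address.startsWith(".")) {
            return false;
        }
        int addrLength = address.length();
        int index = address.lastIndexOf('.');
        if (address.endsWith(".")) {
            index = address.substring(0, index).lastIndexOf('.');
        }
        // A digit at the start of the rightmost label is taken to mean
        // "this must be an IPv4 address".
        if (index + 1 < addrLength && isDigit(address.charAt(index + 1))) {
            int numDots = 0;
            for (int i = 0; i < addrLength; i++) {
                char testChar = address.charAt(i);
                if (testChar == '.') {
                    if (!isDigit(address.charAt(i - 1))
                            || (i + 1 < addrLength && !isDigit(address.charAt(i + 1)))) {
                        return false;
                    }
                    numDots++;
                } else if (!isDigit(testChar)) {
                    return false; // the test corresponding to line 1258
                }
            }
            if (numDots != 3) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(passesIpv4Branch("3server"));     // false
        System.out.println(passesIpv4Branch("server3"));     // true
        System.out.println(passesIpv4Branch("192.168.0.1")); // true
    }
}
```

In this sketch a dotless name such as the hypothetical 3server is rejected at the non-digit test, because its only label is both the leftmost and the rightmost one. A name such as 3server.example.com would skip the branch entirely, since its rightmost label starts with a letter.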
This restriction was imposed in RFC-952 and relaxed in RFC-1123
back in 1989, and the relaxation is explicit in the grammar in
section 3.2.2 of RFC-2396, the very section cited in the comment.
Specifically:
   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".".  Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit

   Hostnames take the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters.
I think the confusion might come from the next sentence in the rfc:
   The rightmost domain label of a fully qualified domain name will
   never start with a digit, thus syntactically distinguishing domain
   names from IPv4 addresses, and may be followed by a single "." if it
   is necessary to distinguish between the complete domain name and any
   local domain.
The code, however, is testing the *leftmost* domain label, which has
been permitted to start with a digit since RFC-1123.
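As an illustration of what the quoted grammar actually permits, the productions can be transcribed fairly directly into regular expressions. This is only a sketch under stated assumptions: the class name Rfc2396Host, the method name isHost, and the sample hostnames are made up for this example and are not part of Jena or Mulgara.

```java
import java.util.regex.Pattern;

class Rfc2396Host {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAINLABEL =
            "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOPLABEL =
            "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = *( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + DOMAINLABEL + "\\.)*" + TOPLABEL + "\\.?");
    // IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
    private static final Pattern IPV4 =
            Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");

    // host = hostname | IPv4address
    static boolean isHost(String s) {
        return HOSTNAME.matcher(s).matches() || IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Hypothetical names: a digit may lead any label except the rightmost.
        System.out.println(isHost("3server.example.com")); // true
        System.out.println(isHost("example.3com"));        // false
        System.out.println(isHost("192.168.0.1"));         // true
    }
}
```

Only the toplabel production requires a leading letter; every other label is a domainlabel and may begin with a digit, which is the distinction the RFC sentence above draws.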
So Jena is incorrectly rejecting your hostname. If you can change
your hostname to fit Jena's preconceptions, that would work around
the problem, but this is going to bite us eventually somewhere else,
so we need to find a more permanent solution. This might just be the
impetus we need to finally remove our dependency on Jena.
Andrae
--
Andrae Muys
andrae at netymon.com
Principal Mulgara Consultant
Netymon Pty Ltd