Date: Wed, 15 Jul 92 00:33:09 MET DST
From: timbl (Tim Berners-Lee)
Message-Id: <9207142233.AA02089@nxoc01.cern.ch>
To: fkappe@fiicmds04.tu-graz.ac.at, www-talk@nxoc01.cern.ch
Subject: Re: HTML DTD and related problems (rather long)

I am replying very late to a message of Frank Kappe, dated 25 Jun 1992.

First of all, to clear up some things about W3 HTML. You can't nest anchors. You can't nest anything, EXCEPT that you can put any elements inside an anchor (except for other anchors), and you can put an anchor inside any element. Then there is the slight structure of the <DL>[<DT><DD>]*</DL> etc which is required. I think I mentioned in earlier messages that the lack of structure is to make it easier to process HTML on systems which have styled text (like most systems, MSWord etc).

Now it is interesting that you in Hyper-G allow anchors to be any section of the text. This of course is counter to the SGML philosophy of strict nesting (SGML people can get quite religious about this, but I can't.) I think it is useful to be able to refer to two separate overlapping anchors. The problem is that it is taken as given by any SGML DTD designer that this sort of structure in a document is "BAD". This means that SGML tools won't be able to process the <AS> and <AE> (anchor start and anchor end) tags; you'll have to write something special on top which keeps track of anchors. SGML parsers won't be able to verify the anchor structure. So though the DTD is valid, it doesn't in fact represent the restrictions.

You say that you feel it is better to store the links separately rather than in the document. First of all let me say that W3 does NOT require you to store the links in the document -- it just requires that the links, anchors and text are transmitted at the same time on the net. That is very different. Many systems generate the HTML on the fly from other sources of link information.
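
As an illustration of generating HTML on the fly from link information kept outside the document, here is a minimal sketch in Python. It is not taken from any W3 server; the function name `render_html`, the `(start, end, href)` tuple layout and the example.org URLs are all assumptions made for the example. Because anchors cannot nest, the sketch refuses overlapping spans rather than trying to represent them.

```python
from html import escape


def render_html(text, links):
    """Return HTML for `text` with an <A> element around each linked span.

    `links` is a list of (start, end, href) tuples of character offsets
    into `text` (end exclusive). This layout is hypothetical: it stands
    in for whatever separate store of link information a server keeps.
    """
    out = []
    pos = 0  # offset in `text` up to which output has been written
    for start, end, href in sorted(links):
        if start < pos:
            # Anchors cannot nest, so overlapping spans have no
            # representation as <A> elements.
            raise ValueError("overlapping anchors at offset %d" % start)
        if not (start < end <= len(text)):
            raise ValueError("bad span (%d, %d)" % (start, end))
        out.append(escape(text[pos:start]))
        out.append('<A HREF="%s">%s</A>'
                   % (escape(href), escape(text[start:end])))
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    text = "See the W3 project and Hyper-G."
    links = [(23, 30, "http://example.org/hyperg"),
             (8, 10, "http://example.org/w3")]
    print(render_html(text, links))
```

The stored text never contains markup here; the anchors exist only in the transmitted form, which is all that is required on the net.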

Now looking at this question of where to store the links, the "link database" model you propose is the Intermedia model of Norman Meyrowitz et al. This was developed in a non-distributed environment, where a "web" (database of links) was available to the readers and centrally coordinated. If you expand that system to more than one web server, and scale it to global size, then you find that the same problems of ensuring consistency occur between the multiple link databases as occurred before between documents. You can't set up a system of bidirectional links, for example, which you could in the non-distributed case.

Link databases have their advantages, though. A model I am rather keen on at the moment is of servers which are both link databases and source code control (sccs, rcs etc) systems. In this world, you don't need to store anchors at all -- you just quote the character number AND VERSION number of the document. To find an anchor, if you have a more recent version than the one referred to in the anchor, you have to ask the server to translate the anchor's position in the old document into a position in the new one. It may reply with the same or a different position, or that as far as its "diff" algorithms go, it can't find where the original anchor text would be in the new document, or that it might really have gone. This would allow links to be made around source code (for which you want source code control anyway).

There would be common problems when you do the sorts of changes which diff can't handle intelligently, like when you swap the two halves of a file, or you change all the "a_"s to "A_". In these cases, a smart editor would be able to explain each editing change to the server as it goes, so the server would understand the relationship between the pre-edit and post-edit versions. If you are going to do that, then you can go a small extra step and make the server coordinate simultaneous edits by many people, which gives you a group editor.

Tim
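
The anchor translation described in the message (quote a character number and a version number, then ask the server where that character is now) can be sketched roughly as below. This is only an illustration under stated assumptions: Python's `difflib.SequenceMatcher` stands in for the server's "diff" algorithms, and the names `translate_position` and `LinkServer` are hypothetical, not part of any existing system.

```python
from difflib import SequenceMatcher


def translate_position(old_text, new_text, old_pos):
    """Map a character offset in old_text to an offset in new_text.

    Returns None when the character does not fall inside any block that
    the diff considers unchanged, i.e. the diff cannot find it.
    """
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a <= old_pos < block.a + block.size:
            return block.b + (old_pos - block.a)
    return None


class LinkServer:
    """Toy stand-in for a server that keeps every version of a document."""

    def __init__(self, first_version):
        self.versions = [first_version]

    def commit(self, new_text):
        """Store a new version and return its version number."""
        self.versions.append(new_text)
        return len(self.versions) - 1

    def resolve(self, position, version):
        """Translate (position, version) into the latest version."""
        return translate_position(
            self.versions[version], self.versions[-1], position)


if __name__ == "__main__":
    server = LinkServer("one two three")
    server.commit("one three")
    print(server.resolve(8, 0))  # the "t" of "three" in version 0
    print(server.resolve(4, 0))  # the "t" of "two", deleted since
```

A plain diff like this one does not recognise the cases the message warns about, such as swapped halves of a file or a global rename; those would need the editor to report each change to the server as it is made.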