[This is a transcript, courtesy of Karen Mosman, from the lecture on "What comes first on the Web -- style or structure?" given on February 16, 2006 as part of my PhD Defense. HWL is me, Håkon Wium Lie.]
Eric Monteiro: Okay, welcome everyone. Please have seats. As part of the candidate Håkon Wium Lie's defence, we will today have two trial lectures: the first with a title of his own choice, and in an hour he will present his response to the title the committee provided him. So, please, Håkon.
HWL: Thank you so much. Thank you all for coming. We are going to spend a few hours, if you can bear to hang in here, on a topic that is of interest to many people in the room. I'd say the world's expertise on style sheets is here today, so if something goes wrong it's bad for the whole field. I am going to spend the first hour on a topic I selected myself and then, as Eric said, on the challenge they have given me. So, given that I was able to pick the topic here myself, one should think that I would have picked a clear title for it -- that's not actually the case. There is actually a trick question in this title. It's an ambiguous title. The ambiguity is in the part here, "what comes first". "Comes first" can either mean, in English, first as in importance -- what is important -- or it can mean what comes first in time. And I will actually discuss both of these questions. I think they are somewhat interlinked as well. Before we go on to that part, though, there are a few things we need to establish in terms of terminology, especially the three words of the title that I have highlighted here. What do we mean by the word "Web", what do we mean by "style" and what do we mean by "structure"? I expect many of you to have an intuitive understanding of what we mean by the web, for example. That has entered the realm of everyday use, but I still think it would be interesting to take a look at how dictionaries currently define the web.
I have two offerings here on the screen, from two online dictionaries. One says that it is the complete set of documents residing on all Internet servers that use the HTTP protocol, accessible to users via a simple point and click system. The other says that the World Wide Web is a global information space which people can read and write via computers connected to the Internet. I think both of these are actually pretty good. If we were to expand on the acronyms that are used here, I think certainly HTTP is one of the foundations. This is one of three specifications that Tim Berners-Lee came up with around 1990, and they are still the foundation that the web is built on. HTTP stands for hypertext transfer protocol. Then there is the URL specification, which is the source of the point and click system that was referred to. There isn't actually anything that says you have to point and click. You can have machines -- like Google -- and you can have various other user interfaces, but still the URL is a fundamental specification that the web relies upon. The third one of these specifications from the 1990 era is HTML, the hypertext markup language, and that's actually the one that is of most interest here. I assume most of you are familiar with HTML. It is the foundation for documents on the web -- it is what web pages are written in. It's the source language, if you will. And that's also where the term structure comes in. That has to do with how HTML documents are authored. If you look behind a web page, at the source code underlying the HTML page, you will see a lot of these tags, a lot of these angle brackets, and you will see content mixed in. These angle brackets are what are called tags, and what's between them is the content.
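[Editor's illustration, not part of the lecture: the distinction between tags and content can be sketched with Python's standard `html.parser` module. The sample document and the names `SAMPLE`, `TagCollector` and `collect` are assumptions made for this note; the page shown on the slide is not recorded in the transcript.]

```python
from html.parser import HTMLParser

# Hypothetical sample document; the slide shown in the lecture is not in the transcript.
SAMPLE = "<html><body><h1>Headline</h1><p>Some text.</p></body></html>"


class TagCollector(HTMLParser):
    """Separates tags from content and records how deeply each tag is nested.

    Only handles documents where every start tag has a matching end tag,
    which is true for SAMPLE.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []     # (nesting depth, tag name) in document order
        self.content = []  # text found between the tags

    def handle_starttag(self, tag, attrs):
        self.tags.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        if data.strip():
            self.content.append(data.strip())


def collect(source):
    """Return (tags, content) for an HTML source string."""
    parser = TagCollector()
    parser.feed(source)
    parser.close()
    return parser.tags, parser.content


if __name__ == "__main__":
    tags, content = collect(SAMPLE)
    for depth, tag in tags:
        print("  " * depth + tag)  # indentation mirrors the nesting of the tags
    print(content)
```

[The indentation printed for each tag shows how tags placed inside each other form a hierarchy, while the content is collected separately from them.]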
The angle brackets, the tags, make up what we call a structure, and by combining all these tags of various kinds and putting them inside each other we actually build up a structure of documents which can be represented as a tree structure. I know this is probably basic to many of you but I still think it's worth starting from the beginning. The definition of a structured document, which is more generic than HTML, is proposed by one of the sources of my work: "a digital document consisting of hierarchical elements containing text and other content. The elements primarily represent roles of the content rather than the presentation of the content."
And now it starts to get interesting here, because what we saw in the previous slides was actually the name of a tag, "H1", which stands for heading one, headline level one if you want. It says something about the role of the content between the tags. It says that this is a headline. This string could be anything, I've just chosen it to be "headline" randomly, but the tag "H1" -- and also the other tags like "body" and "P" -- have a logical meaning rather than a presentational meaning. They say something about what the role of this headline is in the document, but they don't say, for example, that the font size of the headline should be bigger than the font size of the paragraph. And this is what HTML, coming out of the scientific environment, put forward as being most important. When HTML was released it was a structured document format where it said something about the roles, the semantics. It was media independent in the sense that all these tags could be presented on a computer screen like we're using here, or they could be presented on a speech synthesiser, for example, for those who cannot see, or those of us who are in the shower.
Then we go on to the other part, the presentational part. This is where the term style comes in, and we shall look at how to define that as well. I put a little style sheet up on the screen here.
It is written in CSS. It's very simple. It's two statements that say something about how these tags that we saw before are to be presented. Whenever the H1 tag is found, the content of that tag is to have a certain font size, and the color of the P elements is to be black. This is very simple of course. By making many such rules and combining them into a style sheet, though, you can have some considerable visual effects. This is a sample web page using HTML. (Dave Raggett steps into the room) I have to welcome Dr. Raggett here, who is actually the author of the HTML specifications, many of them. When we apply a style sheet to that HTML document we get a very different presentation. This is exactly the same structure; we just added a style sheet to it. (shows page from CSSZenGarden with Valentine's day style sheet) This style sheet is not written by me, it is written by a professional designer -- so are all of the ones I'm going to show here. As you can see from this one, you can tell which day I prepared my presentation. It is quite stunning, the effects you can achieve just by adding a style sheet to an HTML document. And designers are starting to realise this, and CSS is increasingly used to drive design on the web. If we are to define style sheets in an academic form, this is a definition that I propose myself: "A style sheet is a set of rules that associates stylistic properties and values with structural elements in a document, thereby expressing how to present the document". Style sheets generally do not contain content. They are linkable and are reusable.
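[Editor's illustration, not part of the lecture: a minimal Python sketch of a style sheet as a set of rules that associates properties and values with structural elements. The two rules mirror the statements described above; the font size value `2em`, the sample document, and the names `STYLE_SHEET`, `DOCUMENT`, `apply_style_sheet` and `to_css` are assumptions, since the slide itself is not in the transcript.]

```python
# Hypothetical rule set mirroring the two statements described in the lecture.
# The font size value is an assumption; the transcript does not record it.
STYLE_SHEET = {
    "h1": {"font-size": "2em"},
    "p": {"color": "black"},
}

# Structure only: element names (roles) with their content, no presentation.
DOCUMENT = [("h1", "Headline"), ("p", "Some text.")]


def apply_style_sheet(document, style_sheet):
    """Pair each element with the properties its rule declares (empty if no rule)."""
    return [
        (tag, text, dict(style_sheet.get(tag, {})))
        for tag, text in document
    ]


def to_css(style_sheet):
    """Serialise the rule set in CSS syntax, one rule per line."""
    lines = []
    for selector, declarations in style_sheet.items():
        body = "; ".join(f"{prop}: {value}" for prop, value in declarations.items())
        lines.append(f"{selector} {{ {body} }}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_css(STYLE_SHEET))
    for tag, text, properties in apply_style_sheet(DOCUMENT, STYLE_SHEET):
        print(tag, repr(text), properties)
```

[Because `DOCUMENT` holds no presentational information, passing a different rule set to `apply_style_sheet` changes the presentation without touching the structure, and the same rule set can be reused for other documents.]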
So the whole purpose of the style sheet is to say something about presentation; it's not going to put content in there. You could see from some of these examples that they added what you arguably could say is content -- these pictures, for example; it's arguable whether that is content or not -- but in general this is used in a stylistic manner to convey a certain style, not really to add any textual content. So the purpose is to describe the presentation: to encode typography, typically on a visual medium, to describe aesthetics, colours, white space usage, and in an aural presentation to say something about the volume or about the type of voice you want to be heard, for example. So style sheets are general to any kind of presentation you want to apply to your document.
Having this distinction between structure and style is a fundamental belief in the area of document study. There are many reasons why style sheets and structure should be separate. I am not really going to go into that. That's quite well understood in the scientific community, but I am going to discuss something which I don't think has been often discussed: which of these two is more important. And to do so I'm going to go back quite a few years. This is a dark picture, and really the place is quite dark. It's taken in the dungeons of CERN, the laboratory in Geneva where the Web was invented by Tim. And this is actually the hallway where he worked; second door on the left was Tim's office. I happened to be there as well in 1994 for one year. It's an incredible place with all these physics people, of course, trying to discover new matter, but there are also quite a few computer people there. And I think it is quite interesting to see how CERN is built as a laboratory. It was set up after World War II.
As you can see here -- or as you cannot see -- it is quite dark, but you see all these corridors, all these doors along the corridor. They're lined up systematically on the left-hand side, on the right-hand side, long corridors, and you have pipelines in the ceiling. This is quite orderly laid out, but it doesn't look so beautiful. This being CERN, you really don't know what these pipes have in them and you probably don't want to know. (laughter) But this is a set of pipes that Tim passed by every time he went to his office, so I have this idea that perhaps he was somehow inspired by these pipes, you know, the H1 elements and H2 elements in HTML, H1, H2, H3, and then there are the ones that aren't used and a few of the other ones. I don't know. (laughter) Are we allowed to make jokes here, Eric? (giggle)
Now if you go up and look around you in that area you find some of the most beautiful scenery you've ever seen. That's Mont Blanc in the distance, the city of Geneva in the middle and the countryside, the French countryside, there in the front. It's very beautiful. If you turn the camera around you have the Jura mountains and some cows grazing peacefully in the field, not knowing that underneath them they have this incredible machine, the world's biggest machine by the way. So I think comparing CERN, being sort of the orderly scientific world, the laboratory, with the natural beauty above is quite interesting. I am not going to draw too many conclusions based on this, but I do think we really need both. We need the beautiful part of the web; there is an aesthetic part of the web that is going to amaze us, and at the same time we want there to be structure. We want Google to find our documents. We want there to be some kind of processability by computers. They're not very good at processing beauty and aesthetics but they are very good at processing tags.
So the quite obvious answer to the first question, "what's more important?", is that both style and structure are very important. The second question, "what comes first in time?", doesn't really have an obvious answer, but I think it is still interesting to discuss it. The question can be reformulated as "what should come first in time?" What's better? What's better for the users? What's better for the designer? What makes more sense from a rational point of view? And there are two ways to attack this: one is to look at the history of style and structure, to see how markup languages and style sheet languages evolved, how they were developed from around 1980 and onwards, or we can do a technical analysis and do some thinking about what makes more sense. And I'm going to try to do both here.
Why does all this matter? I think this matters because we see on the web that markup languages are developed all the time, and I think by knowing a little bit, by being conscious of what it takes to develop a successful markup language or a successful style sheet language, we can actually make the web a better place. And some of us actually plan to spend the rest of our lives there... it better be nice! So the hypothesis that I will put forward -- as I said, I don't have an obvious answer to this -- is that, generally, markup languages should be developed in the context of style sheet languages, not the other way around. That's what I will try to argue in this presentation.
So if we look at the historical analysis -- I picked a few of the systems that many people in this room will know and that I also analysed in my thesis -- we will see that traditionally, in general, markup languages or systems for markup languages were developed before the style languages. For example, if we start with SGML, which is sort of the mother of all modern markup languages, we will see that it became an ISO standard in 1986, but there wasn't a style sheet language to present SGML documents.
So when the Department of Defense chose to use SGML as the foundation for all their documentation, they had really no standardised language to present all that content they put in there. There were some proprietary systems and products that could do it, but they wanted to have a standardised version, so they started a standardisation of what became known as FOSI, which was developed quickly by committee a few years after 1986 in order to be able to present SGML documents. The SGML community, though, didn't quite like that. They saw that as a quick hack; they wanted a solution that could do more advanced things. It has quite an academic flavor. And they wanted, for example, non-Western layouts; they wanted to be able to do Japanese vertical writing and so forth. So FOSI wasn't really accepted. They started instead the development of DSSSL, which took 10 years to develop, and I would say that, due to there not being an acceptable style sheet language, SGML has seen quite little use. It has been very influential, but when you look at the actual use it isn't that much.
Then we have a few languages where we had simultaneous development, where the style sheet system and the structured system were developed simultaneously. One was Scribe, done by Brian Reid. The other is the S&P languages by Quint, who is sitting right here, if I may say so. (giggle) I think these languages in some ways suffered by not being submitted for standardisation. I think if they had been successfully standardised they could have seen successful use on the web, for example. However, if one were to be able to standardise S&P, just to take an example, you still would only have the foundation for the markup languages. You wouldn't really have the markup languages; they would have to come afterwards, which I believe supports my hypothesis, but I would be interested to hear what Vincent has to say about that afterwards.
Now you also have an example of the opposite where, actually, the style system was in place before the markup system. TeX is well known in scientific communities, developed by Donald Knuth, who had the need himself. He wrote a book and needed a good system for doing the typography, so he wrote the language and the formatter for doing so. It is not per se a style sheet language but it is what one could call a formatting language. Then on top of that, a couple of years later, came LaTeX, which was Leslie Lamport's attempt to encode Scribe in an open way, and LaTeX has been very successful, and today even theses are submitted to the University of Oslo in LaTeX. It is really LaTeX and Word that are the choices. I hope to change that, by the way, by now submitting mine in HTML. So, this being successful also supports my hypothesis, I would say, due to the style sheet system being in place first.
We have now looked at the historical analysis. We're now going to look at the technical analysis and try to look at the required components of these and see what makes sense from a technical point of view. And if we try to see what's different between a markup system and a style sheet system, I think there are two axes, three maybe, that are very important. One is the processing requirements: in order to process a markup language you need a parser, and writing a parser these days is very, very easy. Writing a formatter, on the other hand, which is required for a style sheet language, is very hard. You need to think about all sorts of efficiencies, beauty, fonts, screen resolutions, drivers, etc., which you don't really need to think about to write a parser. So I would argue that it's significantly harder to write the style system. Also, how many people can be part of the development.
I think today we can see that many people can contribute to the development of markup languages, but to do style sheet languages there are going to be fewer of them, and that means there are going to be fewer developers as well. So, since in order to present content you really, really need a style system, that becomes the required component. The style language becomes the platform onto which you can develop markup languages. It's much easier to do prototyping, for example, of markup languages once you have the style system; then you can see output on the screen, whereas developing a new style ... (Recording stops)