

  PeopleWatch: Betty Harvey

What SGML Can Teach Us About XML & the Web
Interviewed by Tony Byrne,  Founder, Principal, CMSWatch
2002-01-15

Betty Harvey

Betty Harvey is founder and president of Electronic Commerce Connection, Inc. (ECC), a consulting and training firm specializing in SGML/XML technologies.

CMSWatch: When did you start working with markup languages?  And how did you fall in love with them? 

Harvey:  In 1992, I was working with the US Navy in scientific and engineering computer support.  I loved that job, but after changes to the organization and the relocation of the supercomputing center, I decided to switch to something different.  At the time, we had this other project -- which was kind of a black hole -- called “CALS” (computer-aided logistics support), which included EDI and SGML. 

In our organization, it was always joked that if you worked with CALS it was the end of your career -- it was reputed to be a dead-end job.  I thought I would just move there until I could find something else, but it was actually kind of interesting because we were also responsible for the CALS standards for the Navy.  These were technical publishing standards based in part on SGML. 

But I didn’t fall in love with SGML. There were problems with SGML: it was way too expensive, it was way too complicated due to all the various options, and there was limited vendor support.  You couldn’t do it for under a million dollars.  But while I was still working in the CALS arena, the Web hit, in 1994, and HTML proved that SGML could work, somehow.

At the first World Wide Web conference in 1994 it clicked: SGML was going to revolutionize the way we think of information, because HTML proved you didn’t need a million dollars to get involved in it.  That was like a lightbulb going on for me. I got involved in an “SGML on the Web” initiative that helped pave the way for XML. 

CMSWatch:  So what are the key differences between SGML and XML?

Harvey:  Fundamentally, XML has taken SGML and made it less complicated, but more standardized.  Especially if you take just XML 1.0, anybody can learn it.  Yet everything has to be well formed, whereas in SGML that wasn’t the case. In SGML you could eliminate the end tags, you could eliminate the beginning tags, or you could eliminate both tags.  It was crazy.

So it’s stricter – which is good -- but it’s more flexible in other ways.
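
The well-formedness rule described above is easy to see with any XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the sample markup and the `is_well_formed` helper are invented for illustration and do not come from the interview. It shows that SGML-style end-tag omission, which a suitable SGML DTD could allow, is rejected by an XML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every element is explicitly opened and closed: acceptable as XML.
strict = "<list><item>one</item><item>two</item></list>"

# SGML-style end-tag omission: an SGML DTD could permit this, XML does not.
loose = "<list><item>one<item>two</list>"

print(is_well_formed(strict))
print(is_well_formed(loose))
```

The check only covers well-formedness; validating against a DTD or schema is a separate step that this sketch does not attempt.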

CMSWatch:  Many of us came to XML via Web development.  What lessons does SGML impart to anyone thinking of working with XML?

Harvey:  Perhaps the biggest lesson – if you look at the history of SGML – is to be wary of unnecessary complexity around specifications. 

The real danger for a developer is that you don’t know which standard you should work under.  Let’s take XML transport: should you use SOAP or ebXML for a transport/routing protocol, or should you use XML-RPC instead?  And some of these specs aren’t quite finished, yet people are developing to them.  You don’t know whether vendors will still support these specs five years down the road. 

If you look at SGML, we had a similar problem.  When SGML just started, formatting was supposed to remain separate from the content.  But in reality, you need something to display this data -- no one is going to pick up an SGML document and look at it in the raw SGML, unless they’re a geek.  Same goes for XML.  So you need a way of displaying it. There was an ISO standard called “DSSSL” -- Document Style Semantics and Specification Language -- but it took DSSSL ten years to get through the ISO world and there were modifications along the way, similar to what has happened with the W3C specs.  In the meantime, the Department of Defense (where I worked) said, “we can’t wait for DSSSL -- we’re going to do our own standard,” called FOSI, Formatting Output Specification Instance. 

They developed FOSI, and vendors were helping the Defense Department define it and develop products around it, but every vendor dropped out except two -- ArborText and DataLogics -- who both still support it.  But the spec went through with ambiguities, and the FOSI implementations of the two remaining vendors weren’t interoperable. 

I think history is repeating itself, especially where schemas are concerned.  There is a real danger of the same mistakes being made again: The specs revolving around XML are conflicting, complicated, and in some cases ambiguous, and you really don’t realize that until you start to use multiple products.  So you’re still in danger of being locked into a particular vendor, which was just what XML was supposed to solve. It was supposed to be vendor neutral.

CMSWatch:  Then how closely does the manager of a major corporate e-business effort need to track major XML-based schemas and languages emerging in her industry?

Harvey:  She could spend her whole career doing nothing but that.  I think it’s important to be aware of what’s going on, but that’s about all. 

I have always stressed that the important thing is your information.  If you have your information structured in a way that is meaningful to you, then going to an industry standard is just another transformation that you can make -- like the transformation to HTML.  It’s nothing you should find daunting.

The thing with these standards is that you have 100 to 300 to -- in the case of ebXML -- a thousand people working on them.  Standards become inflexible and incorporate all kinds of different features that don’t make sense for an individual organization.  So what you really want to do is  keep your organization’s information in mind, but still recognize how your data can be taken to that industry standard.  Remember that in lots of cases, industry standards have not been finalized; they’re being worked on – so they’re going to change. 
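
The idea that moving to an industry standard is "just another transformation" can be sketched in a few lines. This is an illustration only, assuming Python's standard `xml.etree.ElementTree`; the in-house element names, the target names in `TAG_MAP`, and the `to_industry_format` function are all hypothetical and do not correspond to ebXML or any other real vocabulary. Real mappings usually involve restructuring as well as renaming, which is where a transformation language such as XSLT would come in.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from in-house element names to an imaginary
# industry vocabulary; neither side comes from a real standard.
TAG_MAP = {
    "order": "PurchaseOrder",
    "customer": "BuyerParty",
    "part": "LineItem",
}


def to_industry_format(internal_xml):
    """Rename elements per TAG_MAP, leaving text and attributes untouched."""
    root = ET.fromstring(internal_xml)
    for element in root.iter():
        element.tag = TAG_MAP.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


internal = '<order><customer>ACME</customer><part qty="2">Valve</part></order>'
print(to_industry_format(internal))
```

Because the in-house structure stays the source of truth, a change in the external standard only means updating the mapping, not the stored information.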

CMSWatch:  If I have an SGML-based document repository, when should I seriously consider migrating it to XML?

Harvey:  First, you have to think about your current infrastructure. If you’ve put $1-2 million into your repository, and your authors are working fine in that arena, then there is less incentive to go with XML.  Now if you have to migrate -- a good example would be if your SGML repository resides on a platform whose vendor is out of business and you must change software regardless -- that would be a good time to convert to XML.  This is not a big deal, actually.  But the time to do it is when you want to look at new software.

CMSWatch:  If you had 30 seconds to explain the rationale for XML to a non-technical business manager, what do you say?

Harvey:  Repurposing of information is typically the biggest advantage of going to XML.  You can take it to traditional paper, you can take it to the Web, you can take it to wireless applications, and you can take it to eBooks. 

And you have one source that goes to all these different formats or areas.  Of course, in the future -- five or ten years down the road -- we don’t know where the biggest bang for your buck is going to be, but we know that if your data is in XML, you can get there.
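
Single-source repurposing can be illustrated with a small sketch, assuming Python's standard library only. The `<article>` document, its element names, and the two rendering functions are made up for the example; a production pipeline would more likely use stylesheets than hand-written functions. The point is that one structured source feeds more than one output format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source document; the element names are illustrative.
SOURCE = (
    "<article><title>Valve Care</title>"
    "<para>Inspect seals monthly.</para>"
    "<para>Replace worn gaskets.</para></article>"
)


def to_html(root):
    """Render the article as an HTML fragment for Web delivery."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in root.findall("para")]
    return "\n".join(parts)


def to_plain_text(root):
    """Render the same article as plain text, e.g. for a small-screen device."""
    lines = [root.findtext("title", default="").upper(), ""]
    lines += [p.text or "" for p in root.findall("para")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_plain_text(root))
```

Adding another delivery format later means writing one more rendering function against the same source, without touching the content itself.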

CMSWatch: Can’t you go from Microsoft Word to all those formats?

Harvey:  No.  There’s no inherent structure in Word.  If you export a Word file, you’re going to get RTF or HTML.  And if you’re trying to get some sort of complex information from the data, Word just isn’t going to do it for you. 

CMSWatch:  But can’t you repurpose content with databases?

Harvey:  Yes.  And in some cases, that makes sense.  It depends on the infrastructure. 

Let’s say you’re looking at XML seriously and you already have a relational database.  I’ve seen companies take the relational database, throw it away, and go with an XML, object-oriented database.  But this doesn’t make sense, because you already have fielded data, and you can still use XML on top of the relational database. 

Remember that with all of these CM systems at the moment, we don’t know which ones are really going to be survivors.  But you know Oracle and SQL Server are not going away.  So if you have your data in a relational database – and of course it may be that you can’t get all your data into that format – you can still use XML where it makes sense. 
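
Using XML "on top of" a relational database can be as simple as generating XML from query results when it is needed. The sketch below assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules, with an in-memory SQLite database standing in for Oracle or SQL Server; the `parts` table, its columns, and the `rows_to_xml` function are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection):
    """Export every row of the hypothetical `parts` table as an XML string."""
    root = ET.Element("parts")
    cursor = connection.execute("SELECT id, name, price FROM parts ORDER BY id")
    for part_id, name, price in cursor:
        part = ET.SubElement(root, "part", id=str(part_id))
        ET.SubElement(part, "name").text = name
        ET.SubElement(part, "price").text = "%.2f" % price
    return ET.tostring(root, encoding="unicode")


# In-memory database used purely as a stand-in for an existing RDBMS.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
connection.executemany(
    "INSERT INTO parts (id, name, price) VALUES (?, ?, ?)",
    [(1, "Valve", 19.5), (2, "Gasket", 2.25)],
)
print(rows_to_xml(connection))
```

The fielded data stays in the relational tables; XML is produced only at the boundary where it is useful, such as exchange or publishing.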

CMSWatch:  You do a lot of XML education.  What do you see as the biggest hurdle for its adoption in Web Content Management?

Harvey:  One of the biggest problems is that most people have put their websites together with spit and glue.  You have to sit down and do in-depth content and data analysis, and most people don’t want to do that.  It’s not something that you can go in overnight and handle. 

You have to look at the entire infrastructure, and not strictly from the standpoint of web delivery, but workflow as well.  Where does the data come from, and how is it used?  In most cases, the XML project gets pushed back further within the infrastructure of the business, not just in Web delivery.   If you’re just going to do it for Web delivery, I don’t think it makes sense.  It’s something that you need to make a part of your information infrastructure.  And this can be daunting, because it can’t be done overnight. 

One of the major reasons why some XML projects fail is that people think, “OK, from here on out everything’s going to be XML.”  But they don’t take an in-depth look at how the work flows, how the information flows, where the information comes from and when it comes in, what they do with it, and so forth.  You need to look at the entire scope of the organization, not just web delivery. 

CMSWatch:  What about the “fear of pointy brackets”?  Should we tell people to just get over it and accept them?

Harvey:  Yes, I think so.  And I think it’s already happened with HTML.  People aren’t afraid of it anymore the way they used to be.  That was probably the biggest hurdle in SGML, because pointy brackets were something new and alien, but now people are used to seeing them every day. 

CMSWatch:  But one of the big pushes in XML editors is to make all those tags go away so that the casual business user doesn’t have to deal with them.  Do you hold much promise for that? 

Harvey:  Actually, it reminds me of working with [Corel] WordPerfect.  People who worked with WordPerfect loved it, because one of the things you could do in WordPerfect that you can’t do in [Microsoft] Word is get to the code.  In WordPerfect you could “reveal codes” and fix things that weren’t quite right. 

That’s the way I view XML.  When you don’t have the tags “on,” it’s really nice to work with it, but if something’s not working right, you can reveal the underlying structure and see what’s going on.  In most cases, you’re going to be able to work without the tags revealed, but in some instances you still need to see what’s going on.

CMSWatch: Do you ever share the recipe for your famous chocolate-chip cookies?

Harvey: Actually, it’s the Tollhouse recipe on the back of the chocolate chip bag.  I consider it a kind of industry standard.  But like all good organizations, I modify its implementation to my liking: I add twice the number of pecans. 
