by Larisa Thomason,
Senior Web Analyst,
Why do many webmasters - who are so meticulous about every other aspect of design - ignore one of the basic components of a good Web site? Many just don't realize how important valid HTML code is to both page display and site promotion.
A Widespread Problem
The World Wide Web Consortium (W3C) Quality Assurance Interest Group estimates that 99% of Web pages contain some invalid HTML code, although the group is careful to note that:
"…there are no statistics to support this. It would be interesting to run a survey to prove that this case is indeed true."
Dagfinn Parnas, a graduate student at the University of Bergen, did just that for his Master's thesis and the results are startling. The entire thesis is posted online in PDF format. Be patient though: its 125 pages take a while to download even over a fast connection!
In summary, Parnas randomly selected 2.4 million different URLs from the Open Directory Project's (ODP) index and evaluated how well the HTML code on each page complied with W3C coding standards. And, as the W3C Quality Assurance group predicted, the vast majority of the pages did indeed have invalid code.
Of the 2.4 million pages that could be evaluated, only a small fraction met W3C standards.
When discussing his results, Parnas notes dryly:
"There is little correlation between the official HTML standard and the de-facto standard of the WWW."
The Five Most Common Problems
At first glance, the most common problems don't really seem to be all that important:
- No Document Type Definition (DTD) declaration
- Missing required attribute
- Non-standard attribute
- Omitted end tag
- End tag not open
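To make the list above concrete, here is a minimal Python sketch that flags four of the five problems in a snippet of HTML. It is not a validator and it is not how the W3C tool works. The names `ProblemFinder` and `find_problems` are made up for this illustration, the table of required attributes is a tiny hand-picked subset, and the sketch assumes every non-empty element needs an end tag (stricter than HTML 4.01, which lets you omit some end tags such as the one for a paragraph). Non-standard attributes are not checked, because that needs the full DTD.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

# Tiny illustrative subset of required attributes (assumption: not a full DTD).
REQUIRED = {"img": {"src", "alt"}, "form": {"action"}}


class ProblemFinder(HTMLParser):
    """Hypothetical helper that records a few common markup problems."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # stack of elements still waiting for an end tag
        self.has_doctype = False
        self.problems = []

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE ...>.
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        present = {name for name, _ in attrs}
        for missing in sorted(REQUIRED.get(tag, set()) - present):
            self.problems.append(f"missing required attribute: <{tag}> needs {missing}")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.open_tags:
            self.problems.append(f"end tag not open: </{tag}>")
            return
        # Anything opened after the matching start tag was never closed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"omitted end tag: <{top}>")


def find_problems(html):
    finder = ProblemFinder()
    finder.feed(html)
    finder.close()
    problems = list(finder.problems)
    if not finder.has_doctype:
        problems.insert(0, "no DOCTYPE declaration")
    # Elements still open at the end of the document were never closed.
    for tag in reversed(finder.open_tags):
        problems.append(f"omitted end tag: <{tag}>")
    return problems


sample = "<html><body><p>First<p>Second</p><img src='a.gif'></b></body></html>"
for problem in find_problems(sample):
    print(problem)
```

The sample page has no DOCTYPE, an image without ALT text, a stray closing B tag, and an unclosed paragraph, so each of those is reported once.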
So what if you're missing a DTD declaration and used a non-standard attribute? So what if you forgot to close a paragraph tag or delete an extra end tag? Modern browsers know what you meant to do and display the page anyway, right?
Internet Explorer, the most commonly used browser, is also the most forgiving when it comes to errors. Lazy and/or inexperienced coders think that's great! If the page displays properly without a lot of effort, why worry about validation?
Mainly because invalid code causes more problems than most people realize.
Problems Caused By Bad Code
Invalid HTML code can cause display problems that turn away visitors and hurt search engine promotion efforts.
- Browser display problems: Even the simplest errors can cause big problems - particularly in browsers like Netscape and Opera. They adhere more stringently to W3C standards. Although Explorer may ignore a problem like a missing closing TABLE tag, Netscape ignores the entire table and all its contents.
Other errors can be equally damaging. Learn more in our online Browser Display Tutorial.
- Accessibility problems: Screen readers (an assistive technology used by many people with vision problems) are basically simple text browsers. They often have problems with HTML code errors - particularly missing attributes.
Learn more about Web accessibility at our online Accessibility Resource Center.
- Incorrect search engine indexing: Search engine spiders are also basic text browsers. While an advanced browser like Explorer 6 or Netscape 7 may not care if you forget to close some quotation marks inside a tag, a search engine spider does!
Coding errors may hide large amounts of your page content from search engines - even though human visitors see the content with no problem. That is, if they can find it. Learn more about this problem in our December 2002 Webmaster Tip Bad Code Hurts Your Search Engine Rank.
It seems strange to consider that a few coding errors could cause all those problems, but it's true. You could be losing customers and not even realize it!
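The missing quotation mark problem described above is easy to picture with a toy example. The Python sketch below is a deliberately naive text extractor, written only for illustration: `visible_text` is a hypothetical function, and no real spider or screen reader is claimed to work this way. It follows one strict rule: a tag ends at the first `>` that is not inside a quoted attribute value. Under that rule, one unclosed quote swallows everything that follows it.

```python
def visible_text(html):
    """Return the text a strict, quote-aware reader would see (toy model)."""
    out = []
    in_tag = False   # True while we are between < and >
    quote = None     # the quote character we are inside, if any
    for ch in html:
        if not in_tag:
            if ch == "<":
                in_tag = True
            else:
                out.append(ch)
        elif quote:
            # Inside a quoted attribute value: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            in_tag = False
    # Collapse runs of whitespace so the result is easy to compare.
    return " ".join("".join(out).split())


good = '<a href="sale.html">Sale</a> <p>Free shipping today</p>'
bad = '<a href="sale.html>Sale</a> <p>Free shipping today</p>'

print(repr(visible_text(good)))
print(repr(visible_text(bad)))
```

With the closing quote in place, the extractor sees the link text and the paragraph. Without it, the rest of the page is treated as part of the attribute value and nothing is left to index, even though a forgiving browser might still display the content.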
Why People Have Bad Code
So why isn't every webmaster scrambling to validate and correct their HTML code? Mainly because they either don't realize their code has problems or they don't know enough about HTML code to correct the problems themselves.
The W3C article notes several reasons for the huge number of invalid pages:
- HTML editors don't conform to W3C standards. FrontPage, one of the most popular HTML editors, allows users to include proprietary HTML tags and use special server functions that only work on servers that are properly configured. Be careful when using an HTML editor. Check the instructions and learn how to configure it to write only valid HTML code.
- Incorrect information in books. Even some newly published how-to books contain incorrect information about coding. But often the problem comes from using outdated Web resources. Check the copyright date on your reference books and consider upgrading to the latest edition if your book is more than 2-3 years old.
- Microsoft Word "save as Web page" function. Microsoft Word, a popular word processing program, contains an option to save the word-processed document in HTML format. This is almost always a mistake! Refer to our August 2001 Webmaster Tip "Don't Save That File!" to learn more about the pitfalls.
Automatically Correct Errors
It's easy to find out if your HTML code has validation problems. There are several good online validation programs.
- The W3C offers a good online validation tool that spots coding errors. It's free and relatively easy to use.
- NetMechanic's HTML Toolbox offers even more features. The free version alerts you to errors inside your HTML code, spots common spelling errors, identifies broken links, lists potential browser display problems, and tests page load time to detect slow-loading pages. The subscription version also corrects HTML errors for you and generates a corrected page you can upload directly to your server.
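If you would rather check pages from a script than paste them into a form, the following Python sketch shows one way it might be done. It assumes the W3C's current checker at validator.w3.org/nu accepts a page in the body of a POST request and returns a JSON report when `out=json` is given; check the service's own documentation and usage policy before relying on this. The function names `check_html` and `summarize` and the User-Agent string are invented for this example. It has nothing to do with NetMechanic's HTML Toolbox.

```python
import json
import urllib.request

# Assumed endpoint of the W3C checker; out=json asks for a JSON report.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def check_html(html):
    """Send one page to the checker and return the parsed JSON report."""
    request = urllib.request.Request(
        CHECKER_URL,
        data=html.encode("utf-8"),  # supplying data makes this a POST
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "html-check-sketch/0.1",  # hypothetical name
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize(report):
    """Turn a report into one readable line per message."""
    lines = []
    for msg in report.get("messages", []):
        where = msg.get("lastLine", "?")
        kind = msg.get("type", "info")
        lines.append(f"{kind} (line {where}): {msg.get('message', '')}")
    return lines
```

To try it, read a page from disk, pass its contents to `check_html`, and print each line returned by `summarize`. An empty list means the checker had nothing to report.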
No matter how you decide to find and fix coding errors, it's critically important that you do it! Just because a page looks good in your browser doesn't mean that other visitors - and search engine spiders - are having the same enjoyable experience.