Convert XHTML Web Pages to HTML5

A couple of questions came up during my HTML5 presentation that I thought I would address on my blog. Both are somewhat related:

  1. How hard is it to get an XHTML page to validate to HTML5?
  2. Does HTML5 allow the break tag, or any other empty elements?

The answer to the second question is “yes,” according to the W3C's experimental HTML5 validator. The validator you know and trust has been set up to test HTML5, albeit in an experimental fashion. That makes sense, since HTML5 is a specification still being written and implemented in bleeding-edge browsers.

The answer to the first question is “pretty easy”. 

I took a sample XHTML 1.0 Transitional page and converted it to HTML5 by swapping out the DOCTYPE and updating the HTML element’s attributes as noted in the presentation.
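For a batch of pages, those two mechanical steps can be scripted. Below is a minimal Python sketch. It assumes the XHTML pages use double-quoted attributes, and it assumes that "updating the html element's attributes" means dropping the `xmlns` namespace and folding `xml:lang` into `lang`. The presentation's exact list may differ. Regular expressions are a shortcut here, not a real HTML parser, so run the result through the validator afterwards.

```python
import re

HTML5_DOCTYPE = "<!DOCTYPE html>"


def convert_doctype(source):
    """Swap an XHTML DOCTYPE (and any XML declaration) for the HTML5 DOCTYPE."""
    # An XML declaration such as <?xml version="1.0"?> is not needed in HTML5
    source = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", source)
    # Replace the first DOCTYPE, whatever public/system identifiers it carries
    return re.sub(r"<!DOCTYPE[^>]*>", HTML5_DOCTYPE, source, count=1, flags=re.IGNORECASE)


def simplify_html_tag(source):
    """Drop XHTML-only attributes from the opening <html> tag (double quotes assumed)."""
    def fix(match):
        tag = match.group(0)
        # The XHTML namespace declaration is unnecessary in HTML5
        tag = re.sub(r'\s+xmlns="[^"]*"', "", tag)
        if re.search(r'\slang="', tag):
            # lang is already present, so xml:lang is redundant
            tag = re.sub(r'\s+xml:lang="[^"]*"', "", tag)
        else:
            # Keep the language information by renaming xml:lang to lang
            tag = tag.replace("xml:lang=", "lang=")
        return tag
    return re.sub(r"<html\b[^>]*>", fix, source, count=1, flags=re.IGNORECASE)


page = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">"""

print(simplify_html_tag(convert_doctype(page)))
# prints:
# <!DOCTYPE html>
# <html lang="en">
```

Only the top of the document changes. The body markup is left alone, which is why pages that already validate as XHTML tend to need few further fixes.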

I did run into one slight problem: the validator rejected an attribute on a div element inside a blockquote. It was the align attribute, set to center. But that’s not really a problem. I should have centered the div via CSS anyway.
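Before running pages through the validator, a quick scan for leftover align attributes can point to the spots that need a CSS rule instead. This sketch uses Python's standard html.parser module. The AlignFinder class and the sample markup are hypothetical, and the sketch is a pre-check, not a replacement for the validator.

```python
from html.parser import HTMLParser


class AlignFinder(HTMLParser):
    """Collect every element that still carries a presentational align attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []  # (tag name, align value, line number) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "align":
                line, _offset = self.getpos()  # position of the start tag
                self.hits.append((tag, value, line))


# Hypothetical markup resembling the blockquote that tripped the validator
snippet = """<blockquote>
  <div align="center">A centered quotation</div>
</blockquote>"""

finder = AlignFinder()
finder.feed(snippet)
finder.close()
for tag, value, line in finder.hits:
    print(f"line {line}: <{tag}> has align={value!r}; use CSS instead")
# prints: line 2: <div> has align='center'; use CSS instead
```

For a div centered this way, a stylesheet rule such as text-align: center is one CSS counterpart.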

What this means, in theory, is that if you have XHTML pages, it should be easy to convert them into HTML5. It should be even easier if your pages validate against XHTML Strict.

So the habits of stricter coding in XHTML pay off for Web developers heading into HTML5.

Validated Ampersands in HTML Links

When dealing with the W3C's HTML Validator, you know that the error messages try to be helpful. But when you are new to HTML and Web design in general, the messages can be hard to decode.

A student of mine recently ran into one such error. When she ran a Web site through the HTML validator, she found an error message that confused her:

general entity "FID" not defined and no default entity.

This is usually a cascading error caused by a an undefined entity reference or use of an unencoded ampersand (&) in an URL or body text. See the previous message for further details.

Error message from HTML validator

HTML Character Entities

Dealing with a markup language that uses markers like < and > to indicate the start and end of tags is fairly simple. But what happens when you want to use the characters < and > themselves, say, to talk about tags, or to demonstrate how much greater this part of the sentence is > compared to this side?

It turns out that HTML has a method for handling these cases: you type a series of characters in the right order in the HTML code to represent a single character. For example, typing out &lt; and &gt; in the HTML file will produce < and >. Other examples include the copyright symbol (&copy;) and the em dash (&mdash;).

The ampersand character is a trigger that tells the browser a character entity follows and should be transformed into its character when the page is rendered on the screen.

But what happens when you want to present an actual ampersand and not tell the browser it's some other character? Well, in this case, the only way out of this situation is through! There's a character entity for the ampersand itself, and in HTML code it looks like this: &amp;
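Python's standard html module happens to mirror both directions described above, which makes it a handy place to experiment. html.escape() turns <, > and & into entities, and html.unescape() turns entities back into characters, roughly the way a browser does when it renders the page. The sample strings are only illustrations.

```python
import html

# Writing HTML: escape characters that would otherwise be read as markup
print(html.escape("<br> is a tag; Fish & Chips"))
# prints: &lt;br&gt; is a tag; Fish &amp; Chips

# Rendering HTML: each entity, triggered by its ampersand, becomes one character
print(html.unescape("&lt;br&gt; &copy; 2008 Fish &amp; Chips&trade;"))
# prints: <br> © 2008 Fish & Chips™
```

Note that the ampersand in "Fish & Chips" comes out as &amp;, which is exactly the escape the rest of this post is about.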

For more information on character entities, check the listing of XML and HTML character entities.

But What About the HTML Error?

The "FID" error the student found is a little confusing, especially since it's not really an error in the sense that we have a malformed tag called "FID" that we need to fix.

Rather, what's happening is that the long link has an ampersand in its query string that isn't properly escaped. Since ampersands are supposed to be escaped, and the one in the long link's source code isn't, the validator expects the characters after the ampersand to name one of those special HTML characters it knows. Since "FID" isn't one of them, it throws up its hands and gives up.
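A rough way to spot these before the validator does is to look for ampersands that don't begin a complete reference such as &amp; or &#38;. The Python sketch below does that with a regular expression. The sample URLs are hypothetical, and the check only looks at the shape of each reference, not whether the entity name actually exists.

```python
import re

# A "&" that does NOT begin a complete reference such as &amp;, &copy;, &#38; or &#x26;.
# The capture group grabs the letters and digits that follow it.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)([A-Za-z0-9]*)"
)


def bare_ampersands(href):
    """Return the names a validator would try to read as entities."""
    return [m.group(1) for m in BARE_AMPERSAND.finditer(href)]


# Hypothetical links: the first reproduces the "FID" complaint, the second is escaped
print(bare_ampersands("store.asp?id=7&FID=42"))      # prints: ['FID']
print(bare_ampersands("store.asp?id=7&amp;FID=42"))  # prints: []
```

The name it reports is the same one the validator quotes in its "general entity ... not defined" message.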

So, when placing links with query strings in your Web document, be sure to write out the character entity for the ampersand in the href value:

add-cart.html?isbn=9780470177082&amp;id=023

That removes the error and allows you to get on with life™!
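If the links come out of a script rather than being typed by hand, it helps to keep two steps apart: build the query string with a plain &, then escape the whole URL when writing it into the href. Here is a minimal Python sketch using the standard urllib.parse and html modules. The cart parameters come from the example link; the link text is a placeholder.

```python
from html import escape
from urllib.parse import urlencode

# Parameters from the example link; urlencode joins them with a plain "&"
query = urlencode({"isbn": "9780470177082", "id": "023"})
url = "add-cart.html?" + query

# Inside HTML source, the "&" in the URL must be written as "&amp;"
href = escape(url)
print(f'<a href="{href}">Add to cart</a>')
# prints: <a href="add-cart.html?isbn=9780470177082&amp;id=023">Add to cart</a>
```

The browser turns &amp; back into & before following the link, so the server still receives isbn and id as two separate parameters.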

Getting More Out of the W3C Validator

When checking sites with the validator, you can select an option called "Verbose Output," which gives you more insight into the errors you are receiving and their probable causes.

However, I've found that most people don't know about this option, even though Zeldman made the W3C put it in way back in 2003, a magical time when we didn't have to be told to leave Britney alone.

To help with that, I've made a bookmarklet that turns Verbose Output on when testing a page for validation.

To use it, first drag this link to your browser's bookmark menu:

HTML Validation (Verbose)

Then surf over to a site you want to check out and press the "HTML Validation (Verbose)" bookmark in your browser's menu.

You are automatically taken to the validation report with Verbose Output and, as an added bonus, Show Source already turned on.
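The bookmarklet's JavaScript isn't reproduced in this post, but its job is simply to open the validator with the right query string. The Python sketch below builds that kind of URL. It assumes the W3C Markup Validator's check endpoint accepts the parameters uri, verbose=1 (Verbose Output) and ss=1 (Show Source), so compare it against the link the validator's own form produces.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names for the W3C Markup Validator
VALIDATOR = "http://validator.w3.org/check"


def verbose_validation_url(page_url):
    """Build a validator link with Verbose Output (verbose=1) and Show Source (ss=1) on."""
    # urlencode percent-encodes the page address so its own ? and & survive intact
    return VALIDATOR + "?" + urlencode({"uri": page_url, "verbose": 1, "ss": 1})


print(verbose_validation_url("http://example.com/"))
# prints: http://validator.w3.org/check?uri=http%3A%2F%2Fexample.com%2F&verbose=1&ss=1
```

A bookmarklet would do the same thing in JavaScript, passing the current page's address as the uri value.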

Setting the Start Number for an Ordered List

Last week, when Chris Mills and crew announced the release of their Web Standards Curriculum, the table of contents was buried away from their front page, making it hard for visitors to get to the content.

So, to help people get to the content in a small way, I placed the table of contents on my site with direct links to each of the articles that make up the first version of the curriculum. While I was marking up the content, I noticed a little problem on Opera’s table of contents.

If you look at Figure 1, the table of contents has two markers beside each list item: a bullet and a number. What’s actually happening in the code is that the list is marked up as an unordered list, which auto-generates the bullet, and each article's number is hard-coded into its list item:

Figure 1.


<ul>
  <li>12. <a href="http://dev.opera.com/articles/view/12-the-basics-of-html/" title="HTML basics article">The basics of HTML</a>, by Mark Norman Francis.</li>
  ...
</ul>

All that coding seemed a bit odd. After I posted my table of contents, I had a quick email exchange with Chris Mills about how they needed to work on increasing the visibility (or “scent,” as Jared would say) of the content.

I checked back on the site a few days later and noticed that the double markers in the table of contents were gone, as shown in Figure 2.

Figure 2.

Curious as to how they did the markup for the list, I pulled up trusty View Source and noticed something a bit shocking:


<ul style="list-style:none;">
  <li>12. <a href="http://dev.opera.com/articles/view/12-the-basics-of-html/">The basics of HTML</a>, by Mark Norman Francis.</li>
  ...
</ul>

The markup for the lists was the same as before; however, an inline CSS rule had been added to remove the default markers.

As Ted would say, “woah!”

I shot over an email to Chris informing him of the start attribute that can be used with ordered lists.

This rather simple attribute allows one to set the start of an ordered list with any integer and has the added benefit of fitting in rather nicely with the breaks of the curriculum’s table of contents:


<ol start="12">
  <li><a href="http://dev.opera.com/articles/view/12-the-basics-of-html/">The basics of HTML</a>, by Mark Norman Francis.</li>
  ...
</ol>
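If a table of contents like this is generated from a list of articles, the start value can come straight from the first article's number. The Python sketch below is hypothetical (render_toc_section is not Opera's code). It assumes each section's article numbers are consecutive, since start only sets the first number and the browser counts up by one from there.

```python
from html import escape


def render_toc_section(articles):
    """Render (number, title, url, author) tuples as one <ol> using the start attribute."""
    first = articles[0][0]
    lines = [f'<ol start="{first}">']
    for offset, (number, title, url, author) in enumerate(articles):
        # The browser numbers items first, first + 1, ..., so a gap needs a new list
        if number != first + offset:
            raise ValueError(f"article {number} breaks the sequence; start a new <ol>")
        lines.append(
            f'  <li><a href="{escape(url)}">{escape(title)}</a>, by {escape(author)}.</li>'
        )
    lines.append("</ol>")
    return "\n".join(lines)


print(render_toc_section([
    (12, "The basics of HTML",
     "http://dev.opera.com/articles/view/12-the-basics-of-html/", "Mark Norman Francis"),
]))
```

The hard-coded "12. " prefix disappears from the list item, and the number the reader sees comes from the ordered list itself.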

He wrote back saying, “I honestly didn’t know you could do this!”, and has promptly fixed the markup as shown in Figure 3.

Figure 3.

It doesn’t surprise me that Chris didn’t know about this attribute, because a lot of the HTML books I own don’t even cover it. (Neither do some online HTML tutorials.) The first book in my collection that I found listing this attribute was Molly Holzschlag’s Special Edition Using HTML and XHTML, published in 2002!

Note that there is a minor downside to the start attribute: it doesn’t validate with an XHTML Strict DOCTYPE, though it does with XHTML Transitional.

Other than that, I believe the start attribute is a handy bit of HTML knowledge, even if you are just starting out with Web design or, you know, working on a Web standards curriculum for others. (Just kidding, Chris!)

Twitter-Sized An Event Apart Presentation Summaries

I'm not what you might call a copious note taker. I burn out quickly listening to presentations and tend to focus on note taking rather than digesting what is being said.

Rather than taking long notes, I go in another direction. To help remind me of what I witnessed during the two-day event known as An Event Apart Boston 2008, I decided to run through the presentations and write up a Twitter-sized summary of each one.

Title Slide

Understanding Web Design by Jeffrey Zeldman
Web designers are very talented people who should get more respect. Calls user-centered design something else: "Empathy Web Design."
The Lessons of CSS Frameworks by Eric Meyer
Eric examined nine CSS frameworks, but says they all aren't right for you. You should make your own or adapt them to your liking.
Good Design Ain't Easy by Jason Santa Maria
Designers should be telling stories. Talks about the history of print design and how it can bleed over into Web design.
Web Application Hierarchy by Luke Wroblewski
Give your users the "confidence to take actions". Telling people visually what to do on your site is good. Learn graphic design principles.
Design to Scale by Doug Bowman
We respect proportions. McDonald's scales, Starbucks sells experience, not Java. Quotes Paul Rand: "Simplicity is not the goal."
When Style Is The Idea by Christopher Fahey
Quoted Paul Rand, Stewart Brand, etc. Style encourages innovation. Style sells, style happens. Fashion has a vocabulary, does Web design?
Scent of a Web Page: Getting Users to What They Want by Jared Spool
Five types of pages users encounter: Target Content, Gallery, Department, Store, Home Page. Users have a purpose when coming to your site.
Debug/Reboot by Eric Meyer
CSS debugging is a good way to tease out things that might be trouble. Not many people use Link Checkers. Reviews his CSS Reset rules.
Comps and Code: Couples' Therapy by Ethan Marcotte
It's okay to admit mistakes. Covers three projects and problems he encountered. Treat everyone on your team like a client and prototype!
Principles of Unobtrusive JavaScript by ppk
Unobtrusive JavaScript is more like a philosophy. Use JS wisely for improved accessibility and Web standards-based sites.
Standards in the Enterprise by Kimberly Blessing
To get Web standards into large companies, you need to follow the Circle of Standards: train, review, document, repeat. Buy our book!
Designing the User Experience Curve by Andy Budd
People pay for the experience of Starbucks, not for the coffee. Pay attention to detail, pay attention to your customer.
Designing the Next Generation of Web Apps by Jeff Veen
"We are awash in data." Make data meaningful to your users. Another spotting of Napoleon March to Moscow infographic in a presentation.

Markup Map for hCard Microformat (Update)

Knowing what elements are available in the hCard microformat is important when trying to apply CSS rules. The main problem is that there are so many elements it's easy to get tripped up on how to best style an hCard.

To help understand how many elements are in the hCard and keep the block-level and inline elements separate, I followed Andy Clarke's 3D Zen Garden map model and made my own 3D map for the hCard microformat.

3D hCard Map


Update: After I asked Tantek to review the hCard map, he made some suggestions for improving it. For more information, see the Notes section.

Notes

Keep in mind that this example is simply a starting point. The 3D map is based on the live example; however, an hCard can include more elements, like instant messenger handles, or omit some of those shown in the demo, depending on how much information you have to work with. For a clearer demonstration of this, I suggest tinkering with the useful hCard creator.

Also, the live example is an hCard for an organization. Therefore, the fn and org class attribute values are used on the same a element.

In the previous version, I used "web.site" as a fictional Web address. Tantek reminded me that the example.com Web address was reserved by the IANA for this very reason. I kept Ohio in the map, since that's my new home state.

I fixed a typo from the last version, where I had carried over the tel class attribute value to the email address's div wrapper.
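To make the organization pattern concrete, here is a Python sketch that emits a stripped-down hCard. The org_hcard helper, the company name, the email address and the phone number are placeholders, not the live example's markup. The sketch simply puts fn and org on the same a element, uses example.com, keeps Ohio as the region, and gives tel and email their own class values.

```python
from html import escape


def org_hcard(name, url, email, tel, region):
    """Return a minimal hCard for an organization (hypothetical helper)."""
    return "\n".join([
        '<div class="vcard">',
        # For an organization, fn and org share one element (here the link)
        f'  <a class="fn org url" href="{escape(url)}">{escape(name)}</a>',
        '  <div class="adr">',
        f'    <span class="region">{escape(region)}</span>',
        '  </div>',
        # Each property gets its own class: tel for the phone, email for the address
        f'  <div class="tel">{escape(tel)}</div>',
        f'  <a class="email" href="mailto:{escape(email)}">{escape(email)}</a>',
        '</div>',
    ])


card = org_hcard("Example Company", "http://example.com/", "info@example.com",
                 "555-555-0100", "Ohio")
print(card)
```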

In my previous example, I represented block-level elements in two ways: as sitting on top of visual boxes and as being placed inside them. While that made perfect sense at 2 a.m., I reworked the presentation so that block-level elements are set in bold and the boxes focus on the grouping and nesting of elements.

Dynamic Markup Maps

Lastly, I created this example using CSS rules (instead of going with Illustrator). I chose XHTML+CSS over a graphics editor because I am more familiar with those technologies. Also, I thought it would be a great benefit for Web developers to see a markup map (which could be dynamically generated) while using a tool like the hCard generator.

Creating the CSS rules was the first step, but I don't have the spare time to implement the rest of the project myself. Maybe someone else can?
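As a possible starting point, the sketch below walks an hCard with Python's html.parser and prints an indented outline of each element and its class values. That outline is the raw material a dynamically generated markup map would need. The MarkupMap class and its output format are hypothetical, and drawing the 3D boxes from the outline is left out.

```python
from html.parser import HTMLParser

# Elements that never contain children, so they do not open a nesting level
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class MarkupMap(HTMLParser):
    """Record each element as 'tag.class1.class2', indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.lines.append("  " * self.depth + ".".join([tag] + classes))
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        # <br /> also arrives here (via the default handle_startendtag), so skip voids
        if tag not in VOID_ELEMENTS:
            self.depth = max(0, self.depth - 1)  # tolerate stray end tags


hcard = ('<div class="vcard"><a class="fn org url" href="http://example.com/">Example</a>'
         '<div class="adr"><span class="region">Ohio</span></div></div>')
mapper = MarkupMap()
mapper.feed(hcard)
mapper.close()
print("\n".join(mapper.lines))
# prints:
# div.vcard
#   a.fn.org.url
#   div.adr
#     span.region
```

Each indentation level corresponds to one box nested inside another in the 3D map.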