Saving and Opening HTML Files

When an HTML file is opened in Microsoft Excel, the data in the body is stored in a worksheet starting with cell A1. When HTML data is pasted into a worksheet, the data is stored starting with the active cell.

Table and cell formatting

When a table is opened, each data element in the table is one column wide. Excel automatically adjusts the column widths and row heights in the worksheet, and applies a column data format. For columns that become wider than four column widths, wordwrap is enabled. When an HTML table is pasted in a worksheet, the column widths are not adjusted.

For tables that use spaces instead of tabs to pad entries, any sequence of two or more spaces is treated as a single tab.
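
The space-padding rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not Excel's actual import code; the function names spaces_to_tabs and split_columns are hypothetical.

```python
import re


def spaces_to_tabs(line: str) -> str:
    """Replace every run of two or more spaces with a single tab."""
    return re.sub(r" {2,}", "\t", line)


def split_columns(line: str) -> list[str]:
    """Split a space-padded row into its column entries."""
    return spaces_to_tabs(line).split("\t")
```

A single space is left alone, so an entry such as "New York" stays in one column while the padding after it still acts as a column separator.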

Excel does not support percent widths in the width attribute of a Table element. If there is only one table in a file, the width attribute specifies the width of the table in the worksheet. If there is more than one table in the file, Excel determines the width of all tables in the worksheet. If the tables are nested, the width specified for the outer table is used and, if necessary, the width is adjusted to accommodate the data in the nested tables. The columns and rows around the table are merged to preserve the appearance of the nested table. When the workbook is saved, only a single table is saved per worksheet.

Table cell formatting is saved using cascading style sheet (CSS) styles. Style definitions are stored either in the head of the HTML file or in a separate file. For workbooks that contain a single worksheet, the style definitions are stored in the head of the workbook file within a Style element. For workbooks that contain more than one worksheet, the style definitions (including the default cell formatting and all class definitions) are stored in a separate file. The file has the same name as the workbook but with a .css extension instead of .htm.
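
As a rough sketch of the storage rule described above, the following Python function returns the name of the file that would hold the style definitions. The function name style_location and its parameters are assumptions made for illustration; only the single-worksheet and multiple-worksheet rule comes from the text.

```python
from pathlib import PurePath


def style_location(workbook_file: str, worksheet_count: int) -> str:
    """Return the file expected to hold the CSS style definitions."""
    if worksheet_count < 1:
        raise ValueError("a workbook contains at least one worksheet")
    if worksheet_count == 1:
        # Single worksheet: styles sit in a Style element in the head
        # of the workbook file itself.
        return workbook_file
    # More than one worksheet: same name as the workbook, but with a
    # .css extension instead of .htm.
    return str(PurePath(workbook_file).with_suffix(".css"))
```

For a workbook saved as Budget.htm with three worksheets, the sketch returns Budget.css.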

A style class specifies a set of formatting properties under a single name, which provides a shorthand way of applying a complex set of styles. For example, one style class can define both a number format and a border style. The class name can be specified in the class attribute of the HTML TD, TR, Col, and Table elements.

Excel creates a class name for each unique combination of formatting. The class name can be specified in table elements that have this particular formatting. The following example shows four style classes applied to a table, a row, cells, a fragment of red text, and a span of text. The class attribute of the Table element specifies the formatting that would be applied to a column if it contained data.


 <TABLE CLASS=XL9>
  <TR CLASS=XL2>
   <TD CLASS=XL4>Plain<FONT COLOR=red CLASS=XL5>Red</FONT></TD>
   <TD CLASS=XL4>Plain<SPAN CLASS=XL5>Red</SPAN></TD>
  </TR>
 </TABLE>

Class names begin with XL, built-in style names begin with style, and names for rich text formatting that specifies only font formatting begin with font. Each name ends with an integer, starting with 1 for the first formatting combination in the table.
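
The numbering scheme for class names can be illustrated with a short Python sketch. It assumes each cell's formatting is represented as a dictionary of property names and values, which is a simplification chosen for this example; assign_class_names is a hypothetical helper, not part of Excel.

```python
def assign_class_names(cell_formats):
    """Give each unique formatting combination a name XL1, XL2, ..."""
    names = {}      # formatting combination -> class name
    assigned = []   # class name for each cell, in input order
    for fmt in cell_formats:
        key = frozenset(fmt.items())  # property order does not matter
        if key not in names:
            names[key] = f"XL{len(names) + 1}"
        assigned.append(names[key])
    return assigned
```

Cells that share the same combination of properties receive the same class name, so a table with many cells but few distinct formats needs only a few class definitions.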

Certain CSS Level 1 and Level 2 styles can also be specified. For more information about styles and the syntax of style definitions, see the Style Attributes topic and the CSS Level 1 and Level 2 Recommendations. For information about cell and worksheet formatting, see the Cell Formatting and Worksheets topics.

Copying and pasting tables

When a table is copied, parent styles and comments are not stored on the Clipboard. For formulas that reference cells outside the selected range, only the result of the formula is stored.

When HTML is pasted, column widths, picture properties, and shape properties are not included. Any references are pasted as relative references.

Cell and row breaks

The BR tag has an mso-data-placement style attribute that specifies where the data following the break is stored. The attribute takes one of two string constants: new-cell starts a new cell in the next row after the break, and same-cell keeps the break inside the current cell.
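
The following Python sketch shows one way a reader of such a file might interpret the two constants, using the standard html.parser module. The class BreakSplitter and the function split_cells are hypothetical, and treating a BR tag without a recognized mso-data-placement value as same-cell is an assumption made for this example.

```python
from html.parser import HTMLParser


class BreakSplitter(HTMLParser):
    """Collect cell text, starting a new cell on new-cell breaks."""

    def __init__(self):
        super().__init__()
        self.cells = [""]

    def handle_starttag(self, tag, attrs):
        if tag != "br":
            return
        style = dict(attrs).get("style") or ""
        props = {}
        for decl in style.split(";"):
            if ":" in decl:
                name, value = decl.split(":", 1)
                props[name.strip().lower()] = value.strip().lower()
        if props.get("mso-data-placement") == "new-cell":
            self.cells.append("")      # data continues in a new cell
        else:
            self.cells[-1] += "\n"     # assumed default: line break in cell

    def handle_data(self, data):
        self.cells[-1] += data


def split_cells(fragment: str) -> list[str]:
    parser = BreakSplitter()
    parser.feed(fragment)
    parser.close()
    return parser.cells
```

For example, a fragment with a same-cell break followed by a new-cell break yields two cells, the first of which contains an embedded line break.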

Hyperlinks

There cannot be more than one hyperlink in a worksheet cell. If the cell contains a hyperlink, the entire content of the cell is a hyperlink. If a cell contains plain text and a hyperlink, the plain text becomes part of the hyperlink. If more than one hyperlink is placed in a cell, no text in the cell is a hyperlink.
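
The hyperlink rules can be summarized in a small Python sketch built on the standard html.parser module. The function resolve_cell_hyperlink is hypothetical and only mirrors the rules stated above: it returns the cell text together with the single link target, or None when the cell has no hyperlink or more than one.

```python
from html.parser import HTMLParser


class _CellLinks(HTMLParser):
    """Gather the text and the href of every A element in a cell."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text.append(data)


def resolve_cell_hyperlink(cell_html: str):
    """Return (cell text, link target or None) for one cell."""
    parser = _CellLinks()
    parser.feed(cell_html)
    parser.close()
    text = "".join(parser.text)
    # Exactly one hyperlink: the whole cell, plain text included,
    # becomes the link. Several hyperlinks: none of the text is a link.
    link = parser.hrefs[0] if len(parser.hrefs) == 1 else None
    return text, link
```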

Web queries

If a file contains a Web query and formulas dependent on the query, the formulas might not work if the layout of the query is changed. The XML 97Query element is specified if a Microsoft Excel 97 Web query is used.

Text surrounding tables

The text around a table is stored in column A of a worksheet in a paragraph style format. If the text is longer than the width of the table or if the file does not contain a table, the worksheet cells are merged across the width of the screen and wordwrap is enabled. If the table is smaller than 400 pixels, the cells are merged to the 400 pixel mark. When a paragraph break is encountered, the text is broken. When a file containing Pre elements is opened, the Text Import Wizard determines whether the text is delimited and how to align the text.

Microsoft Excel 97 elements

In Excel 97, the vnd.ms-excel.numberformat style attribute is used to specify the number format using a custom format string. This style attribute is equivalent to the mso-number-format style attribute.
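
Because the two style attributes are equivalent, a tool that processes these files could fold the Excel 97 name into the newer one. The Python sketch below assumes the style declarations have already been parsed into a dictionary; normalize_number_format is a hypothetical helper, and preferring an existing mso-number-format value when both names appear is an assumption.

```python
LEGACY_NAME = "vnd.ms-excel.numberformat"
CURRENT_NAME = "mso-number-format"


def normalize_number_format(props: dict) -> dict:
    """Return a copy of props that uses mso-number-format only."""
    result = dict(props)
    if LEGACY_NAME in result:
        value = result.pop(LEGACY_NAME)
        # Assumption: keep an existing mso-number-format value if present.
        result.setdefault(CURRENT_NAME, value)
    return result
```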

Invalid attribute and element values

When a file containing an invalid or unsupported value is opened, the default value is used.

Invalid worksheet names and index numbers

If the worksheet name is invalid, the prefix recovered followed by an underscore is added in front of the default name of the sheet or chart (for example, recovered_sheet1 and recovered_chart1). If the worksheet index is invalid, the default index is used, based on the position of the worksheet in the workbook.
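
The renaming rule can be sketched in Python as follows. The validity check is an assumption based on commonly documented Excel worksheet naming limits (1 to 31 characters, none of the characters : \ / ? * [ ]); this topic does not define what makes a name invalid, and both function names are hypothetical.

```python
INVALID_CHARS = set(':\\/?*[]')


def is_valid_sheet_name(name: str) -> bool:
    """Simplified validity check (an assumption, see the note above)."""
    return 0 < len(name) <= 31 and not (set(name) & INVALID_CHARS)


def resolve_sheet_name(name: str, default_name: str) -> str:
    """Keep a valid name; otherwise prefix the default name."""
    if is_valid_sheet_name(name):
        return name
    return f"recovered_{default_name}"
```

With a default name of sheet1, an invalid name such as Q1/Q2 resolves to recovered_sheet1, matching the example in the text.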

Missing images and files

If image files are missing and alt text exists, the text is displayed in place of the image. If the worksheet HTML file is missing, a blank worksheet is inserted in its place. If the workbook HTML file is missing, each worksheet HTML file is opened as a separate workbook. If the CSS file is missing, all text is formatted in the Normal style. If any HTML or other formatting is found in the table, it is applied to the text.