How to Extract Attribute Values from XML/HTML Data

This post shows how to extract attribute values from XML or HTML data in a range of cells in Excel, that is, how to strip the values out of XML or other markup stored in Excel cells.

Extract Attribute Values from XML

Assume that you have a table whose cells contain markup, such as XML or HTML. If you want to extract only the attribute values (here, the text between the opening and closing tags), you can combine the MID function and the LEN function in a single Excel formula.

The markup values in cells B1:B3 look like this:

<item>excel</item>

<item>word</item>

<item>ppt</item>

To extract the attribute value from cell B1, you can use the following formula:

=MID(B1,7,LEN(B1)-13)

Let’s see how this formula works:

=LEN(B1)

The LEN function returns the length of the text string in cell B1. For <item>excel</item>, it returns 18.

=LEN(B1)-13

This formula returns the length of the attribute value in cell B1 by subtracting 13 from the length of the cell's text. The number 13 is the combined length of the starting tag (<item>, 6 characters) and the closing tag (</item>, 7 characters).

=MID(B1,7,LEN(B1)-13)

The starting tag is 6 characters long, so the first character of the attribute value in cell B1 is at position 7. The num_chars argument is the result of LEN(B1)-13 above, which is 18-13 = 5. So the MID function extracts the attribute value between the two tags: “excel”.
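For readers who want to check the arithmetic outside Excel, the same steps can be sketched in Python. The helper names excel_len and excel_mid are hypothetical stand-ins for LEN and MID, written only for this illustration, and the content of B1 is assumed to be the first sample value above.

```python
def excel_len(text: str) -> int:
    """Stand-in for Excel's LEN: the number of characters in text."""
    return len(text)


def excel_mid(text: str, start_num: int, num_chars: int) -> str:
    """Stand-in for Excel's MID: start_num is 1-based, as in Excel."""
    start = start_num - 1  # convert to Python's 0-based index
    return text[start:start + num_chars]


b1 = "<item>excel</item>"             # assumed content of cell B1
total = excel_len(b1)                 # =LEN(B1)
num_chars = total - 13                # =LEN(B1)-13, where 13 = 6 + 7
value = excel_mid(b1, 7, num_chars)   # =MID(B1,7,LEN(B1)-13)
print(total, num_chars, value)
```

The only real difference from the Excel formula is the index conversion: MID counts characters from 1, while Python slices count from 0.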

Note: you can drag the Fill Handle down to apply the formula to the other cells (B2:B3). Because the numbers 7 and 13 are hard-coded, the formula works as written only for the <item> tag.
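As a rough illustration of what filling the formula down does, the sketch below applies the same calculation to all three sample values. It is written in Python rather than Excel, the names OPEN_TAG, CLOSE_TAG and extract_value are made up for this example, and it assumes every cell uses exactly the <item> tag pair.

```python
# Sample values assumed to sit in cells B1:B3.
cells = ["<item>excel</item>", "<item>word</item>", "<item>ppt</item>"]

OPEN_TAG = "<item>"    # 6 characters, so the value starts at position 7
CLOSE_TAG = "</item>"  # 7 characters, so both tags together take 13


def extract_value(cell: str) -> str:
    """Mirror =MID(cell,7,LEN(cell)-13) for the <item> tag pair."""
    start = len(OPEN_TAG)  # 0-based index 6 is Excel position 7
    num_chars = len(cell) - (len(OPEN_TAG) + len(CLOSE_TAG))
    return cell[start:start + num_chars]


values = [extract_value(cell) for cell in cells]
print(values)
```

Deriving the offsets from the tag strings, instead of typing 7 and 13, makes it easier to see what would have to change for a different tag name.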


Related Formulas

  • Remove Numeric Characters from a Cell
    If you want to remove numeric characters from an alphanumeric string, you can use an array formula that combines the TEXTJOIN function, the MID function, the ROW function, and the INDIRECT function…
  • Split Text String to an Array
    If you want to convert a text string into an array in which each character is a separate element, you can use an Excel formula. The post explains how to combine the MID function, the ROW function, the INDIRECT function and the LEN function to split a string…
  • Remove Non-Numeric Characters from a Cell
    If you want to remove non-numeric characters from a text cell in Excel, you can use the array formula: {=TEXTJOIN("",TRUE,IFERROR(MID(B1,ROW(INDIRECT("1:"&LEN(B1))),1)+0,""))}
  • Get the Position of Last Occurrence
    If you want to get the position of the last occurrence of a character in a cell, you can combine the LOOKUP function, the MID function, the ROW function, the INDIRECT function and the LEN function in an Excel formula…

Related Functions

  • Excel MID function
    The Excel MID function returns a substring from a text string, starting at the position that you specify. The syntax of the MID function is as below: =MID(text, start_num, num_chars)…
  • Excel LEN function
    The Excel LEN function returns the length of a text string (the number of characters in a text string). The LEN function is a built-in function in Microsoft Excel and is categorized as a Text Function. The syntax of the LEN function is as below: =LEN(text)…
