XML Parser in Java

In this article we will see how to read and manipulate XML data in Java. There are many built-in as well as external APIs for reading and manipulating XML data in Java, but we will use the built-in DOM parser and the XPath API.

Below is the XML that we will be updating:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Jobs>
	<Job id="0">
		<position>Data Analyst</position>
		<skill>Python</skill>
		<vacancies>3</vacancies>
	</Job>
	<Job id="2">
		<position>Developer</position>
		<skill>CSS</skill>
		<vacancies>8</vacancies>
	</Job>
	<Job id="3">
		<position>Developer</position>
		<skill>SpringBoot</skill>
		<vacancies>1</vacancies>
	</Job>
</Jobs>

Below is the output XML after manipulation:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Jobs>
	<Job id="1">
		<position>Data Analyst</position>
		<skill>Python</skill>
		<vacancies>4</vacancies>
	</Job>
	<Job id="2">
		<position>Developer</position>
		<skill>CSS</skill>
		<vacancies>9</vacancies>
		<salary>100K</salary>
	</Job>
	<Job id="3">
		<position>Developer</position>
		<skill>SpringBoot</skill>
		<vacancies>2</vacancies>
		<salary>100K</salary>
	</Job>
</Jobs>
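
Below is the code that performs the manipulation:

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class AppMain {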

	public static void main(String[] args) {
		
		
		String sourcefilePath="D:\\Developers.xml";
		String destinationfilePath="D:\\Developers_updated.xml";
		File xmlFile=new File(sourcefilePath);
		try {
			DocumentBuilderFactory documentFactory=DocumentBuilderFactory.newInstance();
			DocumentBuilder documentBuilder=documentFactory.newDocumentBuilder();
			Document document = documentBuilder.parse(xmlFile);
			
			//merge adjacent text nodes and drop empty ones so the tree is in normalized form
			document.getDocumentElement().normalize();
			
			//using xPath API to query XML
			XPath xPath = XPathFactory.newInstance().newXPath();
			
			//XPath expression to get the list of Job nodes inside the root Jobs node
			NodeList jobList = (NodeList) xPath.compile("/Jobs/Job").evaluate(document, XPathConstants.NODESET);
			
			//update attribute value for a node with id=0 to id=1
			for(int i=0;i<jobList.getLength();i++) {
				Node idAttribute = jobList.item(i).getAttributes().getNamedItem("id");
				if(idAttribute.getTextContent().equalsIgnoreCase("0")) {
					idAttribute.setTextContent("1");
				}
			}
			
			
			//increment the vacancies count of every job by 1
			NodeList vacancies = (NodeList) xPath.compile("/Jobs/Job/vacancies").evaluate(document, XPathConstants.NODESET);
			for (int i = 0; i < vacancies.getLength(); i++) {
				Node vacancy = vacancies.item(i);
				//read the element text directly and trim it before parsing
				int newVacancy=Integer.parseInt(vacancy.getTextContent().trim())+1;
				vacancy.setTextContent(String.valueOf(newVacancy));
			}
			
			//create new element salary and insert to all
			for(int i=0;i<jobList.getLength();i++) {
				Element salary = document.createElement("salary");
				salary.appendChild(document.createTextNode("100K"));
				jobList.item(i).appendChild(salary);
			}
			
			
			//to remove salary tag from the first job
			//getChildNodes() returns a live list, so iterate backwards while removing
			Node firstJob = jobList.item(0);
			NodeList childNodes = firstJob.getChildNodes();
			for(int j=childNodes.getLength()-1;j>=0;j--) {
				Node child = childNodes.item(j);
				if(child.getNodeName().equals("salary")) {
					//firstJob is the <Job> node; pass the child node to be removed
					firstJob.removeChild(child);
				}
			}
			
			
			// write the content back into xml file
			TransformerFactory transformerFactory = TransformerFactory.newInstance();
			Transformer transformer = transformerFactory.newTransformer();
			transformer.setOutputProperty(OutputKeys.INDENT, "yes");
			transformer.setOutputProperty(OutputKeys.METHOD, "xml");
			transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "8");
			DOMSource source = new DOMSource(document);
			StreamResult result = new StreamResult(new File(destinationfilePath));
			transformer.transform(source, result);

			System.out.println("XML File update completed");
		} catch (SAXException | ParserConfigurationException  | IOException  | TransformerException | XPathExpressionException e) {
			e.printStackTrace();
		} 
	}

}

The DOM parser loads the file into memory, and XPath offers various querying options to fetch the required nodes through path expressions. XPath is a powerful API that lets you query data not only by element or attribute names but also by the content of those elements or the values of those attributes.
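
To illustrate querying by value, here is a minimal sketch that runs a few XPath predicates against an inline copy of the input XML from this article. The class name XPathQueryExample and the helper query are hypothetical names chosen for this example, and the XML is embedded as a string only so that the example needs no file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathQueryExample {

	// Inline copy of the input XML from this article, so no file is needed
	static final String XML = "<Jobs>"
			+ "<Job id=\"0\"><position>Data Analyst</position><skill>Python</skill><vacancies>3</vacancies></Job>"
			+ "<Job id=\"2\"><position>Developer</position><skill>CSS</skill><vacancies>8</vacancies></Job>"
			+ "<Job id=\"3\"><position>Developer</position><skill>SpringBoot</skill><vacancies>1</vacancies></Job>"
			+ "</Jobs>";

	// Hypothetical helper: evaluates an XPath expression and returns the text of every matching node
	static List<String> query(String xml, String expression) {
		try {
			Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
					.parse(new InputSource(new StringReader(xml)));
			XPath xPath = XPathFactory.newInstance().newXPath();
			NodeList nodes = (NodeList) xPath.evaluate(expression, document, XPathConstants.NODESET);
			List<String> texts = new ArrayList<>();
			for (int i = 0; i < nodes.getLength(); i++) {
				texts.add(nodes.item(i).getTextContent());
			}
			return texts;
		} catch (Exception e) {
			throw new IllegalStateException("XPath query failed: " + expression, e);
		}
	}

	public static void main(String[] args) {
		// select by attribute value
		System.out.println(query(XML, "/Jobs/Job[@id='2']/position"));
		// select by the text content of a child element
		System.out.println(query(XML, "/Jobs/Job[position='Developer']/skill"));
		// select by a numeric comparison on element content
		System.out.println(query(XML, "/Jobs/Job[vacancies > 2]/skill"));
	}
}
```

The part in square brackets is an XPath predicate: it filters the Job nodes before the rest of the path is applied, so no loop with manual if checks is needed.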

The DOM parser may not be suitable for parsing huge files because it loads the entire document into memory. In such cases the alternatives are the SAX parser, JAXB etc., each of which has its own advantages and disadvantages.
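
As a rough sketch of the SAX alternative, the example below adds up all the vacancies values without building a tree in memory. The class SaxVacancyCounter, the handler VacancyHandler and the method totalVacancies are hypothetical names for this illustration. The XML is again an inline string; for a genuinely large document you would more likely pass a File or an InputStream to the parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVacancyCounter {

	// Inline copy of the input XML from this article, so no file is needed
	static final String XML = "<Jobs>"
			+ "<Job id=\"0\"><position>Data Analyst</position><skill>Python</skill><vacancies>3</vacancies></Job>"
			+ "<Job id=\"2\"><position>Developer</position><skill>CSS</skill><vacancies>8</vacancies></Job>"
			+ "<Job id=\"3\"><position>Developer</position><skill>SpringBoot</skill><vacancies>1</vacancies></Job>"
			+ "</Jobs>";

	// Handler that receives parse events and keeps only a running total
	static class VacancyHandler extends DefaultHandler {
		private final StringBuilder text = new StringBuilder();
		private boolean insideVacancies;
		int total;

		@Override
		public void startElement(String uri, String localName, String qName, Attributes attributes) {
			if ("vacancies".equals(qName)) {
				insideVacancies = true;
				text.setLength(0);
			}
		}

		@Override
		public void characters(char[] ch, int start, int length) {
			// text may arrive in several chunks, so accumulate it
			if (insideVacancies) {
				text.append(ch, start, length);
			}
		}

		@Override
		public void endElement(String uri, String localName, String qName) {
			if ("vacancies".equals(qName)) {
				total += Integer.parseInt(text.toString().trim());
				insideVacancies = false;
			}
		}
	}

	// Streams through the XML once and returns the sum of all <vacancies> values
	static int totalVacancies(String xml) {
		try {
			VacancyHandler handler = new VacancyHandler();
			SAXParserFactory.newInstance().newSAXParser()
					.parse(new InputSource(new StringReader(xml)), handler);
			return handler.total;
		} catch (Exception e) {
			throw new IllegalStateException("SAX parsing failed", e);
		}
	}

	public static void main(String[] args) {
		System.out.println(totalVacancies(XML)); // 3 + 8 + 1 = 12
	}
}
```

The trade-off is that SAX only reads: you see each element once as it streams past, so in-place updates like the ones in the DOM example above need extra work.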

Feel free to leave questions in the comments.
