Convert HTML to XML in Java

Can anyone suggest the best approach for converting HTML to XML in Java? Is there an API available for that? The HTML might also contain JavaScript code.

I have tried the following code:

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

class HTML2XML {
    public static void main(String args[]) throws JDOMException {
        InputStream isInHtml = null;
        URL url = null;
        URLConnection connection = null;
        DataInputStream disInHtml = null;
        FileOutputStream fosOutHtml = null;
        FileWriter fwOutXml = null;
        FileReader frInHtml = null;
        BufferedWriter bwOutXml = null;
        BufferedReader brInHtml = null;
        try {
            // url = new URL("www.climb.co.jp");
            // connection = url.openConnection();
            // isInHtml = connection.getInputStream();

            frInHtml = new FileReader("D:\\Second.html");
            brInHtml = new BufferedReader(frInHtml);
            SAXBuilder saxBuilder = new SAXBuilder(
                    "org.ccil.cowan.tagsoup.Parser", false);
            org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);

            XMLOutputter outputter = new XMLOutputter();
            org.jdom.output.Format newFormat = outputter.getFormat();
            String encoding = "iso-8859-2";
            newFormat.setEncoding(encoding);
            outputter.setFormat(newFormat);

            try {
                outputter.output(jdomDocument, System.out);
                fwOutXml = new FileWriter("D:\\Second.xml");
                bwOutXml = new BufferedWriter(fwOutXml);
                outputter.output(jdomDocument, bwOutXml);
                System.out.flush();
            } catch (IOException e) {
            }

        } catch (IOException e) {
        } finally {
            System.out.flush();
            try {
                isInHtml.close();
                disInHtml.close();
                fosOutHtml.flush();
                fosOutHtml.getFD().sync();
                fosOutHtml.close();
                fwOutXml.flush();
                fwOutXml.close();
                bwOutXml.close();
            } catch (Exception w) {

            }
        }
    }
}

But it's not working as expected.
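The question does not say what goes wrong, but a few things in that code can break the output. isInHtml, disInHtml and fosOutHtml are never assigned, so isInHtml.close() in the finally block throws a NullPointerException; the empty catch (Exception w) swallows it, and fwOutXml and bwOutXml are never closed, so anything still in their buffers at exit is lost. Even without that, fwOutXml is closed before the BufferedWriter wrapped around it, and every IOException is silently discarded. Finally, Format.setEncoding("iso-8859-2") only changes the XML declaration, while FileWriter encodes with the platform default charset, so non-ASCII characters can end up mislabeled.

If adding TagSoup and JDOM is not a requirement, the JDK's own lenient HTML parser (javax.swing.text.html.parser.ParserDelegator, which is based on an HTML 3.2 DTD) can drive a StAX XMLStreamWriter. The class below is a minimal sketch under those assumptions, for Java 8 or later; the class name HtmlToXml and its helpers are hypothetical. It keeps a stack of open elements so the output is always well-formed, drops stray end tags, text outside the root, and names that are not plain XML names, and writes comments as XML comments with any "--" broken up. As far as I know this parser also reports the body of a script element as a comment, so JavaScript survives as comment text. It is not a substitute for a full HTML5 parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: lenient HTML in, well-formed XML out, using only the JDK.
class HtmlToXml {
    @FunctionalInterface
    private interface XmlAction { void run() throws XMLStreamException; }

    // Parser callbacks cannot throw checked exceptions, so wrap StAX errors.
    private static void unchecked(XmlAction action) {
        try { action.run(); } catch (XMLStreamException e) { throw new IllegalStateException(e); }
    }

    private static boolean isName(String s) { return s.matches("[A-Za-z_][A-Za-z0-9._-]*"); }

    // Remove control characters that XML 1.0 does not allow.
    private static String clean(String s) { return s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", ""); }

    private static void writeAttributes(XMLStreamWriter xml, MutableAttributeSet attrs) throws XMLStreamException {
        Enumeration<?> names = attrs.getAttributeNames();
        while (names.hasMoreElements()) {
            Object key = names.nextElement();
            // Skip the parser's "implied tag" marker and anything that is not a plain XML name.
            if (HTMLEditorKit.ParserCallback.IMPLIED.equals(key) || !isName(key.toString())) continue;
            xml.writeAttribute(key.toString(), clean(String.valueOf(attrs.getAttribute(key))));
        }
    }

    static String convert(String html) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            Deque<HTML.Tag> open = new ArrayDeque<>(); // elements written but not yet closed
            xml.writeStartDocument();
            HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
                private boolean rootWritten;

                @Override public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                    // XML allows a single root element; also skip tags with non-XML names.
                    if ((open.isEmpty() && rootWritten) || !isName(t.toString())) return;
                    unchecked(() -> { xml.writeStartElement(t.toString()); writeAttributes(xml, a); });
                    open.push(t);
                    rootWritten = true;
                }

                @Override public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                    // Empty elements such as <br>; end tags of unknown elements come through here
                    // marked with HTML.Attribute.ENDTAG and are dropped.
                    if (open.isEmpty() || a.isDefined(HTML.Attribute.ENDTAG) || !isName(t.toString())) return;
                    unchecked(() -> { xml.writeEmptyElement(t.toString()); writeAttributes(xml, a); });
                }

                @Override public void handleEndTag(HTML.Tag t, int pos) {
                    if (!open.contains(t)) return; // ignore end tags that were never opened
                    HTML.Tag popped;
                    do { // close anything still open inside t, then t itself
                        popped = open.pop();
                        unchecked(xml::writeEndElement);
                    } while (!popped.equals(t));
                }

                @Override public void handleText(char[] data, int pos) {
                    if (open.isEmpty()) return; // no character data outside the root element
                    unchecked(() -> xml.writeCharacters(clean(new String(data))));
                }

                @Override public void handleComment(char[] data, int pos) {
                    String text = clean(new String(data));
                    while (text.contains("--")) text = text.replace("--", "- -"); // "--" is illegal in XML comments
                    String safe = text.endsWith("-") ? text + " " : text;         // so is a trailing "-"
                    unchecked(() -> xml.writeComment(safe));
                }
            };
            new ParserDelegator().parse(new StringReader(html), callback, true);
            while (!open.isEmpty()) { open.pop(); xml.writeEndElement(); }
            xml.writeEndDocument();
            xml.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Usage: java HtmlToXml input.html output.xml (assumes the input file is UTF-8).
    public static void main(String[] args) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), convert(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

The XML declaration written by writeStartDocument() carries no encoding, so XML parsers assume UTF-8; writing the result as UTF-8 bytes, as main does, keeps the declaration and the file contents consistent, which is exactly where the FileWriter version goes wrong.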

To convert HTML to XML, we'll use Aspose.PDF for Java, a commercial document-processing library that can load HTML documents and save them in an XML format. It is published in Aspose's own Maven repository, so add the following repository (inside <repositories>) and dependency (inside <dependencies>) to your pom.xml.

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>version of aspose-pdf API</version>
    <classifier>jdk17</classifier>
</dependency>

Steps to Convert HTML to XML via Java

With Aspose.PDF, an HTML file can be loaded and saved as XML in a few lines of code:

  1. Load the HTML file into a new Document, using HtmlLoadOptions so that it is read as HTML
  2. Call Document.save with the output file path and SaveFormat.Xml to write the XML file

System Requirements

Aspose.PDF for Java is supported on all major operating systems. Make sure you have the following prerequisites:

  • Microsoft Windows or another compatible OS with a Java Runtime Environment, for JSP/JSF and desktop applications.
  • A development environment such as Eclipse or IntelliJ IDEA.
  • The Aspose.PDF for Java library referenced in your project.

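The following sample code converts an HTML file to XML in Java. Be aware that SaveFormat.Xml may follow Aspose.PDF's own document schema rather than mirroring the original HTML markup, so check the output against the XML structure you need.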

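import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlLoadOptions;
import com.aspose.pdf.SaveFormat;

// load HTML with an instance of Document (HtmlLoadOptions marks the input as HTML, not PDF)
Document document = new Document("template.html", new HtmlLoadOptions());
// save document in XML format
document.save("output.xml", SaveFormat.Xml);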
