# Formatting/displaying JMA XML data in Julia

(JMA is the Japan Meteorological Agency.)

# Motivation

I wrote this to solve an assignment in a university database class: retrieve XML data over REST.

# Overview

  • Get a JMA Atom feed in JMA XML format using HTTP.jl
  • Parse and display the XML data using EzXML.jl

# Main

# The data contents

The following is an excerpt of the data.

The XML data:

```xml
<feed xmlns="http://www.w3.org/2005/Atom" lang="ja">
<title>長期(定時)</title>
<subtitle>JMAXML publishing feed</subtitle>
<updated>2021-05-15T23:12:06+09:00</updated>
<id>http://www.data.jma.go.jp/developer/xml/feed/regular.xml#long_1621087926</id>
<link rel="related" href="http://www.jma.go.jp/"/>
<link rel="self" href="http://www.data.jma.go.jp/developer/xml/feed/regular.xml"/>
<link rel="hub" href="http://alert-hub.appspot.com/"/>
<rights type="html">
<![CDATA[ <a href="http://www.jma.go.jp/jma/kishou/info/coment.html">利用規約</a>, <a href="http://www.jma.go.jp/jma/en/copyright.html">Terms of Use</a> ]]>
</rights>
<entry>
    <title>大雨危険度通知</title>
    <id>http://www.data.jma.go.jp/developer/xml/data/20210515141010_0_VPRN50_010000.xml</id>
    <updated>2021-05-15T14:09:42Z</updated>
    <author>
        <name>気象庁</name>
    </author>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515141010_0_VPRN50_010000.xml"/>
    <content type="text">【大雨危険度通知】</content>
</entry>
<entry>
    <title>地上実況図</title>
    <id>http://www.data.jma.go.jp/developer/xml/data/20210515140100_0_VPRN50_010000.xml</id>
    <updated>2021-05-15T14:00:49Z</updated>
    <author>
        <name>気象庁</name>
    </author>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515140100_0_VPRN50_010000.xml"/>
    <content type="text">【地上実況図】</content>
</entry>
<entry>
...
```


# Import packages

```julia
using Pkg
Pkg.add("HTTP")   # installing is only needed once per environment
Pkg.add("EzXML")
using HTTP
using EzXML
```

If you are using Jupyter, you don't need to run `using XXX` again once a package has been loaded in the kernel session. If you write the code in a `.jl` file and run it from the command line, however, every run starts a new Julia session, so the packages are loaded again each time, which gets frustrating. One suggested solution is PackageCompiler.jl, which can build a custom system image with the packages already compiled into it.

References:
  • PackageCompiler.jl で Plots の呼び出しを高速化する 2020 年 7 月版 (Speeding up loading Plots with PackageCompiler.jl, July 2020 edition) - Qiita
  • PackageCompiler.jl で Makie.jl の呼び出しを速くする (Making Makie.jl load faster with PackageCompiler.jl) - Qiita


# Get an Atom feed with an HTTP request

Use HTTP.jl:

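```julia
req = HTTP.request("GET", "http://www.data.jma.go.jp/developer/xml/feed/regular_l.xml")
# Status code (200 on success)
println(req.status)
# The contents. String(req.body) would empty req.body, so print a copy
# and keep the original bytes for parsexml later.
println(String(copy(req.body)))
```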
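One detail worth knowing at this step: for a `Vector{UInt8}` such as `req.body`, Julia's `String(v)` takes over the byte buffer and truncates `v` to length zero. A second `String(req.body)`, for example in the `parsexml(String(req.body))` call used later, would therefore see an empty body unless the earlier conversion used `String(copy(req.body))`. The offline sketch below shows this with a hypothetical stand-in vector `body_bytes` instead of a real HTTP response:

```julia
# Hypothetical stand-in for req.body: HTTP.jl keeps a response body as a Vector{UInt8}
body_bytes = Vector{UInt8}("<feed xmlns=\"http://www.w3.org/2005/Atom\"/>")

# String(copy(v)) converts a copy, so body_bytes keeps its contents
s1 = String(copy(body_bytes))
println(length(body_bytes))   # still the full byte count

# String(v) takes ownership of the buffer and truncates body_bytes to length 0
s2 = String(body_bytes)
println(isempty(body_bytes))  # true

# Converting the same vector again now yields an empty string
s3 = String(body_bytes)
println(isempty(s3))          # true
```

In practice, converting once and reusing the result (e.g. a hypothetical `xml = String(req.body)`) avoids the problem.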

# Analyze XML data with EzXML.jl

  • Load the XML data

    | Data format | Syntax |
    |---|---|
    | XML file | `readxml(your_xml_file)` |
    | XML string | `parsexml(""" <feed xmlns=" .... """)` |

    The feed body fetched above is a string, so use `parsexml`:


    doc = parsexml(String(req.body))
    primates = root(doc)
    
  • EzXML.jl types: the main types in EzXML.jl (at least the ones used in this post) are `EzXML.Document` and `EzXML.Node`.

  • Get the root of the XML data


     


    `doc` is an `EzXML.Document`. `root(doc)` returns its root element, `<feed>`, as an `EzXML.Node`, which is what we traverse to analyze the contents of the XML data.

  • Analyze the nodes

    `primates` above is an `EzXML.Node`: the root node of `doc`, which is an `EzXML.Document`. What we want to analyze are the child elements of `primates`, and `eachelement()` iterates over them as below.

```julia
for genus in eachelement(primates)
  # Print each child element of <feed>, with its whole subtree, as XML
  println(genus)
end
```

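This prints the following. (This and the later sample outputs show the title 高頻度(定時), "high-frequency (scheduled)", so they appear to have been captured from the high-frequency feed `regular.xml` rather than from the long-term feed `regular_l.xml` (長期(定時)) requested in the code.)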

<title>高頻度(定時)</title>
<subtitle>JMAXML publishing feed</subtitle>
<updated>2021-06-03T02:12:07+09:00</updated>
<id>http://www.data.jma.go.jp/developer/xml/feed/regular.xml#short_1622653927</id>
<link rel="related" href="http://www.jma.go.jp/"/>
<link rel="self" href="http://www.data.jma.go.jp/developer/xml/feed/regular.xml"/>
<link rel="hub" href="http://alert-hub.appspot.com/"/>
<rights type="html"><![CDATA[
<a href="http://www.jma.go.jp/jma/kishou/info/coment.html">利用規約</a>,
<a href="http://www.jma.go.jp/jma/en/copyright.html">Terms of Use</a>
]]></rights>
<entry>
  <title>大雨危険度通知</title>
  <id>http://www.data.jma.go.jp/developer/xml/data/20210602171020_0_VPRN50_010000.xml</id>
  <updated>2021-06-02T17:09:44Z</updated>
  <author>
    <name>気象庁</name>
  </author>
  <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210602171020_0_VPRN50_010000.xml"/>
  <content type="text">【大雨危険度通知】</content>
</entry>
...

In other words, `println(genus)` prints each child element of `<feed>` together with everything nested below it.
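`parsexml`, `readxml`, `root`, and `eachelement` can also be tried without fetching the feed. This minimal sketch assumes a hypothetical `sample` string, a trimmed-down excerpt of the data shown at the top, and writes it to a temporary file only to show that `readxml` yields an equivalent document. It also shows that `eachelement` visits element children only, so the whitespace between tags does not appear:

```julia
using EzXML

# Hypothetical, trimmed-down excerpt of the JMA Atom feed
sample = """
<feed xmlns="http://www.w3.org/2005/Atom" lang="ja">
  <title>長期(定時)</title>
  <entry>
    <title>大雨危険度通知</title>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515141010_0_VPRN50_010000.xml"/>
  </entry>
</feed>
"""

# From a string: parsexml returns an EzXML.Document, root gives the <feed> node
doc = parsexml(sample)
feed = root(doc)
println(typeof(doc), " / ", typeof(feed))

# From a file: write the excerpt to a temporary file, read it back, then clean up
path = tempname() * ".xml"
write(path, sample)
doc2 = readxml(path)
rm(path)

# eachelement skips text nodes, so only the element children are listed
child_names = [child.name for child in eachelement(feed)]
println(child_names)   # ["title", "entry"]
```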

  • Format and print the node contents

    To check the `<title>`, `<id>`, `<updated>`, etc. nested inside `<entry>`, you can nest `for` loops. To output only a node's name, use `.name` (and `.content` for its text content), as follows.

    For example, if the data is `<title>大雨危険度通知</title>`, you get:

    primates = root(parsexml("""<title>大雨危険度通知</title>"""))
    println(primates.name ,":", primates.content)
    # > title:大雨危険度通知
    
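    Applied to the whole feed:

    ```julia
    primates = root(doc)  # make sure primates is the <feed> root node again
    for genus in eachelement(primates)
        # Print each top-level child of <feed> (except <entry>) as name: text
        if genus.name != "entry"
            println("--- ", genus.name, ": ", genus.content)
        end
        # Print the children of each element, e.g. the fields of an <entry>
        for species in eachelement(genus)
            println(" └-- ", species.name, ": ", species.content)
        end
        println("---------------------------------------")
    end
    ```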
    
        --- title: 高頻度(定時)
        ---------------------------------------
        --- subtitle: JMAXML publishing feed
        ---------------------------------------
        --- updated: 2021-06-03T02:22:06+09:00
        ---------------------------------------
        --- id: http://www.data.jma.go.jp/developer/xml/feed/regular.xml#short_1622654526
        ---------------------------------------
        --- link:
        ---------------------------------------
        --- link:
        ---------------------------------------
        --- link:
        ---------------------------------------
        --- rights:
        <a href="http://www.jma.go.jp/jma/kishou/info/coment.html">利用規約</a>,
        <a href="http://www.jma.go.jp/jma/en/copyright.html">Terms of Use</a>
    
        ---------------------------------------
        └-- title: 大雨危険度通知
        └-- id: http://www.data.jma.go.jp/developer/xml/data/20210602172011_0_VPRN50_010000.xml
        └-- updated: 2021-06-02T17:19:42Z
        └-- author:
            気象庁
    
        └-- link:
        └-- content: 【大雨危険度通知】
        ...
    
  • Print the `href` attribute of each node

    The output above lacks attribute values such as the `href` of `<link>`, because `.content` returns only the text content of a node. To get an attribute (e.g. `href`), index the node by the attribute name, as in `genus["href"]`; the value is returned as a string.

    ...
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210602171020_0_VPRN50_010000.xml"/>
    ...
    

    For example, for the `<link>` element above:

    primates = root(parsexml("""<link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210602171020_0_VPRN50_010000.xml"/>"""))
    println(primates["href"])
    
    # > http://www.data.jma.go.jp/developer/xml/data/20210602171020_0_VPRN50_010000.xml
    
    Applied to the whole feed:

    ```julia
    primates = root(doc)  # make sure primates is the <feed> root node again
    for genus in eachelement(primates)
        # Top-level children of <feed>: print <link> by its href, the others (except <entry>) by their text
        if genus.name != "entry" && genus.name != "link"
            println("--- ", genus.name, ": ", genus.content)
        elseif genus.name == "link"
            println("--- ", genus.name, ": ", genus["href"])
        end
        # Children of each element, e.g. the fields of an <entry>
        for species in eachelement(genus)
            if species.name == "link"
                println("--- ", species.name, ": ", species["href"])
            else
                println(" └-- ", species.name, ": ", species.content)
            end
        end
        println("---------------------------------------")
    end
    ```
    
        ...
        --- id: http://www.data.jma.go.jp/developer/xml/feed/regular.xml#short_1622655067
        ---------------------------------------
        --- link: http://www.jma.go.jp/
        ---------------------------------------
        --- link: http://www.data.jma.go.jp/developer/xml/feed/regular.xml
        ---------------------------------------
        --- link: http://alert-hub.appspot.com/
        ---------------------------------------
        ...
    
        ---------------------------------------
        └-- title: 大雨危険度通知
        └-- id: http://www.data.jma.go.jp/developer/xml/data/20210602173000_0_VPRN50_010000.xml
        └-- updated: 2021-06-02T17:29:38Z
        └-- author:
            気象庁
    
        --- link: http://www.data.jma.go.jp/developer/xml/data/20210602173000_0_VPRN50_010000.xml
        └-- content: 【大雨危険度通知】
        ---------------------------------------
        ...
    
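As an alternative to nested `for` loops with name checks, EzXML.jl can run XPath queries via `findall` and `findfirst` (a different technique from the one used above). Because the feed declares the Atom namespace as its default namespace, the query must bind that namespace to a prefix; `a` below is an arbitrary choice, and a plain `//entry` would match nothing. `sample` is again a hypothetical excerpt of the feed:

```julia
using EzXML

# Hypothetical excerpt of the feed with two entries
sample = """
<feed xmlns="http://www.w3.org/2005/Atom" lang="ja">
  <title>長期(定時)</title>
  <entry>
    <title>大雨危険度通知</title>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515141010_0_VPRN50_010000.xml"/>
  </entry>
  <entry>
    <title>地上実況図</title>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515140100_0_VPRN50_010000.xml"/>
  </entry>
</feed>
"""

feed = root(parsexml(sample))

# Bind the default Atom namespace to the (arbitrary) prefix "a" for XPath
ns = ["a" => "http://www.w3.org/2005/Atom"]

entries = findall("//a:entry", feed, ns)
for entry in entries
    # Relative queries are evaluated against the current <entry>
    title = findfirst("a:title", entry, ns)
    link = findfirst("a:link", entry, ns)
    println(title.content, " => ", link["href"])
end
```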

EzXML.jl has other useful functions as well, for instance `hasnode(node)`, which checks whether a node has a child node, and `istext(node)`, which checks whether a node is a text node. See the Reference page of the EzXML.jl documentation for details.
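A minimal sketch of `hasnode` and `istext` on a hypothetical inline `<entry>`. It also uses `firstelement`, `lastelement`, and `firstnode` to pick nodes, and `haskey(node, name)`, which checks whether an element has a given attribute (useful before indexing with `node["href"]`):

```julia
using EzXML

# Hypothetical <entry> without whitespace, so its children are exactly <title> and <link>
entry = root(parsexml("""<entry xmlns="http://www.w3.org/2005/Atom"><title>大雨危険度通知</title><link href="http://www.jma.go.jp/"/></entry>"""))

title = firstelement(entry)   # <title>
link = lastelement(entry)     # <link .../>

println(hasnode(title))            # true: <title> has a text child
println(hasnode(link))             # false: <link/> has no children (attributes are not child nodes)
println(istext(firstnode(title)))  # true: the child of <title> is a text node
println(haskey(link, "href"))      # true
println(haskey(title, "href"))     # false
```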


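Putting it all together (this version keeps the held-back `<id>` in a variable local to each loop iteration instead of using global variables):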

```julia
using Pkg
Pkg.add("HTTP")   # installing is only needed once per environment
Pkg.add("EzXML")
using HTTP
using EzXML

req = HTTP.request("GET", "http://www.data.jma.go.jp/developer/xml/feed/regular_l.xml")
doc = parsexml(String(req.body))
primates = root(doc)   # the <feed> element

println("\n\n\n\n===== XML =====")

for genus in eachelement(primates)
    # Top-level children of <feed>: print <link> by its href, the others (except <entry>) by their text
    if genus.name != "entry" && genus.name != "link"
        println("--- ", genus.name, ": ", genus.content)
    elseif genus.name == "link"
        println("--- ", genus.name, ": ", genus["href"])
    end
    # Print the non-empty children (e.g. the fields of an <entry>), holding back <id> to print it last
    id_content = nothing
    for species in eachelement(genus)
        if species.name == "id"
            id_content = species.content
        elseif species.content != ""
            println(" └-- ", species.name, ": ", species.content)
        end
    end
    if id_content !== nothing
        println(" └-- id: ", id_content)
    end
    println("---------------------------------------")
end
```

output:


===== XML =====
--- title: 高頻度(定時)
---------------------------------------
--- subtitle: JMAXML publishing feed
---------------------------------------
--- updated: 2021-06-03T03:02:07+09:00
---------------------------------------
--- id: http://www.data.jma.go.jp/developer/xml/feed/regular.xml#short_1622656927
---------------------------------------
--- link: http://www.jma.go.jp/
---------------------------------------
--- link: http://www.data.jma.go.jp/developer/xml/feed/regular.xml
---------------------------------------
--- link: http://alert-hub.appspot.com/
---------------------------------------
--- rights:
<a href="http://www.jma.go.jp/jma/kishou/info/coment.html">利用規約</a>,
<a href="http://www.jma.go.jp/jma/en/copyright.html">Terms of Use</a>

---------------------------------------
 └-- title: 大雨危険度通知
 └-- updated: 2021-06-02T17:59:40Z
 └-- author:
      気象庁

 └-- content: 【大雨危険度通知】
 └-- id: http://www.data.jma.go.jp/developer/xml/data/20210602180010_0_VPRN50_010000.xml
  --- title: 高頻度(定時)
  --- subtitle: JMAXML publishing feed
  --- updated: 2021-06-03T03:02:07+09:00
  --- id: http://www.data.jma.go.jp/developer/xml/feed/regular.xml#short_1622656927
---------------------------------------
 └-- title: 大雨危険度通知
 └-- updated: 2021-06-02T17:49:37Z
 └-- author:
      気象庁

 └-- id: http://www.data.jma.go.jp/developer/xml/data/20210602175001_0_VPRN50_010000.xml
  --- title: 高頻度(定時)
  --- subtitle: JMAXML publishing feed
  --- updated: 2021-06-03T03:02:07+09:00
  --- id: http://www.data.jma.go.jp/developer/xml/feed/regular.xml#short_1622656927
---------------------------------------
...
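If the goal is to display entries rather than the whole tree, one option is to collect each `<entry>` into a small record first and format it afterwards. This is a sketch of one possible approach, not the method used above: `entry_rows` is a hypothetical helper that assumes each entry has `<title>`, `<updated>`, and a `<link>` with an `href`, and it runs on an inline `sample` excerpt so it works offline. With the live feed, the call would be `entry_rows(String(req.body))`.

```julia
using EzXML

# Hypothetical helper: turn each <entry> of an Atom feed into a NamedTuple
function entry_rows(xml::AbstractString)
    feed = root(parsexml(xml))
    rows = NamedTuple{(:title, :updated, :href), NTuple{3, String}}[]
    for entry in eachelement(feed)
        entry.name == "entry" || continue   # skip <title>, <link>, ... of the feed itself
        title = updated = href = ""
        for field in eachelement(entry)
            if field.name == "title"
                title = field.content
            elseif field.name == "updated"
                updated = field.content
            elseif field.name == "link" && haskey(field, "href")
                href = field["href"]
            end
        end
        push!(rows, (title = title, updated = updated, href = href))
    end
    return rows
end

# Hypothetical excerpt of the feed
sample = """
<feed xmlns="http://www.w3.org/2005/Atom" lang="ja">
  <title>長期(定時)</title>
  <entry>
    <title>大雨危険度通知</title>
    <updated>2021-05-15T14:09:42Z</updated>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515141010_0_VPRN50_010000.xml"/>
  </entry>
  <entry>
    <title>地上実況図</title>
    <updated>2021-05-15T14:00:49Z</updated>
    <link type="application/xml" href="http://www.data.jma.go.jp/developer/xml/data/20210515140100_0_VPRN50_010000.xml"/>
  </entry>
</feed>
"""

# One line per entry: updated time, title, link to the detailed XML
for row in entry_rows(sample)
    println(row.updated, "  ", row.title, "  ", row.href)
end
```

Separating extraction from printing keeps the loop logic in one place and lets the same rows be printed, filtered, or stored elsewhere.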

# Reference

  • JuliaIO/EzXML.jl - GitHub
  • EzXML.jl を作った話 (The story of making EzXML.jl) - りんごがでている - Hatena Blog
  • PackageCompiler.jl で Plots の呼び出しを高速化する 2020 年 7 月版 (Speeding up loading Plots with PackageCompiler.jl, July 2020 edition) - Qiita
  • PackageCompiler.jl で Makie.jl の呼び出しを速くする (Making Makie.jl load faster with PackageCompiler.jl) - Qiita

Last Updated: 12/6/2021, 7:41:52 PM