CODE WITH MARTIN

Data Formats

This chapter introduces working with data formats in Python.


Introduction

As a backend developer, you need to be aware of the most common ways data is formatted as it's passed between different systems. A lot of backend systems, for example, communicate with websites, and the data format known as JSON is the most popular for this because of how easy it is to work with in web browsers.

Another common data format is CSV, which suits software like Excel or Google Sheets. It's also a very useful format for pulling data out of one database and then importing it into another.

And finally, there's XML, which is still very popular for backend systems talking to other backend systems. It's also worth knowing because web pages are written in HTML, a closely related markup language built from the same kind of tags and attributes.

We're going to see how to work with all three of these formats in Python. I suggest creating a folder somewhere where we can place some new Python code files and some other files we'll be creating to support the examples in this chapter. I've created a folder named "BackendData" for example in a folder where I store some other projects and code files.

CSV

CSV stands for Comma-Separated Values. It is a human-readable data format, meaning that if you open a CSV file (a file with the extension '.csv') in a text editor, you can easily read the values stored in it. You can also create or modify CSV files using a text editor like Notepad.

Let's take a look at a CSV file. Imagine someone in a business sent you a spreadsheet containing the names of some employees. They attached a file to an email with the filename 'Employees.csv'. So you save the file on to your computer. If you were to open that file using a text editor such as VS Code, TextEdit, or Notepad for example, you would see the following:

1,Martin,Software Developer,2020-05-01
2,Emily,Senior Software Developer,2017-09-22
3,Chris,Project Manager,2019-11-15

What we're looking at are 3 rows of data. Each row of data is on its own line. Each row also has 4 columns. We know there are 4 columns because the commas in each line separate one column's value from the next.

You won't know what the columns are exactly, but we might be able to take a guess that the first column is an ID, the second is a name, the third is a job title, and the fourth is a date, which we might assume is an employment start date given that our file is named 'Employees.csv'.

Sometimes (it's optional), CSV files are formatted so that the first row contains the names of the columns. If our file had a column name row, it might look like this:

ID,Name,Job Title,Start Date
1,Martin,Software Developer,2020-05-01
2,Emily,Senior Software Developer,2017-09-22
3,Chris,Project Manager,2019-11-15

And finally, depending on who created the CSV, or whether it was exported from a program like Microsoft Excel or Google Sheets, values in the file can be enclosed in quotation marks, like this:

"ID","Name","Job Title","Start Date"
1,"Martin","Software Developer","2020-05-01"
2,"Emily","Senior Software Developer","2017-09-22"
3,"Chris","Project Manager","2019-11-15"

Notice how the ID numbers don't have quotation marks around them? A common convention is to put quotation marks around values that are purely text and leave numbers unquoted. The CSV format itself doesn't define data types, but this convention can help programs decide how to treat each value when they import the file. Quotation marks are also what allow a value to contain a comma without it being mistaken for a new column.
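Python's CSV module, which we'll meet in the next section, can follow the same convention. As a small sketch, passing 'quoting=csv.QUOTE_NONNUMERIC' to the reader turns every unquoted value into a float and leaves quoted values as text. The sample data is held in a string here purely so the example runs on its own.

```python
import csv
import io

# The same sample data as above, held in a string instead of a file.
sample = (
    '"ID","Name","Job Title","Start Date"\n'
    '1,"Martin","Software Developer","2020-05-01"\n'
    '2,"Emily","Senior Software Developer","2017-09-22"\n'
)

# QUOTE_NONNUMERIC tells the reader to convert unquoted values to floats.
reader = csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)

for row in rows:
    print(row)
```

Without that option, every value comes back as text, including the ID numbers.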

Loading CSV Files

Python has a native way to read CSV files, taking care of all the different quirky ways of how the file might be formatted.

Let's create our own CSV file to work with. In the folder where you plan on creating your code files for this chapter (let's call this the work folder), create a new file in a text editor program like VSCode, TextEdit or Notepad. Place the following text in to the file:

"ID","Name","Job Title","Start Date"
1,"Martin","Software Developer","2020-05-01"
2,"Emily","Senior Software Developer","2017-09-22"
3,"Chris","Project Manager","2019-11-15"

Now save the file with the name "Employees.csv", ensuring that the extension of the file when you save is ".csv" and not something like ".txt" or anything else.

Tip (Windows): If you're on Windows, you can use Notepad to quickly create a file. When you save, make sure you change the "Save as type:" option to "All Files (*.*)" and enter the filename "Employees.csv". VSCode can save custom files too if you prefer.

Tip (MacOS/Linux): If you're using VS Code, it's probably easiest to create the file from the File menu using "New File...". When you save, type the full filename "Employees.csv", including the extension, to make sure the file gets the correct extension. Perhaps an easier method, if you like the terminal, is typing "touch Employees.csv" in a terminal window that is currently pointed to your work folder. This will instantly create a file with that name, which you can then simply open in VS Code.

Now let's see how we can load and read this information in Python. Create a new Python code file named 'csvload.py' in your work folder and place the following code in it:

import csv, os

path = os.path.dirname(os.path.abspath(__file__))
print("Loading Employees.csv from folder", path)

file = open(path + "/Employees.csv")
csv_reader = csv.reader(file, quotechar='"', delimiter=',')

for row in csv_reader:
    print(row[0], row[1], row[2], row[3])

file.close()

Save and run the code file. You should get the following terminal output:

Loading Employees.csv from folder c:\Code
ID Name Job Title Start Date
1 Martin Software Developer 2020-05-01
2 Emily Senior Software Developer 2017-09-22
3 Chris Project Manager 2019-11-15

On line 3 in the Python code, we use a module built in to Python that gets the folder location where our Python script is currently located. We do this so we can build the full file location of the 'Employees.csv' file. On line 4, we print the path so you can see in the terminal where the program is trying to load your file from.

Line 6, we open the file and store the object returned from the 'open' function in the variable 'file'.

Line 7, we now use the CSV module built in to Python to process the file. We pass our 'file' variable to the 'reader' function which returns an object containing our file data which we store in the variable named 'csv_reader'.

Line 9 shows how we can use a for loop on the 'csv_reader' variable to iterate over each row in the file. Then finally, line 10 we're accessing the 'row' variable by an index, which corresponds to a column index of the row, to then print the value of each column to the terminal.

When working with files, you must always close them when you're finished working with them. Line 12, we call the 'close' function on our 'file' variable to close the file.
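A common alternative to calling 'close' yourself is Python's 'with' statement, which closes the file automatically when the indented block ends, even if an error happens part way through. The sketch below first writes a small sample file in to a temporary folder (an assumption made only so the example runs anywhere) and then reads it back.

```python
import csv
import os
import tempfile

rows = []

# A throwaway folder that is deleted again when this block ends.
with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "Employees.csv")

    # Write a tiny sample file so there is something to read.
    with open(file_path, mode="w", newline="") as file:
        file.write('"ID","Name"\n1,"Martin"\n2,"Emily"\n')

    # The file is closed automatically when this 'with' block ends.
    with open(file_path, newline="") as file:
        csv_reader = csv.reader(file)
        for row in csv_reader:
            rows.append(row)

print(rows)
print("File closed?", file.closed)
```

There is no 'file.close()' call anywhere, yet the final print shows the file has been closed for us.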

So you can see how we have accessed each column and row that is contained in the CSV file. Note that the first row of the file we created contains the column names, so if you're processing this data for some kind of business process, be aware that the first row in the for loop is just the column names.
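If you only want the employee rows, one way to handle this is to read the first row separately before the loop starts. Here's a minimal sketch using Python's built-in 'next' function on the reader. The sample data is held in a string so the example is self-contained.

```python
import csv
import io

sample = (
    '"ID","Name","Job Title","Start Date"\n'
    '1,"Martin","Software Developer","2020-05-01"\n'
    '2,"Emily","Senior Software Developer","2017-09-22"\n'
)

csv_reader = csv.reader(io.StringIO(sample))

# Read the first row on its own so the loop only sees employee rows.
header = next(csv_reader)

names = []
for row in csv_reader:
    names.append(row[1])

print("Columns:", header)
print("Names:", names)
```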

We're going to look at an alternate method of reading the CSV file using dictionaries, which makes it easier referencing the columns in each row. Create a new Python code file named 'csvloaddict.py' in your work folder and place the following code in it:

import csv, os

path = os.path.dirname(os.path.abspath(__file__))
print("Loading Employees.csv from folder", path)

file = open(path + "/Employees.csv")
csv_reader = csv.DictReader(file, quotechar='"', delimiter=',')

for row in csv_reader:
    print(row["ID"], row["Name"], row["Job Title"], row["Start Date"])

file.close()

If you save and run the code, you will see this output in the terminal:

Loading Employees.csv from folder c:\Code
1 Martin Software Developer 2020-05-01
2 Emily Senior Software Developer 2017-09-22
3 Chris Project Manager 2019-11-15

The key differences in the Python code start on line 7. Instead of using 'reader' to read the data, we now use the function 'DictReader'. Line 10 now shows how we're accessing the 'row' variable like a dictionary, using the column names as the key to the value in the dictionary item.

The difference in this output compared with the previous example is that we don't see the column names. The dictionary reader consumes the column name row for us and uses it for the dictionary keys, so the loop only receives the actual row data.
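One thing to keep in mind with both readers is that every value arrives as text, even the ID numbers and dates. As a rough sketch, here's how you might convert the values in to more useful Python types as you read them. The lowercase keys 'id', 'name' and 'start' are names chosen just for this example.

```python
import csv
import io
from datetime import date

sample = (
    '"ID","Name","Job Title","Start Date"\n'
    '1,"Martin","Software Developer","2020-05-01"\n'
    '2,"Emily","Senior Software Developer","2017-09-22"\n'
)

employees = []
for row in csv.DictReader(io.StringIO(sample)):
    employees.append({
        "id": int(row["ID"]),                            # text -> whole number
        "name": row["Name"],                             # already text
        "start": date.fromisoformat(row["Start Date"]),  # text -> date
    })

for employee in employees:
    print(employee["id"], employee["name"], employee["start"].year)
```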

Writing CSV Files

We've seen how easy it is to write a CSV file manually in a text editor, but what about doing it with Python? There are a couple of ways to do this. We'll start with the easiest method that doesn't write the column names.

Create a new Python code file named 'csvwrite.py' in your work folder and place the following code in it:

import csv, os

path = os.path.dirname(os.path.abspath(__file__))
print("Writing MyEmployees.csv in folder", path)

file = open(path + "/MyEmployees.csv", mode='w', newline='')
csv_writer = csv.writer(file)

csv_writer.writerow([10, "Martin", "Software Developer"])
csv_writer.writerow([20, "Emily", "Project Manager"])
csv_writer.writerow([30, "Chris", "Business Analyst"])

file.close()

If you save and run this code file, you will see a new file appear in your work folder named 'MyEmployees.csv'. Open this file with a text editor and you'll see the rows that we wrote in code.

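You'll notice how on line 6 we now have some extra parameters in the 'open' function. The option mode='w' opens the file for writing, creating it if it doesn't exist and replacing its contents if it does. The option newline='' lets the CSV module control the line endings itself, which avoids blank lines appearing between rows on Windows.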

On line 7, we're now using the 'writer' function with our 'file' variable. Finally on lines 9 to 11 is how we write the specific rows of data to the file, and with all files, we make sure we close the file on line 13.
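The writer also takes care of quotation marks for us. In this small sketch, one of the job titles contains a comma (a made-up value for illustration), and the writer wraps that value in quotation marks so it still counts as a single column. It also uses 'writerows' to write a whole list of rows in one call, and writes to an in-memory 'io.StringIO' object instead of a real file so the example runs on its own.

```python
import csv
import io

rows = [
    [10, "Martin", "Software Developer"],
    [20, "Emily", "Manager, Projects"],  # note the comma inside the value
]

# An in-memory stand-in for a file opened with mode='w'.
buffer = io.StringIO()
csv_writer = csv.writer(buffer)
csv_writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back shows the quoted value is still one column.
read_back = list(csv.reader(io.StringIO(csv_text, newline="")))
print(read_back[1])
```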

Now let's see how to write the same CSV file using a dictionary which will generate column names for us. Create a new Python code file named 'csvwritedict.py' in your work folder and place the following code in it:

import csv, os

path = os.path.dirname(os.path.abspath(__file__))
print("Writing MyEmployees.csv in folder", path)

column_names = ["ID", "Name", "Job Title"]

file = open(path + "/MyEmployees.csv", mode='w', newline='')
csv_writer = csv.DictWriter(file, fieldnames=column_names)

csv_writer.writeheader()
csv_writer.writerow({"ID": 10, "Name": "Martin", "Job Title": "Software Developer"})
csv_writer.writerow({"ID": 20, "Name": "Emily", "Job Title": "Project Manager"})
csv_writer.writerow({"ID": 30, "Name": "Chris", "Job Title": "Business Analyst"})

file.close()

Save and run this and the 'MyEmployees.csv' file will now update with the column names as the first row. You can open the file and check this in a text editor again.

You'll notice this method is a little more verbose to code, but makes things a little safer as everything must be specified using column names. On line 6, we have defined a list containing the column names that we pass to the 'DictWriter' function on line 9.

On line 11, we have a new 'writeheader' function call that ensures the column names we defined are written to the file as the first row.

Lines 12 to 14 are using a Python dictionary object to define the rows. Each dictionary item key is a name of one of the columns, and the item value for that key is the value we want to write for that column.
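To see what "a little safer" can mean in practice, here's a hedged sketch. With its default settings, 'DictWriter' refuses to write a row that contains a key it wasn't told about. The 'Salary' key below is a deliberate mistake for illustration, and the output goes to an in-memory 'io.StringIO' object rather than a real file.

```python
import csv
import io

column_names = ["ID", "Name", "Job Title"]
employees = [
    {"ID": 10, "Name": "Martin", "Job Title": "Software Developer"},
    {"ID": 20, "Name": "Emily", "Job Title": "Project Manager"},
]

buffer = io.StringIO()
csv_writer = csv.DictWriter(buffer, fieldnames=column_names)
csv_writer.writeheader()
csv_writer.writerows(employees)  # write every dictionary in the list

rejected = False
try:
    # 'Salary' is not one of the column names, so this row is refused.
    csv_writer.writerow({"ID": 30, "Name": "Chris", "Salary": 1})
except ValueError as error:
    rejected = True
    print("Rejected row:", error)

lines = buffer.getvalue().splitlines()
print(lines)
```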

You've now seen the 2 methods for loading CSV files and 2 methods for writing CSV files. You should have no problem now working with this kind of data if you ever work on projects that deal with CSV data.

JSON

JSON stands for JavaScript Object Notation and is a very popular data format for passing data from web pages up to servers and back down. This is because it's super easy for JavaScript running in web pages to consume and produce this kind of data. Almost all languages have ways of dealing with JSON too.

The biggest difference between JSON and the CSV format we just saw is that JSON allows us to specify data in a hierarchy. CSV mimics a flat table design that simply has rows, each specifying the same columns. JSON is a tree-like structure that can take on all sorts of shapes and vary at any point, much like branches on a tree.

Like CSV, JSON is also human readable and can be created in text editors.

If this is your first experience with a tree-like structure, it might seem pretty confusing at first. Once you come to understand it, it will form the base of your thinking for many different software problems and other data structures that we might encounter in the future.

What I think is the best way to begin to learn JSON is to tell a story about how I want to describe something and represent that in JSON data. So first, let's look at the most basic JSON to understand its formatting. Let's begin with this piece of JSON data:

{ }

An opening curly bracket and a closing curly bracket. JSON specializes in describing objects (this is why it has Object Notation in its name). The beginning of an object is the opening curly bracket '{' and when we finish describing an object, we close it off with a closing curly bracket '}'. Right now, there is nothing inside this object, so nothing is being described at all - it's empty.

To start describing an object, we add properties to it in the form of a key, which has a name, and a value that is assigned to that key. Let's pretend that we are going to describe a house. Let's add a property to the object that tells us the house number:

{
  "number": 20
}

The first property we've added to our object has a key named "number", followed by the colon character ':'. This colon character just sits in-between a property's key and value. Straight after the colon, we have the value for the key, which is the number 20. Let's add another property to our object for the street name of the house:

{
  "number": 20,
  "street": "Milky Way"
}

Notice how we introduced a comma ',' at the end of the first property value on line 2. This comma denotes that we want to add another property to the object. Just like our first property, our 2nd property has a key and a value separated by a colon character. This time, however, our value is enclosed in double quotation marks. This means our value is a string, or text. Numbers don't need the double quotation marks, such as the number on line 2.

Now let's introduce another property that will be a list of tags that describe the features of the property:

{
  "number": 20,
  "street": "Milky Way",
  "features": [ "New Build", "Off Road Parking", "Open Plan" ]
}

As before, we've added a comma at the end of line 3 to define a new property on line 4. Our new property has a key name of "features" and the value is an array. We know it's an array because we have the opening square bracket '[' which describes the start of the list. The items inside the array are a set of strings (text in double quotation marks) all separated by a comma, and the end of the array is marked with a closing square bracket ']'.

Great, so the value of a key can not only be a number, or a piece of text, but it can be an array of strings or even numbers, or a mix of numbers and strings.

Now comes the final feature of JSON. In our house, I want to describe the rooms, such as the type, the size, and items in the room. We know that there are many rooms in the house. We could say that a room is an object too. I'm now going to add some rooms to our house JSON object:

{
  "number": 20,
  "street": "Milky Way",
  "features": [ "New Build", "Off Road Parking", "Open Plan" ],
  "rooms": [
    {
      "type": "Kitchen",
      "width": 10,
      "length": 15,
      "items": [ "Sink", "Oven", "Fridge" ]
    },
    {
      "type": "Hallway",
      "width": 2,
      "length": 5,
      "items": [ "Coat Hooks", "Mirror", "Clock" ]
    },
    {
      "type": "Bedroom",
      "width": 10,
      "length": 10,
      "items": [ "Bed", "Side Table", "Wardrobe" ]
    }
  ]
}

As with any other property we added, we've inserted our new rooms property with the key 'rooms' on line 5, and its value starts with a square bracket, which we know to be an array, which is just a list of things. However, the key thing to note here is that the array items are themselves objects. We know this because on line 6, the curly bracket means we are describing an object which in turn has properties. We see the properties of this new object on lines 7 to 10. On line 11, we see the closed curly bracket, the end of the new object, and a comma which then allows us to describe yet another object starting on line 12. These room objects are items in the array that we started on line 5.

After defining the room objects for the rooms array, we can see the closing square bracket on line 24 which completes the array that was started on line 5. And the ending curly bracket on line 25 is the end of describing the house object, or what we sometimes call the root object.

The power shown in this example is that objects can contain objects and the nesting of things containing more objects and arrays of items can go on to be as complex and deep as you like.

So the main bracket pairs you need to keep an eye out for when reading JSON data are curly brackets '{ }' which denote objects and square brackets '[ ]' denote arrays.

One more thing to mention, JSON data can be formatted without any whitespace because all that is important is the order of special characters. The lines and how the indenting appears in the JSON text has no effect at all. We add new lines and indenting to make it easier to read.

Sometimes JSON is stripped of this whitespace to reduce its size when transferring the JSON data. For example, this structure is exactly the same as the one above, but with all the whitespace between values removed. Note that the spaces inside the quoted strings are part of the data, so they must stay:

{"number":20,"street":"Milky Way","features":["New Build","Off Road Parking","Open Plan"],"rooms":[{"type":"Kitchen","width":10,"length":15,"items":["Sink","Oven","Fridge"]},{"type":"Hallway","width":2,"length":5,"items":["Coat Hooks","Mirror","Clock"]},{"type":"Bedroom","width":10,"length":10,"items":["Bed","Side Table","Wardrobe"]}]}

But as you can see, as a human creating JSON data, you really wouldn't want to write it all on a single line like that. You can always strip the extra whitespace in code after you've written or generated the JSON and before sending it around.
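As a sketch of what that stripping might look like in Python (the 'json' module used here is covered in the sections that follow), the 'separators' option of 'json.dumps' removes the spaces that would normally be placed after commas and colons. The shortened house data is just for illustration.

```python
import json

house = {
    "number": 20,
    "street": "Milky Way",
    "features": ["New Build", "Off Road Parking", "Open Plan"],
}

# separators=(",", ":") drops the spaces json.dumps normally adds after
# commas and colons. Spaces inside the strings are left alone.
compact = json.dumps(house, separators=(",", ":"))

print(compact)
```

The street name still reads "Milky Way" in the result, because whitespace inside a string is data rather than formatting.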

Let's now jump in to some Python code to see how we work with JSON in code.

Loading JSON

Let's create a file in your work folder called 'House.json' and paste the final house JSON text we created above in to the file, and then save the file. The whitespace or non-whitespace versions will both work the same in case you wondered which to use.

Python doesn't need any 3rd party packages to work with JSON. Let's see how we can load that JSON file and display the data from it. Create a new Python code file named 'jsonload.py' in your work folder and place the following code in it:

import json, os

path = os.path.dirname(os.path.abspath(__file__))
print("Reading House.json in folder", path)

file = open(path + "/House.json")

data = json.load(file)

print(data)

file.close()

Save and run. Provided you created the JSON file ok, you should see the terminal output displaying the data much as it appears in the file, but on a single line and using Python's own formatting (single quotation marks around text, for example). Great, we've loaded the JSON data in 1 line of code on line 8.

But how do we get to things like the list of rooms, or query the items in the rooms? Let's try an example that navigates the dictionary that the line 8 'load' function returned.

Create a new Python code file named 'jsonloaddetail.py' in your work folder and place the following code in it:

import json, os

path = os.path.dirname(os.path.abspath(__file__))
print("Reading House.json in folder", path)

file = open(path + "/House.json")

data = json.load(file)

print("The house number is:", data["number"])
print("The house street is:", data["street"])

print("The features of the house are:")

for feature in data["features"]:
    print(feature)

print("The house has", len(data["rooms"]), "rooms.")

for room in data["rooms"]:
    print("Room type", room["type"])
    print("Room width is", room["width"])
    print("Room length is", room["length"])
    print("The items in the room are:")
    for item in room["items"]:
        print(item)

    print("")

file.close()

Save and run the code file. You will now see a detailed description of the JSON data in the terminal output.

If you've worked with dictionaries before now (covered in the foundation chapters), nothing should really surprise you here. From line 10 right down to line 28, we're simply accessing the values in the dictionary object stored in the variable 'data' using the keys that have been parsed from the JSON data and used in construction of the dictionary object.

In our JSON, we used arrays to represent the house features, the rooms, and the room items, so on lines 15, 20 and 25 we're using a for loop to iterate over the lists that are contained in the dictionary values for those keys. Because we have items within each room, the for loop on line 25 is nested within the room loop on line 20.

Note: We've seen the function 'json.load' to load JSON from a file but you can also use a function named 'json.loads' that accepts a string parameter instead of a file. This is handy for when you have some JSON that didn't come from a file source and you have it stored in a string variable.
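Here's a minimal sketch of 'json.loads' in action. The JSON text is typed directly in to a string for the example, but it could just as easily have arrived from another system. The second part shows that text which isn't valid JSON (here, a stray comma before the closing bracket) raises a 'json.JSONDecodeError' that you can catch.

```python
import json

# JSON held in a string, for example text received from another system.
json_text = '{"number": 20, "street": "Milky Way", "features": ["New Build", "Open Plan"]}'

data = json.loads(json_text)

print(data["street"])
print(len(data["features"]))

# Text that isn't valid JSON raises json.JSONDecodeError.
try:
    json.loads('{"number": 20,}')
    parsed_ok = True
except json.JSONDecodeError as error:
    parsed_ok = False
    print("Could not parse JSON:", error)
```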

Writing JSON

There's something amazing about writing JSON from Python. The Python language uses the exact same characters '{..}' for describing objects in a key-value form with dictionaries. It also uses the same list formatting with square brackets '[..]' that you find in JSON. In fact, it's so close that you can often paste your JSON directly in to source code without any errors. The main exceptions are the JSON values 'true', 'false' and 'null', which Python writes as 'True', 'False' and 'None'.

Let's define a Python dictionary using a JSON looking layout and write this object to a file. Create a new Python code file named 'jsonwrite.py' in your work folder and place the following code in it:

import json, os

path = os.path.dirname(os.path.abspath(__file__))
print("Writing HouseOutput.json in folder", path)

data = {
  "number": 20,
  "street": "Milky Way",
  "features": [ "New Build", "Off Road Parking", "Open Plan" ],
  "rooms": [
    {
      "type": "Kitchen",
      "width": 10,
      "length": 15,
      "items": [ "Sink", "Oven", "Fridge" ]
    },
    {
      "type": "Hallway",
      "width": 2,
      "length": 5,
      "items": [ "Coat Hooks", "Mirror", "Clock" ]
    },
    {
      "type": "Bedroom",
      "width": 10,
      "length": 10,
      "items": [ "Bed", "Side Table", "Wardrobe" ]
    }
  ]
}

file = open(path + "/HouseOutput.json", mode='w')

json.dump(data, file, indent=2)

file.close()

If you run the code and now inspect the new file named 'HouseOutput.json' in your work folder, you'll see how closely the JSON in the file matches the definition of the variable named 'data' that we started on line 6 (the arrays are just spread over more lines) - amazing.

Data held in Python dictionaries and lists, made up of strings, numbers, booleans and None, can be written to a file using the 'json.dump' function as seen on line 34. The optional 'indent=2' parameter nicely formats the JSON in the output file for us. If you omit this parameter, the JSON will be saved with no new lines, making it more compact and smaller in size, though still perfectly valid JSON.

Just like the loading of JSON could be done with a file or a string variable, there is an alternate function named 'json.dumps' that allows you to generate the JSON from a dictionary object and store the result in a string. The function 'json.dumps' returns the resulting string containing the JSON.

Create a new Python code file named 'jsonwritestring.py' in your work folder and place the following code in it:

import json

data = {
  "number": 20,
  "street": "Milky Way",
  "features": [ "New Build", "Off Road Parking", "Open Plan" ]
}

json_text = json.dumps(data, indent=2)

print("The result JSON is:")
print(json_text)

Save and run. The print statement on line 12 outputs the resulting JSON string, so the terminal output is:

The result JSON is:
{
  "number": 20,
  "street": "Milky Way",
  "features": [
    "New Build",
    "Off Road Parking",
    "Open Plan"
  ]
}

The advantage of having the result stored in a string instead of a file is that you are now able to send it to an output other than a file, such as a connection to another backend system.
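One small difference to be aware of when moving between the two is how true, false and empty values are spelled. As a quick sketch, Python's 'True' and 'None' become 'true' and 'null' in the JSON text, and 'json.loads' turns them back again. The keys 'forSale' and 'garage' are made up for this example.

```python
import json

data = {"number": 20, "forSale": True, "garage": None}

# Python's True and None are written as JSON's true and null.
json_text = json.dumps(data)
print(json_text)

# Loading the text again gives back an equal dictionary.
restored = json.loads(json_text)
print(restored == data)
```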

XML

Last but by no means least is the XML data format. XML stands for Extensible Markup Language. It is one of the most widely used formats, and it looks very similar to HTML, the markup language used for building web pages. Like JSON, it is a tree-like structure, meaning we can represent things in this data format in many different shapes and have relations between different pieces of data.

Let's begin with the most simple example, working towards building up a house data model like we did in the JSON section:

<house> </house>

What we have here is a single node. The name of this node is 'house'. Nodes are defined using the '< >' symbols. Every node starts with its name defined in the '< >' symbols and must be closed by another set of '</ >' symbols, noting the forward slash before the name denoting closing of a node.

There is another way of describing a node that is also closed, by placing the forward slash after the name. Like this:

<house/>

Note how in this format, there is no need to repeat the node name in another set of '< >' symbols.

We refer to XML data as an XML document. Every XML document must have exactly one top-level node that contains all the other nodes. We call this the root node of the document.

Nodes can contain a value. The value of a node is the text that appears between the open and closing of the node, such as:

<house>This is the value</house>

So in this example, the node named 'house' has the value 'This is the value'.

Nodes can also contain attributes. Attributes are a key-value type piece of data. Attributes have a name and a value. Let's add some attributes to our house node:

<house number="10" street="Milky Way"></house>

Notice how our two attributes named 'number' and 'street' are inside the opening block of the house node and separated with a space. The values of the attributes are also contained within double quotation marks. Attributes are always placed in the opening of a node, never the closing part.

Now comes the true power of XML. Nodes can contain other nodes. You can nest nodes as deep as you need, meaning, a node can contain a node, which also contains a node, and that contains another node etc.

Let's add our house features to our current XML document:

<house number="10" street="Milky Way">
  <features>
    <feature>New Build</feature>
    <feature>Off Road Parking</feature>
    <feature>Open Plan</feature>
  </features>
</house>

We are now nesting nodes within nodes. Our new 'features' node on line 2 is inside the 'house' node. Another way to describe this more commonly in XML is to say that the 'features' node is a child node of the 'house' node. We can also say that the 'house' node is the parent of the 'features' node. So nodes can be thought of as a parent child like structure.

Notice on line 6 how the features node is closed before we can close its parent node house on line 7. We also use some indenting to help see the hierarchy of the nodes in the document. Like JSON, the indenting has no effect and an XML document with no new lines or indents is still a valid XML document.

We've also added the specific features of the house as a number of nodes inside the 'features' node on lines 3 to 5. These nodes have values specifying the text for each feature.

Now let's complete our XML document by adding the rooms and the room items information like we saw in the JSON example:

<house number="10" street="Milky Way">
  <features>
    <feature>New Build</feature>
    <feature>Off Road Parking</feature>
    <feature>Open Plan</feature>
  </features>
  <rooms>
    <room type="Kitchen" width="10" length="15">
      <items>
        <item>Sink</item>
        <item>Oven</item>
        <item>Fridge</item>
      </items>
    </room>
    <room type="Hallway" width="2" length="5">
      <items>
        <item>Coat Hooks</item>
        <item>Mirror</item>
        <item>Clock</item>
      </items>
    </room>
    <room type="Bedroom" width="10" length="10">
      <items>
        <item>Bed</item>
        <item>Side Table</item>
        <item>Wardrobe</item>
      </items>
    </room>
  </rooms>
</house>

There's nothing new here. We're just defining more nodes and our 'room' nodes have attributes. We've put the room items as child nodes inside the room nodes.

Understanding the relationship of XML nodes will greatly help you learn front-end web development later on, if you have plans on going in that direction in the future.

Loading XML

Loading XML is perhaps the trickiest of the data formats we've seen so far. This is due to how much richer XML can be compared to JSON and CSV.

First, let's create an XML file that we can load. Create a new file in your work folder and name it 'House.xml'. Copy the following XML text in to the file and save it:

<house number="10" street="Milky Way">
  <features>
    <feature>New Build</feature>
    <feature>Off Road Parking</feature>
    <feature>Open Plan</feature>
  </features>
  <rooms>
    <room type="Kitchen" width="10" length="15">
      <items>
        <item>Sink</item>
        <item>Oven</item>
        <item>Fridge</item>
      </items>
    </room>
    <room type="Hallway" width="2" length="5">
      <items>
        <item>Coat Hooks</item>
        <item>Mirror</item>
        <item>Clock</item>
      </items>
    </room>
    <room type="Bedroom" width="10" length="10">
      <items>
        <item>Bed</item>
        <item>Side Table</item>
        <item>Wardrobe</item>
      </items>
    </room>
  </rooms>
</house>

Now create a new code file in your work folder named 'xmlload.py'. In this file, place the following code:

import xml.etree.ElementTree as ET, os

path = os.path.dirname(os.path.abspath(__file__))
print("Reading House.xml in folder", path)

file = open(path + "/House.xml")

tree = ET.parse(file)

root = tree.getroot()

print("Child nodes under the root node of the XML document are:")

for child in root:
    print(child.tag, child.attrib)

features = root[0]

print("Child nodes under the features node are:")
for feature in features:
    print(feature.tag, feature.text)

rooms = root[1]

print("Child nodes under the rooms node are:")
for room in rooms:
    print(room.tag, room.attrib)
    print(room.attrib["type"])
    print(room.attrib["width"])
    print(room.attrib["length"])

    print("Room items:")
    for item in room[0]:
        print(item.text)

file.close()

If you save and run this code file, you should see a lot of information printed in the terminal. Let's explain what's going on in the Python code step by step.

Starting with line 1, you will notice we import a module that is built in to Python named 'xml.etree.ElementTree' and we shorten this module name to 'ET'.

Line 6, we're simply opening the XML file we created and line 8, we're passing the open file object variable 'file' to the 'ET.parse' function. This line performs the loading of the XML in the House.xml file.

Line 10, we begin to start navigating the objects of the XML document. The 'tree' variable was returned from the 'ET.parse' function, this represents the XML document. The 'tree.getroot' function returns an object that represents the root node in the XML document, which in our XML is '<house>...</house>'.

The 'root' variable acts like a list, meaning we can use it in a for loop as we do on line 14. Each item in the for loop represents a child node found under the root node. Looking at our XML document, the child nodes under "<house>" are "<features>" and "<rooms>". So you will see these printed out to the terminal.

Line 15 shows how inside the for loop, we print out both the tag name of the node and any attributes, which there are none, but this serves as an example of how you would see attributes if there were some on this node.

Line 17, we retrieve the item at index 0 from the 'root' variable. This represents the first child node under the root, which is the '<features>' node in the XML document.

Just like we did with the root node, we can iterate over this features node, where we will find the specific 'feature' child XML nodes. So lines 20 and 21 show the for loop over the features node variable, and we simply print each child node's tag (the node name) and its text, which is the value found between the opening and closing tags of the 'feature' nodes.

Line 23 grabs the rooms node from the root which is at position 1 in the root list and begins to iterate on line 26 over the specific room XML nodes. Lines 28 to 30 show how we can access specific attributes on each of the room nodes by an attribute name.
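
One detail worth knowing when reading attributes by name: the values are always strings, even when they look like numbers. Below is a minimal sketch of that, not part of 'xmlread.py'. The room node is built from an inline string purely for illustration, and its values (and the idea that width and length are plain numbers) are assumptions rather than the real contents of House.xml. The 'height' attribute is deliberately missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical room node for illustration; not the actual contents of House.xml.
# ET.fromstring parses XML that is held in a string.
room = ET.fromstring('<room type="Kitchen" width="4" length="5" />')

# Attribute values are always strings, so convert them before doing arithmetic.
width = float(room.attrib["width"])
length = float(room.attrib["length"])
area = width * length
print(room.attrib["type"], "area:", area)

# room.attrib["height"] would raise a KeyError because the attribute is missing.
# room.get() returns a default value instead.
height = room.get("height", "unknown")
print("Height:", height)
```

Using 'get' with a default is handy when some nodes in a document may not carry every attribute.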

And finally, because each of the room nodes in the XML document has another set of nodes in its 'items' node, we also iterate over those on lines 33 and 34, printing the item node text values to the terminal.
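
Indexing with 'root[0]' and 'root[1]' depends on the order of the child nodes in the document. As a hedged alternative, the sketch below looks nodes up by tag name using 'find', 'findall', and 'iter'. The XML string is a cut-down stand-in that assumes the structure described in this walkthrough; the values in it are invented.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for House.xml; structure assumed from the walkthrough, values invented.
xml_text = """<house>
    <features>
        <feature>Garden</feature>
        <feature>Garage</feature>
    </features>
    <rooms>
        <room type="Kitchen" width="4" length="5">
            <items>
                <item>Oven</item>
                <item>Fridge</item>
            </items>
        </room>
    </rooms>
</house>"""

root = ET.fromstring(xml_text)

# find() returns the first matching direct child, or None if there is no match.
features = root.find("features")
feature_names = [feature.text for feature in features.findall("feature")]
print(feature_names)

# A simple path reaches nested nodes without chaining indexes.
room_types = [room.get("type") for room in root.findall("rooms/room")]
print(room_types)

# iter() walks the whole tree, however deep the matching nodes are.
item_names = [item.text for item in root.iter("item")]
print(item_names)
```

Looking nodes up by name keeps the code working if someone later adds a new child node ahead of '<features>' or '<rooms>'.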

As with all data loading code we've seen, we don't just have to load XML from a file, which is what we did on line 8 using 'ET.parse'. There is also the function 'ET.fromstring'. This takes a single string parameter that represents an XML document stored in a string. It is especially helpful when you're dealing with XML that has come from a source other than a file.
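
Here is a small sketch of 'ET.fromstring' in use. The XML text is made up for the example; in practice it might be the body of a response from another system.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML text, for example received from another backend system.
xml_text = "<house><features><feature>Garden</feature></features></house>"

# Unlike ET.parse, ET.fromstring returns the root node directly,
# so there is no call to getroot().
root = ET.fromstring(xml_text)

print(root.tag)
for feature in root[0]:
    print(feature.tag, feature.text)
```

From here on, the root node can be navigated exactly as in the file-based example.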

That concludes loading and navigating the XML document. Now let's see how to build an XML document using Python.

Writing XML

Writing XML documents starts with building the tree of nodes. We'll show a simple example of how to build a small XML document.

Create a new Python code file named 'xmlwrite.py' in your work folder and place the following code in it:

import xml.etree.ElementTree as ET, os

path = os.path.dirname(os.path.abspath(__file__))
print("Writing MyHouse.xml in folder", path)

file = open(path + "/MyHouse.xml", mode='w')

root = ET.Element("house", { "number": "10", "street": "Milky Way"})
features = ET.SubElement(root, "features")

feature = ET.SubElement(features, "feature")
feature.text = "New Build"
feature = ET.SubElement(features, "feature")
feature.text = "Off Road Parking"
feature = ET.SubElement(features, "feature")
feature.text = "Open Plan"

tree = ET.ElementTree(root)
ET.indent(tree)

tree.write(file, encoding="unicode")

file.close()

Save the file and run it. You should see a new file in your work folder named 'MyHouse.xml'. Let's explain the code of how that file got generated.

Lines 1 to 6 are all about opening the file we want to output to, as we've seen in previous data format write methods.

Line 8 is where our document building begins. We're using the function 'ET.Element' to create our first single node named "house". We're also passing a dictionary to this function in the second parameter which represents our attributes for the node. The node is stored in the variable 'root'.

To create a child node under the root node, we use the 'ET.SubElement' function on line 9. The first parameter to this function is the node that we want as the parent and the second parameter is simply the name. You can also pass a dictionary as a third parameter to this function if you want to add attributes to the new child node.
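
As a quick sketch of that third parameter, the example below adds a room node with attributes. The 'rooms' and 'room' nodes and their values are made up for illustration and are not part of 'xmlwrite.py'.

```python
import xml.etree.ElementTree as ET

root = ET.Element("house")
rooms = ET.SubElement(root, "rooms")

# Third parameter: a dictionary of attributes for the new child node.
# Attribute values should be strings, so numbers are written as "4", not 4.
kitchen = ET.SubElement(rooms, "room", {"type": "Kitchen", "width": "4", "length": "5"})

# Attributes can also be added or changed after the node exists.
kitchen.set("floor", "ground")

# ET.tostring turns the nodes into XML text so we can see the result.
print(ET.tostring(root, encoding="unicode"))
```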

Lines 11 to 16 demonstrate how we use the same 'ET.SubElement' function to add further child nodes under the features node. We also set the 'text' property of these nodes as seen on lines 12, 14, and 16.

Line 18 creates a tree object, which is simply an object that lets us work with the tree as a whole, such as formatting it and writing it to files. We use this tree object on line 19 to apply indenting rules so that our XML output is nicely formatted. Note that 'ET.indent' requires Python 3.9 or later.
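
To see what the indenting step actually changes, here is a minimal sketch that prints a small tree before and after calling 'ET.indent'. It uses 'ET.tostring' to produce the XML as a string instead of writing a file, which is a choice made only to keep the example short.

```python
import xml.etree.ElementTree as ET

root = ET.Element("house")
features = ET.SubElement(root, "features")
ET.SubElement(features, "feature").text = "New Build"

# Without indenting, the whole document is serialized on a single line.
before = ET.tostring(root, encoding="unicode")
print(before)

# ET.indent (Python 3.9+) changes the tree in place by adding line breaks and spaces.
ET.indent(root)
after = ET.tostring(root, encoding="unicode")
print(after)
```

The indenting only adds whitespace between nodes; the node names, attributes, and text values stay the same.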

Finally, the function 'tree.write' on line 21 performs the output of the XML document from the objects we created into the file that we opened (explaining the encoding parameter is out of scope of this article for now). We also close the file on line 23 as we should with all files that we open.

As you can see, it's quite straightforward to build an XML structure piece by piece, adding attributes and setting the text of nodes.
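
If you prefer not to close the file yourself, Python's 'with' statement can do it for you. The sketch below is one possible variation rather than a replacement for 'xmlwrite.py'. It assumes a temporary folder as the output location so that it can run anywhere, and it reads the file back only to show what was written.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("house", {"number": "10", "street": "Milky Way"})
tree = ET.ElementTree(root)

# Assumption for this sketch: write to a temporary folder rather than the work folder.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "MyHouse.xml")

# The with statement closes the file automatically, even if tree.write raises an error.
with open(file_path, mode="w") as file:
    tree.write(file, encoding="unicode")

print("File closed:", file.closed)

# Read the file back to confirm what was written.
with open(file_path) as file:
    written = file.read()
print(written)
```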

Final Notes

CSV, JSON, and XML are among the most common data formats you will find yourself working with in a backend environment, and even in front-end development. As there is a lot of detail covered in this chapter, you may want to treat this page as a reference for now. Of course, feel free to come back for a refresher when you begin to use these formats in a real project.