vo.table is a Python package for reading VOTable files into Numpy record arrays and writing them back out.
vo.table supports the VOTable Format Definition Version 1.1 and Version 1.2. Some flexibility is provided to support the 1.0 draft version and other non-standard usage in the wild. To support these cases, set the keyword argument pedantic to False when parsing.
Note
Each warning and VOTABLE-specific exception emitted has a number and is documented in more detail in Warnings and Exceptions.
Output always conforms to the 1.1 or 1.2 spec, depending on the input.
To read in a VOTable file, pass a file path to vo.table.parse():
from vo.table import parse
votable = parse("votable.xml")
votable is a vo.tree.VOTableFile object, which can be used to retrieve and manipulate the data and save it back out to disk.
VOTable files are made up of nested RESOURCE elements, each of which may contain one or more TABLE elements. The TABLE elements contain the arrays of data.
To get at the TABLE elements, one can write a loop over the resources in the VOTABLE file:
for resource in votable.resources:
    for table in resource.tables:
        # ... do something with the table ...
        pass
However, if the nested structure of the resources is not important, one can use iter_tables() to return a flat list of all tables:
for table in votable.iter_tables():
    # ... do something with the table ...
    pass
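To picture what flattening nested RESOURCE elements amounts to, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree. It is not how vo.table is implemented. The helper name iter_table_elements and the sample document are made up for illustration, and the sample omits the XML namespace that real VOTable files normally declare. It requires Python 3 for `yield from`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two top-level RESOURCEs, one of which nests another.
# Real VOTable files usually declare an XML namespace; it is omitted here.
SAMPLE = """\
<VOTABLE version="1.2">
  <RESOURCE name="outer">
    <TABLE name="t1"/>
    <RESOURCE name="inner">
      <TABLE name="t2"/>
      <TABLE name="t3"/>
    </RESOURCE>
  </RESOURCE>
  <RESOURCE name="other">
    <TABLE name="t4"/>
  </RESOURCE>
</VOTABLE>
"""


def iter_table_elements(element):
    """Yield TABLE elements in document order, descending into RESOURCEs."""
    for child in element:
        if child.tag == "TABLE":
            yield child
        elif child.tag == "RESOURCE":
            # RESOURCE elements may nest, so recurse into them.
            yield from iter_table_elements(child)


root = ET.fromstring(SAMPLE)
table_names = [t.get("name") for t in iter_table_elements(root)]
print(table_names)  # ['t1', 't2', 't3', 't4']
```

The recursion is what lets the caller ignore how deeply a table is buried, which is the convenience a flat iteration provides.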
Finally, if there is expected to be only one table in the file, it might be simplest to just use get_first_table():
table = votable.get_first_table()
Even easier, there is a convenience function to parse a VOTable file and return the first table all in one step:
from vo.table import parse_single_table
table = parse_single_table("votable.xml")
From a Table object, one can get the data itself in the array member variable:
data = table.array
This data is a Numpy record array. The columns get their names from both the ID and name attributes of the FIELD elements in the VOTABLE file. For example, suppose we had a FIELD specified as follows:
<FIELD ID="Dec" name="dec_targ" datatype="char" ucd="POS_EQ_DEC_MAIN"
unit="deg">
<DESCRIPTION>
representing the ICRS declination of the center of the image.
</DESCRIPTION>
</FIELD>
This column of data can be extracted from the record array using:
>>> table.array['dec_targ']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
or equivalently:
>>> table.array['Dec']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
Many VOTABLE files in the wild do not conform to the VOTABLE specification. If reading one of these files causes exceptions, you may turn off pedantic mode in vo.table by passing pedantic=False to the parse() or parse_single_table() functions:
from vo.table import parse
votable = parse("votable.xml", pedantic=False)
Note, however, that it is good practice to report these errors to the author of the application that generated the VOTABLE file to bring the file into compliance with the specification.
Even with pedantic turned off, many warnings may still be emitted. These warnings are all of the type VOTableSpecWarning and can be turned off using the standard Python warnings module.
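As a rough illustration of filtering one warning category with the warnings module, the sketch below defines a stand-in VOTableSpecWarning class and a made-up noisy_parse() function so that it runs without vo.table installed. Both are assumptions for the example; in real code the warning class would be imported from the package and the warnings would come from parse().

```python
import warnings


class VOTableSpecWarning(UserWarning):
    """Stand-in for the package's warning class (assumption for this sketch)."""


def noisy_parse():
    """Hypothetical parser that warns about a spec violation, then succeeds."""
    warnings.warn("example spec violation", VOTableSpecWarning)
    return "parsed"


with warnings.catch_warnings(record=True) as caught:
    # Start from a known state, then silence only the one category.
    warnings.simplefilter("always")
    warnings.simplefilter("ignore", VOTableSpecWarning)
    result = noisy_parse()
    warnings.warn("unrelated", UserWarning)

remaining = [str(w.message) for w in caught]
print(remaining)  # ['unrelated']
```

Filtering by category rather than silencing everything keeps unrelated warnings visible.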
It is also possible to build a new table, define some field datatypes and populate it with data:
from vo.tree import VOTableFile, Resource, Table, Field
# Create a new VOTable file...
votable = VOTableFile()
# ...with one resource...
resource = Resource()
votable.resources.append(resource)
# ... with one table
table = Table(votable)
resource.tables.append(table)
# Define some fields
table.fields.extend([
    Field(votable, ID="filename", datatype="char"),
    Field(votable, ID="matrix", datatype="double", arraysize="2x2")])
# Now, use those field definitions to create the numpy record arrays, with
# the given number of rows
table.create_arrays(2)
# Now table.array can be filled with data
table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])
# Now write the whole thing to a file.
# Note, we have to use the top-level votable file object
votable.to_xml("new_votable.xml")
Any value in the table may be “missing”. vo.table stores a parallel array in each Table instance called mask to keep track of missing values. This array is True anywhere the value is missing.
Note
In the future, the array and mask members will likely be combined into a single masked record array. There are implementation bugs in current versions of Numpy that prevent this at the moment.
The datatype specified by a FIELD element is mapped to a Numpy type according to the following table:
VOTABLE type                     Numpy type
boolean                          b1
bit                              b1
unsignedByte                     u1
char (variable length)           O - In Python 2.x, a str object; in 3.x, a bytes object.
char (fixed length)              S
unicodeChar (variable length)    O - In Python 2.x, a unicode object, in utf-16; in 3.x, a str object.
unicodeChar (fixed length)       U
short                            i2
int                              i4
long                             i8
float                            f4
double                           f8
floatComplex                     c8
doubleComplex                    c16
If the field is a fixed size array, the data is stored as a Numpy fixed-size array.
If the field is a variable size array (that is, arraysize contains a ‘*’), the cell will contain a Python list of Numpy values. Each value may be either an array or a scalar, depending on the arraysize specifier.
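The fixed versus variable distinction can be read straight off the arraysize string. The following is a minimal sketch of such a reader, not vo.table's own parser. The function name parse_arraysize is hypothetical, and the sketch does not address how the listed dimensions map onto Numpy axis order.

```python
def parse_arraysize(arraysize):
    """Split an arraysize string into (fixed_dims, is_variable).

    Hypothetical helper for illustration only.
    """
    if not arraysize:
        # No arraysize attribute: a scalar cell.
        return (), False
    parts = arraysize.split("x")
    # Only the last dimension may be open-ended, e.g. "*" or "10*".
    is_variable = parts[-1].endswith("*")
    if is_variable:
        parts = parts[:-1]
    return tuple(int(p) for p in parts), is_variable


print(parse_arraysize("2x2"))  # ((2, 2), False)
print(parse_arraysize("*"))    # ((), True)
print(parse_arraysize("3x*"))  # ((3,), True)
```

Under this reading, "2x2" describes a fixed-size array cell, "*" a variable-length list of scalars, and "3x*" a variable-length list whose items each have three elements.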
To look up more information about a field in a table, one can use the vo.tree.Table.get_field_or_param_by_id() method, which returns the Field object with the given ID. For example:
>>> field = table.get_field_or_param_by_id('Dec')
>>> field.datatype
'char'
>>> field.unit
'deg'
Note
Field descriptors should not be mutated: any changes will have no effect on the record arrays storing the data. This shortcoming will be addressed in a future version of vo.table.
To save a VOTable file, simply call the vo.tree.VOTableFile.to_xml() method. It accepts either a string or unicode path, or a Python file-like object:
votable.to_xml('output.xml')
There are currently two data storage formats supported by vo.table. The TABLEDATA format is XML-based and stores values as strings representing numbers. The BINARY format is more compact, and stores numbers in base64-encoded binary. The storage format can be set on a per-table basis using the vo.tree.Table.format attribute, or globally using the vo.tree.VOTableFile.set_all_tables_format() method:
# Set the format on a single table...
votable.get_first_table().format = 'binary'
# ...or on every table in the file at once
votable.set_all_tables_format('binary')
votable.to_xml('binary.xml')
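To give a feel for the difference between the two storage formats, the sketch below encodes the same three doubles once as text and once as base64-encoded binary, using only the standard library. It illustrates the idea rather than vo.table's writer. It assumes big-endian IEEE doubles, which is the byte order the VOTable BINARY serialization specifies, and it leaves out the surrounding XML markup.

```python
import base64
import struct

values = [17.15153360566, 17.1516686826, 17.2765703]

# TABLEDATA-style: each number is written out as a string.
as_text = " ".join(repr(v) for v in values)

# BINARY-style: pack as big-endian 8-byte doubles, then base64-encode.
fmt = ">%dd" % len(values)
packed = struct.pack(fmt, *values)
as_base64 = base64.b64encode(packed).decode("ascii")

# Decoding reverses both steps and recovers the values exactly.
decoded = list(struct.unpack(fmt, base64.b64decode(as_base64)))

print(len(packed))        # 24
print(len(as_base64))     # 32
print(decoded == values)  # True
```

The binary form costs a fixed 8 bytes per double before base64 expansion, whereas the text form grows with the number of digits written and, in TABLEDATA, with the markup wrapped around each cell.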