This blog has been deprecated.
GIS Geek
There are no big problems, there are just a lot of little problems. - Henry Ford
Sunday, December 23, 2012
Using Apache Avro with Python
Abstract:
Instructions are given on how to use the Python implementation of Avro to load multiple schemas, stored in separate files, to build an overall composite schema.
Details:
The following example uses two Avro schema files, where the second uses the first. There is also an example of a Python script, which combines and tests the composite schema by outputting to the console.
First schema file: parameter_types.avsc
{"namespace": "scriptaxe.parameter",
 "type": "enum",
 "name": "types",
 "symbols": [
   "null",
   "boolean",
   "int",
   "long",
   "float",
   "double",
   "bytes",
   "string"
 ]
}
Second schema file: parameter.avsc
{"namespace": "scriptaxe",
 "type": "record",
 "name": "parameter",
 "fields": [
   {
     "name": "name",
     "type": "string"
   },{
     "name": "description",
     "type": ["null", "string"]
   },{
     "name": "type",
     "type": "scriptaxe.parameter.types"
   }
 ]
}
Python file: avro_test.py
import json

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter


def main():
    """Start of execution"""
    # combine the schemas: the shared Names object remembers every named
    # schema it has seen, so parameter.avsc can refer to the enum by name
    known_schemas = avro.schema.Names()
    types_schema = LoadAvsc("parameter_types.avsc", known_schemas)
    param_schema = LoadAvsc("parameter.avsc", known_schemas)
    # a fresh Names object makes to_json write the nested enum out in full
    print(json.dumps(param_schema.to_json(avro.schema.Names()), indent=2))

    # test the schema works (Avro data files are binary, hence "wb" and "rb")
    param_file = open("parameters.avro", "wb")
    writer = DataFileWriter(param_file, DatumWriter(), param_schema)
    param_1 = {"name": "test", "description": "An Avro test.", "type": "int"}
    param_2 = {"name": "test", "description": "An Avro test.", "type": "boolean"}
    writer.append(param_1)
    writer.append(param_2)
    writer.close()

    reader = DataFileReader(open("parameters.avro", "rb"), DatumReader())
    for parameter in reader:
        print(parameter)
    reader.close()


def LoadAvsc(file_path, names=None):
    """Load avsc file
    file_path: path to schema file
    names(optional): avro.schema.Names object
    """
    with open(file_path) as schema_file:
        json_data = json.load(schema_file)
    schema = avro.schema.make_avsc_object(json_data, names)
    return schema


if __name__ == "__main__":
    main()
Output to console:
{
  "type": "record",
  "namespace": "scriptaxe",
  "name": "parameter",
  "fields": [
    {
      "type": "string",
      "name": "name"
    }, {
      "type": ["null", "string"],
      "name": "description"
    }, {
      "type": {
        "symbols": ["null", "boolean", "int", "long", "float", "double", "bytes", "string"],
        "namespace": "scriptaxe.parameter",
        "type": "enum",
        "name": "types"
      },
      "name": "type"
    }
  ]
}
{u'type': u'int', u'name': u'test', u'description': u'An Avro test.'}
{u'type': u'boolean', u'name': u'test', u'description': u'An Avro test.'}
Discussion:
Avro's documentation is sparse. This article is intended to help those who are curious to know if the Python implementation of Avro can reuse separately defined schemas. The answer is yes, and a working example was presented above.
Keep in mind that the data used in this example, param_1 and param_2, has key names, such as "name" and "type", that match some of Avro's own schema keys. These key names were chosen to mimic Avro, not because Avro requires them.
It is important to note that any schema referenced by another, in this case the enum contained in parameter_types.avsc, must be loaded first, so that its name is already registered when the referencing schema is parsed.
It would be nice to have a bulk schema loader that could take a set of schemas in any order and manage the correct loading process for us.
Additional Notes:
Installing Avro for Python is easy:
$ sudo pip install avro
References:
- Using Apache Avro, Boris Lublinsky, Jan 25, 2011.
- Apache Avro™ 1.7.3 Getting Started (Python)
- Apache Avro™ 1.7.3 Specification
Sunday, November 4, 2012
Ubuntu 12.04 Restore Default Scrollbars
Situation:
Ubuntu 12.04 LTS(Precise)
Problem:
Please restore the default scrollbars.
Solution:
Run the following at command line:
sudo su
echo "export LIBOVERLAY_SCROLLBAR=0" > /etc/X11/Xsession.d/80overlayscrollbars
gnome-session-quit #log out and back in
References:
Error "string indices must be integers" When Deserializing Queryset
Situation:
Attempting to deserialize a Django 1.5 queryset:
from django.core import serializers
queryset = MyModel.objects.all()
data = serializers.serialize('json', queryset)
#... on another server, data loaded using urllib2:
obj = serializers.deserialize('json', data) # error
Problem:
Django raises the error "string indices must be integers".
Solution:
You are not deserializing what you think you are deserializing. Look at the JSON string and it should be apparent. I had this problem in two cases:
- I had accidentally serialized twice (the JSON string contained escaped quotation marks: \")
- There was an error message instead of the actual object
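Both cases can be spotted before calling serializers.deserialize by parsing the string with the standard json module and checking its shape. A minimal sketch, written for Python 3: the helper name diagnose_payload and its return labels are hypothetical, and the check relies on Django's JSON serializer producing a list of objects that each carry "model" and "fields" keys.

```python
import json


def diagnose_payload(data):
    """Classify a string that is supposed to hold Django-serialized JSON."""
    try:
        parsed = json.loads(data)
    except ValueError:
        # Not JSON at all, for example an HTML error page from the server.
        return "not JSON"
    if isinstance(parsed, str):
        # Decoding gave another string: the data was serialized twice.
        return "double-serialized"
    if isinstance(parsed, list) and all(
        isinstance(item, dict) and "model" in item and "fields" in item
        for item in parsed
    ):
        return "ok"
    # Valid JSON, but not a list of serialized objects (an error message, say).
    return "unexpected shape"


good = json.dumps([{"model": "myapp.mymodel", "pk": 1, "fields": {"name": "x"}}])
print(diagnose_payload(good))
print(diagnose_payload(json.dumps(good)))
print(diagnose_payload(json.dumps({"error": "not found"})))
print(diagnose_payload("<html>Server Error</html>"))
```

The model label "myapp.mymodel" is made up for the demonstration. Anything other than "ok" means the payload should be inspected by hand before it is handed to the deserializer.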
References:
Thursday, August 9, 2012
Python: Import a Submodule Programmatically
Situation:
In Python 2.7, there is a need to import a sub-package programmatically by name.
Problem:
I've looked at __import__ and importlib but the solution is not readily apparent.
Solution:
import importlib
module_name = '.'.join(('os','path'))
path = importlib.import_module(module_name)
#or
import sys
module_name = '.'.join(('os','path'))
__import__(module_name)
path = sys.modules[module_name]
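The reason a bare __import__ call is not enough on its own: given a dotted name it returns the top-level module, not the submodule, which is why the second variant looks the result up in sys.modules. A small sketch, written for Python 3, that checks this behaviour; the helper name import_submodule is hypothetical.

```python
import importlib
import sys


def import_submodule(package_name, submodule_name):
    """Import package_name.submodule_name and return the submodule itself."""
    return importlib.import_module(package_name + "." + submodule_name)


# __import__ with a dotted name returns the top-level module ("os").
top = __import__("os.path")
print(top.__name__)

# importlib.import_module returns the submodule that was asked for.
sub = import_submodule("os", "path")
print(sub is top.path)
print(sub is sys.modules["os.path"])
```

For this reason importlib.import_module is usually the more direct choice when the submodule object itself is needed.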
References: