package hwp5¶
module hwp5.filestructure¶
-
class
hwp5.filestructure.CompressedStorage(wrapped)¶ decompress streams in the underlying storage
-
class
hwp5.filestructure.Hwp5Compression(wrapped)¶ handle compressed streams in HWPv5 files
-
resolve_conversion_for(name)¶ return a conversion function for the specified storage item
-
-
class
hwp5.filestructure.Hwp5File(stg)¶ represents HWPv5 File
Hwp5File(stg)
stg: an instance of Storage
-
resolve_conversion_for(name)¶ return a conversion function for the specified storage item
-
-
class
hwp5.filestructure.Hwp5FileBase(stg)¶ Base of an Hwp5File.
Hwp5FileBase checks the basic validity of an HWP format v5 document and provides the fileheader property.
- Parameters
stg (an instance of storage, OleFileIO or filename) – an OLE2 structured storage.
- Raises
InvalidHwp5FileError – stg is not a valid HWP format v5 document.
-
resolve_conversion_for(name)¶ return a conversion function for the specified storage item
-
hwp5.filestructure.is_hwp5file(filename)¶ Test whether it is an HWP format v5 file.
module hwp5.recordstream¶
-
class
hwp5.recordstream.Hwp5File(stg)¶ Hwp5File for ‘rec’ layer
-
hwp5.recordstream.group_records_by_toplevel(records, group_as_list=True)¶ group records by top-level trees and return iterable of the groups
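To illustrate grouping by top-level trees, here is a minimal sketch and not the library's implementation. It assumes each record is a dict with a `level` key and that a top-level record has level 0. The name `group_by_toplevel_sketch` is hypothetical, and the `group_as_list` option is not modelled.

```python
def group_by_toplevel_sketch(records):
    """Yield lists of records, starting a new list at every level-0 record."""
    group = []
    for record in records:
        # Assumption: level 0 marks the root of a new top-level tree.
        if record["level"] == 0 and group:
            yield group
            group = []
        group.append(record)
    if group:
        yield group


records = [
    {"level": 0, "name": "a"},
    {"level": 1, "name": "a1"},
    {"level": 2, "name": "a1x"},
    {"level": 0, "name": "b"},
    {"level": 1, "name": "b1"},
]
groups = list(group_by_toplevel_sketch(records))
```

Each yielded group holds one top-level record followed by all of its descendants, in stream order.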
-
hwp5.recordstream.record_to_json(record, *args, **kwargs)¶ convert a record to json
module hwp5.binmodel¶
-
class
hwp5.binmodel.Hwp5File(stg)¶
-
class
hwp5.binmodel.ModelJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)¶ -
default(obj)¶ Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
-
-
hwp5.binmodel.init_record_parsing_context(base, record)¶ Initialize a context to parse the given record
The initialization includes the following:
- context = dict(base)
- context['record'] = record
- context['stream'] = record payload stream
- Parameters
base – the base context to be shallow-copied into the new one
record – to be parsed
- Returns
new context
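The three initialization steps can be sketched as follows. This is an illustration under assumptions, not the library's code: the record is taken to be a dict that carries its raw bytes under a `payload` key, and `init_context_sketch` is a hypothetical name.

```python
from io import BytesIO


def init_context_sketch(base, record):
    """Build a per-record parsing context from a shared base context."""
    context = dict(base)  # shallow copy, so the base context is not mutated
    context["record"] = record
    # Assumption: the record exposes its payload bytes under 'payload'.
    context["stream"] = BytesIO(record["payload"])
    return context


base = {"version": (5, 0, 0, 0)}
record = {"tagid": 66, "payload": b"\x01\x02\x03"}
ctx = init_context_sketch(base, record)
```

Because the copy is shallow, per-record keys such as `record` and `stream` never leak back into `base`, while values already in `base` are shared by reference.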
-
hwp5.binmodel.model_to_json(model, *args, **kwargs)¶ convert a model to json
-
hwp5.binmodel.parse_model(context, model)¶ determine the model from the HWPTAG, then perform basic parsing
module hwp5.xmlmodel¶
-
class
hwp5.xmlmodel.Hwp5File(stg)¶
-
hwp5.xmlmodel.make_extended_controls_inline(event_prefixed_mac, stack=None)¶ inline extended-controls into paragraph texts
-
hwp5.xmlmodel.make_paragraphs_children_of_listheader(event_prefixed_mac, parentmodel=<class 'hwp5.binmodel.tagid56_list_header.ListHeader'>, childmodel=<class 'hwp5.binmodel.tagid50_para_header.Paragraph'>)¶ make paragraphs children of the listheader
-
hwp5.xmlmodel.make_texts_linesegmented_and_charshaped(event_prefixed_mac)¶ lineseg/charshaped text chunks
-
hwp5.xmlmodel.restructure_tablebody(event_prefixed_mac)¶ Group the table columns in each row and wrap them with TableRow.
-
hwp5.xmlmodel.tokenize_text_by_lang(event_prefixed_mac)¶ Tokenize text by language.
-
hwp5.xmlmodel.wrap_section(event_prefixed_mac, sect_id=None)¶ wrap a section with SectionDef
module hwp5.xmlformat¶
module hwp5.storage¶
-
hwp5.storage.iter_storage_leafs(stg, basepath='')¶ iterate over every leaf node in the storage
stg: an instance of Storage
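A rough sketch of the traversal, with a nested dict standing in for a Storage. The stand-in is an assumption made for illustration: iterating yields item names, indexing returns either a sub-storage (a dict here) or a leaf stream. `iter_leafs_sketch` is a hypothetical name, not the library function.

```python
def iter_leafs_sketch(stg, basepath=""):
    """Yield the path of every leaf in a nested-dict stand-in for a storage."""
    for name in stg:
        path = basepath + name
        item = stg[name]
        if isinstance(item, dict):
            # A sub-storage: descend, extending the path prefix.
            yield from iter_leafs_sketch(item, path + "/")
        else:
            yield path


storage = {
    "FileHeader": b"",
    "DocInfo": b"",
    "BodyText": {"Section0": b"", "Section1": b""},
}
leafs = list(iter_leafs_sketch(storage))
```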
-
hwp5.storage.unpack(stg, outbase)¶ unpack a storage into outbase directory
stg: an instance of Storage
outbase: path to a directory in filesystem (should not end with '/')
module hwp5.dataio¶
-
exception
hwp5.dataio.Eof(*args)¶
-
exception
hwp5.dataio.OutOfData¶
-
exception
hwp5.dataio.ParseError(*args, **kwargs)¶
-
hwp5.dataio.decode_utf16le_with_hypua(bytes)¶ decode utf-16le encoded bytes with Hanyang-PUA codes into a unicode string with Hangul Jamo codes
- Parameters
bytes – utf-16le encoded bytes with Hanyang-PUA codes
- Returns
a unicode string with Hangul Jamo codes
module hwp5.tagids¶
module hwp5.importhelper¶
-
hwp5.importhelper.pkg_resources_filename(pkg_name, path)¶ the equivalent of pkg_resources.resource_filename()
-
hwp5.importhelper.pkg_resources_filename_fallback(pkg_name, path)¶ a fallback implementation of pkg_resources_filename()
module hwp5.treeop¶
-
hwp5.treeop.build_subtree(event_prefixed_items)¶ build a tree from (event, item) stream
Example Scenario:
    ...
    (STARTEVENT, rootitem)      # should be consumed by the caller
    --- call build_subtree() ---
    (STARTEVENT, child1)        # consumed by build_subtree()
    (STARTEVENT, grandchild)    # (same)
    (ENDEVENT, grandchild)      # (same)
    (ENDEVENT, child1)          # (same)
    (STARTEVENT, child2)        # (same)
    (ENDEVENT, child2)          # (same)
    (ENDEVENT, rootitem)        # (same); build_subtree() returns
    --- build_subtree() returns ---
    (STARTEVENT, another_root)
    ...
The result will be:
    (rootitem, [(child1, [(grandchild, [])]),
                (child2, [])])
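The scenario above can be reproduced with a short recursive sketch. It is not the library's implementation: the event constants are placeholder strings, `build_subtree_sketch` is a hypothetical name, and the input must be a single shared iterator so that nested calls continue where the caller stopped.

```python
STARTEVENT, ENDEVENT = "start", "end"  # placeholder values for illustration


def build_subtree_sketch(events):
    """Consume events up to the ENDEVENT that closes the current item.

    The STARTEVENT of the current item must already have been consumed.
    Returns (item, children), where each child is itself (item, children).
    """
    childs = []
    for event, item in events:
        if event == STARTEVENT:
            # The nested call consumes the child's whole subtree.
            childs.append(build_subtree_sketch(events))
        elif event == ENDEVENT:
            return item, childs


events = iter([
    (STARTEVENT, "rootitem"),
    (STARTEVENT, "child1"),
    (STARTEVENT, "grandchild"),
    (ENDEVENT, "grandchild"),
    (ENDEVENT, "child1"),
    (STARTEVENT, "child2"),
    (ENDEVENT, "child2"),
    (ENDEVENT, "rootitem"),
    (STARTEVENT, "another_root"),
])
next(events)  # the caller consumes (STARTEVENT, rootitem)
tree = build_subtree_sketch(events)
remaining = next(events)  # the stream resumes at the next root
```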
-
hwp5.treeop.prefix_ancestors(event_prefixed_items, root_item=None)¶ convert iterable of (event, item) into iterable of (ancestors, item)
-
hwp5.treeop.prefix_ancestors_from_level(level_prefixed_items, root_item=None)¶ convert iterable of (level, item) into iterable of (ancestors, item)
- Parameters
level_prefixed_items – iterable of tuple(level, item)
- Returns
iterable of tuple(ancestors, item)
-
hwp5.treeop.prefix_event(level_prefixed_items, root_item=None)¶ convert iterable of (level, item) into iterable of (event, item)
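One way to picture the level-to-event conversion is with a stack of open items. The sketch below is an assumption-laden illustration, not the library's code: levels are taken to start at 0 and to grow by at most one per step, the `root_item` parameter is not modelled, and the event constants are placeholder strings.

```python
STARTEVENT, ENDEVENT = "start", "end"  # placeholder values for illustration


def prefix_event_sketch(level_prefixed_items):
    """Convert (level, item) pairs into (event, item) pairs."""
    stack = []  # open items; the stack depth equals the current level + 1
    for level, item in level_prefixed_items:
        # Close every open item at the same or a deeper level.
        while len(stack) > level:
            yield ENDEVENT, stack.pop()
        stack.append(item)
        yield STARTEVENT, item
    # Close whatever is still open at the end of the stream.
    while stack:
        yield ENDEVENT, stack.pop()


items = [(0, "a"), (1, "b"), (1, "c"), (0, "d")]
events = list(prefix_event_sketch(items))
```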
-
hwp5.treeop.tree_events(rootitem, childs)¶ generate tuples of (event, item) from a tree
-
hwp5.treeop.tree_events_multi(trees)¶ generate tuples of (event, item) from trees
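Going the other way, from a tree to an event stream, is a depth-first walk. The following sketch assumes the (item, children) tuple shape shown for build_subtree(). The function names and event constants are hypothetical stand-ins rather than the library's own.

```python
STARTEVENT, ENDEVENT = "start", "end"  # placeholder values for illustration


def tree_events_sketch(rootitem, childs):
    """Yield (event, item) pairs for one tree in depth-first order."""
    yield STARTEVENT, rootitem
    for child, grandchilds in childs:
        yield from tree_events_sketch(child, grandchilds)
    yield ENDEVENT, rootitem


def tree_events_multi_sketch(trees):
    """Yield the events of several (rootitem, childs) trees back to back."""
    for rootitem, childs in trees:
        yield from tree_events_sketch(rootitem, childs)


single = list(tree_events_sketch("root", [("child", [])]))
multi = list(tree_events_multi_sketch([("a", []), ("b", [])]))
```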
module hwp5.utils¶
-
class
hwp5.utils.GeneratorReader(gen)¶ convert a string generator into file-like reader
    def gen():
        yield b'hello'
        yield b'world'

    f = GeneratorReader(gen())
    assert b'hell' == f.read(4)
    assert b'oworld' == f.read()
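How such a reader might buffer chunks can be sketched as below. This is a simplified illustration rather than the library class: `GeneratorReaderSketch` is a hypothetical name, and only `read()` is modelled.

```python
class GeneratorReaderSketch:
    """Expose a generator of bytes chunks through a file-like read()."""

    def __init__(self, gen):
        self.gen = gen
        self.buffer = b""

    def read(self, size=-1):
        if size is None or size < 0:
            # Read everything: drain the buffer and the generator.
            data = self.buffer + b"".join(self.gen)
            self.buffer = b""
            return data
        # Pull chunks until the buffer can satisfy the request.
        while len(self.buffer) < size:
            chunk = next(self.gen, None)
            if chunk is None:
                break  # generator exhausted; return what is left
            self.buffer += chunk
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data


def gen():
    yield b"hello"
    yield b"world"


f = GeneratorReaderSketch(gen())
first = f.read(4)
rest = f.read()
```

Bytes left over from a chunk stay in the buffer, which is why the second read starts with the trailing `o` of the first chunk.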
-
class
hwp5.utils.GeneratorTextReader(gen)¶ convert a string generator into file-like reader
    def gen():
        yield 'hello'
        yield 'world'

    f = GeneratorTextReader(gen())
    assert 'hell' == f.read(4)
    assert 'oworld' == f.read()
-
hwp5.utils.generate_json_array(tokens)¶ generate json array with given tokens
-
hwp5.utils.unicode_escape(s)¶ Escape a string.
- Parameters
s (unicode) – a string to escape
- Returns
escaped string
- Return type
unicode
-
hwp5.utils.unicode_unescape(s)¶ Unescape a string.
- Parameters
s (unicode) – a string to unescape
- Returns
unescaped string
- Return type
unicode
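For a feel of what escaping and unescaping do, the sketch below round-trips a string through Python's standard `unicode_escape` codec. Whether hwp5.utils uses this codec internally is an assumption. The two function names are hypothetical.

```python
def unicode_escape_sketch(s):
    """Return an ASCII-only str with non-ASCII and control chars escaped."""
    return s.encode("unicode_escape").decode("ascii")


def unicode_unescape_sketch(s):
    """Reverse unicode_escape_sketch(); the input must be ASCII-only."""
    return s.encode("ascii").decode("unicode_escape")


original = "한글\tHWP"
escaped = unicode_escape_sketch(original)
restored = unicode_unescape_sketch(escaped)
```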