package hwp5

module hwp5.filestructure

class hwp5.filestructure.CompressedStorage(wrapped)

decompress streams in the underlying storage

class hwp5.filestructure.Hwp5Compression(wrapped)

handle compressed streams in HWPv5 files

class hwp5.filestructure.Hwp5File(stg)

represents HWPv5 File

Hwp5File(stg)

stg: an instance of Storage

class hwp5.filestructure.Hwp5FileBase(stg)

Base of an Hwp5File.

Hwp5FileBase checks basic validity of an HWP format v5 and provides fileheader property.

Parameters:stg (an instance of storage, OleFileIO or filename) – an OLE2 structured storage.
Raises:InvalidHwp5FileError – stg is not a valid HWP format v5 document.

hwp5.filestructure.is_hwp5file(filename)

Test whether it is an HWP format v5 file.
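As a rough illustration of a cheap pre-check, the sketch below tests only for the signature that every OLE2 structured storage starts with. It is not the library's implementation: `looks_like_ole2` is a hypothetical helper, and passing it is necessary but not sufficient for a file to be an HWP format v5 document.

```python
import io

# Magic bytes at the start of every OLE2 compound file.
OLE2_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def looks_like_ole2(fileobj):
    """Return True if the binary stream starts with the OLE2 signature.

    Hypothetical helper: a real check would also have to look at the
    HWP file header stored inside the structured storage.
    """
    head = fileobj.read(len(OLE2_SIGNATURE))
    return head == OLE2_SIGNATURE


# Two in-memory stand-ins for files on disk.
candidate = io.BytesIO(OLE2_SIGNATURE + b'\x00' * 504)
not_ole2 = io.BytesIO(b'PK\x03\x04 some other container format')
```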

module hwp5.recordstream

class hwp5.recordstream.Hwp5File(stg)

Hwp5File for ‘rec’ layer

hwp5.recordstream.group_records_by_toplevel(records, group_as_list=True)

group records by top-level trees and return iterable of the groups
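The grouping idea can be sketched as follows, assuming each record is a dict with an integer 'level' key and that a level-0 record opens a new top-level tree. `group_by_toplevel` is a hypothetical stand-in, not the function documented above.

```python
def group_by_toplevel(records, group_as_list=True):
    """Yield groups of records, each group starting at a level-0 record.

    Assumption: every record is a dict with an integer 'level' key.
    """
    group = []
    for record in records:
        # A level-0 record closes the previous group and opens a new one.
        if record['level'] == 0 and group:
            yield group if group_as_list else iter(group)
            group = []
        group.append(record)
    if group:
        yield group if group_as_list else iter(group)


records = [
    {'level': 0, 'name': 'a'},
    {'level': 1, 'name': 'a1'},
    {'level': 2, 'name': 'a1x'},
    {'level': 0, 'name': 'b'},
    {'level': 1, 'name': 'b1'},
]
groups = list(group_by_toplevel(records))
names = [[record['name'] for record in group] for group in groups]
```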

hwp5.recordstream.record_to_json(record, *args, **kwargs)

convert a record to json

module hwp5.binmodel

hwp5.binmodel.init_record_parsing_context(base, record)

Initialize a context to parse the given record

The initialization includes the following:
  • context = dict(base)
  • context['record'] = record
  • context['stream'] = record payload stream

Parameters:
  • base – the base context to be shallow-copied into the new one
  • record – to be parsed
Returns:the new context
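The three initialization steps can be sketched as below. `init_context` is a hypothetical stand-in, and the 'payload' key holding the raw record bytes is an assumption made for illustration.

```python
import io


def init_context(base, record):
    """Build a fresh parsing context for one record."""
    context = dict(base)            # shallow copy; base itself is untouched
    context['record'] = record
    # Assumption: the raw payload bytes live under record['payload'].
    context['stream'] = io.BytesIO(record['payload'])
    return context


# Illustrative values only.
base = {'version': (5, 0, 0, 0)}
record = {'tagid': 16, 'payload': b'\x01\x02\x03\x04'}
context = init_context(base, record)
```

Because the copy is shallow, values stored in base are shared with the new context, while keys added to the new context do not leak back into base.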

hwp5.binmodel.model_to_json(model, *args, **kwargs)

convert a model to json

hwp5.binmodel.parse_model(context, model)

Determine the model from the HWPTAG, then perform basic parsing.

module hwp5.xmlmodel

hwp5.xmlmodel.make_extended_controls_inline(event_prefixed_mac, stack=None)

inline extended-controls into paragraph texts

hwp5.xmlmodel.make_paragraphs_children_of_listheader(event_prefixed_mac, parentmodel=<class 'hwp5.binmodel.tagid56_list_header.ListHeader'>, childmodel=<class 'hwp5.binmodel.tagid50_para_header.Paragraph'>)

make paragraphs children of the listheader

hwp5.xmlmodel.make_texts_linesegmented_and_charshaped(event_prefixed_mac)

lineseg/charshaped text chunks

hwp5.xmlmodel.restructure_tablebody(event_prefixed_mac)

Group table columns in each row and wrap them with TableRow.

hwp5.xmlmodel.tokenize_text_by_lang(event_prefixed_mac)

tokenize text by language

hwp5.xmlmodel.wrap_section(event_prefixed_mac, sect_id=None)

wrap a section with SectionDef

module hwp5.storage

hwp5.storage.iter_storage_leafs(stg, basepath=u'')

iterate over every leaf node in the storage

stg: an instance of Storage

hwp5.storage.unpack(stg, outbase)

unpack a storage into outbase directory

stg: an instance of Storage

outbase: path to a directory in filesystem (should not end with ‘/’)
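To illustrate the traversal and unpacking ideas without the real Storage interface, the sketch below uses a nested dict as a toy storage: a dict value stands in for a sub-storage and a bytes value for a stream. `iter_leafs` and `unpack_dict` are hypothetical helpers, and the actual functions may yield or write something different.

```python
import os
import tempfile


def iter_leafs(stg, basepath=''):
    """Yield the path of every leaf, depth first, in sorted name order."""
    for name in sorted(stg):
        path = basepath + name
        if isinstance(stg[name], dict):
            # A nested dict stands in for a sub-storage.
            for leaf in iter_leafs(stg[name], path + '/'):
                yield leaf
        else:
            yield path


def unpack_dict(stg, outbase):
    """Write every leaf of the toy storage to a file under outbase."""
    for name in sorted(stg):
        outpath = os.path.join(outbase, name)
        if isinstance(stg[name], dict):
            os.mkdir(outpath)
            unpack_dict(stg[name], outpath)
        else:
            with open(outpath, 'wb') as f:
                f.write(stg[name])


toy = {
    'FileHeader': b'header bytes',
    'BodyText': {'Section0': b'section bytes'},
}
leafs = list(iter_leafs(toy))
outbase = tempfile.mkdtemp()
unpack_dict(toy, outbase)
```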

module hwp5.dataio

class hwp5.dataio.BSTR
basetype

alias of __builtin__.unicode

class hwp5.dataio.BYTE
basetype

alias of __builtin__.int

class hwp5.dataio.DOUBLE
basetype

alias of __builtin__.float

exception hwp5.dataio.Eof(*args)

class hwp5.dataio.HWPUNIT
basetype

alias of __builtin__.long

class hwp5.dataio.HWPUNIT16
basetype

alias of __builtin__.int

class hwp5.dataio.INT16
basetype

alias of __builtin__.int

class hwp5.dataio.INT32
basetype

alias of __builtin__.int

class hwp5.dataio.INT8
basetype

alias of __builtin__.int

exception hwp5.dataio.OutOfData

exception hwp5.dataio.ParseError(*args, **kwargs)

class hwp5.dataio.SHWPUNIT
basetype

alias of __builtin__.int

class hwp5.dataio.UINT16
basetype

alias of __builtin__.int

class hwp5.dataio.UINT32
basetype

alias of __builtin__.long

class hwp5.dataio.UINT8
basetype

alias of __builtin__.int

class hwp5.dataio.WCHAR
basetype

alias of __builtin__.int

class hwp5.dataio.WORD
basetype

alias of __builtin__.int
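Fixed-width types such as these map naturally onto `struct` format codes. The sketch below reads a few of them from a byte stream, assuming little-endian byte order and the widths implied by the type names. `FORMATS` and `read_value` are hypothetical names, and the built-in `EOFError` stands in for the library's own exceptions.

```python
import io
import struct

# struct format codes for a few fixed-width types ('<' = little-endian).
FORMATS = {
    'UINT8': '<B', 'INT8': '<b',
    'UINT16': '<H', 'INT16': '<h',
    'UINT32': '<I', 'INT32': '<i',
    'DOUBLE': '<d',
}


def read_value(stream, typename):
    """Read one value of the named type from a binary stream."""
    fmt = FORMATS[typename]
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) < size:
        raise EOFError('need %d bytes, got %d' % (size, len(data)))
    return struct.unpack(fmt, data)[0]


stream = io.BytesIO(b'\x34\x12' + b'\xff\xff\xff\xff' + b'\xfe')
first = read_value(stream, 'UINT16')    # bytes 34 12 -> 0x1234
second = read_value(stream, 'INT32')    # bytes ff ff ff ff -> -1
third = read_value(stream, 'INT8')      # byte fe -> -2
```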

hwp5.dataio.decode_utf16le_with_hypua(bytes)

decode utf-16le encoded bytes with Hanyang-PUA codes into a unicode string with Hangul Jamo codes

Parameters:bytes – utf-16le encoded bytes with Hanyang-PUA codes
Returns:a unicode string with Hangul Jamo codes

module hwp5.tagids

module hwp5.plat

module hwp5.importhelper

hwp5.importhelper.pkg_resources_filename(pkg_name, path)

the equivalent of pkg_resources.resource_filename()

hwp5.importhelper.pkg_resources_filename_fallback(pkg_name, path)

a fallback implementation of pkg_resources_filename()

module hwp5.treeop

hwp5.treeop.build_subtree(event_prefixed_items)

build a tree from (event, item) stream

Example Scenario:

...
(STARTEVENT, rootitem)          # should be consumed by the caller
--- call build_subtree() ---
(STARTEVENT, child1)            # consumed by build_subtree()
(STARTEVENT, grandchild)        # (same)
(ENDEVENT, grandchild)          # (same)
(ENDEVENT, child1)              # (same)
(STARTEVENT, child2)            # (same)
(ENDEVENT, child2)              # (same)
(ENDEVENT, rootitem)            # (same); build_subtree() returns
--- build_subtree() returns ---
(STARTEVENT, another_root)
...

The result will be:

(rootitem, [(child1, [(grandchild, [])]), (child2, [])])

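The scenario above can be reproduced with a small recursive sketch. It assumes the events arrive through a single shared iterator and uses plain strings as stand-in event markers. `build_subtree_sketch` is hypothetical and is not claimed to match the library's code.

```python
STARTEVENT, ENDEVENT = 'start', 'end'   # stand-in event markers


def build_subtree_sketch(events):
    """Consume events up to the root's ENDEVENT; return (rootitem, childs)."""
    childs = []
    for event, item in events:
        if event == STARTEVENT:
            # Recurse on the same iterator to collect this child's subtree.
            childs.append(build_subtree_sketch(events))
        elif event == ENDEVENT:
            return item, childs


stream = iter([
    (STARTEVENT, 'rootitem'),
    (STARTEVENT, 'child1'),
    (STARTEVENT, 'grandchild'),
    (ENDEVENT, 'grandchild'),
    (ENDEVENT, 'child1'),
    (STARTEVENT, 'child2'),
    (ENDEVENT, 'child2'),
    (ENDEVENT, 'rootitem'),
    (STARTEVENT, 'another_root'),
])
next(stream)                        # the caller consumes (STARTEVENT, rootitem)
tree = build_subtree_sketch(stream)
leftover = next(stream)             # first event after the subtree
```

The input must be an iterator rather than a list, since each recursive call has to continue from where its caller stopped.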
hwp5.treeop.prefix_ancestors(event_prefixed_items, root_item=None)

convert iterable of (event, item) into iterable of (ancestors, item)

hwp5.treeop.prefix_ancestors_from_level(level_prefixed_items, root_item=None)

convert iterable of (level, item) into iterable of (ancestors, item)

Parameters:level_prefixed_items – iterable of tuple(level, item)
Returns:iterable of tuple(ancestors, item)
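One way to derive ancestors from levels is to keep a stack that is truncated to the current depth before each item is emitted. The sketch below assumes levels start at 0 and that root_item is reported as the outermost ancestor. Both are assumptions, and `ancestors_from_level` is a hypothetical stand-in.

```python
def ancestors_from_level(level_prefixed_items, root_item=None):
    """Yield (ancestors, item) for each (level, item) pair."""
    stack = [root_item]             # stack[0] is the synthetic root
    for level, item in level_prefixed_items:
        # Keep the root plus `level` real ancestors; drop anything deeper.
        del stack[level + 1:]
        yield list(stack), item     # copy, so later changes do not leak out
        stack.append(item)


pairs = [(0, 'a'), (1, 'b'), (2, 'c'), (1, 'd'), (0, 'e')]
result = list(ancestors_from_level(pairs))
```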

hwp5.treeop.prefix_event(level_prefixed_items, root_item=None)

convert iterable of (level, item) into iterable of (event, item)
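The level-to-event conversion can be sketched with a stack of open items: before an item starts, every open item at the same depth or deeper is closed. The sketch assumes levels start at 0, uses stand-in event markers, and leaves out the root_item parameter. `events_from_levels` is a hypothetical name.

```python
STARTEVENT, ENDEVENT = 'start', 'end'   # stand-in event markers


def events_from_levels(level_prefixed_items):
    """Yield (event, item) pairs for a stream of (level, item) pairs."""
    stack = []                      # currently open items, outermost first
    for level, item in level_prefixed_items:
        # Close every open item at this depth or deeper.
        while len(stack) > level:
            yield ENDEVENT, stack.pop()
        yield STARTEVENT, item
        stack.append(item)
    # Close whatever is still open at the end of the input.
    while stack:
        yield ENDEVENT, stack.pop()


events = list(events_from_levels([(0, 'a'), (1, 'b'), (1, 'c'), (0, 'd')]))
```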

hwp5.treeop.tree_events(rootitem, childs)

generate tuples of (event, item) from a tree

hwp5.treeop.tree_events_multi(trees)

generate tuples of (event, item) from trees
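Going from a tree back to events is a depth-first walk. The sketch below assumes trees have the (item, childs) shape that build_subtree() returns and uses stand-in event markers. Both function names are hypothetical.

```python
STARTEVENT, ENDEVENT = 'start', 'end'   # stand-in event markers


def tree_events_sketch(rootitem, childs):
    """Yield (event, item) pairs for one tree, depth first."""
    yield STARTEVENT, rootitem
    for child_item, grandchilds in childs:
        for pair in tree_events_sketch(child_item, grandchilds):
            yield pair
    yield ENDEVENT, rootitem


def tree_events_multi_sketch(trees):
    """Yield (event, item) pairs for several trees in sequence."""
    for rootitem, childs in trees:
        for pair in tree_events_sketch(rootitem, childs):
            yield pair


tree = ('root', [('child1', [('grandchild', [])]), ('child2', [])])
events = list(tree_events_sketch(*tree))
multi = list(tree_events_multi_sketch([('x', []), ('y', [])]))
```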

module hwp5.utils

class hwp5.utils.GeneratorReader(gen)

convert a string generator into file-like reader

    from hwp5.utils import GeneratorReader

    def gen():
        yield b'hello'
        yield b'world'

    f = GeneratorReader(gen())
    assert b'hell' == f.read(4)
    assert b'oworld' == f.read()
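A reader of this kind can be sketched with an internal buffer that is topped up from the generator on demand. `GeneratorReaderSketch` below is a hypothetical minimal version that supports only read(), and the real class may behave differently in edge cases.

```python
class GeneratorReaderSketch(object):
    """Minimal file-like reader over a generator of byte strings."""

    def __init__(self, gen):
        self.gen = gen
        self.buffer = b''

    def read(self, size=None):
        if size is None:
            # Drain the buffer and everything the generator still has.
            data = self.buffer + b''.join(self.gen)
            self.buffer = b''
            return data
        # Pull chunks until the buffer can satisfy the request.
        while len(self.buffer) < size:
            chunk = next(self.gen, None)
            if chunk is None:
                break               # generator exhausted
            self.buffer += chunk
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data


def chunks():
    yield b'hello'
    yield b'world'


reader = GeneratorReaderSketch(chunks())
head = reader.read(4)
rest = reader.read()
```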

hwp5.utils.generate_json_array(tokens)

generate json array with given tokens
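A streaming JSON array can be produced by emitting brackets and separators around tokens that are already serialized. The sketch below assumes each token is a JSON-encoded string. `json_array_chunks` is a hypothetical name, and the exact whitespace the real function emits is not specified here.

```python
import json


def json_array_chunks(tokens):
    """Yield string chunks that concatenate to a JSON array."""
    yield '['
    for index, token in enumerate(tokens):
        if index:
            yield ','               # separator before every token but the first
        yield token
    yield ']'


tokens = (json.dumps(obj) for obj in [{'a': 1}, 'text', 3])
text = ''.join(json_array_chunks(tokens))
```

Yielding chunks instead of building one string lets the caller write a large array to a file without holding it all in memory.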