package hwp5

module hwp5.filestructure

class hwp5.filestructure.CompressedStorage(wrapped)

decompress streams in the underlying storage

class hwp5.filestructure.Hwp5Compression(wrapped)

handle compressed streams in HWPv5 files

resolve_conversion_for(name)

return a conversion function for the specified storage item
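As a rough illustration of what a conversion function for a compressed stream could look like, the sketch below inflates a byte string with the standard zlib module. It assumes the stream holds headerless (raw) DEFLATE data, hence wbits=-15. `deflate_raw` and `inflate_raw` are hypothetical helper names, not part of hwp5.

```python
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Compress to raw DEFLATE (no zlib header); only used to build sample data."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data: bytes) -> bytes:
    """Hypothetical conversion function: inflate a raw DEFLATE byte string."""
    return zlib.decompress(data, wbits=-15)


sample = b"HWP stream payload " * 4
roundtrip = inflate_raw(deflate_raw(sample))
```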

class hwp5.filestructure.Hwp5File(stg)

represents HWPv5 File

Hwp5File(stg)

stg: an instance of Storage

resolve_conversion_for(name)

return a conversion function for the specified storage item

class hwp5.filestructure.Hwp5FileBase(stg)

Base of an Hwp5File.

Hwp5FileBase checks the basic validity of an HWP format v5 document and provides the fileheader property.

Parameters

stg (an instance of storage, OleFileIO or filename) – an OLE2 structured storage.

Raises

InvalidHwp5FileError – stg is not a valid HWP format v5 document.

resolve_conversion_for(name)

return a conversion function for the specified storage item

hwp5.filestructure.is_hwp5file(filename)

Test whether it is an HWP format v5 file.

module hwp5.recordstream

class hwp5.recordstream.Hwp5File(stg)

Hwp5File for ‘rec’ layer

hwp5.recordstream.group_records_by_toplevel(records, group_as_list=True)

group records by top-level trees and return iterable of the groups
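The grouping idea can be sketched as follows. This is an illustration only: it assumes each record is a dict with a 'level' key and that a level of 0 starts a new top-level tree. `group_by_toplevel` is a hypothetical stand-in, not the library function.

```python
def group_by_toplevel(records, group_as_list=True):
    """Yield groups of records; each group starts at a level-0 record."""
    group = []
    for record in records:
        # Assumption: level 0 marks the root of a new top-level tree.
        if record["level"] == 0 and group:
            yield group if group_as_list else iter(group)
            group = []
        group.append(record)
    if group:
        yield group if group_as_list else iter(group)


records = [
    {"level": 0, "name": "a"},
    {"level": 1, "name": "a1"},
    {"level": 0, "name": "b"},
    {"level": 1, "name": "b1"},
    {"level": 2, "name": "b2"},
]
groups = [[r["name"] for r in g] for g in group_by_toplevel(records)]
```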

hwp5.recordstream.record_to_json(record, *args, **kwargs)

convert a record to json

module hwp5.binmodel

class hwp5.binmodel.Hwp5File(stg)
class hwp5.binmodel.ModelJsonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
hwp5.binmodel.init_record_parsing_context(base, record)

Initialize a context to parse the given record

The initialization includes the following:

  • context = dict(base)

  • context['record'] = record

  • context['stream'] = record payload stream

Parameters
  • base – the base context to be shallow-copied into the new one

  • record – to be parsed

Returns

new context
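The three initialization steps can be sketched in a few lines. This is a minimal sketch, assuming the record is a dict that keeps its raw bytes under a 'payload' key and that the payload stream is an in-memory io.BytesIO. `init_context` is a hypothetical name, and the sample values are made up.

```python
import io


def init_context(base, record):
    """Shallow-copy `base`, then attach the record and a stream over its payload."""
    context = dict(base)  # shallow copy: `base` itself is left untouched
    context["record"] = record
    # Assumption: the record carries its raw bytes under a 'payload' key.
    context["stream"] = io.BytesIO(record["payload"])
    return context


base = {"version": (5, 0, 0, 0)}
record = {"payload": b"\x01\x02\x03"}
context = init_context(base, record)
```

Because the copy is shallow, adding keys to the new context does not affect base, but mutable values are still shared between the two.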

hwp5.binmodel.model_to_json(model, *args, **kwargs)

convert a model to json

hwp5.binmodel.parse_model(context, model)

Determine the model from the HWPTAG, then perform basic parsing.

module hwp5.xmlmodel

class hwp5.xmlmodel.Hwp5File(stg)
hwp5.xmlmodel.make_extended_controls_inline(event_prefixed_mac, stack=None)

inline extended-controls into paragraph texts

hwp5.xmlmodel.make_paragraphs_children_of_listheader(event_prefixed_mac, parentmodel=<class 'hwp5.binmodel.tagid56_list_header.ListHeader'>, childmodel=<class 'hwp5.binmodel.tagid50_para_header.Paragraph'>)

make paragraphs children of the listheader

hwp5.xmlmodel.make_texts_linesegmented_and_charshaped(event_prefixed_mac)

lineseg/charshaped text chunks

hwp5.xmlmodel.restructure_tablebody(event_prefixed_mac)

Group table columns in each row and wrap them with TableRow.

hwp5.xmlmodel.tokenize_text_by_lang(event_prefixed_mac)

tokenize text chunks by language

hwp5.xmlmodel.wrap_section(event_prefixed_mac, sect_id=None)

wrap a section with SectionDef

module hwp5.storage

hwp5.storage.iter_storage_leafs(stg, basepath='')

iterate over every leaf node in the storage

stg: an instance of Storage
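A recursive walk over the leaves might look like the sketch below. A nested dict stands in for a Storage here, which is an assumption made for illustration, and the stream contents are placeholders. `iter_leaf_paths` is a hypothetical name.

```python
def iter_leaf_paths(stg, basepath=""):
    """Yield the path of every leaf in a nested-dict stand-in for a storage."""
    for name in sorted(stg):
        path = basepath + name
        item = stg[name]
        if isinstance(item, dict):
            # A dict plays the role of a sub-storage: recurse with a '/' suffix.
            yield from iter_leaf_paths(item, path + "/")
        else:
            yield path


storage = {
    "FileHeader": b"...",
    "BodyText": {"Section0": b"...", "Section1": b"..."},
}
paths = list(iter_leaf_paths(storage))
```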

hwp5.storage.unpack(stg, outbase)

unpack a storage into outbase directory

stg: an instance of Storage

outbase: path to a directory in the filesystem (should not end with '/')

module hwp5.dataio

class hwp5.dataio.BSTR
basetype

alias of builtins.str

class hwp5.dataio.BYTE
basetype

alias of builtins.int

class hwp5.dataio.DOUBLE
basetype

alias of builtins.float

exception hwp5.dataio.Eof(*args)
class hwp5.dataio.HWPUNIT
basetype

alias of builtins.int

class hwp5.dataio.HWPUNIT16
basetype

alias of builtins.int

class hwp5.dataio.INT16
basetype

alias of builtins.int

class hwp5.dataio.INT32
basetype

alias of builtins.int

class hwp5.dataio.INT8
basetype

alias of builtins.int

exception hwp5.dataio.OutOfData
exception hwp5.dataio.ParseError(*args, **kwargs)
class hwp5.dataio.SHWPUNIT
basetype

alias of builtins.int

class hwp5.dataio.UINT16
basetype

alias of builtins.int

class hwp5.dataio.UINT32
basetype

alias of builtins.int

class hwp5.dataio.UINT8
basetype

alias of builtins.int

class hwp5.dataio.WCHAR
basetype

alias of builtins.int

class hwp5.dataio.WORD
basetype

alias of builtins.int

hwp5.dataio.decode_utf16le_with_hypua(bytes)

decode utf-16le encoded bytes with Hanyang-PUA codes into a unicode string with Hangul Jamo codes

Parameters

bytes – utf-16le encoded bytes with Hanyang-PUA codes

Returns

a unicode string with Hangul Jamo codes

module hwp5.tagids

module hwp5.plat

module hwp5.importhelper

hwp5.importhelper.pkg_resources_filename(pkg_name, path)

the equivalent of pkg_resources.resource_filename()

hwp5.importhelper.pkg_resources_filename_fallback(pkg_name, path)

a fallback implementation of pkg_resources_filename()

module hwp5.treeop

hwp5.treeop.build_subtree(event_prefixed_items)

build a tree from (event, item) stream

Example Scenario:

...
(STARTEVENT, rootitem)          # should be consumed by the caller
--- call build_subtree() ---
(STARTEVENT, child1)            # consumed by build_subtree()
(STARTEVENT, grandchild)        # (same)
(ENDEVENT, grandchild)          # (same)
(ENDEVENT, child1)              # (same)
(STARTEVENT, child2)            # (same)
(ENDEVENT, child2)              # (same)
(ENDEVENT, rootitem)            # same, build_subtree() returns
--- build_subtree() returns ---
(STARTEVENT, another_root)
...
result will be (rootitem, [(child1, [(grandchild, [])]), (child2, [])])
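The scenario above can be reproduced with a small recursive sketch. It is not the library implementation: STARTEVENT and ENDEVENT are hypothetical string markers, `build_subtree_sketch` is an illustrative name, and the input must be an iterator so that nested calls share one position in the stream.

```python
STARTEVENT, ENDEVENT = "start", "end"  # hypothetical event markers


def build_subtree_sketch(events):
    """Consume events up to the ENDEVENT that closes the current item.

    Returns (item, children); each child has the same (item, children) shape.
    """
    children = []
    for event, item in events:
        if event == STARTEVENT:
            # The nested call consumes this child's entire subtree.
            children.append(build_subtree_sketch(events))
        elif event == ENDEVENT:
            return item, children
    raise ValueError("stream ended before the matching ENDEVENT")


stream = iter([
    (STARTEVENT, "rootitem"),
    (STARTEVENT, "child1"),
    (STARTEVENT, "grandchild"),
    (ENDEVENT, "grandchild"),
    (ENDEVENT, "child1"),
    (STARTEVENT, "child2"),
    (ENDEVENT, "child2"),
    (ENDEVENT, "rootitem"),
    (STARTEVENT, "another_root"),
])
next(stream)  # the caller consumes the root's STARTEVENT
tree = build_subtree_sketch(stream)
```

After the call returns, the stream is positioned at the next root, so the caller can repeat the same steps for `another_root`.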

hwp5.treeop.prefix_ancestors(event_prefixed_items, root_item=None)

convert iterable of (event, item) into iterable of (ancestors, item)

hwp5.treeop.prefix_ancestors_from_level(level_prefixed_items, root_item=None)

convert iterable of (level, item) into iterable of (ancestors, item)

Parameters

level_prefixed_items – iterable of tuple(level, item)

Returns

iterable of tuple(ancestors, item)
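One way to derive ancestors from levels is to keep a stack of the items currently open. The sketch below assumes that levels start at 0 and that the root item is reported as the first ancestor. Both are assumptions for illustration, and `ancestors_from_level` is a hypothetical name.

```python
def ancestors_from_level(level_prefixed_items, root_item=None):
    """Yield (ancestors, item) for each (level, item) using an ancestor stack."""
    stack = [root_item]
    for level, item in level_prefixed_items:
        # Keep the root plus exactly one ancestor per level above this item.
        del stack[level + 1:]
        yield list(stack), item
        stack.append(item)


items = [(0, "a"), (1, "b"), (2, "c"), (1, "d"), (0, "e")]
result = list(ancestors_from_level(items, root_item="root"))
```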

hwp5.treeop.prefix_event(level_prefixed_items, root_item=None)

convert iterable of (level, item) into iterable of (event, item)
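The level-to-event conversion can be sketched with a stack of open items: before an item starts, every open item at the same or a deeper level is closed. STARTEVENT and ENDEVENT are hypothetical markers here, `events_from_level` is an illustrative name, and the root_item parameter is left out for brevity.

```python
STARTEVENT, ENDEVENT = "start", "end"  # hypothetical event markers


def events_from_level(level_prefixed_items):
    """Yield (event, item) pairs for a stream of (level, item) pairs."""
    stack = []  # items whose ENDEVENT has not been emitted yet
    for level, item in level_prefixed_items:
        # Close every open item at the same or a deeper level.
        while len(stack) > level:
            yield ENDEVENT, stack.pop()
        yield STARTEVENT, item
        stack.append(item)
    # Close whatever is still open at the end of the stream.
    while stack:
        yield ENDEVENT, stack.pop()


events = list(events_from_level([(0, "a"), (1, "b"), (1, "c"), (0, "d")]))
```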

hwp5.treeop.tree_events(rootitem, childs)

generate tuples of (event, item) from a tree
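Going the other way, from a tree to events, is a depth-first walk. The sketch assumes the tree is given as a root item plus a list of (child, grandchildren) pairs, the same shape build_subtree() is described as returning. The event markers and the function name are hypothetical.

```python
STARTEVENT, ENDEVENT = "start", "end"  # hypothetical event markers


def tree_events_sketch(rootitem, childs):
    """Yield (event, item) pairs for a tree in depth-first order."""
    yield STARTEVENT, rootitem
    for child, grandchilds in childs:
        yield from tree_events_sketch(child, grandchilds)
    yield ENDEVENT, rootitem


childs = [("child1", [("grandchild", [])]), ("child2", [])]
events = list(tree_events_sketch("rootitem", childs))
```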

hwp5.treeop.tree_events_multi(trees)

generate tuples of (event, item) from trees

module hwp5.utils

class hwp5.utils.GeneratorReader(gen)

convert a bytes generator into a file-like reader

def gen():
    yield b'hello'
    yield b'world'

f = GeneratorReader(gen())
assert b'hell' == f.read(4)
assert b'oworld' == f.read()
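For readers curious how such a reader can work, here is a self-contained sketch that buffers chunks pulled from the generator. It is an illustration under stated assumptions, not the hwp5 implementation: `BytesGeneratorReader` is a hypothetical class and only read() is provided.

```python
class BytesGeneratorReader:
    """File-like reader over a generator of bytes chunks (illustrative only)."""

    def __init__(self, gen):
        self._gen = gen
        self._buffer = b""

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain the buffer and everything the generator still yields.
            data = self._buffer + b"".join(self._gen)
            self._buffer = b""
            return data
        # Pull chunks until the request can be satisfied or the generator ends.
        while len(self._buffer) < size:
            chunk = next(self._gen, None)
            if chunk is None:
                break
            self._buffer += chunk
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


def gen():
    yield b"hello"
    yield b"world"


f = BytesGeneratorReader(gen())
first = f.read(4)
rest = f.read()
```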

class hwp5.utils.GeneratorTextReader(gen)

convert a string generator into a file-like reader

def gen():
    yield 'hello'
    yield 'world'

f = GeneratorTextReader(gen())
assert 'hell' == f.read(4)
assert 'oworld' == f.read()

hwp5.utils.generate_json_array(tokens)

generate json array with given tokens
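A streaming JSON array can be produced by interleaving brackets and commas with the tokens. The sketch below assumes each token is already a serialized JSON string. `json_array_chunks` is a hypothetical name, and the exact whitespace the library emits is not reproduced.

```python
import json


def json_array_chunks(tokens):
    """Yield text chunks that concatenate to a JSON array of the tokens."""
    yield "["
    for index, token in enumerate(tokens):
        if index:
            yield ","  # separator before every token except the first
        yield token
    yield "]"


tokens = (json.dumps(value) for value in [1, "two", {"three": 3}])
text = "".join(json_array_chunks(tokens))
```

Yielding chunks rather than building one string lets a caller write a large array to a file without holding all of it in memory.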

hwp5.utils.unicode_escape(s)

Escape a string.

Parameters

s (unicode) – a string to escape

Returns

escaped string

Return type

unicode

hwp5.utils.unicode_unescape(s)

Unescape a string.

Parameters

s (unicode) – a string to unescape

Returns

unescaped string

Return type

unicode
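An escape/unescape round trip can be sketched with Python's built-in unicode_escape codec. Whether hwp5 uses this codec internally is an assumption, and `escape` and `unescape` are hypothetical stand-ins for the two functions above.

```python
def escape(s: str) -> str:
    """Escape non-ASCII and control characters as backslash sequences."""
    return s.encode("unicode_escape").decode("ascii")


def unescape(s: str) -> str:
    """Reverse escape(); the input must contain only ASCII characters."""
    return s.encode("ascii").decode("unicode_escape")


original = "한글\n"
escaped = escape(original)
restored = unescape(escaped)
```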