hwp5proc: HWPv5 processor

Perform various operations on HWPv5 files.

usage: hwp5proc [-h] [--loglevel LOGLEVEL] [--logfile LOGFILE]
                {version,header,summaryinfo,ls,cat,unpack,records,models,find,xml,rawunz,diststream}
                ...

Named Arguments

--loglevel

Set log level.

--logfile

Set log file.

Subcommands

version

Print the file format version of .hwp files.

Print the file format version of <hwp5file>.

usage: hwp5proc version [-h] <hwp5file>

Positional Arguments

<hwp5file>

.hwp file to analyze
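
For orientation, the sketch below shows how a dotted version such as 5.0.1.7 could be read from raw FileHeader bytes. It is not hwp5proc's own code. It assumes the layout given in the published HWP 5.0 format description: a 32-byte signature followed by a little-endian 32-bit version whose bytes are major, minor, build and revision from most to least significant. The function name `parse_version` is hypothetical.

```python
import struct


def parse_version(file_header: bytes) -> str:
    """Return the dotted version from raw FileHeader bytes (assumed layout)."""
    # Assumption: bytes 0..31 hold the signature, bytes 32..35 the version.
    (packed,) = struct.unpack_from("<I", file_header, 32)
    parts = [(packed >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(p) for p in parts)


# Synthetic header built for illustration only, not read from a real file.
header = b"HWP Document File".ljust(32, b"\x00") + struct.pack("<I", 0x05000107)
print(parse_version(header))
```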

summaryinfo

Print summary information of .hwp files.

Print the summary information of <hwp5file>.

usage: hwp5proc summaryinfo [-h] <hwp5file>

Positional Arguments

<hwp5file>

.hwp file to analyze

ls

List streams in .hwp files.

List streams in the <hwp5file>.

usage: hwp5proc ls [-h] [--vstreams | --ole] <hwp5file>

Positional Arguments

<hwp5file>

.hwp file to analyze

Named Arguments

--vstreams

Process with virtual streams (i.e. parsed/converted form of real streams)

Default: False

--ole

Treat <hwp5file> as an OLE Compound File. As a result, some streams will be presented as-is (i.e. not decompressed).

Default: False
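
As background for --ole: an OLE Compound File can be recognized by its fixed 8-byte signature. The following is a minimal sketch of that check, independent of hwp5proc; the helper name `looks_like_ole` is hypothetical.

```python
# Signature that opens every OLE Compound File (Compound File Binary format).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_ole(data: bytes) -> bool:
    """Return True if the data starts with the OLE Compound File signature."""
    return data[:8] == OLE_MAGIC


# Synthetic inputs for illustration only.
print(looks_like_ole(OLE_MAGIC + b"\x00" * 504))
print(looks_like_ole(b"PK\x03\x04"))
```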

cat

Extract internal streams of .hwp files.

Extract the specified stream in <hwp5file> to the standard output.

usage: hwp5proc cat [-h] [--vstreams | --ole] <hwp5file> <stream>

Positional Arguments

<hwp5file>

.hwp file to analyze

<stream>

Internal path of a stream to extract

Named Arguments

--vstreams

Process with virtual streams (i.e. parsed/converted form of real streams)

Default: False

--ole

Treat <hwp5file> as an OLE Compound File. As a result, some streams will be presented as-is (i.e. not decompressed).

Default: False

Example:

$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg | file -

$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg > BIN0002.jpg

$ hwp5proc cat samples/sample-5017.hwp PrvText | iconv -f utf-16le -t utf-8

$ hwp5proc cat --vstreams samples/sample-5017.hwp PrvText.utf8

$ hwp5proc cat --vstreams samples/sample-5017.hwp FileHeader.txt

ccl: 0
cert_drm: 0
cert_encrypted: 0
cert_signature_extra: 0
cert_signed: 0
compressed: 1
distributable: 0
drm: 0
history: 0
password: 0
script: 0
signature: HWP Document File
version: 5.0.1.7
xmltemplate_storage: 0
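
The FileHeader.txt output above is a list of "key: value" lines, so it is easy to post-process. The following is a minimal sketch of one way to load it into a dictionary; the function name `parse_file_header_txt` is hypothetical and values are kept as strings.

```python
def parse_file_header_txt(text: str) -> dict:
    """Parse "key: value" lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the output shown above.
sample = "compressed: 1\nsignature: HWP Document File\nversion: 5.0.1.7\n"
info = parse_file_header_txt(sample)
print(info["version"], info["compressed"] == "1")
```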

unpack

Extract internal streams of .hwp files into a directory.

Extract the streams in the specified <hwp5file> to a directory.

usage: hwp5proc unpack [-h] [--vstreams | --ole] <hwp5file> [<out-directory>]

Positional Arguments

<hwp5file>

.hwp file to analyze

<out-directory>

Output directory

Named Arguments

--vstreams

Process with virtual streams (i.e. parsed/converted form of real streams)

Default: False

--ole

Treat <hwp5file> as an OLE Compound File. As a result, some streams will be presented as-is (i.e. not decompressed).

Default: False

Example:

$ hwp5proc unpack samples/sample-5017.hwp
$ ls sample-5017

Example:

$ hwp5proc unpack --vstreams samples/sample-5017.hwp
$ cat sample-5017/PrvText.utf8
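
The iconv example under cat treats the raw PrvText stream as UTF-16LE. Assuming the PrvText.utf8 virtual stream is the same text re-encoded as UTF-8, the conversion can be sketched in Python as follows; the function name `utf16le_to_utf8` is hypothetical.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8, like `iconv -f utf-16le -t utf-8`."""
    return raw.decode("utf-16-le").encode("utf-8")


# Synthetic preview text for illustration only.
raw = "한글 preview".encode("utf-16-le")
print(utf16le_to_utf8(raw).decode("utf-8"))
```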

records

Print the record structure of .hwp file record streams.

Print the record structure of the specified stream.

usage: hwp5proc records [-h]
                        [--simple | --json | --raw | --raw-header | --raw-payload]
                        [--range <range> | --treegroup <treegroup>]
                        [<hwp5file>] [<record-stream>]

Positional Arguments

<hwp5file>

.hwp file to analyze

<record-stream>

Record-structured internal stream (e.g. DocInfo, BodyText/*).

Named Arguments

--simple

Print records as simple tree

Default: False

--json

Print records as json

Default: False

--raw

Print records as is

Default: False

--raw-header

Print record headers as is

Default: False

--raw-payload

Print record payloads as is

Default: False

--range

Specifies the range of records to print. “N-M” means records N through M-1 (M is excluded); “N” means record N only.

--treegroup

Specifies the N-th subtree of the record structure.
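
The --range syntax described above is a half-open interval. A minimal sketch of that interpretation, using a hypothetical helper `parse_range` that is not part of hwp5proc:

```python
def parse_range(spec: str) -> range:
    """Turn "N-M" into range(N, M) and "N" into range(N, N + 1)."""
    if "-" in spec:
        start, stop = spec.split("-", 1)
        return range(int(start), int(stop))
    n = int(spec)
    return range(n, n + 1)


print(list(parse_range("0-2")))
print(list(parse_range("3")))
```

Under this reading, --range=0-2 selects records 0 and 1.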

Example:

$ hwp5proc records samples/sample-5017.hwp DocInfo

Example:

$ hwp5proc records samples/sample-5017.hwp DocInfo --range=0-2

If neither <hwp5file> nor <record-stream> is specified, the record stream is read from the standard input.

Example:

$ hwp5proc records --raw samples/sample-5017.hwp DocInfo --range=0-2 > tmp.rec
$ hwp5proc records < tmp.rec
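
To make the shape of such a raw record stream concrete, the sketch below walks record headers by hand. It is not hwp5proc's implementation. It assumes the record header layout from the published HWP 5.0 format description: a little-endian 32-bit word holding a 10-bit tag ID, a 10-bit level and a 12-bit size, where a size of 0xFFF means the real size follows in the next 32-bit word. The function name is hypothetical.

```python
import struct


def read_record_headers(data: bytes):
    """Yield (tagid, level, size) for each record in a raw record stream."""
    pos = 0
    while pos + 4 <= len(data):
        (word,) = struct.unpack_from("<I", data, pos)
        pos += 4
        tagid = word & 0x3FF
        level = (word >> 10) & 0x3FF
        size = (word >> 20) & 0xFFF
        if size == 0xFFF:
            # Assumed extended form: the actual size is stored in the next word.
            (size,) = struct.unpack_from("<I", data, pos)
            pos += 4
        yield tagid, level, size
        pos += size  # skip the payload


def make_header(tagid: int, level: int, size: int) -> bytes:
    """Pack a short-form header (size below 0xFFF) for the demo stream."""
    return struct.pack("<I", tagid | (level << 10) | (size << 20))


# Synthetic stream for illustration: two records, the first with 2 payload bytes.
stream = make_header(16, 0, 2) + b"ab" + make_header(66, 1, 0)
print(list(read_record_headers(stream)))
```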

models

Print parsed binary models of .hwp file record streams.

Print parsed binary models in the specified <record-stream>.

usage: hwp5proc models [-h] [--file-format-version <version>]
                       [--simple | --json | --format <format> | --events]
                       [--treegroup <treegroup> | --seqno <treegroup>]
                       [<hwp5file>] [<record-stream>]

Positional Arguments

<hwp5file>

.hwp file to analyze

<record-stream>

Record-structured internal stream (e.g. DocInfo, BodyText/*).

Named Arguments

--file-format-version, -V

Specifies the HWPv5 file format version of the standard input stream.

--simple

Print records as simple tree

Default: False

--json

Print records as json

Default: False

--format

Print records using the given format string.

--events

Print records as events

Default: False

--treegroup

Specifies the N-th subtree of the record structure.

--seqno

Print the model of the <seqno>-th record.

Example:

$ hwp5proc models samples/sample-5017.hwp DocInfo
$ hwp5proc models samples/sample-5017.hwp BodyText/Section0

$ hwp5proc models samples/sample-5017.hwp docinfo
$ hwp5proc models samples/sample-5017.hwp bodytext/0

Example:

$ hwp5proc models --simple samples/sample-5017.hwp bodytext/0
$ hwp5proc models --format='%(level)s %(tagname)s\n' \
        samples/sample-5017.hwp bodytext/0
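
The --format value in the example uses Python's %(name)s mapping syntax. Assuming the tool expands it against a per-record mapping with keys such as level and tagname (the only two keys shown in this document), the expansion behaves like this sketch; the record values are made up for illustration.

```python
def render(fmt: str, record: dict) -> str:
    """Expand a %(name)s-style format string against one record mapping."""
    return fmt % record


fmt = "%(level)s %(tagname)s\n"
# Hypothetical record values, not taken from a real file.
record = {"level": 1, "tagname": "HWPTAG_PARA_HEADER"}
line = render(fmt, record)
print(line, end="")
```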

Example:

$ hwp5proc models --simple --treegroup=1 samples/sample-5017.hwp bodytext/0
$ hwp5proc models --simple --seqno=4 samples/sample-5017.hwp bodytext/0

If neither <hwp5file> nor <record-stream> is specified, the record stream is read from the standard input, on the assumption that the input is in the format version specified by the -V option.

Example:

$ hwp5proc cat samples/sample-5017.hwp BodyText/Section0 > Section0.bin
$ hwp5proc models -V 5.0.1.7 < Section0.bin

find

Find record models with specified predicates.

usage: hwp5proc find [-h] [--from-stdin]
                     [--model <model-name> | --tag <hwptag>] [--incomplete]
                     [--format <format>] [--dump]
                     [<hwp5files> [<hwp5files> ...]]

Positional Arguments

<hwp5files>

.hwp files to analyze

Named Arguments

--from-stdin

Read filenames from the standard input.

Default: False

--model

Filter by record model name.

--tag

Filter by record HWPTAG (name or number).

--incomplete

Filter for incompletely parsed content.

Default: False

--format

Record output format.

--dump

Dump matching records.

Default: False

Example: Find paragraphs:

$ hwp5proc find --model=Paragraph samples/*.hwp
$ hwp5proc find --tag=HWPTAG_PARA_TEXT samples/*.hwp
$ hwp5proc find --tag=66 samples/*.hwp
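
The --tag option accepts either a tag name or a number, as the commands above show. The sketch below illustrates how names and numbers relate, assuming the numbering in the published HWP 5.0 format description (HWPTAG_BEGIN = 0x10, with the listed offsets); the table and the helper `tag_number` are illustrative, not hwp5proc's own lookup.

```python
HWPTAG_BEGIN = 0x10  # assumed base value from the format description

# A small, assumed subset of the tag table.
TAGS = {
    "HWPTAG_PARA_HEADER": HWPTAG_BEGIN + 50,
    "HWPTAG_PARA_TEXT": HWPTAG_BEGIN + 51,
    "HWPTAG_LIST_HEADER": HWPTAG_BEGIN + 56,
}


def tag_number(tag: str) -> int:
    """Accept a numeric string or a known tag name and return the tag number."""
    return int(tag) if tag.isdigit() else TAGS[tag]


for name in ("66", "HWPTAG_PARA_TEXT", "HWPTAG_LIST_HEADER"):
    print(name, tag_number(name))
```

Under that numbering, 66 corresponds to HWPTAG_PARA_HEADER and HWPTAG_PARA_TEXT is 67, so the second and third commands above would select different record types.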

Example: Find and dump HWPTAG_LIST_HEADER records that were parsed incompletely:

$ hwp5proc find --tag=HWPTAG_LIST_HEADER --incomplete --dump samples/*.hwp

xml

Transform .hwp files into XML.

Transform <hwp5file> into XML.

usage: hwp5proc xml [-h] [--embedbin] [--no-xml-decl] [--output <file>]
                    [--format <format>] [--no-validate-wellformed]
                    <hwp5file>

Positional Arguments

<hwp5file>

.hwp file to analyze

Named Arguments

--embedbin

Embed BinData/* streams in the output XML.

Default: False

--no-xml-decl

Do not output <?xml … ?> XML declaration.

Default: False

--output

Output filename.

--format

Output format: “flat” or “nested” (default: “nested”).

--no-validate-wellformed

Do not validate well-formedness of output.

Default: False
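
If --no-validate-wellformed is used, well-formedness can still be checked afterwards. A minimal sketch with Python's standard XML parser follows; it shows the general idea and is not necessarily how hwp5proc validates its output. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_wellformed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as a well-formed XML document."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# Synthetic documents for illustration only.
print(is_wellformed(b"<a><b/></a>"))
print(is_wellformed(b"<a><b></a>"))
```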

Example:

$ hwp5proc xml samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml

With the --embedbin option, you can embed base64-encoded BinData/* files in the output XML.

Example:

$ hwp5proc xml --embedbin samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml

rawunz

Inflate (decompress) a headerless zlib-compressed stream.

usage: hwp5proc rawunz [-h]
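
"Headerless" here means raw deflate data without the zlib header and checksum. In Python such data can be decompressed by passing a negative window size to zlib, as in this sketch. It illustrates the technique only; it is an assumption that rawunz does the equivalent on its standard input and output. The function name is hypothetical.

```python
import zlib


def inflate_raw(data: bytes) -> bytes:
    """Decompress raw deflate data (no zlib header or trailer)."""
    # A negative wbits value tells zlib not to expect a header.
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)
    return decompressor.decompress(data) + decompressor.flush()


# Round trip on synthetic data: compress without a header, then inflate.
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(b"hello hwp " * 10) + compressor.flush()
print(inflate_raw(raw) == b"hello hwp " * 10)
```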

diststream

Decode a distribution document stream.

usage: hwp5proc diststream [-h] [--sha1 | --key] [--raw]

Named Arguments

--sha1

Print SHA-1 value for decryption.

Default: False

--key

Print decrypted key.

Default: False

--raw

Print raw binary objects as is.

Default: False