adding of two books and all metadata about them
Showing 475 changed files with 226,182 additions and 24 deletions.
@@ -0,0 +1,52 @@ | ||
# About _L’Amour tirannique_ | ||
|
||
## Original | ||
_L'Amour tirannique_ by Georges de Scudéry, published in 1639 (https://gallica.bnf.fr/ark:/12148/btv1b8607044w)
|
||
## Images | ||
First page of the sample used for training: | ||
|
||
Image size: 4267 × 5513 pixels | ||
|
||
Color model: RGB | ||
|
||
## Files | ||
``png`` contains all the images, in PNG format, on which the files in the other directories are based. They come from Gallica (see the link above).
|
||
``pageXmlTranskribus`` contains the PAGE-XML files output by Transkribus. These files come from the E-ditiones repository [OCR17plus](https://github.com/e-ditiones/OCR17plus) and were prepared beforehand by Simon Gabay.
|
||
``pageXmlTranskribusCorrected`` contains the PAGE-XML files output by Transkribus, transformed so that they can be used in [eScriptorium](http://traces6.paris.inria.fr/) (the Python and XSL scripts used are available [here](https://github.com/Heresta/BAO_Stage_DH_ENS_2021/tree/main/CorrectionPageXMLeScriptorium)).
|
||
``alto4eScriptorium`` contains the ALTO (version 4) XML files output by [eScriptorium](http://traces6.paris.inria.fr/). These files were cleaned and prepared. On the one hand, all zones were reshaped and renamed according to [SegmOnto](https://github.com/SegmOnto), an ontology under development ([list of zone names](https://github.com/SegmOnto/examples/tree/main/zones)). We tried to keep things as simple as possible, but this still raises questions, which we gathered [here](https://github.com/Heresta/BAO_Stage_DH_ENS_2021/tree/main/problemesSegmentation) (documentation in French). On the other hand, only some lines were reshaped, but all of them were renamed, again according to [SegmOnto](https://github.com/SegmOnto) ([list of line names](https://github.com/SegmOnto/examples/tree/main/lines)).
|
||
## About files' segmentation | ||
|
||
### About zones: | ||
|
||
Title: 2 (1.22%) | ||
|
||
Main: 31 (18.9%) | ||
|
||
Damage: 32 (19.51%) | ||
|
||
Decoration: 10 (6.1%) | ||
|
||
DropCapital: 7 (4.27%) | ||
|
||
Margin: 11 (6.71%) | ||
|
||
Numbering: 29 (17.68%) | ||
|
||
RunningTitle: 28 (17.07%) | ||
|
||
Signatures: 13 (7.93%) | ||
|
||
Stamp: 1 (0.61%) | ||
|
||
### About lines: | ||
|
||
Default: 878 (98.21%) | ||
|
||
DropCapitalLine: 15 (1.68%) | ||
|
||
Rubric: 1 (0.11%) |
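
The counts above could be recomputed from the ALTO files along the following lines. This is only a minimal sketch, not the script actually used to produce the figures: the function names `count_types` and `format_shares` are hypothetical, and it assumes that every `TextBlock` and `TextLine` carries a `TAGREFS` attribute pointing to an `OtherTag` declared in the same file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ALTO v4 namespace, as declared on the <alto> root element.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}


def count_types(alto_text):
    """Return (zone_counts, line_counts) for one ALTO document given as text."""
    root = ET.fromstring(alto_text)
    # Map tag IDs such as "BT1" or "LT248" to their LABEL ("Title", "Default").
    labels = {
        tag.get("ID"): tag.get("LABEL")
        for tag in root.iterfind("alto:Tags/alto:OtherTag", NS)
    }

    def tally(xpath):
        counts = Counter()
        for element in root.iterfind(xpath, NS):
            # TAGREFS may hold several space-separated IDs; untagged elements are skipped.
            for ref in (element.get("TAGREFS") or "").split():
                counts[labels.get(ref, ref)] += 1
        return counts

    return tally(".//alto:TextBlock"), tally(".//alto:TextLine")


def format_shares(counts):
    """Format counts as 'Label: n (share%)', with shares rounded to two decimals."""
    total = sum(counts.values())
    return [
        f"{label}: {n} ({round(100 * n / total, 2)}%)"
        for label, n in counts.items()
    ]
```

To cover a whole directory, the two counters returned for each file can be summed (`Counter` supports `+`) before calling `format_shares`.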
260 changes: 260 additions & 0 deletions
...1b8607044w_corrected/alto4eScriptorium/Scudery1639_Amour_btv1b8607044w_corrected_0009.xml
@@ -0,0 +1,260 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xmlns="http://www.loc.gov/standards/alto/ns-v4#" | ||
xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd"> | ||
<Description> | ||
<MeasurementUnit>pixel</MeasurementUnit> | ||
<sourceImageInformation> | ||
<fileName>Scudery1639_Amour_btv1b8607044w_corrected_0009.png</fileName> | ||
|
||
</sourceImageInformation> | ||
</Description> | ||
|
||
<Tags> | ||
<OtherTag ID="BT1" LABEL="Title" DESCRIPTION="block type Title"/><OtherTag ID="BT2" LABEL="Main" DESCRIPTION="block type Main"/><OtherTag ID="BT687" LABEL="Damage" DESCRIPTION="block type Damage"/><OtherTag ID="BT688" LABEL="Decoration" DESCRIPTION="block type Decoration"/><OtherTag ID="BT689" LABEL="DropCapital" DESCRIPTION="block type DropCapital"/><OtherTag ID="BT690" LABEL="Figure" DESCRIPTION="block type Figure"/><OtherTag ID="BT691" LABEL="Margin" DESCRIPTION="block type Margin"/><OtherTag ID="BT692" LABEL="MusicNotation" DESCRIPTION="block type MusicNotation"/><OtherTag ID="BT693" LABEL="Numbering" DESCRIPTION="block type Numbering"/><OtherTag ID="BT694" LABEL="RunningTitle" DESCRIPTION="block type RunningTitle"/><OtherTag ID="BT695" LABEL="Seal" DESCRIPTION="block type Seal"/><OtherTag ID="BT696" LABEL="Signatures" DESCRIPTION="block type Signatures"/><OtherTag ID="BT697" LABEL="Stamp" DESCRIPTION="block type Stamp"/><OtherTag ID="BT698" LABEL="Table" DESCRIPTION="block type Table"/> | ||
<OtherTag ID="LT248" LABEL="Default" DESCRIPTION="line type Default"/><OtherTag ID="LT249" LABEL="DropCapitalLine" DESCRIPTION="line type DropCapitalLine"/><OtherTag ID="LT250" LABEL="Interlinear" DESCRIPTION="line type Interlinear"/><OtherTag ID="LT251" LABEL="MusicLine" DESCRIPTION="line type MusicLine"/><OtherTag ID="LT252" LABEL="Rubric" DESCRIPTION="line type Rubric"/> | ||
</Tags> | ||
|
||
<Layout> | ||
<Page WIDTH="4267" | ||
HEIGHT="5513" | ||
PHYSICAL_IMG_NR="0" | ||
ID="eSc_dummypage_"> | ||
<PrintSpace HPOS="0" | ||
VPOS="0" | ||
WIDTH="4267" | ||
HEIGHT="5513"> | ||
|
||
<TextBlock HPOS="348" | ||
VPOS="216" | ||
WIDTH="3200" | ||
HEIGHT="1810" | ||
ID="eSc_textblock_93a4aced" | ||
TAGREFS="BT1"> | ||
<Shape><Polygon POINTS="348 216 348 2026 3548 2026 3548 216"/></Shape> | ||
|
||
|
||
<TextLine ID="tl_1" | ||
TAGREFS="LT248" | ||
BASELINE="438 678 3409 704" | ||
HPOS="438" | ||
VPOS="222" | ||
WIDTH="2971" | ||
HEIGHT="530"> | ||
<Shape><Polygon POINTS="438 222 3409 222 3409 752 438 752"/></Shape> | ||
<String CONTENT="L'AMOVR" | ||
HPOS="438" | ||
VPOS="222" | ||
WIDTH="2971" | ||
HEIGHT="530"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_2" | ||
TAGREFS="LT248" | ||
BASELINE="840 1033 2978 1016" | ||
HPOS="840" | ||
VPOS="821" | ||
WIDTH="2141" | ||
HEIGHT="267"> | ||
<Shape><Polygon POINTS="840 821 2981 821 2981 1088 840 1088"/></Shape> | ||
<String CONTENT="TIRANNIQVE," | ||
HPOS="840" | ||
VPOS="821" | ||
WIDTH="2141" | ||
HEIGHT="267"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_3" | ||
TAGREFS="LT248" | ||
BASELINE="412 1432 875 1430 1547 1432 3339 1474 3400 1396" | ||
HPOS="412" | ||
VPOS="1151" | ||
WIDTH="2990" | ||
HEIGHT="355"> | ||
<Shape><Polygon POINTS="412 1153 3402 1151 3402 1504 412 1506"/></Shape> | ||
<String CONTENT="TRAGI-COMEDIE." | ||
HPOS="412" | ||
VPOS="1151" | ||
WIDTH="2990" | ||
HEIGHT="355"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_4" | ||
TAGREFS="LT248" | ||
BASELINE="1720 1677 2082 1690" | ||
HPOS="1720" | ||
VPOS="1576" | ||
WIDTH="363" | ||
HEIGHT="114"> | ||
<Shape><Polygon POINTS="1720 1576 2083 1576 2083 1690 1720 1690"/></Shape> | ||
<String CONTENT="PAR" | ||
HPOS="1720" | ||
VPOS="1576" | ||
WIDTH="363" | ||
HEIGHT="114"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_5" | ||
TAGREFS="LT248" | ||
BASELINE="682 1891 1669 1898 2005 1900 2961 1922 3122 1927" | ||
HPOS="682" | ||
VPOS="1743" | ||
WIDTH="2443" | ||
HEIGHT="212"> | ||
<Shape><Polygon POINTS="682 1743 3125 1743 3125 1955 682 1955"/></Shape> | ||
<String CONTENT="MONSIEVR DE SCVDERY." | ||
HPOS="682" | ||
VPOS="1743" | ||
WIDTH="2443" | ||
HEIGHT="212"></String> | ||
</TextLine> | ||
|
||
|
||
</TextBlock> | ||
|
||
<TextBlock HPOS="324" | ||
VPOS="3931" | ||
WIDTH="3200" | ||
HEIGHT="1259" | ||
ID="eSc_textblock_b8f3a00a" | ||
TAGREFS="BT1"> | ||
<Shape><Polygon POINTS="324 3931 324 5178 3512 5190 3524 3967"/></Shape> | ||
|
||
|
||
<TextLine ID="line_1563976965674_867" | ||
TAGREFS="LT248" | ||
BASELINE="1150 4074 2507 4129" | ||
HPOS="1153" | ||
VPOS="3946" | ||
WIDTH="1359" | ||
HEIGHT="203"> | ||
<Shape><Polygon POINTS="1155 3946 2512 4001 2510 4149 1153 4094"/></Shape> | ||
<String CONTENT="A PARIS," | ||
HPOS="1153" | ||
VPOS="3946" | ||
WIDTH="1359" | ||
HEIGHT="203"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="line_1563976988123_876" | ||
TAGREFS="LT248" | ||
BASELINE="354 4281 3399 4391" | ||
HPOS="352" | ||
VPOS="4169" | ||
WIDTH="3051" | ||
HEIGHT="253"> | ||
<Shape><Polygon POINTS="358 4169 3403 4279 3397 4422 352 4312"/></Shape> | ||
<String CONTENT="Chez AVGVSTIN COVRBE', Imprimeur &amp; Libraire de"
HPOS="352" | ||
VPOS="4169" | ||
WIDTH="3051" | ||
HEIGHT="253"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_6" | ||
TAGREFS="LT248" | ||
BASELINE="827 4439 1308 4457 1592 4477 1739 4485 2042 4522 2295 4501 2399 4500 2707 4512 2935 4518" | ||
HPOS="825" | ||
VPOS="4344" | ||
WIDTH="2111" | ||
HEIGHT="218"> | ||
<Shape><Polygon POINTS="825 4344 2936 4344 2936 4562 825 4562"/></Shape> | ||
<String CONTENT="Monſieur Frere du Roy, dans la petite Sale" | ||
HPOS="825" | ||
VPOS="4344" | ||
WIDTH="2111" | ||
HEIGHT="218"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_7" | ||
TAGREFS="LT248" | ||
BASELINE="1354 4612 1480 4622 1842 4664 1916 4638 2019 4641 2406 4648" | ||
HPOS="1353" | ||
VPOS="4511" | ||
WIDTH="1055" | ||
HEIGHT="169"> | ||
<Shape><Polygon POINTS="1353 4511 2408 4511 2408 4680 1353 4680"/></Shape> | ||
<String CONTENT="du Palais, à la Palme." | ||
HPOS="1353" | ||
VPOS="4511" | ||
WIDTH="1055" | ||
HEIGHT="169"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_8" | ||
TAGREFS="LT248" | ||
BASELINE="1126 4907 1274 4916 1524 4927 1676 4935 2493 4954" | ||
HPOS="1125" | ||
VPOS="4808" | ||
WIDTH="1371" | ||
HEIGHT="162"> | ||
<Shape><Polygon POINTS="1125 4808 2496 4808 2496 4970 1125 4970"/></Shape> | ||
<String CONTENT="M. D C. XXXIX." | ||
HPOS="1125" | ||
VPOS="4808" | ||
WIDTH="1371" | ||
HEIGHT="162"></String> | ||
</TextLine> | ||
|
||
|
||
|
||
<TextLine ID="tl_9" | ||
TAGREFS="LT248" | ||
BASELINE="594 5019 1080 5051 1272 5056 1492 5070 1656 5075 1988 5084 2132 5128 2259 5096 2631 5110 3113 5124" | ||
HPOS="593" | ||
VPOS="4926" | ||
WIDTH="2523" | ||
HEIGHT="230"> | ||
<Shape><Polygon POINTS="593 4926 3116 4926 3116 5156 593 5156"/></Shape> | ||
<String CONTENT="AVEC PRIVILEGE DV ROY." | ||
HPOS="593" | ||
VPOS="4926" | ||
WIDTH="2523" | ||
HEIGHT="230"></String> | ||
</TextLine> | ||
|
||
|
||
</TextBlock> | ||
|
||
<TextBlock HPOS="2925" | ||
VPOS="3632" | ||
WIDTH="707" | ||
HEIGHT="683" | ||
ID="eSc_textblock_2ad6a31d" | ||
TAGREFS="BT697"> | ||
<Shape><Polygon POINTS="3572 3644 3632 4279 2925 4315 2937 3632"/></Shape> | ||
|
||
</TextBlock> | ||
|
||
<TextBlock HPOS="611" | ||
VPOS="2050" | ||
WIDTH="2493" | ||
HEIGHT="1798" | ||
ID="eSc_textblock_abea09f5" | ||
TAGREFS="BT688"> | ||
<Shape><Polygon POINTS="635 2050 611 3728 3044 3848 3104 2110"/></Shape> | ||
|
||
</TextBlock> | ||
|
||
|
||
|
||
</PrintSpace> | ||
</Page> | ||
</Layout> | ||
</alto> |
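
In the file above, the `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes of each `TextBlock` and `TextLine` match the bounding box of its `Polygon`. The sketch below shows one way to derive that box from a `POINTS` string. The helper name `bounding_box` is hypothetical, and the code assumes space-separated integer pixel coordinates in `x y x y ...` order, as in this file.

```python
def bounding_box(points):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) for an ALTO POINTS string."""
    values = [int(v) for v in points.split()]
    # Even positions are x coordinates, odd positions are y coordinates.
    xs, ys = values[0::2], values[1::2]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# Polygon of the TextLine "tl_3" (TRAGI-COMEDIE.) from the file above.
print(bounding_box("412 1153 3402 1151 3402 1504 412 1506"))
# → (412, 1151, 2990, 355)
```

A check of this kind can help spot lines whose polygon was reshaped without the box attributes being updated.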