adding of two books and all metadata about them
Heresta committed Jul 5, 2021
1 parent 8b2a6f9 commit e5a0536
Showing 475 changed files with 226,182 additions and 24 deletions.
52 changes: 52 additions & 0 deletions Data/Scudery1639_Amour_btv1b8607044w_corrected/README.md
@@ -0,0 +1,52 @@
# About _L’Amour tirannique_

## Original
_L'Amour tirannique_ by Georges de Scudéry, published in 1639 (https://gallica.bnf.fr/ark:/12148/btv1b8607044w)

## Images
First page of the sample used for training:

Image size: 4267 × 5513 pixels

Color model: RGB

## Files
``png`` contains all the images, in PNG format, on which the files in the other directories are based. They come from Gallica (see the link above).

``pageXmlTranskribus`` contains all the PAGE-XML files output by Transkribus. They come from E-ditiones' repository [OCR17plus](https://github.com/e-ditiones/OCR17plus) and were prepared beforehand by Simon Gabay.

``pageXmlTranskribusCorrected`` contains the PAGE-XML files output by Transkribus after they were transformed to fit, and be usable in, [eScriptorium](http://traces6.paris.inria.fr/) (the Python and XSL scripts used are available [here](https://github.com/Heresta/BAO_Stage_DH_ENS_2021/tree/main/CorrectionPageXMLeScriptorium)).

``alto4eScriptorium`` contains all the ALTO (version 4) XML files output by [eScriptorium](http://traces6.paris.inria.fr/). These files were cleaned and prepared. First, all zones were reshaped and renamed according to an ontology under development called [SegmOnto](https://github.com/SegmOnto) ([list of zone names](https://github.com/SegmOnto/examples/tree/main/zones)). We tried to keep things as simple as possible, but this still raises questions, which we gathered [here](https://github.com/Heresta/BAO_Stage_DH_ENS_2021/tree/main/problemesSegmentation) (documentation in French). Second, only some lines were reshaped, but all of them were also renamed according to [SegmOnto](https://github.com/SegmOnto) ([list of line names](https://github.com/SegmOnto/examples/tree/main/lines)).
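In these ALTO files, the SegmOnto name of each zone or line is not written on the element itself. Each `TextBlock` and `TextLine` points through its `TAGREFS` attribute to an `<OtherTag ID="..." LABEL="..."/>` declared in `<Tags>`. The following is a minimal sketch of how those labels, together with the transcribed text stored in each `<String CONTENT="...">`, might be read back with Python's standard library. It is an illustration under that assumption, not one of the scripts used for this repository. The function name `blocks_with_labels` and the trimmed `SAMPLE` page are hypothetical.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace, as declared in the files of alto4eScriptorium
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}


def blocks_with_labels(alto_xml):
    """List (zone label, [(line label, text), ...]) for each TextBlock, in document order."""
    root = ET.fromstring(alto_xml)
    # <Tags> maps IDs such as "BT1" or "LT248" to SegmOnto labels ("Title", "Default", ...)
    tag_labels = {
        tag.get("ID"): tag.get("LABEL")
        for tag in root.iterfind("a:Tags/a:OtherTag", ALTO_NS)
    }

    def label_of(element):
        # TAGREFS is a space-separated list of IDs; use the first one that is declared
        for ref in (element.get("TAGREFS") or "").split():
            if ref in tag_labels:
                return tag_labels[ref]
        return None  # no declared tag (hypothetical fallback)

    blocks = []
    for block in root.iterfind(".//a:TextBlock", ALTO_NS):
        lines = []
        for line in block.iterfind("a:TextLine", ALTO_NS):
            # The transcription is stored in the CONTENT attribute of each <String>
            text = " ".join(s.get("CONTENT", "") for s in line.iterfind("a:String", ALTO_NS))
            lines.append((label_of(line), text))
        blocks.append((label_of(block), lines))
    return blocks


# Hypothetical page, trimmed down from the structure of the files in alto4eScriptorium
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <OtherTag ID="BT1" LABEL="Title" DESCRIPTION="block type Title"/>
    <OtherTag ID="BT697" LABEL="Stamp" DESCRIPTION="block type Stamp"/>
    <OtherTag ID="LT248" LABEL="Default" DESCRIPTION="line type Default"/>
  </Tags>
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1" TAGREFS="BT1">
      <TextLine ID="tl_1" TAGREFS="LT248"><String CONTENT="L&#39;AMOVR"/></TextLine>
      <TextLine ID="tl_2" TAGREFS="LT248"><String CONTENT="TIRANNIQVE,"/></TextLine>
    </TextBlock>
    <TextBlock ID="b2" TAGREFS="BT697"/>
  </PrintSpace></Page></Layout>
</alto>"""

for label, lines in blocks_with_labels(SAMPLE):
    print(label, lines)
# Title [('Default', "L'AMOVR"), ('Default', 'TIRANNIQVE,')]
# Stamp []
```

Resolving labels through `<Tags>` on every page, instead of hard-coding IDs such as `BT1`, avoids assuming that every export numbers its tags the same way.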

## About the files' segmentation

### About zones:

Title: 2 (1.22%)

Main: 31 (18.9%)

Damage: 32 (19.51%)

Decoration: 10 (6.1%)

DropCapital: 7 (4.27%)

Margin: 11 (6.71%)

Numbering: 29 (17.68%)

RunningTitle: 28 (17.07%)

Signatures: 13 (7.93%)

Stamp: 1 (0.61%)

### About lines:

Default: 878 (98.21%)

DropCapitalLine: 15 (1.68%)

Rubric: 1 (0.11%)
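The counts and percentages above could in principle be reproduced by resolving every `TextBlock` and `TextLine` to its SegmOnto label through `TAGREFS` and tallying the results. The sketch below is only an assumption about how this might be done, not the script actually used. The function names and the `Untagged` fallback are hypothetical. As a sanity check, it re-derives the line percentages from the raw counts listed above.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# ALTO v4 namespace used by the eScriptorium exports in alto4eScriptorium
ALTO_NS = {"a": "http://www.loc.gov/standards/alto/ns-v4#"}


def first_label(element, tag_labels):
    """Resolve an element's TAGREFS (space-separated IDs) to a SegmOnto label."""
    for ref in (element.get("TAGREFS") or "").split():
        if ref in tag_labels:
            return tag_labels[ref]
    return "Untagged"  # hypothetical bucket for elements without a declared tag


def count_labels(alto_docs):
    """Count zone (TextBlock) and line (TextLine) labels over several ALTO pages."""
    zones, lines = Counter(), Counter()
    for xml_text in alto_docs:
        root = ET.fromstring(xml_text)
        # Each page may declare its own OtherTag table, so resolve IDs page by page
        tag_labels = {
            tag.get("ID"): tag.get("LABEL")
            for tag in root.iterfind("a:Tags/a:OtherTag", ALTO_NS)
        }
        for block in root.iterfind(".//a:TextBlock", ALTO_NS):
            zones[first_label(block, tag_labels)] += 1
        for line in root.iterfind(".//a:TextLine", ALTO_NS):
            lines[first_label(line, tag_labels)] += 1
    return zones, lines


def with_percentages(counter):
    """Map each label to (count, share of the total in %, rounded to 2 decimals)."""
    total = sum(counter.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counter.items()}


# Re-deriving the line percentages listed above from their raw counts
line_counts = Counter({"Default": 878, "DropCapitalLine": 15, "Rubric": 1})
print(with_percentages(line_counts))
# {'Default': (878, 98.21), 'DropCapitalLine': (15, 1.68), 'Rubric': (1, 0.11)}
```

To apply this to the repository, `count_labels` would be given the text of each XML file in ``alto4eScriptorium`` (for instance, read with `pathlib.Path.read_text(encoding="utf-8")`). The zone percentages above are consistent with a total of 164 zones, and the line percentages with a total of 894 lines.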
@@ -0,0 +1,260 @@
<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.loc.gov/standards/alto/ns-v4#"
xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-2.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>Scudery1639_Amour_btv1b8607044w_corrected_0009.png</fileName>

</sourceImageInformation>
</Description>

<Tags>
<OtherTag ID="BT1" LABEL="Title" DESCRIPTION="block type Title"/><OtherTag ID="BT2" LABEL="Main" DESCRIPTION="block type Main"/><OtherTag ID="BT687" LABEL="Damage" DESCRIPTION="block type Damage"/><OtherTag ID="BT688" LABEL="Decoration" DESCRIPTION="block type Decoration"/><OtherTag ID="BT689" LABEL="DropCapital" DESCRIPTION="block type DropCapital"/><OtherTag ID="BT690" LABEL="Figure" DESCRIPTION="block type Figure"/><OtherTag ID="BT691" LABEL="Margin" DESCRIPTION="block type Margin"/><OtherTag ID="BT692" LABEL="MusicNotation" DESCRIPTION="block type MusicNotation"/><OtherTag ID="BT693" LABEL="Numbering" DESCRIPTION="block type Numbering"/><OtherTag ID="BT694" LABEL="RunningTitle" DESCRIPTION="block type RunningTitle"/><OtherTag ID="BT695" LABEL="Seal" DESCRIPTION="block type Seal"/><OtherTag ID="BT696" LABEL="Signatures" DESCRIPTION="block type Signatures"/><OtherTag ID="BT697" LABEL="Stamp" DESCRIPTION="block type Stamp"/><OtherTag ID="BT698" LABEL="Table" DESCRIPTION="block type Table"/>
<OtherTag ID="LT248" LABEL="Default" DESCRIPTION="line type Default"/><OtherTag ID="LT249" LABEL="DropCapitalLine" DESCRIPTION="line type DropCapitalLine"/><OtherTag ID="LT250" LABEL="Interlinear" DESCRIPTION="line type Interlinear"/><OtherTag ID="LT251" LABEL="MusicLine" DESCRIPTION="line type MusicLine"/><OtherTag ID="LT252" LABEL="Rubric" DESCRIPTION="line type Rubric"/>
</Tags>

<Layout>
<Page WIDTH="4267"
HEIGHT="5513"
PHYSICAL_IMG_NR="0"
ID="eSc_dummypage_">
<PrintSpace HPOS="0"
VPOS="0"
WIDTH="4267"
HEIGHT="5513">

<TextBlock HPOS="348"
VPOS="216"
WIDTH="3200"
HEIGHT="1810"
ID="eSc_textblock_93a4aced"
TAGREFS="BT1">
<Shape><Polygon POINTS="348 216 348 2026 3548 2026 3548 216"/></Shape>


<TextLine ID="tl_1"
TAGREFS="LT248"
BASELINE="438 678 3409 704"
HPOS="438"
VPOS="222"
WIDTH="2971"
HEIGHT="530">
<Shape><Polygon POINTS="438 222 3409 222 3409 752 438 752"/></Shape>
<String CONTENT="L&#39;AMOVR"
HPOS="438"
VPOS="222"
WIDTH="2971"
HEIGHT="530"></String>
</TextLine>



<TextLine ID="tl_2"
TAGREFS="LT248"
BASELINE="840 1033 2978 1016"
HPOS="840"
VPOS="821"
WIDTH="2141"
HEIGHT="267">
<Shape><Polygon POINTS="840 821 2981 821 2981 1088 840 1088"/></Shape>
<String CONTENT="TIRANNIQVE,"
HPOS="840"
VPOS="821"
WIDTH="2141"
HEIGHT="267"></String>
</TextLine>



<TextLine ID="tl_3"
TAGREFS="LT248"
BASELINE="412 1432 875 1430 1547 1432 3339 1474 3400 1396"
HPOS="412"
VPOS="1151"
WIDTH="2990"
HEIGHT="355">
<Shape><Polygon POINTS="412 1153 3402 1151 3402 1504 412 1506"/></Shape>
<String CONTENT="TRAGI-COMEDIE."
HPOS="412"
VPOS="1151"
WIDTH="2990"
HEIGHT="355"></String>
</TextLine>



<TextLine ID="tl_4"
TAGREFS="LT248"
BASELINE="1720 1677 2082 1690"
HPOS="1720"
VPOS="1576"
WIDTH="363"
HEIGHT="114">
<Shape><Polygon POINTS="1720 1576 2083 1576 2083 1690 1720 1690"/></Shape>
<String CONTENT="PAR"
HPOS="1720"
VPOS="1576"
WIDTH="363"
HEIGHT="114"></String>
</TextLine>



<TextLine ID="tl_5"
TAGREFS="LT248"
BASELINE="682 1891 1669 1898 2005 1900 2961 1922 3122 1927"
HPOS="682"
VPOS="1743"
WIDTH="2443"
HEIGHT="212">
<Shape><Polygon POINTS="682 1743 3125 1743 3125 1955 682 1955"/></Shape>
<String CONTENT="MONSIEVR DE SCVDERY."
HPOS="682"
VPOS="1743"
WIDTH="2443"
HEIGHT="212"></String>
</TextLine>


</TextBlock>

<TextBlock HPOS="324"
VPOS="3931"
WIDTH="3200"
HEIGHT="1259"
ID="eSc_textblock_b8f3a00a"
TAGREFS="BT1">
<Shape><Polygon POINTS="324 3931 324 5178 3512 5190 3524 3967"/></Shape>


<TextLine ID="line_1563976965674_867"
TAGREFS="LT248"
BASELINE="1150 4074 2507 4129"
HPOS="1153"
VPOS="3946"
WIDTH="1359"
HEIGHT="203">
<Shape><Polygon POINTS="1155 3946 2512 4001 2510 4149 1153 4094"/></Shape>
<String CONTENT="A PARIS,"
HPOS="1153"
VPOS="3946"
WIDTH="1359"
HEIGHT="203"></String>
</TextLine>



<TextLine ID="line_1563976988123_876"
TAGREFS="LT248"
BASELINE="354 4281 3399 4391"
HPOS="352"
VPOS="4169"
WIDTH="3051"
HEIGHT="253">
<Shape><Polygon POINTS="358 4169 3403 4279 3397 4422 352 4312"/></Shape>
<String CONTENT="Chez AVGVSTIN COVRBE&#39;, Imprimeur &amp; Libraire de"
HPOS="352"
VPOS="4169"
WIDTH="3051"
HEIGHT="253"></String>
</TextLine>



<TextLine ID="tl_6"
TAGREFS="LT248"
BASELINE="827 4439 1308 4457 1592 4477 1739 4485 2042 4522 2295 4501 2399 4500 2707 4512 2935 4518"
HPOS="825"
VPOS="4344"
WIDTH="2111"
HEIGHT="218">
<Shape><Polygon POINTS="825 4344 2936 4344 2936 4562 825 4562"/></Shape>
<String CONTENT="Monſieur Frere du Roy, dans la petite Sale"
HPOS="825"
VPOS="4344"
WIDTH="2111"
HEIGHT="218"></String>
</TextLine>



<TextLine ID="tl_7"
TAGREFS="LT248"
BASELINE="1354 4612 1480 4622 1842 4664 1916 4638 2019 4641 2406 4648"
HPOS="1353"
VPOS="4511"
WIDTH="1055"
HEIGHT="169">
<Shape><Polygon POINTS="1353 4511 2408 4511 2408 4680 1353 4680"/></Shape>
<String CONTENT="du Palais, à la Palme."
HPOS="1353"
VPOS="4511"
WIDTH="1055"
HEIGHT="169"></String>
</TextLine>



<TextLine ID="tl_8"
TAGREFS="LT248"
BASELINE="1126 4907 1274 4916 1524 4927 1676 4935 2493 4954"
HPOS="1125"
VPOS="4808"
WIDTH="1371"
HEIGHT="162">
<Shape><Polygon POINTS="1125 4808 2496 4808 2496 4970 1125 4970"/></Shape>
<String CONTENT="M. D C. XXXIX."
HPOS="1125"
VPOS="4808"
WIDTH="1371"
HEIGHT="162"></String>
</TextLine>



<TextLine ID="tl_9"
TAGREFS="LT248"
BASELINE="594 5019 1080 5051 1272 5056 1492 5070 1656 5075 1988 5084 2132 5128 2259 5096 2631 5110 3113 5124"
HPOS="593"
VPOS="4926"
WIDTH="2523"
HEIGHT="230">
<Shape><Polygon POINTS="593 4926 3116 4926 3116 5156 593 5156"/></Shape>
<String CONTENT="AVEC PRIVILEGE DV ROY."
HPOS="593"
VPOS="4926"
WIDTH="2523"
HEIGHT="230"></String>
</TextLine>


</TextBlock>

<TextBlock HPOS="2925"
VPOS="3632"
WIDTH="707"
HEIGHT="683"
ID="eSc_textblock_2ad6a31d"
TAGREFS="BT697">
<Shape><Polygon POINTS="3572 3644 3632 4279 2925 4315 2937 3632"/></Shape>

</TextBlock>

<TextBlock HPOS="611"
VPOS="2050"
WIDTH="2493"
HEIGHT="1798"
ID="eSc_textblock_abea09f5"
TAGREFS="BT688">
<Shape><Polygon POINTS="635 2050 611 3728 3044 3848 3104 2110"/></Shape>

</TextBlock>



</PrintSpace>
</Page>
</Layout>
</alto>