<!DOCTYPE html>
<html>
<head>
<title>eprinttools - ep3harvester.1.html</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu" title="link to Caltech Library Homepage"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="README.html">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="install.html">INSTALL</a></li>
<li><a href="user-manual.html">User Manual</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/eprinttools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="name">NAME</h1>
<p>ep3harvester</p>
<h1 id="synopsis">SYNOPSIS</h1>
<p>ep3harvester <a href="#options">OPTIONS</a>
JSON_SETTINGS_FILENAME<br />
[START_TIMESTAMP] [END_TIMESTAMP]</p>
<h1 id="description">DESCRIPTION</h1>
<p>ep3harvester is a command line program for harvesting metadata from
EPrints repositories.</p>
<p>ep3harvester takes a JSON settings file and harvests all the EPrints
repositories defined in it into a JSON store implemented
in MySQL 8, one table per repository.</p>
<p>Each MySQL 8 table has the columns id, src (holding the JSON
document in a JSON column) and updated (holding the timestamp of when
the metadata was harvested).</p>
<h1 id="configuration">CONFIGURATION</h1>
<p>ep3harvester can generate an example settings JSON document using the
“-init” option. You can then edit it with any plain text editor (e.g.
nano). Then you’ll need to set up a MySQL 8 database and tables to store
the harvested data in.</p>
<p>ep3harvester uses a MySQL 8 database as a JSON document store, with
one table per EPrints repository. You can generate a SQL
program for creating the MySQL database and tables from your settings
JSON file using the “-sql-schema” option. The option requires
a JSON settings filename parameter. E.g.</p>
<pre><code> ep3harvester -init harvester-settings.json
nano harvester-settings.json
ep3harvester -sql-schema harvester-settings.json >collections.sql</code></pre>
<h1 id="options">OPTIONS</h1>
<dl>
<dt>-help</dt>
<dd>
display help
</dd>
<dt>-version</dt>
<dd>
display version
</dd>
<dt>-license</dt>
<dd>
display license
</dd>
<dt>-groups</dt>
<dd>
Harvest groups from the CSV files included in the configuration
</dd>
<dt>-init</dt>
<dd>
generate a settings JSON file
</dd>
<dt>-eprintids</dt>
<dd>
harvest the eprintids listed in the given file, one id per line
</dd>
<dt>-people</dt>
<dd>
Harvest people from the CSV files included in the configuration
</dd>
<dt>-people-groups</dt>
<dd>
Harvest people and groups from the CSV files included in the configuration
</dd>
<dt>-repo string</dt>
<dd>
Harvest a specific repository id defined in configuration
</dd>
<dt>-simple</dt>
<dd>
Crosswalk the harvested eprint record to the simplified record model
before saving the JSON to the SQL database.
</dd>
<dt>-sql-schema</dt>
<dd>
display SQL schema for installing MySQL jsonstore DB
</dd>
<dt>-verbose</dt>
<dd>
use verbose logging
</dd>
</dl>
<h1 id="examples">EXAMPLES</h1>
<p>Harvesting repositories for the month of May, 2022.</p>
<pre><code> ep3harvester harvester-settings.json \
"2022-05-01 00:00:00" "2022-05-31 23:59:59"</code></pre>
<p>Harvesting the caltechauthors repository using harvester-settings.json for
the month of May, 2022.</p>
<pre><code> ep3harvester -repo caltechauthors harvester-settings.json \
"2022-05-01 00:00:00" "2022-05-31 23:59:59"</code></pre>
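<p>The timestamps in the examples above are written by hand. As a minimal
sketch, assuming a POSIX shell and GNU date (its -d option is not
available in BSD or macOS date), the first and last second of a month
could be computed as shown here. The month_window function is a
hypothetical helper, not part of ep3harvester, and the final line only
prints the command instead of running it.</p>
<pre><code>#!/bin/sh
# Hypothetical helper: print the first and last second of a calendar
# month, one timestamp per line. Assumes GNU date for the -d option.
month_window() {
    year="$1"
    month="$2"
    first_day="${year}-${month}-01"
    # First of the month plus one month minus one day is the last day.
    last_day="$(date -u -d "${first_day} +1 month -1 day" "+%Y-%m-%d")"
    printf '%s 00:00:00\n' "${first_day}"
    printf '%s 23:59:59\n' "${last_day}"
}

# Build the two timestamp arguments for May 2022.
start="$(month_window 2022 05 | sed -n 1p)"
end="$(month_window 2022 05 | sed -n 2p)"

# Print the command that would be run; remove "echo" to run it.
echo ep3harvester harvester-settings.json "\"${start}\"" "\"${end}\""</code></pre>
<p>Deriving the last day from the first day of the month avoids
hard-coding month lengths and leap years.</p>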
<p>ep3harvester 1.2.4</p>
</section>
<footer>
<span>© 2021 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span><a href="mailto:[email protected]">Email Us</a></span>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
</footer>
<!-- START: PrettyFi from https://github.com/google/code-prettify -->
<script>
/* We want to add the class "prettyprint" to all the pre elements */
var pre_list = document.querySelectorAll("pre");
pre_list.forEach(function(elem) {
elem.classList.add("prettyprint");
elem.classList.add("linenums");/**/
elem.classList.add("json"); /**/
});
</script>
<style>
li.L0, li.L1, li.L2, li.L3, li.L4, li.L5, li.L6, li.L7, li.L8, li.L9
{
color: #555;
list-style-type: decimal;
}
</style>
<link rel="stylesheet" type="text/css" href="/css/prettify.css">
<script src="https://cdn.jsdelivr.net/gh/google/code-prettify@master/loader/run_prettify.js"></script>
<!-- END: PrettyFi from https://github.com/google/code-prettify -->
</body>
</html>