[Dachs-support] Question about loading remote data from git
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Tue Apr 27 08:36:26 CEST 2021
Hi Baptiste,
On Mon, Apr 26, 2021 at 08:51:15PM +0200, Baptiste Cecconi wrote:
> I have a prototype I'm working on, which is taking data from a
> series of git repositories. Those repositories are registered as
> submodules of the RD directory.
>
> I would like to run the command:
> $ git submodule update --remote
> and then run:
> $ dachs imp q
>
> So that I have the latest content at hand.
> Is there a way to have a "pre-processor" script that would do this
> when I try to import ?
You mean: run the submodule update as part of the imp? If so, I
suppose a preImport script would fit the bill (cf.
http://docs.g-vo.org/DaCHS/ref.html#scripting for the larger
picture).
Right now, there is no "shell" language for the scripts (I could be
talked into adding one), but you can substitute by doing something
like:
<data id="import">
  ...
  <make>
    <script type="preImport" lang="python">
      <code>
        import subprocess
        subprocess.call("git submodule update --remote".split())
      </code>
    </script>
  </make>
</data>
There's a spot of trouble, though: DaCHS will not run preImport,
preIndex, and postCreation scripts on existing tables (i.e., when
updating data items once they have been created). To work around
that, you could use a last_update (on-disk) table that only contains
the date of the last update (say) and that is re-created on every
imp; its (auto) data item would contain the script and would then
reference the (non-auto) updating data in its recreateAfter.
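To make that layout a bit more concrete, here is a rough sketch of
what I have in mind; the ids and the grammar for filling the single
timestamp row are placeholders you would have to adapt, so take this
as an illustration of the structure rather than a tested RD:

```xml
<table id="last_update" onDisk="True">
  <column name="updated" type="timestamp"
    description="When the git submodules were last pulled"/>
</table>

<!-- auto data item: re-created on every plain "dachs imp q";
  it runs the git update and then triggers the real import -->
<data id="update_marker" recreateAfter="import">
  <!-- grammar/rowmaker producing the single timestamp row elided -->
  <make table="last_update">
    <script type="preImport" lang="python">
      <code>
        import subprocess
        subprocess.check_call(
          "git submodule update --remote".split())
      </code>
    </script>
  </make>
</data>

<!-- the actual import, marked non-auto so only recreateAfter
  (or an explicit "dachs imp q import") runs it -->
<data id="import" auto="False">
  ...
</data>
```

(Note I am using subprocess.check_call here so a failing git update
aborts the import rather than silently importing stale data.)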
You can then get even fancier and have timed updates in that RD,
perhaps like
<execute at="m15 6:00" title="Monthly data update">
  <job>
    <code>
      execDef.spawn(
        "dachs --ui stingy imp \schema/q".split())
    </code>
  </job>
</execute>
(you want the machine to be able to send mail with this kind of
thing, though, so it can cry for help when necessary).
Let me know how it goes,
Markus