[Dachs-support] Few questions on non-fits data

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Apr 25 17:11:07 CEST 2022


Hi Yan,

On Mon, Apr 25, 2022 at 12:49:26PM +0000, Yan Grange wrote:
> I’ve got two questions for you:
> 
>   1.  We have a list of PDFs that we want to expose in a table. I
>   understand making them available by changing the MIME type
>   and using the //products#define macro, but that should happen
>   inside a grammar, and I am not sure what grammar to use here. Is
>   there a grammar that basically does an ls and that we can link to
>   the other columns?

The ls part is done by the sources element; the grammar just needs to
produce a dictionary from whatever file name it gets fed.
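To make that division of labour concrete, here is a rough, plain-Python analogy (not DaCHS code): a hypothetical `list_sources` plays the part of the sources element (picking file names by a shell glob), a hypothetical `grammar` turns one file name into one row dictionary, and `ingest` glues them together. It works on a list of names rather than the disk so it can be tried anywhere; the paths are made up.

```python
import fnmatch
import os
from typing import Dict, Iterator, List


def list_sources(file_names: List[str], pattern: str) -> List[str]:
    """Stand-in for the sources element: keep the names matching a shell glob."""
    return sorted(
        name for name in file_names
        if fnmatch.fnmatch(os.path.basename(name), pattern))


def grammar(source_token: str) -> Iterator[Dict[str, str]]:
    """Stand-in for a grammar: turn one file name into one raw row dictionary."""
    yield {"fileName": os.path.basename(source_token)}


def ingest(file_names: List[str], pattern: str) -> List[Dict[str, str]]:
    """Feed every selected source token to the grammar and collect the rows."""
    rows: List[Dict[str, str]] = []
    for token in list_sources(file_names, pattern):
        rows.extend(grammar(token))
    return rows


rows = ingest(["/data/pdfs/10.5-20.3.pdf", "/data/pdfs/readme.txt"], "*.pdf")
```

Only the PDF makes it through the "ls" step, and the grammar never has to know how it was found.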

There is of course no grammar built into DaCHS that could do anything
sensible with a PDF, and I don't think I see a point in building one.
Hence, you'd need to write a customGrammar (with an external module;
that's a good idea if your grammar is complex and requires external
tests to be somewhat trustworthy) or an embeddedGrammar (if it's just
a few lines of code).  You'll find examples for both in the element
reference.

Now, I suspect your case will be one where all the metadata is given
in the file name.  If that's so, I'd do the parsing of the file name
in the grammar rather than in lots of string manipulations in the row
maker: it's tidier and more maintainable that way.

I'd hence do something like:

  <embeddedGrammar>
    <iterator>
      <setup imports="re,os">
        <par name="namePattern">re.compile(r"([0-9.]+)-([0-9.]+)\.pdf")</par>
      </setup>
      <code>
        # sourceToken is the full path; match on the file name only
        mat = namePattern.match(os.path.basename(self.sourceToken))
        if not mat:
          raise ValueError(f"Bad file name: {self.sourceToken}")
        yield {
          "ra": mat.group(1),
          "dec": mat.group(2),
        }
      </code>
    </iterator>

    <rowfilter procDef="//products#define">
      <bind name="mime">"application/pdf"</bind>
      <!-- and presumably previews? see
        http://docs.g-vo.org/DaCHS/ref.html#precomputed-previews -->
    </rowfilter>
  </embeddedGrammar>

(disclaimer: not tested; expect the usual typos).
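Since the parsing is the part most likely to go wrong, it can help to mirror the iterator body in a plain Python function and try it on a few file names before running an import. This is a minimal sketch: `NAME_PATTERN` and `iterate_rows` are hypothetical names, and it assumes the source token may be a full path, so it matches on the base name only.

```python
import os
import re
from typing import Dict, Iterator

# Same pattern as in the RD, with the dot before "pdf" escaped.
NAME_PATTERN = re.compile(r"([0-9.]+)-([0-9.]+)\.pdf")


def iterate_rows(source_token: str) -> Iterator[Dict[str, str]]:
    """Mirror of the embedded iterator body, runnable outside DaCHS."""
    # Assumption: source_token may be a full path, so only its base name is parsed.
    file_name = os.path.basename(source_token)
    mat = NAME_PATTERN.match(file_name)
    if not mat:
        raise ValueError(f"Bad file name: {source_token}")
    yield {
        "ra": mat.group(1),
        "dec": mat.group(2),
    }


print(list(iterate_rows("/data/pdfs/123.45-67.8.pdf")))
```

Because of the escaped dot, a name like `123.45-67.8xpdf` is rejected instead of slipping through.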

Knowing astronomers' habits, I suspect you'll want to be a lot more
careful writing down the regular expression for the file name, as it
will probably end up being humongous.

Python actually has a way to write "verbose" REs, but I've found that
plain string concatenation works fine when used this way:

<par name="namePattern">re.compile(
  "(?P<ra>[0-9.]+)"   # ra
  "-"                 # constant token separator
  "(?P<dec>[0-9.]+)"  # dec
  r"\.pdf"            # constant file extension
  )</par>

Using named groups like this could even let you just say 

  yield mat.groupdict()

in the iterator.
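For comparison, here is a small sketch of both styles side by side, using hypothetical names (`CONCAT_PATTERN`, `VERBOSE_PATTERN`, `parse`). Adjacent string literals are joined by Python itself, so each piece can carry a comment; with `re.VERBOSE`, whitespace and `#` comments inside the pattern are ignored instead. Either way, the named groups come straight out as the row dictionary via `groupdict()`.

```python
import re
from typing import Dict

# String-literal concatenation: Python joins adjacent literals,
# so each piece of the pattern gets its own comment.
CONCAT_PATTERN = re.compile(
    r"(?P<ra>[0-9.]+)"   # ra
    "-"                  # constant token separator
    r"(?P<dec>[0-9.]+)"  # dec
    r"\.pdf"             # constant file extension
)

# The same pattern as a verbose RE: whitespace and #-comments
# inside the pattern string are ignored by the regex engine.
VERBOSE_PATTERN = re.compile(r"""
    (?P<ra>[0-9.]+)    # ra
    -                  # constant token separator
    (?P<dec>[0-9.]+)   # dec
    \.pdf              # constant file extension
""", re.VERBOSE)


def parse(file_name: str) -> Dict[str, str]:
    """Return the named groups of a matching file name as a row dict."""
    mat = CONCAT_PATTERN.match(file_name)
    if not mat:
        raise ValueError(f"Bad file name: {file_name}")
    # Named groups map directly onto the keys of the row dictionary.
    return mat.groupdict()


print(parse("123.45-67.8.pdf"))
```

Which one reads better is mostly taste; the concatenated form avoids having to remember that spaces and `#` need escaping in verbose mode.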

>   2.  This is more a request for advice than necessarily a need. In
>   your opinion, is it useful to have PDF-only data in ObsCore? I can
>   imagine that there are not many clients that will be able to
>   handle PDFs, so it may not be worth the effort.

Well, if there's science content that people can sensibly use,
they'll be grateful and figure out how to deal with these PDFs.

I grant you that PDFs *are* a bit funky in this setting.  But if they're
all you've got and there's no way to produce nice, useful FITS files from
them, by all means publish them, provided you give at least enough
metadata that people can find them if interested and that they won't
poison too many queries that would rather not have them (i.e., at
least give sound limits in space, time, and spectrum).

> Also: are you aware that the certificate of
> https://lists.g-vo.org/mailman/listinfo/dachs-support has expired?

Yeah... Unfortunately, I have no control over that machine.  Sorry
about that.  I've asked the people who run it to fix it, and I trust
they're on it, but it's running below a metric ton of cruft
("plesk"), so I understand why it's not immediate.

        -- Markus

