[Dachs-support] Few questions on non-fits data

Yan Grange grange at astron.nl
Mon Apr 25 17:24:58 CEST 2022


On 25/04/2022 17:11, Markus Demleitner wrote:
> Hi Yan,
>
> On Mon, Apr 25, 2022 at 12:49:26PM +0000, Yan Grange wrote:
>> I’ve got two questions for you:
>>
>>    1.  We have a list of pdfs that we want to expose in a table. I
>>    understand they can be made available by changing the mime type
>>    and using the //products#define macro, but that should happen
>>    inside a grammar and I am not sure what grammar to use here. Is
>>    there a grammar that basically does an ls and that we can link to
>>    the other columns?
> The ls part is done by the sources element; the grammar just needs to
> produce a dictionary from whatever file name it gets fed.
>
> There is of course no grammar built into DaCHS that could do anything
> sensible with a PDF, and I don't think I see a point in building one.
> Hence, you'd need to do a customGrammar (with an external module;
> that's a good idea if your grammar is complex and requires external
> tests to be somewhat trustworthy) or an embeddedGrammar (if it's just
> a few lines of code).  You'll find examples for both in the element
> reference.
>
> Now, I suspect your case will be one where all the metadata is given
> in the file name.  If that's so, I'd do the parsing of the file name
> in the grammar rather than in lots of string manipulations in the row
> maker: it's tidier and more maintainable that way.
>
> I'd hence do something like:
>
>    <embeddedGrammar>
>      <iterator>
>        <setup imports="re">
>          <par name="namePattern">re.compile(r"([0-9.]+)-([0-9.]+)\.pdf")</par>
>        </setup>
>        <code>
>          mat = namePattern.match(self.sourceToken)
>          if not mat:
>            raise ValueError(f"Bad file name: {self.sourceToken}")
>          yield {
>            "ra": mat.group(1),
>            "dec": mat.group(2),
>          }
>        </code>
>      </iterator>
>
>      <rowfilter procDef="//products#define">
>        <bind name="mime">"application/pdf"</bind>
>        <!-- and presumably previews? see
>          http://docs.g-vo.org/DaCHS/ref.html#precomputed-previews -->
>      </rowfilter>
>    </embeddedGrammar>
>
> (disclaimer: not tested; expect the usual typos).
>
> Knowing astronomers' habits, I suspect you'll want to be a lot more
> careful writing down the regular expression for the file name, as it
> will probably end up being humongous.
>
> Python actually has a way to write "verbose" REs, but I've found just
> using normal string concatenation is fine when used this way:
>
> <par name="namePattern">re.compile(
>    "(?P&lt;ra>[0-9.]+)"   # ra
>    "-"                    # constant token separator
>    "(?P&lt;dec>[0-9.]+)"  # dec
>    r"\.pdf"               # constant file extension
>    )</par>
>
> Using named groups like this could even let you just say
>
>    yield mat.groupdict()
>
> in the iterator.
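
For illustration, the named-group pattern quoted above can be tried
outside DaCHS in plain Python.  This is only a sketch: the function name
parse_name and the sample file name are made up for the example, and it
uses fullmatch (rather than match, as in the quoted grammar) so that
trailing junk after ".pdf" is rejected as well.

```python
import re

# Same pattern as in the quoted <par>, built by string concatenation.
NAME_PATTERN = re.compile(
    r"(?P<ra>[0-9.]+)"   # ra
    r"-"                 # constant token separator
    r"(?P<dec>[0-9.]+)"  # dec
    r"\.pdf"             # constant file extension
)


def parse_name(file_name):
    """Return the named groups of file_name as a dict (hypothetical helper).

    Raises ValueError if the name does not follow the pattern.
    """
    mat = NAME_PATTERN.fullmatch(file_name)
    if not mat:
        raise ValueError(f"Bad file name: {file_name}")
    return mat.groupdict()


if __name__ == "__main__":
    # Hypothetical file name, only for demonstration.
    print(parse_name("12.5-45.25.pdf"))
```

The values come back as strings; turning them into numbers would still
be up to the row maker.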

(first of all: sorry for our email server messing up the URL; it seems
Microsoft wants to know which links I click on, and when).


So we may have it easier, or harder: we basically have a csv in which
one of the columns is the file name. However, I don’t really understand
how to tell DaCHS that those files should be served by DaCHS, because I
think the source will basically be the csv file itself (which I’d not
necessarily want to share).


So I can probably quite easily define what URL would be used if I have a
way to tell DaCHS: "here are the files to share". Would that require a
custom service, or can that somehow be incorporated in a table?
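
To make the situation concrete, here is a rough plain-Python sketch of
turning such a csv into one dictionary per row, each carrying the path
of the PDF it refers to.  Everything specific in it is an assumption
made for the example: the column name "filename", the key "pdf_path"
and the directory "data" are hypothetical, and it does not show how
DaCHS itself is told to serve the files.

```python
import csv
import io
import os


def iter_rows(csv_file, data_dir, name_column="filename"):
    """Yield one dict per csv row, adding the path of the PDF it names.

    name_column and the "pdf_path" key are hypothetical names.
    """
    for row in csv.DictReader(csv_file):
        row["pdf_path"] = os.path.join(data_dir, row[name_column])
        yield row


if __name__ == "__main__":
    # Made-up csv content, only for demonstration.
    sample = io.StringIO("filename,ra,dec\na.pdf,12.5,45.25\n")
    for row in iter_rows(sample, "data"):
        print(row)
```

The csv itself is only read here and never exposed; just the PDFs it
lists would need to be reachable.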

>>    2.  More a request for advice than a need here. Is it useful
>>    in your opinion to have pdf-only data in ObsCore? I can
>>    imagine that there are not that many clients that will be able to
>>    handle pdfs, so it may not be worth the effort.
> Well, if there's science content that people can sensibly use,
> they'll be grateful and figure out how to deal with these PDFs.
>
> While I give you that PDFs *are* a bit funky in this setting: if it's
> all you got and there's no way to produce nice, useful FITSes from
> them, by all means publish them, provided you give at least enough
> metadata that people can find them if interested, and they won't
> poison too many queries that would rather not have them (i.e.: at
> least have sound limits in space, time, and spectrum).
Clear!
>> Also: is it known to you that the certificate of
>> https://lists.g-vo.org/mailman/listinfo/dachs-support has expired?
> Yeah... Unfortunately, I have no control over that machine.  Sorry
> about that, I've asked the people who run it to fix it, and I trust
> they're on it, but this is running below a metric ton of cruft
> ("plesk"), and so I understand why it's not immediate.
I feel sorry for you :).
>          -- Markus

Yan


