[Dachs-support] Few questions on non-fits data
Yan Grange
grange at astron.nl
Mon Apr 25 17:24:58 CEST 2022
On 25/04/2022 17:11, Markus Demleitner wrote:
> Hi Yan,
>
> On Mon, Apr 25, 2022 at 12:49:26PM +0000, Yan Grange wrote:
>> I’ve got two questions for you:
>>
>> 1. We have a list of PDFs that we want to expose in a table. I
>> understand that we make them available by setting the MIME type
>> and using the //products#define macro, but that has to happen
>> inside a grammar, and I am not sure which grammar to use here. Is
>> there a grammar that basically does an ls and that we can link to
>> the other columns?
> The ls part is done by the sources element; the grammar just needs to
> produce a dictionary from whatever file name it gets fed.
>
> There is of course no grammar built into DaCHS that could do anything
> sensible with a PDF, and I don't think I see a point in building one.
> Hence, you'd need to do a customGrammar (with an external module;
> that's a good idea if your grammar is complex and requires external
> tests to be somewhat trustworthy) or an embeddedGrammar (if it's just
> a few lines of code). You'll find examples for both in the element
> reference.
>
> Now, I suspect your case will be one where all the metadata is given
> in the file name. If that's so, I'd do the parsing of the file name
> in the grammar rather than in lots of string manipulations in the row
> maker: it's tidier and more maintainable that way.
>
> I'd hence do something like:
>
> <embeddedGrammar>
>   <iterator>
>     <setup imports="re">
>       <par name="namePattern">re.compile(r"([0-9.]+)-([0-9.]+)\.pdf")</par>
>     </setup>
>     <code>
>       # search (not match), since sourceToken is normally the full path
>       mat = namePattern.search(self.sourceToken)
>       if not mat:
>         raise ValueError(f"Bad file name: {self.sourceToken}")
>       yield {
>         "ra": mat.group(1),
>         "dec": mat.group(2),
>       }
>     </code>
>   </iterator>
>
>   <rowfilter procDef="//products#define">
>     <bind name="mime">"application/pdf"</bind>
>     <!-- and presumably previews? see
>       http://docs.g-vo.org/DaCHS/ref.html#precomputed-previews -->
>   </rowfilter>
> </embeddedGrammar>
>
> (disclaimer: not tested; expect the usual typos).
>
> Knowing astronomers' habits, I suspect you'll want to be a lot more
> careful writing down the regular expression for the file name, as it
> will probably end up being humongous.
>
> Python actually has a way to write "verbose" REs, but I've found just
> using normal string concatenation is fine when used this way:
>
> <par name="namePattern"><![CDATA[re.compile(
>   "(?P<ra>[0-9.]+)"   # ra
>   "-"                 # constant token separator
>   "(?P<dec>[0-9.]+)"  # dec
>   r"\.pdf"            # constant file extension
> )]]></par>
>
> Using named groups like this could even let you just say
>
> yield mat.groupdict()
>
> in the iterator.
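
The named-group idea can be tried outside DaCHS with a few lines of plain
Python. This is only a sketch under assumptions: the function name
parse_name and the sample path are made up for illustration, the pattern
has an added end anchor, and the base name is taken explicitly because the
grammar may be handed a full path rather than a bare file name.

```python
import os
import re

# Pattern from the quoted <par>, built by string concatenation,
# with an end anchor added for this illustration.
NAME_PATTERN = re.compile(
    r"(?P<ra>[0-9.]+)"   # ra
    r"-"                 # constant token separator
    r"(?P<dec>[0-9.]+)"  # dec
    r"\.pdf$"            # constant file extension
)


def parse_name(path):
    """Return the named groups parsed from the base name of path."""
    mat = NAME_PATTERN.match(os.path.basename(path))
    if not mat:
        raise ValueError(f"Bad file name: {path}")
    return mat.groupdict()


# Hypothetical sample path, for illustration only.
print(parse_name("/data/pdfs/12.5-33.1.pdf"))
# prints {'ra': '12.5', 'dec': '33.1'}
```

The values come back as strings; converting them to numbers would be left
to the row maker or to an extra step in the grammar.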
(First of all: sorry for our email server mangling the URL; it seems
Microsoft wants to know which links I click on.)

So we may have it easier, or harder: we basically have a CSV in which
one column is the file name. However, I don’t really understand how to
tell DaCHS that it should serve those files, because I think the
source will basically be the CSV file itself (which I’d rather not
share).

So I can probably define quite easily which URL would be used, if I
have a way to tell DaCHS: "here are the files to share". Would that
require a custom service, or can it somehow be incorporated in a
table?
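
A minimal sketch of the situation described above, in plain Python and
independent of DaCHS: each CSV line carries the metadata plus the name of
the PDF it describes. The column names (filename, ra, dec), the directory
name and the function iter_rows are all assumptions made up for
illustration; nothing here is DaCHS API.

```python
import csv
import io
import os

# Hypothetical CSV content; the column names are assumptions.
SAMPLE_CSV = """filename,ra,dec
a.pdf,12.5,33.1
b.pdf,200.0,-45.2
"""


def iter_rows(csv_file, data_dir):
    """Yield one dict per CSV line, adding the file's path under data_dir."""
    for row in csv.DictReader(csv_file):
        row["path"] = os.path.join(data_dir, row["filename"])
        yield row


rows = list(iter_rows(io.StringIO(SAMPLE_CSV), "pdfs"))
for row in rows:
    print(row["path"], row["ra"], row["dec"])
```

The open question is then only how such a per-row path is declared as a
file to be served, while the CSV itself stays private.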
>> 2. More a request for advice than a real need: is it useful,
>> in your opinion, to have PDF-only data in ObsCore? I can
>> imagine that not many clients will be able to handle PDFs,
>> so it may not be worth the effort.
> Well, if there's science content that people can sensibly use,
> they'll be grateful and figure out how to deal with these PDFs.
>
> While I give you that PDFs *are* a bit funky in this setting: if it's
> all you got and there's no way to produce nice, useful FITSes from
> them, by all means publish them, provided you give at least enough
> metadata that people can find them if interested, and they won't
> poison too many queries that would rather not have them (i.e.: at
> least have sound limits in space, time, and spectrum).
Clear!
>> Also: are you aware that the certificate of
>> https://lists.g-vo.org/mailman/listinfo/dachs-support has expired?
> Yeah... Unfortunately, I have no control over that machine. Sorry
> about that, I've asked the people who run it to fix it, and I trust
> they're on it, but this is running below a metric ton of cruft
> ("plesk"), and so I understand why it's not immediate.
I feel sorry for you :).
> -- Markus
Yan