Sunday, February 19, 2017

Bombing of veterans


Did anybody else read this and wonder if bombing veterans in celebration of an anniversary isn't a bit cruel? Writing news headlines is an art.

Another one I always wonder about is the form, "suspect still at large: police". I have yet to understand how the fact that a suspect is still at large can say the word police.

Saturday, February 18, 2017

Robert Lanfear on the state of molecular phylogenetics

I had hoped to write this up earlier, but there we are. On 9 February I went to a presentation by Robert Lanfear, the author among other things of PartitionFinder, a program that assists in the selection of models of nucleotide evolution and, as the name implies, dataset partitioning. His talk gave an overview of where he sees the field of (model-based) molecular phylogenetics, its problems and potential solutions.

I will structure my notes on his talk and my own thoughts about it as a kind of numbered list, for easier cross-reference, with no claim to having written this up in a particularly beautiful way.

1. The problem

Lanfear started out with the observation that the current practice in molecular phylogenetics works well, but it works increasingly less well. What he means here is that if there is a phylogenetic question that has a clear and strongly supported answer, then even cutting a lot of corners and making some mistakes will produce that correct answer.

Now, however, those "low-hanging fruit" have largely been harvested, and what is left are relationships that are really hard to resolve. In those cases small differences in how the analysis is done will lead to different answers (see point 2 below). An example he referred to at least twice during the talk was the relationship between crocodiles, birds and turtles; another was the relationships between major clades of birds.

What I find interesting here is how people set their priorities. Apparently there are a lot of researchers who care very deeply about, for example, whether crocodiles are sister to birds or to turtles. Honestly I couldn't care less, and the same would be true for comparable cases in plant phylogenetics. What phylogenetics is about for me is to identify monophyletic groups for classification and to provide phylogenies for downstream analyses in biogeography and evolutionary biology. For the former, the most relevant observation is that turtles, crocs and birds form three reciprocally monophyletic groups, but if we don't know their relationships to each other we can simply classify them next to each other at the same rank, problem solved. For the latter, there are ways of taking uncertainty into consideration, problem solved.

In other words, where I see need for more work in the field is in the many clades of plants, insects, nematodes, mites, etc., that have so far not been well studied, as opposed to re-analysing over and over and over the same few charismatic but overstudied groups of vertebrates. Each to their own I guess, but the thing is that all the considerations that follow assume first that being unable to decisively resolve every single node in a phylogeny is at all important to anything or anybody. I am just not sure I see that.

2. How do we know that the current practice is working less well now?

Partly because people get very different results with high confidence. Lanfear called this the "new normal": large amounts of genomic data give strong statistical support for contradictory results.

This is a very good observation that will hopefully also be convincing to those who like to stress our inability to know the truth, and that we can only hope to build hypotheses.

3. The current best practice for genomic sequencing

Data cleaning of genomic data is crucial because everything is full of microbes. Even DNA extraction kits are contaminated, so never do genomic sequencing without a negative control.

I must admit that I have not always followed that advice, but with amplicon sequencing or target enrichment for example it may not be that relevant, given that non-targeted DNA is unlikely to amplify and you know if a sequence comes totally out of left field. The example Lanfear used, however, was a de novo genome assembly where contaminants were presented as evidence of horizontal gene transfer. That would have been embarrassing.

He also argued for inclusion of a positive control, as in adding a known genome to check for contamination percentage. That does of course assume that you always have a known genome in your study group, which is unlikely to be the case in most groups.

Finally, there should be biological and technical replicates, probably the sampling guideline that the largest number of people are aware of and follow.

4. The current best practice for assembling the data matrices

Remove parts of the alignment that cannot be trusted. Lanfear mentioned the software GBlocks, which I personally have never used. However, he cited a paper that argues it doesn't seem to help (Tan et al. 2015, Syst Biol 64: 778) and seemed to advise against using it. His own preference is to pragmatically make an automated alignment and then check by eye and delete non-homologous sites manually.

5. Examining the individual gene trees

Next comes paralog detection, if that is relevant to the data type. One of the most stunning observations Lanfear mentioned was that in multi-locus species tree analyses some loci may have massive leverage on the results. He cited a case in which two undetected paralogs made the difference between 100% support for one and 100% support for the other answer.

His suggested positive control here: be suspicious if a gene tree does not show a very well established clade. Keep that one in mind as it will come up again.
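As a rough illustration of that positive control, here is a minimal sketch of how one might flag gene trees that fail to recover a well-established clade. It assumes rooted trees written as nested Python tuples with taxon names as strings; the function names, the toy trees and the taxa are all made up for illustration and are not from the talk. Real gene trees would be read from Newick files and are often unrooted, which this sketch does not handle.

```python
def leaf_sets(tree):
    """Return (all leaves of the tree, list of leaf sets, one per internal node)."""
    if isinstance(tree, str):
        # A single taxon name is a leaf and defines no internal node.
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def shows_clade(tree, expected_taxa):
    """True if the rooted tree has a clade consisting of exactly expected_taxa."""
    _, clades = leaf_sets(tree)
    return frozenset(expected_taxa) in clades


# Hypothetical gene trees; birds are the well-established clade we expect to see.
birds = {"chicken", "ostrich"}
gene_tree_ok = ((("chicken", "ostrich"), "crocodile"), ("turtle", "lizard"))
gene_tree_odd = ((("chicken", "turtle"), "ostrich"), ("crocodile", "lizard"))

for name, tree in [("gene_tree_ok", gene_tree_ok), ("gene_tree_odd", gene_tree_odd)]:
    status = "fine" if shows_clade(tree, birds) else "suspicious, check for paralogs"
    print(name, status)
```

A locus flagged this way is only a candidate for closer inspection, not automatically one to discard.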

6. Multi-locus analysis versus concatenation

We are talking phylogenomics here, so there are always multiple independent loci. A full Bayesian analysis of gene trees and species tree together in StarBEAST is best but limited to max. 50 species. I wasn't aware of the ballpark number, so this is good to know. Interestingly, the next best thing is concatenation, because according to Lanfear short-cut methods using previously inferred gene trees to infer the species tree in a second step (ASTRAL et al.) perform worst. Not sure how easy it will generally be to convince peer reviewers of this.

7. Model selection

Not many people are aware that we have to guess a topology to even do an alignment, and also to do model selection. Then we co-estimate all model parameters at the same time as the final topology is inferred.

We may need a separate model for each codon position and gene, and for stem vs. loop regions in rDNA; even for only three genes, the number of possibilities is already huge. Also, there is a trade-off between having enough parameters to describe the data and having so many that they cannot all be estimated reliably. Here cometh PartitionFinder to help with that. However, as the author of that software, Lanfear himself stresses that thinking carefully about the data may be better than using the automated approach.
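To get a feeling for how quickly the possibilities grow, one can count the ways of grouping data blocks into partitions. The sketch below assumes three protein-coding genes with three codon positions each, so nine data blocks; that number is my own illustration, not a figure from the talk. The count of possible groupings is the Bell number, and it does not yet include the choice of substitution model for each group.

```python
def bell_number(n):
    """Number of ways to split n items into non-empty groups (via the Bell triangle)."""
    row = [1]
    for _ in range(n - 1):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to the one before it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


genes = 3
codon_positions = 3
data_blocks = genes * codon_positions

print(data_blocks, "data blocks allow", bell_number(data_blocks), "partitioning schemes")
# prints: 9 data blocks allow 21147 partitioning schemes
```

This is only the counting argument; how any particular program searches that space is a separate matter.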

He was what I cannot help but call surprisingly cynical about how little we know about model selection and alignment.

8. Tree inference

Be aware that all software has bugs and limitations. Lanfear cited a few examples including a known but so far unresolvable branch length bug in RAxML (10x branch length inflation in 5 of 34 datasets tested). He also said that RAxML does not implement linked branch lengths across parts of the partition, and that few people were aware of that. Me neither.

At any rate he suggested using more than one program and comparing the results, as a "sanity check". His suggestions for likelihood were RAxML, PhyML, and IQ-tree; for Bayesian phylogenetics obviously MrBayes and BEAST.
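One simple way to make such a comparison concrete is to count the splits (bipartitions of the taxa) that are found in one tree but not the other, which is the idea behind the Robinson-Foulds distance. The following is a minimal sketch under my own assumptions: trees are nested Python tuples, both trees contain the same taxa, and the two example trees are invented rather than real output from any of the programs named above.

```python
def clades(tree):
    """Return (all leaves, set of leaf sets below each internal node)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)
    return leaves, found


def splits(tree):
    """Non-trivial bipartitions of the taxa implied by the tree, ignoring the root."""
    all_taxa, found = clades(tree)
    result = set()
    for clade in found:
        other = all_taxa - clade
        # Splits that cut off a single taxon (or nothing) are in every tree.
        if len(clade) > 1 and len(other) > 1:
            result.add(frozenset([clade, other]))
    return result


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if clades(tree_a)[0] != clades(tree_b)[0]:
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical results from two different programs for the same five taxa.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
# The same topology as tree_a, merely drawn with a different root.
tree_a_rerooted = (("A", "B"), ("C", ("D", "E")))

print(rf_distance(tree_a, tree_b))           # prints: 2
print(rf_distance(tree_a, tree_a_rerooted))  # prints: 0
```

A distance of zero means the two analyses agree on the unrooted topology; anything else points to the nodes worth a closer look, although it says nothing about which tree is right.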

Parsimony seemed to be The Method That May Not Be Named, although there is a long tradition in the area I am working in to run at least Bayesian and parsimony analyses and then perhaps also likelihood for comparison. Indeed if I remember correctly the word parsimony was only mentioned once at the beginning of the talk, and it was in the context of something like "parsimony also makes assumptions". Hardly anybody would doubt that; the arguments of parsimony advocates appear to be mostly epistemological (I have discussed before why that doesn't convince me personally) and on the lines of modelling making too many and/or unjustified assumptions, whether that is true or not.

From my own perspective as a methods pragmatist who happily uses all of them as long as they are a good match for the data and computationally feasible, I was once more surprised that a likelihood phylogeneticist like Lanfear explicitly mentioned Neighbor Joining as perfectly fine, something that I had seen previously in that BMC Evolutionary Biology editorial. I am sorry to say that I don't really get it. It seems like saying that you shouldn't use your kitchen knife for emergency surgery because it wasn't properly sterilised, but the muddy shovel from the garden shed will do in a pinch.

9. Special considerations for Bayesian phylogenetics

Keep an eye on sampling and convergence using software such as Tracer and RWTY; effective sample size needs to be > 200 so that samples are independent enough. None of this should be news to anybody who is using Bayesian phylogenetics, one would hope, but I haven't tried RWTY so far.

Two things Lanfear mentioned were less familiar to me, unsurprisingly given that I am not really a Bayesian. First, in theory Markov Chain Monte Carlo only works if run for infinite time, but it "works in practice". Second, apparently there is no good way yet of calculating ESS for tree topology or convergence, but "RWTY helps".
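For a continuous parameter, the idea behind effective sample size can be sketched in a few lines: the more strongly successive samples are autocorrelated, the fewer independent samples the chain is worth. The version below divides the chain length by one plus twice the sum of the autocorrelations, cutting the sum off at the first non-positive value. That cut-off rule is a simplification I chose for illustration and is not necessarily what Tracer or any other program does; the two simulated chains are likewise made up.

```python
import random


def effective_sample_size(samples):
    """Rough ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    variance = sum(c * c for c in centred) / n
    if variance == 0:
        raise ValueError("a constant chain has no defined ESS")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:
            # Simplifying assumption: ignore everything after the first
            # non-positive autocorrelation as noise.
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


random.seed(1)

# A chain of independent draws, and a "sticky" chain in which each value
# depends strongly on the previous one.
independent = [random.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + random.gauss(0, 1))

print("independent chain:", round(effective_sample_size(independent)))
print("sticky chain:", round(effective_sample_size(sticky)))
```

Both chains have 2000 samples, but the sticky one should come out with a far lower ESS, which is the situation the > 200 rule of thumb is meant to catch. None of this carries over directly to tree topologies, which are not numbers one can average.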

10. The way forward

Lanfear's hopes for improving molecular phylogenetics in the future are based on what he called "integrated analyses". They include trying to infer the model of evolution at the same time as tree topology.

Next there is the need for "better" models, e.g. non-reversible ones, which he mentioned as coming soon to IQ-tree and PartitionFinder, and different models for different parts of the tree, which however may be computationally too hard.

Stationarity of model parameters across evolutionary history, reversibility, homogeneity, and tree-likeness (no recombination) are model assumptions that are universal and hardly ever tested. But tests are possible, and then the data that don't fit the model can be removed. Most generally, instead of big data, use only the data that can be reliably modelled. I found this really refreshing to hear, as many people seem to prefer throwing more data at a problem in the hope it goes away.

Finally, Lanfear suggested conducting blinded analyses. He said that in many cases there was a hidden extra step after tree inference: is the tree the one we wanted? If yes, it gets published; if no, if it disagrees with preconceived notions, some people go back and tweak the data. Clearly this is problematic, but I was not the only one in the audience who thought back to what I have here written up as point number 5 and observed a bit of a self-contradiction.

I assume the answer is that there is a difference between being sceptical about a gene tree that contradicts really well established knowledge and tweaking the results that your study really is about. To use a non-phylogenetic example, if you want to find out if one brand of car can go faster than another it is not okay to tweak data after the results show that your favoured brand is the slower one; but it is okay to go back to check your data if they show one of them to have a speed of 50,000 km/h, because that just doesn't seem plausible.

Wednesday, February 15, 2017

Why is public reporting about science often so frustrating?

Reading a bit of ABC online over breakfast, I was surprised at the claim, to quote the title of the piece, that a "pregnant reptile fossil suggests bird ancestors gave birth to live young". Wow, that would be quite something, if the ancestor of the birds had given birth to live young and then later down the lineage they had re-invented egg laying. I would not have thought something like that possible, Dollo's Law and all.

Closer examination of the article shows that the title is quite a bit at variance with the rest. There is no mention of the reptile in question being the actual ancestor of the birds. It is sitting on a side branch of the phylogeny, and the conclusion made by the authors is merely "scientists can at least rule out the possibility that animals in this group", i.e. the clade that birds and crocodiles belong to, "were somehow incapable of evolving the ability to give birth to live young". They actually show the phylogenetic tree from the original paper and it shows the relevant reptile on a side branch.

So the title is not merely misleading but actually downright wrong. Don't science journalists know what an "ancestor" is? Did they not show the final article to somebody who knows that stuff and ask for feedback?

Sunday, February 12, 2017

Sturgeon's law

While on the topic of the book fair, I have to say that as much as I love browsing through the books and finding gems, it is also one of the moments that produce a certain sense of alienation from the majority of humanity in me. The only other moment that parallels it is "standing in front of the magazine rack in a supermarket".

As far as I am concerned, there are generally no more than two to three journals in the average magazine rack that one could reasonably count as a loss if somebody were to torch the lot. In fact, not only would there be no loss to the wealth and welfare of humanity if titles like "Kim Kardashian's new bikini body" or "Nicole Kidman's relationship crisis", most of them blatantly invented anyway, went up in flames, but burning the paper to generate energy would be considerably more productive than using it to print this kind of dreck. And people are actually wasting hard-earned money on all of it.

Similarly, I cannot help but observe, as I look across the dozens of tables in the book fair, that there are entire sections on astrology and "alternative medicine". These kinds of books have only one goal, and that is to make their readers more ignorant and less capable of critical thought. (You might argue that the ultimate goal is to sell, okay. But they will only sell if they first achieve the goal I mentioned. A swindler first has to swindle, only then can they extract money.) In a way it is, of course, nice to see them being sold again for a few bucks to finance a crisis hotline, but there is no way around the fact that as long as they are in circulation some of these works will continue to harm gullible people by getting them to rely on snake oil and forgo real treatment for their illnesses.

As for fantasy and science fiction novels, there are so many crappy books out there that it is extremely hard to find the few worthwhile ones between them. And I don't even have very high standards - some of the ones mentioned in my previous post are not exactly Nobel Prize in literature material either. But for an example of the 90% crud that makes browsing books so hard, I would like to present a novel that I bought on a whim at the previous fair we went to:

Stan Nicholls, Legion of Thunder. Book 2 of Orcs: First Blood.

Being part of a series is not decisive evidence of being crud, but it is a first warning sign. At a minimum I am starting to think that the better authors are the ones that write a series so that each novel can stand by itself. Think Martin Scott's Thraxas, Barry Hughart's Master Li chronicles, or Terry Pratchett's Discworld novels; each book is a self-contained story. When everything has to end on a cliff-hanger, however, it just looks cheap and like trying too hard. There is also the risk that the story will never be brought to a resolution and instead end with author existence failure.

Now as for the book itself, I was fooled into buying it because I had read other, fairly good books by different authors written from the perspective of the usual fantasy underdogs like orcs or dark elves. In the present case, however, the plot of the novel can comfortably be summarised as follows:

Protagonists search for McGuffins (yes, plural; they have to collect several).
Protagonists get into fight.
Protagonists search for McGuffins.
Protagonists get into fight.
Protagonists search for McGuffins.
Protagonists get into fight.
Novel ends on a cliff-hanger.

The fights appear to be the main attraction here, as they are written in a very voyeuristic manner. Apparently some readers really look forward to knowing which evil mook gets a knife into the eye, which one gets its arm cut off, and how far the blood sprays.

But the insults to the reader's intelligence don't stop there. In the background there is a big bad sorceress who is so comically evil and so prone to randomly killing her own followers that she should have been murdered in a palace coup years ago. During what is clearly meant to be a pivotal moment in her character development, she demands of one of her sisters, who is ruling over a people of aquatic semi-humanoids, to help her hunt for the protagonists, who are moving entirely on land. Her sister rejects the demand, and so she magics her dead.

The thing is, it never really becomes clear what helping would have looked like. Why didn't her sister simply agree, on the lines of: "I will gladly help you, let me just command all my soldiers who can operate on dry land to assist you OH WAIT I DON'T HAVE ANY"?

Seriously, the world does not need this kind of book to use up paper that could be used to print decent ones.

Saturday, February 11, 2017

This season's Lifeline Bookfair haul so far

Not sure if I will go another time tomorrow, but so far today's visit to the Lifeline Bookfair here in Canberra has netted the following:

Tolkien JRR, The Silmarillion.
I have read that one before, although in German I think (?). But we didn't own the book ourselves, and I may want to read it again.

Orwell G, Animal Farm.
Another one that I have read once before, but as a teenager. Again I did not have the book myself, having at that time borrowed it from a friend.

Wells HG, The Invisible Man.
A classic that caught my interest.

Scott W, Ivanhoe.
Likely not the best book I have bought today. My understanding is that it is pretty cheesy. But when I was younger I played Defender of the Crown and watched Ivanhoe movies, so it might be nice to read the novel that started it all.

MacDonald G, The Wise Woman and other Fantastic Stories.
Sounds interesting because the author is billed as "the great nineteenth-century innovator of modern fantasy" who "came to influence" CS Lewis, Charles Williams and JRR Tolkien. The back cover further calls the book one of a set of four, but sadly the other three were nowhere to be seen.

Silverberg R, The Longest Way Home.
A science fiction novel from an author some of whose books I have read in German translation years ago (mostly Majipoor novels). Not sure how it will turn out.

Bramah E, Kai Lung Unrolls His Mat.
Finally, this is probably the weirdest of them all. My hope is it will be something in the vein of Barry Hughart's chronicles of Master Li. We shall see.

In addition, we bought several books and a puzzle for our daughter, and yesterday my wife already went for several books and CDs herself. May have to donate some books back one of these days, or the bookshelf with the novels will fold into itself and turn into a singularity.

Update 12 Feb 2017: Went back again today and spent more time in non-fiction.

James W, The Varieties of Religious Experience.
A very famous book originally published in 1902, it examines the origin of religion from a psychological perspective. The critical introduction claims that the author was actually fairly charitable ("a classic that is ... too religious to have influenced much psychological research"), but one can imagine that the whole idea behind the work wouldn't have sat too well with many of the faithful.

Baggini J, Freedom Regained.
Having participated in the never-ending online discussion on Free Will I thought it might be good to read something by a philosopher on the subject. Admittedly there might be some bias on my side, as the author clearly has the same stance as I have, at least in the broad outlines.

Machiavelli N, Il Principe.
The classic's classic of all the books I bought, this is the 16th century book that Machiavelli is famous for. I got the German translation.

Astonishing, by the way, how much has been sold since yesterday.

Friday, February 10, 2017

Botany picture #240: Neottia nidus-avis, and parasitic plants in general


At the moment parasitic plant expert Sasa Stefanovic is visiting our herbarium to study the genus Cuscuta (Convolvulaceae; but unfortunately I do not have a good picture of it). Today he gave a seminar at the ANU, and I noted with interest what terminology he used to distinguish the two main groups of parasitic plants.

The first group are plants that have haustoria, organs that they use to attack the phloem of other plants and draw water, nutrients and energy from them. This adaptation occurs across several groups of eudicots but interestingly not in the monocots.

The second group are plants that parasitise on fungi. An example is the strange European orchid Neottia nidus-avis depicted above. They have clearly evolved from ancestors that used mutually beneficial mycorrhiza, trading sugar against otherwise hard to obtain nutrients, but then turned the relationship into pure exploitation. This adaptation is found in Asterids, Monocots and one truly bizarre New Caledonian conifer, but apparently not in Rosids. In the past, people often believed that this second group was saprophytic, and one can even now see books making that mistake. In reality, there are no saprophytic plants; they are all either photosynthetic or parasitic.

Now the interesting thing is that according to Sasa Stefanovic, the community of parasitic plant researchers calls only the first group parasites, whereas the second group is called mycotrophic or heterotrophic. I must admit I find this a bit strange, as they are clearly both parasitic, only on different groups of organisms, and both heterotrophic. What is more, the people who are still stuck with the impression that the second group is not parasitic would not have their confusion cleared up if they heard these two terms used in this way.

But well, if that is what the community has decided, that is that. Not my area. At any rate it was interesting to learn how these two forms of parasitism are distributed phylogenetically.

Sunday, February 5, 2017

Two weird realisations about the gay marriage discussion

Certainly other people will have had the very same realisations long before me, but it has only recently occurred to me how absurd some of the rules around what I will for the moment call 'gay marriage' are.

First, in Australia two women, for example, cannot get married at the moment. However, a man and a woman can get married, and then the man could come out as transgender and transition to become a woman. This results in two women being married.

So how do these two make sense at the same time?

When we discussed that issue, it was pointed out to me that in a specific European country, gay couples may currently not adopt children, presumably under the idea that children need a mother and a father to grow up normally. However, in that country a single person can apparently adopt children.

Again, how do these two make sense at the same time?

As always this is my personal opinion only, and I am not speaking in any official capacity, but in my personal opinion this drives home how utterly ridiculous it is to disallow gay marriage. I have yet to see an argument against gay marriage that passes the laugh test, let alone critical examination.