Dall-E is happy to be named as an author on a scientific journal article.

Too Good to be Accepted

13 minute read

TL;DR You have been rejected at DH. This may well be because you are more expert than many DH peers and your work is more advanced in a very particular niche. There is no hard data on this type of rejection, but it would be interesting to figure out how to do the bibliometrics and actually do them. Anyway, do not sulk, do not mock. Submit to satellite conferences such as CHR and CCLS for your true peers. However, stay connected to DH, and remain welcoming and open-minded towards the ideas and newbies that circulate in DH venues.

Is the quality at DH venues sinking?

Many really good papers get rejected at DH conferences. Complaints about the quality of the papers that do get in are rife. This is not all knee-jerk response from authors who get rejected. Yes, a rejection of a good paper that you poured your heart and soul into is an utter personal insult and just very hard to deal with in general. Trust me, there are full professors with forty years of tenure who get drunk that same night just to keep themselves from buying a chainsaw and going primeval on the program committee. But to the outside world we maintain the civility that “rejection is part of academic life”. However, the levels of good-but-rejected and bad-but-accepted do seem unsatisfying, particularly in the case of DH conferences.

My personal perspective is that this is an inevitability for a growing community in an intrinsically interdisciplinary field. You cannot be an expert in everything. I think I can just about manage vouching for a paper that attacks a literary problem in a not too advanced statistical manner. Let’s say I can judge the methodological and subject matter quality of a paper that deals with the rise of gender as a motif in Dutch literature from 1800 to 2000. I think I would be able to evaluate the operationalization of that question. This alone involves many difficult problems. How does one establish that a gender motif is actually present in a novel? And if it is present, how is it counted? Do we sort instances of gender motifs into categories? Do we take the number of words dedicated to the motif as a proxy for its prevalence? How do we deal with the near-certain possibility that 19th-century authors were less open on this matter than 1980s authors? Do we acknowledge the fact that the subject has had many unheard voices, or do we take what has been published as de facto “literature”? The problem is so drastically multifaceted that we will have years of test-driving different operationalizations before we can even start thinking about truly taking our results as more than “interesting”.

Oh… and this is just the subject matter side. On the other hand I need to evaluate whatever the author decided to do with middle-of-the-road statistics, or worse, with fancy Bayesian cultural evolutionary models, or LLMs. I can actually check frequentist probabilities, I know how to interpret their numbers, and I am able to see if the calculations make sense. I can more or less follow what the Bayesian kool kids are doing, and I have developed more or less a feeling for when their numbers do not really add up. I am as non-knowledgeable about LLMs as any of us currently.
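Just to make concrete how quickly those operationalization choices diverge, here is a toy sketch in Python. The lexicon, the texts, and the function names are all hypothetical stand-ins of mine, not a proposal for how to actually do this:

```python
# Toy sketch of two competing operationalizations of motif prevalence.
# The lexicon and the example text are made-up placeholders.
GENDER_LEXICON = {"femininity", "masculinity", "womanhood", "manhood"}

def motif_instances(tokens):
    # Operationalization 1: count every lexicon hit as one motif instance.
    return sum(1 for t in tokens if t in GENDER_LEXICON)

def motif_proportion(tokens):
    # Operationalization 2: the proportion of words dedicated to the
    # motif, as a proxy for its prevalence in the novel.
    return motif_instances(tokens) / len(tokens) if tokens else 0.0

novel = "the question of womanhood haunts every chapter".split()
print(motif_instances(novel), motif_proportion(novel))
```

The same novel scores differently depending on which proxy you pick, and a longer novel inflates the counts but not the proportions. That is one knob; the research question above has dozens.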

And I am kind of an expert in my field. So people tell me, okay? I would never describe myself as such. That is something dumb people do. That is all just to say that being a decent peer in a deeply interdisciplinary field is really, really hard. As soon as an author is talking about, say, 12th-century Chinese Buddhist manuscripts, I am totally lost on subject matter. All I can do is try to understand what the author wants to do regarding that subject matter, and assume that the research question is actually opportune and interesting for the people in the research community involved with the study of 12th-century Chinese Buddhist manuscripts. Operationalization: not a chance in hell my judgement is going to be expert-level informed. Way too many confounding factors that I do not even know exist. Checking the statistics? Well, only up to a point: I will have to take on faith whatever counts arise from an operationalization I cannot judge, applied to data whose curation I cannot evaluate for congruence with the research challenge. Bracket all that insecurity, and I can tell you if the researcher applied a cosine distance measure decently. If I can read the code, that is. Or, barring access to code, if I can read what the author says was done by the techie, who was actually supervised by a statistician from the math department whom the author asked for advice, and who, chances are, knows even less about manuscripts than I do.
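For what it is worth, that cosine check is about the only fully verifiable bit. A minimal sketch of what I would be looking for in such code, in plain Python with hypothetical variable names:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors.

    1.0 means identical orientation, 0.0 means orthogonal;
    cosine *distance* is 1 minus this value.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for zero vectors")
    return dot / (norm_u * norm_v)

# E.g. two hypothetical term-frequency vectors for manuscript fragments:
frag_a = [3, 0, 1, 4]
frag_b = [2, 1, 0, 3]
print(1 - cosine_similarity(frag_a, frag_b))  # cosine distance
```

Whether those term-frequency vectors were the right thing to count in the first place is exactly the part I cannot judge.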

You can see, I hope, how many factors and aspects one actually needs to have at least a bit of a grasp of to be a good peer reviewer in DH. Now, this problem scales. Put more domains into the by now proverbial big tent, and the chances of hitting a subject-matter-knowledgeable peer dwindle to fractions of a percent.

What does your evidence look like? Different styles of science

With domains and subdomains come particular conventions for method and style. The NLP people like the following. You start with a very concrete, very clear, very detailed problem description. You follow with a concise and clear data description. The next section describes the methods and techniques used. It is important (for some reason) that this section is stripped of any, and I really mean any, fluff and detail that may already be found in previous publications, so that your less informed peer will have to look up at least twelve other publications to puzzle together what the bone-dry skeletal idiomatic sentences interspersed with “[3,12,4,5]” and “[6,5]” mean. Now come the results, preferably as F1 scores listed alongside how other approaches performed in the past. This is called evaluation. Your purpose in life as a peer reviewer coming from the NLP domain is to stomp down on any paper that does not have such a meticulous evaluation section. To kill it. To mock it. And to disqualify the whole community that enabled the existence of such a paper. Just saying, really. Lastly, you provide a bit of discussion and, if you really cannot help yourself, a future directions section. Done.

This is rather remote from someone arriving from philosophy whose paper opens with: “My project is to interrogate the tacit metaphysical presuppositions underwriting our ordinary grammar of agency, and thereby to delineate the conditions under which responsibility can be said to emerge as a normatively binding structure within shared forms of life.” My NLP peer will have run screaming before even reaching the first comma. Yet the philosopher may be making a completely valid contribution to a DH conference, because she is interested in how the subtleties of Wittgenstein’s thinking may speak to the methods we use for inquiring into language used in scientific discourse. This does not need stringent computational tractability and operationalization yet. It just needs decent examples that intrigue and challenge our current thinking. Because, you know, that is what we do in the humanities: create and test-drive new intellectual ideas and lines of reasoning, to see if they are palatable and lead anywhere interesting. Yep, that is very different from problem solving, and from hunting task scoreboards. It did, however, inspire centuries’ worth of new thought. If you are not interested in that as a computational linguist, that is fine. Just do not exhibit the hubris of some finer specimens I have witnessed, who said it is all just rubbish in DH because they did not recognize it as research. That is your limited understanding of science speaking, sir, not actual wisdom. Oops. Apologies if I sounded a bit… nasty there; that is because I have an opinion about that. Go read a book on the history of philosophy some time to appreciate its actual pivotal importance.

No worries, everyone gets a whack with the same bat here. There is plenty of stupidity and misunderstanding of how other fields operate among other specialists too. Because yes, the historian specializing in ancient Nubia who got into DH “because data” is very much at risk of doing the same thing. He just read a marvelous argumentative contribution on the sense and nonsense of LLMs for historical research. It was enlightening, for it was in fact well written, it pointed out some valuable do’s and don’ts, and it informed him how he could make use, responsibly and FAIR of course, of some LLM annotation for his data. A great timesaver found there. There: 90% relevance, solid contribution. But now… he is looking at this terribly densely written paper, with formulas and what have you. It seems to want to pack years of math learning into five hundred words, but it focuses on improving HTR by what? 1%? Why?! What use? HTR was last year! Okay, let’s write that it is “solid work”, “a moderate contribution”, “more fit for a poster maybe than a long paper”. He is actually right, from his domain’s perspective, and at the same time he is “not even wrong”.

So it is all rather like too many overlapping Venn diagrams of domains with different research styles and different styles of communication. What counts as a valid research question and what counts as evidence in domain A will be vastly different from what counts as research and valid contribution in another. For all of the above, my reasoning is that any intrinsically interdisciplinary field will have more peer review misunderstanding than any sharply focused single-subject or single-method venue. So yes, the good-but-rejected paper ratio will be higher, and so will the bad-but-accepted paper ratio. Now, the hard questions are: is DH doing particularly worse than other interdisciplinary fields? And: is DH particularly harsh on more technical contributions?

Could we measure it?

I do not have the answers to that. Yet. It would be interesting research, wouldn’t it? Has anybody done this? Not so much for DH, I think, but there might be other conferences that have had their bibliographies perused for similar questions. The measure is not too hard to think up. It is just an F1, right?

$$F_1 = \frac{2 \cdot GA}{2 \cdot GA + GR + BA}$$

where:
GA = Good and Accepted (the true positives of the selection process)
GR = Good but Rejected (its false negatives)
BA = Bad but Accepted (its false positives)

That will give you a harmonic mean of precision and recall, telling you how good DH is at selecting ‘good’ papers. The harder part is figuring out which the “good-but-rejected” and “bad-but-accepted” papers are. We cannot just take the word of the authors for it. Or, well, we actually could. We could just do that survey. We could appeal to honesty and ask whether they themselves thought the paper really should have been accepted, also because they can point to a later version that was accepted elsewhere. See what those numbers tell us. It will require some thought on how to research this well from a bibliometrics point of view, but there must be ways, I am sure. We could probably do some Bayesian statistics, maybe even just based on keywords and titles of papers, and see what becomes more probable for rejection or acceptance over time, controlling for unavoidable buzzword fads. Something like that. Maybe also a simulation? Any takers yet?
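To show the kind of simulation I mean, here is a minimal sketch in Python. Every number in it is a made-up assumption of mine, not an empirical claim: papers have a latent “true quality”, reviewers see that quality through noise that depends on whether they are subject-matter peers, and the venue accepts the top-scoring fraction. We then score the venue with the F1 above.

```python
import random

random.seed(42)

N_PAPERS = 1000
ACCEPT_RATE = 0.35        # fraction of submissions accepted (assumed)
MATCH_NOISE = 0.2         # reviewer noise for a well-matched peer (assumed)
MISMATCH_NOISE = 0.8      # reviewer noise when domains do not align (assumed)
P_MATCHED_REVIEWER = 0.4  # chance of drawing a subject-matter peer (assumed)

papers = []
for _ in range(N_PAPERS):
    quality = random.gauss(0, 1)  # latent quality of the paper
    # Three reviews, each noisier when the reviewer's domain does not fit.
    scores = []
    for _ in range(3):
        noise = MATCH_NOISE if random.random() < P_MATCHED_REVIEWER else MISMATCH_NOISE
        scores.append(quality + random.gauss(0, noise))
    papers.append((quality, sum(scores) / len(scores)))

# Accept the top ACCEPT_RATE fraction by mean review score.
papers.sort(key=lambda p: p[1], reverse=True)
cutoff = int(N_PAPERS * ACCEPT_RATE)
accepted, rejected = papers[:cutoff], papers[cutoff:]

# Call a paper "good" if its latent quality is in the top 35% overall,
# so a perfect selection process would score F1 = 1.
quality_cutoff = sorted((q for q, _ in papers), reverse=True)[cutoff - 1]
GA = sum(1 for q, _ in accepted if q >= quality_cutoff)  # good and accepted
GR = sum(1 for q, _ in rejected if q >= quality_cutoff)  # good but rejected
BA = cutoff - GA                                         # bad but accepted

f1 = 2 * GA / (2 * GA + GR + BA)
print(f"GA={GA}  GR={GR}  BA={BA}  F1={f1:.2f}")
```

In this toy model, lowering P_MATCHED_REVIEWER drags the F1 down: the more mismatched the peers, the more good-but-rejected and bad-but-accepted papers. Calibrating such a model against real survey or bibliometric data is, of course, the actual research.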

Open minds, generous research

The particular unease that we feel about having been rejected at DH, or accepted for that matter, has driven many of us towards CHR and CCLS. In a sense this is a good thing. It is good that there is broad DH and deep CH. At least my personal take on it is: it is good that there is some sort of entry level where curious minds can come into the field of DH and CH. This is what ADHO DH and EADH are good for. Plus, DH is the community motor around the world. I think hardliners at CHR and CCLS should be wiser than to just ridicule the work that is put into these venues. Yes, sometimes not too clever things are done there, but people coming from a humanities background cannot go from zero to one hundred percent full-on computational savvy in a matter of a year. Big tent venues like DH have a role here. CHR and CCLS should actually make sure they keep their ties to DH venues warm. We all need these connections to be strong to make sure that the right experts find each other. Being an expert is easy. Connecting expertise and fostering a community for the betterment of science is hard, and is mostly still not seen as a core contribution.

I think the worst thing we can do is feed animosity between venues and subdomains. But it is hard not to walk into that pitfall. “Another rejection! And see here, that awful paper got accepted. F*** it. DH just sucks. I will only submit to x, y, z in the future.” Backing into our expertise and niche corners, sulking, is narrow-minded. What we really need is broad-mindedness and generosity. Of subject, of method, of heart. Be an expert. Then be generous with your expertise.

–JZ_20260304_2012

Postscript

This post grew out of some tongue-in-cheek thinking up of peer review scenarios gone wonky. Through genuine good intent and being all expert and professional, we create our own rejection hell. I think these things genuinely happen. They drive the dynamics I wrote about in the post.

Peer incompatibility Suppose the paper is highly technical, but the peer is a theorist; or vice versa. The theorist reviewer will point out practical problems that are non-pertinent to the specifics of the use case the technologists present, such as that it does not solve one very obscure, very different problem that he witnessed in 1996. This peer will ask for more theoretical framing. The technical peer will complain that there is no application, let alone any evaluation, and that the paper comes across as muddled and confused anyway.

Peer Dunning-Kruger effect People susceptible to this effect will generally tick the box “My expertise: I am highly knowledgeable about this subject”, while true experts usually are more modest. Reviews will contain things like “the authors have ignored [this or that] body of research” or “there is a whole field dedicated to this problem, why didn’t the authors…” or, the worst: “this is a solved problem”. The level of brutality in the review is usually inversely proportional to the lack of knowledge the expert exhibits. (This last sentence requires thinking.)

Domain reporting style mismatch This is where NLP/STEM ‘problem-data-method-results-discussion’ style papers or expectations meet humanities ‘why-this-proposed-solution-actually-is-a-problem’ style papers or expectations. A humanist peer will likely say “Interesting subject, but I think this would be more fit as a short paper, or better even a poster”. The technical peer will point out that there is not even a research question in the paper, and that it seems, in general, to discuss a non-problem, which the author could have known had he read x, y, and z in the NLP domain literature.

Subject matter SOTA (state of the art) mismatch When the technical solution is highly appreciated, but none of the technical peers actually succeeds in pointing out that we literally know what happened at Waterloo basically second by second from historical sources and re-enactment, and that therefore the 14 million euros spent on the simulation with 100,000 individually modeled computational agents, which also required an expensive industry game studio to recreate the battlefield and its physics, actually does not add all that much.

Technology SOTA mismatch When the historians go absolutely wild and wanking because someone made a nice GUI and a model that you can ask “How long did it take to get from Colonia (Cologne) to Roma on horseback?” Nobody thinks it weird that the answer it comes up with is an averaged 800 hours or a six-month turnaround, while we know that it would take a mere three days if the message was really important. The one peer who had statistical and topological knowledge tried to point out that the network model used basically amounted to stochastic modeling, but none of the PC understood what she was pointing out.
