Large language models as research assistants

56 points by leononame a year ago

As a counterpoint, I appreciated this recent post by Martin Kleppmann[1]:

I've worked out why I don't get much value out of LLMs. The hardest and most time-consuming parts of my job involve distinguishing between ideas that are correct, and ideas that are plausible-sounding but wrong. Current AI is great at the latter type of ideas, and I don't need more of those.

[1]: https://bsky.app/profile/martin.kleppmann.com/post/3kquvol6s...

WanderPanda a year ago

In the end an "idea" is by definition contrarian which is the opposite of the training objective of LLMs. The question is how far fine-tuning and tree-search can go to extrapolate from the data-manifold. And the answer is probably not that far, currently.
- archagon a year ago
  
  I’m personally not interested in using LLMs in my work, but I’d push back on this. A good idea can also synthesize a number of existing concepts into a new one, and LLMs appear to be well-suited to pattern recognition.
mr_mitm a year ago

LLMs aren't a good fit to do the hardest part of our job. They're great at doing mental routine tasks, though. They take the easy, boring, menial yet necessary parts of our jobs off our hands.
leononame a year ago

I think it's valuable for brainstorming and refining my texts. As a non native, it helps me immensely in correcting errors and sometimes weird phrases that I accidentally translate literally without noticing. But it's only helpful when I know enough about the source material to judge the output. I wouldn't trust it e.g. in sieving through other people's ideas or applications
- wizzwizz4 a year ago
  
  > As a non native, it helps me immensely in correcting errors and sometimes weird phrases that I accidentally translate literally without noticing.
  Warning: every time I have seen somebody write that, and seen an example of their writing, it's been fine to start with but the LLM has completely trashed it. See: https://meta.stackexchange.com/a/396009/308065

ukuina a year ago

> Idea generation. I used to spend a lot of time chatting with colleagues about a vague idea I had. “How could we check whether X is true?” A tool like ChatGPT can help you get started. If you ask how to design an experiment to check a given hypothesis, it can often do a surprisingly good job.

While GPT4 can recognize an innovative idea, I have yet to see it suggest such an idea, or successfully extrapolate or question the idea. If you are beyond the "concept space" of the model, it is not going to help you explore it.

drycabinet a year ago

But some random word in its response can trigger an idea in your mind. Getting an idea from a conversation is not always about getting it directly. It's already in you and you just wanted a trigger.
- jprete a year ago
  
  Rubber-ducking is useful, but nobody gives the rubber duck anywhere near as much credit as AI enthusiasts give to chatbots.
  
  littlestymaar a year ago
  
  You underestimate how much credit I give to rubber ducks ;)

cl42 a year ago

I think LLMs can do a lot more than people assume, but they need to be given the proper frameworks.

When was the last time a researcher, economist, etc. was given 10,000 papers and simply told "do some original work"? That's not how it works. Daniel (the author) provides some good examples where _streamlined_ work can happen, but again, this is pretty basic stuff.

To push this further, though, imagine LLMs that fill in frameworks... A few steps here: (1) do a lit review, (2) fill in the framework, (3) discuss what might be missing, and maybe even try and fill in the missing information.

I'm doing something like this with politics and economics (see: https://emergingtrajectories.com/) and it works generally well. I think with a ton more engineering, curating of knowledge bases, etc., one can get these LLMs to actually find some new "nuggets" of information.

Admittedly, it's very hard, but I think there's something there.

lnkdinsuxs a year ago

The achilles heels of current LLMs are:

1. Hallucinations

2. Prompt injections

Currently, there is no known way to detect either using LLMs themselves. As a research assistant, if the LLM hallucinates, and it always sounds extremely confident when it does so, the LLM itself is of little use and causes additional burden, defeating the whole point of this.

Maybe an external validation step that employs a pagerank like algorithm is needed to detect and flag hallucinations? If so, how valuable would that company be?

anon373839 a year ago

I’d add:
3. Steering them
The way prompting devolves into trial and error with little rhyme or reason can make it pretty frustrating to slot them into automation pipelines. It’s hard to foresee all the ways they can go off the rails or give strangely inconsistent results.

julienchastang a year ago

> It is quite certain that in the near future, a majority of all research papers will be written with the help of artificial intelligence. I suspect that they will be reviewed with artificial intelligence as well. We might soon face a closed loop where software writes papers while other software reviews it.

This is fine as long as there are humans trained in critical thinking skills (i.e., a liberal arts education) are monitoring every step in this loop ensuring that the scholarly output is of high quality. I am unfortunately not sanguine about this optimistic scenario.

> And this new technology should mediocre academics even less useful, relatively speaking. If artificial intelligence can write credible papers and grant applications, what is the worth of someone who can barely do these things?

Actually, I think the opposite of this is true where AI has the potential of leveling the playing field and increasing the productivity of less productive employees.

> Unsurprisingly, software and artificial intelligence can help academics, and maybe replace them in some cases.

I don't think so. Instead the individual components of academic workflows can potentially be accelerated by AI.

julienchastang a year ago

> Grant applications.

Inspired by a Wharton Business School study [0] I went down this road recently where I "primed" ChatGPT4 with an RFP (Request for Proposal from a US granting agency) and publicly available documents about the organization I work for. The ideas that it generated made sense, but were unfortunately way too generic to be useful. I am open to the idea that through better prompting, LLMs could be helpful here. As a first attempt in this arena, however, my initial results were disappointing.

[0] https://mackinstitute.wharton.upenn.edu/2023/new-working-pap...

andy99 a year ago

I think it's generally insulting to your audience to write with an LLM. If you don't care about what you're saying, why should someone care to read it? I hope the advent of automated writing will lead to reforms in the way research is presented, with less focus on boilerplate or other stuff nobody cares about (societal impact statements some conferences force on us).

For grant applications, I agree it's a great tool because they're rife with bureaucratic crap that nobody really needs to read or write. In time, again hopefully the system will be reformed to not waste time asking for stuff an LLM could generate.

bjourne a year ago

> I think it's generally insulting to your audience to write with an LLM.
But not as insulting as commenting on HN before even skimming the article you're commenting on. :)
BeetleB a year ago

> I think it's generally insulting to your audience to write with an LLM. If you don't care about what you're saying, why should someone care to read it?
I do care about what I'm saying. That's why I review the LLMs output and edit before sending. If an LLM can express what I meant to say better than I can, why would I not use it?
Personally, I don't do this because the LLM changes the style of my text too much and doesn't sound like me any more. But oh, I so do wish it could. Often I type a first draft of an email, and I know it needs (simple) editing. If an LLM could do it for me, I'd be very happy.
For research papers, writing the introduction is a large headache, and frankly, is often more of a ritual. It's the least important part of the paper. I mean, if all I had to do is describe the purpose of my paper, etc, that would be great. But a lot of referees want me to load it up with a lot more verbiage to satisfy dubious traditions.
Unfortunately, GPT can't do it for me. But it should.

vouaobrasil a year ago

If we truly need LLMs to be research assistants, then I question, are we really still doing useful things or just "playing the game" of research? I mean, if we need datacenters and models that cost millions to train and megawatts to run, is what comes out of it any use for us?

Scientific research has come to resemble gambling more and more these days, where there is an extremely obsessive quest to accumulate more data, theories, and information, rather than trying to figure out improvements to life.

falcor84 a year ago

I'm not following your argument at all. Yes, there are diminishing returns in science as in everything, but generally speaking, all other things being equal, the more resources you put into an endeavor, the more you get out of it.
One big example from recent years is AlphaFold, which required massive computation resources, and has since its release been an ongoing fountain of innovation for biomedical (and particularly pharmacological) application.
- hesiintle a year ago
  
  > has since its release been an ongoing fountain of innovation for biomedical (and particularly pharmacological) application.
  [Citation Needed]
  Last time I looked into it, as is often the case, the ‘actual’ results were much much more sobering than the headlines seemed to suggest.
  
  falcor84 a year ago
  
  Yes, the actual results are definitely not as impressive as the overly hyped headlines, but there's still a lot. First off, in terms of research building up on top of it, as of today, Pubmed shows 9,364 articles citing their 2021 paper, and Google Scholar shows 21,719 results as a whole[1], but these include non-biomedical papers (e.g. applications of similar ML models to other disciplines).
  As for actual applications, it's a bit early, but here's a relatively extensive review from 2023[2], and there's a particularly notable example showing progress on liver cancer [3]. Researchers are still figuring out how to utilize it, e.g. with search mechanisms like Foldseek[4], and I'm sure we'll see a lot over the coming years.
  But you're definitely right it's not a silver bullet on its own, and there's still need for actual scientific research. On this note there's this interesting paper on the "illusion of understanding" we get from working with AI.
  [0] https://pubmed.ncbi.nlm.nih.gov/?linkname=pubmed_pubmed_cite...
  [1] https://scholar.google.com/scholar?cites=6286436358625670901...
  [2] https://www.nature.com/articles/s41392-023-01381-z
  [3] https://www.genengnews.com/insights/first-application-of-alp... (paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9906638/)
  [4] https://www.nature.com/articles/s41587-023-01773-0
  [5] https://www.nature.com/articles/s41586-024-07146-0
hesiintle a year ago

I agree whole heartedly, except:
> quest to accumulate more data, theories, and information, rather than
You forgot “more money”.
canadianfella a year ago

[dead]