
> a state-of-the-art research tool over Hacker News, arXiv, LessWrong, and dozens

what makes this state of the art?





It's just marketing.

It is not a protected term, so anything is state-of-the-art if you want it to be.

For example, Gemma models at the moment of release were performing worse than their competition, but they were still called "state-of-the-art". That does not mean it's a bad product at all (Gemma is actually good), but such claims are made very freely.

Juicero was state-of-the-art on release too, even though squeezing by hand worked better.


> It's just marketing. [...] It is not a protected term, so anything is state-of-the-art if you want it to be.

But is it true?

I think we ought to stop indulging and rationalizing self-serving bullshit with the "it's just marketing" bit, as if that somehow makes bullshit okay. It's not okay. Normalizing bullshit is culturally destructive and reinforces the existing indifference to truth.

Part of the motivation people have seems to be a cowardly morbid fear of conflict or the acknowledgment that the world is a mess. But I'm not even suggesting conflict. I'm suggesting demoting the dignity of bullshitters in one's own estimation of them. A bullshitter should appear trashy to us, because bullshitting is trashy.


I would vote for you as dictator.

If my comments were only state of the art I wouldn't need to write them.

Just like "cruelty free" and "not tested on animals" in the USA.

The scale. How many tools do you know that can query the content of all arXiv papers?

Doesn't look like the scale is there, even for HN:

> Currently have embedded: posts: 1.4M / 4.6M; comments: 15.6M / 38M. That's with Voyage-3.5-lite
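
For reference, the coverage implied by the quoted figures can be worked out with a few lines of Python. This is only arithmetic on the numbers in the quote; the variable names are made up for illustration.

```python
# Embedded vs. total counts, taken from the quoted figures above.
embedded = {
    "posts": (1.4e6, 4.6e6),
    "comments": (15.6e6, 38e6),
}

# Fraction of each corpus that has embeddings so far.
coverage = {name: done / total for name, (done, total) in embedded.items()}

for name, fraction in coverage.items():
    print(f"{name}: {fraction:.1%} embedded")
```

So roughly 30% of posts and 41% of comments were embedded at the time of that quote.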


The scale is there. I'm scraping, cleaning, and making token-efficient dozens of sources every single hour. The lack of money for embedding everything was a temporary problem.

In the direction of "empowering the public with new capabilities they didn't have before", Scry offers the following, with nothing more than copying and pasting a prompt and talking with an agent:

1) Full read-only SQL + vector manipulation in a live public database. Most vector DB products expose a much narrower search API. Basically only a few enterprise-level services let you run arbitrary SQL on remote machines. Google BigQuery gives users SQL power, but it mostly doesn't have embeddings, doesn't connect public corpora, doesn't have indexes as good, and doesn't support an agentic research experience. Beyond object-level research, Scry is a good tool for exploring and acquiring intuitions about embedding space.

2) An agent-native text-to-SQL + lexical + semantic deep research workflow. We have a prompt that's been heavily optimized for taking full advantage of our machine and Claude Code for exploration and answering nuanced questions. Claude fires off many exploratory queries and builds towards really big queries that lean on the SQL query planner. You can interrupt at any time. You have the compute limits to do lots of exhaustive exploration. Often, being confident that a document doesn't exist is more epistemically powerful than finding one.

3) Dozens of public commons in one database, with embeddings.
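
As a rough illustration of points 1) and 2), here is a minimal sketch of a lexical filter combined with semantic ranking inside a single SQL query. Everything in it is an assumption made for illustration: the `docs` table, the toy 3-dimensional embeddings, the `cosine` helper, and the use of in-memory SQLite in place of whatever database and schema Scry actually runs.

```python
import json
import math
import sqlite3


def cosine(a_json, b_json):
    """Cosine similarity between two JSON-encoded vectors (hypothetical helper)."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL so it can be used in ORDER BY.
conn.create_function("cosine", 2, cosine)
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, source TEXT, body TEXT, embedding TEXT)"
)

# Toy rows; the embeddings are invented, not produced by a real model.
rows = [
    (1, "hn", "vector databases expose a narrow search api", [0.9, 0.1, 0.0]),
    (2, "arxiv", "a survey of text-to-sql methods", [0.1, 0.9, 0.1]),
    (3, "hn", "running arbitrary sql against embeddings", [0.7, 0.6, 0.1]),
]
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [(i, s, b, json.dumps(e)) for i, s, b, e in rows],
)


def hybrid_search(conn, keyword, query_vec, limit=5):
    """Keep rows whose body contains `keyword`, then rank them by cosine similarity."""
    sql = """
        SELECT id, source, cosine(embedding, ?) AS score
        FROM docs
        WHERE body LIKE '%' || ? || '%'
        ORDER BY score DESC
        LIMIT ?
    """
    return conn.execute(sql, (json.dumps(query_vec), keyword, limit)).fetchall()


results = hybrid_search(conn, "sql", [0.8, 0.5, 0.0])
missing = hybrid_search(conn, "juicero", [0.8, 0.5, 0.0])
```

Here `results` ranks the two rows that mention "sql" by similarity to the query vector, and `missing` comes back empty. When the query scans the whole table, an empty result is evidence that no matching document exists, which is the "confident that one doesn't exist" case from point 2).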


The tool is state of the art, the sources are historical.

First, so best in this?


