
Tools to Compare Outputs from Multiple GenAI Models

Many free and paid generative AI models are available, and comparing their outputs shows that each has strengths and weaknesses. This is similar to search engines like Bing and Google, which produce different results for the same query. To search multiple web indexes at once, we used search aggregators or “meta” search engines like Dogpile and MetaCrawler. Now, a few new generative AI aggregators let you send a query and compare or combine results from multiple AI platforms.
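For readers curious about what an aggregator does behind the scenes, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Poe, Ithy, or SNEOS are actually built: `model_a` and `model_b` are hypothetical stand-ins for real model services, and `fan_out` is a made-up helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for real model clients (assumption: each one
# takes a prompt string and returns a text answer).
def model_a(prompt: str) -> str:
    return f"[model_a] answer to: {prompt}"


def model_b(prompt: str) -> str:
    return f"[model_b] answer to: {prompt}"


def fan_out(prompt: str, models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model and collect the answers by model name."""
    # Run the calls in parallel so one slow model does not delay the others.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    answers = fan_out(
        "Summarize the hearsay rule.",
        {"model_a": model_a, "model_b": model_b},
    )
    # Print the answers one after another for a side-by-side style comparison.
    for name, text in answers.items():
        print(f"{name}: {text}")
```

The point of the sketch is that the prompt is typed once and each model answers it independently, which is why the outputs can be compared fairly.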

Poe (Platform for Open Exploration)

Poe (https://poe.com/) gives you access to AI models from many different companies in a single interface, including ChatGPT, Claude, Gemini, DeepSeek, Grok, Llama, plus image, video, and audio generation models, and millions of user-created bots. Quora, a discussion platform, owns and develops Poe.

You must create a login to use Poe. The subscription is free with a limited number of usage points. Paid plans start at $50 a year with 10 thousand points per day and full context length for each bot – up to 2M tokens (equivalent to 1,400,000 words).
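The token-to-word figure above implies a ratio of about 0.7 words per token (1,400,000 divided by 2,000,000). A minimal sketch of that arithmetic follows; the ratio is a rule of thumb taken from the numbers in this article, not a fixed property of any model, and the function name `estimate_words` is made up for illustration.

```python
# Ratio implied by the figures above: 1,400,000 words for 2,000,000 tokens.
# Assumption: real ratios vary by model, language, and type of text.
WORDS_PER_TOKEN = 1_400_000 / 2_000_000


def estimate_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    for tokens in (8_000, 200_000, 2_000_000):
        print(f"{tokens:,} tokens is roughly {estimate_words(tokens):,} words")
```

The same calculation is a quick way to judge whether a long brief or transcript is likely to fit within a model's context length before uploading it.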

To start comparing model outputs with Poe, go to the website or download the iOS, Windows, or Android app and log in. In your settings, you can choose your default large language model (LLM) or bot. On the home page, you will see a box to start a new chat. Three LLMs appear at the top. To change the model, click the magnifying glass icon and choose a different product. Start your chat with a prompt. You can also upload a file. Poe will generate a response, which you can share or retry. Below the response, you can click to compare it with other models without retyping the prompt, and you can keep adding models to the comparison. The left navigation pane shows the prompt and how many models you compared; click the prompt to see each of the results. If you right-click on the results page, you can reply, follow up, copy the message, share, edit, or delete the chat session.

Because Poe interacts with third-party LLMs, its Privacy Policy states: “Keep in mind, any information and files you provide to the bots on Poe will be shared with third-party AI model providers and developers powering the bots… For bots powered by third-party LLMs and developers, your creation and use of such bots is subject to their policies (OpenAI’s can be found here; Anthropic’s can be found here; Google’s can be found here; Llama 2 can be found here; Ideogram’s can be found here).” In other words, users should not input confidential, privileged, protected, or sensitive information. This is the rule, rather than the exception, when using any free LLM or chat tool.

In addition to comparing outputs from multiple models, Poe has an “Explore” option. You can search or browse through bots, apps, or people. You can find tools for image, audio, and video generation, reasoning, search, hobbies, games, text analysis, and more. Each tool in the directory includes a description with details such as which LLM powers it, its purpose, and additional information. For example, if you want an AI add-on for Excel, just type “Excel” in the search to see a list of bots and apps, what they do, how many monthly users they have, and more.

Ithy

Ithy’s (https://ithy.com/) home page asks, “What happens when you combine every AI?” The search box at the top invites you to “ask me anything” and has a toggle between Fast Research and Deep Research. Ithy stands for “I think why” and generates research papers by aggregating the capabilities of multiple AI models.

You do not have to create an account to use Ithy, but without one you cannot upload files. Simply input your query, choose fast or deep research, and Ithy brings ChatGPT, Gemini, and Perplexity responses together and synthesizes them into a single response. One exceptional quality of Ithy’s output is that a side panel shows each model’s response to the query, while the main panel shows the single “article” generated from the aggregated output. You can download or share this article, or continue to chat. The generated articles include a table of contents, a list of sources, images, charts, visualizations, tables, video content, recommended reading, and more.

You can create a free account with Ithy to access saved queries and earn bonus points, as the free account is limited to 10 free articles every 10 days. A free account also lets you provide custom writing instructions, select a provider for external chat, and toggle off specific visuals that appear in the output articles. Signed-in free users can create folders for the output articles and search within their generated articles.

Be aware that while you can use Ithy without a login, in the Terms of Service the company notes that questions asked publicly (without logging in) may have their responses indexed by search engines. Whether you have a paid or free account, do not share confidential, sensitive, or protected information in your query or uploaded files.

Ithy has a paid Pro plan for $120 per year that includes unlimited research, more models, longer inputs, and the ability to upload unlimited images and PDFs.

SNEOS

The tagline for SNEOS (https://sneos.com/) is “Write Once, Get Insights from Multiple AI Models”. The tool was created by Victor Antofica, a developer in Sweden, with the vision of leveraging the unique strengths of multiple AI models simultaneously.

As with Ithy, you do not have to register or create a login to use SNEOS. The home page is simple to use. At the bottom of the screen there is a query box, where you can also upload a PDF or add a screenshot. Type your query, hit “Send”, and the responses from ChatGPT, Claude, and Gemini appear side by side on the screen. Below the search bar, an AI Response Comparison generated by Gemini highlights areas of significant difference and unique insights from each model, and includes a detailed comparison table. Gemini also scores each model’s answer to the query, and it almost always gives its own response the highest score.

Even when you are logged in, SNEOS does not preserve queries, so make sure to copy the results. You can refine your query, but the refinement is sent to all of the models, even when the responses differ enough that only one of them needs a refined prompt.

There is a premium version of SNEOS for $29 per month that includes 5 models, live web search, and more. SNEOS is useful for a quick look at whether one LLM handles a query better than another, but the premium version is probably not worth the price at this time.

As with any free tool, read the terms and privacy policy and refrain from sharing, prompting, or uploading confidential, sensitive, or private information.

Conclusion

The main advantage of tools that aggregate and query multiple large language models (LLMs) is that they help determine which model provides the best output and help identify biased results.