Author |
Topic |
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
|
sno4wy
Senior Scribe
USA
466 Posts |
Posted - 26 Dec 2022 : 13:43:36
|
I'm always for compilation of Realms knowledge and the increase of FR resources. I am however curious about something: how would this project differ from the FR Wiki? It's difficult to convey tone through text so please allow me to clarify that I am genuinely curious, thanks in advance! |
|
|
Wooly Rupert
Master of Mischief
USA
36805 Posts |
Posted - 26 Dec 2022 : 16:18:54
|
We already have the FR Wiki.
I believe it was Steven Schend that said that TSR once looked at making a comprehensive database of FR lore, and that even then, in the days of 2E, they decided it was going to take too much time and effort to be worth it. |
Candlekeep Forums Moderator
Candlekeep - The Library of Forgotten Realms Lore http://www.candlekeep.com -- Candlekeep Forum Code of Conduct
I am the Giant Space Hamster of Ill Omen! |
|
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 26 Dec 2022 : 22:44:34
|
Personally, I would create a database. I would scan/OCR the pdfs/documents by chapter. I would then write something that could read the resulting text documents that would break it up by paragraph and send the data to the database. The database would consist of records containing the book name, author(s), edition, year printed, chapter number, chapter name, page number, paragraph text, associated tables. Tables could be stored as graphic files and as a list of strings of the data in the table (that way the table data can be searched as well). If really wanted, you could include images from the books as well in a similar manner (graphic file, description of the image).
But, I am just crazy that way. |
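A minimal sketch of the record layout described above, using Python's built-in sqlite3 module. The table name, column names, and the sample metadata are illustrative assumptions, and the "associated tables" and image fields are left out for brevity. It assumes the OCR step has already produced plain text with blank lines between paragraphs.

```python
import re
import sqlite3

# Hypothetical schema following the fields listed in the post; the table and
# column names are assumptions for illustration, not an agreed design.
SCHEMA = """
CREATE TABLE IF NOT EXISTS paragraph (
    id             INTEGER PRIMARY KEY,
    book_name      TEXT NOT NULL,
    authors        TEXT,
    edition        TEXT,
    year_printed   INTEGER,
    chapter_number INTEGER,
    chapter_name   TEXT,
    page_number    INTEGER,
    paragraph_text TEXT NOT NULL
);
"""

def split_paragraphs(chapter_text):
    """Split chapter text on blank lines and collapse internal whitespace."""
    chunks = re.split(r"\n\s*\n", chapter_text.replace("\r\n", "\n"))
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]

def load_chapter(conn, meta, chapter_text):
    """Insert one row per paragraph and return the number of rows added."""
    paragraphs = split_paragraphs(chapter_text)
    conn.executemany(
        "INSERT INTO paragraph (book_name, authors, edition, year_printed, "
        "chapter_number, chapter_name, page_number, paragraph_text) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [(*meta, text) for text in paragraphs],
    )
    conn.commit()
    return len(paragraphs)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder metadata and text, not taken from any real book.
meta = ("Example Sourcebook", "A. Author", "2E", 1991, 1, "Introduction", 3)
added = load_chapter(conn, meta, "First paragraph\nwrapped line.\n\nSecond paragraph.")
```

Here one page number is applied to the whole chunk of text; a real loader would need to track page breaks while reading.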
"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."
My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents
|
|
|
Italian Archmage Karsus
Learned Scribe
126 Posts |
Posted - 26 Dec 2022 : 23:37:09
|
Uh, would that be in any way similar to www.Askvalhaeria.com ? I've been working on it on and off lately. Maybe we can exchange some notes? |
|
|
questing gm
Master of Realmslore
Malaysia
1456 Posts |
Posted - 27 Dec 2022 : 09:31:45
|
I was working on my own wiki (using TiddlyWiki) that essentially copies and pastes all the text from sourcebooks (and eventually all printed materials, if my remaining lifetime lasts long enough). But given the demotivating state of lore, I have pretty much stopped at Ruins of Undermountain (I skipped most of the crunch-heavy supplements in between). Maybe I will pick it up again once I'm done with my Eberron wiki.
Would very much like to know if there is a faster way of copying and pasting into digital text (scanning sounds great if there's a way to convert the scans into legible digital text that can be searched later), and I'm open to any effort that needs cooperation/collaboration on this sort of thing to make lore-finding easier and deeper.
quote: I believe it was Steven Schend that said that TSR once looked at making a comprehensive database of FR lore, and that even then, in the days of 2E, they decided it was going to take too much time and effort to be worth it.
Well, we need some crazy ones who will try.
I'm already pretty happy that I have a wiki of Ed's #realmslore tweets down to his first tweet on Twitter (and that sounded like a crazy project to myself when I started). |
Edited by - questing gm on 27 Dec 2022 09:32:53 |
|
|
Ayrik
Great Reader
Canada
7989 Posts |
Posted - 27 Dec 2022 : 10:06:14
|
A problem inherent to databases is that they tend to fail when queries and fields contain the same definitions but different contents.
The Realms has seen many inconsistencies and self-contradictions from many authors over many years, many revisions/retcons, many editions. Separating and categorizing different versions of the "same" data vastly increases database complexity. Even wikis, archives, and repositories tend to only include "preferred" versions of data while discarding all others which cause conflict. |
[/Ayrik] |
|
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 27 Dec 2022 : 16:31:51
|
Even MS SQL has a limit of 1024 columns per table. That is why you have multiple tables to hold the data.
Also, I am not sure about licensing on your side of the pond but in the US, we can get a free copy of MS SQL Express.
Finally, depending on your home network topology, you can open a path to a machine so you can host the database yourself. That means sacrificing some of your internet connection speed to people connecting to the database. I am not sure about how much your ISPs charge but you could get a second connection just for database access. (I have looked into this kinda thing before because I was writing a game that would need a backing DB and I needed someplace to host it. I have since cancelled that project as my interests have moved to other areas.) |
|
|
|
Seethyr
Master of Realmslore
USA
1151 Posts |
Posted - 27 Dec 2022 : 17:01:17
|
Slightly off topic but I’ve been begging for a historical update - like a second GHotR - for years now. Fill in any bits of lore left out of the first iteration and then continue on as it was into the modern FR day. A document like that could be living and breathing and also wipe out inconsistencies using some explanation and handwaving as to why the confusion. |
Follow the Maztica (Aztec/Maya) and Anchorome (Indigenous North America) Campaigns on DMsGuild!
The Maztica Campaign The Anchorome Campaign |
|
|
Wooly Rupert
Master of Mischief
USA
36805 Posts |
Posted - 27 Dec 2022 : 17:05:04
|
quote: Originally posted by Seethyr
Slightly off topic but I’ve been begging for a historical update - like a second GHotR - for years now. Fill in any bits of lore left out of the first iteration and then continue on as it was into the modern FR day. A document like that could be living and breathing and also wipe out inconsistencies using some explanation and handwaving as to why the confusion.
That would be awesome but the current design team wouldn't want to do that. They've created many of the inconsistencies themselves and they have no interest in doing anything that would limit their "vision" -- like adhering to existing lore. |
|
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 27 Dec 2022 : 22:13:17
|
I am guessing the plan is to have multiple people, each with their own copy of the database, inserting data? What is your plan for merging those databases? You are going to have to watch for duplicated records during the merge.
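One way to handle the duplicate problem during a merge, sketched with Python's built-in sqlite3 module: put a UNIQUE constraint on the columns that identify a record and let INSERT OR IGNORE skip rows that already exist. The lore table, its two columns, and the sample rows are hypothetical, and the two in-memory databases stand in for the separate copies each contributor would hold.

```python
import sqlite3

def make_db(rows):
    """Create a throwaway database with a hypothetical lore table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lore ("
        " source TEXT NOT NULL,"
        " quote_text TEXT NOT NULL,"
        " UNIQUE (source, quote_text))"
    )
    conn.executemany("INSERT INTO lore (source, quote_text) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def count(conn):
    return conn.execute("SELECT COUNT(*) FROM lore").fetchone()[0]

def merge(target, contributor):
    """Copy rows into target, skipping exact duplicates; return rows added."""
    before = count(target)
    rows = contributor.execute("SELECT source, quote_text FROM lore").fetchall()
    target.executemany(
        "INSERT OR IGNORE INTO lore (source, quote_text) VALUES (?, ?)", rows
    )
    target.commit()
    return count(target) - before

# Placeholder rows, not real quotations.
main_db = make_db([("Example Sourcebook, p. 12", "Placeholder paragraph one.")])
contributor_db = make_db([
    ("Example Sourcebook, p. 12", "Placeholder paragraph one."),
    ("Example Sourcebook, p. 13", "Placeholder paragraph two."),
])
added = merge(main_db, contributor_db)
```

This only catches exact duplicates. Two people transcribing the same paragraph with different OCR errors or spacing would still produce two rows, so some manual review would remain.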
I typically write stuff in PowerShell since it is easy to prototype and maintain. Here is a link to some stuff for SQLite and PowerShell: http://ramblingcookiemonster.github.io/SQLite-and-PowerShell/ Using PowerShell or some other automation tool will make it faster to get data into the database.
I would help now but life has decided that the year was just going too well for me and that I needed to get a reset: my sewer line broke right after Thanksgiving and flooded my basement. I lost a few computers in that (happily, my network stuff and domain controllers were not on the floor). On the bright side, I am getting my basement renovated and a newer computer or two. But that means that things are going to be hectic enough for an unknown time that my ability to really contribute is going to be restricted. |
|
|
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 27 Dec 2022 : 22:28:12
|
Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.
http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/
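The CSV-writing half of that idea could look roughly like the sketch below. It is written in Python rather than PowerShell and assumes the paragraphs have already been pulled out of the PDF by some other tool. The column names and the sample text are made up for illustration. The csv module takes care of quoting, so commas and quotation marks inside a paragraph survive the round trip.

```python
import csv
import io

def paragraphs_to_csv(book, chapter, paragraphs, out):
    """Write one CSV row per paragraph to the open text stream `out`."""
    writer = csv.writer(out)
    writer.writerow(["book", "chapter", "paragraph_no", "text"])
    for number, text in enumerate(paragraphs, start=1):
        writer.writerow([book, chapter, number, text])
    return len(paragraphs)

# Placeholder text, not taken from any real book.
sample = ['He said "hello", then left.', "Next paragraph."]
buffer = io.StringIO()
written = paragraphs_to_csv("Example Sourcebook", 1, sample, buffer)

# Read the CSV back to confirm the quoting round-trips.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

To write a real file instead of an in-memory buffer, open it with `open(path, "w", newline="", encoding="utf-8")` and pass that in as `out`.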
|
|
|
|
BadCatMan
Senior Scribe
Australia
401 Posts |
Posted - 28 Dec 2022 : 06:29:23
|
Just a note, publicly presenting large volumes of text copied from the sourcebooks opens up problems of copyright infringement. While WotC don't seem to care unless it's non-OGL rules from the current edition, it could be an issue.
The Forgotten Realms Wiki has been going for 17 years and is still not even close to complete coverage of the Realms, which goes to show the scale of the task proposed by TSR and it's even bigger now. Of course, the need to research, write, and curate original articles does slow the FRW down.
Ask Valhaeria is the slimmed-down version of Italian Archmage Karsus's project. The full version has novels and sourcebooks as well, and it has been an Oghma-granted blessing to Realmslore research for us FRW editors. Italian Archmage Karsus and I discussed it a bit more here: https://candlekeep.com/forum/topic.asp?TOPIC_ID=24572 |
BadCatMan, B.Sc. (Hons), M.Sc. Scientific technical editor Head DM of the Realms of Adventure play-by-post community Administrator of the Forgotten Realms Wiki |
|
|
questing gm
Master of Realmslore
Malaysia
1456 Posts |
Posted - 28 Dec 2022 : 10:48:27
|
quote: Originally posted by TheIriaeban
Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.
http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/
I like where this discussion is going and would very much like to try the tool shared above. But not being a computer programmer, I'm not sure how to use it. If you don't mind guiding a complete dummy: I've downloaded and extracted the tool, but what else do I need to install before I can run the script, and do I actually need a database infrastructure for this?
|
|
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 28 Dec 2022 : 17:48:39
|
quote: Originally posted by questing gm
quote: Originally posted by TheIriaeban
Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.
http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/
I like where this discussion is going and would very much like to try the tool shared above. But not being a computer programmer, I'm not sure how to use it. If you don't mind guiding a complete dummy: I've downloaded and extracted the tool, but what else do I need to install before I can run the script, and do I actually need a database infrastructure for this?
That article just shows an example. It doesn't do anything with the text read from the pdf so there isn't anything to see after it is run. You would need something like the following line at the end (which puts an output file named pdftext.txt in your My Documents folder):
$text | Set-Content -LiteralPath (Join-Path ([Environment]::GetFolderPath("MyDocuments")) "pdftext.txt")
If you are new to PowerShell, there is an intro at the link below that does not appear to be overly complicated (the one from MS includes stuff about Visual Studio that may unnecessarily confuse people):
Honkin' big link
One note on that intro, when it mentions about changing your computer's script execution policy, I would HIGHLY suggest you DO NOT use Unrestricted. It is mentioned there out of completeness, and I am sure the author would agree with my recommendation (I am sure the other tech sages here would, too).
Mod edit: Did something with that URL that was stretching out the page. |
|
Edited by - Wooly Rupert on 02 Jan 2023 19:33:17 |
|
|
TheIriaeban
Master of Realmslore
USA
1289 Posts |
Posted - 28 Dec 2022 : 18:32:56
|
quote: Originally posted by Gary Dallison
Copyright and capitalist greed mean little and nothing to me. Sqlite is a local database so it is not going to be for everyone, mostly hardcore fans and designers as a research aid. Assuming that is, I get anywhere close to completion.
I'll check out the PowerShell. I prefer to go over text manually as accuracy is important. But at the same time, getting large volumes of formatted text would be nice. A big stumbling block will be the sourcebooks that are not OCR'd or are poorly OCR'd; for instance, many of the Dragon magazines don't OCR well because of the background, and Faiths and Pantheons and the other god books are a pain. If anyone knows a foolproof OCR program that would be great.
Otherwise it's just me, and any other nutter that wants to read and copy out every paragraph of realmslore and then painstakingly categorise it with a series of 0 / 1 flags in a csv file.
It's a good job I have no life or friends as that would seriously get in the way.
I have a tool that I wrote to OCR stuff. It is over 10 years old at this point and needs an update. I may be able to convert that into a PowerShell script as well that will use Tesseract (a widely available OCR code library). If I can get that done, let me know if you would like that. |
|
Edited by - TheIriaeban on 28 Dec 2022 18:36:15 |
|
|
Gary Dallison
Great Reader
United Kingdom
6361 Posts |
Posted - 28 Dec 2022 : 19:08:10
|
Well, thus far I am building out the schema.
I have a single column for the quote itself, which can be a few words, a paragraph, or several paragraphs in size (thankfully SQLite doesn't care).
Then there is the Source column, which will hold the document and page or chapter number (for novels, where the structure differs depending upon the version) as a text string.
After that is a series of integer flags holding either a 0 (for false) or a 1 (for true), labelled according to the person, place, item, deity, etc. Thus far I have 60+ such flags and I believe SQLite can hold many thousands of them. A numeric flag keeps the size low (1 byte per flag).
I intend to tag each quote with a 1 for whatever subject flag is appropriate (so a quote including information about Elminster and his dealings with Manshoon in Shadowdale would have a 1 for Elminster, a 1 for Manshoon, and a 1 for Shadowdale; everything else would be 0, which all flags default to).
I will then create a procedure for each flag to pull the quote and source field for every row where the specific flag = 1 (for example, every row where Elminster = 1).
SQLite allows this data to be exported to a CSV file.
SQLite stores the data, schema, and procedures in a single file that can be loaded using the command line or a GUI program (I'm using SQLiteStudio). So to share it with others, all I have to do is send them the file and they install SQLiteStudio. |
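A small sketch of the flag-column layout using Python's built-in sqlite3 module. The table name lore, the column names, and the three flags are assumptions for illustration, and the rows are placeholders rather than real quotations. Two caveats apply: SQLite has no stored procedures, so the per-flag "procedure" is written here as a Python function (a saved view per flag would be another option), and a default SQLite build caps a table at 2,000 columns unless it is recompiled with a higher limit.

```python
import csv
import io
import sqlite3

# Hypothetical subset of subject flags; the real list would be much longer.
FLAGS = ["elminster", "manshoon", "shadowdale"]

conn = sqlite3.connect(":memory:")
flag_columns = ", ".join(f"{name} INTEGER NOT NULL DEFAULT 0" for name in FLAGS)
conn.execute(
    f"CREATE TABLE lore (quote_text TEXT NOT NULL, source TEXT NOT NULL, {flag_columns})"
)

# Placeholder rows, not real quotations. Flags that are not listed default to 0.
conn.execute(
    "INSERT INTO lore (quote_text, source, elminster, manshoon, shadowdale) "
    "VALUES (?, ?, 1, 1, 1)",
    ("Placeholder text about all three subjects.", "Example Sourcebook, p. 12"),
)
conn.execute(
    "INSERT INTO lore (quote_text, source, shadowdale) VALUES (?, ?, 1)",
    ("Placeholder text about the dale only.", "Example Novel, ch. 3"),
)

def quotes_for(conn, flag):
    """Return (quote, source) pairs for every row where the flag is set."""
    if flag not in FLAGS:  # column names cannot be bound as parameters
        raise ValueError(f"unknown flag: {flag}")
    return conn.execute(
        f"SELECT quote_text, source FROM lore WHERE {flag} = 1 ORDER BY rowid"
    ).fetchall()

def to_csv(rows):
    """Render query results as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["quote", "source"])
    writer.writerows(rows)
    return out.getvalue()

elminster_rows = quotes_for(conn, "elminster")
shadowdale_csv = to_csv(quotes_for(conn, "shadowdale"))
```

Adding a new subject under this layout means an ALTER TABLE ... ADD COLUMN plus a new entry in FLAGS, which is the maintenance cost to weigh against its simplicity.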
Forgotten Realms Alternate Dimensions Candlekeep Archive Forgotten Realms Alternate Dimensions: Issue 1 Forgotten Realms Alternate Dimensions: Issue 2 Forgotten Realms Alternate Dimensions: Issue 3 Forgotten Realms Alternate Dimensions: Issue 4 Forgotten Realms Alternate Dimensions: Issue 5 Forgotten Realms Alternate Dimensions: Issue 6 Forgotten Realms Alternate Dimensions: Issue 7 Forgotten Realms Alternate Dimensions: Issue 8 Forgotten Realms Alternate Dimensions: Issue 9
Alternate Realms Site |
|
|
Italian Archmage Karsus
Learned Scribe
126 Posts |
Posted - 28 Dec 2022 : 23:43:30
|
Then, GD, let me suggest something.
I think your schema may be improved by an intermediate table, a many-to-many relationship rather than subject flag columns. You don't even know if 1024 columns will be enough lol, the Realms have a ton going into them. You might be better served by having a SUBJECT model, and an intermediate table for RELEVANCY. Modeling the SUBJECT as a STRING, you can then make an intermediate table of RELEVANCIES with a UNIQUE constraint on each pair of QUOTE and SUBJECT. Valhaeria does this: I can add tags to books at will by simply declaring a new tag, and I don't have to worry about the number of tags.
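A rough sketch of that many-to-many layout with Python's built-in sqlite3 module. The table names follow the QUOTE / SUBJECT / RELEVANCY wording above, while the column names, helper functions, and sample row are assumptions made for illustration. This is not how Valhaeria itself is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote (
    id     INTEGER PRIMARY KEY,
    body   TEXT NOT NULL,
    source TEXT NOT NULL
);
CREATE TABLE subject (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE relevancy (
    quote_id   INTEGER NOT NULL REFERENCES quote(id),
    subject_id INTEGER NOT NULL REFERENCES subject(id),
    UNIQUE (quote_id, subject_id)
);
""")

def tag(conn, quote_id, subject_name):
    """Link a quote to a subject, creating the subject if it is new."""
    conn.execute("INSERT OR IGNORE INTO subject (name) VALUES (?)", (subject_name,))
    (subject_id,) = conn.execute(
        "SELECT id FROM subject WHERE name = ?", (subject_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO relevancy (quote_id, subject_id) VALUES (?, ?)",
        (quote_id, subject_id),
    )

def quotes_about(conn, subject_name):
    """Return (body, source) for every quote linked to the subject."""
    return conn.execute(
        "SELECT q.body, q.source FROM quote AS q "
        "JOIN relevancy AS r ON r.quote_id = q.id "
        "JOIN subject AS s ON s.id = r.subject_id "
        "WHERE s.name = ? ORDER BY q.id",
        (subject_name,),
    ).fetchall()

# Placeholder row, not a real quotation.
cursor = conn.execute(
    "INSERT INTO quote (body, source) VALUES (?, ?)",
    ("Placeholder text about three subjects.", "Example Sourcebook, p. 12"),
)
quote_id = cursor.lastrowid
for name in ("Elminster", "Manshoon", "Shadowdale", "Elminster"):
    tag(conn, quote_id, name)  # the repeated tag is ignored by UNIQUE
link_count = conn.execute("SELECT COUNT(*) FROM relevancy").fetchone()[0]
```

A new subject is just a new row in subject, so the schema never changes as the tag list grows.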
If most of your task is going to be copypasting, I suggest you find a way to script the gruntwork, too. Populating the database for Valhaeria is obviously a task beyond my frail hooman endurance or even lifespan, so obviously I automated as much of it as possible. Also, unlike with Valhaeria, you seem to intend to be more involved with it, so perhaps adding tools to curate your work might be better. Maybe let an automated script do a rough pass, looking for obvious hits, then you can eliminate any false positives yourself and manually add whatever's missing?
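The "rough pass" could be as simple as a whole-word, case-insensitive search for each known subject name, with a human reviewing the suggestions afterwards. The sketch below is in Python; the function name and the sample sentence are made up for illustration.

```python
import re

def suggest_tags(text, subjects):
    """Return the subjects whose names appear as whole words in the text."""
    found = []
    for name in subjects:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found

known_subjects = ["Elminster", "Manshoon", "Shadowdale", "Waterdeep"]
# Placeholder sentence, not a real quotation.
suggested = suggest_tags("Elminster met Manshoon in Shadowdale.", known_subjects)
```

A pass like this will miss nicknames, titles, and pronouns, and it will flag names that double as ordinary words, which is why the manual review step still matters.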
As for OCR, personally, I suggest you should look at OCR that's already been done for you to start with. There's no perfect OCR, as we've oft lamented with Valhaeria: the only hardware that gets anywhere near is Eyeball v0.97 and those cost a fortune. Most OCR gets you 97% of the way; if you can handle manually typing the remaining 3% on yer own, you've got it made, and there's a lot of other people you can help, that way. DJVU files at Archive have some of the Dragon mag articles OCR'd, and the kind fellas in this website may yet hold txt copies that don't need to be OCR'd. I know of a scarce few articles in Dragon that were typed up- just wish there were more of those.
Ah, I forgot to ask- do you have a programming language in mind, or someone who can code? Even TSR passed up on the task; every grain of automation will take a score of long tons off your shoulders. I think you're going to have a much greater strength in that you intend to be way more involved with your encyclopedia than I am with wiki editing or Valhaeria itself. To use that specific strength the best, you'd better focus your efforts away from anything that can be automated- and onto the tasks that will require that love you have for the Realms. Because the small tasks will deplete your strength just the same as the big ones, for vanishingly thinner payoffs, and I'm speaking from experience. |
|
|