Candlekeep Forum
 FR Encyclopedia

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 26 Dec 2022 : 09:37:08
To help with research I've decided to organise everything FR by subject: region, organisation, deity, NPC, etc. (a mammoth task, I know).

At the moment I've got it on a WordPress site (and it's all private while it's a work in progress).

Thus far I've done all the Realms By Night and Forging the Realms articles and am starting on the Class Chronicles now.

If anyone wants to help out I'd be glad of the assistance, as well as any suggestions on what I should use to actually host it, as I fear WordPress will not be able to contain the volume of text for some subjects (like Cormyr, Mystra, or Waterdeep).

Forgotten Realms Alternate Dimensions Candlekeep Archive
Forgotten Realms Alternate Dimensions: Issue 1
Forgotten Realms Alternate Dimensions: Issue 2
Forgotten Realms Alternate Dimensions: Issue 3
Forgotten Realms Alternate Dimensions: Issue 4
Forgotten Realms Alternate Dimensions: Issue 5
Forgotten Realms Alternate Dimensions: Issue 6
Forgotten Realms Alternate Dimensions: Issue 7
Forgotten Realms Alternate Dimensions: Issue 8
Forgotten Realms Alternate Dimensions: Issue 9

Alternate Realms Site

sno4wy
Senior Scribe

USA
466 Posts

Posted - 26 Dec 2022 : 13:43:36
I'm always for compilation of Realms knowledge and the increase of FR resources. I am however curious about something: how would this project differ from the FR Wiki? It's difficult to convey tone through text so please allow me to clarify that I am genuinely curious, thanks in advance!

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 26 Dec 2022 : 13:54:19
I'm literally just copying the text out of various books and articles and arranging it by subject.


Wooly Rupert
Master of Mischief
Moderator

USA
36758 Posts

Posted - 26 Dec 2022 : 16:18:54
We already have the FR Wiki.

I believe it was Steven Schend that said that TSR once looked at making a comprehensive database of FR lore, and that even then, in the days of 2E, they decided it was going to take too much time and effort to be worth it.

Candlekeep Forums Moderator

Candlekeep - The Library of Forgotten Realms Lore
http://www.candlekeep.com
-- Candlekeep Forum Code of Conduct

I am the Giant Space Hamster of Ill Omen!

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 26 Dec 2022 : 16:49:34
I've got another 20 years or so at least (I hope).


TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 26 Dec 2022 : 22:44:34
Personally, I would create a database. I would scan/OCR the pdfs/documents by chapter. I would then write something that could read the resulting text documents that would break it up by paragraph and send the data to the database. The database would consist of records containing the book name, author(s), edition, year printed, chapter number, chapter name, page number, paragraph text, associated tables. Tables could be stored as graphic files and as a list of strings of the data in the table (that way the table data can be searched as well). If really wanted, you could include images from the books as well in a similar manner (graphic file, description of the image).

But, I am just crazy that way.
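
A minimal sketch of the record layout described above, assuming Python's built-in sqlite3 module. The table and column names are illustrative only, and the stored tables and images mentioned in the post are left out to keep it short.

```python
import sqlite3

# Hypothetical schema following the fields listed in the post above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    authors      TEXT,
    edition      TEXT,
    year_printed INTEGER
);
CREATE TABLE IF NOT EXISTS paragraph (
    paragraph_id   INTEGER PRIMARY KEY,
    book_id        INTEGER NOT NULL REFERENCES book(book_id),
    chapter_number INTEGER,
    chapter_name   TEXT,
    page_number    INTEGER,
    body           TEXT NOT NULL
);
"""


def create_lore_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_paragraphs(conn, book_id, chapter_number, chapter_name, page_number, text):
    """Split text on blank lines and store each paragraph as one row."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    conn.executemany(
        "INSERT INTO paragraph "
        "(book_id, chapter_number, chapter_name, page_number, body) "
        "VALUES (?, ?, ?, ?, ?)",
        [(book_id, chapter_number, chapter_name, page_number, p) for p in paragraphs],
    )
    conn.commit()
    return len(paragraphs)
```

Keeping the book details in their own table means the title, edition and year are typed once per book rather than once per paragraph.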

"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."

My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents

Italian Archmage Karsus
Learned Scribe

108 Posts

Posted - 26 Dec 2022 : 23:37:09
Uh, would that be in any way similar to www.Askvalhaeria.com ? I've been working on it on and off lately. Maybe we can exchange some notes?

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 27 Dec 2022 : 08:12:58
Always happy to exchange notes. If I understand it correctly, Ask Valhaeria doesn't reference sourcebooks or include the quotes themselves, whereas I intend to copy it all.

I had considered a database, but that would be local only, and if I wanted to get it online I would have to figure out how to do it and then pay for hosting. I might look into it. At the moment I'm just using WordPress to collect and organise the quotes. There have been some interesting finds though: I have narrowed the Border Kingdoms king that ate spiders to prolong his life down to two individuals.


questing gm
Senior Scribe

Malaysia
921 Posts

Posted - 27 Dec 2022 : 09:31:45
I was working on my own wiki (using TiddlyWiki) that is essentially copying and pasting all the text from sourcebooks (and eventually all printed materials if my remaining lifetime lasts long enough). But given the demotivating state of lore, I have pretty much stopped at Ruins of Undermountain (I skipped most of the crunch-heavy supplements in between). Maybe I will pick it up again once I'm done with my Eberron wiki.

I would very much like to know if there is a faster way of copying and pasting into digital text (scanning sounds great if there's a way to convert the scans into legible digital text that can be searched later), and I am open to any effort that needs cooperation/collaboration for this sort of thing to make lore-finding easier and deeper.
quote:
I believe it was Steven Schend that said that TSR once looked at making a comprehensive database of FR lore, and that even then, in the days of 2E, they decided it was going to take too much time and effort to be worth it.

Well, we need some crazy ones who will try.

I'm already pretty happy that I have a wiki of Ed's #realmslore tweets down to his first tweet on Twitter (and that sounded like a crazy project to myself when I started).

Edited by - questing gm on 27 Dec 2022 09:32:53

Ayrik
Great Reader

Canada
7924 Posts

Posted - 27 Dec 2022 : 10:06:14
A problem inherent to databases is that they tend to fail when queries and fields contain the same definitions but different contents.

The Realms has seen many inconsistencies and self-contradictions from many authors over many years, many revisions/retcons, many editions. Separating and categorizing different versions of the "same" data vastly increases database complexity. Even wikis, archives, and repositories tend to only include "preferred" versions of data while discarding all others which cause conflict.

[/Ayrik]

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 27 Dec 2022 : 11:51:24
A database could work if you have the text associated with a number of columns that correspond to subject tags.

The only problem there is the number of tags that might be needed; I think MySQL can have 1000 columns max.


TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 27 Dec 2022 : 16:31:51
Even MS SQL has a limit of 1024 columns per table. That is why you have multiple tables to hold the data.

Also, I am not sure about licensing on your side of the pond but in the US, we can get a free copy of MS SQL Express.

Finally, depending on your home network topology, you can open a path to a machine so you can host the database yourself. That means sacrificing some of your internet connection speed to people connecting to the database. I am not sure about how much your ISPs charge but you could get a second connection just for database access. (I have looked into this kinda thing before because I was writing a game that would need a backing DB and I needed someplace to host it. I have since cancelled that project as my interests have moved to other areas.)


Seethyr
Master of Realmslore

USA
1135 Posts

Posted - 27 Dec 2022 : 17:01:17
Slightly off topic, but I've been begging for a historical update, like a second GhotR, for years now. Fill in any bits of lore left out of the first iteration and then continue on as it was into the modern FR day. A document like that could be living and breathing and also wipe out inconsistencies, using some explanation and handwaving as to why the confusion arose.

Follow the Maztica (Aztec/Maya) and Anchorome (Indigenous North America) Campaigns on DMsGuild!

The Maztica Campaign
The Anchorome Campaign

Wooly Rupert
Master of Mischief
Moderator

USA
36758 Posts

Posted - 27 Dec 2022 : 17:05:04
quote:
Originally posted by Seethyr

Slightly off topic, but I've been begging for a historical update, like a second GhotR, for years now. Fill in any bits of lore left out of the first iteration and then continue on as it was into the modern FR day. A document like that could be living and breathing and also wipe out inconsistencies, using some explanation and handwaving as to why the confusion arose.



That would be awesome but the current design team wouldn't want to do that. They've created many of the inconsistencies themselves and they have no interest in doing anything that would limit their "vision" -- like adhering to existing lore.


Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 27 Dec 2022 : 20:24:42
Well, I think I'm crazy enough to try this. I'm gonna abandon the WordPress idea and use SQLite.

The reason for SQLite is that it's free, open source, and relatively easy compared to other db programs; it's a local database (no servers required), and the whole db is easy to export as a single file. It allows for locally stored procedures (which I hope can help with pulling out required data). I'm pretty sure I could make the db and then other people could import it and use it by just loading the single file I produce.

I'll have a try at producing a Proof of Concept; if anyone is interested in helping then send me a message. I could probably do with technical help (I'm amateur at best with dbs) and then with the mammoth task of cataloguing everything FR related.



Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 27 Dec 2022 : 21:39:10
Never mind, I have SQLite up and running and have started populating it with some example data, and have even run a few queries against it that pull out the data I wanted.

So all I need now is a tech-savvy volunteer who can install SQLite on their own computer, so I can send over the PoC file I have prepared.

If it works then we have a local working db that is easily created and transported.

Then the final step is finding willing volunteers to populate said database.


TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 27 Dec 2022 : 22:13:17
I am guessing the plan is to have multiple people, each with their own copy of the database, inserting data? What is your plan for merging those databases? You are going to have to watch for duplicated records during the merge.
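
One way the duplicate check during a merge could look, as a rough sketch in Python with the built-in sqlite3 module. It assumes every contributor's file has a `quote` table with `source` and `body` columns (hypothetical names) and relies on a UNIQUE constraint in the master plus `INSERT OR IGNORE`, so a row is skipped only when source and text match exactly.

```python
import sqlite3


def merge_quotes(master_path, contributor_path):
    """Copy quotes from a contributor's file into the master, skipping exact duplicates.

    Returns the number of rows actually added to the master.
    """
    conn = sqlite3.connect(master_path)
    try:
        # The UNIQUE constraint is what lets SQLite reject repeated rows.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS quote ("
            " quote_id INTEGER PRIMARY KEY,"
            " source TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " UNIQUE (source, body))"
        )
        # ATTACH makes the second file visible under the alias 'contrib'.
        conn.execute("ATTACH DATABASE ? AS contrib", (contributor_path,))
        before = conn.total_changes
        conn.execute(
            "INSERT OR IGNORE INTO quote (source, body) "
            "SELECT source, body FROM contrib.quote"
        )
        added = conn.total_changes - before
        conn.commit()
        conn.execute("DETACH DATABASE contrib")
        return added
    finally:
        conn.close()
```

Two copies of the same paragraph that differ by a stray OCR character or a differently written source string would still both be kept, so some manual review after a merge is probably unavoidable.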

I typically write stuff in PowerShell since it is easy to prototype and maintain. Here is a link to some stuff for SQLite and PowerShell: http://ramblingcookiemonster.github.io/SQLite-and-PowerShell/ Using PowerShell or some other automation tool will make it faster to get data into the database.

I would help now but life has decided that the year was just going too well for me and that I needed to get a reset: my sewer line broke right after Thanksgiving and flooded my basement. I lost a few computers in that (happily, my network stuff and domain controllers were not on the floor). On the bright side, I am getting my basement renovated and a newer computer or two. But that means that things are going to be hectic enough for an unknown time that my ability to really contribute is going to be restricted.


Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 27 Dec 2022 : 22:18:52
Actually I think it best if one person holds the master (me) and the others provide csv files which can be used to update the database which I would then periodically update onto google drive for people to download.
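
The single-master workflow could be sketched like this in Python (csv and sqlite3 are both in the standard library). The CSV column names `source` and `quote` and the `quote` table are assumptions made for illustration, not an agreed format.

```python
import csv
import sqlite3

REQUIRED_COLUMNS = {"source", "quote"}  # hypothetical CSV header names


def import_quotes_csv(conn, csv_file):
    """Load a contributor's CSV into the master database.

    csv_file is an open text file (or any iterable of lines).
    Returns (added, skipped): blank and already-present rows count as skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quote ("
        " quote_id INTEGER PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " UNIQUE (source, body))"
    )
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError("CSV is missing columns: " + ", ".join(sorted(missing)))

    added = skipped = 0
    for row in reader:
        source = (row["source"] or "").strip()
        body = (row["quote"] or "").strip()
        if not source or not body:
            skipped += 1
            continue
        cur = conn.execute(
            "INSERT OR IGNORE INTO quote (source, body) VALUES (?, ?)",
            (source, body),
        )
        if cur.rowcount == 1:
            added += 1
        else:
            skipped += 1
    conn.commit()
    return added, skipped
```

Rejecting a file with the wrong header up front is cheaper than discovering half-imported data later, which matters when only one person holds the master copy.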



TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 27 Dec 2022 : 22:28:12
Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.

http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/
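
Once the text is out of the PDF, the step from raw text to a per-chapter CSV might look like the sketch below. It is written in Python rather than PowerShell, takes already-extracted text as input (it does not read PDFs itself), and the CSV column names are made up for the example.

```python
import csv
import re


def split_paragraphs(text):
    """Split extracted text into paragraphs on blank lines.

    Line wraps inside a paragraph are collapsed into single spaces.
    Words hyphenated across a line break are NOT rejoined.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    return paragraphs


def write_chapter_csv(out_file, book, chapter, text):
    """Write one CSV row per paragraph and return the paragraph count."""
    writer = csv.writer(out_file)
    writer.writerow(["book", "chapter", "paragraph_number", "paragraph_text"])
    paragraphs = split_paragraphs(text)
    for number, body in enumerate(paragraphs, start=1):
        writer.writerow([book, chapter, number, body])
    return len(paragraphs)
```

When writing to a real file, open it with `newline=""` as the csv module documentation recommends, so line endings inside quoted fields survive intact.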



BadCatMan
Senior Scribe

Australia
401 Posts

Posted - 28 Dec 2022 : 06:29:23
Just a note, publicly presenting large volumes of text copied from the sourcebooks opens up problems of copyright infringement. While WotC don't seem to care unless it's non-OGL rules from the current edition, it could be an issue.

The Forgotten Realms Wiki has been going for 17 years and is still not even close to complete coverage of the Realms, which goes to show the scale of the task TSR considered, and it's even bigger now. Of course, the need to research, write, and curate original articles does slow the FRW down.

Ask Valhaeria is the slimmed-down version of Italian Archmage Karsus's project. The full version has novels and sourcebooks as well, and it has been an Oghma-granted blessing to Realmslore research for us FRW editors. Italian Archmage Karsus and I discussed it a bit more here:
https://candlekeep.com/forum/topic.asp?TOPIC_ID=24572

BadCatMan, B.Sc. (Hons), M.Sc.
Scientific technical editor
Head DM of the Realms of Adventure play-by-post community
Administrator of the Forgotten Realms Wiki

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 28 Dec 2022 : 08:20:59
Copyright and capitalist greed mean little and nothing to me. SQLite is a local database so it is not going to be for everyone, mostly hardcore fans and designers as a research aid. Assuming, that is, I get anywhere close to completion.

I'll check out the PowerShell. I prefer to go over text manually as accuracy is important. But at the same time, getting large volumes of formatted text would be nice.
A big stumbling block will be the sourcebooks that are not OCR'd or are poorly OCR'd; for instance, many of the Dragon magazines don't OCR well because of the background, and Faiths and Pantheons and the other god books are a pain. If anyone knows a foolproof OCR program that would be great.

Otherwise it's just me, and any other nutter that wants to read and copy out every paragraph of Realmslore and then painstakingly categorise it with a series of 0 / 1 flags in a CSV file.

It's a good job I have no life or friends as that would seriously get in the way.


questing gm
Senior Scribe

Malaysia
921 Posts

Posted - 28 Dec 2022 : 10:48:27
quote:
Originally posted by TheIriaeban


Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.

http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/

I like where this discussion is going and would very much like to try the tool shared above. But not being a computer programmer, I'm not sure how to use said tool. If you don't mind guiding a complete dummy: I've downloaded and extracted the tool, but what else do I need to install before I can run the script, and do I actually need a database infrastructure for this?


TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 28 Dec 2022 : 17:48:39
quote:
Originally posted by questing gm

quote:
Originally posted by TheIriaeban


Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.

http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/

I like where this discussion is going and would very much like to try the tool shared above. But not being a computer programmer, I'm not sure how to use said tool. If you don't mind guiding a complete dummy: I've downloaded and extracted the tool, but what else do I need to install before I can run the script, and do I actually need a database infrastructure for this?





That article just shows an example. It doesn't do anything with the text read from the PDF, so there isn't anything to see after it is run. You would need something like the following line at the end (which puts an output file named pdftext.txt in your My Documents folder):

$text | Set-Content -LiteralPath (Join-Path ([Environment]::GetFolderPath("MyDocuments")) "pdftext.txt")

If you are new to PowerShell, there is an intro at the link below that does not appear to be overly complicated (the one from MS includes stuff about Visual Studio that may unnecessarily confuse people):

Honkin' big link

One note on that intro, when it mentions about changing your computer's script execution policy, I would HIGHLY suggest you DO NOT use Unrestricted. It is mentioned there out of completeness, and I am sure the author would agree with my recommendation (I am sure the other tech sages here would, too).

Mod edit: Did something with that URL that was stretching out the page.


Edited by - Wooly Rupert on 02 Jan 2023 19:33:17

TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 28 Dec 2022 : 18:32:56
quote:
Originally posted by Gary Dallison

Copyright and capitalist greed mean little and nothing to me. SQLite is a local database so it is not going to be for everyone, mostly hardcore fans and designers as a research aid. Assuming, that is, I get anywhere close to completion.

I'll check out the PowerShell. I prefer to go over text manually as accuracy is important. But at the same time, getting large volumes of formatted text would be nice.
A big stumbling block will be the sourcebooks that are not OCR'd or are poorly OCR'd; for instance, many of the Dragon magazines don't OCR well because of the background, and Faiths and Pantheons and the other god books are a pain. If anyone knows a foolproof OCR program that would be great.

Otherwise it's just me, and any other nutter that wants to read and copy out every paragraph of realmslore and then painstakingly categorise it with a series of 0 / 1 flags in a csv file.

It's a good job I have no life or friends as that would seriously get in the way.



I have a tool that I wrote to OCR stuff. It is over 10 years old at this point and needs an update. I may be able to convert that into a PowerShell script as well that will use Tesseract (a widely available OCR code library). If I can get that done, let me know if you would like that.


Edited by - TheIriaeban on 28 Dec 2022 18:36:15

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 28 Dec 2022 : 19:08:10
Well, thus far I am building out the schema.

I have a single column for the quote itself, which can be a few words, a paragraph, or several paragraphs in size (thankfully SQLite doesn't care).

Then there is the Source column, which will hold the document and page or chapter number (chapter for novels, where the page structure differs between versions) as a text string.

After that are a series of integer flags, each holding either a 0 (for false) or a 1 (for true), which are labelled according to the person, place, item, deity, etc. Thus far I have 60+ such flags and I believe SQLite can hold many thousands of them. A numeric flag keeps the size low (1 byte per flag).

I intend to tag each quote with a 1 for whatever subject flag is appropriate (so a quote including information about Elminster and his dealings with Manshoon in Shadowdale would have a 1 for Elminster, a 1 for Manshoon, and a 1 for Shadowdale; everything else would be 0, which all flags default to).

I will then create a procedure for each flag to pull the quote and source field for every row where the specific flag = 1 (for example every row where Elminster = 1).

sqlite allows this data to be exported to a csv file.
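
A small sketch of the flag-column design above, using Python's sqlite3 and csv modules. Two caveats that are not from the thread: SQLite has no stored procedures, so a view (a saved query) stands in for the per-flag "procedure" here, and a default SQLite build limits a table to 2000 columns. The column names and quote text are placeholders.

```python
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote (
    quote_id   INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    source     TEXT NOT NULL,
    elminster  INTEGER NOT NULL DEFAULT 0,  -- one 0/1 flag per subject
    manshoon   INTEGER NOT NULL DEFAULT 0,
    shadowdale INTEGER NOT NULL DEFAULT 0
);
-- A view is a saved query; it is stored in the same single database file.
CREATE VIEW elminster_quotes AS
    SELECT body, source FROM quote WHERE elminster = 1;
""")

# Placeholder rows, not real quotations.
conn.execute(
    "INSERT INTO quote (body, source, elminster, manshoon, shadowdale) "
    "VALUES (?, ?, 1, 1, 1)",
    ("Placeholder quote about Elminster.", "Placeholder Sourcebook, p. 1"),
)
conn.execute(
    "INSERT INTO quote (body, source, shadowdale) VALUES (?, ?, 1)",
    ("Placeholder quote about the dale only.", "Placeholder Sourcebook, p. 2"),
)
conn.commit()


def export_view_csv(conn, view_name, out_file):
    """Write every (body, source) row of a view to CSV; return the row count."""
    # view_name is interpolated into the SQL, so only pass names you created.
    rows = conn.execute(f'SELECT body, source FROM "{view_name}"').fetchall()
    writer = csv.writer(out_file)
    writer.writerow(["quote", "source"])
    writer.writerows(rows)
    return len(rows)
```

Adding a new subject under this design means an `ALTER TABLE ... ADD COLUMN` plus a new view each time.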

SQLite stores the data, schema, and procedures in a single file that can be loaded using the command line or a GUI program (I'm using SQLiteStudio). So to share it with others, all I have to do is send them the file and have them install SQLiteStudio.


Italian Archmage Karsus
Learned Scribe

108 Posts

Posted - 28 Dec 2022 : 23:43:30
Then, GD, let me suggest something.

I think your schema may be improved by an intermediate table, a many-to-many relationship rather than subject flag columns. You don't even know if 1024 columns will be enough lol, the Realms have a ton going into them. You might be better served by having a SUBJECT model, and an intermediate table for RELEVANCY. Modeling the SUBJECT as a STRING, you can then make an intermediate table of RELEVANCIES with a UNIQUE constraint on each pair of QUOTE and SUBJECT. Valhaeria does this: I can add tags to books at will by simply declaring a new tag, and I don't have to worry about the number of tags.
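
For anyone following along, the many-to-many layout could be sketched roughly as below with Python's built-in sqlite3 module. The table names follow the QUOTE / SUBJECT / RELEVANCY wording above; the column names and helper functions are invented for the example and are not taken from Valhaeria.

```python
import sqlite3

SCHEMA = """
CREATE TABLE quote (
    quote_id INTEGER PRIMARY KEY,
    body     TEXT NOT NULL,
    source   TEXT NOT NULL
);
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
-- One row per (quote, subject) pair; UNIQUE stops double-tagging.
CREATE TABLE relevancy (
    quote_id   INTEGER NOT NULL REFERENCES quote(quote_id),
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    UNIQUE (quote_id, subject_id)
);
"""


def tag_quote(conn, quote_id, subject_name):
    """Attach a subject to a quote, creating the subject if it is new."""
    conn.execute("INSERT OR IGNORE INTO subject (name) VALUES (?)", (subject_name,))
    conn.execute(
        "INSERT OR IGNORE INTO relevancy (quote_id, subject_id) "
        "SELECT ?, subject_id FROM subject WHERE name = ?",
        (quote_id, subject_name),
    )


def quotes_about(conn, subject_name):
    """Return (body, source) for every quote tagged with the subject."""
    return conn.execute(
        "SELECT q.body, q.source FROM quote q "
        "JOIN relevancy r ON r.quote_id = q.quote_id "
        "JOIN subject s ON s.subject_id = r.subject_id "
        "WHERE s.name = ? ORDER BY q.quote_id",
        (subject_name,),
    ).fetchall()
```

A new subject is just a new row in `subject`, so the schema never changes as tags are added and no column limit comes into play.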

If most of your task is going to be copypasting, I suggest you find a way to script the gruntwork, too. Populating the database for Valhaeria is obviously a task beyond my frail hooman endurance or even lifespan, so obviously I automated as much of it as possible. Also, unlike with Valhaeria, you seem to intend to be more involved with it, so perhaps adding tools to curate your work might be better. Maybe let an automated script do a rough pass, looking for obvious hits, then you can eliminate any false positives yourself and manually add whatever's missing?
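
The rough first pass could be as simple as the sketch below: a plain whole-word name match in Python, with the subject list supplied by whoever is curating. This is only an illustration of the idea, not how Valhaeria does it.

```python
import re


def suggest_subjects(text, known_subjects):
    """Return the known subject names that appear in text as whole words or phrases.

    Matching is case-insensitive. A possessive such as "Elminster's" still counts,
    but a longer word that merely starts with the name does not.
    """
    hits = []
    for name in known_subjects:
        # Require a non-word character (or the text boundary) on both sides.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(name)
    return hits
```

It will miss aliases and titles ("the Old Mage") and will flag passing mentions, so the results are suggestions for a human to confirm rather than final tags.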

As for OCR, personally, I suggest you should look at OCR that's already been done for you to start with. There's no perfect OCR, as we've oft lamented with Valhaeria: the only hardware that gets anywhere near is Eyeball v0.97 and those cost a fortune. Most OCR gets you 97% of the way; if you can handle manually typing the remaining 3% on yer own, you've got it made, and there's a lot of other people you can help, that way. DJVU files at Archive have some of the Dragon mag articles OCR'd, and the kind fellas in this website may yet hold txt copies that don't need to be OCR'd. I know of a scarce few articles in Dragon that were typed up- just wish there were more of those.

Ah, I forgot to ask: do you have a programming language in mind, or someone who can code? Even TSR passed up on the task; every grain of automation will take a score of long tons from your shoulders. I think you're going to have a much greater strength in that you intend to be way more involved with your encyclopedia than I am with wiki editing or Valhaeria itself. To use that specific strength the best, you'd better focus your efforts away from anything that can be automated and onto the tasks that will require that love you have for the Realms. Because the small tasks will deplete your strength just the same as the big ones, for vanishingly thinner payoffs, and I'm speaking from experience.

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 29 Dec 2022 : 08:25:42
I would love to do all the things you just suggested, but the sum total of my database experience began when I installed it a few days ago.

I will see what google can provide, otherwise it will be the hard way for me.


Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 29 Dec 2022 :  10:15:48
Got it sorted, I think. Thanks for the tips.

TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 29 Dec 2022 :  15:18:09
Gary, if you are as new to databases as you mentioned, I would suggest you read the link below. Once you get data in a database, it can be a nightmare to change the design.

https://support.microsoft.com/en-us/office/database-design-basics-eb2159cf-1e30-401a-8084-bd4f9c9ca1f5


"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."

My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 29 Dec 2022 :  15:24:15
I'm happy to report it is all working as expected, thanks to the lovely db improvements suggested by Italian archmage Karsus.

I now have 8 paragraphs with multiple tags, and a quick SQL script looking for any particular tag will retrieve the relevant text and source.
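For anyone following along, here is a rough sketch of how a tag lookup like that can work in SQLite, driven from Python's built-in `sqlite3` module. The table and column names (`passage`, `tag`, `passage_tag`) and the sample rows are assumptions for illustration, not the actual layout of this database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo

# Assumed layout: one row per passage, one row per tag, and a link table
# so that a passage can carry any number of tags.
conn.executescript("""
CREATE TABLE passage (
    id     INTEGER PRIMARY KEY,
    quote  TEXT NOT NULL,
    source TEXT NOT NULL
);
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE passage_tag (
    passage_id INTEGER NOT NULL REFERENCES passage(id),
    tag_id     INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (passage_id, tag_id)
);
""")

def add_passage(quote, source, tags):
    """Insert one passage and link it to each tag, creating tags as needed."""
    passage_id = conn.execute(
        "INSERT INTO passage (quote, source) VALUES (?, ?)", (quote, source)
    ).lastrowid
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM tag WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO passage_tag (passage_id, tag_id) VALUES (?, ?)",
            (passage_id, tag_id),
        )
    return passage_id

def find_by_tag(name):
    """Return (quote, source) pairs for every passage carrying the given tag."""
    return conn.execute(
        """
        SELECT p.quote, p.source
        FROM passage AS p
        JOIN passage_tag AS pt ON pt.passage_id = p.id
        JOIN tag AS t ON t.id = pt.tag_id
        WHERE t.name = ?
        ORDER BY p.id
        """,
        (name,),
    ).fetchall()

# Placeholder rows, not real quotes or page references.
add_passage("Placeholder text one.", "Example Source, p. 1", ["cormyr", "mystra"])
add_passage("Placeholder text two.", "Example Source, p. 2", ["cormyr"])
print(find_by_tag("cormyr"))
```

Keeping tags in their own table means renaming a tag later is a one-row change rather than an edit to every passage.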

I'm wondering if I should add dates to it. I'm thinking of a start date and an end date, and hopefully SQL can retrieve the text if the date searched for lies between the two (something to add later, perhaps).
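SQL can do that with `BETWEEN`. A minimal sketch, assuming the dates are stored as whole-number years in `start_date` and `end_date` columns (the table layout and sample rows are placeholders, not the real database). A row with no end date is treated here as covering a single year, and a row with no dates at all never matches a date search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
conn.execute(
    """
    CREATE TABLE passage (
        id         INTEGER PRIMARY KEY,
        quote      TEXT NOT NULL,
        source     TEXT NOT NULL,
        start_date INTEGER,  -- assumed: a year number, NULL if no date applies
        end_date   INTEGER   -- assumed: a year number, NULL if open or single-year
    )
    """
)

# Placeholder rows, not real quotes, sources, or dates.
conn.executemany(
    "INSERT INTO passage (quote, source, start_date, end_date) VALUES (?, ?, ?, ?)",
    [
        ("Placeholder A", "Example Source, p. 1", 1350, 1360),
        ("Placeholder B", "Example Source, p. 2", 1370, None),
        ("Placeholder C", "Example Source, p. 3", None, None),
    ],
)

def find_by_year(year):
    """Return quotes whose start/end range contains the given year."""
    rows = conn.execute(
        """
        SELECT quote
        FROM passage
        WHERE ? BETWEEN start_date AND COALESCE(end_date, start_date)
        ORDER BY id
        """,
        (year,),
    )
    return [row[0] for row in rows]

print(find_by_year(1355))
print(find_by_year(1370))
```

`COALESCE(end_date, start_date)` is the design choice to watch: if a missing end date should instead mean "still ongoing", that expression would need to change.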

Also not sure what to do about pictures, as they will seriously increase the size and potentially break the size limit for exporting. I might just add a text reference to the map and the source so people can look it up, and/or I can add the picture later as a blob if it doesn't break exporting.

Gary Dallison
Great Reader

United Kingdom
6339 Posts

Posted - 29 Dec 2022 :  15:47:14
If anybody does want to contribute, then all I need is a CSV file with the columns: quote, source, tags, start_date, end_date

The quote column contains a line or a paragraph (or multiple paragraphs) depending upon the topic covered. Mandatory

The source column contains the name of the source and the page number or chapter number. Mandatory

The tags column contains any relevant tags for the paragraph (like bhaal, calimshan, elminster, orsraun-mountains). Multi-word tags should have a hyphen between words (like baldurs-gate), and don't include any apostrophes or anything else (I've no idea if SQLite will handle them correctly). Mandatory

The start_date and end_date columns contain the earliest and latest relevant date (if any is referenced in the paragraph / line etc). Optional

Do a row per line / paragraph / logical section of text.

If it's a map or a picture that seems relevant, then put the description of the picture or the name of the map in the quote column. I can get the actual picture later if I am going to use it.

To make it easy, if you can cover a whole page or a whole article then I can cross things off the list.

I will do the rest.
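
For contributors who want to check a file before sending it, here is a small Python sketch that tests the rules above. It assumes the tags in the tags column are separated by commas (the post doesn't say how they are separated), and the `check_rows` name and sample rows are made up for illustration.

```python
import csv
import io
import re

REQUIRED = ("quote", "source", "tags")  # the three mandatory columns
# Lowercase letters and digits, with single hyphens between words.
TAG_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

def check_rows(csv_text):
    """Return (row_number, problem) pairs; an empty list means no problems found."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append((number, f"missing {column}"))
        # Assumption: multiple tags are comma-separated inside the tags cell.
        for tag in (row.get("tags") or "").split(","):
            tag = tag.strip()
            if tag and not TAG_PATTERN.fullmatch(tag):
                problems.append((number, f"bad tag: {tag}"))
    return problems

# Placeholder rows, not real quotes or sources.
sample = (
    "quote,source,tags,start_date,end_date\n"
    '"Placeholder text.","Example Source, p. 1","baldurs-gate, bhaal",,\n'
    '"More placeholder text.","","Baldur\'s Gate",,\n'
)
for number, problem in check_rows(sample):
    print(f"row {number}: {problem}")
```

The second sample row is flagged twice, once for the empty source and once for the tag with an apostrophe and a space.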
Candlekeep Forum © 1999-2023 Candlekeep.com
Snitz Forums 2000