Candlekeep Forum

TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 29 Dec 2022 : 17:43:21
quote:
Originally posted by TheIriaeban

quote:
Originally posted by questing gm

quote:
Originally posted by TheIriaeban


Ok, here is a link to something that will allow a tool to be written that can pull data from a pdf to create the required csv file. That would allow a published work to be processed into the needed csv files in a week or so (assuming you process a chapter at a time and use a database schema similar to the one I suggested above). Tables/graphics would have to be handled manually to a certain extent.

http://allthesystems.com/2020/10/read-text-from-a-pdf-with-powershell/

I like where this discussion is going and would very much like to try the writing tool shared on top. But not being a computer programmer, I'm not sure how to use said tool. If you don't mind guiding a complete dummy: I've downloaded and extracted the tool, but what else do I need to install before I can run the script, and do I actually need a database infrastructure for this?





That article just shows an example. It doesn't do anything with the text read from the pdf, so there isn't anything to see after it is run. You would need something like the following line at the end (which puts an output file named pdftext.txt in your My Documents folder):

$text | Set-Content -LiteralPath (Join-Path ([Environment]::GetFolderPath("MyDocuments")) "pdftext.txt")

If you are new to PowerShell, there is an intro at the link below that does not appear to be overly complicated (the one from MS includes stuff about Visual Studio that may unnecessarily confuse people):

Honkin' big link

One note on that intro: when it mentions changing your computer's script execution policy, I would HIGHLY suggest you DO NOT use Unrestricted. It is mentioned there for completeness, and I am sure the author would agree with my recommendation (I am sure the other tech sages here would, too).



I got this example working and it seems to work ok on pdfs that have already been OCR'd. Obviously, the quality of that OCR is reflected in the text file. For files that have not been OCR'd, it doesn't understand multiple columns so you get alternating lines from each column which can make reading it quite a challenge.
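
For the step after extraction, here is a minimal sketch in Python of turning the extracted text into CSV rows, one row per paragraph. The column names (source, chapter, text), the function names, and the sample text are all assumptions made for illustration; they are not the schema discussed earlier in the thread, and tables or multi-column pages would still need manual cleanup.

```python
import csv
import io
import re


def paragraphs_to_rows(text, source, chapter):
    """Split extracted text on blank lines and return one row per paragraph."""
    rows = []
    for block in re.split(r"\n\s*\n", text):
        # Collapse the hard line breaks left over from the PDF layout.
        paragraph = " ".join(block.split())
        if paragraph:
            rows.append({"source": source, "chapter": chapter, "text": paragraph})
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source", "chapter", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample standing in for the contents of pdftext.txt.
sample = "The first paragraph\nwraps across lines.\n\nThe second paragraph."
rows = paragraphs_to_rows(sample, "Example Sourcebook", 1)
print(rows_to_csv(rows))
```

Splitting on blank lines only works when the extracted text keeps paragraph breaks; text that comes out as alternating lines from two columns would have to be untangled first.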

Hint: since you need to download the iText DLL, once you get it on your machine, you may need to go to the file's properties and unblock it (that is a security measure in newer OSes to help protect your system). Also, I always do a virus scan on anything I download before I do anything with it. I am just paranoid that way.

Mod edit: As in the original post, shortened that URL that was stretching out the page.

"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."

My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents

Edited by - Wooly Rupert on 02 Jan 2023 19:32:47

HighOne
Learned Scribe

198 Posts

Posted - 29 Dec 2022 : 18:16:42
Just so you know, you guys are repeating a lot of work that other people have already done.

Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 29 Dec 2022 : 18:34:15
I'm not sure that it does repeat what others have done. If what I am intending to make already existed I would be using it and not making it.

I'm intending that this database will contain every paragraph or line or large block of text ever written about the realms. It won't paraphrase, it won't be interpreted, it will be the full text.

Each block will be tagged by any number of subjects, from geographic region, to deity, to important npcs or events.

It will be a lot of work and may never get finished, but what does that matter? The wiki sort of does this, but it doesn't include the exact text and is open to the writers' interpretation and bias. Other things are limited in what they can include because they are on the internet, so they cannot include text that is not publicly available.

My solution is a personal one so I can do what I want. When finished it may be available to others but will not be openly available to anyone as it is a research tool for those continuing to develop the realms.

If I'm wrong and something does exist that includes the exact text of everything arranged by subject, then please point me to it as it will save me 50 years of work.

Forgotten Realms Alternate Dimensions Candlekeep Archive
Forgotten Realms Alternate Dimensions: Issue 1
Forgotten Realms Alternate Dimensions: Issue 2
Forgotten Realms Alternate Dimensions: Issue 3
Forgotten Realms Alternate Dimensions: Issue 4
Forgotten Realms Alternate Dimensions: Issue 5
Forgotten Realms Alternate Dimensions: Issue 6
Forgotten Realms Alternate Dimensions: Issue 7
Forgotten Realms Alternate Dimensions: Issue 8
Forgotten Realms Alternate Dimensions: Issue 9

Alternate Realms Site

HighOne
Learned Scribe

198 Posts

Posted - 29 Dec 2022 : 19:25:07
quote:
Originally posted by Gary Dallison

I'm not sure that it does repeat what others have done. If what I am intending to make already existed I would be using it and not making it.
The scanning and OCRing have been done by others. But these books are still under copyright, so no one is going to come into a public forum like this and say, "Hey, I digitized all the FR books and put them online! Here's the link!"

hashimashadoo
Master of Realmslore

United Kingdom
1143 Posts

Posted - 31 Dec 2022 : 13:18:44
quote:
Originally posted by Gary Dallison

I'm not sure that it does repeat what others have done. If what I am intending to make already existed I would be using it and not making it.

I'm intending that this database will contain every paragraph or line or large block of text ever written about the realms. It won't paraphrase, it won't be interpreted, it will be the full text.

Each block will be tagged by any number of subjects, from geographic region, to deity, to important npcs or events.

It will be a lot of work and may never get finished, but what does that matter? The wiki sort of does this, but it doesn't include the exact text and is open to the writers' interpretation and bias. Other things are limited in what they can include because they are on the internet, so they cannot include text that is not publicly available.

My solution is a personal one so I can do what I want. When finished it may be available to others but will not be openly available to anyone as it is a research tool for those continuing to develop the realms.

If I'm wrong and something does exist that includes the exact text of everything arranged by subject, then please point me to it as it will save me 50 years of work.



Yeah, the reason we don't copy the text verbatim over at the wiki is because that would be copyright infringement and would likely result in a bunch of takedown notices.

When life turns its back on you...sneak attack for extra damage.

Head admin of the FR wiki:

https://forgottenrealms.fandom.com/

Italian Archmage Karsus
Seeker

59 Posts

Posted - 02 Jan 2023 : 16:22:49
Let us know how it goes, Gary! You'll want some experience under your belt with this thing before the next improvement, because believe me, the first one is never enough.

Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 02 Jan 2023 : 16:43:05
I'm busy populating the db with information.

I've got 100 entries sharing about 300 tags in various ways.

Thus far it all seems to work fine. I can run a query to pull out all records with a specific tag and export them to csv.

It's lightweight, and I think I intend to keep it that way, so improvements will be limited, if any, beyond the date query I might add.

Still need someone to try it out as a user if anyone wants to volunteer.

I have a one page instruction sheet all ready to go.

Otherwise I'm happy with it and will likely spend the rest of my life copy pasting paragraphs and tagging them.
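
For readers who want to picture the tag query and CSV export described above, here is a minimal sketch in Python using the standard sqlite3 and csv modules. The database engine, the table and column names (entries, tags, entry_tags), and the sample rows are all assumptions for illustration; the post does not say how the actual database is built.

```python
import csv
import io
import sqlite3

# Hypothetical schema: entries and tags joined through a link table,
# so one entry can carry any number of tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (entry_id INTEGER PRIMARY KEY, source TEXT, body TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE entry_tags (
    entry_id INTEGER REFERENCES entries(entry_id),
    tag_id INTEGER REFERENCES tags(tag_id),
    PRIMARY KEY (entry_id, tag_id)
);
INSERT INTO entries VALUES (1, 'Sourcebook A', 'First block of text.'),
                           (2, 'Sourcebook A', 'Second block of text.');
INSERT INTO tags VALUES (1, 'region-x'), (2, 'deity-y');
INSERT INTO entry_tags VALUES (1, 1), (1, 2), (2, 2);
""")


def export_tag(conn, tag_name):
    """Return CSV text for every entry carrying the given tag."""
    cursor = conn.execute(
        """
        SELECT e.entry_id, e.source, e.body
        FROM entries e
        JOIN entry_tags et ON et.entry_id = e.entry_id
        JOIN tags t ON t.tag_id = et.tag_id
        WHERE t.name = ?
        ORDER BY e.entry_id
        """,
        (tag_name,),
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["entry_id", "source", "body"])
    writer.writerows(cursor)
    return buffer.getvalue()


deity_csv = export_tag(conn, "deity-y")
print(deity_csv)
```

Keeping tags in their own table means a tag created later can be attached to older entries without touching the entry text.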


TBeholder
Great Reader

2298 Posts

Posted - 02 Jan 2023 : 18:03:06
quote:
Originally posted by Wooly Rupert

We already have the FR Wiki.

It’s a wiki.
It’s on a wikifarm without decent search.
And it was overrun by sneaky edition wars and assorted loons long ago. Either it’s the same crowd that raids all of fandom.com, or it’s a slow zombie apocalypse.

People never wonder How the world goes round -Helloween
And even I make no pretense Of having more than common sense -R.W.Wood
It's not good, Eric. It's a gazebo. -Ed Whitchurch

Wooly Rupert
Master of Mischief
Moderator

USA
36608 Posts

Posted - 02 Jan 2023 : 19:28:38
But it's still a collaborative effort, with MANY people adding to it, and where the work of setting it up and making it accessible to everyone is already done.

While I'd prefer to see all editions embraced (like the way Wookieepedia has a "Canon" and a "Legends" entry for a lot of stuff), the FR wiki already has a lot of info there and people adding to it on a regular basis.

Maybe it's just me, but I think it's better to have one thing that everyone can access and work on, rather than many individual efforts going in a lot of different directions. Even if the one thing is less than perfect, it's still centralized.

Also, I've never had any real issue searching the FR wiki. If the info is there, I've found what I was looking for.

Candlekeep Forums Moderator

Candlekeep - The Library of Forgotten Realms Lore
http://www.candlekeep.com
-- Candlekeep Forum Code of Conduct

I am the Giant Space Hamster of Ill Omen!

Edited by - Wooly Rupert on 02 Jan 2023 19:29:10

Italian Archmage Karsus
Seeker

59 Posts

Posted - 02 Jan 2023 : 21:22:10
XD For what it's worth, Valhaeria is supposed to index freely available sources, precisely so that people who intend to write about a subject in the wiki but don't have the book can still get their hands on something.

Gary, I think over time you will need to figure out some automation in order to speed things up. Let me know if you need help on that front! Valhaeria has taught me some things lol.

Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 02 Jan 2023 : 22:09:41
Cheers for the offer. In a few months when I'm sick of copy and paste I may take you up on the offer, but despite being a software tester by trade I'm very old school and prefer to do things the manual way.


Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 04 Jan 2023 : 21:08:28
I've reached the milestone of 200 lore entries (and 656 tags).


Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 07 Jan 2023 : 11:06:16
300 entries and counting.

Got a list of 8 or 9 of Azoun's bastards

A few High Knights

A growing list of Harpers

A few Netherese archmages (which I take to mean were former owners of an enclave)

3 poisons

6 minor secret societies

30 books


The list of tags I have used so far is astoundingly large, but on the plus side, the db is really easy to search in and find lore that I haven't tagged previously (because I only created the tag later).


Lord Karsus
Great Reader

USA
3725 Posts

Posted - 08 Jan 2023 : 19:46:29
-I can offer this; it's not much, but it might be useful for tags for what you're doing. Years ago (damn, literally like a decade ago), me and some other people started tagging topics in novels. We never really got that far, but hopefully it helps you tag things or gives you leads on where to find info about things:


https://web.archive.org/web/20090430101828/http://forums.gleemax.com/showthread.php?t=1062902

(A Tri-Partite Arcanist Who Has Forgotten More Than Most Will Ever Know)

Elves of Faerûn
Vol. I- The Elves of Faerûn
Vol. III- Spells of the Elves
Vol. VI- Mechanical Compendium

Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 09 Jan 2023 : 21:19:09
Thanks for that, I'll check it out.


Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 12 Jan 2023 : 21:37:35
50% through the Anauroch sourcebook.

Up to 500 lore entries (although some entries are 3 pages long as they are just a great big blurb about flora or fauna in the Sword).

Lots of other random bits though.

The number of tags I have to remember is a slight problem, and whether to make a new tag or not is sometimes a tricky decision.

On the plus side though, I get to read all the sourcebooks and adventures again, and this time in lots of detail because I have to pay attention to it to figure out the tags.


Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 04 Feb 2023 : 21:41:39
Nearly done with Anauroch, just debating whether to include the rumours section or not, but I think I will for completeness' sake.

It's been eye-opening cataloguing a sourcebook in that level of detail. Never knew Anauroch once referred to the High Ice, which makes much more sense of Ranauroch in Giantcraft and the Perilous Gateway article. Might also help explain why giants suddenly invaded Netheril and General Matick had to fight them off (because Netheril plundered one of the last remnants of Ostoria).


zyzzyva
Acolyte

USA
14 Posts

Posted - 08 Feb 2023 : 16:59:50
I've been working on something similar in MySQL, essentially a geographical bibliography to use as a DM's reference (because I'm both the type of masochist to run my campaigns open world and the type of sicko who finds data entry fun.)

I'm using a three table structure--one table of sources, one table of geographic regions, and one table of references, which notes the pages where information on a particular region is found within a particular source. Regions also include a recursive reference to their parent region, so you could search within, say, Unapproachable East, and see Aglarond, Thay, etc. and all locations within them.
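
As a rough illustration of that three-table layout, here is a minimal sketch in Python with sqlite3 standing in for MySQL. Only region_id, region_name and parent_key1 come from the query posted later in the thread; the table names sources, regions and refs, the pages column, and every sample row (including the page ranges) are invented placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (
    source_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE regions (
    region_id INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL,
    parent_key1 INTEGER REFERENCES regions(region_id)  -- NULL for a top-level region
);
CREATE TABLE refs (
    source_id INTEGER NOT NULL REFERENCES sources(source_id),
    region_id INTEGER NOT NULL REFERENCES regions(region_id),
    pages TEXT NOT NULL
);
INSERT INTO sources VALUES (1, 'Example Sourcebook');
INSERT INTO regions VALUES (1, 'Unapproachable East', NULL),
                           (2, 'Aglarond', 1),
                           (3, 'Thay', 1);
INSERT INTO refs VALUES (1, 2, '10-24'), (1, 3, '30-55');
""")

# Page references for every region directly inside a given parent region.
rows = conn.execute(
    """
    SELECT child.region_name, s.title, r.pages
    FROM regions parent
    JOIN regions child ON child.parent_key1 = parent.region_id
    JOIN refs r ON r.region_id = child.region_id
    JOIN sources s ON s.source_id = r.source_id
    WHERE parent.region_name = ?
    ORDER BY child.region_name
    """,
    ("Unapproachable East",),
).fetchall()
print(rows)
```

The self-reference in regions is what makes the parent lookup possible: a region row points at another row of the same table rather than at a separate table of parents.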

I don't really have an end goal for it yet, though may end up linking it to a map or somesuch eventually.

Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 08 Feb 2023 : 18:53:55
Sounds like a good idea. I'd be interested to see how you did that recursive link. I'm a novice at this so I've just been tagging things with parent tags.


TheIriaeban
Master of Realmslore

USA
1289 Posts

Posted - 08 Feb 2023 : 19:22:26
I wrote a tool years ago that would allow me to OCR text out of a screenshot that was in the clipboard. MS has pretty much replicated that with one of the tools in their Power Toys:

https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

That way, whether or not what you are looking at has been OCR'd, you can still pull out the text as you need. Hope this helps.

"Iriaebor is a fine city. So what if you can have violence between merchant groups break out at any moment. Not every city can offer dinner AND a show."

My FR writeups - http://www.mediafire.com/folder/um3liz6tqsf5n/Documents

zyzzyva
Acolyte

USA
14 Posts

Posted - 08 Feb 2023 : 19:25:27
quote:
Originally posted by Gary Dallison

Sounds like a good idea. I'd be interested to see how you did that recursive link. I'm a novice at this so I've just been tagging things with parent tags.



I've just been tagging them with parent tags as well (usually just one, but occasionally two when different regions overlap in non-contained ways.)

Then I'll use multiple JOIN statements in a query to display sub-regions within each region. For instance:

SELECT A1.region_id, A1.region_name,
       A2.region_id, A2.region_name,
       A3.region_id, A3.region_name,
       A4.region_id, A4.region_name
FROM FR_Biblio.FR_Regions A1
LEFT JOIN FR_Biblio.FR_Regions A2 ON (A1.region_id = A2.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A3 ON (A2.region_id = A3.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A4 ON (A3.region_id = A4.parent_key1)
WHERE A1.region_name LIKE '%great dale%';

displays subregions within The Great Dale, then settlements within those subregions, then buildings within those settlements, etc. (LEFT JOIN used to avoid losing regions that don't have their own subregions)
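
The chain of LEFT JOINs above stops at a fixed depth (four levels here). As one possible alternative, not the method used in this post, a recursive common table expression can walk the parent links to any depth. The sketch below runs it in Python against SQLite with made-up rows; MySQL 8.0 and later also accept WITH RECURSIVE, though the exact behaviour there is untested here. The subtree name and the sample regions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FR_Regions (
    region_id INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL,
    parent_key1 INTEGER REFERENCES FR_Regions(region_id)
);
INSERT INTO FR_Regions VALUES
    (1, 'The Great Dale', NULL),
    (2, 'Example Subregion', 1),
    (3, 'Example Settlement', 2),
    (4, 'Example Building', 3),
    (5, 'Unrelated Region', NULL);
""")

rows = conn.execute(
    """
    WITH RECURSIVE subtree(region_id, region_name, depth) AS (
        -- Anchor: the region(s) matching the search term.
        SELECT region_id, region_name, 0
        FROM FR_Regions
        WHERE region_name LIKE ?
        UNION ALL
        -- Recursive step: every region whose parent is already in the subtree.
        SELECT c.region_id, c.region_name, s.depth + 1
        FROM FR_Regions c
        JOIN subtree s ON c.parent_key1 = s.region_id
    )
    SELECT depth, region_name FROM subtree ORDER BY depth, region_id
    """,
    ("%great dale%",),
).fetchall()

# Indent each region by its depth below the starting region.
for depth, name in rows:
    print("  " * depth + name)
```

This version only follows parent_key1; a table with a second parent column would need that column added to the join condition in the recursive step.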
Go to Top of Page

Gary Dallison
Great Reader

United Kingdom
6290 Posts

Posted - 21 Feb 2023 : 21:41:54
Finally finished the Anauroch sourcebook; the database is currently at 1 MB. Assuming a linear correlation in size, the database would reach 1 GB at 1000 sourcebooks.
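
The estimate can be written out as a one-line calculation. This sketch assumes every sourcebook adds about the same 1 MB as Anauroch did and uses decimal units (1 GB = 1000 MB); the function name is made up for illustration.

```python
def estimated_size_mb(sourcebooks, mb_per_sourcebook=1.0):
    """Linear estimate: total size grows in proportion to the number of sourcebooks."""
    return sourcebooks * mb_per_sourcebook


size_mb = estimated_size_mb(1000)
print(f"{size_mb:.0f} MB, or about {size_mb / 1000:.0f} GB")
```

Real growth will depend on how text-heavy each book is, so the per-book figure is worth re-measuring after a few more sourcebooks.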


Delnyn
Senior Scribe

USA
776 Posts

Posted - 22 Feb 2023 : 17:04:38
quote:
Originally posted by zyzzyva

Then I'll use multiple JOIN statements in a query to display sub-regions within each region. For instance:

SELECT A1.region_id, A1.region_name,
       A2.region_id, A2.region_name,
       A3.region_id, A3.region_name,
       A4.region_id, A4.region_name
FROM FR_Biblio.FR_Regions A1
LEFT JOIN FR_Biblio.FR_Regions A2 ON (A1.region_id = A2.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A3 ON (A2.region_id = A3.parent_key1)
LEFT JOIN FR_Biblio.FR_Regions A4 ON (A3.region_id = A4.parent_key1)
WHERE A1.region_name LIKE '%great dale%';




Why are you joining multiple tables with the same name?

zyzzyva
Acolyte

USA
14 Posts

Posted - 22 Feb 2023 : 18:00:23
quote:


Why are you joining multiple tables with the same name?



I structured my regions table so that I can use a single table for any region, regardless of what other regions it's located inside.

Basically, each region row includes a numerical ID for the region, the region name, some additional information (type, alternative names, etc.) and several parent key columns that specify the region id(s) of regions in which that region is located.

The above query structure is basically used just to visually display child regions within their parent regions (i.e., a query with 'Waterdeep' as the topmost region will show each of the wards, then each of the buildings located within those wards, etc.)
Candlekeep Forum © 1999-2023 Candlekeep.com